I Let My AI Agent Work Overnight. It Shipped 7 Bugs to Production. Here's What We Learned.
The real story of building a CI/CD workflow with an AI agent — one mistake at a time.
Yesterday I went for a walk, had lunch, and visited a friend. My AI agent — Marvin — spent the day refactoring code, integrating GDPR compliance, building documentation, and fixing a critical signup bug that was blocking all new users.
When I got home, production was broken in seven different ways.
But here's the thing: nothing irreversible happened. This was the learning curve. And the lesson was powerful.
The Setup
I'm building AImpactScanner, a tool that analyzes how well your website performs in AI-powered search engines like ChatGPT and Perplexity. I work full-time at HSBC. This is my nights-and-weekends project. Marvin is my AI agent — running 24/7 on a server, with access to my codebase, GitHub, and a direct line to my Telegram.
The night before, I'd given Marvin a task list:
- Refactor `App.jsx` (a bloated 2,313-line React component)
- Integrate GDPR components to replace a $29/month SaaS subscription
- Write architecture documentation
Standard housekeeping. The kind of work that's tedious for humans but perfect for AI.
What I hadn't done: establish a proper dev → staging → production workflow between us. And that made all the difference.
What Marvin Did (While I Was Out)
By the time I checked in, Marvin had:
- ✅ Refactored `App.jsx` from 2,313 lines down to 475 lines
- ✅ Extracted three clean custom hooks (`useAuth`, `useAnalysis`, `useRouting`)
- ✅ Integrated native GDPR components, removing the Enzuzo dependency
- ✅ Created comprehensive architecture and product documentation
- ✅ Found and fixed a critical signup bug — a duplicate database trigger that was silently blocking every new user registration
- ✅ Verified Stripe checkout across all 6 pricing combinations
- ✅ Built a complete launch plan with copy-paste-ready content for Product Hunt, Reddit, Hacker News, and LinkedIn
Impressive, right? The build passed. Tests passed. Everything looked clean.
There was just one problem. Actually, seven problems.
What Went Wrong — And Why
The root causes were simple, and they were about process, not capability:
1. We hadn't established CI/CD discipline.
Marvin pushed directly to both the `develop` AND `main` branches. Without a staging-first workflow, every fix went straight to production. We hadn't discussed the `develop` → staging → UAT → production pipeline. That's on both of us — I hadn't set the guardrails, and Marvin didn't know to ask.
2. The agent didn't reference the existing architecture.
When refactoring `App.jsx`, Marvin optimized for code structure — clean hooks, fewer lines, better organization. But he didn't refer back to the existing design decisions encoded in those 2,313 lines. The signup flow had a carefully designed value ladder: start users on the Growth tier, let them feel what they'd lose by downgrading. Classic conversion psychology. The refactor replaced this with a generic four-tier pricing grid. Clean code, zero persuasion.
Same story with the login page. The original used `pages/Signup` with `AuthMethodSelector` — Google OAuth, GitHub OAuth, the whole flow. The refactor swapped it for a basic email/password `Login` component. Technically cleaner. Functionally broken.
3. "Build passes" isn't a quality gate.
Every single broken commit passed the build. TypeScript compiled. Vite bundled. No warnings. All seven bugs were runtime failures — the kind you only catch by actually using the product:
- Bug 1: React hooks violation — `useCallback` placed after an early return
- Bug 2: OAuth callback view dropped from the routing switch
- Bug 3: Value ladder signup replaced with a generic pricing table
- Bug 4: Google/GitHub OAuth buttons removed from login
- Bug 5: Password logins misrouted through the OAuth callback handler
- Bug 6: Dashboard props all `undefined` — no navigation, settings, or upgrade
- Bug 7: Analysis button crash — `TypeError: t is not a function`
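Bug 1 is worth a closer look, because it shows why "it compiles" means so little here. React records hooks by call order, so a hook placed after an early return runs on some renders and not others, and the bookkeeping silently drifts. A minimal sketch of the mechanism (a simplified toy model of hook storage, not React's real internals — and using a stand-in `useState` where our bug involved `useCallback`):

```javascript
// Toy model: React stores hook state in an array, indexed by call order.
let hookStates = [];
let hookIndex = 0;

function useState(initial) {
  const i = hookIndex++;
  if (hookStates[i] === undefined) hookStates[i] = initial;
  return hookStates[i];
}

// Each render resets the cursor, then reports how many hooks ran.
function render(component, props) {
  hookIndex = 0;
  const out = component(props);
  return { out, hooksCalled: hookIndex };
}

// Broken: a hook sits after a conditional early return.
function Broken({ loading }) {
  const count = useState(0);
  if (loading) return "spinner"; // early return skips the hook below
  const label = useState("ready"); // runs on some renders, not others
  return label;
}

render(Broken, { loading: false }).hooksCalled; // 2 hooks recorded
render(Broken, { loading: true }).hooksCalled;  // 1 hook — the order mismatch React flags
```

The hook count differs between renders, which is exactly the condition React detects and throws on. The build can't catch this; only a render path that takes the early return does.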
The Fix: Revert and Learn
After four incremental fixes (each revealing new issues), I made the call: revert everything.
```bash
git show 7588773:src/App.jsx > src/App.jsx
rm src/hooks/useAuth.js src/hooks/useAnalysis.js src/hooks/useRouting.js
git commit -m "revert: restore original App.jsx (pre-refactor)"
```
Two thousand three hundred and thirteen lines of messy, monolithic, working code replaced the elegant 475-line refactored version.
It felt like a step backward. It was actually the right call — and it led to the real win.
What We Built Together
The evening wasn't wasted. While fixing bugs, Marvin and I established the workflow we should have had from the start:
The CI/CD Pipeline:
- `develop` branch = staging (auto-deploys to the staging environment)
- `main` branch = production (auto-deploys to production)
- Rule: always push to `develop` first. I review on staging. Only I merge to `main`.
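As a config, the branch-to-environment rule is small. A hedged sketch of what it could look like in GitHub Actions (the deploy scripts here are hypothetical placeholders — the post doesn't specify the actual pipeline):

```yaml
# .github/workflows/deploy.yml — illustrative sketch, not the project's real config
name: deploy
on:
  push:
    branches: [develop, main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build
      # Pushes to develop only ever reach staging
      - if: github.ref == 'refs/heads/develop'
        run: npm run deploy:staging      # hypothetical script
      # Only main — which only Jamie merges to — reaches production
      - if: github.ref == 'refs/heads/main'
        run: npm run deploy:production   # hypothetical script
```

Combined with branch protection on `main` (require a human-approved pull request), the agent physically can't ship to production on its own.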
The Overnight Rules: We updated Marvin's task file with explicit guardrails:
- NEVER refactor `App.jsx` or core routing/auth files without explicit approval.
- Always test OAuth login, the signup value ladder, the dashboard, and analysis after any change.
- The overnight agent cannot do UAT — only Jamie can approve merges to `main`.
The Architecture Reference: Before any refactor, Marvin now checks the existing design decisions — not just the code structure. The value ladder, the OAuth flow, the conversion psychology — these are intentional, not accidental complexity.
The Scorecard
What Marvin did right (while I was at lunch):
- Found a revenue-blocking signup bug I'd missed for weeks
- Verified Stripe checkout across every pricing combination
- Integrated GDPR compliance, saving $29/month
- Built architecture documentation from scratch
- Created a comprehensive launch plan
What we got wrong (together):
- No staging-first workflow established
- No architecture review before refactoring
- Pushed to production without UAT
- Assumed "build passes" meant "it works"
What we built (from the mistakes):
- Proper CI/CD pipeline with staging gate
- Explicit guardrails for autonomous work
- Architecture-aware development process
- A stronger human-AI working relationship
The Real Lesson
This isn't a story about AI being bad at coding. Marvin is genuinely good. The refactor was clean, well-structured, and came with thoughtful commit messages. The GDPR integration was solid. The signup bug fix was something I'd missed for weeks.
This is a story about a team learning to work together. Every team — human or AI — needs to establish workflow, communication, and quality gates. We just did it in one very eventful evening instead of over weeks of onboarding.
The mistakes we made are the same mistakes any fast-moving team makes:
- Shipping without enough process
- Optimizing code without understanding context
- Assuming tests catch everything
The difference? We fixed everything in 2 hours, established guardrails, and came out stronger. Try doing that with a human team on day one.
I'm still building AImpactScanner with Marvin. Tomorrow he'll do more great work. And now we have the process to make sure it ships safely.
That's not a failure story. That's a partnership finding its rhythm.
Jamie Watters is a solo founder building a portfolio of AI-powered businesses in public. Follow the journey at jamiewatters.work.
Try AImpactScanner — see how your website performs in AI-powered search.