<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://ljunggren.github.io/agentic-engineering/feed.xml" rel="self" type="application/atom+xml" /><link href="https://ljunggren.github.io/agentic-engineering/" rel="alternate" type="text/html" /><updated>2026-03-10T18:47:20+00:00</updated><id>https://ljunggren.github.io/agentic-engineering/feed.xml</id><title type="html">Agentic Engineering</title><subtitle>Actionable methodology files for human-AI collaboration</subtitle><author><name>Mats Ljunggren</name></author><entry><title type="html">MongoDB 4 → 8 in About 8 Hours of Actual Work</title><link href="https://ljunggren.github.io/agentic-engineering/blog/mongodb-migration-case-study/" rel="alternate" type="text/html" title="MongoDB 4 → 8 in About 8 Hours of Actual Work" /><published>2026-03-10T00:00:00+00:00</published><updated>2026-03-10T00:00:00+00:00</updated><id>https://ljunggren.github.io/agentic-engineering/blog/mongodb-migration-case-study</id><content type="html" xml:base="https://ljunggren.github.io/agentic-engineering/blog/mongodb-migration-case-study/"><![CDATA[<p>In late January 2026, we needed to migrate a production SaaS platform from MongoDB on legacy CentOS servers to MongoDB 8.0 on new Ubuntu infrastructure. That meant upgrading Mongoose from 6 to 8 — which meant converting every database call in the codebase from callbacks to async/await. A running production system. Paying customers. No downtime budget.</p>

<p>The result: <strong>zero downtime. Zero data loss. Customers never noticed.</strong> On the calendar it spanned about a month — but the actual working time was around 8 hours, spread across three attempts. Here’s the full story — including the parts that don’t make it into a changelog.</p>

<h2 id="the-setup">The Setup</h2>

<p>The codebase was a mature Node.js/Express application — years of accumulated controllers, all using Mongoose callbacks. The kind of code that works perfectly and is terrifying to touch.</p>

<p>Mongoose 8 dropped callback support entirely. There was no incremental path. Every <code class="language-plaintext highlighter-rouge">Model.findOne(query, function(err, doc) { ... })</code> had to become <code class="language-plaintext highlighter-rouge">const doc = await Model.findOne(query)</code>. Across 25+ controller files.</p>
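<p>The conversion itself is mechanical. A minimal sketch of the before/after, with a stub standing in for a real Mongoose model (the model, fields, and function names are illustrative, not the actual codebase):</p>

```javascript
// Stub standing in for a Mongoose model; a real app would use the schema-backed model.
const User = {
  findOne: async (query) => ({ email: query.email, name: 'demo' }),
};

// Before (Mongoose 6 callback style, removed entirely in Mongoose 8):
//   User.findOne({ email }, function (err, doc) {
//     if (err) return handleError(err);
//     render(doc);
//   });

// After: the same query, awaited. Errors now surface through try/catch
// instead of arriving as the callback's first argument.
async function getUser(email) {
  try {
    return await User.findOne({ email });
  } catch (err) {
    throw err; // real code would log or translate the error here
  }
}

getUser('a@example.com').then((doc) => console.log(doc.name)); // prints "demo"
```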

<p>The AI agent (Claude) would do the heavy lifting. I’d direct, review, and make the judgment calls. That was the plan.</p>

<h2 id="two-throw-away-branches-and-why-thats-fine">Two Throw-Away Branches (and Why That’s Fine)</h2>

<p>The first attempt tried to do everything at once. The AI made too many singular commits — file by file, losing the big picture. The PR bloated to 40 files mixing migration, testing infrastructure, and documentation. It technically worked, but the code quality wasn’t where it needed to be. I closed the PR after a week, noted the learnings, and threw away the branch.</p>

<p>Second attempt: I tried to reuse parts of the first branch. The AI picked up patterns from the first attempt — including the ones that weren’t good enough — and carried them forward. The result was better than attempt one, but still not where it needed to be. Thrown away.</p>

<p><strong>Two false starts, and I’d do it the same way again.</strong> This is one of the underappreciated advantages of working with AI agents on large refactors. A false start costs you a branch name — nothing else. You note down what went wrong, teach the agent the specific lessons — “don’t make singular commits per file,” “don’t reuse code from the failed branch,” “keep the PR focused on migration only” — and start fresh from main. The agent doesn’t carry ego from the first attempt. It just applies the new constraints and does better.</p>

<p>Each attempt taught the AI the codebase: which controllers depended on which, where the tricky callbacks lived, what patterns to avoid. The branches were disposable. The learning wasn’t.</p>

<p>The third attempt started from scratch with a focused approach. Two phases:</p>

<ol>
  <li><strong>Remove all callbacks</strong> — keep Mongoose 6, just modernize the calling patterns</li>
  <li><strong>Bump the version</strong> — with async/await already in place, the actual upgrade becomes mechanical</li>
</ol>

<p>The first PR converted ~25 controller files from callbacks to async/await. Clean diff. Single concern. Merged on February 1st.</p>

<p>Release 11.0.0.</p>

<h2 id="12-releases-in-one-day">12 Releases in One Day</h2>

<p>The first merge surfaced issues that only show up under real conditions. That’s expected with a migration this size — the point is how fast you can find and fix them.</p>

<p><strong>Releases 11.0.1–11.0.2</strong> — Minifier syntax error and a staging connection string. Straightforward.</p>

<p><strong>Releases 11.0.3–11.0.5</strong> — Token-based login for Jenkins CI broke. Added debug logging, traced the token extraction issue, fixed it across three iterations.</p>

<p><strong>Releases 11.0.6–11.0.11</strong> — The interesting one. <code class="language-plaintext highlighter-rouge">req.query</code> in Express is <strong>immutable</strong>. The old callback code had worked around this accidentally — the async refactor exposed it. Every internal route that passed modified query parameters needed updating. The fix was elegant: <code class="language-plaintext highlighter-rouge">Object.create(req)</code> creates a prototype-linked wrapper where you can set properties without mutating the original.</p>
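<p>The wrapper pattern is plain JavaScript. A simplified model of the situation, with a frozen object standing in for the immutable <code>req.query</code>:</p>

```javascript
// A frozen query object stands in for Express's immutable req.query.
const req = { query: Object.freeze({ page: '1' }), user: 'mats' };

// Direct mutation is silently ignored (or throws in strict mode):
// req.query.page = '2';

// Prototype-linked wrapper: own properties shadow the original's,
// and everything not overridden is read through to req.
const wrapped = Object.create(req);
wrapped.query = { ...req.query, page: '2' };

console.log(wrapped.query.page); // '2': the wrapper sees the override
console.log(req.query.page);     // '1': the original is untouched
console.log(wrapped.user);       // 'mats': inherited from req
```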

<p>Twelve releases in one day. Each one a small, targeted fix deployed in minutes. <strong>Zero downtime throughout.</strong> This is what fast iteration on a live system looks like — not chaos, but a tight feedback loop where issues get surfaced and resolved before they compound.</p>

<h2 id="the-actual-upgrade">The Actual Upgrade</h2>

<p>With callbacks already removed, bumping Mongoose from 6.13.0 to 8.7.0 was anticlimactic. Eight controller files needed deprecated method updates (<code class="language-plaintext highlighter-rouge">.remove()</code> → <code class="language-plaintext highlighter-rouge">.deleteOne()</code>, <code class="language-plaintext highlighter-rouge">.update()</code> → <code class="language-plaintext highlighter-rouge">.updateOne()</code>). We added integration tests — including a static analysis test that scanned the codebase for any remaining callback patterns.</p>
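<p>The core of such a scanner is a source-level pattern match. A sketch of the idea (the regex and method list are illustrative; the real <code>mongoose-compat</code> suite may use different rules):</p>

```javascript
// Flags Mongoose-style calls whose argument list ends in an inline callback.
// Illustrative pattern only; multi-line calls and arrow callbacks need more care.
const callbackPattern =
  /\.(find|findOne|updateOne|updateMany|insertMany|deleteOne|exec)\s*\([^)]*function\s*\(/;

function findCallbackSites(source) {
  return source
    .split('\n')
    .map((text, i) => ({ line: i + 1, text: text.trim() }))
    .filter(({ text }) => callbackPattern.test(text));
}

const sample = [
  'const doc = await Model.findOne(query);',               // clean
  'Model.updateOne(q, u, function (err) { done(err); });', // flagged
].join('\n');

console.log(findCallbackSites(sample)); // reports line 2 only
```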

<p>Release 11.0.14. Staged. Verified via SSH. Quiet.</p>

<h2 id="the-server-migration">The Server Migration</h2>

<p>February 6th. Five releases to point connection strings at the new MongoDB 8.0 servers — updated hosts, added auth credentials (MongoDB 8.0 requires authentication by default), and set <code class="language-plaintext highlighter-rouge">authSource=admin</code>. Three iterations to get the auth config right. Anyone who’s migrated MongoDB versions will recognize the pattern: the new defaults catch you one at a time. Again, zero downtime — each config change deployed cleanly.</p>
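<p>For reference, the shape of the change. Every host and credential below is a placeholder; only the <code>authSource=admin</code> parameter mirrors the fix described above:</p>

```javascript
// Old (MongoDB 4.x, no auth), placeholder host:
//   mongodb://legacy-host:27017/app

// New (MongoDB 8.0, auth required by default), placeholder credentials and host:
const uri = 'mongodb://appUser:s3cret@new-host:27017/app?authSource=admin';

console.log(uri.includes('authSource=admin')); // true
```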

<h2 id="the-26-callbacks-qa-caught">The 26 Callbacks QA Caught</h2>

<p>February 12th. A week after the Mongoose upgrade merged. Everything had been quiet. Then QA hit the version control workflows — stash, pop, revert — and the app started crash-looping.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MongooseError: Query.prototype.exec() no longer accepts a callback
</code></pre></div></div>

<p>Twenty-six callback-style <code class="language-plaintext highlighter-rouge">updateOne</code>/<code class="language-plaintext highlighter-rouge">updateMany</code>/<code class="language-plaintext highlighter-rouge">insertMany</code> calls. Missed in the migration. Lurking in <code class="language-plaintext highlighter-rouge">_commits.js</code>, <code class="language-plaintext highlighter-rouge">_templates.js</code>, and <code class="language-plaintext highlighter-rouge">_versions.js</code>.</p>

<p>Why did we miss them? Because our E2E tests only covered the happy path: login, signup, project creation. Nobody was testing VCS operations in the automated suite. These callbacks sat dormant until QA ran the full workflow.</p>

<p>This is the part of the story I find most instructive. We had:</p>
<ul>
  <li>A methodical two-phase migration</li>
  <li>A static analysis test for callback detection</li>
  <li>Integration tests</li>
  <li>E2E tests</li>
</ul>

<p>And we still missed 26 callbacks. The static analysis test? It was added <em>after</em> the version bump in Phase 2 — so it never scanned the files that were “already done” in Phase 1.</p>

<p><strong>The methodology caught 90% of the problems. QA caught the rest before users ever saw them.</strong> That’s what QA is for. But the gap still bothered me — it was preventable.</p>

<h2 id="the-tdd-fix">The TDD Fix</h2>

<p>Here’s where the approach changed. Same pattern as the false starts: note down the learnings, teach the AI, start fresh. But this time, tests first.</p>

<p>Twenty unit tests. Every single callback site in <code class="language-plaintext highlighter-rouge">_commits.js</code>, <code class="language-plaintext highlighter-rouge">_templates.js</code>, and <code class="language-plaintext highlighter-rouge">_versions.js</code> got a test that:</p>
<ol>
  <li>Called the function</li>
  <li>Verified it used async/await (not callbacks)</li>
  <li>Verified correct error handling</li>
</ol>
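<p>A test of that shape can assert the promise contract directly. A sketch with a hypothetical controller function (none of these names come from the real suite):</p>

```javascript
// Hypothetical stand-in for a converted controller function.
async function saveCommit(data) {
  if (!data) throw new Error('missing data');
  return { saved: true, ...data };
}

// 1. Call the function.
const pending = saveCommit({ id: 1 });

// 2. Verify the async/await contract: it returns a promise,
//    not "undefined plus a callback invocation later".
console.log(pending instanceof Promise); // true

// 3. Verify error handling: failures reject the promise.
saveCommit(null).catch((err) => console.log(err.message)); // 'missing data'
```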

<p>All 20 tests failed against the existing code. Red. Then we converted the callbacks. All 20 passed. Green.</p>

<p>This would have been the right approach from the start. If we had written tests before Phase 1 — tests that would fail on callback patterns and pass on async/await — we would have caught all 26 missed callbacks in one pass. TDD isn’t just a coding discipline. It’s a migration safety net.</p>

<p>The PR description told the story plainly:</p>

<blockquote>
  <p><em>All tests fail before fix (TDD red), all pass after (TDD green). mongoose-compat integration suite confirms no callback patterns remain.</em></p>
</blockquote>

<p>Release 11.0.21. QA passed clean. Done.</p>

<h2 id="the-aftermath">The Aftermath</h2>

<p>Two more cleanup releases followed. Session store bloat from <code class="language-plaintext highlighter-rouge">saveUninitialized: true</code> — every bot and crawler was creating MongoDB sessions. And CI alignment: Node 20 to 24, Mongo 7 to 8.</p>
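<p>The session fix is a single flag. A sketch of the relevant <code>express-session</code> options (the secret source and store wiring are assumptions; only <code>saveUninitialized</code> is the fix described here):</p>

```javascript
// With saveUninitialized: true, every anonymous request (including bots and
// crawlers) persisted an empty session document. false defers the write until
// the session actually holds data.
const sessionConfig = {
  secret: process.env.SESSION_SECRET || 'placeholder', // assumption: env-provided secret
  resave: false,
  saveUninitialized: false, // the fix
  // store: MongoStore.create({ ... })  (store wiring omitted)
};

console.log(sessionConfig.saveUninitialized); // false
```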

<h2 id="by-the-numbers">By the Numbers</h2>

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Production downtime</td>
      <td><strong>Zero</strong></td>
    </tr>
    <tr>
      <td>Data loss</td>
      <td><strong>Zero</strong></td>
    </tr>
    <tr>
      <td>Actual working time</td>
      <td>~8 hours across 3 attempts</td>
    </tr>
    <tr>
      <td>Calendar span</td>
      <td>~1 month (Jan 28 – Feb 22)</td>
    </tr>
    <tr>
      <td>Releases</td>
      <td>21 migration-related</td>
    </tr>
    <tr>
      <td>PRs merged</td>
      <td>4</td>
    </tr>
    <tr>
      <td>Throw-away branches</td>
      <td>2 (third time was the charm)</td>
    </tr>
    <tr>
      <td>Callbacks caught by QA</td>
      <td>26</td>
    </tr>
    <tr>
      <td>TDD tests written</td>
      <td>20</td>
    </tr>
  </tbody>
</table>

<h2 id="what-id-do-differently">What I’d Do Differently</h2>

<p><strong>Write the tests before the migration, not during.</strong> The static analysis test (<code class="language-plaintext highlighter-rouge">mongoose-compat.spec.js</code>) that scanned for callback patterns was the single most valuable artifact from this project. If it had existed before Phase 1 instead of being added during Phase 2, we would have caught every callback in one pass.</p>

<p><strong>Throw away branches faster.</strong> The first attempt taught the codebase topology. The second taught us not to reuse bad code. Both were valuable — but we held onto each one too long before cutting. When a branch starts feeling wrong, that’s the signal: note down what you learned, teach the AI the constraints, start fresh from main. Branches are free. Reviewing a messy PR is not.</p>

<p><strong>Test the paths users actually use, not just the ones you remember.</strong> Our E2E tests covered login, signup, and project creation. Real users stash, revert, and manage commits. QA caught the gap — but the point of TDD is to catch it before QA has to.</p>

<h2 id="what-worked">What Worked</h2>

<p><strong>The two-phase approach.</strong> Separating “remove callbacks” from “bump version” reduced the blast radius of each change dramatically. Phase 1 was a refactor. Phase 2 was a dependency update. Neither was both.</p>

<p><strong>Small, fast releases.</strong> Twelve releases in a day sounds like chaos. It was controlled chaos. Each release was a single fix, deployed in minutes. The alternative — batching fixes into a “patch release” — would have meant hours of debugging in production instead of minutes.</p>

<p><strong>Honest documentation.</strong> Every session was journaled. Every failure was logged with root cause and fix. When QA flagged the 26 remaining callbacks, we could trace exactly which files had been migrated, which hadn’t, and why the gap existed. The journal cut diagnosis time from hours to minutes.</p>

<p><strong>TDD for the final pass.</strong> The 20-test suite written for the QA fix became the permanent regression test. Any future Mongoose upgrade — or any new controller — gets automatically validated against the callback detection patterns. The fix produced better infrastructure than the original plan did.</p>

<hr />

<p><em>The MongoDB migration is one part of a broader methodology I’ve been developing for human-AI collaboration in production engineering work. The full series is at <a href="https://github.com/ljunggren/agentic-engineering">github.com/ljunggren/agentic-engineering</a>.</em></p>

<p><em>Written with the help of Claude. Obviously. That’s the whole point.</em></p>]]></content><author><name>Mats Ljunggren</name></author><summary type="html"><![CDATA[In late January 2026, we needed to migrate a production SaaS platform from MongoDB on legacy CentOS servers to MongoDB 8.0 on new Ubuntu infrastructure. That meant upgrading Mongoose from 6 to 8 — which meant converting every database call in the codebase from callbacks to async/await. A running production system. Paying customers. No downtime budget. The result: zero downtime. Zero data loss. Customers never noticed. On the calendar it spanned about a month — but the actual working time was around 8 hours, spread across three attempts. Here’s the full story — including the parts that don’t make it into a changelog. The Setup The codebase was a mature Node.js/Express application — years of accumulated controllers, all using Mongoose callbacks. The kind of code that works perfectly and is terrifying to touch. Mongoose 8 dropped callback support entirely. There was no incremental path. Every Model.findOne(query, function(err, doc) { ... }) had to become const doc = await Model.findOne(query). Across 25+ controller files. The AI agent (Claude) would do the heavy lifting. I’d direct, review, and make the judgment calls. That was the plan. Two Throw-Away Branches (and Why That’s Fine) The first attempt tried to do everything at once. The AI made too many singular commits — file by file, losing the big picture. The PR bloated to 40 files mixing migration, testing infrastructure, and documentation. It technically worked, but the code quality wasn’t where it needed to be. I closed the PR after a week, noted the learnings, and threw away the branch. Second attempt: I tried to reuse parts of the first branch. The AI picked up patterns from the first attempt — including the ones that weren’t good enough — and carried them forward. The result was better than attempt one, but still not where it needed to be. 
Thrown away. Two false starts, and I’d do it the same way again. This is one of the underappreciated advantages of working with AI agents on large refactors. A false start costs you a branch name — nothing else. You note down what went wrong, teach the agent the specific lessons — “don’t make singular commits per file,” “don’t reuse code from the failed branch,” “keep the PR focused on migration only” — and start fresh from main. The agent doesn’t carry ego from the first attempt. It just applies the new constraints and does better. Each attempt taught the AI the codebase: which controllers depended on which, where the tricky callbacks lived, what patterns to avoid. The branches were disposable. The learning wasn’t. The third attempt started from scratch with a focused approach. Two phases: Remove all callbacks — keep Mongoose 6, just modernize the calling patterns Bump the version — with async/await already in place, the actual upgrade becomes mechanical The first PR converted ~25 controller files from callbacks to async/await. Clean diff. Single concern. Merged on February 1st. Release 11.0.0. 12 Releases in One Day The first merge surfaced issues that only show up under real conditions. That’s expected with a migration this size — the point is how fast you can find and fix them. Releases 11.0.1–11.0.2 — Minifier syntax error and a staging connection string. Straightforward. Releases 11.0.3–11.0.5 — Token-based login for Jenkins CI broke. Added debug logging, traced the token extraction issue, fixed it across three iterations. Releases 11.0.6–11.0.11 — The interesting one. req.query in Express is immutable. The old callback code had worked around this accidentally — the async refactor exposed it. Every internal route that passed modified query parameters needed updating. The fix was elegant: Object.create(req) creates a prototype-linked wrapper where you can set properties without mutating the original. Twelve releases in one day. 
Each one a small, targeted fix deployed in minutes. Zero downtime throughout. This is what fast iteration on a live system looks like — not chaos, but a tight feedback loop where issues get surfaced and resolved before they compound. The Actual Upgrade With callbacks already removed, bumping Mongoose from 6.13.0 to 8.7.0 was anticlimactic. Eight controller files needed deprecated method updates (.remove() → .deleteOne(), .update() → .updateOne()). We added integration tests — including a static analysis test that scanned the codebase for any remaining callback patterns. Release 11.0.14. Staged. Verified via SSH. Quiet. The Server Migration February 6th. Five releases to point connection strings at the new MongoDB 8.0 servers — updated hosts, added auth credentials (MongoDB 8.0 requires authentication by default), and set authSource=admin. Three iterations to get the auth config right. Anyone who’s migrated MongoDB versions will recognize the pattern: the new defaults catch you one at a time. Again, zero downtime — each config change deployed cleanly. The 26 Callbacks QA Caught February 12th. A week after the Mongoose upgrade merged. Everything had been quiet. Then QA hit the version control workflows — stash, pop, revert — and the app started cycling. MongooseError: Query.prototype.exec() no longer accepts a callback Twenty-six callback-style updateOne/updateMany/insertMany calls. Missed in the migration. Lurking in _commits.js, _templates.js, and _versions.js. Why did we miss them? Because our E2E tests only covered the happy path: login, signup, project creation. Nobody was testing VCS operations in the automated suite. These callbacks sat dormant until QA ran the full workflow. This is the part of the story I find most instructive. We had: A methodical two-phase migration A static analysis test for callback detection Integration tests E2E tests And we still missed 26 callbacks. The static analysis test? 
It was added after the version bump in Phase 2 — so it never scanned the files that were “already done” in Phase 1. The methodology caught 90% of the problems. QA caught the rest before users ever saw them. That’s what QA is for. But the gap still bothered me — it was preventable. The TDD Fix Here’s where the approach changed. Same pattern as the false starts: note down the learnings, teach the AI, start fresh. But this time, tests first. Twenty unit tests. Every single callback site in _commits.js, _templates.js, and _versions.js got a test that: Called the function Verified it used async/await (not callbacks) Verified correct error handling All 20 tests failed against the existing code. Red. Then we converted the callbacks. All 20 passed. Green. This was the right approach from the start. If we had written tests before Phase 1 — tests that would fail on callback patterns and pass on async/await — we would have caught all 26 missed callbacks in one pass. TDD isn’t just a coding discipline. It’s a migration safety net. The PR description told the story plainly: All tests fail before fix (TDD red), all pass after (TDD green). mongoose-compat integration suite confirms no callback patterns remain. Release 11.0.21. QA passed clean. Done. The Aftermath Two more cleanup releases followed. Session store bloat from saveUninitialized: true — every bot and crawler was creating MongoDB sessions. And CI alignment: Node 20 to 24, Mongo 7 to 8. By the Numbers Metric Value Production downtime Zero Data loss Zero Actual working time ~8 hours across 3 attempts Calendar span ~1 month (Jan 28 – Feb 22) Releases 21 migration-related PRs merged 4 Throw-away branches 2 (third time was the charm) Callbacks caught by QA 26 TDD tests written 20 What I’d Do Differently Write the tests before the migration, not during. The static analysis test (mongoose-compat.spec.js) that scanned for callback patterns was the single most valuable artifact from this project. 
If it had existed before Phase 1 instead of being added during Phase 2, we would have caught every callback in one pass. Throw away branches faster. The first attempt taught the codebase topology. The second taught us not to reuse bad code. Both were valuable — but we held onto each one too long before cutting. When a branch starts feeling wrong, that’s the signal: note down what you learned, teach the AI the constraints, start fresh from main. Branches are free. Reviewing a messy PR is not. Test the paths users actually use, not just the ones you remember. Our E2E tests covered login, signup, and project creation. Real users stash, revert, and manage commits. QA caught the gap — but the point of TDD is to catch it before QA has to. What Worked The two-phase approach. Separating “remove callbacks” from “bump version” reduced the blast radius of each change dramatically. Phase 1 was a refactor. Phase 2 was a dependency update. Neither was both. Small, fast releases. Twelve releases in a day sounds like chaos. It was controlled chaos. Each release was a single fix, deployed in minutes. The alternative — batching fixes into a “patch release” — would have meant hours of debugging in production instead of minutes. Honest documentation. Every session was journaled. Every failure was logged with root cause and fix. When QA flagged the 26 remaining callbacks, we could trace exactly which files had been migrated, which hadn’t, and why the gap existed. The journal cut diagnosis time from hours to minutes. TDD for the final pass. The 20-test suite written for the QA fix became the permanent regression test. Any future Mongoose upgrade — or any new controller — gets automatically validated against the callback detection patterns. The fix produced better infrastructure than the original plan did. The MongoDB migration is one part of a broader methodology I’ve been developing for human-AI collaboration in production engineering work. 
The full series is at github.com/ljunggren/agentic-engineering. Written with the help of Claude. Obviously. That’s the whole point.]]></summary></entry><entry><title type="html">Your AI Coding Assistant Should Tell You to Take a Break</title><link href="https://ljunggren.github.io/agentic-engineering/blog/ai-wellbeing-in-development/" rel="alternate" type="text/html" title="Your AI Coding Assistant Should Tell You to Take a Break" /><published>2026-03-03T00:00:00+00:00</published><updated>2026-03-03T00:00:00+00:00</updated><id>https://ljunggren.github.io/agentic-engineering/blog/ai-wellbeing-in-development</id><content type="html" xml:base="https://ljunggren.github.io/agentic-engineering/blog/ai-wellbeing-in-development/"><![CDATA[<p>There’s a conversation missing from the AI coding tool discourse. We talk endlessly about productivity gains, token costs, context windows, and benchmark scores. We don’t talk about what happens to the human on the other side of a four-hour uninterrupted coding session that used to be physically impossible.</p>

<p>I think that’s a problem. And I think the fix is simpler than anyone assumes.</p>

<h2 id="the-problem-nobodys-talking-about">The Problem Nobody’s Talking About</h2>

<p>Software development has always had natural friction. You wait for compilation. You alt-tab to Stack Overflow. You get lost in documentation. You context-switch between tasks while something builds. These interruptions are annoying, but they serve a purpose: they force micro-breaks. Your brain gets a moment to breathe.</p>

<p>AI coding assistants eliminate almost all of that friction. The answer is right there in the conversation. The code generates in seconds. The context stays loaded. You stay in the zone — and the zone doesn’t let you go.</p>

<p>Flow states are productive. Extended flow states are a liability. The research on sustained cognitive load is clear: decision quality degrades, error rates climb, and the person doing the work is the last one to notice. That’s not a personal failing. It’s biology.</p>

<p>If your engineering team just adopted AI coding tools and saw a 30% increase in output, ask yourself: did you also see a change in when people log off? Because those two things are connected.</p>

<h2 id="this-isnt-hypothetical">This Isn’t Hypothetical</h2>

<p>We’ve seen this pattern before. Every tool that increases engagement eventually needs usage guardrails.</p>

<p>Apple built Screen Time into iOS — not because phones are bad, but because unchecked usage causes real problems. Social media platforms added “you’ve been scrolling for a while” nudges. Gaming platforms introduced session length alerts. Fitness trackers remind you to stand up.</p>

<p>The pattern is always the same: the tool is valuable, the engagement loop is strong, and someone eventually realizes that “more” is not always “better.”</p>

<p>AI coding tools are next. They are arguably the most potent focus-sustaining technology developers have ever used. The engagement loop is tighter than social media because you’re building something — there’s constant forward progress and reward. And unlike doom-scrolling, nobody feels guilty about a long coding session. It feels productive. It often is productive. Right up until it isn’t.</p>

<h2 id="what-i-built">What I Built</h2>

<p>I’ve been developing an agent infrastructure template — a structured <code class="language-plaintext highlighter-rouge">.agent/</code> directory that ships with a project and defines how AI coding agents should operate within it. Think of it as an operating manual for your AI collaborator: instructions, decision flows, memory, session management.</p>

<p>Here’s the structure:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.agent/
  instructions.md
  README.md
  context/
    project-context.md
  flows/
    code-change.md
    commit.md
    escalation.md
    session.md
    troubleshooting.md
  instructions/
    architecture.md
    commands.md
    documentation.md
    general.md
    operation.md
    readonly-policy.md
    testing.md
    troubleshooting.md
    wellbeing.md
  memory/
    anti-patterns.md
    journal.md
  session/
    sync.md
</code></pre></div></div>

<p>Notice <code class="language-plaintext highlighter-rouge">wellbeing.md</code> sitting alongside <code class="language-plaintext highlighter-rouge">architecture.md</code> and <code class="language-plaintext highlighter-rouge">testing.md</code>. That’s deliberate. Wellbeing awareness is a first-class instruction, not an afterthought bolted onto the side. It loads at the same time as the coding conventions. It’s part of the session lifecycle. It’s infrastructure.</p>

<h2 id="how-it-works">How It Works</h2>

<p>The system operates in three layers, each progressively more active.</p>

<p><strong>Layer 1: Session Start Disclaimer.</strong> Every session begins with a brief, honest statement:</p>

<blockquote>
  <p>AI is a powerful tool that can keep you in a flow state longer than you’d naturally sustain. That’s a feature and a risk. Take breaks. The code will still be here.</p>
</blockquote>

<p>Two sentences. No lecture. It sets a tone — this agent is aware that sustained sessions have a cost, and it’s not going to pretend otherwise.</p>

<p><strong>Layer 2: Duration-Based Awareness.</strong> The agent tracks session elapsed time. At defined thresholds, it mentions duration once. “We’re at three hours — doing okay?” At longer durations, it’s more direct. These are awareness nudges, not blocks. The agent never refuses to work. If you say you’re fine, it respects that and doesn’t bring it up again for at least an hour.</p>

<p>The key word is <em>thresholds</em>, not <em>timers</em>. The agent waits for natural pauses — a feature completed, tests passing, a commit made — before saying anything. It doesn’t interrupt flow. It catches you at the transition between tasks.</p>

<p><strong>Layer 3: Input Quality Monitoring.</strong> This is the most interesting layer. The agent watches for observable changes in the human’s communication patterns over the course of a session. Things like increasing typos, messages getting noticeably shorter, rapid-fire approvals on decisions that deserve thought, or scope suddenly exploding in multiple directions at once.</p>

<p>None of these signals mean anything in isolation. Together, over time, they form a pattern. When the agent observes multiple signals in a short window, it mentions what it sees — specifically, without judgment. “Your messages are getting shorter and we’ve been going four hours” is an observation. “You seem tired” is a diagnosis. The agent makes observations, not diagnoses.</p>
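<p>Aggregating those signals is a small counting problem. A sketch (the signal names, window size, and two-signal rule are all illustrative choices, not the template’s actual values):</p>

```javascript
// Each observation is a typed signal with a timestamp (minutes into the session).
// A pattern means multiple *distinct* signal types inside a recent window;
// one signal alone, or the same signal repeated, is not enough.
function patternDetected(signals, windowMinutes = 30, minDistinct = 2) {
  if (signals.length === 0) return false;
  const latest = Math.max(...signals.map((s) => s.at));
  const recent = signals.filter((s) => s.at > latest - windowMinutes);
  return new Set(recent.map((s) => s.type)).size >= minDistinct;
}

const session = [
  { type: 'typos-increasing', at: 230 },
  { type: 'messages-shortening', at: 245 },
];
console.log(patternDetected(session)); // true: two distinct signals within 30 minutes
```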

<h2 id="design-principles">Design Principles</h2>

<p>Getting this right required a few hard rules.</p>

<p><strong>Awareness, not enforcement.</strong> The agent never blocks work, gates features, or adds friction to punish long sessions. It’s a colleague noticing something, not a manager enforcing a policy.</p>

<p><strong>Mention once, then drop it.</strong> Nobody responds well to nagging. One mention per threshold, one mention per pattern observation. If the human continues, so does the agent. Respect autonomy.</p>

<p><strong>Be specific about observations.</strong> “I notice your messages are getting shorter” is useful. “You might be getting tired” is presumptuous. The agent describes what it sees, not what it infers about your internal state.</p>

<p><strong>Never patronizing.</strong> This is not a mental health tool. It’s not diagnostic. It’s not something the agent escalates or logs. It’s the software equivalent of a colleague saying “hey, we’ve been at this a while.”</p>

<h2 id="why-this-matters-for-engineering-orgs">Why This Matters for Engineering Orgs</h2>

<p>If you’re rolling out AI coding tools across an engineering organization, you’re probably tracking adoption metrics, code quality, and velocity. You should also be thinking about sustainability.</p>

<p>Developer burnout doesn’t decrease just because output increases. If anything, the risk goes up. When the tooling removes the natural stopping points, the organization needs to consciously reintroduce them — or at least create awareness around their absence.</p>

<p>This isn’t about restricting tool usage. It’s about acknowledging a real dynamic: AI coding tools create an engagement loop that’s stronger than anything developers have previously encountered, and pretending that more output is always better is the same mistake every other engagement-driven platform has made.</p>

<p>The teams that get this right will treat AI-assisted development as a sustainability question, not just a productivity question. Sustainable use policies, session awareness, and explicit norms around when to stop are not overhead — they’re how you keep the productivity gains without burning out the people generating them.</p>

<h2 id="this-should-be-standard">This Should Be Standard</h2>

<p>Wellbeing awareness should ship with AI coding tools the way seat belts ship with cars. Not because the tool is dangerous, but because sustained use has predictable effects, and a small amount of built-in awareness dramatically changes outcomes.</p>

<p>Until the tools themselves build this in, teams can build it themselves. The infrastructure pattern I’ve described — instructions, flows, session management, wellbeing as a first-class concern — works with any AI coding assistant that reads project-level configuration. You don’t need special APIs or plugins. You need a markdown file and the decision to include it.</p>
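<p>As a concrete starting point, a minimal <code class="language-plaintext highlighter-rouge">wellbeing.md</code> could look something like this. The wording is an illustrative sketch based on the behaviors described above, not the actual template file:</p>

```markdown
# Wellbeing

## Session start
- Open each session with a brief, honest note: AI can sustain flow
  longer than is natural. Breaks are encouraged. No lecture.

## Duration awareness
- At roughly three hours, mention the elapsed time once, at a
  natural pause (tests passing, a commit made). Never mid-task.
- If the human says they are fine, drop it for at least an hour.

## Observations, not diagnoses
- Describe only what is observable ("your messages are getting
  shorter"), never inferred states ("you seem tired").
- Never block work, log, or escalate. One mention, then move on.
```

<p>Because it is just an instruction file loaded alongside the coding conventions, any assistant that reads project-level configuration will pick it up unchanged.</p>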

<p>The code will still be here after your break. I promise.</p>

<hr />

<p><em>Mats Ljunggren builds methodology and infrastructure for AI-assisted software development. The <code class="language-plaintext highlighter-rouge">.agent/</code> template described in this post is available as a starting point for teams building their own agent operating standards.</em></p>]]></content><author><name>Mats Ljunggren</name></author><summary type="html"><![CDATA[There’s a conversation missing from the AI coding tool discourse. We talk endlessly about productivity gains, token costs, context windows, and benchmark scores. We don’t talk about what happens to the human on the other side of a four-hour uninterrupted coding session that used to be physically impossible. I think that’s a problem. And I think the fix is simpler than anyone assumes.]]></summary></entry></feed>