Agent Builders Are Changing How I Ship Code — Here's My Actual Workflow
Six months ago I was maintaining a 7-package monorepo alone — Express.js API, Next.js frontend, SLED admin panel, 3 SmartSync microservices, CI/CD across 24+ GitHub Actions workflows. The kind of system that normally needs a team.
I'm still the sole engineer. But the way I work has changed completely.
The Shift: From AI Assistant to AI Agent Team
Using an LLM as an autocomplete tool is table stakes at this point. What's different now is treating AI as a team of specialists — each agent scoped to a domain, pre-loaded with context about your architecture, and capable of completing multi-step tasks autonomously.
Claude Code makes this concrete. Instead of a generic chat interface, you define agents and skills as markdown files that Claude loads when invoked. The agent knows your codebase conventions, your file structure, your patterns — before you type a single word.
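For readers who haven't seen one, an agent definition is just a markdown file with YAML frontmatter. A minimal sketch — the file location and frontmatter fields follow Claude Code's subagent convention, but the contents here are illustrative, not my actual agent:

```markdown
<!-- .claude/agents/dut_devops-orchestrator.md (illustrative) -->
---
name: dut_devops-orchestrator
description: Use for GitHub Actions, Komodo, and Railway deployment changes.
---

You are the DevOps specialist for this monorepo.
- Workflows live in .github/workflows/.
- Deployments run through Komodo and Railway.
- Always explain the rollout impact of any workflow change before writing YAML.
```

The body becomes the agent's system prompt, which is where the "pre-loaded context" lives.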
What I Actually Built
For GovChime, I have a set of custom agents covering the domains I work in most:
- dut_ai-generation-expert — knows the full LLM generation pipeline: BullMQ queue, SSE events, Zod schemas, all 7 generation stages. When I'm debugging a generation issue, this agent has full context without me re-explaining the architecture.
- dut_devops-orchestrator — understands our GitHub Actions pipeline, Komodo deployment setup, and Railway services. I describe a deployment change; it writes the workflow YAML and explains the rollout impact.
- dut_design-review — runs automated design reviews using Playwright, checking responsive behaviour, contrast ratios, and component consistency against our style guide.
- Custom skills like /create-project and /create-blog-post — slash commands that follow a defined multi-step process: gather context, ask clarifying questions, generate output, verify the build passes.
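A custom slash command is also just a markdown file. A sketch of what /create-blog-post might look like — the path and the $ARGUMENTS placeholder follow Claude Code's slash-command convention; the steps are illustrative:

```markdown
<!-- .claude/commands/create-blog-post.md (illustrative) -->
Create a new blog post on the topic in $ARGUMENTS.

1. Gather context: read existing posts for tone and frontmatter shape.
2. Ask clarifying questions before writing anything.
3. Generate the post following the site's conventions.
4. Run the build and confirm it passes before reporting done.
```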
The TDD Loop That Makes It Work
Raw AI-generated code is only as good as the validation layer you put around it. My workflow is deliberately structured:
1. Write the test spec first — before asking Claude to implement anything, I write what the correct behaviour looks like. This is non-negotiable.
2. Implement with the agent — the agent writes code against the spec, referencing the existing codebase patterns via MCP integrations (filesystem, database, GitHub).
3. Run tests, iterate — if tests fail, feed the output back into the agent. Most issues resolve in 1-2 iterations.
4. Review the diff — I read every diff before it merges. The agent isn't autonomous in production; I'm still the engineer making the final call.
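In miniature, the first three steps look like this. The function and cases are hypothetical, not from the GovChime codebase — the point is the ordering: the spec exists before the implementation does.

```typescript
// Step 1: the spec, written first. formatAwardValue is an illustrative helper.
const cases: Array<[number, string]> = [
  [0, "$0"],
  [1_500, "$1.5K"],
  [70_000_000, "$70.0M"],
];

// Step 2: the implementation the agent writes against the spec.
function formatAwardValue(amount: number): string {
  if (amount >= 1_000_000) return `$${(amount / 1_000_000).toFixed(1)}M`;
  if (amount >= 1_000) return `$${(amount / 1_000).toFixed(1)}K`;
  return `$${amount}`;
}

// Step 3: run the spec; any failure output goes back to the agent verbatim.
for (const [input, expected] of cases) {
  const actual = formatAwardValue(input);
  if (actual !== expected) {
    throw new Error(`formatAwardValue(${input}): expected ${expected}, got ${actual}`);
  }
}
```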
This loop consistently produces code I'm comfortable shipping. The agent handles the boilerplate and pattern matching; I handle the architectural judgment.
MCP: The Part Most Developers Skip
Model Context Protocol lets Claude agents connect to live systems — not just static files. At GovChime, I use MCP integrations so agents can query the actual database schema, read live GitHub Actions status, and inspect the running Railway deployment.
The difference between an agent that reads a schema file and one that can run `SELECT * FROM information_schema.tables` is significant. The live connection means the agent's suggestions are grounded in current reality, not a cached snapshot from when you last updated your docs.
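Wiring this up is a config file, not code. A sketch of a project-level MCP server entry — the shape follows Claude Code's .mcp.json convention, but the server package and connection string here are placeholders, not my actual setup:

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-postgres",
        "postgresql://localhost:5432/app_db"
      ]
    }
  }
}
```

Once the server is registered, every agent in the project can issue read-only schema queries through it.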
What This Looks Like in Practice
A recent example: I needed to add a new materialized view to ClickHouse for a dashboard feature, update the TypeScript types, add the API endpoint, and wire it to the frontend. That's work that would have taken half a day of context switching.
With the devops + backend agents and TDD loop: I wrote the test for the expected API response, described the feature to the agent, reviewed 3 iterations of generated code, and shipped in under 2 hours — including tests passing in CI.
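The "test for the expected API response" from that example amounts to pinning down the response shape before any endpoint exists. A sketch — the endpoint's row shape and every field name here are illustrative, and the hand-rolled validator stands in for a real schema library:

```typescript
// Hypothetical row shape for the new dashboard endpoint.
interface DashboardRow {
  agency: string;
  totalAwards: number;
  totalValue: number;
}

// Minimal runtime check standing in for a schema validator.
function isDashboardRow(value: unknown): value is DashboardRow {
  if (typeof value !== "object" || value === null) return false;
  const row = value as Record<string, unknown>;
  return (
    typeof row.agency === "string" &&
    typeof row.totalAwards === "number" &&
    typeof row.totalValue === "number"
  );
}

// The spec: whatever the endpoint returns must be an array of valid rows.
function assertDashboardResponse(body: unknown): DashboardRow[] {
  if (!Array.isArray(body) || !body.every(isDashboardRow)) {
    throw new Error("dashboard response does not match the agreed shape");
  }
  return body;
}
```

Each of the 3 iterations was the agent's implementation being re-run against this fixed target.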
Key Takeaways
- Agents work because of context, not capability. A generic AI assistant is useful. An agent pre-loaded with your architecture, conventions, and domain knowledge is a force multiplier.
- TDD is what makes AI-generated code trustworthy. Without tests you wrote first, you can't verify the agent understood the requirement correctly.
- MCP integrations close the feedback loop. Agents connected to live systems give better suggestions than ones reasoning from static documentation.
- You're still the engineer. The goal isn't to remove judgment from the process — it's to eliminate the parts that don't require judgment, so you can apply it where it matters.