How I Code: Summer 2026 Edition

This weekend I drove from Jacksonville, FL to Charlotte, NC. Before I started the journey, I stopped by the excellent Southern Grounds for breakfast.

While sipping coffee, I worked in a Claude Code session on planning items for the day. They included two major updates: (1) refactoring how scheduled events work in a major personal project, and (2) creating and implementing an MCP server for my blog that lets an agent gain context on posts I've published here.

Claude wrote no code, only PRDs. I wrote neither code nor PRDs; I served only as editor and guide, which made sipping coffee while doing this work all the more possible.

I got on the road and, during the 6-hour drive, had an excellent conversation with my friend and fellow developer Chuck Imperato about where we are with agentic development, AI, and the software development industry as a whole.

By the time I arrived in Charlotte, 10 PRs had been opened, reviewed, vetted for design consistency, cross-checked for code quality, and merged.

No human wrote the implementation code for those PRs.

What? This isn't possible.

It is. The launch of Claude Code last year ushered in a new era in development. You're no longer the coder - you're the operator. You manage processes around writing the code, but your chosen agentic development partner handles the execution.

One note before we go further: this is how I build my own personal projects, on my own machines and my own accounts. It isn't my employer's workflow, policy, or practice, and nothing here touches any company's code, data, or systems.

If you're still coding like it's 2025, or even worse - like it's 2024 - my friend, I have so many great things to show you.

But first, you need to strap in, abandon your ego, and accept that your job has changed. You have to zoom out a level. If you're used to being an engineer, you just became a manager. If you're used to being a manager, you just became an architect. If you're used to being an architect, you just became a director.

Letting your ego tie you to the way you have written code for most of your career is a surefire way to find yourself obsolete. Now is not the time to be stubborn. Now is the time to be nimble.

Nor is it a time to give up, venture off into the woods, and live a life as a hermit. To quote Simon Willison:

"Quitting programming as a career right now because of LLMs would be like quitting carpentry as a career thanks to the invention of the table saw."

You're in roughly inning 1 of a 1,000-inning game, so don't be discouraged. We're just getting warmed up.

That pace is exactly why I called this the Summer 2026 Edition: how I work today looks nothing like it did in Summer 2025, and I'd bet Summer 2027 makes this version look quaint. Treat it as a snapshot of a fast-moving target, not a permanent playbook.

You can do this today.

One more thing I want to underscore: this is very doable, very affordable, and you can do it right now.

Step 1: Init

Monorepo, always.

Every project starts as a monorepo. I don't care if it's a weekend toy or a platform play - it goes in a monorepo on day one. The reason is simple: agents work best when the whole world is in one place. One repo means one clone, one set of conventions, one CLAUDE.md, and an agent that can reason across your frontend, backend, and infra without you stitching context together from five different repositories. The more you lean on agentic coding, the more a project-sized monorepo stops being a preference and becomes a requirement.

I've created a core monorepo you can clone and use to keep things simple for the purposes of this post, but feel free to shape the structure however you like.

Infra from day one.

Everything I build runs on AWS, so I use CDK to define the infrastructure - in code, in the repo, from the very first commit. Prefer Terraform? Use Terraform. The tool doesn't matter; the rule does: nothing deploys by hand, ever. If it isn't a stack the agent can read, change, and deploy, it doesn't exist. Manual clicks in a console are invisible to your agents - and anything invisible to your agents is a liability waiting to bite you.

Then git init, push it up to GitHub, and take a break. The foundation is done.

Step 2: Plan

This is the step everything else rests on. Skip it, or phone it in, and your agents will cheerfully build you the wrong thing at superhuman speed. Time spent planning is time you don't spend untangling a mess later - and it's the single highest-leverage hour you'll spend on a project.

The `/docs/planning` folder

The docs/planning folder is where you put all of your features, ideas, thoughts, questions, and concerns, in one place. You will use this folder to brainstorm with Claude Code.

Fire up a Claude Code session and tell Claude you want to use Plan Mode to work on a feature. Describe the feature: everything you want it to accomplish, the constraints, the architecture requirements, and so on. When you're done giving Claude the inputs, have it write the output, a Markdown-formatted PRD, to a doc in docs/planning.

Plan Mode is extremely valuable: it helps you think through your decisions, challenges your assumptions, and surfaces missing pieces you never thought of.

Letter grades for PRDs

One of my favorite AI tricks is to ask it for a letter grade: "Give this PRD a letter grade, with reasoning." This forces the LLM to infer a scoring rubric, assign a score, and then explain the score, with very little typing involved.

Once you have your grade, if it's anything less than an A, ask what you could change about the PRD to make it an A. Then ask Claude to revise the PRD in those ways.
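If you ever want to script this instead of typing it, the loop is small. Here's a minimal Python sketch, assuming a hypothetical `ask` function that sends a prompt to your model and returns its text reply, and a grading prompt that asks the reply to open with "Grade: X". The round cap keeps it from grading forever.

```python
import re
from typing import Callable

# Assumes the grading prompt asks the model to open its reply with "Grade: X".
GRADE_PATTERN = re.compile(r"Grade:\s*([A-DF][+-]?)")
PASSING = {"A", "A+"}  # "anything less than an A" keeps iterating, including A-


def parse_grade(reply: str) -> str:
    """Extract a letter grade like 'B+' from a reply that starts 'Grade: B+ ...'."""
    match = GRADE_PATTERN.search(reply)
    if match is None:
        raise ValueError(f"no grade found in reply: {reply[:80]!r}")
    return match.group(1)


def grade_until_a(prd: str, ask: Callable[[str], str], max_rounds: int = 3) -> tuple[str, str]:
    """Grade the PRD; while it's below an A, ask what would fix it and apply those fixes.

    Returns (grade, prd), where the grade always belongs to the returned PRD text.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    for round_no in range(max_rounds):
        grade = parse_grade(ask(
            "Give this PRD a letter grade, with reasoning. Start with 'Grade: X'.\n\n" + prd
        ))
        if grade in PASSING or round_no == max_rounds - 1:
            return grade, prd
        # Below an A: ask what would make it an A, then have the model apply it.
        changes = ask("What would we change about this PRD to make it an A?\n\n" + prd)
        prd = ask("Revise the PRD to make these changes:\n" + changes + "\n\nPRD:\n" + prd)
    raise AssertionError("unreachable")
```

Returning the grade together with the exact PRD text it graded avoids the classic off-by-one where you revise one last time and walk away with a stale grade.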

You can use this for anything, not just for planning - contracts, life decisions, neighborhoods, where to go for dinner. It works remarkably, and predictably, well.

Step 3: Implement

From PRDs to GitHub Issues

Once you have a set of PRDs that feels close to complete, you're ready to move them over to GitHub Issues.

GitHub Issues is a clean, simple, beautiful, and FREE way to manage your tasks. Forget Jira integrations or Linear connections. If your project is a software development project and the only people involved in it are engineers, GitHub Issues is enough. If not, YMMV, but for the purposes of this blog post and our demonstration, GitHub Issues is it.

If the PRD is your requirements doc, the GH issue is your implementation plan.

Tell Claude to use the GitHub CLI (which you've already installed and authenticated with gh auth login) to create GitHub Issues for each of the PRDs you've created in docs/planning. Tell Claude to structure the issues so they're independent, non-blocking, and allow as much parallel development as possible. Be sure to tell Claude to make the issues "agent-sized."
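Claude will make these calls itself, but it helps to know what they look like. Here's a minimal Python sketch that turns hypothetical issue drafts into `gh issue create` invocations. The `IssueDraft` type and its contents are illustrative assumptions; `--title`, `--body`, and `--label` are real `gh issue create` flags.

```python
import subprocess
from dataclasses import dataclass, field


@dataclass
class IssueDraft:
    """One agent-sized unit of work split out of a PRD (hypothetical structure)."""
    title: str
    body: str  # acceptance criteria, a pointer back to the PRD in docs/planning, etc.
    labels: list[str] = field(default_factory=list)


def gh_issue_create_args(draft: IssueDraft) -> list[str]:
    """Build the argv for `gh issue create` without running it."""
    args = ["gh", "issue", "create", "--title", draft.title, "--body", draft.body]
    for label in draft.labels:
        args += ["--label", label]  # one flag per label
    return args


def create_issue(draft: IssueDraft) -> str:
    """Run gh (already authenticated via `gh auth login`) and return what it prints."""
    result = subprocess.run(gh_issue_create_args(draft), check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

In this sketch, new issues get `lane:cdx-any` but not `ready`; you'd add `ready` yourself once you've read the issue and are happy with it.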

What "agent-sized" means

Agent-sized means something that can be implemented within roughly 30 minutes, leaving another 30 minutes for PR review, feedback, iteration, and resolution. Depending on your project, "agent-sized" might look different to you: bigger, smaller, more complicated, or simpler. Experiment to find what works best for your situation.

But for the most part, on hourly loops, I've found that agent-sized means 30 minutes or less of straight coding.

The labels that let Codex find work

You'll put this in your AGENTS.md and CLAUDE.md files, but it merits calling out directly: you want your automation-ready issues to carry two labels. I use lane:cdx-any and ready. This lets your automation script know what to look for and what to ignore. Issues that aren't meant for automation don't have lane:cdx-any and are safely ignored. Issues you're still noodling on don't have the ready label and are safely ignored too.
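On the runner side, that filter is only a few lines. Here's a minimal Python sketch, assuming an authenticated `gh` CLI. The helper names are hypothetical; `gh issue list` with `--state`, `--label`, and `--json` is the real CLI.

```python
import json
import subprocess

REQUIRED_LABELS = {"lane:cdx-any", "ready"}


def is_automation_ready(labels: set[str]) -> bool:
    """Eligible only if the issue carries every required label."""
    return REQUIRED_LABELS <= labels


def list_ready_issues() -> list[dict]:
    """Fetch open, labeled issues via gh, then re-check the labels locally."""
    out = subprocess.run(
        ["gh", "issue", "list", "--state", "open",
         "--label", "lane:cdx-any", "--label", "ready",
         "--json", "number,title,labels"],
        check=True, capture_output=True, text=True,
    ).stdout
    issues = json.loads(out)
    # The local check doesn't depend on how gh combines repeated --label flags.
    return [
        issue for issue in issues
        if is_automation_ready({label["name"] for label in issue["labels"]})
    ]
```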

But every hour, when that Codex automation runs, a GitHub issue with the right labels will get picked up, worked on, and processed. It works when you're awake. It works when you're asleep. In other words: It's magic!

The name autorepo was inspired by Andrej Karpathy's autoresearch.

Step 4: Iterate

Three dusty Macs become a coding cluster

Once you get the hang of your Codex automation/GH Issue automated workflow, the first thing you're going to want to do is replicate it on other machines.

I have good news: All you need is a Mac. It doesn't have to be a Mac Mini or a brand-new M5 Mac; any Mac with an M1 or newer chip will do. Every dusty Mac you might have sitting in a closet somewhere just became a virtual dev. The chip doesn't have to be especially powerful, because the heavy lifting (model inference) happens on your LLM provider's servers, not on your machine.

Lanes: parallel tracks of work

Each machine in your workflow needs a unique identifier. I went with CDX-MACHINE for my naming convention, so the old 2020 M1 Mac Mini sitting in my closet became CDX-MINI. The old MacBook Pro that I just retired as my main workhorse became CDX-MBP.

It doesn't matter what your naming convention is as long as each machine has a unique identifier it can use to claim an issue.

Claiming an issue without a race

Your automation runner script does a few things in order:

Identifies the machine the script is running on.
Asserts connectivity to github.com.
Pulls any open GitHub Issues that have the required labels.
Picks an eligible issue and claims it by applying its lane label.
Waits one minute, then re-reads the issue to confirm it still owns the claim.

That wait-and-re-check step is the cheap insurance against two machines grabbing the same issue at the same moment. If a machine looks back after its minute and finds another lane got there first, it backs off and goes looking for different work. It is not a distributed lock and I won't pretend it is - it's a pragmatic, best-effort claim that has been more than good enough in practice. (More on the edge case in "When it breaks.")
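Here's a minimal Python sketch of that claim dance, under loud assumptions: the per-machine claim label is a hypothetical `lane:<machine>` (so CDX-MINI claims with `lane:cdx-mini`), and `Tracker` is a hypothetical interface you'd back with `gh` or the GitHub REST API, for example by reading the issue's `labeled` events. It's still best-effort, not a lock.

```python
import socket
import time
from typing import Protocol

POOL_LABEL = "lane:cdx-any"  # marks eligibility; it is not a claim


class Tracker(Protocol):
    """Hypothetical interface over GitHub issue labels."""

    def add_label(self, issue: int, label: str) -> None: ...
    def remove_label(self, issue: int, label: str) -> None: ...
    # (timestamp, label) for every lane label currently on the issue.
    def lane_claims(self, issue: int) -> list[tuple[float, str]]: ...


def lane_label(machine_id: str) -> str:
    """Step 1: derive this machine's claim label, e.g. CDX-MINI -> lane:cdx-mini (hypothetical scheme)."""
    return "lane:" + machine_id.lower()


def github_reachable(timeout: float = 5.0) -> bool:
    """Step 2: a cheap TCP check that github.com answers on port 443."""
    try:
        with socket.create_connection(("github.com", 443), timeout=timeout):
            return True
    except OSError:
        return False


def try_claim(tracker: Tracker, issue: int, lane: str, settle_seconds: float = 60.0) -> bool:
    """Steps 4-5: apply our lane label, wait, then keep the issue only if our claim came first."""
    tracker.add_label(issue, lane)
    time.sleep(settle_seconds)  # give a competing machine's claim time to land
    # Sorting by (timestamp, label) also breaks exact-timestamp ties by lane name,
    # so every machine reading the same events picks the same winner.
    claims = sorted(c for c in tracker.lane_claims(issue) if c[1] != POOL_LABEL)
    if claims and claims[0][1] == lane:
        return True
    tracker.remove_label(issue, lane)  # another lane got there first: back off
    return False
```

The sort doubling as a tie-break is the design choice that matters: two machines comparing the same events agree on one winner instead of both backing off or both proceeding. It still leans on GitHub returning consistent, up-to-date events to both machines, which is one reason it stays best-effort rather than a real lock.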

Opening the PR

Once a machine owns an issue, the rest is what you'd hope. It reads the issue and its acceptance criteria, implements the change on a branch, runs the tests, and opens a PR wired back to the issue and ready for review. From here the change enters the part of the system that actually keeps quality high: Design Review, automated code review, and my own eyes on the PR. Which is exactly where we're headed next.

Design Review: the quality gate that actually works

This is the part I'm most proud of, and the part I don't see many other people doing yet.

Every PR that touches the frontend has to pass a Design Review before it can move forward. Not a linter. Not a snapshot test. An actual opinionated review of whether the change looks and feels right - run automatically, in CI, on every single frontend PR.

A custom Opus prompt plus a Design Vision doc

The mechanism is simple. I wrote a Design Vision doc, DESIGN.md, that lives in the repo and describes how the product is supposed to look and feel: typography, spacing, color usage, interaction patterns, the tone of the UI, and what "good" means for this project specifically.

Then a CI workflow fires on any PR that touches frontend files. It hands the diff, plus the Design Vision doc, to a custom Claude Opus prompt and asks one question: does this change honor the vision, or does it drift? Opus reviews the change against the doc and either passes it or blocks it with specific, actionable feedback - the same feedback the agent then picks up and addresses on its next loop.
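The two moving parts, "did this PR touch the frontend?" and "what does Opus get asked?", can be sketched in a few lines of Python. Everything here is an assumption for illustration: the path prefixes, the PASS/BLOCK first-line convention, and the prompt wording. The actual call to Claude Opus is left out. In GitHub Actions you could also do the path check declaratively with the `paths` filter on the `pull_request` trigger.

```python
# Hypothetical frontend locations; adjust to your own monorepo layout.
FRONTEND_PREFIXES = ("apps/web/", "packages/ui/")
FRONTEND_SUFFIXES = (".tsx", ".jsx", ".css", ".scss")


def touches_frontend(changed_files: list[str]) -> bool:
    """True if any changed path sits under a frontend prefix or has a frontend extension."""
    return any(
        path.startswith(FRONTEND_PREFIXES) or path.endswith(FRONTEND_SUFFIXES)
        for path in changed_files
    )


def build_design_review_prompt(design_doc: str, diff: str) -> str:
    """Pair the Design Vision doc with the diff and ask the single pass/block question."""
    return (
        "Review this pull request against the project's Design Vision.\n\n"
        f"<design_vision>\n{design_doc}\n</design_vision>\n\n"
        f"<diff>\n{diff}\n</diff>\n\n"
        "Does this change honor the vision, or does it drift? Answer PASS or BLOCK "
        "on the first line, then give specific, actionable feedback."
    )


def review_passes(review_text: str) -> bool:
    """Map the model's reply to a CI result: only a first line starting with PASS passes."""
    lines = review_text.strip().splitlines()
    return bool(lines) and lines[0].strip().upper().startswith("PASS")
```

Failing closed matters here: an empty or malformed reply blocks the PR rather than waving it through.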

The Design Vision doc is the whole trick. Without it, you're asking an LLM for generic taste. With it, you're asking whether this specific change is consistent with a documented standard you actually care about. That's the difference between "looks fine I guess" and a real gate.

Why only frontend changes go through it

The planning loop - PRDs, issues, lanes, hourly automation - is substrate-agnostic. It doesn't care whether you're shipping an API or a button. But quality gates are where frontend earns its own treatment, because frontend is where AI slop is most visible and most corrosive.

Backend drift you can mostly catch with tests. Frontend drift - inconsistent spacing, a one-off color, a component that reinvents a pattern you already have - sails right past tests, because the code is "correct," it just looks wrong. Those are exactly the changes that accumulate into a product that feels like it was assembled by ten different people who never spoke to each other. Which, in a sense, it was.

The Design Review is how I keep ten agents from producing a ten-personality UI. It's the single most effective thing I've added.

This is your filter against AI slop.

The async colleague

The first time it happened I actually laughed.

I'd left a comment on a PR - something like "this works, but pull the magic number into a constant and add a test for the empty case" - closed my laptop, and went to lunch. An hour later I came back and the comment was resolved, the constant was extracted, the test was there, and the agent had already moved on to a new issue. I hadn't done anything. I'd just... mentioned it.

How PR comments become next-cycle work

Here's the loop that makes it work: when a Codex automation runs, before it goes looking for a new issue, it checks for PR feedback on any open PRs in its lane. If it finds comments - from me, or from the automated Claude Code review - it addresses them first, pushes the updates, and only then looks for a new issue to pick up.
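Here's that priority order as a minimal Python sketch. `OpenPR` and `plan_cycle` are hypothetical; fetching PRs and their comments through `gh` or the GitHub API, and actually doing the work, are left out.

```python
from dataclasses import dataclass


@dataclass
class OpenPR:
    """Hypothetical summary of an open PR as the runner sees it."""
    number: int
    lane: str                 # which machine's lane opened it
    unresolved_comments: int  # from me or from the automated Claude Code review


def plan_cycle(lane: str, open_prs: list[OpenPR], ready_issues: list[int]) -> list[tuple[str, int]]:
    """Order one hourly run: feedback on this lane's PRs first, then at most one new issue."""
    actions = [
        ("address_feedback", pr.number)
        for pr in sorted(open_prs, key=lambda p: p.number)
        if pr.lane == lane and pr.unresolved_comments > 0
    ]
    if ready_issues:
        # Oldest ready issue first; this picking policy is an assumption.
        actions.append(("claim_issue", min(ready_issues)))
    return actions
```

Note that another lane's PRs are skipped even if they have comments: each machine only answers feedback on work it opened.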

So PR comments aren't something I have to babysit. They're just the next cycle's work. I drop feedback whenever I happen to look - between meetings, waiting on coffee, on my phone - and it gets handled on the next hourly tick without me holding the thread open in my head.

Codex as a remote collaborator who never sleeps

The mental model that finally clicked: Codex is a remote colleague in a very different timezone who picks up my comments on their next pass. I don't wait on them. They don't wait on me. I leave async feedback, they act on it async, and the work moves forward whether I'm at the keyboard or asleep.

Three machines running this means I effectively have three of those colleagues. None of them get tired, none of them get bored of the boring tickets, and none of them take it personally when I send a change back for the third time.

Why Codex

Fair question, given all my talk of swapping providers: why is Codex the one running the lanes?

Because the automation lane is the highest-volume, most token-hungry, always-on seat in the whole operation - and that's exactly where Codex earns its keep. Three machines firing every hour, grinding through implementation work around the clock, burns an absurd number of tokens. In my experience OpenAI is simply more affordable and more generous with token usage on coding workloads, and when you're running nonstop that stops being an academic question. It's the difference between "this is sustainable" and a monthly bill that makes you quietly turn the whole thing off.

It's also just a better workhorse for unattended work. Codex's automations are more predictable run-to-run: it claims the job, does the thing, and opens the PR without babysitting. When nobody's watching, predictable beats clever - and that reliability is worth as much to me as the price.

There's a quality angle to this too. GPT-5.5 at xhigh has held a remarkably consistent code-quality bar since the day it shipped - I get roughly the same caliber of work at 2pm on a Tuesday as I do at 2am on a Sunday. Claude Code, as much as I love it, has been streakier for me: the quality can swing depending on the hour and the day. For a model writing code unattended, around the clock, that consistency is the whole ballgame. I'd rather run a workhorse that's reliably an A- than one that's an A+ on a good day and a C when the servers are slammed.

So Codex draws the implementation lane, and Claude is where I lean for planning and review. That split isn't an accident, and I'll get to why a little further down.

A new rhythm for code review

This rewires how review feels. It stops being a blocking, synchronous chore where someone sits waiting on you, and becomes a stream you dip into on your own schedule. I review PRs throughout the day as I can, approving or commenting alongside the automated Claude Code review that runs on every PR. My comments and the bot's comments sit side by side, and the agent treats both as work to do.

The result is a review cadence that fits around a life instead of one that demands you park in a chair and clear a queue. I review when I want to. The work moves when I don't.

End of day: a code review with Claude

The day doesn't end when the last PR merges. It ends with a conversation.

Every evening I open a fresh Claude Code session and say some version of: "Claude, summarize all changes made since midnight, give each major change a letter grade with reasoning, and give me your thoughts on the direction we took today."

What comes back is part standup, part retro, part sanity check. A clear summary of everything that shipped, a graded read on the quality of each significant change, and an honest opinion on whether the day's direction made sense or whether I'm quietly painting myself into a corner.

Chatting with Claude about your code at the end of the day - the way you would with a trusted colleague - provides a level of insight and calm that's genuinely hard to overstate. It's the moment the operator zooms back in, just briefly, to confirm the machine is still pointed in the right direction. Then I close the laptop, and the agents keep going.

When it breaks

It is not all magic and merged PRs. If I let you believe nothing ever goes wrong, I'd be lying to you - and you'd hate me the first time you tried this and it blew up.

Failure modes I've actually hit

The real ones, in roughly the order they've bitten me:

Runaway loops. An agent gets stuck and keeps trying the same broken approach, burning tokens and producing an ever-uglier diff. The hourly cadence is itself a circuit breaker here - a bad run is bounded by the clock - but you will see it.
Hallucinated APIs. The agent confidently calls a method that doesn't exist, or a version of a library's API from two majors ago. Tests catch most of this. Not all of it.
Breaking refactors. A change that's locally correct and globally wrong - it does exactly what the issue asked and quietly breaks three things the issue didn't mention.
Dependency landmines. An agent helpfully bumps or adds a dependency, and now you've got a transitive surprise you didn't sign up for.
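The "bounded by the clock" idea is easy to make explicit in a runner script. Here's a minimal Python sketch, assuming each agent run is a single command you can launch with `subprocess`; the command in the usage block is a placeholder, not a real agent CLI invocation. `subprocess.run` with `timeout` kills the child and raises `TimeoutExpired` when the budget runs out.

```python
import subprocess
import sys


def run_bounded(cmd: list[str], budget_seconds: float) -> bool:
    """Run one agent invocation; if it overruns its budget, kill it and report failure."""
    try:
        subprocess.run(cmd, timeout=budget_seconds, check=True)
        return True
    except subprocess.TimeoutExpired:
        return False  # runaway: the direct child was killed when the timeout fired
    except subprocess.CalledProcessError:
        return False  # the agent exited non-zero


if __name__ == "__main__":
    # Placeholder command; a ~30-minute budget mirrors an agent-sized task.
    ok = run_bounded([sys.executable, "-c", "print('agent run')"], budget_seconds=30 * 60)
    print("completed" if ok else "stopped")
```

One caveat: the timeout kills only the direct child process. If your agent spawns its own subprocesses, you'd want to start it in its own process group and kill the whole group.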

None of these are dealbreakers. All of them are reasons you need guardrails, not vibes.

Neon as the safety net

I cannot recommend Neon enough. Neon is what saves you when an agent goes sideways.

The killer feature is instant database branching: every agent gets its own copy-on-write clone of prod data to test against, spun up in seconds, without putting load on the real database. The agent can run migrations, mangle data, do whatever it needs - on a throwaway branch that never touches production. When it opens a PR, the change has already been exercised against realistic data. And when it inevitably gets something wrong, the blast radius is a branch I delete, not a prod incident I have to explain.

One caveat I'll name because it matters: those clones inherit production data, which means they inherit production PII. If you do this, mask or scope the sensitive columns on the agent-facing branches. Don't hand ten automated agents an unredacted copy of your users' data and call it a safety net. And the same boundary applies to the whole setup: this is built for personal projects on infrastructure I own. Don't point any of it at an employer's code or production data unless your company has explicitly approved the tools, the data flow, and the runner isolation.
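Masking doesn't have to be elaborate. Here's a minimal Python sketch of one common technique, keyed pseudonymization with HMAC-SHA256: the same email always maps to the same fake address, so joins and uniqueness constraints keep working, but the real address is gone. The column names, the `@example.invalid` domain, and where the key lives are all assumptions; in practice you'd likely run the equivalent SQL directly against the agent-facing branch.

```python
import hashlib
import hmac


def mask_email(email: str, key: bytes) -> str:
    """Replace an email with a stable pseudonym; normalizing first keeps it deterministic."""
    digest = hmac.new(key, email.strip().lower().encode(), hashlib.sha256).hexdigest()[:16]
    return f"user-{digest}@example.invalid"


def mask_row(row: dict[str, str], sensitive: set[str], key: bytes) -> dict[str, str]:
    """Mask the sensitive columns in one row; everything else passes through unchanged."""
    masked = {}
    for column, value in row.items():
        if column not in sensitive:
            masked[column] = value
        elif column == "email":
            masked[column] = mask_email(value, key)  # stays joinable across tables
        else:
            masked[column] = "[redacted]"
    return masked
```

Keep the key out of the branch itself; if an agent can read the key, it can test guesses against the pseudonyms.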

Guardrails I rely on (branch protection, label trust, runner isolation)

The whole system is only as safe as the rails around it. Mine:

Branch protection. main requires passing status checks and a review before anything merges. The agents can open PRs all day; they cannot merge to main on their own authority. This single setting is the difference between "automated development" and "automated disaster." Turn it on before you turn on anything else.
Label trust. Automation only touches issues carrying lane:cdx-any and ready. So the real question is: who can apply those labels? On a private repo with just me, that's a non-issue. The moment the repo opens up, that label becomes a code-execution trigger - anyone who can apply it can get your CI to run their code. Treat the label as a trust boundary, not a convenience.
Runner isolation. I run self-hosted GitHub runners in AWS, and self-hosted runners executing PR code have well-documented escape paths into the account that hosts them. Mine are ephemeral - a fresh environment per job, torn down after - running in an isolated account with tightly scoped IAM and pulling from a known-good image. If you copy nothing else from this post, do not point a long-lived self-hosted runner with broad AWS permissions at code an agent wrote. That's how you get popped.
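Treating the label as a trust boundary can be as simple as checking who applied it before a runner picks the issue up. Here's a minimal Python sketch, assuming you've fetched the issue's events from the GitHub REST API, whose `labeled` events carry `actor.login` and `label.name`. The allowlist is hypothetical.

```python
# Hypothetical allowlist: the only accounts allowed to arm an issue for automation.
TRUSTED_LABELERS = {"your-github-handle"}
TRIGGER_LABELS = {"lane:cdx-any", "ready"}


def trigger_labels_trusted(events: list[dict]) -> bool:
    """True only if each trigger label was most recently applied by a trusted account.

    `events` is assumed to be oldest-first and shaped like GitHub issue events:
    {"event": "labeled", "actor": {"login": ...}, "label": {"name": ...}}.
    """
    last_applied_by: dict[str, str] = {}
    for event in events:
        if event.get("event") == "labeled" and event["label"]["name"] in TRIGGER_LABELS:
            last_applied_by[event["label"]["name"]] = event["actor"]["login"]
    return set(last_applied_by) == TRIGGER_LABELS and all(
        login in TRUSTED_LABELERS for login in last_applied_by.values()
    )
```

A runner would call this right before claiming and skip any issue that fails. It doesn't replace restricting who has triage or write access to the repo; it's a second lock on the same door.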

And one honest caveat on the lane mechanics: the claim/wait/re-check dance from earlier is best-effort, not provably race-free. It's rare, but two machines can still double-claim the same issue inside the same minute. It hasn't caused me real pain, but I'm not going to pretend it's airtight.

Cross-provider review: why I split brains

Here's an opinion I hold strongly: don't let one model do everything.

It genuinely doesn't matter whether you use Claude Code or Codex for any given job - swap them, do the exact opposite of what I do, use one for both. That part's your call. But I've landed on a deliberate split, and it isn't arbitrary: one provider plans, a different provider implements, and the review never comes from the provider that wrote the code.

The reasoning is simple, and a little human. A model reviewing its own work carries the same blind spots it had when it wrote the thing. Hand the same code to a different provider and it picks up on things the original author sailed right past - the same way a second set of human eyes catches what you've gone blind to. Cross-provider review isn't about one model being smarter than another. It's about not grading your own homework.

So I split brains on purpose. Plan with one, build with another, and never review with the one that built it. The disagreements between them are where the best catches come from.
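If you want the split to be more than a habit, you can make it a check in your runner config. Here's a minimal Python sketch; `Roles` and the provider strings are hypothetical, and the only rule it enforces is the one cross-provider review depends on: the reviewer is never the provider that implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Roles:
    """Which provider fills each seat (hypothetical config shape)."""
    planner: str
    implementer: str
    reviewer: str

    def validate(self) -> None:
        """Refuse any setup where a model would review its own code."""
        if self.reviewer == self.implementer:
            raise ValueError(f"{self.reviewer} would be grading its own homework")
```

Matching the split described earlier, `Roles(planner="claude", implementer="codex", reviewer="claude")` passes, while making Codex its own reviewer raises.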

What I'd tell a skeptic

Here's the honest part. We are still not in a place where you can type "build an Uber clone, make no mistakes" and get back something that works and is actually great. Left to their own devices, these tools still make horrible architectural decisions. The agent is a brilliant builder and a questionable architect.

That is exactly why the planning matters as much as it does. On a recent project, I spent an entire week doing nothing but brainstorming and writing PRDs with Claude - before the repo was even initialized. Not a single line of code. Just thinking, out loud and on paper, until the shape of the thing was right. The automation only looks like magic because all the hard thinking happened first.

And yes, I can hear the objection: this is one person with three Macs and a greenfield repo - it'd never survive a real team or a million-line legacy codebase. Fair. The exact mechanics are tuned for a solo operator moving fast. But the principles scale further than the setup does: plan before you build, gate quality in CI, never let a model grade its own work, keep a human as the architect. Those hold whether you're one person or a hundred - the blast radius just gets bigger and the guardrails get stricter. The teams that figure out how to industrialize this are going to run circles around the ones still debating whether it's real.

But you already knew that. Because you've embraced your new job. You're not the coder anymore - you've zoomed out. You're the architect. The operator. The director.