Software Engineering First

AI is a helper, not a driver.

Jun 11, 2026

In my normal programming work, I use AI tools in a very focused way that I've been told is "wrong," even though it makes me more productive. But there we are. I'm not spewing out 20x my previous output, so maybe that's the problem. I'm an engineer, so when I work with an LLM, I do engineering. Shocking, I know.

I work incrementally, in small batches, with a well-defined architecture guiding the process. (People talk nebulously about "guardrails." Having a coherent AI-friendly architecture is an important one.) I don't let the AI design the program for me. I build thin vertical slices with domain value, not random features. I focus on quality, not output volume.

I tell the LLM what the code should do, of course, but I also tell it how to structure the code—how it fits into the architecture, what the APIs or messaging look like, etc. I don't simply describe the surface behavior of a "feature" and expect the AI to do all the work under the covers. Instead, I ask the AI to write small, manageable, focused, black-box testable components that fit into an overall system architecture that it does not control. The architecture comes first, and then I tell the LLM to implement it one component at a time. (I should add that some architectures work better than others in this context. DDD nested aggregates and entities have worked well for me, as have message-based components with hardened perimeters.)
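Here is a minimal sketch of what "architecture first" might look like for one component. It assumes a hypothetical ordering domain: `Order`, `OrderLine`, and `DomainError` are illustrative names, not from any real system. The human fixes the aggregate's shape, its public API (`add_line`, `submit`, `lines`, `total`), and its invariants up front. The LLM is then asked to fill in one component like this at a time.

```python
from dataclasses import dataclass, field
from decimal import Decimal


class DomainError(Exception):
    """Raised when a command would violate an aggregate invariant."""


@dataclass(frozen=True)
class OrderLine:
    # Hypothetical nested entity: identified by line_id, reachable only via its Order.
    line_id: int
    sku: str
    quantity: int
    unit_price: Decimal

    @property
    def subtotal(self) -> Decimal:
        return self.unit_price * self.quantity


@dataclass
class Order:
    # Hypothetical aggregate root: the only public entry point for changing order state.
    order_id: str
    max_lines: int = 10
    _lines: list[OrderLine] = field(default_factory=list, init=False, repr=False)
    _submitted: bool = field(default=False, init=False)

    def add_line(self, sku: str, quantity: int, unit_price: Decimal) -> None:
        # Every invariant is checked here, at the root, before any state changes.
        if self._submitted:
            raise DomainError("cannot modify a submitted order")
        if quantity <= 0:
            raise DomainError("quantity must be positive")
        if len(self._lines) >= self.max_lines:
            raise DomainError("order has too many lines")
        self._lines.append(OrderLine(len(self._lines) + 1, sku, quantity, unit_price))

    def submit(self) -> None:
        if not self._lines:
            raise DomainError("cannot submit an empty order")
        self._submitted = True

    @property
    def lines(self) -> tuple[OrderLine, ...]:
        # Read-only snapshot: callers can't mutate the internal list or the frozen lines.
        return tuple(self._lines)

    @property
    def total(self) -> Decimal:
        return sum((line.subtotal for line in self._lines), Decimal("0"))
```

Because all changes go through the root, an LLM asked to implement `Order` can't scatter invariant checks across the codebase. The component can also be tested entirely through its public methods.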

I'll admit that I often don't give the generated code more than a cursory glance, but I'm a fanatic about testing. (There's another "guardrail.") I don't let the AI write my tests. When I first started doing this, I found that the AI would modify the tests so that the incorrect code it generated would pass. That's not testing. The AI-generated tests also tested the wrong things: small implementation details rather than domain-level behavior.

I'm a big TDD guy, so I write my tests first and then tell the AI that the code it creates must pass them. When I do bring the LLM into test writing, I specify each test in Gherkin format, and the resulting tests are small enough to review manually. To me, manual review of the tests is essential. There's no room for AI ambiguity in a test.
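One way the test-first flow could look is sketched below. It assumes a hypothetical `Account` component and uses plain Python `assert` statements (pytest would also collect the `test_` functions as-is). The Gherkin scenarios sit in a comment so the human-reviewed specification stays next to the tests derived from it. The tests touch only the component's public API, never its internals.

```python
# Scenarios (Gherkin), written and reviewed by a human before any code exists.
# The Account component is hypothetical, for illustration only.
#
#   Feature: Withdrawing cash
#     Scenario: Withdrawal is refused when funds are insufficient
#       Given an account with a balance of 100
#       When the holder tries to withdraw 150
#       Then the withdrawal is refused
#       And the balance is still 100
#
#     Scenario: Withdrawal succeeds when funds are sufficient
#       Given an account with a balance of 100
#       When the holder withdraws 40
#       Then the withdrawal is accepted
#       And the balance is 60


def test_withdrawal_refused_when_funds_insufficient() -> None:
    account = Account(balance=100)   # Given
    accepted = account.withdraw(150)  # When
    assert accepted is False          # Then: domain-visible outcome only
    assert account.balance == 100


def test_withdrawal_succeeds_when_funds_sufficient() -> None:
    account = Account(balance=100)
    accepted = account.withdraw(40)
    assert accepted is True
    assert account.balance == 60


# Implementation the LLM is asked to produce so that the tests above pass.
class Account:
    def __init__(self, balance: int) -> None:
        self._balance = balance

    @property
    def balance(self) -> int:
        return self._balance

    def withdraw(self, amount: int) -> bool:
        # Refuse non-positive amounts and overdrafts; otherwise debit the balance.
        if amount <= 0 or amount > self._balance:
            return False
        self._balance -= amount
        return True


if __name__ == "__main__":
    test_withdrawal_refused_when_funds_insufficient()
    test_withdrawal_succeeds_when_funds_sufficient()
    print("scenarios passed")
```

The tests never inspect `_balance` directly, so the LLM is free to restructure the implementation. It can't make the tests pass by asserting on internals.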

I've been told that I'm not leveraging the full capabilities of the AI by working this way, and that I should just describe features or modifications and have the AI do all the work. Nonetheless, I am more productive than when I don't use the LLM, and I don't seem to have the problems that are commonplace with other approaches (e.g., lurking bugs, fragility in the face of scaling, and overwhelming complexity).

My measure of productivity is time to complete a story, and that time has gone down. I couldn't care less about output volume. When I work in the small like this, with each change constrained by component boundaries, the LLM can't break existing code while making unrelated changes. I'm also never overwhelmed by a pile of code so vast that I can't understand it.
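Here is a minimal sketch of a message-based component with a hardened perimeter. All names (`InventoryComponent`, `ReserveStock`, `StockReserved`, `Rejected`) are hypothetical. The component keeps its state private and validates every inbound message at the boundary. Other components depend only on the message types, so an LLM regenerating this component's internals stays contained as long as the message contract holds.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ReserveStock:
    # Hypothetical inbound command: the only way callers can request a change.
    sku: str
    quantity: int


@dataclass(frozen=True)
class StockReserved:
    sku: str
    quantity: int
    remaining: int


@dataclass(frozen=True)
class Rejected:
    reason: str


Reply = Union[StockReserved, Rejected]


class InventoryComponent:
    """Owns its state privately; callers interact only by sending messages."""

    def __init__(self, stock: dict[str, int]) -> None:
        self._stock = dict(stock)  # defensive copy: no shared mutable state

    def handle(self, message: object) -> Reply:
        # Hardened perimeter: validate everything before touching state.
        if not isinstance(message, ReserveStock):
            return Rejected(f"unsupported message: {type(message).__name__}")
        # type() rather than isinstance() so that bools are rejected too.
        if type(message.quantity) is not int or message.quantity <= 0:
            return Rejected("quantity must be a positive integer")
        available = self._stock.get(message.sku)
        if available is None:
            return Rejected(f"unknown sku: {message.sku}")
        if message.quantity > available:
            return Rejected("insufficient stock")
        self._stock[message.sku] = available - message.quantity
        return StockReserved(message.sku, message.quantity, self._stock[message.sku])
```

Replies are values rather than exceptions, so a malformed or unexpected message never propagates an internal error across the boundary. The caller always gets back one of the declared reply types.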

So, maybe I am doing it "wrong," but I'm happy with what I'm doing.

Agility!
