11

Architecture as Probability Management

Why architecture matters more, not less


Does the fact that more and more code is going to be AI-generated make architecture and software design less important than before? If you can regenerate anything at any time (or so they'll tell you), why care about its internal structure or technical details? Things are certainly going that way; CLI tools don't even show you the code unless you're interested. Like no-code/low-code, but with actual custom code that you just won't see. Many are completely willing to let the AI define the higher-level architecture from a set of existing blueprints, or even let it come up with something on its own. After all, if it doesn't work, you can just ask it to fix it, right?

[Figure: Architecture constrains the search space. A tangled architecture, with components tightly coupled so that changes ripple across the system, leaves a huge, unconstrained search space. A modular architecture, with an API layer of public interfaces, a business logic layer (modules A, B, C) and a data access layer of repositories, constrains the space: clear interfaces keep changes isolated within modules.]
Good practices, modularity and clear separation of concerns give your coding agents a fighting chance.

I'd like to disagree with this, at least for now.

For projects with a low expected lifetime, a simple structure, and a limited feature set, this might actually be reasonable. The ability to produce amazing-looking, almost fully functional systems with zero lines of code from the developer is, of course, a result of the insane amounts of training data fed into the models. In a way, what you've seen is Stack Overflow 2.0 in action: like it or not, you are witnessing patterns in the training data being reproduced rather than something entirely new or creative.

I'm a great fan of using established patterns, practices, blueprints and reference architectures for, e.g., application architecture, infrastructure, security and so on. That's what we've been up to since the internet became a thing. But that's still a far cry from letting go completely of a system you might end up giving guarantees for. Or from having your employer's name in the headlines in an unflattering context such as "data breach" or "security incident", or simply from having to maintain the thing for a long time.

So, I'd argue the opposite is true for anything more complicated than a simple CRUD web app with a hamburger menu on the top left and a persona symbol on the top right (funny how this became the standard layout at some point). Consider the figure below, which illustrates the core building blocks of a software product. For the time being, I'll continue to insist on knowing what these blocks are made of, and why, but I'll remain open to good suggestions, and I certainly have no desire to write any IaC code myself anymore.

[Figure: Building blocks of a software product. Each block is an independent concern and a potential agent task boundary. User-facing: Features (what the system does: user stories, business logic, rules) and UX (design system, templates). Core: Application Architecture (layers, components, models), Databases (schema, storage), Integrations (APIs, messaging). Foundation: Technical Infrastructure (IaC, containers, networks, secrets, cloud resources) and Maintenance Infrastructure (monitoring, CI/CD, logging, alerting).]
The building blocks of a software product. Each block is an independent concern with its own patterns, conventions and task boundaries.

So my point is that you need to pay more attention to architecture, not less; it's in no way easier to correct big upfront errors later on, AI-generated code or not. It's the other side of spec-driven design: define the structure and rules for the code to be produced first, and then let the model fill in the blanks. The architecture is the specification, and the code is the implementation.

Next, I'll explain in a bit more detail why I think you should still care.

Don't bite more than you can chew

An agent working within a well-defined module with clear inputs, outputs, boundaries, and a clear intent or concern has the privilege of operating in a small solution space.

Basically, the number of options is limited and manageable. It's like sending 007 on a mission in London to catch a Spectre agent in the British Museum. Yes, there are a lot of rooms to hide in, but the doors are locked and the layout is familiar. The probability of success is high.

When M sends James to Istanbul, however, his problem space gets vastly bigger. So many streets, waterways, and people speaking a foreign language. In the movie, the bad guy conveniently shows up at the casino, but in your less martini-intensive world of coding that won't happen.

So, for James, working across poorly separated concerns creates an enormous solution space where the probability of a correct end-to-end solution drops exponentially. It's the same for us developers: good luck finding anything in a codebase of a few hundred kLOC with no naming standards, modules, or clear patterns.

Good architecture tries to have clear patterns, boundaries and conventions that limit the solution space when developing a feature, fixing a bug, or implementing a change.

The Step Size Principle from Chapter 5 applies directly here: tasks that cross multiple modules or layers have a lower probability of success than tasks contained within a single module. So keep the layers organized.
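The compounding effect is easy to see with a back-of-the-envelope model. The 90% per-module figure below is an illustrative assumption, not a measured value:

```typescript
// Rough model: every module or layer a task touches is another
// independent chance for the agent to get something wrong. If a
// single-module edit succeeds with probability p, a task spanning
// n modules succeeds with roughly p^n. Numbers are illustrative only.
function successProbability(perModule: number, modulesTouched: number): number {
  return Math.pow(perModule, modulesTouched);
}

// With an assumed 90% per-module success rate:
const oneModule = successProbability(0.9, 1);   // 0.90
const threeModules = successProbability(0.9, 3); // ~0.73
const sixModules = successProbability(0.9, 6);   // ~0.53, a coin flip
```

Even under generous assumptions, a task that sprawls across six modules is no better than chance, which is exactly why small, contained steps win.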

Separation of concerns

Classic software architecture principles and patterns like separation of concerns, modularity, and layering (low coupling, high cohesion) are often seen as a recipe for maintainability and comprehensibility, a way to build software while remaining sane.

While I agree with Dave Farley's notion that boring code is good and that smart patterns are actually harmful, he certainly didn't mean the "100 different ways to do the same thing in the same app" kind of boring. He observed that AI tends to write "boring code": long functions with few branches, lots of static declarations, no fancy patterns. But that doesn't mean the code is well-structured or that it has clear separation of concerns, which is another thing entirely.

[Figure: Abstraction, coupling and the agent comfort zone. Higher abstraction means more coupling, more context needed, and more human responsibility.]

Abstraction level | Example | Relative coupling | Who owns it
Instruction | MOV #5, R1 | 5% | Agent excels
Statement | a = b + 1; | 12% | Agent excels
Function | incrementOne() { a++; } | 22% | Agent excels
Class / Interface | Bill { total, increment() } | 38% | Agent capable
Service / Layer | CustomerPurchaseService | 55% | Agent struggles
Module / Library | CustomerPurchaseModule { 3 services } | 75% | Human territory
Subsystem / API | OpenAPI: /purchases | 92% | Human territory

Each abstraction level adds coupling and context requirements. Agents thrive in constrained spaces; humans must own the boundaries.

I'm not willing to give up easily on having proper, well-thought-out, and human-understandable structure even when you let a model do the typing for you. (And don't get me wrong, Dave Farley wasn't suggesting that either.)

At some point, you will need to take the steering wheel to debug, fix, and sort things out. If what you (or the poor AI agent) need to look at is just a big ball of mud with spaghetti, good luck finding that subtle bug.

In AI-assisted development, separation of concerns is not just about maintainability. It's about probability management. Each layer boundary is a context boundary that limits the information an agent needs to process, which in turn limits the probability of context-related errors.

And let's not forget the quality attributes (performance, security, reliability), which you cannot test in a code review or with Playwright, or express as tasks with clear intent for agents to execute. These are baked into the architecture, which is exactly why they're called cross-cutting concerns.

Context engineering and architecture

Context engineering is typically discussed as a prompting technique, using subagents to start fresh with a smaller context, or including only the relevant information for the task at hand. But the most powerful context engineering happens at the architecture level, before any prompting begins.

A well-architected codebase can be described to an agent with a simple diagram or list: a map of modules, their responsibilities, their interfaces, and their dependencies. It doesn't matter whether this lives in the CLAUDE.md file, an architecture.md, or somewhere else; the key thing is to specify which modules exist, what they do, and how they interact. This is the context agents need to navigate the codebase effectively.

Here's a real example. This is a compressed documentation index from a project, designed to be loaded into agent context in a single line:

[Docs]|root:.docs|core:{conventions.md,architecture.md}|api:{api.md}
|auth:{authentication.md}|state:{state.md}|ts:{typescript.md}
|ui:{ui.md}|perms:{permissions.md}|scaffold:{scaffolding.md}
|test:{testing.md}|agents:{agent-phases.md}|migrate:{migration.md}
|sse:{notifications.md}|ver:{version-management.md}|skills:{reference.md}

The agent doesn't need to read all of these. A routing table tells it which to load on demand:

Key | Category | When to Read
core | Conventions, architecture | Always read first
api | API integration patterns | Creating services, backend calls
auth | Authentication patterns | Anything touching login, tokens, sessions
ui | UI patterns and components | Frontend work
test | Testing workflows | Writing or reviewing tests
agents | Agent phase definitions | Orchestration and pipeline work

The CLAUDE.md simply points to this index with one line: "Detailed guides in .docs/ — see .docs/INDEX.md for full structure." The agent reads the index, picks the relevant docs, and ignores the rest. This is architecture serving context: the documentation structure mirrors the codebase structure, which means the agent only loads what it needs for the module it's working on.
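To make the idea concrete, here's a sketch of such a routing table expressed as code, so a pre-prompt step could pick docs for a task automatically. The keys mirror the index above, but the keyword triggers and the `docsForTask` function are hypothetical, not part of any real tool:

```typescript
// Hypothetical routing table: maps doc keys from the index to trigger
// keywords. "core" is always loaded; the rest are loaded on demand.
const DOC_ROUTES: Record<string, string[]> = {
  core: [],                                 // always read first
  api: ["service", "backend", "endpoint"],
  auth: ["login", "token", "session"],
  ui: ["component", "frontend", "page"],
  test: ["test", "spec", "review"],
  agents: ["orchestration", "pipeline"],
};

// Pick the docs relevant to a task description; everything else
// stays out of the agent's context.
function docsForTask(task: string): string[] {
  const text = task.toLowerCase();
  return Object.entries(DOC_ROUTES)
    .filter(([key, triggers]) =>
      key === "core" || triggers.some((t) => text.includes(t)))
    .map(([key]) => key);
}

// docsForTask("Add a login button to the page")
//   -> ["core", "auth", "ui"]
```

The point isn't the keyword matching, which is crude; it's that the routing decision is mechanical because the documentation structure mirrors the module structure.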

When your system grows, you might separate, for example, the frontend and backend guidance into different files and make sure each is loaded on demand. The microservices-backend-engineer agent couldn't care less, and shouldn't have to, about your Next.js or Tailwind configuration and practices.

Convention files as architecture enforcement

Coding conventions aren't just style guides: they're architectural constraints expressed as rules. Some of them concern syntax or naming; some are patterns. I'm fond of rules like "Never create a new service or API without asking the developer first" to prevent slop and limit the combinatorial explosion of new files and classes many LLM agents are overly keen to create.

Rules like "Services always return Result error types", "Components never call APIs directly; they MUST use service adapters", and "State mutations always go through the store" are also good candidates. When these conventions are documented in files that agents read as context, they become soft enforcement of architectural boundaries.
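As a sketch of what a "services always return Result error types" convention can look like in practice; the `Result` shape and the `parseActionPayload` function are hypothetical illustrations, not taken from any specific project:

```typescript
// Hypothetical convention: services never throw across their boundary.
// Every outcome is an explicit Result the caller must unwrap, so an
// agent reading the code cannot miss the error path.
type Result<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

// A service function following the convention: validation failures
// become Result errors, never thrown exceptions.
function parseActionPayload(raw: string): Result<{ action: string }> {
  try {
    const data = JSON.parse(raw);
    if (typeof data.action !== "string") {
      return { ok: false, error: "missing 'action' field" };
    }
    return { ok: true, value: { action: data.action } };
  } catch {
    return { ok: false, error: "invalid JSON" };
  }
}
```

A convention like this is easy for an agent to imitate once it sees two or three examples, which is exactly what makes it enforceable through context.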

When these rules are applied is a matter of scope and agent design. I'd apply them as early as possible, meaning the plan should already contain tasks that follow your architecture. That way, when the coding starts, the agent is already in the right context and following the right patterns. Another approach is to trust the CLAUDE.md and firmly believe it will be followed throughout the process. In my experience, a detailed task list or todo list designed according to your separation of concerns tends to yield better results.

In practice, agents violate boundaries in predictable ways. The most common: doing exactly what was explicitly prohibited, like altering files they were told not to touch. Agents also love to stop after each step and ask for confirmation even when specifically instructed to continue autonomously, or just quietly disregard project context directions when those directions conflict with what seems "reasonable" to the model. Your best defenses are .llmignore files and tool restrictions rather than polite instructions. Soft rules get soft compliance.

The single most persistent violation is one-shotting. Despite explicit instructions to work one task at a time, agents will attempt to implement an entire feature or multiple tasks in a single pass. The only reliable fix I've found is limiting the scope specifically to a single task per invocation. Telling them not to one-shot doesn't work; you have to make it structurally impossible.

For example, instead of giving an agent a vague task like

Example
'Task: Add a button and an API call to service XYZ to perform action ABC'

You might have a better chance of NOT violating your house rules when the tasks look like this:

Example
'Task 1: Add a button to the UI on Page XYZ on the top left menu to call "Do action." Wire that to serviceX/doSomething API (POST)'
'Task 2: Add a new route to serviceX/doSomething and POST handler'
'Task 3: Implement the API handler to parse the action and save the results to the DB.'

As a bonus, you don't need to write these yourself. Just make sure your plan is atomic enough.

Module-level context scoping

When an agent works on a task, it should receive the following inputs:

  • the task specification,
  • the relevant module's code,
  • the interfaces of adjacent modules,
  • the conventions document,

and nothing else. Much of this is automatic and can be inferred from the task, the conventions, and by traversing the codebase. Limiting the search space to the correct module, directory, or layer makes the difference between a diff of 30 files with +1235 and -394 lines and one of 4 files with +23 and -4 lines. Believe me.
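Here's a minimal sketch of that scoping rule as code; the `Module` shape, the file names, and the `contextForTask` function are all hypothetical:

```typescript
// Hypothetical module registry: each module declares its files, its
// public interface file, and its dependencies.
interface Module {
  name: string;
  files: string[];        // full source of the module
  interfaceFile: string;  // public surface other modules may see
  dependsOn: string[];
}

// Scoping rule: the agent sees the conventions doc, the target
// module's files, and ONLY the interface files of its dependencies.
// Internals of other modules never enter the context.
function contextForTask(target: string, modules: Module[]): string[] {
  const byName = new Map(modules.map((m) => [m.name, m] as [string, Module]));
  const mod = byName.get(target);
  if (!mod) return [];
  const context = ["CONVENTIONS.md", ...mod.files];
  for (const dep of mod.dependsOn) {
    const d = byName.get(dep);
    if (d) context.push(d.interfaceFile); // interface only, not internals
  }
  return context;
}
```

Whether this is done by a script, a subagent, or by hand in the prompt matters less than the rule itself: dependencies contribute their interfaces, nothing more.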

So, every additional file, perhaps pulled into the context by grepping for keywords, is a potential source of confusion. Architecture that is 'good' for AI is modular, with clear interfaces and separation of concerns: it limits the search space. You can increase the intelligence of your coding tools with additional indexing and graphing tools, such as Serena, to discover structures and patterns in your codebase.

Designing for regeneration?

I saved this for last.

Let's imagine you still cherish the idea of being able to regenerate major parts of your codebase at any time. Before I can talk you out of the idea by lecturing about the importance of architecture, let's consider what it would take to make that a reality.

The first thing is the state of your project. Has any of it already been in production, possibly integrated into some other system, and stored some data? Your regeneration option just went out the window.

Second, have the developers, agents, and pipelines already been fitted somehow to your initial (or second, or third) architecture? If so, you'll need to redo them, or at least verify them again.

Need I go on? Realistically, if you have more than one person on board, have already put in quite some time, and have potentially spent some (customer) money on the project, you are not going to be able to just throw it all away and start over. Your AI tools will also struggle when trying to fix it or add new features to it. It's a major screwup, and you cannot hide behind the 'but the AI did it' excuse. I will conclude the chapter with a diagram of the "point of no return" in software development.

[Figure: Point of no return. The window for regenerating your architecture closes faster than you think. Regeneration feasibility drops as the project progresses: project starts, MVP takes shape, 2nd developer joins, first invoice sent, CI/CD pipeline live, data in production, 1st deployment. Early on you can still pivot; once committed, rework is expensive, and the point of no return arrives before the first deployment.]
The window for major architectural changes closes well before your first deployment.

Do your homework first, stick to planning mode as long as possible, and have somebody else look at the first shots before investing significant time and resources.