Agile Architecture: 4 Common Strategies
At ThoughtWorks, our preferred way to start a project is by doing a set of workshops and sessions with stakeholders for about two weeks. That is what we call Inception. After the Inception, we usually have a product backlog for the project and are ready to start writing production code.
During this period we come up with a technical vision for the system. It is a very incomplete artefact, pretty much just a system metaphor, barely good enough to start the project. At this stage I always get funny looks from the client’s architecture or development team, and questions like: “So, in this agile thing we have no architecture? We just start coding without thinking about how things are structured at a higher level?”
Well, not really. We do tackle architecture problems, but in a way that is very different from what other methodologies may suggest. When a project starts, I do not care about having a sound and complete architecture. What I do care about is having a reasonable strategy for dealing with architecture decisions.
In an agile project, User Stories contain both business value and technical investigation or architecture work. The usual way to deal with that is to pile those cards into an iteration backlog and play them in priority order. This works really well when the new feature requires just design changes, but is not so great for architecture decisions. Architecture, remember, may be defined as “things that people perceive as hard to change”.
As a Tech Lead or Iteration Manager, I have learned and experimented with many different strategies for dealing with those “hard to change” decisions. In this article I want to discuss four of them.
1 - Iteration Zero
One way of dealing with architecture issues is to address them in the very first iteration for a project, in what is often called Iteration Zero. The team works on all the technical investigation and architecture decisions they know will have to be addressed at once, before they actually start writing production code.
In theory, this strategy benefits from some economy of scope: it is probably more efficient to group your technical and architectural problems while solving them. User Stories would then leverage the architecture defined in the first iteration; they should contain 100% business value and no known technical risk should have to be addressed while playing them.
This has some hidden problems, though:
- Wasteful: You will be spending time designing and architecting things for cards that may never be played.
- Doesn’t deliver business value: You are introducing agile methods to a client, and the first thing you want to tell them is that every iteration they will get some business value. After all the work involved in convincing them, you start the project and do not deliver anything in the first iteration. This can become such a big problem that account managers inside ThoughtWorks sometimes ask teams following this strategy to call this “Preparation”, “Technical Investigation Phase”, or “anything but an iteration!”.
- You don’t know what you don’t know: You may try to address all technical investigation now, but if your project is really agile, what you know now is just a tiny fraction of what you will know in two weeks’ time. You are not making commitments at the last responsible moment, and that really reduces the benefits of agile methods.
- Not enough time: It is very common for Iteration Zero to break the timebox and go on for many weeks, even months. The amount of technical investigation and work — which, remember, may or may not be necessary — simply will not fit into a regular iteration.
I do use Iteration Zero, but I do not use it as the main mechanism for technical investigation and design. My Iteration Zero aims to get a story done, from end to end. This has to be an important story, so that the work performed to deliver it can be reused elsewhere. Given that we have no structure in place yet, this is often too much work for a single developer or pair to accomplish. What I usually do is break the story into tasks, Scrum style.
That way we will have something to show the client in the showcase and, according to the Rule of the Second Card, will have done just enough design to reduce risk for the upcoming iterations.
2 - Tech Story
This is the style suggested by Philippe Kruchten. In this strategy, architecture and design work have their own cards that get prioritised and played together with business-value cards. We call those cards Tech Stories, as opposed to User Stories. Theoretically, most architecture decisions will be made while playing the Tech Stories, and the developer playing a User Story will benefit from the fact that all decisions were already made.
The biggest benefit of this strategy is that it creates visibility into the technical risks and design work required for a project. When you mix technical and business requirements in a single story, it is very hard to understand how much of the effort is technical and how much is business-related; you often have to justify to the business why “add a link to the home page” is actually a lot of work because your architecture was not ready for that.
It is also good that this approach does not hurt the just-in-time nature of agile methods. You can schedule Tech Stories to be played in any iteration, ideally just before playing a dependent User Story.
In my experience, though, this is not so great:
- Wasteful: Just like the Iteration Zero approach, Tech Stories mean that you may waste time investigating and designing things that will not be useful when requirements change. And they will change.
- Architecture becomes low priority: If those cards are put into the product backlog, that means they have to be prioritised by the business, and it is really hard for them to understand the value of those technical tasks. Developers have to keep having the same debates, trying to convince people that those acronyms have real value.
- Creates a dependency chain: The dependency between Tech Stories and User Stories is really hard to manage. If a Tech Story takes longer than expected, it may block not one but many User Stories.
- There is still something missing: Regardless of how much up-front technical investigation you do, you will not be able to cover all possible issues for all User Stories. Technical investigation and design will still happen inside a User Story anyway.
Tech Stories will happen in any non-trivial project. I try to minimise their number. I do not want my client paying up front for technical effort that they may or may not need. I want the client to understand that every time we add a feature there is a cost, and that part of this cost comes from purely technical work.
I also hate asking the client to prioritise things they do not understand. How can I explain to a product manager that “Refactor user class” is more important than “Edit user details”? Many important technical decisions cannot be directly mapped to a tangible benefit. You can try to convince your client, but you are going to have this same conversation every single week for the rest of the project.
When I work on projects where, for some reason, Tech Stories are the norm, I usually apply a technique called a technical budget (Kruchten calls it “buffers”). In this technique, every iteration has a fixed capacity — often expressed as a percentage of velocity — allocated to Tech Stories. This works, but it is very inefficient. Budget sizes will always be too large or too small; you often end up having to renegotiate buffer increases every iteration.
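To make the budget mechanics concrete, here is a minimal sketch of how a fixed technical budget might drive iteration planning. All of it — the 25% figure, the story names, the point values — is an illustrative assumption, not a description of any real planning tool:

```python
# Hedged sketch: filling an iteration with a fixed "technical budget".
# The 25% budget, story fields, and backlog contents are illustrative
# assumptions, not a real planning tool's data model.

def plan_iteration(stories, velocity, tech_budget_pct=0.25):
    """Fill an iteration: tech stories may only consume the budget slice."""
    tech_capacity = velocity * tech_budget_pct
    planned, tech_used, points_used = [], 0.0, 0.0
    for story in stories:  # stories assumed pre-sorted by priority
        if points_used + story["points"] > velocity:
            continue  # iteration is full
        if story["tech"] and tech_used + story["points"] > tech_capacity:
            continue  # budget exhausted: this Tech Story waits
        planned.append(story["name"])
        points_used += story["points"]
        if story["tech"]:
            tech_used += story["points"]
    return planned

backlog = [
    {"name": "Upgrade ORM",         "points": 3, "tech": True},
    {"name": "Edit user details",   "points": 5, "tech": False},
    {"name": "Refactor user class", "points": 3, "tech": True},
    {"name": "Password reset",      "points": 5, "tech": False},
]
print(plan_iteration(backlog, velocity=13))
```

With a velocity of 13 and a 25% budget, only one of the two Tech Stories fits; the other waits for the next iteration. That is exactly the inefficiency described above: the budget is rarely the right size, so you end up renegotiating it.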
3 - Spike
Doing any up-front design or investigation is mostly about mitigating risk. Risk can only be mitigated by generating information (see Managing the Design Factory for an excellent discussion on this). The canonical way to generate information in agile projects is called a Spike.
A Spike is an experiment whose goal is to generate information. It may be something like “can we use JSON.Net to serialise our data instead of doing it with a template engine?” or “should we split this blob into a three-layer system?”.
A developer or pair playing a Spike will try to generate the required information, often by writing some code. This effort is time-boxed and the Spike does not add to the team’s velocity. A Spike will not remove the need for technical investigation or risk from User Stories completely; it will just reduce the required effort by providing some information before the card is played.
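A Spike of the serialisation kind mentioned above might look something like the sketch below: a time-boxed experiment whose only output is a finding. The record shape, function names, and the template stand-in are all assumptions made for illustration:

```python
# Hedged sketch of a Spike: "can the standard json module replace our
# template-based serialisation?" Data shape and names are illustrative.
import json
import time

record = {"id": 42, "name": "Ada", "roles": ["admin", "editor"]}

def template_serialise(r):
    # Stand-in for the existing template-engine approach.
    return '{"id": %d, "name": "%s", "roles": %s}' % (
        r["id"], r["name"], json.dumps(r["roles"]))

def spike(timebox_seconds=2.0):
    """Generate information within a time-box; the finding is the output."""
    deadline = time.monotonic() + timebox_seconds
    findings = {}
    if time.monotonic() < deadline:
        findings["equivalent"] = (
            json.loads(json.dumps(record))
            == json.loads(template_serialise(record)))
    return findings

print(spike())
```

The point is not the code itself — which is disposable — but the finding it produces: whether the two approaches are interchangeable for the data at hand.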
My problems with this technique:
- Architecture becomes low priority: Just like with Tech Stories, Spikes need to be prioritised by the product manager.
- Some will be useless: Spikes may end up generating no useful information whatsoever. The developers will certainly learn something, but that information may not be relevant to the project at all.
- Throw-away: The code written for a Spike should be disposed of once its goal is met, i.e. once the information has been generated. Spike code tends to be of very low quality, and developers can end up hacking together a quick solution only to find later that it is not applicable to production code.
I use Spikes all the time. Most of them are not disposable code, though. I often require my team to apply the same quality standards to Spikes that we apply to production code.
As for prioritisation, it is less of a problem than it is with Tech Stories. Tech Stories are supposed to have value by themselves, and that is hard to communicate to business people. With Spikes, though, I usually ask for two estimates: one for “if this card is played before the Spike” and another for “if the card is played after the Spike”. In many projects, not playing the Spike before the User Story means an estimate that is 60% higher.
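The two-estimates technique reduces to a simple comparison that product managers can follow. A minimal sketch, with all numbers invented for illustration:

```python
# Hedged sketch: is a Spike worth playing, given the two estimates
# described above? All point values are illustrative assumptions.

def spike_pays_off(estimate_without, estimate_with, spike_cost):
    """A Spike is worth playing when the points it saves exceed its cost."""
    return (estimate_without - estimate_with) > spike_cost

# e.g. a story estimated at 8 points without the Spike, 5 with it,
# and a Spike time-boxed at roughly 2 points of effort:
print(spike_pays_off(8, 5, 2))  # True: saves 3 points for 2
```

Framed this way, the conversation is about cost, not about the value of an acronym, which is why the prioritisation debate tends to be shorter than it is with Tech Stories.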
4 - Assembly Line
The last strategy I want to discuss today is what I call an Assembly Line. In this strategy, the development part of a User Story is split into two or more sections: one for technical investigation or architecture, and another for actual development, i.e. meeting the acceptance criteria.
User Stories are first placed in some Technical Investigation queue and, after someone does the necessary investigation and design, they are moved to the actual Development queue.
This is really useful when there is a huge difference in the work and skill set required to design a solution, as opposed to just meeting the acceptance criteria.
The problems I see here:
- What is “done”? It is really tricky to define what “done” means for the work performed in the investigation queue. Expect many endless and pointless discussions.
- Cards are blocked forever: When using this strategy, I have often had up to two development pairs idle with no cards free to be played. They were all in the Technical Investigation queue, and this queue often has fewer developers than the Development queue.
- Developers vs “Architects”: It is far too easy to create a distinction between people who do technical investigation and people who only play cards. The developers who are only playing cards often end up with very little interest in, or commitment to, the project. When faced with a technical problem, they are much more likely to move the card back to the Technical Investigation queue instead of working on it.
- Throw over the wall: Separating the work into two steps tends to create a “throw over the wall” culture.
This technique is, at times, the exact opposite of what we are trying to achieve with agile methods. I see the Assembly Line as a strategy that only sometimes makes sense — recovering a troubled project is one example — and even then only for a limited period. The impact on morale for a team using this strategy for longer than strictly necessary can be massive, and the division of work creates all sorts of problems around information sharing and blame culture.
It is possible that your project requires some heavy up-front design for some cards, but if this is a common scenario for you there is probably something wrong. Design should not be separate from implementation.