Risk Storming: Surfacing Architectural Risk Before Production Does

You have a solution. A C4 diagram, or some other design. You show it to the team, everyone nods, everyone agrees it looks fine. Then it ships, and a month later one service becomes a bottleneck and falls over, and it turns out the design wasn't optimal after all. Nobody was careless. The team just never thought through a few risks that, in hindsight, were fairly obvious. At the time they didn't cross anyone's mind.

The Core Idea

Risk Storming doesn't remove risk from your system. It pulls risk out of people's heads and onto a shared diagram, before production discovers it for you.

Risks Like to Hide

Architectural risks have a habit of hiding, and they hide wherever we design for the happy path. The user clicks, the request flies, the data is saved, everyone's happy. But your system doesn't live in a laboratory. It lives in an environment that is unreliable by definition.

The network can drop in the middle of a request.
Traffic can spike tenfold on a Thursday at 8pm because marketing launched a campaign and nobody told you.
The third-party API holding up half your functionality has its own outages and its own SLA that nobody read.
Bus factor. One person on the team understands the billing module, and they just left for a three-week holiday.

All of these things sit in people's heads as quiet assumptions. "Payments always respond." "There won't be that much traffic." Each one sounds like a fact and looks like a fact, right up until the first incident. A hidden assumption has one nasty property: you can't see it until it blows up.

When Disagreement Is Really Undiscovered Risk

You see the same thing in design reviews. You're sitting with the team, sometimes someone from C-level joins. Someone presents the architecture and it kicks off. One senior says it won't survive heavier traffic. Another defends their design because it's simple and ships fast. The CTO weighs in that the quarter-end deadline is what matters most. Everyone pulls in their own direction, the discussion goes in circles, and usually whoever talks loudest, or has the bigger title, wins.

It looked like an argument about architecture. Look closer. The first senior named a scaling risk, the second a delivery risk, the CTO a business risk. Each one was real, but nobody called it a risk. So instead of calmly comparing those risks, you argued about who was right.

Those weren't disagreements. They were undiscovered risks dressed up as differences of opinion.

What Risk Storming Is

For all of these situations there's one technique: Risk Storming. It was created by Simon Brown, the same person behind the C4 model, and he describes it in his book Software Architecture for Developers and on riskstorming.com under a Creative Commons licence. He frames it as a quick, collaborative, visual way to identify risk that the whole team can take part in, not just the architect. Developers, testers, project managers, and operations people all see different things, and that is the point.

You put your architecture diagrams in front of the team and ask everyone, individually and in silence, to write down the risks they spot. One risk per sticky note. In the diagram below, the notes sit right on top of the area each risk threatens.

[Image: a C4 container diagram for an Internet Banking System with Risk Storming sticky notes placed on the components: red notes marking higher-priority risks on the API Application, Database, and client containers, green notes marking lower-priority ones.]

Figure: Risk Storming on a C4 container diagram. Each sticky note is a risk placed close to the container it affects. Here two participants used a colour each, so you can see at a glance who raised what and where their concerns overlap.

Scoring Priority

Each risk gets a score of probability × impact. Rate probability and impact each from 1 to 3 and you get a value from 1 to 9, which maps to three colours: green for low (1 to 2), amber for medium (3 to 4), and red for high (6 to 9). Because both inputs are whole numbers from 1 to 3, the only possible products are 1, 2, 3, 4, 6, and 9, so no score ever falls between the bands. The colour is what makes the high-risk areas of the diagram jump out.
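The scoring rule is small enough to write down as code. The sketch below is only an illustration of the arithmetic described above; the function names `risk_score` and `risk_colour` are made up for this example and are not part of any Risk Storming tooling.

```python
from itertools import product


def risk_score(probability: int, impact: int) -> int:
    """Multiply probability by impact, each rated 1 (low) to 3 (high)."""
    for name, value in (("probability", probability), ("impact", impact)):
        if value not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2, or 3, got {value!r}")
    return probability * impact


def risk_colour(score: int) -> str:
    """Map a score to the sticky-note colour: green, amber, or red."""
    if score <= 2:
        return "green"
    if score <= 4:
        return "amber"
    return "red"


# Every score a 3x3 rating can produce, with its colour.
possible = sorted({risk_score(p, i) for p, i in product((1, 2, 3), repeat=2)})
print(possible)  # [1, 2, 3, 4, 6, 9]
for score in possible:
    print(score, risk_colour(score))
```

A risk rated probability 2 and impact 3 scores 6 and lands in red, while probability 2 and impact 2 scores 4 and stays amber.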

How you run it technically is secondary. Sticky notes on a whiteboard in the office, comments or a Miro board for a remote team, all of them work. What matters is that the risks finally land on the diagram, and that you look for the option that does the least damage.

The Four Steps

1. Draw the diagrams: Put up the architecture you plan to build or change. Brown's advice is to use a set of diagrams at different levels of abstraction, ideally following the C4 model, because each level surfaces different risks. A context diagram exposes integration and dependency risks, a container diagram exposes deployment and scaling risks, and so on.
2. Identify risks individually and silently: For around ten minutes, everyone writes risks on their own sticky notes, one per note, scored by probability and impact and coloured accordingly. Notes stay hidden until the end of this step. Working in silence first stops the loudest voice from anchoring everyone else.
3. Converge the risks on the diagrams: Everyone places their notes on the diagrams, close to the area where the risk applies. Similar risks cluster together, and the clusters show you where your architecture is most exposed.
4. Review, prioritise, and mitigate: Walk the board together. Pay special attention to risks only one person spotted, and to risks where people disagree on the priority. Capture the agreed risks in a register and, for the high-priority ones, work out mitigation strategies.
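If you run the session remotely and end up with the notes as data, steps 3 and 4 can be sketched in a few lines. This is a hypothetical model, not a tool Brown describes: the `Note` class, the participant names, and the risk labels are invented for illustration, and it assumes two notes describe the same risk only when they carry the same label on the same element. In a real session, deciding which notes are "similar" is a human judgement.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    author: str
    element: str      # diagram element the note is placed on
    risk: str         # short label for the risk
    probability: int  # 1 to 3
    impact: int       # 1 to 3

    @property
    def score(self) -> int:
        return self.probability * self.impact


def converge(notes):
    """Step 3: group the notes by the diagram element they sit on."""
    board = defaultdict(list)
    for note in notes:
        board[note.element].append(note)
    return dict(board)


def hotspots(board):
    """Elements ordered by how many notes they collected, most first."""
    return sorted(board, key=lambda element: len(board[element]), reverse=True)


def lone_risks(notes):
    """Step 4: risks that only one participant wrote down."""
    authors = defaultdict(set)
    for note in notes:
        authors[(note.element, note.risk)].add(note.author)
    return [key for key, who in authors.items() if len(who) == 1]


notes = [
    Note("Ana", "API Application", "single instance", 2, 3),
    Note("Ben", "API Application", "single instance", 3, 3),
    Note("Ben", "Database", "no tested restore", 1, 3),
    Note("Cy", "API Application", "no rate limiting", 2, 2),
]
board = converge(notes)
print(hotspots(board))    # ['API Application', 'Database']
print(lone_risks(notes))  # [('Database', 'no tested restore'), ('API Application', 'no rate limiting')]
```

The cluster count points at the most exposed element, and the lone risks are the ones to ask about first during the review.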

The Most Interesting Part: When Scores Diverge

That disagreement in the team, like the one back in the design review, almost always comes from somewhere. Someone is holding an assumption or a risk they never said out loud. They're against your solution because in their last project, exactly this approach blew up on them.

But that project was a completely different context. Different load, different team, different API. What didn't work there might run just fine for you, and the reverse happens too. Risk Storming drags those quiet assumptions into the open. Everyone shows their cards and places their risk on the diagram, and then together you check whether, in your situation, it's real or just a shadow from an old project.

Don't: average the scores

One person gives a risk a 9, another gives it a 2, so you split the difference and call it a 5.5.

You've buried the most valuable signal in the room under arithmetic.

Do: dig into the gap

A 9 against a 2 means an unspoken assumption just surfaced. Find out what the person scoring 9 knows that the person scoring 2 doesn't.

That conversation is the whole point of the session.

When one person scores a risk a nine and another scores it a two, don't reach for the average. That gap is the best moment of the entire session. An assumption nobody spoke aloud just came out.
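One way to make sure the gap is not lost is to report the spread instead of the mean. The sketch below assumes the scores are available as (risk, participant, score) tuples; the function name `score_gaps` and the `min_gap` threshold of 4 are arbitrary choices for this example, not part of the technique.

```python
from collections import defaultdict


def score_gaps(scores, min_gap=4):
    """Return risks whose highest and lowest score differ by at least min_gap.

    scores is an iterable of (risk, participant, score) tuples.
    Each result is (risk, gap, highest_scorer, lowest_scorer), widest gap first.
    """
    by_risk = defaultdict(dict)
    for risk, participant, score in scores:
        by_risk[risk][participant] = score

    gaps = []
    for risk, per_person in by_risk.items():
        high = max(per_person, key=per_person.get)
        low = min(per_person, key=per_person.get)
        gap = per_person[high] - per_person[low]
        if gap >= min_gap:
            gaps.append((risk, gap, high, low))
    return sorted(gaps, key=lambda item: item[1], reverse=True)


# Hypothetical scores from two participants.
scores = [
    ("payments API outage", "Ana", 9),
    ("payments API outage", "Ben", 2),
    ("traffic spike", "Ana", 6),
    ("traffic spike", "Ben", 4),
]
for risk, gap, high, low in score_gaps(scores):
    print(f"{risk}: gap of {gap}, ask {high} and {low} what each is assuming")
```

The output names the two people who should talk, which is more useful in the room than a single blended number.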

There's a side effect worth naming too. When the team places the risks on your diagram themselves, they understand the solution far better than they would from a presentation. The technique is simple to run and it earns its place quickly.

From Risk to Architectural Decision

Identifying risk is only half the work. The other half is deciding what to do about it. Mitigation means reducing either the probability of a risk or its impact, and each high-severity risk that survives the session becomes an input to your next architectural decision.

Reduce probability by adding retries with backoff, health checks, or a more reliable dependency.
Reduce impact with circuit breakers, fallbacks, graceful degradation, or a bulkhead between services.
Accept and document. Some risks aren't worth mitigating, so record the decision and move on.
Capture the outcome in an ADR so the trade-off stays explicit and you can revisit it later.
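To make the first two options concrete, here is a minimal sketch that combines retries with exponential backoff (lower probability of failure) and a fallback (lower impact when the retries run out). The helper `call_with_retries`, its parameters, and the simulated flaky payment call are all hypothetical. Production code would catch only the specific transient errors of its client library rather than every exception.

```python
import random
import time


def call_with_retries(operation, fallback, attempts=3, base_delay=0.1,
                      sleep=time.sleep):
    """Try operation up to `attempts` times, then degrade to fallback."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:  # assumption: every failure is worth retrying
            if attempt == attempts - 1:
                break
            # Exponential backoff with jitter: base, 2x base, 4x base, ...
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
    return fallback()


def make_flaky(failures):
    """Build a stand-in for a dependency that fails `failures` times."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionError("payment provider unavailable")
        return "paid"

    return operation


def no_sleep(_seconds):
    """Skip real waiting so the example runs instantly."""


print(call_with_retries(make_flaky(2), lambda: "queued for later", sleep=no_sleep))
# paid
print(call_with_retries(make_flaky(5), lambda: "queued for later", sleep=no_sleep))
# queued for later
```

The fallback here queues the payment instead of failing the request, which is one form of graceful degradation. Whether that is acceptable for your system is exactly the kind of trade-off to record in the ADR.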

Start Applying Now

You don't have to summon the whole team right away. You can practice this technique by yourself first, on your own diagram, before rolling it out in meetings. Take your last design and ask yourself one question.

This Week

Open your most recent C4 diagram. Where would you stick the first red sticky note? Start there. That's your highest risk, and it's been hiding in plain sight.

Risk Storming doesn't make risk disappear. It takes risk out of people's heads and puts it on a shared diagram, where the team can see it, score it, and decide what to do, before production decides for you.


Kamil Bączek