This post expands on a train of thought initiated by Dan North in his talk “Kicking the Complexity Habit” at NDC London 2014.
“Frequent releases reduce risk” – this is something you hear all the time in conversations about Continuous Delivery. How exactly is this the case? It sounds counter-intuitive. Surely releasing more often is introducing more volatility into Production? Isn’t it less risky to hold off releasing as long as possible and take your time with testing to guarantee confidence in the package?
Let’s think about what we mean by risk.
What is risk?
Risk is a function of the likelihood of a failure happening and the worst-case impact of that failure. Roughly speaking, risk = likelihood of failure × worst-case impact.
Therefore an extremely low-risk activity is one where failure is incredibly unlikely and the impact of the failure is negligible. An activity is also low risk when either of these factors is so low that it largely cancels out the other.
Playing the lottery is low risk – the chance of failing (i.e. not winning) is very high, but the impact of failing (i.e. losing the cost of the ticket) is minimal, which is why many people play.
Flying is also low risk, with the factors balanced the opposite way. The chance of a failure is extremely low – flying has a very strong safety record – but the impact of a failure is extremely high. We fly often because we consider the overall risk to be very low.
High-risk activities are those where both factors are high – high likelihood of failure and high impact, for example extreme sports such as free solo climbing and cave diving.
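The comparison above can be sketched in a few lines of Python. This is only an illustration: the function name `risk_score` and the 1-to-5 scores are assumptions invented for this example, not measured probabilities or costs.

```python
# Illustrative only: likelihood and impact are made-up ordinal scores
# on a 1 (very low) to 5 (very high) scale, not real statistics.

def risk_score(likelihood: int, impact: int) -> int:
    """Combine likelihood of failure and worst-case impact into one score."""
    return likelihood * impact


# (likelihood, impact) pairs chosen to mirror the examples in the text.
activities = {
    "playing the lottery": (5, 1),  # failure almost certain, impact tiny
    "flying": (1, 5),               # failure very unlikely, impact severe
    "free solo climbing": (4, 5),   # both factors high
}

scores = {
    name: risk_score(likelihood, impact)
    for name, (likelihood, impact) in activities.items()
}

for name, score in scores.items():
    print(f"{name}: {score}")
```

With these assumed scores, the lottery and flying end up with the same low score for opposite reasons, while free solo climbing scores far higher because neither factor offsets the other.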
Large, infrequent releases are riskier
Small, frequent releases reduce the likelihood of a failure
Releasing often, with each release containing as small a change as possible, reduces the likelihood that any given release will contain a failure.
There’s no way to reduce the impact of a failure – the worst case is still that the release could bring the whole system down and incur severe data loss, but we lower the overall risk with the smaller releases.
Release small changes often to reduce the likelihood of a failure, and therefore the risk.
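One way to see why a smaller release is less likely to fail is a simple probability model. The sketch below assumes that every change independently has the same small chance of causing a failure; that assumption, the function name and the 1% figure are all hypothetical and chosen purely for illustration.

```python
# A minimal sketch, assuming each change independently causes a failure
# with the same probability. Real changes are rarely this well behaved.

def release_failure_probability(changes: int, p_change: float) -> float:
    """Probability that a release with `changes` changes contains at least one failure."""
    if changes < 0 or not 0.0 <= p_change <= 1.0:
        raise ValueError("changes must be >= 0 and p_change within [0, 1]")
    # The release is fine only if every single change is fine.
    return 1.0 - (1.0 - p_change) ** changes


P_CHANGE = 0.01  # hypothetical: 1% chance that any one change causes a failure

big_release = release_failure_probability(changes=50, p_change=P_CHANGE)
small_release = release_failure_probability(changes=1, p_change=P_CHANGE)

print(f"50-change release: {big_release:.1%} chance of a failure")
print(f"1-change release:  {small_release:.1%} chance of a failure")
```

Under these assumptions the 50-change release has roughly a 39% chance of containing a failure, against 1% for a single-change release. The model only describes the likelihood per release; it says nothing about impact.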
This is a great introduction to the counter-intuitive truth that smaller, more frequent changes reduce risk.
You are entirely right that smaller changes reduce the probability of a failure. Splitting a large experiment into smaller independent experiments reduces variation in outcomes: by dividing the risk associated with one release across smaller releases, we can more accurately estimate where the failure probability truly resides and act on that new information.
However, saying “there is no way to reduce the impact of a failure” is incorrect – smaller, more frequent changesets can reduce the cost of failure as well as its probability. I outline why in my Release Testing Is Risk Management Theatre talk (see 35:17 of https://skillsmatter.com/skillscasts/5394-release-testing-is-risk-management-theatre). The cost of a failure is also two-dimensional: the economic impact of the failure and the duration of the failure. The former is hard to influence, let alone control, because of external market forces, but duration is easy to control thanks to Little's Law – less WIP means a lower lead time, which means a shorter failure duration, which means a lower cost.
That is why production defect fixes always go out in small changesets – to minimise the opportunity cost of the current defect, *and* to reduce the probability of causing further defects with the fix.
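The Little's Law point can be made concrete with a little arithmetic: average lead time equals average work in progress divided by average throughput. The sketch below is illustrative only; the function name and the figures (WIP counts, a throughput of 5 changes per day) are assumptions, not measurements from any real team.

```python
# A minimal sketch of Little's Law: lead time = WIP / throughput.
# All numbers below are hypothetical.

def average_lead_time(wip: float, throughput: float) -> float:
    """Average time a change spends in the pipeline, in the throughput's time unit."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput


THROUGHPUT_PER_DAY = 5.0  # hypothetical: changes released per day

# A fix queued behind a large batch of in-progress changes...
large_batch_days = average_lead_time(wip=20, throughput=THROUGHPUT_PER_DAY)
# ...versus a fix queued behind a small batch.
small_batch_days = average_lead_time(wip=2, throughput=THROUGHPUT_PER_DAY)

print(f"WIP of 20 changes: {large_batch_days} days average lead time")
print(f"WIP of 2 changes:  {small_batch_days} days average lead time")
```

With these assumed figures, cutting WIP from 20 changes to 2 cuts the average lead time from 4 days to under half a day, so a defect in Production would be expected to stay live for a correspondingly shorter time.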