How to quantify technical debt inflation

Author: Chloé Lourseyre
Editor: Peter Fordham

If you work for a software company, you inevitably end up in a situation where you have a technical debt to repay, but don’t have the approval of your management to do so now. “We’ll deal with it later”, they say. But, as a good developer, you know two things: technical debt gets harder to solve the longer we wait to solve it, and dormant technical debt has a cost that is added to everything that is written in the meantime.

When you try to argue “Technical debt is costly” to said management, they answer “How much will it cost?”. But you don’t have an answer for that, as there is no way to predict the future.

This is what I call the Technical Debt Inflation problem or TDI for short.

Disclaimer: I could not solve the decades-old problem of the TDI in a 2,000-word article. What is given here is a line of approach, a stepping stone to your own reflection, and an attempt to open the debate. Enjoy.

What is technical debt?

Technical debt is (for the sake of this article) when you have a part of your code that is badly designed and needs refactoring to be efficient. It has no impact on the user but makes the code harder to maintain and harder for new features to be developed.

It often appears when you choose a short-term approach rather than a long-term solution. It would cost you more to write the long-term solution now, so you choose the short-term one, although it will cost you more to maintain in the future.

What is technical debt inflation?

Technical debt is costly in two ways.

First, it is costly to solve. Solving a technical debt takes time, and since it is invisible to the user and to management it is often considered useless by them.

Second, it is costly to work around. Technical debt almost always impacts how you develop new features (or maintain existing features) that depend on it. For instance, the technical debt can be a badly designed interface, which is counter-intuitive to learn and to use. Another example: if a module is badly written, any change made within it will take more time than if it were well designed.

Technical debt inflation is the fact that the longer you wait to solve a technical debt, the more these costs increase.

Indeed, the longer you wait to solve a technical debt, the higher the impact of solving it will be (because more pieces of code will depend on it, so the refactoring will take longer). Plus, the more outdated a technical debt becomes, the harder the affected code is to use and maintain.

Why quantify technical debt and technical debt inflation?

It is hard to evaluate the magnitude of the impact of technical debt. It is even harder to justify such an evaluation to the people who will approve (or not) the work.

If we manage to design a model that puts numbers behind technical debt, justifying the necessity of refactoring will be easier.

You could say things like “Yes, it would take three days to solve this technical debt now, but if we don’t, in two years it will cost an average of two hours per week for each developer, for a total of fifty days at the end of the second year…”. Maybe that could help put things in perspective for your manager.

But I should say that the most important thing is not the numbers, but your arguments.

How to quantify technical debt?

Technical debt is costly to solve. The first step is to evaluate how much time it would cost to solve the technical debt today. Without that, we won’t be able to evaluate the inflation of this cost.

Fortunately, this is the easiest part. Based on your experience with your codebase, you should be able to evaluate how much time it would take you to perform the correction.

Usually, I would advise multiplying any reasonable evaluation by two or three to take unforeseen difficulties into account.

If you work within a team, it is customary to make this prediction as if the slowest dev of your team were doing the job (you might not be the one who performs the correction, and if you work faster than your coworkers, your evaluation might be flawed).

With this quantification set, you can now evaluate the inflation of this technical debt.

How to quantify inflation?

The most important thing about inflation is that it is not linear.

In fact, since there are two ways that technical debt is costly (costly to solve and costly to use), and they both suffer inflation, then the technical debt inflation is (at least) quadratic¹. It is not proportional to the time, but to the time squared.

If there is a single thing you should remember from this article, it is this: TDI is quadratic.

Now, how can we evaluate this inflation?

A simple and usable indicator is the size of the code. The size of the code tends to increase with time, and if you manage to have a model extrapolated from how the size of the code increased in the past months/years, you will be able to predict how the size of the code will increase in the future.

I provide you with an example of how to evaluate the size of your code in the addenda.

Then, you take that evolution and apply a quadratic factor to it. This is what I call the Quadratic Expansion model.
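The extrapolation step can be sketched as a small shell helper. This is only a minimal sketch under a loud assumption: growth is taken to be linear in time and proportional to team size, which is the same simplification used in the example later in this article. The function name extrapolate_size and its parameters are hypothetical, introduced here for illustration.

```shell
#!/usr/bin/env bash
# Linear extrapolation of code size.
# ASSUMPTION: growth is proportional to elapsed time and to team size.
# Usage: extrapolate_size PAST_SIZE CURRENT_SIZE PAST_MONTHS FUTURE_MONTHS [TEAM_FACTOR]
#   PAST_SIZE      size measured PAST_MONTHS ago
#   CURRENT_SIZE   size measured today (same unit as PAST_SIZE)
#   FUTURE_MONTHS  how far ahead to extrapolate
#   TEAM_FACTOR    future team size divided by past team size (default 1)
extrapolate_size() {
  awk -v past="$1" -v cur="$2" -v pm="$3" -v fm="$4" -v team="${5:-1}" \
    'BEGIN { printf "%.0f\n", cur + (cur - past) * (fm / pm) * team }'
}

# Figures from the example in this article, in hundreds of lines:
# 178 three months ago, 216 today, six months ahead, team growing from 4 to 5.
extrapolate_size 178 216 3 6 1.25   # prints 311
```

A real codebase rarely grows this regularly, so it may be worth feeding the helper several past measurements and comparing the results before trusting a single one.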

Formalization of the Quadratic Expansion model

Let C₀ be the size of the whole code at t₀.
Let C₁ be the extrapolation of the size of the whole code at t₁.
Let D₀ be the estimated time needed to solve the technical debt at t₀.
Let D₁ be the evaluated time needed to solve the technical debt at t₁.
Let I₀₁ be the time wasted by the impact of the technical debt between t₀ and t₁.
Let Δ₀₁ be the cumulated time the technical debt costs between t₀ and t₁.

C₀, C₁ and D₀ are known values.
D₁ and I₀₁ are intermediary values.
Δ₀₁ is the goal of the model.

D₁ = D₀ × C₁ ÷ C₀

I₀₁ = Λ × C₁ ÷ C₀, where Λ is a constant called the “lambda factor”².

Δ₀₁ = (I₀₁ × D₁) – D₀

Δ₀₁ = Λ × D₀ × (C₁ ÷ C₀)² – D₀

For a simple calculation, you can assume Λ = 1 (we are looking for an order of magnitude, not a precise value), which gives:

Δ₀₁ = D₀ × ( (C₁ ÷ C₀)² – 1 )
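The formula above can be wrapped in a one-line shell helper so that it is easy to try different figures. This is a sketch, not a reference implementation: the function name tdi_delta is hypothetical, and Λ defaults to 1 as suggested above.

```shell
#!/usr/bin/env bash
# Quadratic Expansion model: delta = LAMBDA * D0 * (C1 / C0)^2 - D0
# Usage: tdi_delta C0 C1 D0 [LAMBDA]
#   C0, C1  code size today and extrapolated size (same unit)
#   D0      estimated time to solve the debt today
#   LAMBDA  the lambda factor (defaults to 1)
# The result is printed in the same unit as D0, rounded to one decimal.
tdi_delta() {
  awk -v c0="$1" -v c1="$2" -v d0="$3" -v lambda="${4:-1}" \
    'BEGIN { r = c1 / c0; printf "%.1f\n", lambda * d0 * r * r - d0 }'
}

# Example: the code base doubles in size and the debt costs 5 days today.
tdi_delta 100 200 5   # prints 15.0
```

Only the ratio C₁ ÷ C₀ matters, so the sizes can be given in lines, hundreds of lines or characters, as long as both use the same unit.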

Example

You have a major technical debt to resolve that your manager is considering delaying for six months. You told them that it would cost more time to let it be, and they are asking you to evaluate how much it would cost.

Today, your feature is composed of 216 hundred lines of code. Three months ago, it was composed of 178 hundred lines, so it grew by 38 hundred lines in three months. However, your four-dev team has just welcomed a newcomer and now has five devs, so the expected growth within the next six months can be evaluated as 95 hundred more lines (38 × 2 × 1.25).

You estimate that solving the technical debt would take, at most, a whole week (5 days).

C₀ = 21.6k

C₁ = 31.1k

D₀ = 5 days

Δ₀₁ = D₀ × ( (C₁ ÷ C₀)² – 1 ) ≈ 5.4 days
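For readers who want to check the arithmetic, the intermediate steps can be written out as follows, with Λ = 1 and sizes in thousands of lines. The last line (the total) is simply D₀ + Δ₀₁, added here for convenience.

```latex
\begin{align*}
C_1 &= 21.6 + 3.8 \times 2 \times 1.25 = 21.6 + 9.5 = 31.1 \\
C_1 / C_0 &= 31.1 / 21.6 \approx 1.44 \\
(C_1 / C_0)^2 &\approx 2.073 \\
\Delta_{01} &= D_0 \left( (C_1 / C_0)^2 - 1 \right) \approx 5 \times 1.073 \approx 5.4 \text{ days} \\
D_0 + \Delta_{01} &\approx 10.4 \text{ days}
\end{align*}
```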

Conclusion: according to the quadratic expansion model, the wait would cost about 5 and a half more days.

So you tell your manager that, considering the team’s productivity, waiting six more months will more than double the time lost on the technical debt (including solving it and having to maintain a bad design).

Limits of this model

This model has huge limitations.

First, the calculation is not additive over time, e.g. Δ₀₂ ≠ Δ₀₁ + Δ₁₂. This reflects the fact that the further we try to look (0 → 2), the more uncertain the technical debt cost will be. Mathematically, we should try to reflect that uncertainty in the model with a confidence interval.
Then, evaluating and extrapolating the size of the code is often feasible, but not trivial.
And, of course, this model has yet to be proven in real life³.
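The first limitation can be seen on a small numeric case. The figures below are hypothetical and chosen only for illustration; the calculation also assumes that restarting the model at t₁ means using D₁ as the new base cost.

```latex
% Hypothetical figures, with \Lambda = 1:
% C_0 = 100, C_1 = 150, C_2 = 200 (same unit), D_0 = 5 days.
\begin{align*}
\Delta_{02} &= D_0 \left( (C_2 / C_0)^2 - 1 \right) = 5 \times (4 - 1) = 15 \\
\Delta_{01} &= D_0 \left( (C_1 / C_0)^2 - 1 \right) = 5 \times (2.25 - 1) = 6.25 \\
D_1 &= D_0 \times C_1 / C_0 = 7.5 \\
\Delta_{12} &= D_1 \left( (C_2 / C_1)^2 - 1 \right) \approx 7.5 \times 0.778 \approx 5.83 \\
\Delta_{01} + \Delta_{12} &\approx 12.08 \neq 15 = \Delta_{02}
\end{align*}
```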

The million-dollar question

There would be one way to evaluate the TDI problem mathematically: by aggregating data over hundreds of projects over the years. But this is a difficult task, if not an impossible one. Here are the reasons why:

It would mean intruding into the code owned by private companies.
Even in retrospect, it’s hard to evaluate the impact of technical debt.
The study would take years, if not decades, to complete, because the impact of technical debt plays out over years.

With that in mind, aggregating real data for a serious study seems impossible. But could we devise a smaller model that would help us to solve the TDI problem? That needs more thought.

Wrapping up

I’ll say it once again so there is no possible ambiguity: the Quadratic Expansion model is a limited, inaccurate, and unscientific way to evaluate technical debt inflation, but it gives a coherent order of magnitude and an argument in favor of early refactoring.

I hope that this will be the start of more serious studies about the TDI problem.

Remember that evaluating “lost” time of a living technical debt is not a trivial operation, and a live-testing evaluation model is impossible at a small scale.

But I hope this will help you get an order of magnitude of the cost of technical debt inflation.

Thanks for reading and see you next time!

Addenda

How to evaluate the size of your code? An example

With Git and a Linux shell, you can easily evaluate the current size of your codebase.

git ls-files allows you to list all files.

grep -E '\.(c|cpp|h|hpp)$' is a filter on source and header files.

wc -l counts the number of lines.

Here is the whole command to run:

git ls-files | grep -E '\.(c|cpp|h|hpp)$' | xargs -d '\n' wc -l

(NB: xargs passes the listed file names to wc as arguments. The option -d '\n' makes xargs split its input on newlines only, so file paths containing spaces are handled correctly.)

Alternatively, you can use wc -m instead of wc -l to count the characters instead of the lines. It is a bit slower and a bit less intuitive, but I think it is a better metric than the line count.

To get a cleaner output, you can add:

grep -E '^ *[0-9]+ total$' to only get the line with the total result.

sed -r 's/^ *([0-9]+) total$/\1/' removes the surrounding text to only keep the number.

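The full command is now:

git ls-files | grep -E '\.(c|cpp|h|hpp)$' | xargs -d '\n' wc -l | grep -E '^ *[0-9]+ total$' | sed -r 's/^ *([0-9]+) total$/\1/'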

If you have several submodules, you can:

Add --recurse-submodules to recursively evaluate every submodule.

awk '{s+=$1} END {print s}' sums the values, since several totals may be reported (for instance, xargs runs wc once per batch of files when the list is long).

Final command line:

git ls-files --recurse-submodules | grep -E '\.(c|cpp|h|hpp)$' | xargs -d '\n' wc -l | grep -E '^ *[0-9]+ total$' | sed -r 's/^ *([0-9]+) total$/\1/' | awk '{s+=$1} END {print s}'
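To build the growth model, you also need the size of the code at some point in the past. One possible approach, given here as a sketch, is to read the files directly from an older revision instead of checking it out. The function name size_at_rev is hypothetical; it ignores submodules and assumes file paths without newlines or unusual characters.

```shell
#!/usr/bin/env bash
# Count the lines of C/C++ source and header files as they were at a
# given revision, without touching the working tree.
# Usage: size_at_rev REVISION
size_at_rev() {
  local rev="$1"
  # List every file tracked at that revision, keep sources and headers,
  # print the content of each one, and count the lines of the whole stream.
  git ls-tree -r --name-only "$rev" \
    | grep -E '\.(c|cpp|h|hpp)$' \
    | while IFS= read -r path; do git show "$rev:$path"; done \
    | wc -l
}

# Usage, from inside a repository:
#   size_at_rev HEAD
#   size_at_rev "$(git rev-list -1 --before='3 months ago' HEAD)"
```

The second usage line relies on git rev-list to find the last commit made before a given date, which gives you a second measurement to compare with the current one.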

Notes

¹ This is based on the following idea: since there are two costs that increase with time, and these two costs interact closely (their consequences are intertwined), their combined cost is multiplicative rather than additive. That is what, in my view, makes the inflation quadratic.
² Λ represents how much the rest of the code depends on the technical debt. The higher the impact the technical debt has on the rest of the code, the higher Λ will be, and the higher I₀₁ will be. However, as a matter of simplicity (and for lack of better modeling), Λ is here considered constant.
³ Is it even possible to design a protocol that would allow us to evaluate the correctness of any TDI model? Since we can only do one of two things (either solve the debt now or let it inflate), there will always be uncertainty about the evaluation of the alternative. Plus, the time needed to solve a debt depends on one’s skill and, often, luck. In addition, the model claims to include risk, meaning the estimated inflation will be large because you cannot know how badly a technical debt can grow. There is (to my knowledge) no way to verify this kind of abstract representation.

3 thoughts on “How to quantify technical debt inflation”

kobica says:

April 6, 2022 at 10:40 pm

thanks for the post. I also like this: https://github.com/AlDanial/cloc to measure files LOC.

Chloé `Senua` Lourseyre says:

April 7, 2022 at 8:11 am

Yes, using basic shell commands to evaluate the size of the code is far from being the best. It’s better IMHO to use other tools like the one you mention.
  