Meta – Belay the C++

Author: Chloé Lourseyre
Editor: Peter Fordham

If you work for a software company, you will inevitably end up in a situation where you have technical debt to repay but don’t have your management’s approval to do so now. “We’ll deal with it later”, they say. But, as a good developer, you know two things: technical debt gets harder to solve the longer you wait, and dormant technical debt adds a cost to everything that is written in the meantime.

When you argue “Technical debt is costly” to management, they answer “How much will it cost?”. But you don’t have an answer to that, as there is no way to predict the future.

This is what I call the Technical Debt Inflation problem or TDI for short.

Disclaimer: I could not solve the decades-old problem of the TDI in a 2k-word article. What is given here is a line of approach, a stepping stone to your own reflection, and an attempt to open the debate. Enjoy.

What is technical debt?

Technical debt is (for the sake of this article) when you have a part of your code that is badly designed and needs refactoring to be efficient. It has no impact on the user but makes the code harder to maintain and harder for new features to be developed.

It often appears when you choose the short-term approach rather than a long-term solution. Writing a long-term solution would cost more now, so you choose the short-term one, although it will cost you more to maintain in the future.

What is technical debt inflation?

Technical debt is costly in two ways.

First, it is costly to solve. Solving a technical debt takes time, and since it is invisible to the user and to management it is often considered useless by them.

Second, it is costly to work around. Technical debt almost always impacts how you develop new features (or maintain existing features) that depend on it. For instance, the technical debt can be a badly designed interface, which is counter-intuitive to learn and use. Another example: if a module is badly written, any update made within it will take more time than if it were well designed.

Technical debt inflation is the fact that the longer you wait to solve a technical debt, the more these costs increase.

Indeed, the longer you wait to solve a technical debt, the higher the impact of solving it will be (more code will depend on it, so the refactoring will take longer). Plus, the older a technical debt gets, the harder the affected code is to use and maintain.

Why quantify technical debt and technical debt inflation?

It is hard to evaluate the magnitude of the impact of technical debt. It is even harder to justify such an evaluation to those who will approve (or not) the work.

If we manage to design a model that puts numbers behind technical debt, justifying the necessity of refactoring will be easier.

You could say things like “Yes, it would take three days to solve this technical debt now, but if we don’t, in two years it will cost an average of two hours per week for each developer, for a total of fifty days at the end of the second year…”. That could help give your manager some perspective.

But I should say that the most important thing is not the numbers, but your arguments.

How to quantify technical debt?

Technical debt is costly to solve. The first step is to evaluate how much time it would cost to solve the technical debt today. Without that, we won’t be able to evaluate the inflation of this cost.

Fortunately, this is the easiest part. Based on your experience with your codebase, you should be able to evaluate how much time it would take you to perform the correction.

Usually, I would advise multiplying any reasonable evaluation by two or three to take unforeseen difficulties into account.

If you work within a team, it is customary to make this estimate as if the slowest dev on your team were doing the job (you might not be the one who performs the correction, and if you work faster than your coworkers, your estimate might be too optimistic).

With this quantification set, you can now evaluate the inflation of this technical debt.

How to quantify inflation?

The most important thing about inflation is that it is not linear.

In fact, since technical debt is costly in two ways (costly to solve and costly to work around), and both costs suffer inflation, technical debt inflation is (at least) quadratic¹. It is not proportional to time, but to time squared.

If there is a single thing you should remember from this article, it is this: TDI is quadratic.

Now, how can we evaluate this inflation?

A simple and usable indicator is the size of the code. The size of the code tends to increase with time, and if you manage to have a model extrapolated from how the size of the code increased in the past months/years, you will be able to predict how the size of the code will increase in the future.

I provide you with an example of how to evaluate the size of your code in the addenda.

Then, you take that evolution and apply a quadratic factor to it. This is what I call the Quadratic Expansion model.

Formalization of the Quadratic Expansion model

Let C₀ be the size of the whole code at t₀.
Let C₁ be the extrapolation of the size of the whole code at t₁.
Let D₀ be the estimated time needed to solve the technical debt at t₀.
Let D₁ be the evaluated time needed to solve the technical debt at t₁.
Let I₀₁ be the time wasted by the impact of the technical debt between t₀ and t₁.
Let Δ₀₁ be the cumulated time the technical debt costs between t₀ and t₁.

C₀, C₁ and D₀ are known values.
D₁ and I₀₁ are intermediary values.
Δ₀₁ is the goal of the model.

D₁ = D₀ × C₁ ÷ C₀

I₀₁ = Λ × C₁ ÷ C₀, where Λ is a constant called the “lambda factor”².

Δ₀₁ = (I₀₁ × D₁) – D₀

Δ₀₁ = Λ × D₀ × (C₁ ÷ C₀)² – D₀

For a simple calculation, you can assume Λ = 1 (we are looking for an order of magnitude, not a precise value), which gives

Δ₀₁ = D₀ × ( (C₁ ÷ C₀)² – 1 )
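
To make the formulas above concrete, here is a minimal Python sketch of the model. The function and parameter names (quadratic_expansion_delta, d0, c0, c1, lam) are illustrative and not part of the model itself; lam stands for Λ and defaults to 1, as suggested above.

```python
def quadratic_expansion_delta(d0: float, c0: float, c1: float, lam: float = 1.0) -> float:
    """Return the cumulated extra cost of waiting from t0 to t1 (delta_01).

    d0  -- estimated time to solve the debt today (D0), e.g. in days
    c0  -- size of the code today (C0)
    c1  -- extrapolated size of the code at t1 (C1), same unit as c0
    lam -- the lambda factor; 1.0 is enough for an order of magnitude
    """
    if c0 <= 0:
        raise ValueError("c0 must be strictly positive")
    growth = c1 / c0        # C1 / C0
    d1 = d0 * growth        # D1 = D0 * C1 / C0
    i01 = lam * growth      # I01 = lambda * C1 / C0
    return i01 * d1 - d0    # delta_01 = (I01 * D1) - D0


# Sanity check: if the code does not grow, waiting costs nothing extra.
print(quadratic_expansion_delta(d0=5, c0=100, c1=100))  # 0.0
```

Called with d0=5, c0=21.6 and c1=31.1 (the values of the example that follows), the function returns roughly 5.4 days.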

Example

You have a major technical debt to resolve that your manager is considering delaying for six months. You told them that it would cost more time to let it be, and they are asking you to evaluate how much it would cost.

Today, your feature is composed of 216 hundred lines of code. Three months ago, it was composed of 178 hundred lines, so it grew by 38 hundred lines in three months. However, your four-dev team has just welcomed a newcomer and now has five devs, so the expected inflation within the next six months can be evaluated as 95 hundred more lines (38 × 2 × 1.25).
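
The extrapolation in this paragraph can be sketched in a few lines of Python. It assumes, as the example does, that code growth scales linearly with both elapsed time and team size; the function name extrapolate_growth and its parameters are hypothetical, chosen for illustration only.

```python
def extrapolate_growth(past_growth, past_months, future_months,
                       old_team_size, new_team_size):
    """Scale an observed growth to a longer period and a different team size."""
    time_factor = future_months / past_months    # 6 / 3 = 2
    team_factor = new_team_size / old_team_size  # 5 / 4 = 1.25
    return past_growth * time_factor * team_factor


c0 = 216                                        # current size, in hundreds of lines
growth = extrapolate_growth(38, 3, 6, 4, 5)     # expected growth over six months
c1 = c0 + growth                                # extrapolated size in six months
print(growth, c1)  # 95.0 311.0
```

A linear rule like this one is only a starting point; if you have more history, fitting a trend over several past measurements would probably give a better C₁.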

You estimate that solving the technical debt would take, at most, a whole week (5 days).

C₀ = 21.6k

C₁ = 31.1k

D₀ = 5 days

Δ₀₁ = D₀ × ( (C₁ ÷ C₀)² – 1 ) ≈ 5.4 days

Conclusion: according to the quadratic expansion model, the wait would cost about 5 and a half more days.

So you tell your manager that considering the team’s productivity, waiting six more months will more than double the time lost on the technical debt (including solving and having to maintain a bad design).

Limits of this model

This model has huge limitations.

First, the calculation is not additive over successive periods, i.e. Δ₀₂ ≠ Δ₀₁ + Δ₁₂. This reflects the fact that the further we try to look (0 → 2) the more uncertain the technical debt cost will be. Mathematically, we should try to reflect that uncertainty in the model with a confidence interval.
Then, evaluating and extrapolating the size of the code is often feasible, but not trivial.
And, of course, this model has yet to be proven in real life³.
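
The first limitation can be checked numerically. The sketch below assumes that Δ₁₂ is obtained by applying the same formula from t₁, with D₁ as the starting cost; that reading is an assumption, as is the value of c2, which is a hypothetical code size picked only for illustration.

```python
def delta(d_start, c_start, c_end, lam=1.0):
    """Cumulated cost of waiting: lambda * D * (C_end / C_start)^2 - D."""
    return lam * d_start * (c_end / c_start) ** 2 - d_start


# c0, c1 and d0 come from the example above; c2 is hypothetical.
c0, c1, c2 = 21.6, 31.1, 40.6
d0 = 5.0

d1 = d0 * c1 / c0                                    # D1, cost of solving the debt at t1
direct = delta(d0, c0, c2)                           # delta_02 computed in one go
step_by_step = delta(d0, c0, c1) + delta(d1, c1, c2) # delta_01 + delta_12

print(f"direct: {direct:.1f} days, step by step: {step_by_step:.1f} days")
```

With Λ = 1 and a growing codebase, the direct figure comes out larger than the step-by-step sum, so the two ways of computing the cost over the same period do not agree.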

The million-dollar question

There would be one way to evaluate the TDI problem mathematically: by aggregating data from hundreds of projects over the years. But this is not a simple task, and may even be impossible. Here are the reasons why:

It would mean intruding into the code owned by private companies.
Even in retrospect, it’s hard to evaluate the impact of technical debt.
The study would take years, if not decades, to complete because the impact of technical debt plays out over years.

With that in mind, aggregating real data for a serious study seems impossible. But could we devise a smaller model that would help us solve the TDI problem? That needs more thought.

Wrapping up

I’ll say it once again so there is no possible ambiguity: the Quadratic Expansion model is a limited, inaccurate, and unscientific way to evaluate technical debt inflation, but it gives a coherent order of magnitude and an argument in favor of early refactoring.

I hope that this will be the start of more serious studies about the TDI problem.

Remember that evaluating “lost” time of a living technical debt is not a trivial operation, and a live-testing evaluation model is impossible at a small scale.

But I hope this will help you get an order of magnitude of the cost of technical debt inflation.

Thanks for reading and see you next time!

Addenda

How to evaluate the size of your code? An example

With Git and a Linux shell, you can easily evaluate the current size of your codebase.

git ls-files allows you to list all files.

grep -E '\.(c|cpp|h|hpp)$' is a filter on source and header files.

wc -l counts the number of lines.

Here is the whole command to run:

git ls-files | grep -E '\.(c|cpp|h|hpp)$' | xargs -d '\n' wc -l

(NB: the xargs command passes the file names it reads as arguments to wc. The option -d '\n' is there to handle spaces in file paths.)

Alternatively, you can use wc -m instead of wc -l to count the characters instead of the lines. It is a bit slower and a bit less intuitive, but I think it is a better metric than the line count.

To get a cleaner output, you can add:

grep -E '^ *[0-9]+ total$' to only get the line with the total result.

sed -r 's/^ *([0-9]+) total$/\1/' removes the surrounding text to only keep the number.

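The full command is now:

git ls-files | grep -E '\.(c|cpp|h|hpp)$' | xargs -d '\n' wc -l | grep -E '^ *[0-9]+ total$' | sed -r 's/^ *([0-9]+) total$/\1/'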

If you have several submodules, you can:

Add --recurse-submodules to recursively evaluate every submodule.

awk '{s+=$1} END {print s}' sums the values (which are individually reported for each submodule).

Final command line:

git ls-files --recurse-submodules | grep -E '\.(c|cpp|h|hpp)$' | xargs -d '\n' wc -l | grep -E '^ *[0-9]+ total$' | sed -r 's/^ *([0-9]+) total$/\1/' | awk '{s+=$1} END {print s}'
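
If you would rather post-process the output of wc in a script, the grep, sed and awk stages can be sketched in Python as follows. The function name sum_wc_totals and the sample output are hypothetical; the sample only mimics the shape of what wc -l prints when it reports more than one total line.

```python
import re

# Same pattern as the grep/sed stages: optional spaces, a number, then "total".
TOTAL_LINE = re.compile(r"^ *([0-9]+) total$")


def sum_wc_totals(wc_output: str) -> int:
    """Sum every 'N total' line found in the output of wc."""
    total = 0
    for line in wc_output.splitlines():
        match = TOTAL_LINE.match(line)
        if match:
            total += int(match.group(1))
    return total


# Hypothetical sample output, for illustration only.
sample = (
    "  120 src/main.cpp\n"
    "   30 src/main.h\n"
    "  150 total\n"
    "  200 total\n"
)
print(sum_wc_totals(sample))  # 350
```

Note that wc prints no total line when it is given a single file, in which case this function (like the shell pipeline) returns 0.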

Notes

¹ This is based on the following idea: since there are two costs that increase with time, and these two costs interact closely (their consequences are intertwined), their combined cost is multiplicative (and not additive). That is what makes, in my view, the inflation quadratic.
² Λ represents how much the rest of the code depends on the technical debt. The higher the impact the technical debt has on the rest of the code, the higher Λ will be, and the higher I₀₁ will be. However, as a matter of simplicity (and lack of better modeling), Λ is here considered constant.
³ Is it even possible to design a protocol that would allow us to evaluate the correctness of any TDI model? Since we can only do one of two things (either solving it now or letting the debt inflate), there will always be uncertainty about the evaluation of the alternative. Plus, the time needed to solve a debt is dependent on one’s skill and, often, luck. In addition to that, the model claims to include risk, meaning the estimated inflation will be large because you cannot know how badly a technical debt can grow. There is (to my knowledge) no way to verify this kind of abstract representation.

Author: Chloé Lourseyre

This article is based on a Lightning Talk I made during the CppCon 2020: How many languages a (C++) expert should speak ? – Thomas Lourseyre – CppCon 2020 – YouTube.

The speaker is a bit stressed and fumbles with words, but I’ll try and elaborate on what he wants to say.

Why learn other languages?

Assuming you are a C++ developer, I’ll explain the reasons why you would learn other languages using two quotes.

If all you have is a hammer, everything looks like a nail.
Maslow’s hammer

Say you are given the task to solve a programming problem/issue/dilemma. One of the first questions you will have to ask yourself is “What language should I use to solve this?”. If you only know C++, you can only answer “C++”. Obviously, C++ cannot always be the perfect answer.

Even if you are not an expert in other languages, if you know the basics and can say “This language is better suited than C++” then all you’ll have to do is to find the appropriate expert. This is an insight you can’t have if you only know one language.

Those who cannot remember the past are condemned to repeat it.
George Santayana

The programming industry is not a pink paradise where all projects start from scratch and you have all the tools you want in their latest versions. Sometimes (or often, depending on your position in the industry) you will have to work with legacy code. Sometimes it will be written in C++ and you’ll have to port it to another language, and sometimes it will be written in another language and you will need to translate it to C++.

In both cases, this is work you can only do if you know languages other than C++.

Asking the real question

With that said, I can safely state that the real question is not “Should I learn other languages?” but rather the following:

How many other languages should I learn?

The language knowledge table

I designed a table that ranks the languages one knows depending on their knowledge of them:

Level of knowledge	Details	Personal examples
Expert level	It’s OK if you only know one.	C++
Practical level	Languages you tried and loved, and you practice regularly.	Python, C, Rust, Bash
Documentation level	Languages you practiced but didn’t like that much. You know how to fetch documentation though.	Java, C#, JS, BASIC, Perl
Hello world! level	You don’t know much about it…	Ruby, AS, Go, etc.

You may have noticed that my personal examples have evolved since last year – it’s a good thing.

The levels of knowledge

There are four levels of knowledge depicted in this table:

Expert level: These are the languages in which you are reliably an expert. A lot of developers have none, and most of the rest have one. Some top-tier experts have several, but that requires being very knowledgeable in each of them, so it’s quite rare. It is not a mandatory field and it’s OK if you have only one.
Practical level: These are the languages you practice regularly but cannot claim to be an expert in. Usually, these are the languages you love and use for your personal projects.
Documentation level: There are probably languages you tried to learn and practiced a bit, but did not like. Maybe there are some languages you had to learn and practice for a specific project but immediately forgot afterward. These are languages you are not very knowledgeable in, but you know the basics and are able to find the documentation you need.
Hello world! level: These are the languages you basically know nothing about. Languages that you know by name, but wouldn’t be able to give the core specification. Maybe you wrote a “Hello world!” someday, but not much more.

The only useful levels of knowledge are expert, practical, and documentation. As soon as you can at least know where to fetch documentation, you’ll be able to make something out of the language. The Hello world! level is useless in that regard.

How to promote a language?

So your goal will be to promote (in the table) as many languages as possible. But the effort it takes to promote a language depends on the level of knowledge the language currently stands at:

From practical to expert: Becoming an expert in any field is time-consuming. You will not become an expert unless you want to spend a lot of time on the language. You may want to, but it is a big investment of time that you cannot make for the sole sake of being more versatile.
From documentation to practical: If you force yourself to practice a language you don’t love, there is a good chance you will dislike it even more. You can’t force love, and even though you may get a pleasant surprise (and end up loving a language you didn’t like at first sight), over-practicing a language you dislike just for the sake of being more versatile is pointless and self-harming.
From Hello world! to documentation: What does it take to try a new language to the point where you can form an opinion about it? In a matter of a few hours, you will be able to read the letter of intent of the language, learn the basics, and grasp how its documentation is organized. You may like it or not, but at least you will know a bit about it.

The most feasible promotion thus is from Hello world! to documentation level. If you are curious and try to learn the whys and hows of as many languages as you can, you’ll have a broader view of the world of programming.

So, how many languages should you speak?

The answer I give you today, given everything we have said, is the following: as many as your curiosity can take you to.

One of the main qualities of an expert (be it in C++ or not) is their curiosity. This is what pushes you to learn more, every day, about every subject. As long as you stay curious, you will learn and grow.

Don’t waste your curiosity on a single subject: curiosity is best employed in breadth rather than in depth.

On what basis should you learn and/or keep your knowledge of other languages?

The answer to that question is always closely dependent on the free time you can allocate to the activity of coding.

If you have an hour of free time you want to invest in development, pick a language you know by name but have never practiced. Read the note of intent and write a simple program. This should be enough for you to know if you want to invest time into this language. If not, keep in mind where you can find the documentation in case you need it later (or bookmark it in your favorite browser, if you don’t want to remember it).

If a language appeals to you, then try to evaluate how much time you can spend training. You can follow tutorials, you can do exercises, you can develop new projects.

There is a good way to keep up your knowledge of your favorite languages: platforms like CodinGame offer training and challenges and include tons of different languages. Try it, it’s very addictive: Coding Games and Programming Challenges to Code Better (codingame.com).

Is it only restricted to C++ experts?

I can hear some of you sighing in front of your screen, “I’m not even a C++ expert, should I even bother learning another language?”.

All the arguments I detailed above are valid for both experts and non-experts. In addition, as a non-expert, you are (in my experience) more likely to switch to another language sooner or later. You could wait until you join a project to learn the language it uses, but then you risk ending up coding in a language you dislike or don’t understand.

Knowing other languages not only makes you more appealing to those who seek new collaborators (most recruiters will prefer someone who has trained a bit in the language over someone who has never touched it), but also gives you some insight into what each language is all about.

Curiosity is a good quality in software development.

Thanks for your attention and see you next week!