a = b = c, a strange consequence of operator associativity

Author: Chloé Lourseyre
Editor: Peter Fordham

Case study

If you code in C++ regularly, you probably have encountered the following syntax:

class Foo;
Foo * make_Foo();

int main()
{

    Foo * my_foo;
    if (my_foo = make_Foo())
    {
        // ... Do things with the my_foo pointer
    }

    return 0;
}

In terms of semantics, this code is equivalent to the following:

class Foo;
Foo * make_Foo();

int main()
{

    Foo * my_foo = make_Foo();
    if (my_foo)
    {
        // ... Do things with the my_foo pointer
    }

    return 0;
}

This is today’s subject: assignment is an expression.

How does it work?

What is the value of such an expression?

If we run the following code, we’ll have the answer:

int main()
{
    int foo = 2;
    std::cout << (foo = 3) << std::endl;
    return 0;
}

The standard output prints 3.

So we can say that the assignment expression is evaluated as the assigned variable, after it is assigned1.

One typo away from catastrophe

Let’s say we have three variables: a, b, and c. We want the value of a to be true if (and only if) b and c are equal.

So we will write this:

bool a, b, c;
// ...
a = b == c;

But, we are not very far away from a serious typo. This one:

bool a, b, c;
// ...
a = b = c;

This code will compile, but won’t give you the intended result. How come?

The expression a = b = c performs two assignment operations within one expression. According to the C++ Operator Precedence Table, the associativity of = is right-to-left, so the expression a = b = c is equivalent to a = (b = c).

Since (b = c) is evaluated (as seen earlier) as the value of b after assignment, a = b = c; is equivalent to b = c; a = b;

If you then use a as a boolean, it will be evaluated as true if (and only if) c is also true.

Conclusion about a = b = c

There may be cases where this syntax (with two = within a single expression) is useful, but most of the time I find it obtuse and confusing.

As of today, there is no reliable way to prevent a typo like this (adding parentheses will not stop the code from compiling if a typo is made). All you can do is keep your eyes open and use constant variables as much as possible (if b is const, the code fails to compile when the typo is present)2.

The assignment operation returns an lvalue

I’ll finish this article by showing that the assignment expression is an lvalue.

Let’s take a = b = c again and add parentheses around a = b:

int main()
{
    int a = 1, b = 2, c = 3;

    (a = b) = c;

    std::cout << a << b << c << std::endl;
    return 0;
}

This compiles and prints the following result: 323.

That means that a has been assigned the value of b, then the value of c. The expression a = b is indeed an lvalue.

void foo(int&);

int main()
{
    int a = 1, b = 2;

    foo(a = b); // Compiles because `a = b` is an lvalue
    foo(3); // Does not compile because `3` is an rvalue

    return 0;
}

More specifically, the assignment operation (of fundamental types) returns a reference to the resulting variable.

Assignment operation for user-defined types

However, when you define an operator=, the standard allows you to return any type you want (refer to the Canonical implementations section of operator overloading – cppreference.com for more details3).

You can of course return a reference to the assigned object, like so:

struct Foo
{
    Foo& operator=(const Foo&) { return *this; }
};

int main()
{
    Foo a, b, c;
    a = b = c;
    return 0;
}

You can also return a value instead of a reference:

struct Foo
{
    Foo operator=(const Foo&) { return *this; }
};

int main()
{
    Foo a, b, c;
    a = b = c; // Also works, but a copy is made
    return 0;
}

Since the result is copied, the assignment b = c yields an rvalue. Indeed, if you now try to bind a reference to it, you get a compilation error:

struct Foo
{
    Foo operator=(const Foo& other) 
    { 
        val = other.val; 
        return *this; 
    }
    int val;
};

int main()
{
    Foo b = {1}, c = {2};
    Foo & a = b = c; // Does not compile because here, (b = c) is an rvalue
    return 0;
}

This code would compile if operator= returned a Foo& instead of a Foo.

You can also return nothing (using void as the return type). In that case, you cannot write a = b = c at all.

struct Foo
{
    void operator=(const Foo&) {  }
};

int main()
{
    Foo a, b, c;
    a = b = c; // Does not compile because (b = c) returns nothing
    return 0;
}

This can be used as a good safeguard against the a = b = c syntax4.

About declarations

There are specific cases where you can write a declaration within another statement (much like the assignments we have seen earlier).

You can use this specific syntax in most flow control statements (like if, while, switch and, of course, for) and within function calls.

For instance, the very first example of this post can also be written like this5:

class Foo;
Foo * make_Foo();

int main()
{

    if (Foo * my_foo = make_Foo())
    {
        // ... Do things with the my_foo pointer
    }

    return 0;
}

However, the declaration itself is neither an lvalue nor an rvalue.

You can’t write this:

int main()
{
    int a = 1, c = 3;
    a = (int b = c); // Does not compile

    return 0;
}

nor this:

int main()
{
    int b = 2, c = 3;
    (int a = b) = c; // Does not compile

    return 0;
}

This is specified as “init-statement” in the standard, so when you see init-statement written in a prototype, you know you can put a declaration there.

Wrapping up

Syntaxes like a = b = c and if (a = b) are intentional and clearly defined in the standard. However, they are alien to most developers and are so rarely used that they can easily be misleading.

Bugs can occur because the symbol = really looks like ==, so be wary of that. If you want to avoid it with your user-defined types, you can declare the operator= function returning void so the a = b = c syntax becomes invalid, but this is not possible with fundamental types, and is a constraint on its own.

Thanks for reading and see you next time!

Author: Chloé Lourseyre
Editor: Peter Fordham

Addendum

Notes

  1. It is in fact evaluated as a reference to the variable, and not the value of the variable. This will be demonstrated further in the article.
  2. You can actually activate specific warnings to prevent specific cases (for instance, -Wparentheses can be used with GCC to catch assignment inside a flow control statement), but that doesn’t cover every case (typically a = b = c cannot be warned about) and sometimes you may not want to activate them, depending on your appreciation of this syntax.
  3. The site cppreference.com says that “for example, assignment operators return by reference to make it possible to write a = b = c = d, because the built-in operators allow that.”. However, there is no mention of this specific intention in the 4th edition of The C++ Programming Language by Bjarne Stroustrup. I suspect this is a free interpretation.
  4. You can, as you may have guessed, also return any type you want, if you have very specific needs. The prototype int operator=(const Foo&); (member function of class Foo) is valid. This can be useful, for instance, if you want to return an error code.
  5. There is a pragmatic difference in terms of scope (which is not the subject of today’s article): in this last example, the my_foo variable only lives within the if block, whereas in the very first example it lives through the whole scope of main. But since the two are technically equivalent in this specific case (there is nothing after the if block), I deem it unnecessary to elaborate.

How to quantify technical debt inflation

Author: Chloé Lourseyre
Editor: Peter Fordham

If you work for a software company, you will inevitably end up in a situation where you have technical debt to repay but don’t have the approval of your management to do so now. “We’ll deal with it later”, they say. But, as a good developer, you know two things: technical debt is harder to resolve the longer you wait, and dormant technical debt adds a cost to everything written in the meantime.

When you try to argue “Technical debt is costly” to the said management, they answer “How much will it cost?”. But you don’t have an answer for that, as there is no way to predict the future.

This is what I call the Technical Debt Inflation problem or TDI for short.

Disclaimer: I could not solve the decades-old problem of the TDI in a 2k-words article. What is given here is a line of approach, a stepping stone to your own reflection, and is trying to open the debate. Enjoy.

What is technical debt?

Technical debt is (for the sake of this article) when you have a part of your code that is badly designed and needs refactoring to be efficient. It has no impact on the user but makes the code harder to maintain and harder for new features to be developed.

It often appears when you choose the short-term approach rather than the long-term solution: the long-term solution would cost more to write now, so you pick the short-term one, even though it will cost you more to maintain in the future.

What is technical debt inflation?

Technical debt is costly in two ways.

First, it is costly to solve. Solving a technical debt takes time, and since it is invisible to the user and to management it is often considered useless by them.

Second, it is costly to work around. Technical debt almost always impacts how you develop new features (or maintain existing ones) that depend on it. For instance, the technical debt can be a badly designed interface that is counter-intuitive to learn and use. Another example: if a module is badly written, any update made within it will take more time than if it were well designed.

Technical debt inflation is the fact that the more you wait to solve a technical debt, the more these costs increase.

Indeed, the longer you wait to resolve a technical debt, the higher the cost of resolving it will be (more pieces of code will depend on it, so the refactoring will take longer). Plus, the older a technical debt is, the harder the affected code is to use and maintain.

Why quantify technical debt and technical debt inflation?

It is hard to evaluate the magnitude of the impact of technical debt. It is even harder to justify such an evaluation to those who will (or will not) approve the work.

If we manage to design a model that puts numbers behind technical debt, justifying the necessity of refactoring becomes easier.

You could say things like “Yes, it would take three days to solve this technical debt now, but if we don’t, in two years it will cost an average of two hours per week for each developer, for a total of fifty days at the end of the second year…”. Maybe that could help put things in perspective for your manager.

But I should say that the most important thing is not the numbers, but your arguments.

How to quantify technical debt?

Technical debt is costly to solve. The first step is to evaluate how much time it would cost to solve the technical debt today. Without that, we won’t be able to evaluate the inflation of this cost.

Fortunately, this is the easiest part. Based on your experience with your codebase, you should be able to evaluate how much time it would take you to perform the correction.

Usually, I would advise multiplying any reasonable evaluation by two or three to take unforeseen difficulties into account.

If you work within a team, it is customary to calculate this prediction as if the slowest dev on your team were to do the job (you might not be the one who performs the correction, and if you work faster than your coworkers, your evaluation might be flawed).

With this quantification set, you can now evaluate the inflation of this technical debt.

How to quantify inflation?

The most important thing about inflation is that it is not linear.

In fact, since there are two ways that technical debt is costly (costly to solve and costly to use), and they both suffer inflation, then the technical debt inflation is (at least) quadratic1. It is not proportional to the time, but to the time squared.

If there is a single thing you should remember from this article, it is this: TDI is quadratic.

Now, how can we evaluate this inflation?

A simple and usable indicator is the size of the code. The size of the code tends to increase with time, and if you manage to have a model extrapolated from how the size of the code increased in the past months/years, you will be able to predict how the size of the code will increase in the future.

I provide you with an example of how to evaluate the size of your code in the addenda.

Then, you take that evolution and apply a quadratic factor to it. This is what I call the Quadratic Expansion model.

Formalization of the Quadratic Expansion model

Let C0 be the size of the whole code at t0.
Let C1 be the extrapolation of the size of the whole code at t1.
Let D0 be the estimated time needed to solve the technical debt at t0.
Let D1 be the evaluated time needed to solve the technical debt at t1.
Let I01 be the time wasted due to the impact of the technical debt between t0 and t1.
Let Δ01 be the cumulated time the technical debt costs between t0 and t1.

C0, C1 and D0 are known values.
D1 and I01 are intermediary values.
Δ01 is the goal of the model.

D1 = D0 × C1 ÷ C0

I01 = Λ × C1 ÷ C0, where Λ is a constant called the “lambda factor”2.

Δ01 = (I01 × D1) – D0

Δ01 = Λ × D0 × (C1 ÷ C0)² – D0

For simple calculus, you can assume Λ = 1 (we are looking for an order of magnitude, not a precise value), which gives

Δ01 = D0 × ( (C1 ÷ C0)² – 1 )

Example

You have a major technical debt to resolve that your manager is considering delaying for six months. You told them that it would cost more time to let be, and they are asking you to evaluate how much it would cost.

Today, your feature is composed of 216 hundred lines of code. Three months ago, it was composed of 178 hundred lines, so it grew by 38 hundred lines in three months. However, your four-dev team has just welcomed a newcomer and now has five devs, so the expected growth over the next six months can be evaluated at 95 hundred more lines (38 × 2 × 1.25).

You estimate that solving the technical debt would take, at most, a whole week (5 days).

C0 = 21.6k

C1 = 31.1k

D0 = 5 days

Δ01 = D0 × ( (C1 ÷ C0)² – 1 ) ≈ 5.4 days

Conclusion: according to the quadratic expansion model, the wait would cost about 5 and a half more days.

So you tell your manager that, considering the team’s productivity, waiting six more months will more than double the time lost on the technical debt (including solving it and having to maintain a bad design).

Limits of this model

This model has huge limitations.

  • First, the calculus is not transitive. E.g. Δ02 ≠ Δ01 + Δ12. This reflects the fact that the further we try to look (0 → 2) the more uncertain the technical debt cost will be. Mathematically, we should try to reflect that uncertainty in the model with a confidence interval.
  • Then, evaluating and extrapolating the size of the code is often feasible, but not trivial.
  • And, of course, this model has yet to be proven in real life3.

The million-dollar question

There would be one way to evaluate the TDI problem mathematically: by aggregating data over hundreds of projects over the years. But this is no simple task, and may even be impossible. Here are the reasons why:

  • It would mean intruding into the code owned by private companies.
  • Even in retrospect, it’s hard to evaluate the impact of technical debt.
  • The study would take years, if not decades, to complete, because the impact of technical debt unfolds over years.

With that in mind, aggregating real data for a serious study seems impossible. But could we elaborate on a smaller model that would help us to solve the TDI problem? That needs more thought.

Wrapping up

I’ll say it once again so there is no possible ambiguity: the Quadratic Expansion model is a limited, inaccurate, and unscientific way to evaluate technical debt inflation, but it gives a coherent order of magnitude and an argument in favor of early refactoring.

I hope that this will be the start of more serious studies about the TDI problem.

Remember that evaluating “lost” time of a living technical debt is not a trivial operation, and a live-testing evaluation model is impossible at a small scale.

But I hope this will help you get an order of magnitude of the cost of technical debt inflation.

Thanks for reading and see you next time!

Author: Chloé Lourseyre
Editor: Peter Fordham

Addenda

How to evaluate the size of your code? An example

With Git and a Linux shell, you can easily evaluate the current size of your codebase.

git ls-files allows you to list all files.

grep -E '\.(cpp|h|hpp)$' is a filter on source and header files.

wc -l counts the number of lines.

Here is the whole command to launch:

git ls-files | grep -E '\.(c|cpp|h|hpp)$' | xargs -d '\n' wc -l

(NB: the xargs command feeds the resulting output into the input of wc. The -d '\n' option is there to handle spaces in file paths.)

Alternatively, you can use wc -m instead of wc -l to count the characters instead of the lines. It is a bit slower and a bit less intuitive, but I think it is a better metric than the line count.

To get a cleaner output, you can add:

grep -E '^ *[0-9]+ total$' to only get the line with the total result.

sed -r 's/^ *([0-9]+) total$/\1/' removes the surrounding text to only keep the number.

The full command is now:

git ls-files | grep -E '\.(c|cpp|h|hpp)$' | xargs -d '\n' wc -l | grep -E '^ *[0-9]+ total$' | sed -r 's/^ *([0-9]+) total$/\1/'

If you have several submodules, you can:

Add --recurse-submodules to recursively evaluate every submodule.

awk '{s+=$1} END {print s}' sums the values (which are reported individually for each submodule).

Final command line:

git ls-files --recurse-submodules | grep -E '\.(c|cpp|h|hpp)$' | xargs -d '\n' wc -l | grep -E '^ *[0-9]+ total$' | sed -r 's/^ *([0-9]+) total$/\1/' | awk '{s+=$1} END {print s}'

Notes

  1. This is based on the following idea: since there are two costs that increase with time and these two costs closely interact (their consequences are intertwined), their combined cost is multiplicative (and not additive). That is what makes, in my view, the inflation quadratic.
  2. Λ represents how much the rest of the code depends on the technical debt. The higher the impact the technical debt has on the rest of the code, the higher Λ will be, and the higher I01 will be. However, as a matter of simplicity (and lack of better modeling), Λ is here considered constant.
  3. Is it even possible to design a protocol that would allow us to evaluate the soundness of any TDI model? Since we can only do one of two things (either solve the debt now or let it inflate), there will always be uncertainty about the evaluation of the alternative. Plus, the time needed to resolve a debt depends on one’s skill and, often, luck. In addition to that, the model claims to include risk, meaning the estimated inflation will be large because you cannot know how badly a technical debt will grow. There is (to my knowledge) no way to verify this kind of abstract representation.

Who owns the memory?

Author: Chloé Lourseyre
Editor: Peter Fordham

Have you ever heard of “ownership” of memory, in C++, when speaking about pointers?

When you use raw pointers, you have to delete them (or else you’ll leak memory). But if the said pointer is passed through functions and complex features, or returned by a factory, you have to know whose job it is to delete it.

Ownership means “responsibility to cleanup”. The owner of the memory is the one who has to delete its pointer.

Deletion can either be explicit (through the keyword delete or the function free() for raw pointers) or bound to the lifetime of an object (through smart pointers and RAII1).

In this article, the term “pointer” will refer to both raw and smart pointers.

The problem of memory ownership

The problem is addressed by this post: You Can Stop Writing Comments About Pointer Ownership (gpfault.net).

TL;DR: smart pointers cover every case that raw pointers could, so don’t use raw pointers. Move semantics on smart pointers allow ownership rules to be checked at compile time.

The post is quite interesting, but misses a huge point: what do we do with existing raw pointers? What do we do when we are forced to use raw pointers2?

In the rest of the article, these are the questions I will try to answer.

First, when you have to use a feature that requires raw pointers, you have to ask yourself: how does the feature behave regarding the ownership of the resources?

Once that is done, we can enumerate four cases:

  • When you receive a pointer and the ownership
  • When you receive a pointer but not the ownership
  • When you transfer a pointer but not the ownership
  • When you transfer a pointer and the ownership

When you receive a pointer and the ownership

This one’s probably the easiest. Since we can construct a std::unique_ptr or a std::shared_ptr from a raw pointer, all we have to do is put the received raw pointer into a smart pointer and it will be properly destroyed.

Example

#include <memory>
#include <iostream>

struct Foo
{
    Foo() { std::cout << "Leak?" << std::endl; }
    ~Foo() { std::cout << "No leak" << std::endl; }
};

// We don't own this function, so we can't change the return value
Foo * make_Foo()
{
    return new Foo();
}

int main()
{
    std::unique_ptr<Foo> foo_ptr(make_Foo());
    // The instance of Foo is properly destroyed when function ends
    return 0;
}

The resulting output will be:

Leak?
No leak

When you receive a pointer but not the ownership

This case is a bit trickier. It does not happen often, but sometimes, for internal reasons (or just because it is badly conceived), a library gives you a pointer you must not delete.

Here, we cannot use a smart pointer (as in the previous case) because it would delete the pointer.

For instance, in the following example, the class IntContainer creates a pointer to an int and deletes it when it is destroyed:

// We don't own this container, we can't change its interface
struct IntContainer
{
    IntContainer(): int_ptr(new int(0)) {}
    ~IntContainer() { delete int_ptr; }

    int * get_ptr() { return int_ptr; }

    int * int_ptr;
};

If we try to use an unique_ptr, like so:

int main()
{
    IntContainer int_cont;
    std::unique_ptr<int>(int_cont.get_ptr());
    // Double delete
    return 0;
}

We get undefined behavior. With my compiler (GCC 11.2) I got a runtime error: free(): double free detected in tcache 2

There is a simple solution to this problem. Instead of using a pointer, we can take a reference to the object pointed to by the raw pointer returned by IntContainer. This way, we can access the pointed-to value just as we would with a pointer, without risking deleting it.

int main()
{
    IntContainer int_cont;
    int & int_ref = *int_cont.get_ptr();
    // We have access to the value of int_ptr via the reference
    return 0;
}

When you transfer a pointer but not the ownership

Some libraries require that you give them raw pointers. In most cases you keep ownership of those pointers, but you still have to produce a raw pointer to pass down.

There are two situations:

  • You have the object as a value or a reference.
  • You have a smart pointer to the object.

Situation 1: You have the object as a value or a reference

In that situation, all you have to do is use the & operator to pass the address of your object down to the feature that requires a raw pointer. Since it won’t try to delete it, nothing bad will happen.

#include <iostream>

struct Foo
{
    Foo() { std::cout << "Leak?" << std::endl; }
    ~Foo() { std::cout << "No leak" << std::endl; }
};

// Function that requires a raw pointer
void compute(Foo *)
{
    // ...
}

int main()
{
    Foo foo;
    // ...
    compute(&foo);
    return 0;
}

Situation 2: You have a smart pointer to the value

When all you have is a smart pointer to the object, you can use the member function get() to get the raw pointer associated with the smart pointer. Both std::unique_ptr and std::shared_ptr implement this function3.

#include <memory>
#include <iostream>

struct Foo
{
    Foo() { std::cout << "Leak?" << std::endl; }
    ~Foo() { std::cout << "No leak" << std::endl; }
};

// Function that requires a raw pointer
void compute(Foo *)
{
    // ...
}

int main()
{
    std::unique_ptr<Foo> foo_ptr = std::make_unique<Foo>();
    // ...
    compute(foo_ptr.get());
    return 0;
}

When you transfer a pointer and the ownership

Probably the rarest case of all4, but hypothetically there could be a feature that requires a raw pointer and takes ownership.

Situation 1: You have the object as a value or a reference

If you have the plain object (or a reference), the only way to obtain a raw pointer that the feature can safely delete is by calling new.

However, a plain new would copy the object, and this is unwanted. Since ownership is theoretically given to the feature, we can use std::move on the object to call the move constructor (if there is one) and avoid a potentially heavy copy of the object.

So we just call the feature that requires a raw pointer with a new expression that creates the desired pointer, moving the value into it.

#include <iostream>

struct Foo
{
    Foo() { std::cout << "Leak?" << std::endl; }
    Foo(Foo &&) { std::cout << "Move constructor" << std::endl; }
    ~Foo() { std::cout << "No leak" << std::endl; }
};

void compute(Foo *foo)
{
    // ...
    delete foo;
}

int main()
{
    Foo foo;
    // ...
    compute(new Foo(std::move(foo)));
}

Situation 2: You have a smart pointer to the value

The get() member function does not give ownership, so if we use it to pass the raw pointer to the feature, we’ll have double deletion.

The member function release(), however, releases the ownership and returns the raw pointer. This is what we want to use here.

#include <iostream>
#include <memory>

struct Foo
{
    Foo() { std::cout << "Leak?" << std::endl; }
    ~Foo() { std::cout << "No leak" << std::endl; }
};

void compute(Foo *foo)
{
    // ...
    delete foo;
}

int main()
{
    std::unique_ptr<Foo> foo_ptr = std::make_unique<Foo>();
    // ...
    compute(foo_ptr.release());
    return 0;
}

The problem is that release() is only a member of unique_ptr. Several shared pointers can point to the same value at once, so no single shared_ptr exclusively owns the pointer, and none of them can release it.

How do I know the intended ownership?

This is key when you do this kind of refactoring, because misidentification of the intended ownership may lead to memory leaks or undefined behavior.

Usually, the feature documentation will point out whose job it is to clean up the memory.

How to deal with memory allocated with malloc?

The cases that are presented in this article only concern memory that is allocated with new and freed with delete.

But there will be rare cases where the feature you want to use will have recourse to malloc and free.

When a feature asks for a raw pointer and you have to delete it, this problem is non-existent (you have control of both the allocation and the cleanup).

When a feature gives you a raw pointer (created using malloc) and you mustn’t delete it, you have nothing to do, since it’s not your job to free the resource. You can work with a reference to the resource, as demonstrated earlier.

When a feature asks for a raw pointer and you mustn’t delete it (because it calls free on it), you will have to do the malloc yourself. Even if you hold the object in a smart pointer, you will unfortunately still have to use malloc to produce the pointer you pass down.

Last, when a feature gives you a raw pointer (created using malloc) and you have to delete it, this is the tricky part. The best way to handle it is probably to use a unique_ptr with a custom deleter as its second template parameter. That second parameter is a function object type (i.e. a class that implements operator()) which will be called to free the memory. In our specific case, the deleter simply needs to call free. Here is an example:

#include <memory>
#include <iostream>

struct Foo {};

// We don't own this function, so we can't change the return value
Foo * make_Foo()
{
    return reinterpret_cast<Foo*>(malloc(sizeof(Foo)));
}

// This deleter is implemented for Foo specifically, 
// but we could write a generic template deleter that calls free().
struct FooFreer
{
    void operator()(Foo* foo_ptr)
    {
        free(foo_ptr);
    }
};

int main()
{
    std::unique_ptr<Foo, FooFreer> foo_ptr(make_Foo());
    // The instance of Foo is properly destroyed when function ends
    return 0;
}

Wrapping up

Here is a table summarizing what has been demonstrated here:

                      | I receive a raw pointer                   | I transfer a raw pointer
I must delete it      | Store it in a unique_ptr or a shared_ptr  | Use the & operator or the .get() member function
I mustn’t delete it   | Get a reference to the object             | Use the new operator and std::move, or the .release() member function

With these tools, you can safely remove raw pointers from your code, even when some libraries still use them.

The solutions proposed are very simple, but it is critical to identify which one to use in each situation. The main problem with this method is that the refactorer has to correctly identify ownership (but that’s a problem that cannot be avoided).

Thanks for reading and see you next time!

Author: Chloé Lourseyre
Editor: Peter Fordham

Addendum

Notes

  1. In case you didn’t know, RAII is an essential technique in modern C++ about resource acquisition and release. We often say that “something is RAII” when it cleans up after itself and cannot leak memory once it goes out of scope. For instance, raw pointers are not RAII because if you forget to delete them, you have a memory leak. On the contrary, std::string and std::vector are RAII because they release their internal allocations as soon as their destructors are called.
  2. It is sometimes hard for some developers to understand how something can be “enforced” in your code. Here are a few situations when it occurs:
    – When you arrive on an existing project. You can’t refactor everything on your own. You have to adapt and take your time to change things.
    – When parts of the code are not in your jurisdiction. On many projects, some essential code is developed by another team, in which you can’t meddle.
    – When you have to prioritize refactoring. You can’t always rewrite everything you’d like to at once. You have to choose your battles.
    – When management doesn’t give you approval or budget for the refactoring you are asking for. It happens, and there is nothing much you can do about it.
  3. No case presented in this article can work with a std::weak_ptr.
  4. While writing this article, I could not find, either on the Internet or in my memory, one single example of a feature that requires a pointer, then deletes it for you.

3 interesting behaviors of C++ casts

Author: Chloé Lourseyre
Editor: Peter Fordham

This article is a little compilation1 of strange behaviors in C++ that would not make a long enough article on their own.

Static casting an object into their own type can call the copy constructor

When you use static_cast, by default (i.e. without optimizations enabled) it calls the conversion constructor of the type you are casting into (if it exists).

For instance, in this code:

class Foo;
class Bar;

int main()
{
    Bar bar;
    static_cast<Foo>(bar);
}

The static_cast expression would call the following constructor (if it exists): Foo(const Bar &).

So far so good, and there is a good chance that you already knew that.

But do you know what happens if you try to static cast an object into its own type?

Let’s take the following code:

struct Foo
{
    Foo(): vi(0), vf(0) {};
    Foo(const Foo & other): vi(other.vi), vf(other.vf) {};
    long vi;
    double vf;
};

int main()
{
    Foo foo1, foo2, foo3;
    foo2 = foo1;    
    foo3 = static_cast<Foo>(foo1);

    return 0;
}

And look at the assembly generated for the two assignment statements.

Line 12 (foo2 = foo1;)

        mov     rax, QWORD PTR [rbp-32]
        mov     rdx, QWORD PTR [rbp-24]
        mov     QWORD PTR [rbp-48], rax
        mov     QWORD PTR [rbp-40], rdx

Line 13 (foo3 = static_cast<Foo>(foo1);)

        lea     rdx, [rbp-32]
        lea     rax, [rbp-16]
        mov     rsi, rdx
        mov     rdi, rax
        call    Foo::Foo(Foo const&) [complete object constructor]
        mov     rax, QWORD PTR [rbp-16]
        mov     rdx, QWORD PTR [rbp-8]
        mov     QWORD PTR [rbp-64], rax
        mov     QWORD PTR [rbp-56], rdx

We can see that when we static cast the object foo1, the copy constructor of Foo is called, as if the copy constructor were a “conversion constructor from a type into itself”.

(Done using GCC 11.2 x86-64, Compiler Explorer (godbolt.org))

Of course, this behavior disappears as soon as you enable any optimization option in the compiler.

This is typically useless knowledge2 and something you don’t encounter often in real life (I happen to have encountered it once, but that was an unfortunate accident).

Static casts can call several conversion constructors

Talking about conversion constructors, they can be transitive when static_cast is used.

Take the following classes:

struct Foo
{  Foo() {};  };

struct Bar
{  Bar(const Foo & other) {};  };

struct FooBar
{  FooBar(const Bar & other) {};  };

struct BarFoo
{  BarFoo(const FooBar & other) {};  };

We have four types: Foo, Bar, FooBar, and BarFoo. The conversion constructors say we can convert a Foo into a Bar, a Bar into a FooBar, and a FooBar into a BarFoo.

If we try to execute the following code:

int main()
{
    Foo foo;
    BarFoo barfoo = foo;
    return 0;
}

There is a compilation error on line 4: conversion from 'Foo' to non-scalar type 'BarFoo' requested.

However, if we static_cast foo into a FooBar, as such:

int main()
{
    Foo foo;
    BarFoo barfoo = static_cast<FooBar>(foo);
    return 0;
}

The program compiles.

If we now take a look at the assembly code associated with line 4:

        lea     rdx, [rbp-3]
        lea     rax, [rbp-1]
        mov     rsi, rdx
        mov     rdi, rax
        call    Bar::Bar(Foo const&) [complete object constructor]
        lea     rdx, [rbp-1]
        lea     rax, [rbp-2]
        mov     rsi, rdx
        mov     rdi, rax
        call    FooBar::FooBar(Bar const&) [complete object constructor]
        lea     rdx, [rbp-2]
        lea     rax, [rbp-4]
        mov     rsi, rdx
        mov     rdi, rax
        call    BarFoo::BarFoo(FooBar const&) [complete object constructor]

There are no less than 3 conversions generated by that single statement.

(Done using GCC 11.2 x86-64, Compiler Explorer (godbolt.org))

Hold up!

You may be wondering why I used static_cast to cast foo into a FooBar, instead of casting it directly into a BarFoo.

If we try and compile the following code:

int main()
{
    Foo foo;
    BarFoo barfoo = static_cast<BarFoo>(foo);
    return 0;
}

We end up with a compilation error!

<source>:16:44: error: no matching function for call to 'BarFoo::BarFoo(Foo&)'

In fact, static_cast is not transitive.

What really happens is the following:

The expression static_cast<FooBar>(foo) tries to call the following constructor: FooBar(const Foo&). However, it doesn’t exist, the only conversion constructor FooBar has is FooBar(const Bar&). But, there is a conversion available from Foo to Bar, so the compiler implicitly converts foo into a Bar to call the FooBar(const Bar&).

Then we try to assign the resulting FooBar to a BarFoo. Or, more precisely, we try to construct a BarFoo using a FooBar, which calls the BarFoo(const FooBar&) constructor.

That is why there is a compilation error when we try to cast a Foo directly into a BarFoo.

In fact, static_cast is not really transitive.

What to do with this information?

Implicit conversion can happen anywhere. Since static_cast (and any cast) is, pragmatically3, a “function call” (in the sense that it takes an argument and returns a value) it gives two opportunities for the compiler to try an implicit conversion.

The behavior of C-style casts

Using C-style casts is a fairly widespread bad practice in C++. It really should have made it into this older article: A list of bad practices commonly seen in industrial projects.

Many C++ developers don’t understand the intricacies of what C-style casts actually do.

How do casts work in C?

If I remember right, casts in C have three uses.

First, they can convert one scalar type into another, like this:

int toto = 42;
printf("%f\n", (double)toto);

But this can only be used to convert scalar types. If we try to convert one C struct into another using a cast:

#include <stdio.h>

typedef struct Foo
{
    int toto;
    long tata;
} Foo;

typedef struct Bar
{
    long toto;
    double tata;
} Bar;


int main()
{
    Foo foo;
    foo.toto = 42;
    foo.tata = 666;
    
    Bar bar = (Bar)foo;
    
    printf("%ld %f", bar.toto, bar.tata);

    return 0;
}

We obtain the following compilation error:

main.c:22:5: error: conversion to non-scalar type requested
   22 |     Bar bar = (Bar)foo;
      | 

(Source: GDB online Debugger | Code, Compile, Run, Debug online C, C++ (onlinegdb.com))

Second, they can be used to reinterpret a pointer into a pointer of another type, like this:

#include <stdio.h>

typedef struct Foo
{
    int toto;
    long tata;
    int tutu;
} Foo;

typedef struct Bar
{
    long toto;
    int tata;
    int tutu;
} Bar;


int main()
{
    Foo foo;
    foo.toto = 42;
    foo.tata = 666;
    foo.tutu = 1515;
    
    Bar* bar = (Bar*)&foo;
    
    printf("%ld %d %d", bar->toto, bar->tata, bar->tutu);

    return 0;
}

This prints the following output4:

42 666 0

(Source: GDB online Debugger | Code, Compile, Run, Debug online C, C++ (onlinegdb.com))

And finally, they can be used to add or remove a const qualifier:

#include <stdio.h>

int main()
{
    const int toto = 1;
    int * tata = (int*)(&toto);
    *tata = 42;
    
    printf("%d", toto);

    return 0;
}

This prints 42.

(Source: GDB online Debugger | Code, Compile, Run, Debug online C, C++ (onlinegdb.com))

This also works on structs.

And that’s pretty much all5.

So what happens in C++

C++ has its own cast operators (mainly static_cast, dynamic_cast, const_cast, and reinterpret_cast, but also many other casts like *_pointer_cast, etc.)

But C++ was also intended (at first) to be backward-compatible with C. So we needed a way to implement C-style casts so that they would behave like C casts, while being built on the new C++ cast operators.

So in C++, when you do a C-style cast, the compiler tries each one of the five following cast operations (in that order), and uses the first that works:

  • const_cast
  • static_cast
  • static_cast followed by const_cast
  • reinterpret_cast
  • reinterpret_cast followed by const_cast

More details here: Explicit type conversion – cppreference.com.

Why is this actually bad?

Most C++ developers agree that using C-style casts in C++ is really bad practice. Here is why: a C-style cast is not explicit about what the compiler will do. It will often compile even when there is an error, and silence that error. You always want only one of these casts, so you should call that one explicitly; that way, if there is any mistake, there’s a good chance the compiler will catch it. Objectively, there are no upsides to using a C-style cast.

Here is a longer argument against C-style casts: Coding Standards, C++ FAQ (isocpp.org).

Wrapping up

Casting is a delicate operation. It can be costly (more than you think, because it gives room for implicit conversions) and still today, a lot of people use C-style casts without knowing how bad they are.

It is tedious, but we need to understand how casts work and the specificities of each one.

Thanks for your attention and see you next time!

Author: Chloé Lourseyre
Editor: Peter Fordham

Addenda

Static casting an object into its own type can call the copy constructor

Static casts can call several conversion constructors

The behavior of C-style casts

Notes

  1. Pun intended.
  2. If you know a case where it is useful, please share in the comments.
  3. In the linguistic field, pragmatics is the study of context (complementary to semantics, which studies meaning, and many other fields). In terms of programming languages, pragmatics can be interpreted as how features interact with others in a given context. In our example, a static_cast can hardly be considered a function call in the semantic sense, but it acts as one in the interactions it has with its direct environment (as explained in the paragraph). The technical truth is in-between: for PODs it is not a function call, but for classes that define a copy constructor it is.
  4. I won’t explain in detail why it prints 0 instead of 1515 for the value of tutu: just know that because we reinterpret the data stored in memory, reading a Foo as if it was a Bar leads to errors.
  5. I am not as fluent in C as I am in C++. I may have forgotten another use of C casts. If so, please contribute in the comments.

Constant references are not always your friends

Author: Chloé Lourseyre
Editor: Peter Fordham

Early on, when we teach modern C++, we teach that every non-small1 data should be passed, by default, as constant reference:

void my_function(const MyType & arg);

This avoids the copy of these parameters in situations where they don’t need to be copied.

Other situations call for other needs, but today we will focus on constant references.

I found out that people tend to overdo const refs, thinking they are the best choice in every situation and should be used everywhere they can be.

But are they always better than the alternatives? What are the dangers and hidden traps within them?

NB: In the whole article, I use “constant reference” (or the shorter “const ref”) for what is, really, a reference to a constant. This is a convention that, though technically inaccurate, is way more practical.

First situation: const ref as a parameter

This is kind of a textbook case, involving a non-optimal usage of a const ref.

Take this class:

struct MyString
{
     // Empty constructor
    MyString()
    { std::cout << "Ctor called" << std::endl; }
    
    // Cast constructor
    MyString(const char * s): saved_string(s) 
    { std::cout << "Cast ctor called" << std::endl; }
    
    std::string saved_string;
};

This is basically a std::string wrapper, with outputs to see if and when the constructors are called. We will use it to see if there are unnecessary calls to constructors and if there are any implicit conversions3. From now on, we’ll consider that constructing a MyString is heavy and unwanted.

Using a constant reference

Let’s take a function that takes a constant reference to MyString as a parameter:

void foo(const MyString &)
{
    // ...
}

And now, let’s call it with, let’s say, a literal string:

int main()
{
    foo("toto");
}

It compiles, it works, and it prompts the following message on the standard output:

Cast ctor called

The cast constructor is called. How come?

The thing is, const MyString & can’t refer to the "toto" we pass down to foo(), because "toto" is a const char[]. So, naively, it shouldn’t compile. However, since the reference is constant, and so won’t modify the source object, the compiler reckons it can be copied, somewhere in memory, with the correct type. Thus, it performs an implicit conversion.

This is not neat, because this conversion is heavy for a lot of types, and in the collective unconscious, passing down a const ref does not copy the object. It’s the fact that the conversion is implicit (and thus invisible) that is unwelcome.

Using the explicit keyword

In C++, we can use the explicit keyword to specify that a constructor or a conversion function cannot be used implicitly.

explicit MyString(const char * s): saved_string(s) 
{ std::cout << "Cast ctor called" << std::endl; }

With that keyword, you cannot use the foo() function with a literal string anymore:

foo("toto"); // Does not compile

You have to cast it:

foo(static_cast<MyString>("toto")); // Does compile

However, there is a major downside: you can’t use explicit on standard library types (such as std::string) or types you import from external libraries. How can we work around that?

Using a plain reference

Let’s put aside the explicit keyword and consider that MyString is external and cannot be edited.

We’ll tune the foo() function so that the reference it takes as a parameter is not constant anymore:

void foo(MyString &)
{
    // ...
}

So what happens now? If we try to call foo() with a literal string we get the following compilation error:

main.cpp: In function 'int main()':
main.cpp:24:9: error: cannot bind non-const lvalue reference of type 'MyString&' to an rvalue of type 'MyString'
   24 |     foo("toto");
      |         ^~~~~~
main.cpp:11:5: note:   after user-defined conversion: 'MyString::MyString(const char*)'
   11 |     MyString(const char * s): saved_string(s)
      |     ^~~~~~~~
main.cpp:17:10: note:   initializing argument 1 of 'void foo(MyString&)'
   17 | void foo(MyString &)
      |          ^~~~~~~~~~

Here, the compiler cannot perform an implicit conversion any more. Because the reference is not constant, and thus may be modified within the function, it cannot copy and convert the object.

This is actually a good thing, because it warns us that we are trying to perform a conversion and asks us to explicitly perform the conversion.

If we want this code to work, we do have to call the cast constructor4 explicitly:

int main()
{
    MyString my_string("toto");
    foo(my_string);
}

This compiles, and gives us the following message on the standard output:

Cast ctor called

But this is better than the first time, because here the cast constructor is called explicitly. Anyone who reads the code knows that the constructor is called.

However, plain references have downsides. For one, they discard the const qualifier.

Using template specialization

Finally, an other way to prevent implicit conversion is to use template specialization:

template<typename T>
void foo(T&) = delete;

template<>
void foo(const MyString& bar)
{
    // …
}

With this code, when you try to call foo() with anything that isn’t a MyString, you’ll try to call the generic templated overload of foo(). However, this function is deleted and will cause a compilation error.

If you call it with a MyString, though, it is the specialization that will be called. Thus, you’ll be sure that no implicit conversion can be done.

Conclusion of the first situation

Sometimes, constant references can perform implicit conversions. Depending on the type and the context, this may be undesirable.

To avoid that, you can use the explicit keyword. This forbids implicit conversion.

When you can’t use explicit (because you need it on an external type), you can use a plain reference instead or a template specialization as seen above, but both have implications.

Second situation: const ref as an attribute

Let’s take (again) a wrapper to a std::string, but this time, instead of storing the object, we’ll store a constant reference to the object:

struct MyString
{    
    // Cast constructor
    MyString(const std::string & s): saved_string(s) {}
    
    const std::string & saved_string;
};

Using a constant reference stored in an object

Let’s use it now, and see if it works:

int main()
{
    std::string s = "Toto";
    MyString my_string(s);

    std::cout << my_string.saved_string << std::endl;
    
    return 0;
}

With that code, we get the following standard output:

Toto

So this seems to work fine. However, if we try to edit the string from outside the function, like this:

int main()
{
    std::string s = "Toto";
    MyString my_string(s);

    s = "Tata";

    std::cout << my_string.saved_string << std::endl;
    
    return 0;
}

The output changes to that:

Tata

It seems that storing a constant reference does not mean the value cannot be modified. In fact, it means that it cannot be modified by the class. This is a huge difference that can be misleading.

Trying to reassign a constant reference

With that in mind, you might want to try and reassign the reference stored in the object rather than modifying its value.

But in C++, you can’t reseat a reference. As it is said in the IsoCpp wiki: Can you reseat a reference? No way. (Source: References, C++ FAQ (isocpp.org)).

So beware, because if you write something like this:

int main()
{
    std::string s = "Toto";
    MyString my_string(s);

    std::string s_2 = "Tata";
    my_string.saved_string = s_2;

    std::cout << my_string.saved_string << std::endl;
    
    return 0;
}

This won’t compile, because you are not trying to reseat my_string.saved_string to the reference of s_2, you are actually trying to assign the value of s_2 to the object my_string.saved_string refers to, which is constant from MyString‘s point of view (and thus can’t be assigned).

If you try to work around that and de-constify the reference stored inside MyString, you may end up with this code:

struct MyString
{    
    // Cast constructor
    MyString(std::string & s): saved_string(s) {}
    
    std::string & saved_string;
};

int main()
{
    std::string s = "Toto";
    MyString my_string(s);

    std::string s_2 = "Tata";
    my_string.saved_string = s_2;

    std::cout << my_string.saved_string << std::endl;
    
    return 0;
}

The output is, as expected, Tata. However, try and print the value of s and you’ll have a little surprise:

std::cout << s << std::endl;

And you’ll see that it prints Tata again!

Indeed, as I said, by doing that you reassign the value referred to by my_string.saved_string, which is a reference to s. So by reassigning my_string.saved_string you reassign s.

Conclusion of the second situation

In the end, the keyword const for the member variable const std::string & saved_string; does not mean “saved_string won’t be modified”, it actually means that “a MyString can’t modify the value of its saved_string“. Beware, because const does not always mean what you think it means.

Types that should be passed by value and not by reference

Using constant references is also a bad practice for some types.

Indeed, some types are small enough that passing by const ref instead of passing by value is actually not an optimization.

Here are examples of types that should not be passed by const ref:

  • int (and short, long, float etc.)
  • pointers
  • std::pair<int,int> (any pair of small types)
  • std::span
  • std::string_view
  • … and any type that is cheap to copy

The fact that these types are cheap to copy tells us that we can pass them by value, but it doesn’t tell us why we should.

There are three reasons why. These three reasons are detailed by Arthur O’Dwyer in the following post: Three reasons to pass `std::string_view` by value – Arthur O’Dwyer – Stuff mostly about C++ (quuxplusone.github.io)

Short version:

  1. Eliminate a pointer indirection in the callee. Passing by reference forces the object to have an address. Passing by value makes it possible to pass it using only registers.
  2. Eliminate a spill in the caller. Passing by value and using registers sometimes eliminates the need for a stack frame in the caller.
  3. Eliminate aliasing. Giving a value (i.e. a brand new object) to the callee gives it greater opportunities for optimization.

Wrapping up

Here are two dangers of constant references:

  • They can provoke implicit conversions.
  • When stored in a class, they can still be modified from the outside.

Nothing is inherently good or bad — thus nothing is inherently better or worse.

Most of the time, using constant references to pass down non-small parameters is best. But keep in mind that it has its own specificities and limits. That way, you’ll avoid the 1% situation where const refs are actually counter-productive.

There are several semantic meanings to the keyword const. Sometimes you think it means one thing while in fact it means another. But I’ll keep that for another article.

Thanks for reading and see you next time5!

Author: Chloé Lourseyre
Editor: Peter Fordham

Addenda

Examples in Godbolt

First situation: const ref as a parameter: Compiler Explorer (godbolt.org) and Compiler Explorer (godbolt.org)

Second situation: const ref as an attribute: Compiler Explorer (godbolt.org)

Notes

  1. “Small data” refers, in that context, to PODs2 that are small enough to be passed down without losing efficiency — such as simple integers and floating values.
  2. POD means “Plain Old Data” and refers to data structures that are represented only as passive collections of field values without using object-oriented features.
  3. MyString is just a placeholder for heavier classes. There are classes, such as std::string, that are costly to construct or copy.
  4. What I call “cast constructor” is the one with one parameter. These kinds of constructors are often called that way because it’s the ones that the static_cast use.
  5. Scientific accuracy has always been one of my goals. I don’t always reach it (I don’t often reach it) but I try to as much as I can. That’s why, from now on, I won’t say “See you next week” since according to the stats, I publish two-point-eight articles per month on average.

I don’t know which container to use (and at this point I’m too afraid to ask)

Author: Chloé Lourseyre
Editor: Peter Fordham

As far as containers go in C++, since std::vector is well suited for most cases (with the occasional std::map when you need key-value association1), it’s become easy to forget that there are other types of containers.

Each container has its strength and weaknesses, so if you are one to forget what its specificities are, this article is a good start.

Disclaimer: Not all C++ containers are listed, only the most usable ones in my opinion. If you want to go further, you’ll find two very useful links in the addenda, at the bottom of the page.

How to choose a container

Criteria

First™ criteria: sequence or associative?

The first question you have to ask yourself when you need a container is: is it a sequence container or an associative container?

In sequence containers, data is organized in an ordered and sequential way, with each value following the previous one. In memory, it doesn’t have to be contiguous (and often isn’t), but in practice, you access a value by knowing its index inside the container.

Unlike the sequence containers, associative containers do not store data as a sequence, but by associating a value to a key. Instead of using an index to refer to a stored value, you use its key.

Sequence containers criteria

  • Is the size of the container static? If so, an std::array is what you need (and it’s not really impacted by the other criteria). Otherwise, the following criteria will help you decide.
  • Will the size of the container vary a lot? This criterion is important for memory allocation. If you make the size of a container vary a lot when it’s not designed for it, you may experience slowness and memory over-usage2.
  • Is the order important? This criterion relates to data structures where you can add and erase values only at the beginning or at the end, while the middle remains hidden (FIFO and FILO structures).
  • Do you need to add/erase values at the extremities of the container (at the beginning or at the end)? Some containers do that with maximum efficiency.
  • Do you need to be efficient at inserting/deleting data in the middle of the structure? Same as the previous criteria, sometimes you do a lot of additions and deletions, but in the middle of the container. There are also structures efficient for that purpose.
  • Do you need to find the nth element? Depending on your container, search is not always optimal.
  • Do you need to merge collections? Depending on your container, structure merge is not always optimal either (some need additional allocation and some don’t).

Associative containers criteria

  • There are two kinds of associative containers: the ones that associate a key with a value (possibly of different types), which are maps, and the ones for which the key is the value itself, which are sets.
  • By default, the keys are unique. But there are variants where a key can have multiple entries: the multimap/multiset.
  • The inner structure of these containers can be implemented in two ways. By default, they are ordered by key. But they can also be unordered, using hashed keys. These are the unordered_ versions of each.

Note: the ordered/unordered distinction is important for associative containers. Most of the time, ordered associative containers use balanced binary trees while the unordered ones use hash tables. This has an impact on performance, and means that with hash tables you don’t need to implement an ordering operator.

Containers’ matrix

I present you with two matrices (one for sequence containers and one for associative containers).

Each matrix shows, for each criterion, which containers are best suited3.

Sequence containers

(for the queue, insertion is at the back and deletion at the front)

Associative containers

Flowchart cheat sheet

Here is an easy-to-read and printable flowchart that will help you choose which container to use in any situation4 : Joe Gibson’s data structure selection flowchart.

But what happens in real life?

As I said in the introduction of the article, most real-life container problems can be easily resolved with std::vector and std::map. With that in mind, to what extent is it useful to search for the “optimal” container every single time?

The semantics behind each container is linked to how you use it. When you see deque you’ll think “insert and erase at the front and the back”, when you see queue you’ll think “insert at the back and delete at the front”, and when you see list you’ll think “insert and erase in the middle”.

We see each container for how we use them more than how it is efficient to use them (which is linked but not equivalent). But there are a lot of operations that are available in several containers. For instance: you can easily insert and erase values at the end of a vector, as much as in deque (each of them has the push_back and pop_back member functions).

But you may want to say: “inserting and deleting at the end of a vector is strongly inefficient!”. In theory, yes. But in practice, it only matters within a critical section of the code. Most of the time, if you are within the 80% of Pareto’s principle, using a vector instead of a deque won’t affect performance.

So let me ask you this: if using a vector works, and if it doesn’t affect performance, why would you use any other container?

Vectors are the most understandable structure because they are quite close to plain old arrays. Most C++ users aren’t experts, and std::vector is the container they know best how to use. We shouldn’t make mundane code any more difficult to read and understand.

Of course, as soon as you have special needs, you should use the most appropriate container, but that doesn’t happen very often.

(I took std::vector as an example here for sequential containers, but the same applies for std::map and associative containers)

Optimization?

Optimization must be an opt-in behavior, not an up-front behavior.

When you first write a piece of code, you should not think about performance. You should write it in the clearest way possible.

Only then may you think about the impact your code has on global performance. You can run benchmarks and static/dynamic analysis to find out whether your code needs to be optimized and what exactly is badly optimized (execution time? memory usage? etc.).

If the code you are writing is not in the 20% of Pareto’s principle, you should not think about optimization at all. If it is, you can think about it after you have written it.

A more efficient way to look at it

I present you with a flowchart cheat sheet (in the same spirit as Joe Gibson’s flowchart) that summarizes what has been said in the previous section:

The principle is simple. You have to ask yourself two questions:

  1. Can I do what I want to do with a vector/map?
  2. Am I in a critical section of the code?

If your answers are 1) Yes and 2) No, then you should use std::vector or std::map.

Wrapping up

It is more than useful to know the differences between the C++ containers. However, in your everyday life, try not to think too much about it. It is a fairly common mistake to overthink situations that have simple solutions. But this knowledge is not wasted: every now and then, you’ll run into a specific situation that requires a specific container.

Pareto’s principle can also apply to this: more than 80% of the tools we know are useful in less than 20% of the situations.

Thanks for reading and see you next week!

Author: Chloé Lourseyre
Editor: Peter Fordham

Addenda

Sources and resources

The flowchart is from the following GitHub repo, where you can also find a more detailed view of each container: cpp-cheat-sheet/Data Structures and Algorithms.md at master · gibsjose/cpp-cheat-sheet · GitHub.

You will also find a neat cheat sheet, a bit more graphical and dense (along with other kinds of cheat sheet) here: C++ Cheat Sheets & Infographics | hacking C++ (hackingcpp.com).

Joe Gibson’s data structure selection flowchart

Joe Gibson’s data structure selection flowchart (source: GitHub – gibsjose/cpp-cheat-sheet: C++ Syntax, Data Structures, and Algorithms Cheat Sheet)

Notes

  1. Really, we can agree that std::unordered_map is better than map for basic key-value association. But in real life, I see too much indiscriminate use of map and almost no unordered_map.
  2. Plus, it’s not just the global size that matters; it is also the local maximum size. E.g. if a container will never exceed 20 entries in a critical section of the code, we can pre-allocate locally to avoid unexpected reallocation, while keeping the flexibility of a variable-size data structure.
  3. Of course, each container can perform most listed operations. But the matrices summarize which containers are best suited for each need.
  4. Unfortunately, the presented flowchart lacks the unordered_ associative containers. But you can think of it like this: “Values need to be ordered? Yes -> map/set ; No -> unordered_map/unordered_set”.

Retrospective: The simplest error handler

Author: Chloé Lourseyre
Editor: Peter Fordham

This week, we’ll talk about last week’s article and try to be critical about it. There are a few things to say and it occurred to me (thanks to some feedback I got through social media) that it can be improved a lot.

If you don’t know about SimplestErrorHandler, go read the article about it: One of the simplest error handlers ever written | Belay the C++ (belaycpp.com).

The repo containing the feature is still available and is up-to-date with the changes I’ll present here: SenuaChloe/SimplestErrorHandler (github.com).

Version 2: recursion was useless

Indeed.

Sometimes, when you unpack a parameter pack, you may need to differentiate between the main loop of the recursion and the base case. This happens when you need to do more things in the main loop than in the base case.

For the ErrorHandler, the base case does the same thing as the main loop (plus some extras). So we actually don’t need any recursion and can directly unfold the parameter pack into the stream (just like it is done in the concept declaration):

template<typename TExceptionType = BasicException, typename ...TArgs>
requires ErrorHandlerTemplatedTypesConstraints<TExceptionType, TArgs...>
void raise_error(const TArgs & ...args)
{
    std::ostringstream oss;
    (oss << ... << args);
    const std::string error_str = oss.str();
    std::cerr << error_str << std::endl;
    throw TExceptionType(error_str);
}

Avoiding recursion when you can is preferable, because recursion accumulates stack frames, which are best avoided for several reasons1.

Thanks to this new form, we don’t need an auxiliary function (to pass down the std::ostringstream) anymore. Thus, we also don’t need private functions. Without private functions, there is no use for the class, so we can use a namespace instead. And since we now use a namespace, we can declare the concept inside it.

namespace ErrorHandler
{   
    template<typename TExceptionType, typename ...TArgs>
    concept TemplatedTypesConstraints = requires(std::string s, std::ostringstream oss, TArgs... args)
    {
        TExceptionType(s); // TExceptionType must be constructible using a std::string
        (oss << ... << args); // All args must be streamable
    };

    // ...

    template<typename TExceptionType = BasicException, typename ...TArgs>
    requires TemplatedTypesConstraints<TExceptionType, TArgs...>
    void raise_error(const TArgs & ...args)
    {
        // ...
    }

    template<typename TExceptionType = BasicException, typename ...TArgs>
    requires TemplatedTypesConstraints<TExceptionType, TArgs...>
    void assert(bool predicate, const TArgs & ...args)
    {
       // ...
    }
};

Discussion: Need for speed?

String streams are slow. That is a fact2. Plus, in our case, they aren’t especially practical to use (we need to declare a std::ostringstream locally, which means an additional #include, just for that). Is there a way to get rid of that?

The main reason string streams are slow is the to-string conversions. However, to keep our ErrorHandler as simple as we can, we want to let the std::ostringstream do its magic, even if it means slower code.

Time performance is rarely critical (we can safely say that 80% of the time it is not). What we are developing is an error thrower. The only reason an error thrower would sit inside a performance-sensitive piece of code is if we used it for control flow.

But that would be a mistake. It is called ErrorHandler after all, not FlowControlHandler. By design, it is not meant to be used in critical code. The only good use for it in critical code is to get out of that code in case of error.

So no, we won’t try to optimize the handler time-wise. We will keep it simple and short. We don’t need speed.

The full code

Here is the last version of the error handler:

#pragma once

#include <iostream>
#include <sstream>

namespace ErrorHandler
{   
    template<typename TExceptionType, typename ...TArgs>
    concept TemplatedTypesConstraints = requires(std::string s, std::ostringstream oss, TArgs... args)
    {
        TExceptionType(s); // TExceptionType must be constructible using a std::string
        (oss << ... << args); // All args must be streamable
    };

    class BasicException : public std::exception
    {
    protected:
        std::string m_what;
    public:
        BasicException(const std::string & what): m_what(what) {}
        BasicException(std::string && what): m_what(std::forward<std::string>(what)) {}
        const char * what() const noexcept override { return m_what.c_str(); };
    };

    template<typename TExceptionType = BasicException, typename ...TArgs>
    requires TemplatedTypesConstraints<TExceptionType, TArgs...>
    void raise_error(const TArgs & ...args)
    {
        std::ostringstream oss;
        (oss << ... << args);
        const std::string error_str = oss.str();
        std::cerr << error_str << std::endl;
        throw TExceptionType(error_str);
    }

    template<typename TExceptionType = BasicException, typename ...TArgs>
    requires TemplatedTypesConstraints<TExceptionType, TArgs...>
    void assert(bool predicate, const TArgs & ...args)
    {
        if (!predicate)
            raise_error<TExceptionType>(args...);
    }
};

Wrapping up

Version 2 of the ErrorHandler is even shorter and simpler than the first. This is a major improvement.

I want to give special thanks to the people who noticed the mistakes and suggested improvements.

Thanks for reading and see you next week!

Author: Chloé Lourseyre
Editor: Peter Fordham

Addenda

Github repo

SenuaChloe/SimplestErrorHandler (github.com)

Notes

  1. Mainly to avoid stack overflow and to make debugging more readable. At the end of the day, recursion is not really bad, especially in a case like this where stack overflow is unlikely to happen, but using a fold expression is shorter and clearer. Plus, there is also recursion at compile time, which slows down compilation and may even crash the compiler in improperly bounded cases. This can never happen with a fold expression.
  2. This is pretty hard to source as a fact since everyone prefers to get rid of streams instead of proving they are slow. Since I’m bad at benchmarks (but I’m working on it), I won’t develop that here. If you want to share your own benchmarks (either to prove me wrong or right), please do in the comments.

One of the simplest error handlers ever written

Author: Chloé Lourseyre
Editor: Peter Fordham

This week, I’ll present a small utility I wrote to handle basic errors, the most compact and generic one I could think of.

It is certainly not perfect (mainly because perfection is subjective) but it is very light-weight and easy to use.

If you want to skip the article and go directly to the source, here is the Github repo: SenuaChloe/SimplestErrorHandler (github.com)

Specifications

In terms of error handling, my needs are usually as follows:

  • The error handler must write a message on the error output (std::cerr).
  • The error handler must be able to take several arguments and stream them into the error message.
  • The error handler must raise an exception.
  • The what() of the exception must return the same message that is printed on std::cerr.
  • The specific type of exception raised must be configurable.
  • Raising an error must be a one-function call.
  • The error handler must not rely on macros.

So that’ll be the basic criteria I’ll be relying on to design the error handler (we may add a few specifications along the way).

Step 0: Setup

To make it very light and simple, all the code will be in a single header file (it’s simpler to include a header into your project than a lib). But since there will probably be auxiliary functions (that won’t be part of the interface), we need a way to hide them.

That’s why we will put all the code in a full-static class1. There will be private static member functions (for these internal functions), public static member functions (the interface), and possibly types and such.

Step 1: Basic recursion and variadic templates

Starting with a simple recursion

To have a fully customizable error message, we need a variable number of arguments (and thus some variadic templates). We will recurse over the arguments2, streaming each of them into a stream, starting with the head.

template<typename THead>
static void raise_error_recursion(const THead & arg_head)
{
    std::cerr << arg_head << std::endl;
    throw;
}

template<typename THead, typename ...TTail>
static void raise_error_recursion(const THead & arg_head, const TTail & ...arg_tail)
{
    std::cerr << arg_head;
    raise_error_recursion(arg_tail...);
}

The first raise_error_recursion is the base case of the recursion: if there is only one argument left, we print it and then throw.

The second raise_error_recursion is the recursion loop. As long as there is more than one argument, we call this second overload, which streams arg_head into cerr and then calls itself with the remaining arguments. As soon as only one argument is left in the parameter pack, we end up in the first overload, which ends the recursion.

With a stream and a real exception

However, in the snippet just above, we don’t throw any actual exception, we just throw;. As a reminder, two of the specifications were:

  • The error handler must raise an exception.
  • The what() of the exception must return the same message that is printed on std::cerr.

So we need to throw a real exception, and that exception must be constructed with our error message.

As an example, we’ll use the std::runtime_error exception, which can be constructed with a std::string.

The problem is that we can’t just stream the error message into cerr anymore: we need a way to memorize the message so that, at the end, we can both stream it into cerr and construct our runtime_error.

A solution is to add a stringstream as a parameter of the recursive functions.

template<typename THead>
static void raise_error_recursion(std::ostringstream & error_string_stream, const THead & arg_head)
{
    error_string_stream << arg_head;
    const std::string current_error_str = error_string_stream.str(); 

    std::cerr << current_error_str << std::endl;
    throw std::runtime_error(current_error_str);
}

template<typename THead, typename ...TTail>
static void raise_error_recursion(std::ostringstream & error_string_stream, const THead & arg_head, const TTail & ...arg_tail)
{
    error_string_stream << arg_head;
    raise_error_recursion(error_string_stream, arg_tail...);
}

There, in the body of the recursion, we stream the error message into the stringstream instead of cerr. In the base case of the recursion, we convert this stream into a string that is then used to construct the exception and to print the error output.

Is that it?

These two functions are the backbone of the error handling. We don’t need anything more to fulfill most of the specifications.

But of course, it’d be great to add a few enhancements that’ll ease the use of the handler.

Step 2: Adding interface

This stringstream is a pain and should be invisible to the user. Thus, we’ll put the previous functions in the private part of the class and write a member function to be used as interface:

template<typename ...TArgs>
static void raise_error(const TArgs & ...args)
{
    std::ostringstream error_string_stream;
    raise_error_recursion(error_string_stream, args...);
}

raise_error is now a very simple function to use.

Step 3: Adding customizable exceptions

The exception as a template

The only spec that is not yet implemented is “The specific type of exception raised must be configurable“.

To do that, we will add a template parameter to each function. It represents the type of exception that must be raised.

class ErrorHandler
{
    ErrorHandler(); // Private constructor -- this is a full-static class
    
    template<typename TExceptionType, typename THead>
    static void raise_error_recursion(std::ostringstream & error_string_stream, const THead & arg_head)
    {
        error_string_stream << arg_head;
        const std::string current_error_str = error_string_stream.str();

        std::cerr << current_error_str << std::endl;
        throw TExceptionType(current_error_str);
    }

    template<typename TExceptionType, typename THead, typename ...TTail>
    static void raise_error_recursion(std::ostringstream & error_string_stream, const THead & arg_head, const TTail & ...arg_tail)
    {
        error_string_stream << arg_head;
        raise_error_recursion<TExceptionType>(error_string_stream, arg_tail...);
    }

public:

    template<typename TExceptionType, typename ...TArgs>
    static void raise_error(const TArgs & ...args)
    {
        std::ostringstream error_string_stream;
        raise_error_recursion<TExceptionType>(error_string_stream, args...);
    }

    template<typename TExceptionType>
    static void raise_error()
    {
        raise_error<TExceptionType>("<Unknown error>");
    }
};

This way, you can call raise_error with any exception type that is constructible from a std::string, like so:

ErrorHandler::raise_error<std::runtime_error>("Foo ", 42);

However, this is a bit heavy. Sometimes you just want to raise a generic error and don’t mind whether it is a runtime_error, an invalid_argument, etc.

That’s why we’ll add a default value for the template TException. Unfortunately, we can’t use std::exception for this default value because it can’t be constructed using a std::string.

What I suggest is to define our own generic exception, within the namespace of the ErrorHandler. This way, we’ll have a generic exception to use as a default value, and users may use it as a base class to implement custom exceptions, all related to error handling (which can be useful in try-catches).

A custom generic exception for the error handler

class BasicException : public std::exception
{
protected:
    std::string m_what;
public:
    BasicException(const std::string & what): m_what(what) {}
    BasicException(std::string && what): m_what(std::forward<std::string>(what)) {}
    const char * what() const noexcept override { return m_what.c_str(); };
};

Of course, BasicException publicly inherits from std::exception so that it can be used like any other standard exception3.

I implemented two constructors: one that builds the error message from a constant reference to a string, and one that builds it from an r-value reference (to be able to move data into the constructor).

And, of course, the what() virtual overload that returns the error message.

Using this exception as default, the raise_error functions now look like this:

template<typename TExceptionType = BasicException, typename ...TArgs>
static void raise_error(const TArgs & ...args)
{
    std::ostringstream error_string_stream;
    raise_error_recursion<TExceptionType>(error_string_stream, args...);
}

template<typename TExceptionType = BasicException>
static void raise_error()
{
    raise_error<TExceptionType>("<Unknown error>");
}

Now you can raise an error without having to provide an exception:

ErrorHandler::raise_error("Foo ", 42);

This will throw an ErrorHandler::BasicException by default.

Step 4: Adding an assert one-liner

The most common situation where you have to raise an error is if <something is wrong> then <raise an error>. It can also be seen as assert <expression>; if false, <raise an error>.

This is commonly encountered in unit testing, with functions of the form assert(expression, message_if_false);

That’s why I think it’s a good idea to add a single function that takes an expression and a parameter pack (the error message) and calls raise_error if the expression is not true.

template<typename TExceptionType = BasicException, typename ...TArgs>
static void assert(bool predicate, const TArgs & ...args)
{
    if (!predicate)
        raise_error<TExceptionType>(args...);
}

Using this, instead of writing this:

const auto result = compute_data(data);
if (result != ErrorCode::NO_ERROR)
    ErrorHandler::raise_error("Error encountered while computing data. Error code is ", result);

You’ll be able to write something like this:

const auto result = compute_data(data);
ErrorHandler::assert(result == ErrorCode::NO_ERROR, "Error encountered while computing data. Error code is ", result);

Step 5: Concept and constraints

We use a lot of templates, and many templates mean that users are likely to misuse them, leading to compilation errors. And when we talk about template-related compilation errors, we talk about almost illegible error messages.

But, lucky us, there is a way in C++20 to make these errors more readable while protecting our functions better: concepts and constraints.

We currently have two constraints:

  • TExceptionType must be constructible using a std::string.
  • Every TArgs... must be streamable.

So we’ll implement these two constraints within a single concept4:

template<typename TExceptionType, typename ...TArgs>
concept ErrorHandlerTemplatedTypesConstraints = requires(std::string s, std::ostringstream oss, TArgs... args)
{
    TExceptionType(s); // TExceptionType must be constructible using a std::string
    (oss << ... << args); // All args must be streamable
};

We now only have to add this concept as a constraint on our interface member functions:

template<typename TExceptionType = BasicException, typename ...TArgs>
requires ErrorHandlerTemplatedTypesConstraints<TExceptionType, TArgs...>
static void raise_error(const TArgs & ...args)
{
    std::ostringstream error_string_stream;
    raise_error_recursion<TExceptionType>(error_string_stream, args...);
}

template<typename TExceptionType = BasicException, typename ...TArgs>
requires ErrorHandlerTemplatedTypesConstraints<TExceptionType, TArgs...>
static void assert(bool predicate, const TArgs & ...args)
{
    if (!predicate)
        raise_error<TExceptionType>(args...);
}

The complete code

If we put everything together, the resulting header file looks like this:

#pragma once

#include <iostream>
#include <sstream>

template<typename TExceptionType, typename ...TArgs>
concept ErrorHandlerTemplatedTypesConstraints = requires(std::string s, std::ostringstream oss, TArgs... args)
{
    TExceptionType(s); // TExceptionType must be constructible using a std::string
    (oss << ... << args); // All args must be streamable
};

class ErrorHandler
{
    ErrorHandler(); // Private constructor -- this is a full-static class
    
    template<typename TExceptionType, typename THead>
    static void raise_error_recursion(std::ostringstream & error_string_stream, const THead & arg_head)
    {
        error_string_stream << arg_head;
        const std::string current_error_str = error_string_stream.str();

        std::cerr << current_error_str << std::endl;
        throw TExceptionType(current_error_str);
    }

    template<typename TExceptionType, typename THead, typename ...TTail>
    static void raise_error_recursion(std::ostringstream & error_string_stream, const THead & arg_head, const TTail & ...arg_tail)
    {
        error_string_stream << arg_head;
        raise_error_recursion<TExceptionType>(error_string_stream, arg_tail...);
    }

public:

    class BasicException : public std::exception
    {
    protected:
        std::string m_what;
    public:
        BasicException(const std::string & what): m_what(what) {}
        BasicException(std::string && what): m_what(std::forward<std::string>(what)) {}
        const char * what() const noexcept override { return m_what.c_str(); };
    };

    template<typename TExceptionType = BasicException, typename ...TArgs>
    requires ErrorHandlerTemplatedTypesConstraints<TExceptionType, TArgs...>
    static void raise_error(const TArgs & ...args)
    {
        std::ostringstream error_string_stream;
        raise_error_recursion<TExceptionType>(error_string_stream, args...);
    }

    template<typename TExceptionType = BasicException, typename ...TArgs>
    requires ErrorHandlerTemplatedTypesConstraints<TExceptionType, TArgs...>
    static void assert(bool predicate, const TArgs & ...args)
    {
        if (!predicate)
            raise_error<TExceptionType>(args...);
    }
};

To go further

We could push the genericity of the handler a little further and try to replace the std::cerr output stream with a customizable output stream that defaults to std::cerr.

However, that would mean more functions and longer code, and the goal is to keep the header as short as possible.

It’s up to you now to stop here or go further and complete the implementation.

Wrapping up

This is certainly not the most complete way to handle errors in your program, but this is, in my opinion, a simple and clean way to do it while fulfilling the established specifications.

Up to you now to define your own specifications and to write your own error handler if your needs are different than mine.

You can use this code (almost) as you wish, as it is under the CC0-1.0 License.

Thanks for reading and see you next week!

Author: Chloé Lourseyre
Editor: Peter Fordham

Addenda

Github repo

SenuaChloe/SimplestErrorHandler (github.com)

Useful documentation

I used a lot of advanced features of C++. To learn more about them, follow the links:

Notes

  1. “Full-static” means that the class won’t be instantiable. All its member functions and member variables will be static, and the constructor will be private. That’s why we need a class and can’t use a namespace here: with a namespace, we couldn’t hide the auxiliary functions.
    If you don’t want to use a full-static class but still want to hide the auxiliary functions, you’d have to put them in a cpp file. But if you do this, you need to compile the error handler as a lib in order to use it in other projects.
  2. To learn about recursion, read this page: Recursion – GeeksforGeeks. To learn about variadic parameters and templates, here you go: Variadic arguments – cppreference.com and Parameter pack(since C++11) – cppreference.com.
  3. See std::exception – cppreference.com to have some insight into how exceptions work in C++.
  4. The only small problem with that is that we have to implement the concept in the global namespace. That is why I used a pretty long name that begins with “ErrorHandler”: to avoid name collision as much as possible.

The three types of development

Author: Chloé Lourseyre
Editor: Peter Fordham

This week we’ll discuss a serious topic affecting the developer community. This touches several languages, but the C++ community is one of the most affected by it1.

There are several “ways” to write C++. I mean “way” as a collection of constraints and circumstances that will affect what you can do, what you should do, and how you can and should do it.

This may seem vague, but think of it as types of environments that can drastically change your approach to the code you are reading, editing, and writing.

Based on my experience, I can distinguish three types of development2.

The three categories

The (almost) solo development

This is the type of development that has the fewest constraints (if not none at all). When you are developing alone or with very few collaborators, you can freely choose what to do and how you want to do it.

The collaborative licensed development

When you are on a bigger project, you will see constraints arise. Most of the time, these constraints concern which libraries you can or cannot use.

For instance, if you want to sell your software, you can’t use JRL-licensed software, because that license prohibits commercial use.

This is generally a type of development that concerns small companies or freelance developers.

The industrial development

Some projects are launched by big companies or company groups. They can be developed over numerous years (even decades if you include the maintenance phase), but more importantly they have heavy constraints over which libraries you can use and in what environment the development takes place.

This is typically the type of development where the C++ version is the oldest (often prior to C++17, sometimes even C++03). This is because it is the management (not to say the salesmen) that pilots the budget of this kind of project and decides whether the environment can be migrated or not.

A lot of developers who work on this kind of project arrive in the middle of it and face heavy resistance when they try to improve the environment3.

In this kind of project, you often have to deal with legacy code or with a part of the codebase that you can’t edit4.

What is specific to C++?

C++ is a complicated language, not only because of its syntax and language specification, but also because there are hundreds (if not thousands) of different possible environments.

There are dozens of C++ compilers, ported on numerous operating systems. As of today, there are 5 different versions of the standard5 that are present in professional projects.

It is thus essential for each C++ developer to adapt their advice to the person they are talking to, because depending on the situation, you may say the complete opposite of what you would have said otherwise.

Clashing grounds

There is one place where the three types of development are represented at the same time: the internet. When you lurk on dedicated forums, you’ll eventually find people currently working on each of these types of projects.

Overall this is a good thing, that all kinds of developers can meet over the internet, but it can lead to communication issues.

Indeed, if a developer who has only ever performed one type of development tries to give advice or feedback to a developer from another type of environment, a lot of that advice and feedback will not take the other developer’s constraints and circumstances into account and will not be useful.

Let’s take a few examples to illustrate that.

Example from r/cpp

Here is an example that comes from Reddit, specifically from the subreddit r/cpp:

This example is typical: while courteous, it misses the point and is based on two sophisms:

  • “Every modern C++ compiler produces warnings if […]”. It greatly depends on what “modern” means, but there are a lot of compilers that do not work like your standard compiler. I’m thinking of compilers for embedded systems, experimental compilers, home-made compilers that you can sometimes encounter on very specific projects, or even older compilers that did not implement the said warning at the time. Trying to generalize in this context is somewhat of a fallacy, especially since the “printf API […] doesn’t enforce it”.
  • “[…] honestly you should have compiler warnings enabled anyway.”. I hear that a lot, and I think most of those who say it have never worked under project and environment constraints. When you arrive on a project, you don’t always have a say in how the environment works, especially if the project had been ongoing for several years before you arrived. Our work (as C++ experts and such) is to try and change mentalities, but sometimes it doesn’t work, unfortunately. There are also situations where, when you arrive on a project, there are hundreds and hundreds of warnings, and the management won’t give you the time to correct them all. In this context, warning-hunting is a lost cause.

Of course, we should always try to change the world for the better and dismantle improper environments, but denying the existence of these contexts is denying the reality of how the world of development sometimes works.

When that occurs, try to add nuance to your statements and leave them open for people to explain what their constraints are.

Instead of

“Yes you have to enable -Wall but honestly you should have compiler warnings enabled anyway.”

say something like:

“If you can you should enable -Wall because it will help you prevent the issue and others as well.”

Example from SO

Here is another example, taken from Stack Overflow:

Pretty short, but a lot to say nonetheless.

“Best advice is not to write macros like that.” Okay, no problem, but why? Because of how macros work? Because you can’t do whatever you want to do with it? Because macros are bad design and there is a working alternative?

The question states the following constraint:

Is the question “Why do you need to use __LINE__?” really relevant? Since the question is based on the statement just above, knowing why the user needs __LINE__ won’t help the original poster6.

Writing relevant advice is really easy when you put some thought into it. For instance:

This comment simply states that pointers are usually bad, while admitting that depending on the case they may be needed. It was written to warn the original poster about the dangers of pointers while remaining relevant.

Wrapping up

When you want to be helpful to other developers, you have to pay attention to their circumstances. Your answer won’t reach its target if it is irrelevant.

Plus you have to ask yourself: are you really helping anyone if your advice can be summarized by “You have to change your environment” to someone who can’t or won’t? You have to adapt to these situations, put your words into perspective, so the person you are talking to will acknowledge your advice, even if they can’t apply it directly.

It’s easy to fall into sophism and authoritative arguments. Always try to explain your arguments, even if they seem trivial to you. This will give weight to them. Moreover, maybe it’s trivial to you but it may not be so for others. And if you don’t manage to simply explain your argument, there is a really good chance that it’s a fallacy.

Thanks for reading and see you next week!

Author: Chloé Lourseyre
Editor: Peter Fordham

Addendum

Notes

  1. In this article, I’ll use C++ to illustrate, but everything that is said can be applied to any programming language. I explain why C++ is specifically affected by this later in the article.
  2. Depending on your own experience, you may discover other types of development. They supplement the existing ones.
  3. The definition of improve here is the key. What a new developer on a project might consider an improvement isn’t the same as what a senior developer, project manager, accountant, salesman or customer would consider an improvement. “It’s great that you’ve spent a year bringing the codebase up to C++20 with new GCC and clang , but you haven’t fixed any of the reported bugs, implemented the new features we promised to the customer and now we don’t support our legacy platform anymore…”
  4. For instance: because it is owned by another team or company, because it has already been sold to the client, or because it has already been QA’d and it’d take weeks to be QA’d again.
  5. I’m only counting from C++03 (so C++03, 11, 14, 17 and 20) since C++98 is very similar to C++03.
  6. It may sometimes occur that the original poster states a constraint that they could avoid. But it is unconstructive to “babysit” the OP in that case; it would be better to propose alternatives with examples.

Prettier switch-cases

Author: Chloé Lourseyre
Editor: Peter Fordham

I learned this syntax during a talk at the CppCon 2021 given by Herb Sutter, Extending and Simplifying C++: Thoughts on Pattern Matching using `is` and `as` – Herb Sutter – YouTube. You also find this talk on Sutter’s blog, Sutter’s Mill – Herb Sutter on software development.

Context

Say you have a switch-case block with no fallthrough (this is important), like this one:

enum class Foo {
    Alpha,
    Beta,
    Gamma,
};

int main()
{
    std::string s;
    Foo f;

    // ...
    // Do things with s and f 
    // ...

    switch (f)
    {
        case Foo::Alpha:
            s += "is nothing";
            break;
        case Foo::Beta:
            s += "is important";
            f = Foo::Gamma;
            break;
        case Foo::Gamma:
            s += "is very important";
            f = Foo::Alpha;
    }
    
    // ...
}

Nothing fantastic to say about this code: it appends a suffix to the string depending on the value of f, sometimes changing f at the same time.

Now, let’s say we add a Delta to the enum class Foo, which behaves exactly like Gamma but with a small difference in the text. This has a good chance of being the result:

enum class Foo {
    Alpha,
    Beta,
    Gamma,
    Delta,
};

int main()
{
    std::string s;
    Foo f;

    // ...
    // Do things with s and f 
    // ...

    switch (f)
    {
        case Foo::Alpha:
            s += "is nothing";
            break;
        case Foo::Beta:
            s += "is important";
            f = Foo::Alpha;
            break;
        case Foo::Gamma:
            s += "is very important";
            f = Foo::Alpha;
        case Foo::Delta:
            s += "is not very important";
            f = Foo::Alpha;
    }

    // ...
}

The new case block is obviously copy-pasted. But did you notice the bug?

Since, in the first version, the developer of this code did not feel it necessary to put a break at the end of the last case, when we copy-pasted the Gamma case we left it without a break. So there will be an unwanted fallthrough in this switch.

New syntax

The new syntax presented in this article makes this kind of mistake less likely and makes the code a bit clearer.

Here it is:

    switch (f)
    {
        break; case Foo::Alpha:
            s += "is nothing";
        break; case Foo::Beta:
            s += "is important";
            f = Foo::Alpha;
        break; case Foo::Gamma:
            s += "is very important";
            f = Foo::Alpha;
        break; case Foo::Delta:
            s += "is not very important";
            f = Foo::Alpha;
    }

This is it: we put the break statement before the case.

This may look strange to you, since the very first break is useless and there is no closing break in the last case block, but it is perfectly functional and convenient.

If you begin each of your case blocks with break; case XXX:, you will never have a fallthrough bug again.

Benefits

The first benefit is that it avoids the bug presented in the first section, where you forget to add a break when adding a case block. Even if you don’t copy-paste to create your new block, forgetting the break will be visually obvious (your case statement won’t be aligned with the others).

But the real benefit (in my opinion) is that the syntax is, overall, nicer. For each case, you save a line by not putting the break inside the case block, and everyone will see at first glance that the switch-case has no fallthrough.

Of course, beauty is subjective. That includes the beauty of code. However, things like better alignment, clearer intentions, and line economy1 are, it seems to me, quite objective as benefits.

Disclaimer

The first time I saw this syntax, I quickly understood how it worked and why it was better than the “classic” syntax. However, I know that several people were confused and had to ask for an explanation.

But that’s almost always the case when introducing a new syntax.

So keep in mind that your team may be confused at first if you use it in a shared codebase. Be sure to explain (either in person or in the comments) so people can quickly adapt to this new form.

Wrapping up

This is certainly not a life-changing tip that is presented here, but I wanted to share it because I really like how it looks.

It’s yet another brick in the wall of making your code prettier.

Thanks for reading and see you next week.

Author: Chloé Lourseyre
Editor: Peter Fordham

Addendum

Notes

  1. “Line economy” is beneficial when it discards non-informative statements, as the break is in this context. I would never say huge one-liners are better than a more detailed block of code (because they aren’t). Reuniting the break and the case keywords lets your code breathe (you can put an empty line in place of the break if you want to keep some space).