I don’t know which container to use (and at this point I’m too afraid to ask)

Author: Chloé Lourseyre
Editor: Peter Fordham

As far as containers go in C++, since std::vector is well suited for most cases (with the occasional std::map when you need key-value association1), it’s become easy to forget that there are other types of containers.

Each container has its strengths and weaknesses, so if you tend to forget what their specificities are, this article is a good start.

Disclaimer: Not all C++ containers are listed, only the most usable ones in my opinion. If you want to go further, you’ll find two very useful links in the addenda, at the bottom of the page.

How to choose a container

Criteria

First™ criterion: sequence or associative?

The first question you have to ask yourself when you need a container is: do you need a sequence container or an associative container?

In sequence containers, data is organized in an ordered and sequential way, with each value following the previous one. In memory, it doesn’t have to be contiguous (and often isn’t), but in practice, you access a value by knowing its index inside the container.

Unlike the sequence containers, associative containers do not store data as a sequence, but by associating a value to a key. Instead of using an index to refer to a stored value, you use its key.
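
To make the distinction concrete, here is a minimal sketch (the contents are made up for the example):

#include <iostream>
#include <map>
#include <string>
#include <vector>

int main()
{
    // Sequence container: a value is reached through its position
    std::vector<int> scores{10, 20, 30};
    std::cout << scores[1] << '\n';      // prints 20

    // Associative container: a value is reached through its key
    std::map<std::string, int> ages{{"Alice", 31}, {"Bob", 27}};
    std::cout << ages["Alice"] << '\n';  // prints 31
}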

Sequence containers criteria

  • Is the size of the container static? If so, an std::array is what you need (and it’s not really impacted by the other criteria). In any other case, you will need to answer other criteria.
  • Will the size of the container vary a lot? This criterion is important for memory allocation. If you make the size of a container vary a lot when it’s not designed for it, you may experience slowness and memory over-usage2.
  • Is the order important? This criterion relates to data structures where you can add and erase values only at the beginning or at the end, while the middle remains hidden (FIFO and FILO structures).
  • Do you need to add/erase values at the extremities of the container (at the beginning or at the end)? Some containers do that with maximum efficiency.
  • Do you need to be efficient at inserting/deleting data in the middle of the structure? Same as the previous criterion: sometimes you do a lot of additions and deletions, but in the middle of the container. There are also structures that are efficient for that purpose.
  • Do you need to find the nth element? Depending on your container, search is not always optimal.
  • Do you need to merge collections? Depending on your container, structure merge is not always optimal either (some need additional allocation and some don’t).

Associative containers criteria

  • There are two kinds of associative containers: the ones that associate a key with a value (possibly of different types), which are maps, and the ones for which the key is the value itself, which are sets.
  • By default, the keys are unique. But there is a variant where keys can have multiple entries: the multimap/multiset.
  • The inner structures of these containers can be implemented in two ways. By default, they are ordered by key. But they can also be unordered and use hashable keys. These are the unordered_ versions of each.

Note: the ordered/unordered distinction is important for associative containers. Most of the time, ordered associative containers use balanced binary trees while the unordered ones use hash tables. This has an impact on performance, and it means that for the hash-table versions you don’t need to implement an order operator.
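
To give a quick overview of these variants, here is a minimal sketch (the variable names are only illustrative):

#include <map>
#include <set>
#include <string>
#include <unordered_map>
#include <unordered_set>

int main()
{
    std::map<std::string, int> ranking;          // unique keys, ordered by key
    std::multimap<std::string, int> synonyms;    // duplicate keys allowed, still ordered
    std::set<int> ids;                           // the key is the value itself, ordered
    std::multiset<int> rolls;                    // duplicate values allowed, ordered
    std::unordered_map<std::string, int> cache;  // hash table, keys must be hashable
    std::unordered_set<int> visited;             // hash table version of set
}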

Containers’ matrix

I present you with two matrices (one for sequence containers and one for associative containers).

Each matrix shows which container is best suited to which criteria3.

Sequence containers

(for the queue, insertion is at the back and deletion at the front)

Associative containers

Flowchart cheat sheet

Here is an easy-to-read and printable flowchart that will help you choose which container to use in any situation4: Joe Gibson’s data structure selection flowchart.

But what happens in real life?

As I said in the introduction of the article, most real-life container problems can be easily resolved with std::vector and std::map. With that in mind, to what extent is it useful to search for the “optimal” container every single time?

The semantics behind each container are linked to how you use it. When you see deque you’ll think “insert and erase at the front and the back”, when you see queue you’ll think “insert at the back and delete at the front”, and when you see list you’ll think “insert and erase in the middle”.

We see each container in terms of how we use it more than in terms of how efficient it is to use (which is linked but not equivalent). But there are a lot of operations that are available in several containers. For instance, you can insert and erase values at the end of a vector just as easily as at the end of a deque (each of them has the push_back and pop_back member functions).
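
A minimal sketch of that shared interface (nothing here is container-specific except the last two calls):

#include <deque>
#include <vector>

int main()
{
    std::vector<int> v;
    std::deque<int> d;

    // The same operations exist on both containers
    v.push_back(1);
    d.push_back(1);
    v.pop_back();
    d.pop_back();

    // Only the deque also provides them at the front
    d.push_front(2);
    d.pop_front();
}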

But you may want to say: “But inserting and deleting at the end of a vector is strongly inefficient!”. In theory, yes. But in practice, it only matters if it is within a critical section of the code. Most of the time, if you are within the 80% of Pareto’s principle, using a vector instead of a deque won’t affect performance.

So let me ask you this: if using a vector works, and if it doesn’t affect performance, why would you use any other container?

The vector is the most understandable structure because it is quite close to a plain old array. Most C++ users aren’t experts, and std::vector is the container they know how to use best. We shouldn’t make mundane code any more difficult to read and understand.

Of course, as soon as you have special needs, you should use the most appropriate container, but that doesn’t happen very often.

(I took std::vector as an example here for sequence containers, but the same applies to std::map and associative containers)

Optimization?

Optimization must be an opt-in behavior, not an up-front behavior.

When you first write a piece of code, you should not think about performance. You should write it in the clearest way possible.

Only then may you think of the impact your code has on global performance. You may run benchmarks and static/dynamic analysis to know if your code needs to be optimized and what is badly optimized (execution time? memory usage? etc.).

If the code you are writing is not in the 20% of Pareto’s principle, you should not think about optimization at all. If it is, you can think about it after you have written it.

A more efficient way to look at it

I present you with a flowchart cheat sheet (in the same spirit as Joe Gibson’s flowchart) that summarizes what has been said in the previous section:

The principle is simple. You have to ask yourself two questions:

  1. Can I do what I want to do with a vector/map?
  2. Am I in a critical section of the code?

If your answers are 1) Yes and 2) No, then you should use std::vector or std::map.

Wrapping up

It is more than useful to know the differences between the C++ containers. However, in your everyday life, try not to think too much about it. It is a fairly common mistake to overthink situations that can have simple solutions. But this knowledge is not wasted: every now and then, you’ll run into a specific situation that’ll require a specific container.

Pareto’s principle can also apply to this: more than 80% of the tools we know are useful in less than 20% of the situations.

Thanks for reading and see you next week!

Author: Chloé Lourseyre
Editor: Peter Fordham

Addenda

Sources and resources

The flowchart is from the following GitHub repo, where you can also find a more detailed view of each container: cpp-cheat-sheet/Data Structures and Algorithms.md at master · gibsjose/cpp-cheat-sheet · GitHub.

You will also find a neat cheat sheet, a bit more graphical and dense (along with other kinds of cheat sheet) here: C++ Cheat Sheets & Infographics | hacking C++ (hackingcpp.com).

Joe Gibson’s data structure selection flowchart

https://raw.githubusercontent.com/gibsjose/cpp-cheat-sheet/master/General/Data%20Structures%20Selection.png
Joe Gibson’s data structure selection flowchart (source: GitHub – gibsjose/cpp-cheat-sheet: C++ Syntax, Data Structures, and Algorithms Cheat Sheet)

Notes

  1. Really, we can agree that std::unordered_map is better than map for basic key-value association. But in real life, I see too much indiscriminate use of map and almost no unordered_map.
  2. Plus, it’s not just the global size that matters, it is also the local maximum size, e.g. if a container will never exceed 20 entries in a critical section of the code, we can pre-allocate locally to avoid unexpected reallocation, while maintaining the flexibility of using a variable-size data structure.
  3. Of course, each container can perform most of the listed operations. But the matrices summarize which containers are best suited for each need.
  4. Unfortunately, the presented flowchart lacks any unordered_ associative container. But you can think of it like this: “Values need to be ordered? Yes -> map/set ; No -> unordered_map/unordered_set”.

Prettier switch-cases

Author: Chloé Lourseyre
Editor: Peter Fordham

I learned this syntax during a talk given by Herb Sutter at CppCon 2021, Extending and Simplifying C++: Thoughts on Pattern Matching using `is` and `as` – Herb Sutter – YouTube. You can also find this talk on Sutter’s blog, Sutter’s Mill – Herb Sutter on software development.

Context

Say you have a switch-case block with no fallthrough (this is important), like this one:

#include <string>

enum class Foo {
    Alpha,
    Beta,
    Gamma,
};

int main()
{
    std::string s;
    Foo f;

    // ...
    // Do things with s and f 
    // ...

    switch (f)
    {
        case Foo::Alpha:
            s += "is nothing";
            break;
        case Foo::Beta:
            s += "is important";
            f = Foo::Gamma;
            break;
        case Foo::Gamma:
            s += "is very important";
            f = Foo::Alpha;
    }
    
    // ...
}

Nothing fantastic to say about this code: it appends a suffix to the string depending on the value of f, sometimes changing f at the same time.

Now, let’s say we add a Delta to the enum class Foo, which does exactly like Gamma, but with a small difference in the text. This has a good chance to be the result:

#include <string>

enum class Foo {
    Alpha,
    Beta,
    Gamma,
    Delta,
};

int main()
{
    std::string s;
    Foo f;

    // ...
    // Do things with s and f 
    // ...

    switch (f)
    {
        case Foo::Alpha:
            s += "is nothing";
            break;
        case Foo::Beta:
            s += "is important";
            f = Foo::Alpha;
            break;
        case Foo::Gamma:
            s += "is very important";
            f = Foo::Alpha;
        case Foo::Delta:
            s += "is not very important";
            f = Foo::Alpha;
    }

    // ...
}

The new case block is obviously copy-pasted. But did you notice the bug?

Since, in the first version, the developer of this code did not feel it necessary to put a break at the end of the last case, the Gamma case we copy-pasted from was left without a break. So there will be an unwanted fallthrough from Gamma into Delta in this switch.

New syntax

The new syntax presented in this article makes this kind of mistake less likely and makes the code a bit clearer.

Here it is:

    switch (f)
    {
        break; case Foo::Alpha:
            s += "is nothing";
        break; case Foo::Beta:
            s += "is important";
            f = Foo::Alpha;
        break; case Foo::Gamma:
            s += "is very important";
            f = Foo::Alpha;
        break; case Foo::Delta:
            s += "is not very important";
            f = Foo::Alpha;
    }

This is it: we put the break statement before the case.

This may look strange to you, since the very first break is useless and there is no closing break in the last case block, but it is perfectly functional and convenient.

If you begin each of your case blocks with break; case XXX:, you will never have a fallthrough bug again.

Benefits

The first benefit is avoiding the bug presented in the first section, where you forget to add a break when adding a case block. Even if you don’t copy-paste to create your new block, it’ll be visually obvious if you forget the break (your case statement won’t be aligned with the others).

But the real benefit (in my opinion) is that the syntax is, overall, nicer. For each case, you save a line by not putting the break within the case block, and everyone will notice at first sight that the switch-case has no fallthrough.

Of course, beauty is subjective. That includes the beauty of code. However, things like better alignment, clearer intentions, and line economy1 are, it seems to me, quite objective as benefits.

Disclaimer

The first time I saw this syntax, I quickly understood how it worked and why it was better than the “classic” syntax. However, I know that several people were confused and had to ask for an explanation.

But that’s almost always the case when introducing a new syntax.

So keep in mind that your team may be confused at first if you use it in a shared codebase. Be sure to explain (either in person or in the comments) so people can quickly adapt to this new form.

Wrapping up

This is certainly not a life-changing tip that is presented here, but I wanted to share it because I really like how it looks.

It’s yet another brick in the wall of making your code prettier.

Thanks for reading and see you next week.

Author: Chloé Lourseyre
Editor: Peter Fordham

Addendum

Notes

  1. “Line economy” is beneficial when it discards non-informative statements, just like the break is in this context. I would never say huge one-liners are better than a more detailed block of code (because they aren’t). Reuniting the break and the case keywords lets your code breathe (you can put an empty line in place of the break if you want to keep the spacing).

Best ways to convert an enum to a string

Author: Chloé Lourseyre

One of the oldest problems C++ developers have ever encountered is how to print the value of an enum type.

All right, I may be a little overly dramatic here, but this is an issue many C++ developers encountered, even the most casual ones.

The thing is, there isn’t one true answer to that issue. It depends on many things such as your constraints, your needs, and, as always, the C++ version of your compiler.

This article is a small list of ways to add reflection to enums.

NB: If you know of a way that isn’t listed here and has its own benefits, feel free to share in the comments.

Magic Enum library

Magic Enum is a header-only library that gives static reflection to enums.

You can convert from and to strings and you can iterate over the enum values. It adds the “enum_cast” feature.

Find it here: GitHub – Neargye/magic_enum: Static reflection for enums (to string, from string, iteration) for modern C++, work with any enum type without any macro or boilerplate code
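
For illustration, here is a minimal sketch of how the library is typically used, assuming the magic_enum.hpp header is on your include path (check the documentation of the version you use for the exact header name and API):

#include <iostream>
#include <magic_enum.hpp>

enum class Color { Red, Green, Blue };

int main()
{
    // Enum value to string
    std::cout << magic_enum::enum_name(Color::Green) << '\n';  // prints "Green"

    // String to enum value (returns a std::optional<Color>)
    if (auto c = magic_enum::enum_cast<Color>("Blue"))
        std::cout << static_cast<int>(*c) << '\n';              // prints 2
}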

Drawbacks

  • It’s a third-party library.
  • Only works in C++17 and later.
  • You need specific versions of your compiler for it to work (Clang >= 5, MSVC >= 15.3 and GCC >= 9).
  • You have a few other constraints related to the implementation of the library. Check the Limitations page of the documentation (magic_enum/limitations.md at master · Neargye/magic_enum · GitHub)

Using a dedicated function with an exception

Static version

constexpr is a magnificent tool that allows us to statically define stuff. When used on a function’s return type, it allows the return value of the function to be evaluated at compile time.

In this version, I added an exception in the default case, so if an item is ever added to the enum and not handled here, the exception will be raised.

#include <iostream>
#include <stdexcept>

enum class Esper { Unu, Du, Tri, Kvar, Kvin, Ses, Sep, Ok, Naux, Dek };

constexpr const char* EsperToString(Esper e)
{
    switch (e)
    {
        case Esper::Unu: return "Unu";
        case Esper::Du: return "Du";
        case Esper::Tri: return "Tri";
        case Esper::Kvar: return "Kvar";
        case Esper::Kvin: return "Kvin";
        case Esper::Ses: return "Ses";
        case Esper::Sep: return "Sep";
        case Esper::Ok: return "Ok";
        case Esper::Naux: return "Naux";
        case Esper::Dek: return "Dek";
        default: throw std::invalid_argument("Unimplemented item");
    }
}

int main()
{
    std::cout << EsperToString(Esper::Kvin) << std::endl;
}

Dynamic version

The thing is, having several return statements in a constexpr function requires C++14. Prior to C++14, you can remove the constexpr specifier to write a dynamic version of this function:

#include <iostream>
#include <stdexcept>

enum class Esper { Unu, Du, Tri, Kvar, Kvin, Ses, Sep, Ok, Naux, Dek };

const char* EsperToString(Esper e)
{
    switch (e)
    {
        case Esper::Unu: return "Unu";
        case Esper::Du: return "Du";
        case Esper::Tri: return "Tri";
        case Esper::Kvar: return "Kvar";
        case Esper::Kvin: return "Kvin";
        case Esper::Ses: return "Ses";
        case Esper::Sep: return "Sep";
        case Esper::Ok: return "Ok";
        case Esper::Naux: return "Naux";
        case Esper::Dek: return "Dek";
        default: throw std::invalid_argument("Unimplemented item");
    }
}

int main()
{
    std::cout << EsperToString(Esper::Kvin) << std::endl;
}

Prior to C++11, you can remove the enum class specifier and use a plain enum instead.

Drawbacks

  • Having several returns in a constexpr function is C++14 (for the static version).
  • Specific to each enum and very verbose.
  • Is exception-unsafe.

Using a dedicated exception-safe function

Static version

Sometimes you prefer code that doesn’t throw, no matter what. Or maybe you are like me and compile with -Werror. If so, you can write an exception-safe function without the default case.

You just have to watch out for warnings when you need to add an item.

#include <iostream>

enum class Esper { Unu, Du, Tri, Kvar, Kvin, Ses, Sep, Ok, Naux, Dek };

constexpr const char* EsperToString(Esper e) noexcept
{
    switch (e)
    {
        case Esper::Unu: return "Unu";
        case Esper::Du: return "Du";
        case Esper::Tri: return "Tri";
        case Esper::Kvar: return "Kvar";
        case Esper::Kvin: return "Kvin";
        case Esper::Ses: return "Ses";
        case Esper::Sep: return "Sep";
        case Esper::Ok: return "Ok";
        case Esper::Naux: return "Naux";
        case Esper::Dek: return "Dek";
    }
}

int main()
{
    std::cout << EsperToString(Esper::Kvin) << std::endl;
}

Dynamic version

Again, a dynamic version without constexpr:

#include <iostream>

enum class Esper { Unu, Du, Tri, Kvar, Kvin, Ses, Sep, Ok, Naux, Dek };

const char* EsperToString(Esper e) noexcept
{
    switch (e)
    {
        case Esper::Unu: return "Unu";
        case Esper::Du: return "Du";
        case Esper::Tri: return "Tri";
        case Esper::Kvar: return "Kvar";
        case Esper::Kvin: return "Kvin";
        case Esper::Ses: return "Ses";
        case Esper::Sep: return "Sep";
        case Esper::Ok: return "Ok";
        case Esper::Naux: return "Naux";
        case Esper::Dek: return "Dek";
    }
}

int main()
{
    std::cout << EsperToString(Esper::Kvin) << std::endl;
}

Prior to C++11, you can remove the enum class specifier and use a plain enum instead.

Drawbacks

  • Having several returns in a constexpr function is C++14 (for the static version).
  • Specific to each enum and very verbose.
  • The warnings are prone to be ignored.

Using macros

Macros can do many things that regular code can’t do. Here are two implementations using macros.

Static version

#include <iostream>

#define ENUM_MACRO(name, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10)\
    enum class name { v1, v2, v3, v4, v5, v6, v7, v8, v9, v10 };\
    const char *name##Strings[] = { #v1, #v2, #v3, #v4, #v5, #v6, #v7, #v8, #v9, #v10};\
    template<typename T>\
    constexpr const char *name##ToString(T value) { return name##Strings[static_cast<int>(value)]; }

ENUM_MACRO(Esper, Unu, Du, Tri, Kvar, Kvin, Ses, Sep, Ok, Naux, Dek);

int main()
{
    std::cout << EsperToString(Esper::Kvin) << std::endl;
}

Dynamic version

Very similar to the static one, but if you need this in a version prior to C++11, you’ll have to get rid of the constexpr specifier. Also, since it’s a pre-C++11 version, you can’t have enum class, you’ll have to go with a plain enum instead.

#include <iostream>

#define ENUM_MACRO(name, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10)\
    enum name { v1, v2, v3, v4, v5, v6, v7, v8, v9, v10 };\
    const char *name##Strings[] = { #v1, #v2, #v3, #v4, #v5, #v6, #v7, #v8, #v9, #v10 };\
    const char *name##ToString(int value) { return name##Strings[value]; }

ENUM_MACRO(Esper, Unu, Du, Tri, Kvar, Kvin, Ses, Sep, Ok, Naux, Dek);

int main()
{
    std::cout << EsperToString(Kvin) << std::endl;
}

Drawbacks

  • Uses macros (I could, and will, write an article about why macros are bad practice in C++, but I won’t do it here. For now, just keep in mind that if you don’t know why macros can be bad, then you shouldn’t use them.)
  • You need to write a different macro each time you need a reflective enum with a different number of items (with a different macro name, which is upsetting).

Using macros and boost

We can work around the “fixed number of enum items” drawback of the previous version by using Boost.

Static version

#include <iostream>
#include <boost/preprocessor.hpp>

#define PROCESS_ONE_ELEMENT(r, unused, idx, elem) \
  BOOST_PP_COMMA_IF(idx) BOOST_PP_STRINGIZE(elem)

#define ENUM_MACRO(name, ...)\
    enum class name { __VA_ARGS__ };\
    const char *name##Strings[] = { BOOST_PP_SEQ_FOR_EACH_I(PROCESS_ONE_ELEMENT, %%, BOOST_PP_VARIADIC_TO_SEQ(__VA_ARGS__)) };\
    template<typename T>\
    constexpr const char *name##ToString(T value) { return name##Strings[static_cast<int>(value)]; }

ENUM_MACRO(Esper, Unu, Du, Tri, Kvar, Kvin, Ses, Sep, Ok, Naux, Dek);

int main()
{
    std::cout << EsperToString(Esper::Kvin) << std::endl;
}

Here, PROCESS_ONE_ELEMENT “converts” each item to its stringized version (calling BOOST_PP_STRINGIZE), and BOOST_PP_SEQ_FOR_EACH_I iterates over every item of __VA_ARGS__ (which is the whole macro’s parameter pack).

Dynamic version

Again, it’s a very similar version of the static one, but without the constexpr or other C++11 specifiers.

#include <iostream>
#include <boost/preprocessor.hpp>

#define PROCESS_ONE_ELEMENT(r, unused, idx, elem) \
  BOOST_PP_COMMA_IF(idx) BOOST_PP_STRINGIZE(elem)

#define ENUM_MACRO(name, ...)\
    enum name { __VA_ARGS__ };\
    const char *name##Strings[] = { BOOST_PP_SEQ_FOR_EACH_I(PROCESS_ONE_ELEMENT, %%, BOOST_PP_VARIADIC_TO_SEQ(__VA_ARGS__)) };\
    const char *name##ToString(int value) { return name##Strings[value]; }

ENUM_MACRO(Esper, Unu, Du, Tri, Kvar, Kvin, Ses, Sep, Ok, Naux, Dek);

int main()
{
    std::cout << EsperToString(Kvin) << std::endl;
}

Drawbacks

  • Uses macros.
  • Uses Boost.

NB: While still a third-party library, Boost is often more readily accepted than other libraries (such as the little-known Magic Enum), which is (inter alia) why this version may be preferred to the first one.

Wrapping up

Here is a little summary of the methods described here:

Name | Is static? | Is generic? | 3rd party libraries | Uses macros? | Is exception-safe?
Magic Enum | Yes (C++17) | Yes | Yes (Magic Enum) | No | No
Function w/ exception | Yes (C++14) | No | No | No | No
Function w/o exception | Yes (C++14) | No | No | No | Yes
Macro | Yes (C++11) | No | No | Yes | Yes
Macro & Boost | Yes (C++11) | Yes | Yes (Boost) | Yes | Yes

Again, if you know a good way to convert enums to string, please say so in the comments.

Thanks for reading and see you next week!

Author: Chloé Lourseyre

How to choose between a setter and a reference-getter?

Author: Chloé Lourseyre

Context

When you implement a class, you often have to implement accessors to one or several attributes of this class to edit them.

There are two ways to do that:

  • A setter, a method that takes the new value for the attribute as argument.
  • A reference-getter, a method that returns a reference to the attribute itself.

Here is a short example that shows how to access the attribute bar with each method.

class Foo {
    int m_bar;

public:
    // Setter
    void set_bar(int bar) {
        m_bar = bar;
    }

    // Reference-getter
    int & get_bar() {
        return m_bar;
    }
};

int main() {
    Foo foo;

    // Editing via setter
    foo.set_bar(42);

    // Editing via reference-getter
    foo.get_bar() = 84;

    return 0;
}

Some may argue that there are more methods to do that, but I assert that they are just variations of one of these two.

The setter

A setter is a write-only interface to a class. You provide a value, and the class is updated accordingly.

Often, it will more or less directly update an attribute by copying/moving the data you provide.

Examples

// Most simple setter
void set_foo(int foo) {
    m_foo = foo;
}
// A setter that performs a check before editing
void set_foo(Foo foo) {
    if (foo.is_valid())
        m_foo = foo;
}
// A move-setter
void set_big_foo(BigFoo && big_foo) {
    m_big_foo = std::forward<BigFoo>(big_foo);
}

The reference-getter

A reference-getter is a method that directly returns the reference to an attribute to access and edit it.

This is particularly useful on object attributes, so you can call non-const methods directly on them.

Examples

// Here is the implementation of the reference-getter
Foo & MyClass::get_foo() {
    return m_foo;
}

// ...

// Used to edit an attribute
myClass.get_foo().bar = 42;

// Used to call a non-const method
myClass.get_foo().update();

How to choose

Well, this is pretty simple as soon as you understand the difference.

The setter is required when you need to re-create the data and put it in the place of the existing one. This is suited when editing simple data (integers, floating values, pointers, etc.) or if you require a brand new value. Also, use setters if you explicitly want the user to be unable to read the data, only edit it.

The reference-getter is required when it is the attribute’s data that is edited (and not the attribute as a whole). Often, you will edit a part of the attribute or call a non-const method over it.

In other words, a setter replaces the attribute, and a reference-getter edits the attribute.

Example

Take the following code:

#include <vector>

using namespace std;

struct Item
{
    bool validity;
    int value;
};

class Foo
{
public:
    Foo(size_t size) :
        m_max_size(size),
        m_data(size, {true, 0})
    {}

    void set_max_size(size_t max_size) {
        m_max_size = max_size;
    }

    Item & get_item(size_t index) {
        return m_data.at(index);
    }

    size_t get_data_size() const {
        return m_data.size();
    }

private:
    size_t m_max_size;
    std::vector<Item> m_data;
};

static void set_foo_size(Foo & foo, size_t new_size)
{
    foo.set_max_size(new_size);
    for (size_t i = new_size ; i < foo.get_data_size() ; ++i)
        foo.get_item(i).validity = false;
}

There, we have a simple little class that holds a collection of data (Items). These data can be valid or invalid (true is valid, false is invalid).

Then we implement a little function that resizes the collection without deleting any element. Out-of-bounds elements are made invalid instead.

We access the value m_max_size using a setter because it’s a plain-old integer that is replaced when the collection is resized.

We access each Item of m_data using a reference-getter because we do not always want to erase the item, sometimes we just want to edit a part of it.

Alternative

We could have edited the validity using a more specific setter, like this:

class Foo {

        // ...

        void set_item_validity(size_t index, bool validity) {
                m_data.at(index).validity = validity;
        }

        // ...

};

Doing so would prevent us from editing the value of the Item, so this decision entirely depends on your implementation details.

However, please consider it bad practice to provide setters for both value and validity. Doing so for a 2-attribute data bucket may not be that inconvenient, but as soon as your implementation grows, your codebase will be polluted with useless accessors. You need full access? Then get full access.

Wrapping up

This may look like a trivial subject to many of you, but there are a lot of developers that are still confused and use one when they need the other. Stay vigilant.

Thanks for reading and see you next week!

Author: Chloé Lourseyre

Should every variable be const by default?

Author: Chloé Lourseyre

A bit of terminology

In the title, I used the word “const” because const is a keyword every C++ developer knows (I hope). However, the generic concept this article is about is actually called “immutability”, not “constness”. const is a keyword used in some languages (C++, JavaScript, etc.), but the concept of immutability exists in other languages too (in some of them, like Rust, there isn’t even a keyword attached to the concept).

To be a little more inclusive, I will use the words immutable and immutability instead of const and constness.

Nota bene: in C++ the keyword const does not completely cover immutability. For instance, you can use the keyword const on a function parameter to indicate you won’t modify it, yet still pass a mutable object as that parameter. However, this article only refers to const as a way to render data immutable, so I would be grateful if you ignored situations where const does not refer to inherently immutable data. That is also another reason I prefer to use the word immutable and not const.

Some very good reasons to make your variables immutable

Safety

What is called const-correctness in C++ is the fact that if you state that a given variable is immutable, then it won’t be modified.

Const-correctness is reached when you use the const keyword to tell when a variable won’t be modified. Then, you won’t be able to inadvertently modify something you shouldn’t, and you give the compiler and other developers insight into the data you manipulate.
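
A minimal sketch of what the compiler then enforces for you:

#include <iostream>
#include <vector>

void print_size(const std::vector<int>& values)  // promise: values won't be modified here
{
    // values.push_back(42);  // does not compile: values is const in this scope
    std::cout << values.size() << '\n';
}

int main()
{
    const int answer = 42;
    // answer = 43;           // does not compile: assignment of read-only variable
    print_size({1, 2, 3});
}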

Documentation

These five characters tell a lot. If you find yourself in a reliable codebase, where you are sure that const is used properly, then this keyword -or its absence- serves as an indication of the intended mutability of the variable.

Optimization

The compiler can (and will) optimize the code according to the const directives.

Knowing that some values are immutable, the compiler will be able to make assumptions and use syntactic shortcuts.

You can refer to Jason Turner’s talk on the subject of practical performance: Jason Turner: Practical Performance Practices – YouTube.

Source

If you still doubt that immutability is useful in C++ (or are just curious, which you should be), please read these:

Are there downsides to using immutability whenever possible?

Since immutability is a technical restriction, there is no technical downside to using it.

However, there may be some non-technical downsides to using const. First, it increases verbosity. Even if it’s a 5-letter word that often has the same color as the type (in most IDEs), some people might find this cumbersome. There is also inertia: if you need to modify the variable, you’ll have to go and remove the keyword. Some may consider that a safeguard, others an impairment.

Are these arguments enough to not use immutability as soon as you can? It’s up to you, but in my humble opinion, I think not.

Other languages?

Other languages make variables immutable by default.

One famous example is the Rust language. In Rust, you don’t need a const keyword on variables: every variable is immutable by default, and you add the mut keyword to make it mutable.

For instance, in Rust, the following code doesn’t compile:

fn main() {
    let foo = 2;
    println!("Foo: {}", foo);
    foo += 40; // error[E0384]: cannot assign twice to immutable variable `foo`
    println!("Foo: {}", foo);
}

But this is correct:

fn main() {
    let mut foo = 2;
    println!("Foo: {}", foo);
    foo += 40;
    println!("Foo: {}", foo);
}

The reason mentioned in the Rust Handbook is the following: “This is one of many nudges Rust gives you to write your code in a way that takes advantage of the safety and easy concurrency that Rust offers.”

Rust is a pretty modern language that aims to provide safety to the developer along with top-tier performance. I agree with its intention to make variables immutable by default because of the safety it brings.

Is it reasonable to change the standard to make variables immutable by default?

If one fateful day the C++ committee decides to make every variable const by default… no one will upgrade to the vNext of C++.

Backward compatibility is important to smooth the transition between vPrevious and vNext. There are few historical examples where a feature has been removed from the standard, and for each of them, there is a really good reason (often following years of deprecation).

But we can’t seriously consider removing the keyword const and adding the keyword mut just for the sake of what is, in the end, syntactic sugar.

What to do then?

I for one chose to use the const keyword every single time I declare a variable. I only remove it when I actually write an instruction that requires the variable to be mutable (or when the compiler complains that I’m modifying a constant variable and outputs an error). This is a little blunt, but in my opinion, it’s the best way to catch the habit of putting const by default.

Of course, you may try to be smart at the moment you declare the variable and try to foresee the use you’ll make of it, but I guarantee you will not be 100% reliable and will miss opportunities to make variables immutable.

So here’s my advice: until you catch the habit of putting const everywhere, use const by default without double thinking. You may have regular easy-to-solve compilation errors, but you’ll catch a good habit.

And what about constexpr?

constexpr is a modifier that indicates that the value of a variable can be evaluated at compile time. Its purpose is to transfer some evaluation to the compilation phase.

In a sense, it is a stronger form of immutability than the keyword const. All I said in this article about immutability applies to constexpr: use it as often as you can.
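
A minimal sketch of what this stronger immutability gives you:

#include <array>

constexpr int square(int x) { return x * x; }

int main()
{
    constexpr int side = 4;             // known at compile time
    constexpr int area = square(side);  // evaluated at compile time
    static_assert(area == 16);          // checked at compile time (C++17 syntax)
    std::array<int, area> grid{};       // usable where a constant expression is required
}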

However, unlike const, I don’t advise trying to use it everywhere and seeing afterward if it needs to be removed, since it’s way rarer and more obvious when it applies. But keep in mind to check whether you can use constexpr every time you declare a variable.

Thank you for reading and see you next week!

Author: Chloé Lourseyre

Exceptions are just fancy gotos

Author: Chloé Lourseyre

About goto

Back to basics: why is goto evil?

I could just throw the link to the article E. W. Dijkstra wrote (Go To Statement Considered Harmful (arizona.edu)), which explains pretty well why goto is evil, but I’d like to go a little further than that.

I will provide here a list of reasons to not use goto, summarizing the article of Dijkstra and adapting its reasoning to modern programming:

  1. The goto control flow is unstructured, unlike the other control flows in C++. if and loop statements are geographically linked to the code they control. Each is either linked to a block (with its own lifecycle) or to a single instruction. While reading it, any developer can see and understand what code is nested within the if or the loop and where it ends. Functions, another control flow in C++, are access points. Their signature is a contract that is secured by the compiler. You give input, you get a return value. gotos are unstructured: they are a one-way trip across the code, with no geographical attachment like ifs and loops and no entry-exit guarantee like functions. This is very prone to producing unmaintainable, spaghetti code. Long story short: it breaks the program continuity.
  2. Regarding what it can do, goto is probably the most unsafe keyword in the language. I won’t give lengthy examples here, but you can try it out yourself: the number of stupid things you are allowed to do with goto is surprisingly high. And if you do stupid things, you may get crashes (sometimes random ones), memory corruption, undefined behavior…
  3. Here and today, in the era of modern C++, nobody uses goto anymore. That means most of us are unfamiliar with it. Most developers know they shouldn’t use it, but a whole lot of them don’t know why. Thus, if they ever encounter a goto in their codebase, they are most likely to either refactor the code (and possibly cause a regression, since they are unfamiliar with it) or leave it that way without trying to understand what it does. Since it’s complicated to use, the fact that nobody uses it regularly is yet another argument against it.

There are specific contexts and languages where goto can be considered good practice, but in modern C++, goto is a no-go.

About control flow and spaghetti code

The first point of the previous section (which is the main argument against goto) is talking about control flow and spaghetti code. But why is this important?

You can see an execution of your program as a rope that is unwound alongside your code as it is executed. As long as there are only simple instructions, the rope is unwound normally. When there is control flow, the rope will behave differently.

When the execution encounters a for or while, the rope will trace loops around the attached block, one rope loop for each execution loop that is performed. When the execution encounters an if (and possibly an else), the rope may or may not jump above the block of statements attached to it, depending on the condition, then continuing its course.

When the execution encounters a function, a strand of the rope will do a curl where the function is, then come back at the rope when it’s over, making the rope whole again.

However, when a goto statement is encountered, sometimes you will have no choice but to stretch the rope across your entire code to reach the target label. If there are multiple gotos at multiple locations, then the rope will cross itself over and over again.

If you have good control flow, then you will be able to follow the rope from the beginning to the end. But if your control flow is messy, then the rope will overlap itself all over and you will have trouble understanding any of it. This is what we call spaghetti code, a control flow that looks like a plate full of spaghetti.

Since gotos make the rope cross over the program, it is very prone to spaghetti code.

Is goto really evil?

But, despite all that, can’t we imagine a simple, safe way to use goto? After all, goto is only evil if it’s used to make spaghetti code, but if we write a code that only uses goto locally, in a closed and controlled space, then the code would not be very spaghetti-ish, would it?

There are designs where the use of goto makes the code clearer than with the use of other control flows.

The classic example is with nested loops:

//...

bool should_break = false;
for (int i = 0 ; i < size_i ; ++i)
{
    for (int j = 0 ; j < size_j ; ++j)
    {
        if (condition_of_exit(i,j))
        {
            should_break = true;
            break;
        }
    }
    if (should_break)
        break;
}

//...

If we write it using goto, it will be shorter and a bit clearer:

// ...

for (int i = 0 ; i < size_i ; ++i)
{
    for (int j = 0 ; j < size_j ; ++j)
    {
        if (condition_of_exit(i,j))
            goto end_of_nested_loop;
    }
}
end_of_nested_loop:

// ...

See? goto is not so useless after all!

So should we use this keyword in those specific cases where it makes the code clearer? I think not.

It’s hard to find a variety of examples where goto is better than any other control flow, and nested loops like this one are pretty rare. And even if the code is shorter, I don’t find it clearer. There are 2 block levels of difference between the goto and its label, making the jump from the former to the latter counter-intuitive. The human factor remains a huge problem.

So is goto really evil? No, but it’s still an absolute bad practice.

About exceptions

Exceptions: the modern way to break control flow

Exceptions are a way to manage errors in your code. You can also use them as a standard control flow, since you can define your own exceptions and throw and catch them however you want.

I like to see exceptions as “dangling ifs”: if your code performs adequately, everything will be good, but if something is off, you just throw everything upwards, hoping that something in the higher-level program will catch it.

Exceptions do break the standard control flow. To take the image of the rope again, when you call a function that can throw an exception, then a strand of the rope will go into the function (just like before), but you’ll have no guarantee that the strand will be returned to the rope and the place where you called the function. It can be re-attached anywhere above in the call stack. The only way to prevent that is to try-catch all exceptions when you call the function, but this is only possible if you know what to do in every case of error.

Moreover, when you write a function in which you may throw an exception, you have no way to know if and where it will be caught.

Though this is not as bad as a goto, because it is more controlled, you can still easily write spaghetti code. With this in mind, we can consider exceptions a fancy, “modern” equivalent of goto.

We can even write the nested-loops-escape with exceptions:

//...

try 
{
    for (int i = 0 ; i < size_i ; ++i)
    {
        for (int j = 0 ; j < size_j ; ++j)
        {
            if (condition_of_exit(i,j))
                throw std::exception{};
        }
    }
}
catch (const std::exception& e)
{ /* nothing */ }

//...

I wouldn’t recommend it though. Exceptions are evil.

Are exceptions really evil?

…are they? Not really.

In the previous section, I stated that goto isn’t inherently evil, but is bad practice because it is really error-prone for what there is to gain from it. The same goes for exceptions: they are not pure evil, they are just a feature.

In my opinion, unlike goto, there are ways to use exceptions safely.

One major difference between exceptions and gotos

To understand how to correctly use exceptions, we must understand the differences between them and gotos.

Exceptions do not send the execution somewhere undefined: they send the execution upward in the call stack. It can be a few blocks upward (like in the nested-loops example) or several function calls upward.

When to use exceptions?

Sending the program execution upward is still pretty undefined, and in most cases, we don’t want to break the program continuity.

In most cases.

There is one specific situation when we want the execution to stop abruptly: when someone misuses a feature in a way that could cause an unwanted behavior (or worse).

When you write a feature, you want to be able to stop any degraded state from happening, and instead send the execution upward and say “something unexpected happened, I can’t proceed”. Not only will this prevent any unwanted behavior, but it will also warn the user that they misused your feature, forcing them to either correct their use of it or handle the error they caused.

When you write a feature, there is a virtual wall between you and the user, with a tiny hole that is represented by the feature’s interface. It’s up to each of you to handle how the fictional rope behaves on each side of the wall.

A feature can be a whole library or a single class, but as long as it’s encapsulated, it’s okay to use exceptions as part of its interface.

A good example of that is the at() method of std::vector<>. The method’s goal is to return the nth element of the vector, but there is a chance that the user asks for an out-of-bound element. In that case, throwing an exception is a good way to stop the vector from causing undefined behavior. If the user catches the exception, they can write code to execute in case of an out-of-bound index. If not, then the program stops and indicates that an out-of-range exception has been thrown, forcing the user to secure their code or handle the degraded state.
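
For illustration, here is how that contract looks from the user’s side:

#include <iostream>
#include <stdexcept>
#include <vector>

int main()
{
    const std::vector<int> v{1, 2, 3};

    try
    {
        std::cout << v.at(10) << '\n';   // out-of-bound index
    }
    catch (const std::out_of_range& e)
    {
        // Handle the degraded state instead of letting the program terminate
        std::cerr << "Invalid index: " << e.what() << '\n';
    }
}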

In any case, you must document every exception your code can throw.

Wrapping up

I feel like many of the articles I write could just be concluded by “just don’t be stupid”, but I always try to give a good summary anyway, because not being stupid is actually harder than it looks (I am myself sometimes quite stupid, and it happens more often than I would like to admit).

It is especially hard not to be stupid with exceptions, considering how easy it is to blow everything up. I will try to summarize the good practices to apply regarding exceptions in three points:

  • Do not use exceptions for flow control.
  • You can use exceptions as part of your feature interface.
  • Document every exception your feature can throw.

Thanks for reading and see you next week!

Author: Chloé Lourseyre

Dealing with integer overflows

Author: Chloé Lourseyre

Integer overflows are a pain. There is no consistent and easy way to detect them, let alone to statically detect them, and they will make your program inconsistent.

This article will talk about both signed and unsigned integer overflow, without distinction. Even if signed overflow is undefined behavior while unsigned is not, 99.9% of the time you want neither of them.

How to detect integer overflows?

While they are few, there are ways to detect integer overflows. They just lack either consistency or ease of use.

I will present a few ways to detect overflows in this section. If you happen to know other ways to do so, feel free to share them in the comments, so everyone can benefit.

UBSan

Undefined Behavior Sanitizer, UBSan for short, is a runtime undefined behavior checker.

It can detect integer overflows in the form of compilation options (though it is meant to check undefined behavior, it also does us the favor of checking unsigned overflows):

clang++ -fsanitize=signed-integer-overflow -fsanitize=unsigned-integer-overflow
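
For instance, a program as small as this one is enough to see it in action when compiled with the options above (the values are made up for the example):

#include <cstdint>
#include <iostream>

int main()
{
    int32_t big = 2'000'000'000;
    int32_t sum = big + big;   // UBSan reports a signed integer overflow here at runtime
                               // (on a typical platform where int is 32 bits)
    std::cout << sum << '\n';
}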

It is implemented in both Clang and GCC.

I may be wrong, but I did not see any integration in other compilers.

The main downside of a dynamic checker like this one is that you’ll have to do a full compilation and an exhaustive test run of your program to detect overflows. If your test run is not exhaustive enough, you run the risk of letting an overflow slip through.

Write adequate unit tests

If your project implements unit tests (I could argue that every project should, but it’s not always up to the developers), then you have a pretty direct way to check for overflows.

For the data and functions that can accept very high numbers, just throw big ints at them. You can then check whether the features perform correct evaluations, throw an exception, return an error code, or do whatever they are supposed to do in these cases.

If you don’t know what result to expect when you put a big int into them, because the result would be too big and there is no error handling, then your feature is unsafe. You must put some error handling in them to ensure no overflow will happen.
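
Here is a minimal sketch of such a test, using plain assertions and a hypothetical triple() feature (your usual test framework would do the same job):

#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical feature under test: triples a value, refusing inputs that would overflow
// (only the positive side is guarded here, for brevity)
int64_t triple(int64_t value)
{
    if (value > std::numeric_limits<int64_t>::max() / 3)
        throw std::overflow_error("triple: input too large");
    return value * 3;
}

int main()
{
    // Throw big ints at the feature and check that it behaves as documented
    assert(triple(1'000'000'000'000LL) == 3'000'000'000'000LL);

    bool thrown = false;
    try { triple(std::numeric_limits<int64_t>::max()); }
    catch (const std::overflow_error&) { thrown = true; }
    assert(thrown);
}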

Sometimes, detecting a potential overflow this way will require weighty refactoring. Don’t be afraid to do it: you are better safe than sorry.

Don’t put your code in a situation prone to overflows

The best way to be sure there is no overflow is to prevent overflows. If you ensure that no overflow can arise from the code you write, you won’t have to detect them.

The next section will give a few practices to help you in that regard.

How to prevent integer overflows?

Use 64-bit integers

One very good way to prevent integer overflows is to use int64_t to implement integers. In most cases, 64-bit ints will not overflow, unlike their 32-bit counterparts.

There are actually very few downsides to using int64_t instead of int32_t. Most of the time, you won’t care about the performance gap or the size gap between them. Only if you work on embedded software or on processing algorithms may you care about performance or data size constraints (by “processing algorithms” I mean algorithms supposed to have top-notch performance, like in signal processing).

Just note that longer integers do not always mean slower calculations, and with the full set of pointers / references / forwarding, you don’t copy whole data structures very often.

And in any case, even if you are performance and/or size sensitive, always remember:

First, be safe. Then, be fast.

(citation by me)

Trying to over-optimize and risking integer overflow is always worse than writing safe code and then using tools to pinpoint where the program must be optimized.

So I recommend that, unless your integer has a hard limit and is very small, you use an int64_t.

About int64_t performance

I ran a few basic operations (addition, multiplication and division) on both int32_t and int64_t to see if there were any significant differences.

Using Clang with two different optimization levels, here are the results:

This may not surprise you, but in every case it’s the division that is the slowest. If you don’t use divisions, you will notice no performance gap between 32 bits and 64 bits. However, if you use divisions, 32-bit integers are more suitable (but since you’re using divisions, you won’t have top-notch performance anyway).

Gentle reminder about basic types

If you ever wonder why I used the types int32_t and int64_t instead of the plain old int and long, it’s because the latter two don’t have a pre-defined size.

The only size constraints that the C++ standard applies on int and long are the following:

  • int is at least 16 bits
  • long is at least 32 bits.
  • The size of long is greater than or equal to the size of int, bool and wchar_t, and is lesser than or equal to the size of long long
  • The size of int is greater than or equal to the size of short and is lesser than or equal to the size of long

Because of that, please refrain from using plain old ints and longs when you want to avoid overflows.
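
If you want to see what these guarantees translate to on your platform, a quick check like this one is enough:

#include <cstdint>
#include <iostream>

int main()
{
    std::cout << "int:     " << sizeof(int) * 8     << " bits\n";  // platform-dependent
    std::cout << "long:    " << sizeof(long) * 8    << " bits\n";  // platform-dependent
    std::cout << "int32_t: " << sizeof(int32_t) * 8 << " bits\n";  // always 32
    std::cout << "int64_t: " << sizeof(int64_t) * 8 << " bits\n";  // always 64
}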

Don’t assume that because a value is in range, it’s overflow-safe

Let’s say you have an int32_t my_val that represents a piece of data whose max value is one billion (1 000 000 000). Since the max value of an int32_t is 2³¹−1 (2 147 483 647), you may think it won’t cause overflow.

But one fateful day, a random dev unknowingly writes this:

#include <cstdint>

const int32_t C_FOO_FACTOR = 3;

int32_t evaluate_foo(int32_t my_val)
{
    // foo is always C_FOO_FACTOR times my_val
    return my_val * C_FOO_FACTOR;
}

You called it: integer overflow. Indeed, there are values of my_val that cause an overflow when multiplied by 3.

So whose fault is it? Should we check for an overflow every time we add or multiply? How could we do this?

Well, there is one simple practice that can help you avert most cases similar to this example. When you have to store an integer that is relatively big, even if it can’t overflow by itself, just put it in a bigger data type.

For instance, I never put a value that can be bigger than 2¹⁵ in a data type whose max value is 2³¹. This way, even if we multiply the value by itself, it doesn’t overflow.

With this method, we can keep smaller data types in smaller structures with no side effects. C_FOO_FACTOR can stay an int32_t: the result will be adequately promoted if it’s used in an operation involving a bigger type.

E.g.:

#include <cstdint>

const int32_t C_FOO_FACTOR = 3;

int64_t evaluate_foo(int64_t my_val)
{
    // foo is always C_FOO_FACTOR times my_val
    return my_val * C_FOO_FACTOR; // The result of the multiplication is a int64_t
}

Use auto

Yes, auto can sometimes be a lifesaver when you’re not 100% sure of the types you’re dealing with.

For instance:

#include <cstdint>

int32_t  C_FOO = 42;

int64_t compute_bar();

int main()
{
    // Big risk of overflow here
    int32_t foo_bar = compute_bar() + C_FOO;

    // Probably no overflow here
    auto better_foo_bar = compute_bar() + C_FOO;
}

Here, auto is useful because it prevents the error committed in the declaration of foo_bar, where the result of the operation compute_bar() + C_FOO, which is an int64_t, is converted back to an int32_t. For better_foo_bar, auto is deduced as int64_t, so no overflow will occur because of that.

(Note: integer demotion — meaning converting back to a smaller type — is actually an overflow).

There is also another specific case, one that doesn’t occur often, where auto can be useful. Consider the following code:

#include <cstdint>

int32_t  C_FOO = 42;

int32_t compute_bar();

int main()
{
    // Big risk of overflow here
    auto foo_bar = compute_bar() + C_FOO;
}

There, the return value of compute_bar() is int32_t. But later, the author of this function, seeing that the type is too small, changes it to an int64_t, like this:

#include <cstdint>

int32_t  C_FOO = 42;

int64_t compute_bar();

int main()
{
    // Big risk of overflow here
    auto foo_bar = compute_bar() + C_FOO;
}

Here, the auto is automatically “promoted” to int64_t, avoiding an implicit conversion. If it had been int32_t instead of auto at the beginning, there would have been a risk that the developer who edited the signature of compute_bar() would not correct the type of the variable foo_bar, without raising any compilation error or warning. So the usage of auto in this case made us dodge a bullet.

Wrapping up…

Always take care, when you’re dealing with big integers, to use a big enough data type. Use auto when you’re unsure of what you’re dealing with, and use analyzers if you think the code may hold an overflow. And, as always, write good unit tests.

If you personally know of other ways to detect and/or prevent integer overflows, feel free to share them in the comments.

This is overall the best you can do to avoid and detect integer overflows.

Thanks for reading and see you next week!

Author: Chloé Lourseyre