[History of C++] The genesis of casting.

Author: Chloé Lourseyre
Editor: Peter Fordham

C-style casts

First of all, to understand the rationale behind the design of C++ casts, I think it’s important to remind you how C-style casts work, both in C and in C++.

In C1

In C, you have two ways to cast:

  • You perform a value cast, an arithmetic conversion from one numeric type into another. You can have data loss if the target type is narrower than the origin type (for instance, when you cast float into long, or long into int).
  • You perform a pointer cast, which converts a pointer of one type into a pointer of another type. This can work well, as in this example (https://onlinegdb.com/sYFCGeZmH), but it can quickly become error-prone, like in this example (https://onlinegdb.com/pWovM17X4) where the types are not exactly the same, or in this example (https://onlinegdb.com/HHjNS9NSb) where one structure is bigger than the other2. Both kinds are sketched in the snippet just below.
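A minimal illustration, compiling as both C and C++ (the struct names are made up for the example), of a value cast with data loss and of the hazardous pointer cast between structures of different sizes:

#include <stdio.h>

struct small { int a; };
struct big   { int a; int b; };

int main(void)
{
    /* value cast: the fractional part is silently lost */
    double d = 3.9;
    long l = (long)d;                 /* l == 3 */

    /* pointer cast: reinterpreting a small struct as a bigger one */
    struct small s = { 1 };
    struct big* pb = (struct big*)&s;

    printf("%ld %d\n", l, pb->a);     /* pb->a happens to alias s.a, but reading pb->b would be out of bounds */
    return 0;
}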

Though it is a C feature that has its uses and misuses, in the C++ language this is a behavior we want to avoid.

In C++

The C-style cast does not work in C++ exactly the same way it works in C (even though the final behaviors are similar).

When you perform a C-style cast in C++, the compiler tries the following cast operations, in order, until it finds one that it can compile:

  1. const_cast
  2. static_cast
  3. static_cast followed by const_cast
  4. reinterpret_cast
  5. reinterpret_cast followed by const_cast

This process is not appreciated by C++ developers (to say the least), because the cast that is actually performed is not explicit and potential errors are not caught at compile time.
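To illustrate (a hedged sketch with illustrative type names, not an exhaustive list of cases), the same C-style syntax resolves to different named casts depending on the operands:

#include <cstdint>

struct Base    { virtual ~Base() = default; };
struct Derived : Base { int extra = 0; };

int main()
{
    const int ci = 42;
    int* p = (int*)&ci;            // behaves like const_cast<int*>(&ci)

    double d = 3.14;
    int i = (int)d;                // behaves like static_cast<int>(d)

    Base b;
    Derived* pd = (Derived*)&b;    // behaves like static_cast<Derived*>(&b): no run-time check!

    std::uintptr_t a = (std::uintptr_t)&d; // behaves like reinterpret_cast<std::uintptr_t>(&d)

    (void)p; (void)i; (void)pd; (void)a;
}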

You can find more information about this behavior on the following page: Explicit type conversion – cppreference.com

Run-Time Type Information

Original idea and controversies

What we call Run-Time Type Information, often shortened to RTTI, is a mechanism that allows the type of an object to be determined during program execution.

This is used in polymorphism, where you can manipulate an object through its base class interface (thus, you don’t know at compile-time which derived class you are manipulating).

RTTI for C++ was drafted from the earliest version, but its development and implementation were postponed in hope that it would prove unnecessary.

Some people, at that time, raised their voices against the feature, saying that it would need too much support, was too heavy to implement, too expensive, complicated and confusing, “inherently evil” (against the spirit of the language), seen as the beginning of an avalanche of new features, etc.

However, Bjarne Stroustrup finally decided that it was worth implementing, for three reasons: it is important to some people, it is harmless to those who won’t use it, and without it libraries will develop their own RTTI anyway, leading to inconsistency.

In the end, RTTI was implemented in three parts:

  • The dynamic_cast operator, allowing a pointer to a derived class to be obtained from a pointer to the base class — only if the pointer is effectively of the derived class.
  • The typeid operator, allowing identification of the exact type of an object given an object of the base class.
  • The type_info structure, giving additional run-time info on a given type.

Early in the process (and the main reason he decided to wait until RTTI was needed before implementing it), Stroustrup detected numerous misuses of the feature, and some people even labelled it as a “Dangerous feature”.

However, there is a major difference between a feature that can be misused and a feature that will be misused. That difference resides in education, design, testing, etc. But this has a cost, and the real question is: are the benefits of such a dangerous feature worth the effort necessary to keep misuses at an anecdotal level?

The final decision was yes: it is worth the shot. But not all developers agreed at that time.

Syntax

Since casts couldn’t be inherently made safe, Stroustrup wanted to provide a syntax that both signaled the use of an unsafe feature and discouraged its use when there were alternatives.

The C++ crew originally considered either using Checked<T*>(p); for run-time checked conversion and Unchecked<T*>(p); for unchecked conversion, or using (virtual T*)p for dynamic cast only.

But given these constraints, and the fact that dynamic_cast and “standard” casts (which we now call static_cast) are two completely different operations, the old syntax was abandoned in favor of more verbose unary operators. These are the operators we know today, dynamic_cast<T*>(p) and static_cast<T*>(p) (and, later, the other casts).

typeid() and type_info

The first implementation of RTTI only provided dynamic_cast. However, people soon wanted to know more about the types they were dynamically manipulating, leading to the creation of typeid() and type_info.

The typeid() operator can be applied to any polymorphic object and returns a reference to a type_info object that holds all the needed information. The reason it returns a reference and not a pointer is to prevent pointer comparison and arithmetic on it.

type_infos are uncopiable, polymorphic, comparable and sortable (so they can be used in maps and such), and they hold the name of the type.
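Here is a small sketch (the Shape/Circle hierarchy is illustrative) of typeid and type_info in action on a polymorphic type:

#include <iostream>
#include <typeinfo>

struct Shape  { virtual ~Shape() = default; };
struct Circle : Shape {};

int main()
{
    Circle c;
    const Shape& s = c;                      // manipulated through the base class

    const std::type_info& info = typeid(s);  // dynamic type: Circle
    std::cout << info.name() << '\n';        // implementation-defined (often mangled) name

    std::cout << std::boolalpha
              << (typeid(s) == typeid(Circle)) << '\n'; // true
}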

Uses and Misuses

Now, there are two categories of types: those that have type information at run time and those that don’t. It was decided that only the polymorphic classes, i.e. the classes that can be manipulated through base classes, would hold RTTI.

At first, people wondered if it would not cause some problems (and frustration), because it is sometimes hard to tell (as a developer) whether a class is polymorphic or not. But this is not a big issue, because the compiler is able to tell at compile time whether a use of typeid and type_info is illegal or not.

The main issue that was anticipated was the over-use of RTTI where it isn’t needed. For instance, we can expect such code:

void rotate(const Shape& r)
{
    if (typeid(r) == typeid(Circle)) 
    {
        // do nothing
    }
    else if (typeid(r) == typeid(Triangle)) 
    {
        // rotate triangle
    }
    else if (typeid(r) == typeid(Square)) 
    {
        // rotate square
    }
}

However, this is a broken use of RTTI because it does not correctly handle classes derived from the ones mentioned. A better way to do this would be via virtualization.
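For reference, here is a hedged sketch of the virtualization-based alternative: each shape knows how to rotate itself, so the caller needs no RTTI at all.

struct Shape
{
    virtual ~Shape() = default;
    virtual void rotate(double angle) = 0;
};

struct Circle : Shape
{
    void rotate(double) override {}                    // nothing to do for a circle
};

struct Triangle : Shape
{
    void rotate(double angle) override { /* rotate the triangle */ }
};

void rotate(Shape& s, double angle)
{
    s.rotate(angle); // the virtual call dispatches to the right derived class
}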

Another misuse would be with unnecessary type-checking, like in the following code:

Crate* foobar(Crate* crate, MyContainer* cont)
{
    cont->put(crate);

    // do things...

    Object* obj = cont->get();
    Crate* cr = dynamic_cast<Crate*>(obj);
    if (cr)
        return cr;
    // else, handle error
}

Here, we manually check the type of the object in MyContainer, although it would be better to use a templated version of the container, like so:

Crate* foobar(Crate* crate, MyContainer<Crate>* cont)
{
    cont->put(crate);

    // do things...

    return cont->get();
}

Here, there is no need to check for errors and, most of all, no use of RTTI.

These two misuses of C++ RTTI are most commonly performed by developers following the guidelines of other languages (like C, Pascal, etc.) where such code is accepted, even encouraged. But it doesn’t fit the C++ design.

Abandoned features

Here is a list of features that have been considered for the C++ RTTI, but not adopted in the end:

  • Meta-Objects: this replaces type_info with a mechanism (the meta-object) that can accept, at run time, requests to perform any operation that can be requested of an object of the language. However, it would embed an interpreter for the complete language, which is a threat to efficiency.
  • The Type-inquiry Operator: an alternative to dynamic_cast was an operator that tells whether an object is of a derived class or not. If so, it would allow us to then cast it (old-style) to the derived class in order to use it. However, dynamic_cast and static_cast can both be applied to pointers and yield different results, so the distinction has to remain, because old-style-casting pointers would not always give the result we expect. Plus, decorrelating the check from the cast can cause mismatches.
  • Type Relations: using comparison operators (such as < and <=) was suggested, but it was judged “too cute” (a non-mathematical interpretation of the operator, giving meaning to an operation that has no mathematical meaning). Plus, this has the same check/cast decorrelation problem as the type-inquiry operator.
  • Multi-methods: the ability to select a virtual function based on more than one object. Such a mechanism may be useful to people who develop binary operators. However, at that time, Stroustrup was not familiar with the concept and decided it would be implemented only if needed later.
  • Unconstrained Methods: a mechanism that allows a polymorphic object to call any method that could possibly be called, checking at run time whether the call is actually valid and handling errors accordingly. However, with dynamic_cast we can perform this check ourselves, which is more efficient and type-safe.
  • Checked Initialization: the ability to initialize a derived class object from a polymorphic object, checking at run time that the types actually match. However, there were syntax complications and error-handling uncertainties, and it can be done using, again, a dynamic_cast.

C++-style casts

Problems and consequences

The C-style cast is (quoting B.S.) “a sledgehammer”. It means that when you write (B)expr you say “make expr a B, and whatever happens happens.”. This can become very unfortunate when const or volatile qualifiers are involved.

In addition to that, the syntax is simplistic: hard to see, hard to parse, and you need to pile up parentheses when you want to use a derived-class method in a polymorphic context3.

Thus, it was decided to separate the different ways to cast into separate operators. This way, when you write a cast, you write how you want to cast. Plus, this adds some verbosity to the operation which makes parsing easier and warns the reader that a potentially dangerous operation is happening.

Since there are really bad behaviors (from the C++ point of view) in C-style casts, there are C++-specific cast operators that are meant not to be used lightly (to be isolated from the “good” cast operators). These behaviors are not removed from the language, because in some specific contexts they can be useful, but they need to be separated from the others so they cannot be used by accident and their use is obvious.

The different casting operators

dynamic_cast

I won’t talk much about dynamic_cast, since this operator is covered in the previous section (about RTTI). Just keep in mind that the keyword dynamic_cast is the one associated with the RTTI solution.

dynamic_cast makes a conversion that is checked dynamically, i.e. at run-time. If you want a static check, i.e. at compile-time, you would prefer static_cast.

static_cast

The static_cast can be described as the inverse operation to the implicit conversion. If A can be implicitly converted to B, then B can be static_casted to A. The operator can also perform any conversion that can be implicitly done.

This alone covers the majority of conversions that do not require dynamic type checking.

static_cast respects constness (making it safer than C-style casts) and is static (any error will be detected at compile time).

Whenever it is relevant, static_casting to a user-defined type looks for any single-parameter constructor that can match the conversion (if you try to statically cast a Foo into a Bar, the compiler will look for a Bar(Foo) constructor) or any relevant conversion operator. See user-defined conversion function – cppreference.com for more info.

Also, you cannot perform a static_cast to or from a pointer to an incomplete type (which can be done using another C++-style cast).
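Here is a minimal sketch (Foo, Bar, Base and Derived are illustrative) of the static_cast behaviors described above:

struct Foo {};

struct Bar
{
    Bar() = default;
    explicit Bar(Foo) {}   // single-parameter constructor used by static_cast
};

struct Base    {};
struct Derived : Base {};

int main()
{
    double d = 3.9;
    int i = static_cast<int>(d);              // arithmetic conversion, i == 3

    Foo f;
    Bar b = static_cast<Bar>(f);              // calls Bar(Foo)

    Derived der;
    Base* pb = &der;                          // implicit conversion: Derived* -> Base*
    Derived* pd = static_cast<Derived*>(pb);  // its inverse, checked at compile time only

    (void)i; (void)b; (void)pd;
}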

reinterpret_cast

The reinterpret_cast holds the “unsafe” part of the C-style cast. With it, you can cast values from one class to another unrelated class, or to and from a pointer to an incomplete type.

This conversion basically reinterprets the argument it is given. You can thus also convert pointers to functions and pointers to members.

This is inherently unsafe and must be performed with great caution. Wherever you write or see a reinterpret_cast, you know you must be extra careful. Using reinterpret_cast is almost as unsafe as C-style casts.

reinterpret_cast can easily lead to undefined behavior if not used following a specific set of rules (which you can find on its documentation page: reinterpret_cast conversion – cppreference.com).

For instance: if you use reinterpret_cast to go from one pointer type to another and then dereference that pointer to access its contents, that’s likely undefined behavior.
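For completeness, a short sketch of one of the few well-defined uses: a pointer-to-integer-to-pointer round trip back to the original type (useful, for instance, when logging an address).

#include <cstdint>
#include <iostream>

int main()
{
    int value = 42;

    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(&value);
    int* back = reinterpret_cast<int*>(addr);  // the round trip yields the original pointer

    std::cout << *back << '\n';                // OK: we dereference the original int
}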

const_cast

The goal of this operator is that the const and volatile qualifiers are never silently cast away.

To perform this operation, the source and destination types must be the same, except the const and volatile qualifiers which can differ.

This is a very dangerous operation and must be used with great caution. Always remember that casting away const from an object originally defined as const, and then modifying it, is undefined behavior.
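A hedged sketch (the legacy function is hypothetical) of the only safe scenario: the underlying object is not actually const, it is merely reached through a const access path.

void legacy_increment(int* p) { ++*p; }  // hypothetical legacy API that takes int*

int main()
{
    int x = 0;
    const int& cref = x;
    legacy_increment(const_cast<int*>(&cref));  // OK: x itself is not const

    // const int y = 0;
    // legacy_increment(const_cast<int*>(&y));  // undefined behavior: y really is const
}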

bit_cast

Not really historical (it was introduced in C++20), but std::bit_cast was basically made to replace manual conversions done with std::memcpy().

The bit_cast can be undefined if there is no value of the destination type corresponding to the value representation produced (just like with memcpy).

Unlike with reinterpret_cast, going from one type to another with bit_cast and then accessing the result is not undefined behavior, as long as you know for sure that those bits are a valid representation of the target type. The difference is subtle, but it allows the compiler to safely make lots of cases work efficiently and to do the right thing in more complex cases without invoking undefined behavior. A typical use case is serialization.
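A minimal sketch (C++20) of that typical use: inspecting the bit pattern of a float through an unsigned integer of the same size.

#include <bit>
#include <cstdint>
#include <iostream>

int main()
{
    float f = 1.0f;
    std::uint32_t bits = std::bit_cast<std::uint32_t>(f);  // same size, no UB

    std::cout << std::hex << bits << '\n';  // prints 3f800000
}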

Wrapping up

Historically, the way the C-style cast operator was split into four C++ operators follows three simple rules:

  • If you need to check the types dynamically, then use dynamic_cast.
  • If you can check the types statically, then use static_cast.
  • In any other case, it is reinterpret_cast or const_cast that you need, but this is very dangerous.

I’ll add to that that, in any situation, you should not perform a reinterpret_cast or const_cast unless you know what you are doing. You should never, ever perform these casts only because the other ones did not work.

RTTI in its wholeness is a useful (but totally optional) feature. But it is not simple to master.

In modern C++, we want to perform checks as much as possible at compile time (for security and performance), so when we are able, we want to use static features instead of dynamic ones.

Of course, you should not force static code where dynamic code would be better, but you should always think of a static solution before a dynamic one.

Author: Chloé Lourseyre
Editor: Peter Fordham

Addenda

Sources

Notes

1. As much as I consider myself an expert in the C++ language, my knowledge of the C language is much more limited. There may be errors in this subsection. If so, please tell me in comments so I can edit the article.

2. You can also cast away the const qualifier through the pointer cast (https://onlinegdb.com/8HIJIeonA) but I don’t think it’s a whole different way to cast.

3. For instance, if px is a pointer to an object of type X (implemented as B) and B is a derived class of X that has a method g, you need to write ((B*)px)->g() to call g from px. A simpler syntax could have been px->B::g().

[History of C++] Templates: from C-style macros to concepts

Author: Chloé Lourseyre
Editor: Peter Fordham

Introduction: Parametrized types

Templates are the C++ feature (or group of features, as the word is used in several contexts) that implement parametrized types.

The notion of a parametrized type is very important in modern programming and consists of using a type as a parameter of a feature, which means you can use that feature with different types, the same way you use a feature with different values.

The simplest example is std::vector. When you declare a vector like this: std::vector<int> foo;, the type int is a parameter. You could have put another type, like double, void*, a user-defined class, or even another vector, instead of int.

It is a way to achieve metaprogramming, a programming technique in which programs manipulate other programs (or themselves) as data.

For the rest of the article I will use the word “template” to refer to either the notion of parametrized types or the C++ template implementation (unless I want to explicitly make the distinction).

Before Templates

Before the creation of templates, in early C++, people had to write C-style macros to emulate templates.

One way of doing this was as follows:

foobar.h
void foobar(FOOBAR_TYPE my_val);
foobar.cpp
void foobar(FOOBAR_TYPE my_val)
{
    // do stuff
}
main.cpp
#define FOOBAR_TYPE int
#include "foobar.h"
#include "foobar.cpp" // Only do this in a source file
#undef FOOBAR_TYPE

#define FOOBAR_TYPE double
#include "foobar.h"
#include "foobar.cpp" // Only do this in a source file
#undef FOOBAR_TYPE

int main()
{
    int toto = 42;
    double tata = 84;
    foobar(toto);
    foobar(tata);
}

Don’t do this at home, though! This is not something we ought to do nowadays (especially the #include "foobar.cpp" part). Also note that this code takes advantage of function overloading and therefore does not compile in C.

With our modern eyes, this seems very limited and error-prone. But the interesting thing is that even before C++ templates were implemented, early C++ design teams could use the macro approach to gain experience with them.

Timing

Templates were introduced in Release 3.0 of the language, in October 1991. In The Design and Evolution of C++, Stroustrup reveals that it was a mistake to introduce this feature so late, and in retrospect it would have been better to do so in Release 2.0 (June, 1989) at the expense of less important features, like multiple inheritance:

Also, adding multiple inheritance in Release 2.0 was a mistake. Multiple inheritance belongs in C++, but is far less important than parameterized types — and to some people, parameterized types are again less important than exception handling.

Bjarne Stroustrup, The Design and Evolution of C++, chapter 12: Multiple Inheritance, §1 – Introduction

From today’s perspective, it is clear that Stroustrup was right and that templates have impacted the landscape of C++ much, much more than multiple inheritance.

This addition came late because it was really time-consuming to explore the design and implementation issues.

Needs and goals

The original need for templates was to express the parametrization of container classes. But macros were too limited for that job: they fail to obey scope and type rules and don’t interact well with tools (especially debuggers). Before C++ templates, it was very hard to maintain code that used parametrized types; you worked at the lowest level of abstraction and you had to add each parametrized type manually.

The first concerns regarding templates were whether they would be easy to use (and templated objects as easy to use as hand-coded objects), whether compilation and linking speed would be significantly impacted, and whether they would be portable.

The build process of templates

Syntax

The angle brackets

Designing the syntax of a feature is not an easy job, and requires extensive discussion and refinement.

The choice of the brackets <...> for the template parameter was made because, even though parentheses would have been easier to parse, they are overused in C++ and brackets are (empirically) more pleasant for a human reader.

However, this causes a problem for nested brackets, such as:

List<List<int>> a;

In the code above, in earlier C++, you would get a compilation error. The closing >> are seen by the compiler as operator>> and not two closing brackets.

A lexical trick was added later in the language (in C++11, and enabled by default by compilers a little later1) so that this was not seen as a syntax error anymore.

The template argument

Initially, the template argument would have been placed just after the object name:

class Foo<class T>
{
    // ...
};

However, that caused two major issues:

  • It is a bit too hard to read, for parsers and humans alike. Since the template syntax is nested within the syntax of the class, it is a bit tough to detect.
  • In the case of function templates, the templated type can be used before it is declared. For instance, in this declaration: T at<class T>(const std::vector<T>& v, size_t index) { return v[index]; }, since T is the return type, it is parsed before we even know it is a template parameter.

Both issues are resolved if we put the template argument before the declaration, and this is what was done:

template<class T> class Foo
{
    // ...
};

template<class T> T at(const std::vector<T>& v, size_t index) { return v[index]; }

Constraints of template parameters

In C++, the constraints on template arguments are implicit2.

There was a dilemma over whether the constraints should be explicit in the template argument list (like below) or deduced from usage. An example of such an explicit constraint would look like this:

template < class T {
        int operator==(const T&, const T&); 
        T& operator=(const T&);
        bool operator<(const T&, int);
    };
>
class Foo {
    // ...
};

But this was judged way too verbose to be readable, and it would require many more templates for the same number of features. Moreover, this kind of constraint over-restricts the class you’re implementing, excluding some implementations that would have been perfectly fine and correct without it3.

However, explicit constraints were not completely off the table; it is just that function signatures are too specific a way to express them.

This could have been achieved through derivation: by specifying that your templated type must derive from another class, you can put an explicit constraint on this type.

template <class T>
class TBase {
    int operator==(const T&, const T&); 
    T& operator=(const T&);
    bool operator<(const T&, int);
};

template <class T : TBase>
class Foo {
    // ...
};

However, this generates more issues. Programmers are thereby encouraged to express constraints as classes, leading to an overuse of inheritance. There is a loss in expressivity and semantics, because “T must be comparable to an int” becomes “T must inherit from TBase”. In addition to that, you could not express constraints on types that can’t have a base class, like ints and doubles.

This is mainly why we did not have explicit constraints on template parameters in C++4 for a long time.

However, the discussion on template constraints was revived in the late 2010s and a new notion made its appearance in C++20: Concepts (c.f. Modern evolutions – Concepts below).

Templated object generation

How templates are compiled is very simple: for every set of template parameters that is used on a templated object, the compiler generates an implementation of this object using those exact parameters.

So, writing this:

template <class T> class Foo { /* ... do things with T ... */ };
template <class T, class U> class Bar { /* ... do things with T  and U... */ };

Foo<int> foo1;
Foo<double> foo2;
Bar<int, int> bar1;
Bar<int, double> bar2;
Bar<double, double> bar3;
Bar< Foo<int>, Foo<long> > bar4;

Is the same thing as writing this:

class Foo_int { /* ... do things with int ... */ };
class Foo_double { /* ... do things with double ... */ };
class Foo_long { /* ... do things with long ... */ };
class Bar_int_int { /* ... do things with int  and int... */ };
class Bar_int_double { /* ... do things with int  and double... */ };
class Bar_double_double { /* ... do things with double  and double... */ };
class Bar_Foo_int_Foo_long { /* ... do things with Foo_int  and Foo_long... */ };

Foo_int foo1;
Foo_double foo2;
Bar_int_int bar1;
Bar_int_double bar2;
Bar_double_double bar3;
Bar_Foo_int_Foo_long bar4;

… only it is more verbose (even more in real code) and less readable.

Class templates

Templates were imagined and designed primarily for classes, mostly to allow the implementation of standard containers. They were designed to be as simple to use as regular classes and as efficient as macros. These two goals were set so that low-level arrays could be abandoned when they were not specifically needed (in low-level programming) and templatized containers preferred at the higher levels.

In addition to type arguments, templates can take non-type arguments, like this:

template <class T, int Size>
class MyContainer {
    T m_collection[Size];
    int m_size;
public:
    MyContainer(): m_size(Size) {}
    // ...
};

This was introduced in order to allow static sizing of containers. Carrying the size in the type information makes the implementation more efficient, because you don’t have to track the size separately and you don’t lose it through pointer decay as you do with C-style arrays.

class Foo { /* ... */ };

int main()
{
    Foo fooTable[700];            // low-level container
    MyContainer<Foo, 700> fooCnt; // high-level container, as efficient as the previous one
}

Function templates

The idea of function templates comes from the need of having templated class methods and from the idea that function templates are in the logical extension of class templates.

Today, the most obvious examples we can provide are the STL algorithms (std::find(), std::find_first_of(), std::merge(), etc.). Though the STL algorithms did not exist when templates were created, it was this kind of function that inspired function templates (the most symbolic being sort()).

The main issue with function templates was deducing the function template arguments and return type, so we don’t have to explicitly specify them at each function call-site.

In this context, it was decided that template arguments could be deduced (when possible) and specified (when needed). This is extremely useful for specifying return types, because they cannot always be deduced, as in this example:

template <class TTo, class TFrom>
TTo convert(TFrom val)
{
    return val;
}

int main()
{
    int val = 4;
    convert(val);              // Error: TTo is ambiguous
    convert<double, int>(val); // Correct: TTo is double; TFrom is int
    convert<double>(val);      // Correct: TTo is double; TFrom is int (deduced)
}

As you can see in the last call, trailing template arguments can be omitted, just as trailing function arguments can (when they have a default value).

The way templates are generated (see the Templated object generation section above) works perfectly fine with function overloading. The only subtlety is when there are both non-templated and templated overloads of a function. Then, the non-templated overload is called if it is a perfect match; otherwise, a templated overload is called if a perfect match can be generated; otherwise, ordinary overload resolution applies. A sketch of this resolution follows.
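The print overloads below are illustrative; the comments describe which overload is selected and why.

#include <iostream>

void print(int)                  { std::cout << "non-template\n"; }
template <class T> void print(T) { std::cout << "template\n"; }

int main()
{
    print(42);   // exact match with the non-templated overload
    print(4.2);  // template instantiated with T = double (no conversion needed)
    print('c');  // template with T = char is an exact match, the int overload is not
}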

Template instantiation

At the beginning, explicit template instantiation was not intended, because it would create hard issues in some specific circumstances: for instance, if two unrelated parts of a program both request the same instantiation of a templated object, it has to be done without code replication and without disturbing dynamic linking. This is why implicit template instantiation was preferred at first.

The first automatic implementation for template instantiation was as follows: when the linker is run, it searches for missing template instantiations. If found, the compiler is invoked again to produce the missing code. This process is repeated until there are no more missing template instantiations.

However, this process had several problems, one of them being very poor compile and link performance.

It is to mitigate this that optional explicit template instantiation is allowed.
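For illustration, here is what explicit instantiation looks like with today’s syntax (a minimal sketch; twice() is an arbitrary example):

template <class T>
T twice(T v) { return v + v; }

// Explicit instantiation: the definition of twice<int> is emitted in this
// translation unit, instead of being generated implicitly at each point of use.
template int twice<int>(int);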

The development of the template instantiation process had many more issues, such as the point of instantiation (the “name problem”, i.e. pinpointing which declaration the names used in a template definition refer to), dependencies problems, solving ambiguities, etc. Discussing all of these would require a dedicated article.

Modern evolutions

Templates are a feature that continued to evolve even as we entered the Modern C++ era (beginning with C++11).

Variadic templates

Variadic templates are templates that have at least one parameter pack. In C++, a parameter pack is a way to say that a function or a template has a variable number of arguments.

For example, the following function template declaration uses a parameter pack:

template <typename... TArgs>
void foobar(TArgs... args);

And it can be called with any number of arguments:

foobar(1);
foobar(42, 666);
foobar(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16);

Variadic templates thus allow you to have a variable number of arguments that can each be of a different type.

With that, we can write more generic functions. For instance:

#include <iostream>

struct MyLogger 
{
    static int log_counter;

    template<typename THead>
    static void log(const THead& h)
    {
        std::cout << "[" << (log_counter++) << "] " << h << std::endl;
    }

    template<typename THead, typename ...TTail>
    static void log(const THead& h, const TTail& ...t)
    {
        log(h);
        log(t...);
    }
};

int MyLogger::log_counter = 0;

int main()
{
    MyLogger::log(1,2,3,"FOO");
    MyLogger::log('f', 4.2);
}

This generates the following output:


[0] 1
[1] 2
[2] 3
[3] FOO
[4] f
[5] 4.2

It is safe to assume that a significant motivation for the addition of variadic templates and parameter packs was to allow the implementation of more generic functions, even if it may lead to voluminous generated code (for instance, in the previous example, the MyLogger class has 8 instantiations of the function log5).

Full details are available here: Parameter pack(since C++11) – cppreference.com.

Concepts

Concepts are a C++20 feature that gives the developer a way to declare constraints over template parameters. This leads to clearer code (with a higher level of abstraction) and clearer error messages (if any).

For instance, here is a concept declaration:

template<typename T_>
concept Addable = requires(T_ a, T_ b)
{
    a + b;
};

And here are examples of its usage:

template<typename T_>
requires Addable<T_>
T_ foo(T_ a, T_ b);

template<typename T_>
T_ bar(T_ a, T_ b) requires Addable<T_>;

auto l = []<typename T_> requires Addable<T_> (T_ a, T_ b) {};

Before that, template-related errors were barely readable. Concepts were a highly anticipated feature of C++20.

A good overview of concepts can be found on Oleksandr Koval’s blog: All C++20 core language features with examples | Oleksandr Koval’s blog (oleksandrkvl.github.io)

Deduction guides

Template deduction guides are a C++17 feature; they are patterns associated with a templated class that tell the compiler how to translate a set of constructor parameters (and their types) into template arguments.

For instance:

template<typename T_>
struct Foo
{
  T_ t;
};
 
Foo(const char *) -> Foo<std::string>;
 
Foo foo{"A String"};

In this code, the object foo is a Foo<std::string> and not a Foo<const char*>, and thus foo.t is a std::string. Thanks to the deduction guide, the compiler understands that when we use a const char*, we want to use the std::string instantiation of the template.

This is particularly useful for objects such as vectors, which can have this kind of deduction guide:

template<typename Iterator> vector(Iterator b, Iterator e) -> vector<typename std::iterator_traits<Iterator>::value_type>;

This way, if we call the vector constructor with an iterator, the compiler will be able to deduce the templated parameter of the vector.
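A short usage sketch of that guide: the element type is deduced from the iterators and never spelled out.

#include <list>
#include <vector>

int main()
{
    std::list<int> source{1, 2, 3};

    std::vector v(source.begin(), source.end());  // deduced as std::vector<int>
    (void)v;
}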

Substitution Failure Is Not An Error

Substitution Failure Is Not An Error, SFINAE for short, is a rule that applies during template overloaded function resolution.

It basically means that if substituting the deduced or explicitly specified template arguments fails, the specialization is discarded from the overload set instead of causing a compile error.

For instance, take the following code:

struct Foo {};
struct Bar { Bar(Foo){} }; // Bar can be created from Foo
 
template <class T>
auto f(T a, T b) -> decltype(a+b); // 1st overload
 
Foo f(Bar, Bar);  // 2nd overload
 
Foo a, b;
Foo x3 = f(a, b);

Instinctively, we could think that the first overload is the one called in f(a, b) (because the template instantiation using Foo as T would be a better match than the second one, which requires a conversion).

However, the expression (a+b) is ill-formed with Foo. Instead of generating an error, the overload auto f(Foo a, Foo b) -> decltype(a+b); is discarded. Thus, the other overload is the one called, with an implicit conversion from Foo to Bar.

This kind of substitution occurs in all types used in the function type and all types used in the template parameter declarations. Since C++11, it also occurs in all expressions used in the function type and all expressions used in a template parameter declaration. Since C++20, it also occurs in all expressions used in the explicit specifier.
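Here is a hedged sketch of such expression-based SFINAE (C++11 style, with illustrative names): the decltype expression in the trailing return type discards the first overload when T has no size() member.

#include <iostream>
#include <vector>

template <class T>
auto describe(const T& t, int) -> decltype(t.size(), void())
{
    std::cout << "has size(): " << t.size() << '\n';
}

template <class T>
void describe(const T&, ...)  // fallback, chosen only when the first overload is discarded
{
    std::cout << "no size()\n";
}

int main()
{
    std::vector<int> v{1, 2, 3};
    describe(v, 0);   // picks the size() overload
    describe(42, 0);  // int has no size(): substitution fails, the fallback is used
}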

The full documentation about SFINAE can be found here: SFINAE – cppreference.com.

Other features in C++20

Templates continue to evolve. Here is a small list of the C++20 template features I couldn’t fit in this article:

  • Template parameter list for generic lambdas. Sometimes generic lambdas are too generic. C++20 allows to use familiar template function syntax to introduce type names directly.
  • Class template argument deduction for aggregates. In C++17 to use aggregates with class template argument deduction we need explicit deduction guides, that’s unnecessary now.
  • Class types in non-type template parameters. Non-type template parameters now can be of literal class types.
  • Generalized non-type template parameters. Non-type template parameters are generalized to so-called structural types.
  • Class template argument deduction for alias templates. Class template argument deduction works with type aliases now.

Exceptions and templates: two sides of the same coin

I did not talk about exceptions in this article, but for Stroustrup, exceptions and templates are complementary features:

To my mind, templates and exceptions are two sides of the same coin: templates allow a reduction in the number of run-time errors by extending the range of problems handled by static type checking; exceptions provide a mechanism for dealing with the remaining run-time errors. Templates make exception handling manageable by reducing the need for run-time error handling to the essential cases. Exceptions make general template-based libraries manageable by providing a way for such libraries to report errors.

Bjarne Stroustrup, The Design and Evolution of C++, chapter 15: Templates, §1 – Introduction

So, by design, templates and exceptions are closely intermingled, in addition to raising the level of abstraction for error-handling.

However, exceptions and templates (especially templates) have evolved greatly since then, so I think this may not be true anymore.

Wrapping up

In my opinion, templates are the biggest fish in the C++ metaphorical pond. We will never talk enough about them, and I suspect they will continue to evolve for decades.

This is so because in modern C++ one of the key ideas is to write intentions instead of actions. We want higher levels of abstraction and more metaprogramming. It is only natural that templates are at the heart of the modern evolutions of the language.

Author: Chloé Lourseyre
Editor: Peter Fordham

Addenda

Sources

Notes

1. I managed to locate this change in the GCC compiler at release 6 (https://godbolt.org/z/vndGdd7Wh) and in the Clang compiler at release 6 (https://godbolt.org/z/ssfxvb4cM). This is the point where both compilers switched their default language standard to C++14, which includes the C++11 fix, so the change only becomes visible there without an explicit -std flag.

2. This is called duck-typing, from the saying if it looks like a duck, swims like a duck and it quacks like a duck then it probably is a duck.

3. I have no concrete example to provide and I’m pretty much paraphrasing Stroustrup in his retrospection, but the idea is that by having user-defined constraints, you close some doors that you didn’t even know existed and that others could have exploited. I’ve done and seen very interesting things using templates, and the fact that the only constraint we have is that the templated code makes sense with their given parameters opens up as many possibilities as we can imagine.

4. There were other tries to imagine a way to specify constraints, but to no avail. More details in section §15.4 of Stroustrup: The Design and Evolution of C++.

5. These instantiations are (according to the assembly – Compiler Explorer (godbolt.org)):

  • log(int);
  • log(char[4]);
  • log(char);
  • log(double);
  • log(int,int,int,char[4]);
  • log(int,int,char[4]);
  • log(int,char[4]);
  • log(char,double);

[History of C++] Explanation on why the keyword `class` has no more reason to exist

Author: Chloé Lourseyre
Editor: Peter Fordham

Introduction to the new concept: History of C++

A few months back (at the start of this blog) I was thinking about interesting things you can find in C++, and then I realized one thing: the keyword class doesn’t really have a good reason to exist!

This may seem a bit harsh said that way, but I will explain my statement later in the article.

Anyway, studying the reason for the word class to exist led me to look into the history of the language. It was very interesting and I really enjoyed it.

So I decided to write a mini-series of articles about the history of C++ aiming to present concepts that may be outdated today, in C++20, and explain why some strange or debatable things were made that way in the past.

Sources

For this miniseries, I have three main sources:

  • The Design and Evolution of C++ (Bjarne Stroustrup)
  • A History of C++: 1979–1991 (Bjarne Stroustrup)
  • Sibling Rivalry: C and C++ (Bjarne Stroustrup)

I have a few things to say about them.

First, Sibling Rivalry: C and C++ and A History of C++: 1979-1991 are both fully available on Stroustrup’s website (just follow the links).

Second, I find it quite sad that I could only find sources from one author. Sure, Bjarne Stroustrup is probably the best individual to talk about his own work, but I would have liked the insight of other authors (if you know any, please tell me in the comments).

Why could the keyword class simply not exist in C++20?

Today, in C++20, we have two very similar keywords that work almost exactly the same: class and struct.

There is one and only one difference between class and struct: by default, the members and the inheritance of a struct are public, while those of a class are private.

Here are three reasons why this tiny difference is not worth a different keyword1:

  • In practice, the default access specifier is almost never relied upon, in my experience. Most developers don’t use the default and prefer to specify the access explicitly.
  • In 2021, good code is clear code. Explicitly writing the access specifier is, in that regard, better than leaving it implicit. This may be arguable on solo or small projects, but when you start to develop with many peers, it is better to write a few more characters to be sure that the code is clear for everyone.
  • Having two keywords is more ambiguous than having one. I have very often encountered developers that think there are more differences than that between class and struct, and they have sometimes even told me what they thought these additional differences were. If there was only one keyword, this confusion wouldn’t exist.

I can hear that some people have counter-arguments. The ones I heard a lot are:

  • This is syntactic sugar2.
  • They actually use the implicit access specifier3.
  • There is semantics behind the use of each keyword that go beyond the sole technical meaning4.

All in all, what I’m trying to say is that the language would practically be the same if the keyword class did not exist. So, in the mindset of C++20, we could ask ourselves “what is the purpose of adding a keyword that is neither needed nor useful?”.

I know one thing: that class is one of the oldest C++-specific keywords. Let’s dive into the history of C++ to understand why it exists.

History of the keyword class

Birth

The first official appearance of the keyword class was in Classes: An Abstract Data Type Facility for the C language (Bjarne Stroustrup, 1980), and was actually not talking about C++, but about what we call C with classes.

What is “C with classes”? I think I’ll dedicate a whole article to this subject, so I’ll keep it short here. It is C++’s immediate predecessor, started in 1979. The original goal was to add modularity to the C language, inspired by Simula5 classes. At first, it was a mere tool, but it soon evolved into a whole language. However, since C with Classes was a mild success while needing continuous support, Bjarne decided to abandon it and create a new language, using his experience of C with Classes, that aimed to be more popular. He called this new language C++.

The choice of the word “class” directly comes from Simula, and the fact that Stroustrup dislikes inventing new terminology.

You’ll find more about C with Classes in the book The Design and Evolution Of C++ (Bjarne Stroustrup), where a whole section is dedicated to it.

So the keyword class was actually born within the predecessor of C++. In terms of design, it’s one of the oldest concepts and even the motivation behind the creation of C with classes.

Original difference between struct and class

In C with Classes, struct and class were quite different.

Structures work just like in C: they are simple data structures, whereas it is within classes that the concept of methods is introduced.

At that time, there was a real difference between structures and classes, thus the distinction6.

Into C++

Two of the greatest features of early C++ were virtual functions and function overloading.

In addition to that, namespace rules were introduced to define how the names of scopes would behave. Among those rules:

  • Names are private unless they are explicitly declared public.
  • A class is a scope (implying that classes nest properly).
  • C structure names don’t nest (even when they are lexically nested).

These rules make structures and classes behave differently in terms of scopes and names.

For instance, this was legal:

struct outer {
    struct inner {
        int i;
    };
};

struct inner a = { 1 };

But replacing struct with class provoked a compilation error.

In later C++, the code above doesn’t compile (it needs to be outer::inner a = {1};).

“Fusion” with the keyword struct

It’s difficult to say when this occurred specifically, because none of the sources I found clearly state “this is when structures and classes became the same concept”, but we can investigate.

According to The C++ Programming Language – Reference Manual (Bjarne Stroustrup, 1984), the first published document specifically about C++:

(listing derived types)

classes containing a sequence of objects of various types, a set of functions for manipulating these objects, and a set of restrictions on the access of these objects and function;

structures which are classes without access restrictions;

Bjarne Stroustrup, The C++ Programming Language – Reference Manual, §4.4 Derived types

Moreover, if we take a look at the feedback Stroustrup gives about virtual functions and the object layout model (concepts introduced in 1986):

At this point, the object model becomes real in the sense that an object is more than the simple aggregation of the data members of a class. […] Then why did I not at this point choose to make structs and classes different notions?

My intent was to have a single concept: a single set of layout rules, a single set of lookup rules, a single set of resolution rules, etc. […]

Bjarne Stroustrup, The Design and Evolution of C++, §3.5.1 The Object Layout Model

Even though it seems that structures couldn’t hold private members at that time (the private keyword didn’t even exist then!), we can safely say this was the moment where structures and classes were “fused”.

It was at the creation of C++ that the structures from C and the classes from Simula merged together.

But when did they actually become the same?

But technically, structs and classes became what they are today the moment the keyword private was invented. It seems that happened the same time the keyword protected was introduced, for Release 1.2 in 1987.

From then to now

Despite all that, despite the fact that class is technically useless, there is a lot more to it.

The keyword class has acquired semantics.

Indeed, nowadays, writing the keyword class means that you are implementing a class that is not a data bucket, whereas the keyword struct is mainly used for data buckets and similar data structures. In their technicality, these keywords do not differ, but because of usage, because these keywords have history, they have acquired more than their sole technical meaning.

Go see Jonathan Boccara’s very relevant article: The real difference between struct and class – Fluent C++ (fluentcpp.com) for more details. This article is inspired from the Core Guidelines.

The fact that today’s class has lived for more than forty years makes it very different from the class of 1980, and from the class it would be if it were introduced in 2020.

But the question that immediately pops up in my head is the following: should we continue to use class this way? Should we stick to the semantics it acquired or should we seek evolution towards a more modern meaning? The answer is simple: it’s up to each of us. We, C++ devs, are the ones that make the language evolve, every day, in every single line of code.

The Core Guidelines tells us how we should use each feature today, but maybe tomorrow someone (you?) will come up with a better, safer, clearer way to code. Understanding what were structs and classes in the past and what they are today is the first step to define what they will be tomorrow.

Wrapping up

The best way to summarize the history of these two keywords is this: “The structures from C and the classes from Simula merged together at the creation of C++.” But we can also say that, thanks to that history, despite representing the same feature, they both carry different meanings.

This article is not a pamphlet against class, and I will not conclude it with a half-educated, half-authoritative argument like I often do7. Instead, I will tell you that I realized how important it is to contextualize articles like the ones I publish on this blog with the version of C++ that is used and the articles and books that are used as examples and inspirations.

I think it’s important to understand history to be able to judge the practices we have today. Do we do something out of habit or is there a real advantage to it? It’s a question that needs to be asked every day, or else we’ll end up writing outdated code in an outdated mindset.

How C++ developers think8 evolves from decade to decade. Each eon, developers have different mindsets, different goals, different issues, a different education and so on… I don’t blame how people coded in the past, but I do blame those who code now like we coded a decade ago, and in the future I hope to see my peers blaming me when I code in “old C++”.

Thanks for reading and see you next week!

Author: Chloé Lourseyre
Editor: Peter Fordham

Addenda

To go further

Here are two thoughts that are a bit off-topic.

Namespaces in C

Something that wasn’t inherited from C into C++ was the struct namespace.

Indeed, in C, the namespace containing the struct names isn’t the same as the global C namespace. In C, struct foo and foo aren’t referring to the same entity. In C++, assuming that foo is a structure, struct foo and foo are the same name.

There is a way, in C, to link these two namespaces, using typedef (see the sketch below). To learn more about that, read this article: How to use the typedef struct in C (educative.io)
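A tiny illustration of that C idiom (it also compiles as C++): the typedef makes the plain name foo refer to the same type as struct foo.

typedef struct foo { int i; } foo;

int main(void)
{
    struct foo a = { 1 };  // C-style name, from the struct namespace
    foo b = { 2 };         // plain name, thanks to the typedef
    (void)a; (void)b;
    return 0;
}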

The class keyword in templates

Maybe you have already seen this syntax:

template <class C_>
void foo(C_ arg)
{
    // ...
}

What does class mean in this context?

It actually means nothing more than “a type, whichever one”.

This is a bit confusing, because we may think that, as a type, C_ is supposed to be a class. But it doesn’t have to be. Later, the typename keyword was introduced to lift this confusion, but we can still use class if we want to.

Annotations

1. Since struct and class are so similar, I chose to consider class the keyword in excess, simply because struct exists in C and class does not, and because it is the evolution of the keyword class that brought them both so close.

2. This argument is not true. By essence, syntactic sugar is supposed to make the code easier to read or write. Since struct/class is just the substitution of one keyword for another, there is no gain in clarity whatsoever. The only reason the code may seem easier to read is because of the semantics these keywords hold, not the syntax itself.

3. Yes, I know, some people actually use the implicit default. I do too. But what you forget when you say such a thing is that most of the software industry doesn’t think like you nor write code like you. The statement of the struct/class redundancy comes from an empirical fact. Our own individual practices can never be an argument against that fact.

4. While this is factually true, the reasoning is upside-down. It’s because of their history that they have different semantic meanings. If they were created today, from nothing, they would not have those semantics, only their redundancy. I’ll talk about that near the end of the article.

5. Simula is the name of two simulation programming languages, Simula I and Simula 67, developed in the 1960s at the Norwegian Computing Center in Oslo, and is considered the first object-oriented programming language. Simula is largely unknown in the developer community, but it has greatly influenced other famous languages. Simula-type objects are reimplemented in Object Pascal, Java, C#, and, of course, C++.

6. At that time, you could emulate any structure with a class, but it was still interesting, all the more in the mindset of the time, to make the distinction.

7. I tend to always agree with the C++ Core Guidelines (isocpp.github.io), even if I am always trying to be critical as of our habits and practices. But keep in mind that the guidelines we have today may be different from the one we’ll have tomorrow.

8. I think this statement is actually true for every language, but C++ is the perfect archetype, as it is one of the oldest widely-used languages on the market today, in 2021.

Sources

In order of appearance:

Off-topic: Feedspot’s Top 30 C++ Programming Blogs and Websites

Recently, Feedspot made a top-list of the best C++ programming blogs and website, and Belay the C++ ended up in 9th position!

You can find the top-list here: Top 30 C++ Programming Blogs and Websites You Must Follow in 2021 (feedspot.com)

Yet another reason to not use printf (or write C code in general)

Author: Chloé Lourseyre

Recently, Joe Groff (@jckarter) tweeted a very interesting behavior inherited from C: a snippet in which double(2101253) seems to actually double the value 2101253.

Obviously, it’s a joke, but we’re gonna talk more about what’s happening in the code itself.

So, what’s happening?

Just to be 100% clear, double(2101253) does not actually double the value of 2101253. It’s a cast from int to double.

If we write this differently, we can obtain this:

#include <cstdio>

int main() {
    printf("%d\n", 666);
    printf("%d\n", double(42));
}

On the x86_64 gcc 11.2 compiler, the output is as follows:

666
4202506

So we can see that the value 4202506 has nothing to do with the 666 nor the 42 values.

In fact, if we launch the same code in the x86_64 clang 12.0.1 compiler, things are a little bit different:

666
4202514

You can see the live results here: https://godbolt.org/z/c6Me7a5ee

You may have guessed it already, but this comes from the second printf call, where we print a double as an int. But this is not some kind of conversion error (of course your computer knows how to convert a double to an int; it would do it fine if that were what was happening): the issue comes from somewhere else.

The truth

If we want to understand how it works that way, we’ll have to take a look at the assembly code (https://godbolt.org/z/5YKEdj73r):

.LC0:
        .string "%d\n"
main:
        push    rbp
        mov     rbp, rsp
        mov     esi, 666
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     rax, QWORD PTR .LC1[rip]
        movq    xmm0, rax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 1
        call    printf
        mov     eax, 0
        pop     rbp
        ret
.LC1:
        .long   0
        .long   1078263808

(use this Godbolt link to have a clearer matching between the C++ code and the assembly instructions: https://godbolt.org/z/5YKEdj73r)

In the first part of the assembly code (lines 6 to 9, the equivalent of printf("%d\n", 666);) we can see that everything’s fine: the 666 value is put in the esi register and then the function printf is called. So it’s an educated guess to say that when the printf function reads a %d in the string it is given, it’ll look in the esi register for what to print.

However, we can see in the second part of the code (lines 10 to 14, the equivalent of printf("%d\n", double(42));) that the value is put in another register: the xmm0 register. Since printf is given the same string as before, it’s a pretty safe guess that it will look into the esi register again, whatever is in there.

We can prove that statement pretty easily. Take the following code:

#include <cstdio>

int main() {
    printf("%d\n", 666);
    printf("%d %d\n", double(42), 24);
}

It’s the same code, with an additional integer that is printed by the second printf instruction.

If we look at the assembly (https://godbolt.org/z/jjeca8qd7):

.LC0:
        .string "%d %d\n"
main:
        push    rbp
        mov     rbp, rsp
        mov     esi, 666
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     rax, QWORD PTR .LC1[rip]
        mov     esi, 24
        movq    xmm0, rax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 1
        call    printf
        mov     eax, 0
        pop     rbp
        ret
.LC1:
        .long   0
        .long   1078263808

The double(42) value still goes into the xmm0 register, and the 24 integer, logically, ends up in the esi register. Thus, this happens in the output:

666
24 0

Why? Well, since we asked for two integers, the printf call will look into the first integer register (esi) and print its content (24, as we stated above), then look in the following integer register (edx) and print whatever is in it (incidentally 0).

In the end, the behavior we see occurs because of how the x86_64 calling convention works: floating-point arguments are passed in the xmm registers, while integer arguments are passed in general-purpose registers. If you want to learn more about that, look up the System V AMD64 ABI calling convention.

What does the doc say?

The truth is that according to the reference (printf, fprintf, sprintf, snprintf, printf_s, fprintf_s, sprintf_s, snprintf_s – cppreference.com):

If a conversion specification is invalid, the behavior is undefined.

And this same reference is unambiguous about the %d conversion specifier:

converts a signed integer into decimal representation [-]dddd.
Precision specifies the minimum number of digits to appear. The default precision is 1.
If both the converted value and the precision are ​0​ the conversion results in no characters.

So, giving a double to a printf call where you are supposed to give a signed integer is UB. It was our mistake to write this in the first place.

This actually generates a warning with clang. But with gcc, you’ll have to activate -Wall to see any warning about that.

Wrapping up

The C language is a very, very old language. It’s older than C++ (obviously), which is itself very old. As a reminder, the first edition of K&R was printed in 1978. This was thirteen years before my own birth. And unlike us humans, programming languages don’t age well.

I could have summarized this article with a classic “don’t perform UB”, but I think it’s a bit off-purpose this time. So I’ll go and say it: don’t use printf at all.

The problem is not with printf itself, it’s with using a feature from another language1 that was originally published forty-three years ago. In short: don’t write C code.
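If you need a replacement, here is a hedged sketch of the type-safe C++ alternatives: both pick the conversion from the argument’s actual type, so there is no format specifier to get wrong.

#include <format>    // C++20
#include <iostream>

int main()
{
    std::cout << double(42) << '\n';               // prints 42
    std::cout << std::format("{}\n", double(42));  // prints 42 as well
}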

Thanks for reading and see you next week!

1. Yeah, like it or not, but C and C++ are different languages. Different purpose, different intentions, different meta. That is exactly why I always deny job offers that have the tag “C/C++”: they obviously can’t pick a side.

Author: Chloé Lourseyre

Best ways to convert an enum to a string

Author: Chloé Lourseyre

One of the oldest problems C++ developers have ever encountered is how to print the value of an enum type.

All right, I may be a little overly dramatic here, but this is an issue many C++ developers encountered, even the most casual ones.

The thing is, there isn’t one true answer to that issue. It depends on many things such as your constraints, your needs, and, as always, the C++ version of your compiler.

This article is a small list of ways to add reflection to enums.

NB: If you know of a way that isn’t listed here and has its own benefits, feel free to share in the comments.

Magic Enum library

Magic Enum is a header-only library that gives static reflection to enums.

You can convert from and to strings and you can iterate over the enum values. It adds the “enum_cast” feature.

Find it here: GitHub – Neargye/magic_enum: Static reflection for enums (to string, from string, iteration) for modern C++, work with any enum type without any macro or boilerplate code
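To give you an idea of what it looks like in practice, here is a minimal sketch (the exact header path may differ depending on how you installed the library):

#include <iostream>
#include <magic_enum.hpp> // header name depends on the installed version

enum class Esper { Unu, Du, Tri, Kvar, Kvin, Ses, Sep, Ok, Naux, Dek };

int main()
{
    // Enum to string
    std::cout << magic_enum::enum_name(Esper::Kvin) << std::endl; // prints "Kvin"

    // String to enum: enum_cast returns a std::optional
    if (auto e = magic_enum::enum_cast<Esper>("Dek"); e.has_value())
        std::cout << static_cast<int>(e.value()) << std::endl;    // prints 9
}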

Drawbacks

  • It’s a third-party library.
  • Requires C++17 (or later).
  • You need specific versions of your compiler for it to work (Clang >= 5, MSVC >= 15.3 and GCC >= 9).
  • You have a few other constraints related to the implementation of the library. Check the Limitations page of the documentation (magic_enum/limitations.md at master · Neargye/magic_enum · GitHub)

Using a dedicated function with an exception

Static version

constexpr is a magnificent tool that allows us to define things statically. When applied to a function, it allows the return value to be evaluated at compile time (as long as the arguments are themselves constant expressions).

In this version, I added an exception in the default case, so if an item is ever added to the enum without updating the function, the exception will be thrown.

#include <iostream>
#include <stdexcept> // for std::invalid_argument

enum class Esper { Unu, Du, Tri, Kvar, Kvin, Ses, Sep, Ok, Naux, Dek };

constexpr const char* EsperToString(Esper e)
{
    switch (e)
    {
        case Esper::Unu: return "Unu";
        case Esper::Du: return "Du";
        case Esper::Tri: return "Tri";
        case Esper::Kvar: return "Kvar";
        case Esper::Kvin: return "Kvin";
        case Esper::Ses: return "Ses";
        case Esper::Sep: return "Sep";
        case Esper::Ok: return "Ok";
        case Esper::Naux: return "Naux";
        case Esper::Dek: return "Dek";
        default: throw std::invalid_argument("Unimplemented item");
    }
}

int main()
{
    std::cout << EsperToString(Esper::Kvin) << std::endl;
}

Dynamic version

The thing is, having several return statements in a constexpr function requires C++14. Prior to C++14, you can remove the constexpr specifier and write a dynamic version of this function.

#include <iostream>
#include <stdexcept> // for std::invalid_argument

enum class Esper { Unu, Du, Tri, Kvar, Kvin, Ses, Sep, Ok, Naux, Dek };

const char* EsperToString(Esper e)
{
    switch (e)
    {
        case Esper::Unu: return "Unu";
        case Esper::Du: return "Du";
        case Esper::Tri: return "Tri";
        case Esper::Kvar: return "Kvar";
        case Esper::Kvin: return "Kvin";
        case Esper::Ses: return "Ses";
        case Esper::Sep: return "Sep";
        case Esper::Ok: return "Ok";
        case Esper::Naux: return "Naux";
        case Esper::Dek: return "Dek";
        default: throw std::invalid_argument("Unimplemented item");
    }
}

int main()
{
    std::cout << EsperToString(Esper::Kvin) << std::endl;
}

Prior to C++11, you can remove the enum class specifier and use a plain enum instead.

Drawbacks

  • Having several return statements in a constexpr function requires C++14 (for the static version).
  • Specific to each enum and very verbose.
  • Is exception-unsafe.

Using a dedicated exception-safe function

Static version

Sometimes you prefer code that doesn’t throw, no matter what. Or maybe you are like me and compile with -Werror. If so, you can write an exception-safe function without the default case.

You just have to watch out for warnings when you need to add an item.

#include <iostream>

enum class Esper { Unu, Du, Tri, Kvar, Kvin, Ses, Sep, Ok, Naux, Dek };

constexpr const char* EsperToString(Esper e) noexcept
{
    switch (e)
    {
        case Esper::Unu: return "Unu";
        case Esper::Du: return "Du";
        case Esper::Tri: return "Tri";
        case Esper::Kvar: return "Kvar";
        case Esper::Kvin: return "Kvin";
        case Esper::Ses: return "Ses";
        case Esper::Sep: return "Sep";
        case Esper::Ok: return "Ok";
        case Esper::Naux: return "Naux";
        case Esper::Dek: return "Dek";
    }
}

int main()
{
    std::cout << EsperToString(Esper::Kvin) << std::endl;
}

Dynamic version

Again, a dynamic version without constexpr:

#include <iostream>

enum class Esper { Unu, Du, Tri, Kvar, Kvin, Ses, Sep, Ok, Naux, Dek };

const char* EsperToString(Esper e) noexcept
{
    switch (e)
    {
        case Esper::Unu: return "Unu";
        case Esper::Du: return "Du";
        case Esper::Tri: return "Tri";
        case Esper::Kvar: return "Kvar";
        case Esper::Kvin: return "Kvin";
        case Esper::Ses: return "Ses";
        case Esper::Sep: return "Sep";
        case Esper::Ok: return "Ok";
        case Esper::Naux: return "Naux";
        case Esper::Dek: return "Dek";
    }
}

int main()
{
    std::cout << EsperToString(Esper::Kvin) << std::endl;
}

Prior to C++11, you can remove the enum class specifier and use a plain enum instead.

Drawbacks

  • Having several return statements in a constexpr function requires C++14 (for the static version).
  • Specific to each enum and very verbose.
  • The warnings are prone to be ignored.

Using macros

Macros can do many things that dynamic code can’t do. Here are two implementations using macros.

Static version

#include <iostream>

#define ENUM_MACRO(name, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10)\
    enum class name { v1, v2, v3, v4, v5, v6, v7, v8, v9, v10 };\
    const char *name##Strings[] = { #v1, #v2, #v3, #v4, #v5, #v6, #v7, #v8, #v9, #v10};\
    template<typename T>\
    constexpr const char *name##ToString(T value) { return name##Strings[static_cast<int>(value)]; }

ENUM_MACRO(Esper, Unu, Du, Tri, Kvar, Kvin, Ses, Sep, Ok, Naux, Dek);

int main()
{
    std::cout << EsperToString(Esper::Kvin) << std::endl;
}

Dynamic version

Very similar to the static one, but if you need this in a version prior to C++11, you’ll have to get rid of the constexpr specifier. Also, since it’s a pre-C++11 version, you can’t have enum class, so you’ll have to go with a plain enum instead.

#include <iostream>

#define ENUM_MACRO(name, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10)\
    enum name { v1, v2, v3, v4, v5, v6, v7, v8, v9, v10 };\
    const char *name##Strings[] = { #v1, #v2, #v3, #v4, #v5, #v6, #v7, #v8, #v9, #v10 };\
    const char *name##ToString(int value) { return name##Strings[value]; }

ENUM_MACRO(Esper, Unu, Du, Tri, Kvar, Kvin, Ses, Sep, Ok, Naux, Dek);

int main()
{
    std::cout << EsperToString(Kvin) << std::endl;
}

Drawbacks

  • Uses macros (I could, and probably will, write an article about why macros are bad practice in C++, but I won’t do it here. For now, just keep in mind that if you don’t know why macros can be bad, then you shouldn’t use them).
  • You need to write a different macro each time you need a reflective enum with a different number of items (with a different macro name, which is upsetting).

Using macros and boost

We can work around the “fixed number of enum items” drawback of the previous version by using Boost.

Static version

#include <iostream>
#include <boost/preprocessor.hpp>

#define PROCESS_ONE_ELEMENT(r, unused, idx, elem) \
  BOOST_PP_COMMA_IF(idx) BOOST_PP_STRINGIZE(elem)

#define ENUM_MACRO(name, ...)\
    enum class name { __VA_ARGS__ };\
    const char *name##Strings[] = { BOOST_PP_SEQ_FOR_EACH_I(PROCESS_ONE_ELEMENT, %%, BOOST_PP_VARIADIC_TO_SEQ(__VA_ARGS__)) };\
    template<typename T>\
    constexpr const char *name##ToString(T value) { return name##Strings[static_cast<int>(value)]; }

ENUM_MACRO(Esper, Unu, Du, Tri, Kvar, Kvin, Ses, Sep, Ok, Naux, Dek);

int main()
{
    std::cout << EsperToString(Esper::Kvin) << std::endl;
}

Here, PROCESS_ONE_ELEMENT “converts” each item to its stringized version (calling BOOST_PP_STRINGIZE), and BOOST_PP_SEQ_FOR_EACH_I iterates over every item of __VA_ARGS__ (which holds all of the macro’s variadic arguments).

Dynamic version

Again, it’s a very similar version of the static one, but without the constexpr or other C++11 specifiers.

#include <iostream>
#include <boost/preprocessor.hpp>

#define PROCESS_ONE_ELEMENT(r, unused, idx, elem) \
  BOOST_PP_COMMA_IF(idx) BOOST_PP_STRINGIZE(elem)

#define ENUM_MACRO(name, ...)\
    enum name { __VA_ARGS__ };\
    const char *name##Strings[] = { BOOST_PP_SEQ_FOR_EACH_I(PROCESS_ONE_ELEMENT, %%, BOOST_PP_VARIADIC_TO_SEQ(__VA_ARGS__)) };\
    const char *name##ToString(int value) { return name##Strings[value]; }

ENUM_MACRO(Esper, Unu, Du, Tri, Kvar, Kvin, Ses, Sep, Ok, Naux, Dek);

int main()
{
    std::cout << EsperToString(Kvin) << std::endl;
}

Drawbacks

  • Uses macros.
  • Uses Boost.

NB: While still a third-party library, Boost is often more readily accepted than other libraries (such as the little-known Magic Enum), which is (inter alia) why this version may be preferred to the first one.

Wrapping up

Here is a little summary of the methods described here:

Name                   | Is static?  | Is generic? | 3rd party libraries | Uses macros? | Is exception-safe?
Magic Enum             | Yes (C++17) | Yes         | Yes (Magic Enum)    | No           | No
Function w/ exception  | Yes (C++14) | No          | No                  | No           | No
Function w/o exception | Yes (C++14) | No          | No                  | No           | Yes
Macro                  | Yes (C++11) | No          | No                  | Yes          | Yes
Macro & Boost          | Yes (C++11) | Yes         | Yes (Boost)         | Yes          | Yes

Again, if you know a good way to convert enums to string, please say so in the comments.

Thanks for reading and see you next week!

Author: Chloé Lourseyre

About sizes

Author: Chloé Lourseyre

If I’d come up with a pop quiz about sizes in C++, most C++ developers would fail it (I know I would), because sizes in C++ are complicated.

The sizes of the fundamental types are not fixed: they are implementation-defined.

Still, the standard defines constraints on these sizes. These constraints take one of the following forms:

  • A comparison of the types’ sizeof.
  • The type’s minimum number of bits.

What is sizeof()?

One of the most widespread (and harmless) misconceptions about type sizes is that a byte holds 8 bits.

Although this is mostly true in practice, this is technically false.

A byte is actually defined as the size of a char. Though a char must have at least 8 bits, it can hold more. sizeof(N) returns the number of bytes of the type N, and every type size is expressed in terms of a char length. Thus, a 4-byte int is at least 32-bit, but you may not assume more from it (it could be 128-bit if a char is 32-bit).

The actual number of bits in a byte is recorded in the CHAR_BIT macro, from the <climits> header.
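As a small illustration (the values printed depend on your platform), you can query both pieces of information like this:

#include <climits>
#include <iostream>

int main()
{
    std::cout << "bits in a byte: " << CHAR_BIT << std::endl;   // at least 8
    std::cout << "sizeof(int): " << sizeof(int) << " byte(s), "
              << sizeof(int) * CHAR_BIT << " bits" << std::endl;
}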

Summary of sizes in C++

Here are all the definitions and restrictions of fundamental types’ sizes in C++:

  • 1 ≡ sizeof(char) ≤ sizeof(short) ≤ sizeof(int) ≤ sizeof(long) ≤ sizeof(long long)
  • 1 ≤ sizeof(bool) ≤ sizeof(long)
  • sizeof(char) ≤ sizeof(wchar_t) ≤ sizeof(long)
  • sizeof(float) ≤ sizeof(double) ≤ sizeof(long double)
  • sizeof(N) ≡ sizeof(unsigned N) ≡ sizeof(signed N)
  • A char has at least 8 bits
  • A short has at least 16 bits
  • A long has at least 32 bits

… and nothing more.

Fun fact: according to this definition, it is technically possible that all fundamentals have 32 bits.
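If you want to convince yourself, these constraints can be checked directly by the compiler. The following assertions should pass on any conforming implementation (C++11 or later for static_assert):

#include <climits>

static_assert(sizeof(char) == 1, "a char is exactly one byte by definition");
static_assert(CHAR_BIT >= 8, "a byte holds at least 8 bits");
static_assert(sizeof(short) * CHAR_BIT >= 16, "a short holds at least 16 bits");
static_assert(sizeof(long) * CHAR_BIT >= 32, "a long holds at least 32 bits");
static_assert(sizeof(short) <= sizeof(int) && sizeof(int) <= sizeof(long), "integer sizes are ordered");

int main() {}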

Two words of wisdom

Since the sizes of the fundamental types entirely depend on the architecture, it may sometimes be hard to write consistent code.

#include <limits>

The <limits> header of the standard library contains the upper and lower limits of every fundamental type, and lets you know whether a given type is signed or not.

Example:

#include <limits>
#include <iostream>

int main()
{
    std::cout << "largest double == " << std::numeric_limits<double>::max() << std::endl;
    std::cout << "char is signed == " << std::numeric_limits<char>::is_signed << std::endl;
}

More at std::numeric_limits – cppreference.com.

Reminder: signed integer overflow is undefined behavior. Using <limits> helps you prevent that from happening.
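For instance, here is a small sketch (the function name is mine) of how <limits> can be used to detect a signed overflow before it happens:

#include <limits>

// Returns false instead of overflowing (signed overflow is undefined behavior).
bool safe_add(int a, int b, int& result)
{
    if (b > 0 && a > std::numeric_limits<int>::max() - b) return false; // would overflow
    if (b < 0 && a < std::numeric_limits<int>::min() - b) return false; // would underflow
    result = a + b;
    return true;
}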

#include <cstdint>

Sometimes you may want to deal with fixed-sized types (in terms of bit) and not rely on the implementation specifics. For instance, if you implement serialization, work on very limited memory space, or develop cross-platform software, you may want to explicitly provide the bit-size of the type you’re using.

You can do so with the <cstdint> header, which contains fixed-width types.

Here are a few of them:

  • int8_t, int16_t, int32_t, int64_t: signed integer types with a width of exactly 8, 16, 32 and 64 bits respectively, with no padding bits and using 2’s complement for negative values (provided only if the implementation directly supports the type).
  • int_least8_t, int_least16_t, int_least32_t, int_least64_t: smallest signed integer types with a width of at least 8, 16, 32 and 64 bits respectively.
  • intmax_t: maximum-width signed integer type.
  • uint8_t, uint16_t, uint32_t, uint64_t: unsigned integer types with a width of exactly 8, 16, 32 and 64 bits respectively (provided only if the implementation directly supports the type).
  • uint_least8_t, uint_least16_t, uint_least32_t, uint_least64_t: smallest unsigned integer types with a width of at least 8, 16, 32 and 64 bits respectively.
  • uintmax_t: maximum-width unsigned integer type.

More at Fixed width integer types (since C++11) – cppreference.com.
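As a quick illustration (the variable names are only examples), these types are used like any other integer type:

#include <cstdint>
#include <iostream>

int main()
{
    std::uint32_t packet_length = 1500; // exactly 32 bits, whatever the platform
    std::int_least16_t counter = -42;   // at least 16 bits, possibly more
    std::cout << packet_length << " " << counter << std::endl;
}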

Wrapping up

If you want to read more about type sizes, refer to section §6.2.8 of The C++ Programming Language (Bjarne Stroustrup). More broadly, you can read about types and declarations in the whole section §6 of the book.

You can also refer to Fundamental types – cppreference.com for online documentation.

Thanks for reading and see you next week!

Author: Chloé Lourseyre

How to choose between a setter and a reference-getter?

Author: Chloé Lourseyre

Context

When you implement a class, you often have to implement accessors to one or several attributes of this class to edit them.

There are two ways to do that:

  • A setter, a method that takes the new value for the attribute as argument.
  • A reference-getter, a method that returns a reference to the attribute itself.

Here is a short example that shows how to access the attribute bar with each method.

class Foo {
    int m_bar;

public:
    // Setter
    void set_bar(int bar) {
        m_bar = bar;
    }

    // Reference-getter
    int & get_bar() {
        return m_bar;
    }
};

int main() {
    Foo foo;

    // Editing via setter
    foo.set_bar(42);

    // Editing via reference-getter
    foo.get_bar() = 84;

    return 0;
}

Some may argue that there are more methods to do that, but I assert that they are just variations of one of these two.

The setter

A setter is a write-only interface to a class. You provide a value, and the class is updated accordingly.

Often, it will more or less directly update an attribute by copying/moving the data you provide.

Examples

// Most simple setter
void set_foo(int foo) {
    m_foo = foo;
}
// A setter that performs a test before edition
void set_foo(Foo foo) {
    if (foo.is_valid())
        m_foo = foo;
}
// A move-setter
void set_big_foo(BigFoo && big_foo) {
    m_big_foo = std::forward<BigFoo>(big_foo);
}

The reference-getter

A reference-getter is a method that directly returns the reference to an attribute to access and edit it.

This is particularly useful on object attributes, so you can call non-const methods directly on them.

Examples

// Here is the implementation of the reference-getter
Foo & MyClass::get_foo() {
    return m_foo;
}

// ...

// Used to edit an attribute
myClass.get_foo().bar = 42;

// Used to call a non-const method
myClass.get_foo().update();

How to choose

Well, this is pretty simple as soon as you understand the difference.

The setter is required when you need to re-create the data and put it in the place of the existing one. This is suited when editing simple data (integers, floating values, pointers, etc.) or if you require a brand new value. Also, use setters if you explicitly want the user to be unable to read the data, only edit it.

The reference-getter is required when it is the attribute’s data that is edited (and not the attribute as a whole). Often, you will edit a part of the attribute or call a non-const method over it.

In other words, a setter replaces the attribute, and the reference-getter edits it.

Example

Take the following code:

#include <vector>

using namespace std;

struct Item
{
    bool validity;
    int value;
};

class Foo
{
public:
    Foo(size_t size) :
        m_max_size(size),
        m_data(size, {true, 0})
    {}

    void set_max_size(size_t max_size) {
        m_max_size = max_size;
    }

    Item & get_item(size_t index) {
        return m_data.at(index);
    }

    size_t get_data_size() const {
        return m_data.size();
    }

private:
    size_t m_max_size;
    std::vector<Item> m_data;
};

static void set_foo_size(Foo & foo, size_t new_size)
{
    foo.set_max_size(new_size);
    for (size_t i = new_size ; i < foo.get_data_size() ; ++i)
        foo.get_item(i).validity = false;
}

There, we have a simple little class that holds a collection of data (Items). These data can be valid or invalid (true is valid, false is invalid).

Then we implement a little function that resizes the collection without deleting any element. Out-of-bounds elements are made invalid instead.

We access the value m_max_size using a setter because it’s a plain-old integer that is replaced when the collection is resized.

We access each Item of m_data using a reference-getter because we do not always want to erase the item, sometimes we just want to edit a part of it.

Alternative

We could have edited the validity using a more specific setter, like this:

class Foo {

        // ...

        void set_item_validity(size_t index, bool validity) {
                m_data.at(index).validity = validity;
        }

        // ...

};

Doing so would prevent us from editing the value of the Item, so this decision entirely depends on your implementation details.

However, please consider it bad practice to provide setters for both value and validity. Doing so for a 2-attribute data bucket may not be that inconvenient, but as soon as your implementation grows, your codebase will be polluted by useless accessors. You need full access? Then get full access.

Wrapping up

This may look like a trivial subject to many of you, but there are a lot of developers that are still confused and use one when they need the other. Stay vigilant.

Thanks for reading and see you next week!

Author: Chloé Lourseyre

You shouldn’t assume accessors are fast

Author: Chloé Lourseyre

“Accessors are fast.” I heard this leitmotiv so many times over the years that I couldn’t not write about it.

Explanation

What’s the use for accessors?

During the research I did for this article, I was surprised to find that there actually is a formal definition of accessors1.

You can find this definition in section § 18.3.5 Accessor functions of the fourth edition of The C++ Programming Language (Bjarne Stroustrup).

Long story short, the definition says that, though it is ill-advised to abuse the feature, you can design methods to read and edit the private attributes of a class in order to write clean algorithms (I’m paraphrasing).

So, in short, accessors are used as an interface between the private members of a class and the user of this class.

Like any kind of interface, accessors can thus obfuscate how the actual access of the private members is done, to encapsulate more complex code and ease how the class is used.

1. I’m not discussing the fact that there is a pseudo-consensual definition of accessors. However, since this article aims to deconstruct that definition, I won’t rely on it to build my arguments. You can find this pseudo-consensual definition in the following paragraph, where I describe why it is a bad one.

People don’t use accessors properly…

Every day I see people using accessors improperly. Here are a few examples:

  • Writing the simplest get/set for every member of a class without thinking. If your class makes every attribute available with no subtlety and nothing else is private, then it’s not a class, it’s a struct2.
  • Writing accessors that are never used. Don’t write something you don’t need, it’s just polluting the codebase.
  • Realizing that there are no accessors for a given attribute and implementing them without questioning why. Sometimes (often), there is a good reason that an attribute doesn’t have accessors.
  • Declaring a getter that is not const (even though it should be). By default, make everything const unless you need it non-const.
  • Declaring a getter that is const even though it shouldn’t be. This one is very rare, but I did see, several times, people writing a dangerous const_cast just to make a getter const. You should avoid writing const_casts at all costs (and in the examples I recall, there wasn’t a good reason for it).

There are a lot of other bad practices that are too lengthy to describe in this article (whose purpose is not to teach you to use accessors properly).

2. Here, I abusively use the word class to name a data structure with private members and the struct to name a data structure that is nothing more than a data bucket. This is done on purpose to smooth the read of the article.

…leading to false assumptions

Bad practices lead to a bad way of thinking. And a bad way of thinking leads to false assumptions.

The most widespread false assumption over accessors is the following:

All accessors should be fast to execute.

Assuming accessors are fast is a useless limitation

Leaving aside the fact that it can be harmful (if you use a function thinking it is fast, the day it isn’t you may be in for a bad surprise), by making this assumption you put a limitation on yourself.

This limitation is not really useful (at best, it saves you the time it takes to read the accessors’ documentation, which should not be skipped anyway), and it restricts your ability to innovate and do something useful with your accessors.

Demonstration with a useful pattern

I’m going to show you a pattern that I love to use because it is really useful and saves a lot of lines of code.

Consider this: you have to implement a class that, given several input parameters, provides several outputs (two in our example). You need to use this class to fill two arrays with the outputs, but the parameters don’t always change (so you don’t always have to perform the computation). Since the computation is time-consuming, you don’t want to re-compute when you don’t have to3.

One (naïve) way to do this would be the following (full code here https://godbolt.org/z/a4x769jra):

class FooBarComputer
{
public:
    FooBarComputer();

    // These update the m_has_changed attribute if relevant
    void set_alpha(int alpha);
    void set_beta(int beta);
    void set_gamma(int gamma);

    int get_foo() const;
    int get_bar() const;

    bool has_changed() const;
    void reset_changed();

    void compute();

private:
    int  m_alpha, m_beta, m_gamma;
    int m_foo, m_bar;
    bool m_has_changed;
};


//  ...


//==============================
bool FooBarComputer::has_changed() const
{
    return m_has_changed;
}

void FooBarComputer::reset_changed()
{
    m_has_changed = false;
}


//==============================
// Output getters
int FooBarComputer::get_foo() const
{
    return m_foo;
}

int FooBarComputer::get_bar() const
{
    return m_bar;
}


//  ...


//==============================
// main loop
int main()
{
    std::vector<int> foo_vect, bar_vect;
    FooBarComputer fbc;

    for (int i = 0 ; i < LOOP_SIZE ; ++i)
    {
        fbc.set_alpha( generate_alpha() );
        fbc.set_beta( generate_beta() );
        fbc.set_gamma( generate_gamma() );

        if ( fbc.has_changed() )
        {
            fbc.compute();
            fbc.reset_changed();
        }

        foo_vect.push_back( fbc.get_foo() );
        bar_vect.push_back( fbc.get_bar() );
    }
}

However, it is possible to write a better version of this class by tweaking the getter, like so (full code here https://godbolt.org/z/aqznsr6KP):

class FooBarComputer
{
public:
    FooBarComputer();

    // These update the m_has_changed attribute if relevant
    void set_alpha(int alpha);
    void set_beta(int beta);
    void set_gamma(int gamma);

    int get_foo();
    int get_bar();

private:
    void check_change();
    void compute();

    int  m_alpha, m_beta, m_gamma;
    int m_foo, m_bar;
    bool m_has_changed;
};


//  ...


//==============================
void FooBarComputer::check_change()
{
    if (m_has_changed)
    {
        compute();
        m_has_changed = false;
    }
}


//==============================
// Output getters
int FooBarComputer::get_foo()
{
    check_change();
    return m_foo;
}

int FooBarComputer::get_bar()
{
    check_change();
    return m_bar;
}


//  ...


//==============================
// main loop
int main()
{
    std::vector<int> foo_vect, bar_vect;
    FooBarComputer fbc;

    for (int i = 0 ; i < LOOP_SIZE ; ++i)
    {
        fbc.set_alpha( generate_alpha() );
        fbc.set_beta( generate_beta() );
        fbc.set_gamma( generate_gamma() );

        foo_vect.push_back( fbc.get_foo() );
        bar_vect.push_back( fbc.get_bar() );
    }
}

Here are the benefits:

  • The main code is shorter and clearer.
  • You don’t need to make the compute() public anymore.
  • You don’t need has_changed(), instead, you have a check_change() that is private.
  • Users of your class will be less likely to misuse it. Since the compute() is now lazy, you are sure that no one will call the compute() recklessly.

That last point is the most important. Any user could have made the mistake of omitting the if conditional in the first version of the loop, rendering it inefficient.

3. If you want, you can imagine that the parameters change only once every hundred iterations. Thus, it is important not to call compute() every time.

Isn’t it possible to do otherwise?

The most recurring argument I hear about this practice is the following: “Well, why don’t you just name your method otherwise? Like computeFoo()?”.

Well, the method does not always perform the computation so it’s wrong to call it that way. Plus, semantically, compute refers to a computation and does not imply that the value is returned. Even if some people do it, I don’t like to use the word compute that way.

“Then call it computeAndGetFoo() and be done with it!”

Except, again, it’s not what it does. It does not always compute, so a more appropriate name would be sometimesComputeAndAlwaysGet(), which is plain ridiculous for such a simple method.

“Then find a proper name!”

I did. It’s getFoo(). It’s what it does, it gets the foo. The fact that compute() is lazy does not change the fact that it’s a getter. Plus, it is mentioned in the comments that the getter may be costly, so read the documentation and everything will be fine.

“Couldn’t you put the check into the compute() instead of the getter?”

I could have, but to what end? Making the compute() public is useless since you need the getters to access the data.

There is a way to do otherwise…

There actually is a way to do otherwise, which is especially useful when you also need a getter that you are sure will never call compute() (this is useful if you need a const getter, for instance).

In that case, I advise that you don’t use the same name for both getters, because they don’t do the same thing. Using the same name (with the only difference being the constness) will be confusing.

I personally use the following names:

  • getActiveFoo()
  • getPassiveFoo()

I like this way of writing it because you explicitly specify if the getter is potentially costly or not. Moreover, it also implies that the passive getter may give you an outdated value.

Plus, anyone who tries to call getFoo() will not succeed and have to choose between both versions. Forcing the user to think about what they write is always a good thing.
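Here is a minimal sketch of what that could look like on the FooBarComputer example above (assuming both methods are declared in the class, and using the snake_case of the rest of the code):

//==============================
// Output getters, active and passive versions
int FooBarComputer::get_active_foo()
{
    check_change();  // may trigger a costly re-computation
    return m_foo;
}

int FooBarComputer::get_passive_foo() const
{
    return m_foo;    // never computes: cheap and const, but possibly outdated
}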

Most importantly: spread the word

Since it is useful to put performance-sensitive code in getters, you will find developers who do it.

The most dangerous behavior is ignoring that fact and letting other people assume that accessors are fast to execute.

You should spread the word around you that accessors are, at the end of the day, like any other method: they can take a long time to execute, and everyone must read the associated documentation before using them.

In this article, I talked about execution time performance, but this also applies to any other form of performance: memory consumption, disk access, network access, etc.

Thank you for reading and see you next week!

Author: Chloé Lourseyre

How many languages should a (C++) expert speak?

Author: Chloé Lourseyre

This article is based on a Lightning Talk I made during the CppCon 2020: How many languages a (C++) expert should speak ? – Thomas Lourseyre – CppCon 2020 – YouTube.

The speaker is a bit stressed and fumbles with words, but I’ll try and elaborate on what he wants to say.

Why learn other languages?

Assuming you are a C++ developer, I’ll explain the reasons why you would learn other languages using two quotes.

If all you have is a hammer, everything looks like a nail.

Maslow’s hammer

Say you are given the task to solve a programming problem/issue/dilemma. One of the first questions you will have to ask yourself is “What language should I use to solve this?”. If you only know C++, you can only answer “C++”. Obviously, C++ can not always be the perfect answer.

Even if you are not an expert in other languages, if you know the basics and can say “This language is better suited than C++” then all you’ll have to do is to find the appropriate expert. This is an insight you can’t have if you only know one language.

Those who cannot remember the past are condemned to repeat it.

George Santayana

The programming industry is not a pink paradise where all projects start from scratch and you have all the tools you want in their latest versions. Sometimes (or often, depending on your position in the industry) you will have to work with legacy code. Sometimes it will be written in C++ and you’ll have to port it to another language, and sometimes it will be written in another language and you will need to translate it to C++.

In both cases, this is work you can only do if you know other languages than C++.

Asking the real question

With that said, I can safely state that the real question is not “Should I learn other languages?” but rather the following:

How many other languages should I learn?

The language knowledge table

I designed a table that ranks the languages one knows depending on their knowledge of them:

Level of knowledge  | Details                                                                                         | Personal examples
Expert level        | It’s OK if you only know one.                                                                   | C++
Practical level     | Languages you tried and loved, and you practice regularly.                                      | Python, C, Rust, Bash
Documentation level | Languages you practiced but didn’t like that much. You know how to fetch documentation though.  | Java, C#, js, basic, perl
Hello world! level  | You don’t know much about it…                                                                   | Ruby, AS, Go, etc.

You may have noticed that my personal examples have evolved since last year – it’s a good thing.

The levels of knowledge

There are four levels of knowledge depicted in this table:

  • Expert level: These are the languages for which you are expertly reliable. A lot of developers have none, and most of the remaining have one. Some top-tier experts have several, but it requires being very knowledgeable in each of them, so it’s quite rare. It is not a mandatory field, and it’s OK if you have only one.
  • Practical level: These are the languages you practice regularly but cannot say you’re an expert in. Usually, these are the languages you love and use for your personal projects.
  • Documentation level: There are probably languages you tried to learn, practiced a bit, but did not like. Maybe there are some languages you had to learn and practice for a specific project but immediately forgot afterward. These are languages you are not very knowledgeable in, but you know the basics and are able to find your way around the documentation if needed.
  • Hello world! level: These are the languages you basically know nothing about. Languages that you know by name, but about which you couldn’t say much. Maybe you wrote a “Hello world!” someday, but not much more.

The only useful levels of knowledge are expert, practical, and documentation. As soon as you can at least know where to fetch documentation, you’ll be able to make something out of the language. The Hello world! level is useless in that regard.

How to promote a language?

So your goal will be to try to promote (in the table) as many languages as possible. But the effort it takes to promote a language depends on the level of knowledge this language currently stands on:

  • From practical to expert: Becoming an expert in any field is time-consuming. You will not become an expert unless you are willing to spend a lot of time on the language. You may want to, but it is a big investment of time that you cannot make for the sole sake of being more versatile.
  • From documentation to practical: If you try and force yourself to practice a language you don’t love, there is a good chance you will hate said language even more. You can’t force love, and even though you may have a pleasant surprise (and end up loving a language you didn’t like at first sight), over-practicing a language you dislike for the sake of being more versatile is just pointless self-harm.
  • From Hello world! to documentation: What does it take to try a new language to the point where you can form an opinion about it? In a matter of a few hours, you will be able to read the letter of intent of the language, learn the basics, and grasp how its documentation is organized. You may like it or not, but at least you will be a bit knowledgeable about it.

The most feasible promotion thus is from Hello world! to documentation level. If you are curious and try to learn the whys and hows of as many languages as you can, you’ll have a broader view of the world of programming.

So, how many languages should you speak?

The answer I provide you today, regarding all we said, is the following: as many as your curiosity can bring you to.

One of the main qualities of an expert (be it in C++ or not) is their curiosity. This is what pushes you to learn more, every day, about every subject. As long as you stay curious, you will learn and grow.

Don’t waste your curiosity on a single subject: curiosity is best employed in breadth rather than in depth.

On what basis should you learn and/or keep your knowledge of other languages?

The answer to that question is always closely dependent on the free time you can allocate to the activity of coding.

If you have an hour of free time you want to invest in development, pick a language you know by name but have never practiced. Read its letter of intent and write a simple program. This should be enough for you to know whether you want to invest more time into this language. If not, keep in mind where you can find the documentation in case you need it later (or bookmark it in your favorite browser, if you don’t want to remember it).

If a language appeals to you, then try to evaluate how much time you can spend training. You can follow tutorials, you can do exercises, you can develop new projects.

There is a good way to keep up your knowledge of your favorite languages: platforms like CodinGame offer training and challenges in tons of different languages. Try it, it’s very addictive: Coding Games and Programming Challenges to Code Better (codingame.com).

Is it only restricted to C++ experts?

I can hear some of you sighing in front of your screen, “I’m not even a C++ expert, should I even bother learning another language?”.

All the arguments I detailed above are valid for both experts and non-experts. As a non-expert, in addition to that, you are (in my experience) more likely to switch to another language sooner or later. You could wait to get onto a project to learn the language it features, but then you’d be taking the risk of ending up coding in a language you dislike or don’t understand.

Knowing other languages not only makes you more appealing to those who seek new collaborators (most recruiters will prefer someone who has trained a bit in the language over someone who has never touched it), but it also gives you a small insight into what each language is all about.

Curiosity is a good quality in software development.

Thanks for your attention and see you next week!

Author: Chloé Lourseyre

Should every variable be const by default?

Author: Chloé Lourseyre

A bit of terminology

In the title, I used the word “const” because const is a keyword every C++ developer knows (I hope). However, the generic concept this article is about is actually called “immutability”, not “constness”. const is a keyword used in some languages (C++, Javascript, etc.), but the concept of immutability exists in other languages too (in some of them, there isn’t even a keyword attached to the concept, as in Rust).

To be a little more inclusive, I will use the words immutable and immutability instead of const and constness.

Nota bene: in C++ the keyword const does not completely cover immutability. For instance, you can use the keyword const in a function prototype to indicate that a parameter won’t be modified, even though you can pass a mutable object as that parameter. However, this article only refers to const as a way to render data immutable, so I would be grateful if you ignored situations where const does not refer to inherently immutable data. That is also another reason why I prefer to use the word immutable rather than const.

Some very good reasons to make your variables immutable

Safety

What is called const-correctness in C++ is the fact that if you state that a given variable is immutable, then it won’t be modified.

Const-correctness is reached when you use the const keyword to tell when a variable won’t be modified. Then, you can’t inadvertently modify something you shouldn’t, and you give both the compiler and the other developers insight into the data you manipulate.
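Here is a tiny sketch of what const-correctness buys you: the compiler rejects the accidental modification instead of letting it slip through.

#include <iostream>
#include <string>

void print_label(const std::string& label) // the function promises not to modify its argument
{
    // label += "!";                        // would not compile: label is const
    std::cout << label << std::endl;
}

int main()
{
    const int answer = 42;
    // answer = 43;                         // would not compile: answer is const
    print_label("the answer");
    std::cout << answer << std::endl;
}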

Documentation

These five characters say a lot. If you find yourself in a reliable codebase, where you are sure that const is used properly, then this keyword (or its absence) serves as an indication of the intended mutability of each variable.

Optimization

The compiler can (and will) optimize the code according to the const directives.

Knowing that some values are immutable, the compiler will be able to make assumptions and use syntactic shortcuts.

You can watch J. Turner’s talk on the subject of practical performance: Jason Turner: Practical Performance Practices – YouTube.

Source

If you still doubt that immutability is useful in C++ (or are just curious, which you should be), please read these:

Are there downsides to using immutability whenever possible?

Since immutability is a technical restriction, there is no technical downside to using it.

However, there may be some non-technical downsides to using const. First, it increases verbosity. Even if it’s a 5-letter word that often has the same color as the type (in most IDEs), some people might find this cumbersome. There is also inertia: if you need to modify the variable, you’ll have to go and remove the keyword. Some may consider that a safeguard, others a hindrance.

Are these arguments enough to not use immutability as soon as you can? It’s up to you, but in my humble opinion, I think not.

Other languages?

Other languages make variables immutable by default.

One famous example is the Rust language. In Rust, you don’t mark variables as immutable: every variable is immutable by default, and you add the mut keyword to make it mutable.

For instance, in Rust, the following code doesn’t compile:

fn main() {
    let foo = 2;
    println!("Foo: {}", foo);
    foo += 40; // error[E0384]: cannot assign twice to immutable variable `foo`
    println!("Foo: {}", foo);
}

But this is correct:

fn main() {
    let mut foo = 2;
    println!("Foo: {}", foo);
    foo += 40;
    println!("Foo: {}", foo);
}

The reason mentioned in the Rust Handbook is the following: “This is one of many nudges Rust gives you to write your code in a way that takes advantage of the safety and easy concurrency that Rust offers.”

Rust is a pretty modern language that aims to provide safety to the developer along with top-tier performance. I agree with its intention to make variables immutable by default because of the safety it brings.

Is it reasonable to change the standard to make variables immutable by default?

If one fateful day the C++ committee decides to make every variable const by default… no one will upgrade to the vNext of C++.

Backward compatibility is important to smooth the transition between vPrevious and vNext. There are few historical examples where a feature has been removed from the standard, and for each of them there was a really good reason (often after years of deprecation).

But we can’t seriously consider removing the keyword const and adding the keyword mut just for the sake of what is, in the end, syntactic sugar.

What to do then?

I, for one, chose to use the const keyword every single time I declare a variable. I only remove it when I actually write an instruction that requires the variable to be mutable (or when the compiler complains that I’m trying to modify a constant variable). This is a little blunt, but in my opinion it’s the best way to catch the habit of putting const by default.

Of course, you may try to be smart at the moment you declare the variable and foresee the use you’ll make of it, but I guarantee you will not be 100% reliable and will miss opportunities to make variables immutable.

So here’s my advice: until you catch the habit of putting const everywhere, use const by default without double thinking. You may have regular easy-to-solve compilation errors, but you’ll catch a good habit.
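In practice, the habit looks like this (a deliberately mundane sketch): everything starts const, and only the variables the code actually mutates lose the keyword.

#include <iostream>
#include <vector>

int main()
{
    const std::vector<int> values { 1, 2, 3, 4 }; // never modified, so const
    const auto count = values.size();             // same here

    int sum = 0;                                  // the only variable that needs to mutate
    for (const int v : values)
        sum += v;

    std::cout << "sum of " << count << " values: " << sum << std::endl;
}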

And what about constexpr?

constexpr is a modifier that indicates that the value of a variable can be evaluated at compile time. Its purpose is to transfer some evaluation to the compilation phase.

In a sense, it is a stronger form of immutability than const. Everything I said in this article about immutability applies to constexpr: use it as often as you can.

However, unlike const, I don’t advise putting it everywhere and seeing afterward whether it needs to be removed, since it’s much rarer and a bit more obvious to spot. But keep in mind to check whether you can use constexpr every time you declare a variable.
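A short sketch of the difference: both variables below are immutable, but only the constexpr one is guaranteed to be evaluated at compile time.

#include <iostream>

constexpr int square(int x) { return x * x; }

int main()
{
    constexpr int compile_time = square(8); // evaluated at compile time
    int input = 0;
    std::cin >> input;
    const int run_time = square(input);     // immutable, but computed at run time
    std::cout << compile_time << " " << run_time << std::endl;
}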

Thank you for reading and see you next week!

Author: Chloé Lourseyre