Is my cat Turing-complete?

Author: Chloé Lourseyre
Editor: Peter Fordham

This article is an adaptation of a Lightning Talk I gave at CppCon 2021. Link here: https://www.youtube.com/watch?v=RtqTGSOdmBo

I’ll touch on a lighter subject this week, nonetheless quite important: is my cat Turing-complete?

Meet Peluche

Peluche (meaning “plush” in French) is a smooth cat that somehow lives in my house.

She will be our test subject today.

Is Peluche Turing-complete?

What is Turing-completeness

Turing-completeness is the notion that if a device can emulate a Turing machine, then it can perform any kind of computation¹.

It means that any machine that implements the following eight instructions is a computer (and can thus execute any kind of computation):

  • . and ,: Inputting and outputting a value
  • + and -: Increase and decrease the value contained in a memory cell².
  • > and <: Shift the memory head left or right along the tape.
  • [ and ]: Performing loops.

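For the curious, these are in fact the eight instructions of the Brainfuck language. Here is a minimal interpreter sketch in C++ (my own illustration, assuming well-formed code with matched brackets and a fixed-size tape):

#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Minimal Brainfuck interpreter: a machine that implements these eight
// instructions can run any computation.
void run(const std::string& code)
{
    std::vector<unsigned char> tape(30000, 0); // the memory tape
    std::size_t cell = 0;                      // the "current" memory cell
    for (std::size_t pc = 0; pc < code.size(); ++pc)
    {
        switch (code[pc])
        {
            case '+': ++tape[cell]; break; // increase the current cell
            case '-': --tape[cell]; break; // decrease the current cell
            case '>': ++cell; break;       // shift the memory head right
            case '<': --cell; break;       // shift the memory head left
            case '.': std::putchar(tape[cell]); break;   // output
            case ',': tape[cell] = static_cast<unsigned char>(std::getchar()); break; // input
            case '[': // loop start: jump past the matching ']' if the cell is 0
                if (tape[cell] == 0)
                    for (int depth = 1; depth != 0; )
                    {
                        ++pc;
                        if (code[pc] == '[') ++depth;
                        if (code[pc] == ']') --depth;
                    }
                break;
            case ']': // loop end: jump back to the matching '[' if the cell is not 0
                if (tape[cell] != 0)
                    for (int depth = 1; depth != 0; )
                    {
                        --pc;
                        if (code[pc] == ']') ++depth;
                        if (code[pc] == '[') --depth;
                    }
                break;
        }
    }
}

int main()
{
    run("++++++++[>++++++++<-]>+."); // prints 'A' (65)
}
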
So, if Peluche can perform these eight instructions, we can consider her Turing-complete.

Proof of the Turing-completeness

Input and output

First, I tried poking Peluche to see if I could get a reaction.

She looked at me, then just turned around.

So here it is: I poked her, and I got a reaction. So she can process inputs and give outputs.

Input/output confirmed!

Increase and decrease memory value

The other day, I came back from work to this:

Kibbles everywhere…

But then I took a closer look and realized that the slabs could be numbered, like this:

This looks pretty much like a memory tape to me! Since she can spill kibbles on the tiles and then eat them directly from the floor, she can increase and decrease the values contained in a given memory cell.

Increase/decrease confirmed!

Shift the current memory cell left or right

Another time, I was doing the dishes and inadvertently spilled some water on Peluche. She began to run everywhere around the kitchen, making a huge mess.

If you look closely (at the tip of the red arrow), you may notice that while making this mess, she displaced her bowl.

Displacing her bowl means she will spill her kibbles in another tile. This counts as shifting the memory head to edit another memory cell.

Shift of the memory tape confirmed!

Perform loops

So, after this mess, I (obviously) had to clean up.

No more than five minutes later, I went back to the kitchen to this:

Yeah… she can DEFINITELY perform loops…

Loops confirmed!

We have just proven that Peluche is, indeed, Turing-complete. So now, how can we use her to perform high-performance computations?

What to do with her?

Now that I’ve proven that Peluche is Turing-complete, I can literally do anything with her!

Thus, I tried to give her simple code to execute³:

😾😾😾😾😾😾😾😾
😿
🐈😾
🐈😾😾
🐈😾😾😾
🐈😾😾😾😾
🐈😾😾😾😾😾
🐈😾😾😾😾😾😾
🐈😾😾😾😾😾😾😾
🐈😾😾😾😾😾😾😾😾
🐈😾😾😾😾😾😾😾😾😾
🐈😾😾😾😾😾😾😾😾😾😾
🐈😾😾😾😾😾😾😾😾😾😾😾
🐈😾😾😾😾😾😾😾😾😾😾😾😾
🐈😾😾😾😾😾😾😾😾😾😾😾😾😾
🐈😾😾😾😾😾😾😾😾😾😾😾😾😾😾
🐈😾😾😾😾😾😾😾😾😾😾😾😾😾😾😾
🐈😾😾😾😾😾😾😾😾😾😾😾😾😾😾😾😾
😻😻😻😻😻😻😻😻😻😻😻😻😻😻😻😻🐾
😸
🐈🐈🐈🐈🐈🐈🐈🐈🐈🙀😻😻😻😻😻😻😻😻😻
🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐾🐾🐾🙀😾😾😾😻😻😻😻😻😻😻😻😻😻😻😻😻
🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐾🐾🐾🐾🙀😾😾😾😾😻😻😻😻😻😻😻😻😻😻😻😻😻😻
🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐾🐾🐾🐾🙀😾😾😾😾😻😻😻😻😻😻😻😻😻😻😻😻😻😻
🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐾🙀😾😻😻😻😻😻😻😻😻😻😻😻😻😻😻
🐈🐈🐈🐈🙀😻😻😻😻
🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐾🙀😾😻😻😻😻😻😻😻😻😻😻😻😻😻😻😻
🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐾🙀😾😻😻😻😻😻😻😻😻😻😻😻😻😻😻
🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈😾😾🙀🐾🐾😻😻😻😻😻😻😻😻😻😻😻😻😻😻
🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐾🐾🐾🐾🙀😾😾😾😾😻😻😻😻😻😻😻😻😻😻😻😻😻😻
🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐈🐾🐾🐾🐾🙀😾😾😾😾😻😻😻😻😻😻😻😻😻😻😻😻😻
🐈🐈🐈🐈🐈🐈🐾🐾🙀😾😾
😻😻😻😻😻😻🙀

The result was conclusive: she wouldn't do a thing.

Even though they can, maybe cats are just not designed to execute code after all?

About “cat-computing”

Jokes aside, cat-computing is the name I give to this generalized practice. In my experience, it happens quite often that when someone discovers a new feature of a language, they begin to use it everywhere, just because they can and they want to.

However, just as you can execute code using a cat⁴ but shouldn't, the fact that you can use a feature doesn't mean you should.

Your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should.

Dr Ian Malcolm, Jurassic Park

Wrapping up

Cat-computing seems to be a rookie mistake (and it is), but even the most experienced developers sometimes make rookie mistakes (and there’s no shame in that).

Every three years, a new version of C++ is published. Every time, it makes me want to use the new features in every possible situation. Though this is a good opportunity to build some experience around that (one of the best ways to avoid misuses of a feature is to perform these misuses once, in my opinion), this is also favorable ground for acquiring bad practices.

Always ask yourself if a feature is necessary⁵ before using it, or else you may be doing cat-computing.

Also, cat-computing is animal abuse, so don’t do it 😠.

Thanks for reading and see you next week!

(No cats were harmed during the writing of this article, but one was gently poked.)

Author: Chloé Lourseyre
Editor: Peter Fordham

Addendum

Notes

  1. This is a simplified definition, very inaccurate but accurate enough for this example. If you want the real definition, go there: Turing completeness – Wikipedia
  2. I did not state it explicitly, but a Turing machine has a “memory tape” with “memory cells” on it. The machine is always pointing to a memory cell, which is the mentioned “current” memory cell.
  3. You may not be able to read this sample of code — this is a fancy new language I designed called “braincat”.
  4. Actually, you can’t execute code using a cat, I know, but it’s for the sake of the metaphor that I assume you can.
  5. Of course, necessity occurs when there is a known benefit to the feature. I’m not talking about “absolute necessity” but about “practical necessity”.

Duff’s device in 2021

Author: Chloé Lourseyre
Editor: Peter Fordham

This year at CppCon, Walter E. Brown gave a Lightning Talk about Duff's device (I'll put a YouTube link here as soon as it's available).

Duff’s device is a pretty old contraption and I was wondering “How useful can it be in 2021, with C++20 and all?”.

So here we go.

What is Duff’s device?

What we call Duff’s device (named after its creator Tom Duff) is a way to implement manual loop unrolling in the C language.

Loop unrolling is an execution-time optimization technique in which we reduce the number of loop control evaluations by manually "unrolling" the loop body. It trades binary size for speed (your code is generally larger when you use this technique).

The principle of Duff's device is to perform several computations (usually four to eight) in the same loop iteration, so the loop condition is evaluated once every few computations instead of once per computation.

So, instead of doing this:

void execute_loop(int& data, const size_t loop_size)
{
    for (size_t i = 0; i < loop_size; ++i)
    {
        computation(data);
    }
}

We do something that looks like this:

void execute_loop(int& data, const size_t loop_size)
{
    for (size_t i = 0; i < loop_size / 4; ++i)
    {
        computation(data);
        computation(data);
        computation(data);
        computation(data);
    }
}

However, as you might have noticed, if loop_size is not a multiple of 4, the function performs the wrong number of calls to computation(). To rectify this, Duff's device uses the C switch fall-through behavior, like this:

void execute_loop(int& data, const size_t loop_size)
{
    // Note: like the original Duff's device, this assumes loop_size > 0
    // (with 0, the case 0 label would still perform four computations).
    size_t i = 0;
    switch (loop_size % 4)
    {
        do {
            case 0: computation(data);
            case 3: computation(data);
            case 2: computation(data);
            case 1: computation(data);
            ++i;
        } while (i < (loop_size + 3) / 4);
    }
}

This is a bit more alien written that way, so I'll explain it here:

At the beginning of the function, we enter the switch statement and immediately evaluate loop_size modulo 4. Depending on the result, we end up at one of the four case labels. Then, because of the switch fall-through, we end up doing a different number of computations depending on this modulo. This rectifies the problem of doing the wrong number of computations when loop_size is not a multiple of 4.

But then what happens? After falling through, the program encounters the while keyword. Since it is technically inside a do-while loop, the program goes back to the do statement and continues the loop as normal.

On subsequent iterations, the case labels are ignored (they are just labels), so execution falls through all four computations every time.

You can check the numbers if you like: every time, we end up doing the correct number of computations.
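
If you'd rather check them programmatically, here is a minimal sketch that instruments computation() with a call counter (the counter and the test harness are mine, not part of the original benchmark) and asserts the count for loop sizes 1 through 16:

#include <cassert>
#include <cstddef>

static int call_count = 0;

void computation(int& data) { ++data; ++call_count; } // instrumented stand-in

void execute_loop(int& data, const size_t loop_size)
{
    size_t i = 0;
    switch (loop_size % 4)
    {
        do {
            case 0: computation(data);
            case 3: computation(data);
            case 2: computation(data);
            case 1: computation(data);
            ++i;
        } while (i < (loop_size + 3) / 4);
    }
}

int main()
{
    for (size_t n = 1; n <= 16; ++n) // n == 0 would wrongly perform 4 calls
    {
        call_count = 0;
        int data = 0;
        execute_loop(data, n);
        assert(call_count == static_cast<int>(n)); // exactly n computations
    }
}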

Is Duff’s device worth it?

Duff’s device is from another time, another era (heck, it’s even from another language), so my first reaction about it in 2021 would be “This kind of device is probably counter-productive, I’d rather let the compiler do the optimization for me.”

But I want tangible proof of that. So what about a few benchmarks?

Benchmarks

To do the benchmarks, I used this code: Quick C++ Benchmarks – Duff’s device (quick-bench.com).

Here are the results¹:

Compiler     Option  Basic loop (cpu_time)  Duff's device (cpu_time)  Duff's device / Basic loop
Clang 12.0   -Og     7.5657e+4              7.2965e+4                 -3.6%
Clang 12.0   -O1     7.0786e+4              7.3221e+4                 +3.4%
Clang 12.0   -O2     1.2452e-1              1.2423e-1                 -0.23%
Clang 12.0   -O3     1.2620e-1              1.2296e-1                 -2.6%
GCC 10.3     -Og     4.7117e+4              4.7933e+4                 +1.7%
GCC 10.3     -O1     7.0789e+4              7.2404e+4                 +2.3%
GCC 10.3     -O2     4.1516e-6              4.1224e-6                 -0.70%
GCC 10.3     -O3     4.0523e-6              4.0654e-6                 +0.32%

In this case, the difference is insignificant (3.5% on a benchmark really is not much; in live code this would be diluted into the rest of the codebase). Plus, whether it is the basic loop or Duff's device that is the fastest depends on the optimization level and the compiler.

After that, I used a simpler version of computation() (one the compiler can optimize more easily), this one: Quick C++ Benchmarks – Duff's device (quick-bench.com).

This gives the following results:

Compiler     Option  Basic loop (cpu_time)  Duff's device (cpu_time)  Duff's device / Basic loop
Clang 12.0   -Og     5.9463e+4              5.9547e+4                 +0.14%
Clang 12.0   -O1     5.9182e+4              5.9235e+4                 +0.09%
Clang 12.0   -O2     4.0450e-6              1.2233e-1                 +3 000 000%
Clang 12.0   -O3     4.0398e-6              1.2502e-1                 +3 000 000%
GCC 10.3     -Og     4.2780e+4              4.0090e+4                 -6.3%
GCC 10.3     -O1     1.1299e+4              5.9238e+4                 +420%
GCC 10.3     -O2     3.8900e-6              3.8850e-6                 -0.13%
GCC 10.3     -O3     5.3264e-6              4.1162e-6                 -23%

This is interesting, because we can see that Clang can, on its own, greatly optimize the basic loop without managing to optimize Duff's device (with -O2 and -O3 the basic loop is 30 000 times faster than Duff's device; this is because the compiler optimizes the basic loop into a single operation, but considers Duff's device too complicated to optimize).

On the other hand, GCC does not manage to optimize the basic loop much more than Duff's device in the end: while at -O1 the basic loop is more than five times faster, at -O3 Duff's device is almost 23% faster (which is significant)².

Readability and semantics

At first glance, Duff's device is a very odd contraption. However, it is relatively well known among C and C++ developers (especially the older ones). Plus, we already have a name for it and a pretty good Wikipedia page that explains how it works.

As long as you identify it as such in the comments, I think it's pretty safe to use Duff's device, even if you know your coworkers don't know about it (you can even put the Wikipedia link in the comments if you like!).

Trying to seek a very specific case

Principle

Loop unrolling specifically aims to reduce the number of loop control evaluations. So I set up a specific case where the loop control (and the index increment) is heavier to evaluate.

So instead of using an integer as the loop index, I used this class:

struct MyIndex
{
  int index;
  
  MyIndex(int base_index): index(base_index) {}
  
  MyIndex& operator++() 
  {  
    if (index%2 == 0)
      index+=3;
    else
      index-=1;
    return *this;
  }

  bool operator<(const MyIndex& rhs) const
  {
    if (index%3 == 0)
      return index < rhs.index;
    else if (index%3 == 1)
      return index < rhs.index+2;
    else
      return index < rhs.index+6;
  }
};

Each time we increment or compare a MyIndex, we perform at least one modulo operation (a relatively heavy arithmetic operation).

And I ran the benchmarks on it.

Benchmarks

So I used the following code: Quick C++ Benchmarks – Duff's device with strange index (quick-bench.com)

This gives the following results:

Compiler     Option  Basic loop (cpu_time)  Duff's device (cpu_time)  Duff's device / Basic loop
Clang 12.0   -Og     2.0694e+5              5.9710e+4                 -71%
Clang 12.0   -O1     1.8356e+5              5.8805e+4                 -68%
Clang 12.0   -O2     1.2318e-1              1.2582e-1                 +2.1%
Clang 12.0   -O3     1.2955e-1              1.2553e-1                 -3.1%
GCC 10.3     -Og     6.2676e+4              4.0014e+4                 -36%
GCC 10.3     -O1     7.0324e+4              6.0959e+4                 -13%
GCC 10.3     -O2     6.5143e+4              4.0898e-6                 -100%
GCC 10.3     -O3     4.1155e-6              4.0917e-6                 -0.58%

Here, we can see that Duff's device is always better at the low optimization levels, but never has a significant advantage at -O3. This means that the compiler manages to optimize the basic loop as much as Duff's device at the higher optimization levels. This is significantly different from the previous results.

Why are the results so inconsistent?

The benchmarks show very inconsistent results: for instance, how come that with the simple computation(), GCC at -O1 runs the basic loop more than five times faster than Duff's device, whereas at -O3 it's Duff's device that is 23% faster? How come that, for the same code, Clang shows totally different results than GCC, with the basic loop thirty thousand times faster at -O2 and -O3?

This is because each compiler has its own ways to optimize these kinds of loops at different levels of optimization.

If you want to look into it, you can compare the assembly code generated by each compiler, just like in this example: Compiler Explorer (godbolt.org), where the GCC and Clang versions at the -O3 level of optimization are put side by side.

I would have loved to detail that here, but unfortunately it would take more than one whole article to analyze them all. If you, reader of this article, want to take things into your own hands and perform the analysis yourself, I'll be more than glad to publish your results on this blog (you can contact me here).

Wrapping up

I will summarize the results in the following chart, which indicates which device is best in the different implementations we saw:

Compiler     Option  Complex computation()  Trivial computation()  Heavy loop control
Clang 12.0   -Og     None                   None                   Duff's device
Clang 12.0   -O1     None                   None                   Duff's device
Clang 12.0   -O2     None                   Basic loop             None
Clang 12.0   -O3     None                   Basic loop             None
GCC 10.3     -Og     None                   None                   Duff's device
GCC 10.3     -O1     None                   Basic loop             Duff's device
GCC 10.3     -O2     None                   None                   Duff's device
GCC 10.3     -O3     None                   Duff's device          None

How to interpret these results?

First, when we have a complex computation and a trivial loop control, there is no significant difference between the two.

Second, when the computation is trivial, it's often the basic loop that is better, but not always.

Third, as expected, Duff's device is preferred with a heavy loop control, but even there it is not always necessary.

And finally, the results will almost always depend on your implementation. While doing my research for this article, I found myself trying several implementations of the code I used to illustrate Duff's device, and I often ended up with pretty different benchmark results each time I made a tiny edit to the code.

My point here is that sometimes Duff's device is better than a basic loop, and sometimes it's the other way around (even if, most of the time, there is no major difference).

In conclusion, Duff's device is still worth considering³, but you'll have to do your own benchmarks to be sure where it is indeed useful. However, Duff's device does add verbosity to the code, and even if it's easy to document (as stated before), you may not have the time (or not want to spend the time) to do the benchmarks and consider it. It's up to you.

Thanks for reading and see you next week!

Author: Chloé Lourseyre
Editor: Peter Fordham

Addendum

Notes

  1. The "cpu_time" mentioned in the charts is an abstract unit of measure reported by quick-bench. It has no meaning on its own; it is only used to compare benchmarks. That's why the order of magnitude may vary from one line to another. You want to look at the last column.
  2. The results presented here also depend on the implementation of each compute_*(). For instance, if you evaluate (loop_size+3)/4 on each iteration instead of storing it in a const size_t, the GCC results are very different and Duff's device is no longer significantly the best at -O3.
  3. I'll just add this note here to remind you of one trivial fact: execution-time optimization is only worth considering when your code is time-sensitive. If you work on non-time-sensitive code, you shouldn't even consider Duff's device in the first place. When possible, keep it simple, and keep in mind the 80:20 rule.

Pragma: once or twice?

Author: Chloé Lourseyre
Editor: Peter Fordham

Context

Header guards

Every C++ developer has been taught header guards. Header guards are a way to prevent a header from being included multiple times, which would be problematic because it would mean that the variables, functions, and classes in that header would be defined several times, leading to a compilation error.

Example of a header guard:

#ifndef HEADER_FOOBAR
#define HEADER_FOOBAR

class FooBar
{
    // ...
};

#endif // HEADER_FOOBAR

For those who are not familiar with it, here is how it works: the first time the file is included, the macro HEADER_FOOBAR is not defined, so we enter the #ifndef block. In there, we define HEADER_FOOBAR and the class FooBar. Later, if we include the file again, since HEADER_FOOBAR is defined, we don't enter the #ifndef block again, so the class FooBar is not defined a second time.

#pragma once

#pragma is a preprocessor directive providing additional information to the compiler, beyond what is conveyed in the language itself.

Any compiler is free to interpret a pragma directive as it wishes. However, over the years, some pragma directives have acquired significant popularity and are now almost standard (such as #pragma once, which is the topic of this article, or #pragma pack).

#pragma once is a directive that tells the compiler to include the file only once. The compiler itself keeps track of which files have already been included.

So, instinctively, we can think that the #pragma once directive does the job of a header guard, but with only one line and without having to think of a macro name.

Today?

In the past, the #pragma once directive was not implemented for every compiler, so it was less portable than header guards.

But today, in C++, I could not find a single compiler that does not implement this directive.

So why bother using header guards anymore? Answer: because of the issue I’m about to describe.

A strange issue with #pragma once

There is actually one kind of issue that can occur with #pragma once that cannot occur with header guards.

Say, for instance, that your header file is duplicated somewhere. This is a not-so-uncommon issue that may have multiple causes:

  • You messed up a merge and your version control system duplicated some files.
  • The version control system messed up the move of some files and they ended up duplicated.
  • Your filesystem has two separate mount points that give paths to the same files. The files then appear as two different sets of files (since they seem to be present on both disks).
  • Someone duplicates one of your files in another part of the project for their personal use, without renaming anything (this is bad manners, but it happens).

(Please note that I have encountered each of these four issues at some point in my career.)

When this happens, when you have the same file duplicated, header guards and #pragma once do not behave the same way:

  • Since the macros that guard each file have the same name, the header guards will work perfectly fine and only include one file.
  • Since, from the filesystem's point of view, the files are different, #pragma once will behave as if they were different files and include each copy separately. This leads to a compilation error, as sketched below.
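
Here is a minimal sketch of the failure mode (the file names are hypothetical, and the exact behavior depends on how the compiler identifies "the same file", typically by path or inode):

// a/Foo.h and b/Foo.h are byte-for-byte identical copies:
#pragma once
class Foo
{
    // ...
};

// main.cpp
#include "a/Foo.h"
#include "b/Foo.h" // seen as a different file: included again

int main() {} // error: redefinition of 'class Foo'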

Issues with header guards?

Header guards can have issues too. You can have typos in the macro names, rendering your guards useless, and badly chosen macro names can clash. Neither of these can happen with #pragma once.

However, these issues can be easily avoided (typos are easy to detect and name clashes are prevented if you have a good naming convention).

A huge benefit though!

There is also a usage of header guards that is very useful for testing and that is not possible with #pragma once.

Say you want to test the class Foo (in file Foo.h) that uses the class Bar (in file Bar.h). But, for testing purposes, you want to stub the class Bar.

One option header guards allow is to create your own mock of class Bar in a file BarMock.h. If the mock uses the same header guards as the original Bar, then in your test, when you include BarMock.h and then Foo.h, the header Bar.h will not be included (because the mock is already included and has the same guards).
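
Here is a minimal sketch of the trick (HEADER_BAR is a hypothetical guard name, following the convention of the example above):

// Bar.h -- the real class
#ifndef HEADER_BAR
#define HEADER_BAR
class Bar { /* real implementation */ };
#endif // HEADER_BAR

// BarMock.h -- the stub, deliberately reusing the guard of Bar.h
#ifndef HEADER_BAR
#define HEADER_BAR
class Bar { /* stubbed implementation for tests */ };
#endif // HEADER_BAR

// FooTest.cpp
#include "BarMock.h" // defines HEADER_BAR...
#include "Foo.h"     // ...so the #include "Bar.h" inside Foo.h expands to nothing

int main() { /* test Foo against the mocked Bar */ }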

So, should I use #pragma once or header guards?

This question is a bit difficult to answer. Let’s take a look at the cons of each method:

  • #pragma once is non-standard and becomes a major issue when you end up in a degraded environment.
  • Header guards may have issues if handled improperly.

In my opinion, #pragma directives are to be avoided when possible. Even if they work in practice, they are not formally standard.

Dear C++20, what about Modules?

Modules, one of the "big four" features of C++20, change our vision of the "classical build process". Instead of having source and header files, we can now have modules. They overcome the restrictions of header files and promise a lot: faster build times, fewer violations of the One-Definition Rule, and less usage of the preprocessor.

Thanks to modules, we can say that the issues with #pragma once and header guards are no more.
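
For illustration, here is a minimal module sketch (the file extension varies by compiler: .ixx for MSVC or .cppm for Clang, for instance):

// foobar.cppm -- module interface
export module foobar;

export class FooBar
{
    // ...
};

// main.cpp
import foobar;

int main()
{
    FooBar fb; // no headers involved, so no guards and no #pragma once
}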

To learn more about modules, check these out:

Wrapping up

This article, talking about pragmas and header guards, targets projects that predate C++20. If you are in this case and hesitate between #pragma once and header guards, maybe it's time to upgrade to C++20?

If you can't upgrade to C++20 (few industrial projects can), then choose wisely between #pragma once and header guards.

Thanks for reading and see you next time!

Author: Chloé Lourseyre
Editor: Peter Fordham


[History of C++] The genesis of casting.

Author: Chloé Lourseyre
Editor: Peter Fordham

C-style casts

First of all, to understand the rationale behind the design of C++ casts, I think it's important to remind you how C-style casts work, both in C and in C++.

In C¹

In C, you have two ways to cast:

  • You can perform a value cast, an arithmetic conversion from one numeric type into another. You can have data loss if the target type is narrower than the origin type (for instance, when you cast a float into a long, or a long into an int).
  • You can perform a pointer cast, which converts a pointer of one type into a pointer of another type. This can work well, as in this example (https://onlinegdb.com/sYFCGeZmH), but it can quickly become error-prone, as in this example (https://onlinegdb.com/pWovM17X4) where the types are not exactly the same, or in this example (https://onlinegdb.com/HHjNS9NSb) where one structure is bigger than the other². A short sketch of both kinds follows this list.
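
Here is a minimal illustration of both kinds of C-style casts (the structures are hypothetical):

#include <cstdio>

struct Small { char c; };
struct Big   { double d1, d2; };

int main()
{
    long l = 1234567890123L;
    int i = (int)l;    // value cast: data loss if int is narrower than long

    Small s{'x'};
    Big* b = (Big*)&s; // pointer cast: compiles fine...
    // std::printf("%f\n", b->d1); // ...but reading through it would be undefined behavior
    (void)i; (void)b;
}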

Though it is a C feature that has its uses and misuses, in the C++ language this is a behavior we want to avoid.

In C++

The C-style cast does not work in C++ exactly the same way it works in C (even though the final behaviors are similar).

When you perform a C-style cast in C++, the compiler tries the following cast operations, in order, until it finds one that it can compile:

  1. const_cast
  2. static_cast
  3. static_cast followed by const_cast
  4. reinterpret_cast
  5. reinterpret_cast followed by const_cast

This process is not appreciated by C++ developers (to say the least), because the cast that is actually performed is not explicit and potential errors are not caught at compile time.
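
A short sketch of why this is dangerous (the types are hypothetical): the C-style cast silently resolves to the most permissive operation, while the named casts catch the errors:

struct A { int x; };
struct B { double y; };

void f(const A* a)
{
    B* b = (B*)a; // compiles: silently resolves to reinterpret_cast + const_cast
    // B* b2 = static_cast<B*>(a);      // error: A and B are unrelated types
    // B* b3 = reinterpret_cast<B*>(a); // error: would cast away const
    (void)b;
}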

You can find more information about this behavior on the following page: Explicit type conversion – cppreference.com

Run-Time Type Information

Original idea and controversies

What we call Run-Time Type Information, often shortened to RTTI, is a mechanism that allows the type of an object to be determined during program execution.

This is used in polymorphism, where you can manipulate an object through its base class interface (thus, you don’t know at compile-time which derived class you are manipulating).

RTTI for C++ was drafted from the earliest versions, but its development and implementation were postponed in the hope that it would prove unnecessary.

Some people, at that time, raised their voices against the feature, saying that it would need too much support, was too heavy to implement, too expensive, complicated and confusing, "inherently evil" (against the spirit of the language), seen as the beginning of an avalanche of new features, etc.

However, Bjarne Stroustrup finally decided that it was worth implementing, for three reasons: it is important to some people, it is harmless to those who won't use it, and without it libraries would develop their own RTTI anyway, leading to inconsistency.

In the end, RTTI was implemented in three parts:

  • The dynamic_cast operator, allowing a pointer to a derived class to be obtained from a pointer to the base class — only if the pointed-to object really is of the derived class.
  • The typeid operator, allowing identification of the exact type of an object given an object of the base class.
  • The type_info structure, giving additional run-time info on a given type.

Early in the process (and this was the main reason he decided to wait until RTTI was needed before implementing it), Stroustrup detected numerous misuses of the feature, and some people even labelled it a "dangerous feature".

However, there is a major difference between a feature that can be misused and a feature that will be misused. That difference resides in education, design, testing, etc. But this has a cost, and the real question is: are the benefits of such a dangerous feature worth the effort necessary to keep misuses at an anecdotal level?

The final decision was yes: it was worth it. But not all developers agreed at the time.

Syntax

Since casts couldn’t be inherently made safe, Stroustrup wanted to provide a syntax that both signaled the use of an unsafe feature and discouraged its use when there were alternatives.

The C++ crew originally considered using either Checked<T*>(p); for run-time checked conversion and Unchecked<T*>(p); for unchecked conversion, or (virtual T*)p for dynamic casts only.

But given these constraints, and the fact that dynamic_cast and "standard" casts (what we now call static_cast) are two entirely different operations, the old syntax was abandoned in favor of more verbose unary operators. These are the operators we know today, dynamic_cast<T*>(p) and static_cast<T*>(p) (and, later, the other casts).

typeid() and type_info

The first implementation of RTTI only provided dynamic_cast. However, people soon wanted to know more about the types they were dynamically manipulating, leading to the creation of typeid() and type_info.

The typeid() operator can be applied to any polymorphic object and returns a reference to a type_info that holds all the needed information. The reason it returns a reference and not a pointer is to prevent pointer comparison and arithmetic on it.

type_infos are uncopiable, polymorphic, comparable, and sortable (so they can be used in hashmaps and such), and hold the name of the type.
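
Here is a minimal sketch of their use (Shape and Circle are hypothetical classes; note that name() returns an implementation-defined, usually mangled, string):

#include <iostream>
#include <typeinfo>

struct Shape { virtual ~Shape() = default; }; // polymorphic: holds RTTI
struct Circle : Shape {};

int main()
{
    Circle c;
    const Shape& s = c;
    std::cout << typeid(s).name() << '\n';              // dynamic type: Circle
    std::cout << (typeid(s) == typeid(Circle)) << '\n'; // 1: exact-type comparison
}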

Uses and Misuses

Now, there are two categories of types: those that have type information at run time and those that don't. It was decided that only polymorphic classes, i.e. the classes that can be manipulated through base classes, would hold RTTI.

At first, people wondered if this would not cause some problems (and frustration), because sometimes it is hard to tell (as a developer) whether a class is polymorphic or not. But this is not a big issue, because the compiler is able to tell at compile time whether a use of typeid or type_info is illegal.

The main issue that was anticipated was the over-use of RTTI where it isn’t needed. For instance, we can expect such code:

void rotate(const Shape& r)
{
    if (typeid(r) == typeid(Circle)) 
    {
        // do nothing
    }
    else if (typeid(r) == typeid(Triangle)) 
    {
        // rotate triangle
    }
    else if (typeid(r) == typeid(Square)) 
    {
        // rotate square
    }
}

However, this is a broken use of RTTI because it does not correctly handle classes derived from the ones mentioned. A better way to do this would be via virtualization, as sketched below.
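
For comparison, here is what the virtualized version could look like (a sketch, reusing the hypothetical shape classes):

struct Shape
{
    virtual ~Shape() = default;
    virtual void rotate() = 0; // each shape knows how to rotate itself
};

struct Circle : Shape
{
    void rotate() override {} // nothing to do
};

struct Triangle : Shape
{
    void rotate() override { /* rotate triangle */ }
};

void rotate(Shape& s)
{
    s.rotate(); // classes derived from Circle, Triangle, etc. are handled correctly
}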

Another misuse would be with unnecessary type-checking, like in the following code:

Crate* foobar(Crate* crate, MyContainer* cont)
{
    cont->put(crate);

    // do things...

    Object* obj = cont->get();
    Crate* cr = dynamic_cast<Crate*>(obj);
    if (cr)
        return cr;
    // else, handle error
}

Here, we manually check the type of the object in MyContainer, although it would be better to use a templated version of the container, like so:

Crate* foobar(Crate* crate, MyContainer<Crate>* cont)
{
    cont->put(crate);

    // do things...

    return cont->get();
}

Here, there is no need to check for errors and, most of all, no use of RTTI.

These two misuses of C++ RTTI are most commonly made by developers following the habits of other languages (like C, Pascal, etc.) where such code is accepted, even encouraged. But it doesn't fit the C++ design.

Abandoned features

Here is a list of features that have been considered for the C++ RTTI, but not adopted in the end:

  • Meta-objects: this replaces type_info with a mechanism (the meta-object) that can accept, at run time, requests to perform any operation that can be requested of an object of the language. However, it embeds an interpreter for the complete language, which is a threat to efficiency.
  • The type-inquiry operator: an alternative to dynamic_cast was an operator that says whether an object is of a given derived class or not. If so, it would allow us to then cast it (old-style) to the derived class in order to use it. However, dynamic_cast and static_cast can both be applied to pointers with different results, so the distinction is needed, because old-style-casting pointers would not always give the result we expect. Plus, decorrelating the check and the cast can cause mismatches.
  • Type relations: using comparison operators (such as < and <=) was suggested, but it was judged "too cute" (meaning it is a non-mathematical interpretation of the operator, giving meaning to an operation that has no mathematical meaning). Plus, it has the same check/cast decorrelation problem as the type-inquiry operator.
  • Multi-methods: the ability to select a virtual function based on more than one object. Such a mechanism may be useful to people who develop binary operators. However, at that time, Stroustrup was not familiar with the concept and decided it would be implemented only if needed later.
  • Unconstrained methods: a mechanism that allows a polymorphic object to call any method that might exist, checking at run time whether it can effectively be called and handling errors accordingly. However, with dynamic_cast we can check this ourselves, which is more efficient and type-safe.
  • Checked initialization: the ability to initialize a derived class object from a polymorphic object, checking at run time whether the types actually match. However, there were syntax complications and error-handling uncertainties, and it can be done using, again, a dynamic_cast.

C++-style casts

Problems and consequences

The C-style cast is (quoting B.S.) “a sledgehammer”. It means that when you write (B)expr you say “make expr a B, and whatever happens happens.”. This can become very unfortunate when const or volatile qualifiers are involved.

In addition to that, the syntax is simplistic: hard to see, hard to parse, and you need a profusion of parentheses when you want to call a derived-class method in a polymorphic context³.

Thus, it was decided to separate the different ways to cast into separate operators. This way, when you write a cast, you write how you want to cast. Plus, this adds some verbosity to the operation which makes parsing easier and warns the reader that a potentially dangerous operation is happening.

Since C-style casts allow some really bad behaviors (from the C++ point of view), the C++-specific cast operators that carry those behaviors are meant to be isolated from the "good" cast operators. These behaviors are not removed from the language, because they can be useful in some specific contexts, but they are separated from the others so that they cannot be used by accident and their use is obvious.

The different casting operators

dynamic_cast

I won’t talk much about dynamic_cast, since this operator is covered in the previous section (about RTTI). Just keep in mind that the keyword dynamic_cast is the one associated with the RTTI solution.

dynamic_cast makes a conversion that is checked dynamically, i.e. at run-time. If you want a static check, i.e. at compile-time, you would prefer static_cast.

static_cast

static_cast can be described as the inverse operation to implicit conversion: if A can be implicitly converted to B, then B can be static_cast to A. The operator can also perform any conversion that can be done implicitly.

This alone covers the majority of conversions that do not require dynamic type checking.

static_cast respects constness (making it safer than C-style casts) and is static (any error will be detected at compile time).

Whenever it is relevant, static_casting to a user-defined type considers any single-argument constructor that can match the conversion (if you try to statically cast a Foo into a Bar, the compiler will look for a Bar(Foo) constructor), as well as any relevant conversion operator. See user-defined conversion function – cppreference.com for more info.

Also, you cannot perform a static_cast to or from a pointer to an incomplete type (which can be done using another C++-style cast).
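
Here is a short sketch of typical static_cast uses (the types are hypothetical):

struct Base { virtual ~Base() = default; };
struct Derived : Base {};

int main()
{
    double d = 3.7;
    int i = static_cast<int>(d); // arithmetic conversion, validity checked at compile time

    Derived der;
    Base* b = &der;                           // derived-to-base is implicit...
    Derived* back = static_cast<Derived*>(b); // ...static_cast performs the inverse
                                              // (no run-time check, unlike dynamic_cast)
    (void)i; (void)back;
}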

reinterpret_cast

The reinterpret_cast holds the “unsafe” part of the C-style cast. With it, you can cast values from a class to another unrelated class, or from and to a pointer to an incomplete type.

This conversion basically reinterprets the argument it is given. You can thus convert to and from pointers to functions and pointers to members.

This is inherently unsafe and must be performed with great caution. Wherever you write or see a reinterpret_cast, you know you must be extra careful. Using reinterpret_cast is almost as unsafe as C-style casts.

reinterpret_cast can easily lead to undefined behavior if it is not used following a specific set of rules (which you can find on its documentation page: reinterpret_cast conversion – cppreference.com).

For instance: if you use reinterpret_cast to go from one pointer type to another and then dereference that pointer to access its content, that's likely undefined behavior.

const_cast

The goal of this operator is to ensure that the const and volatile qualifiers are never silently cast away.

To perform this operation, the source and destination types must be the same, except for the const and volatile qualifiers, which can differ.

This is a very dangerous operation and must be used with great caution. Always remember that modifying an object originally defined as const (after casting its constness away) is undefined behavior.

bit_cast

It is not really historical (it was introduced in C++20), but std::bit_cast was basically made to replace manual conversion via std::memcpy().

bit_cast can be undefined if there is no value of the destination type corresponding to the value representation produced (just as with memcpy).

Unlike with reinterpret_cast, going from one type to another with bit_cast and reading the result is not undefined behavior, as long as you know for sure that those bits are a valid representation of the target type. The difference here is subtle, but it allows the compiler to safely make lots of cases work efficiently and do the right thing in more complex cases without invoking undefined behavior. A typical use case is serialization.
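
Here is a minimal sketch of the difference (assuming a platform where float and std::uint32_t are both 32 bits wide):

#include <bit>
#include <cstdint>
#include <iostream>

int main()
{
    float f = 1.0f;
    // std::uint32_t bad = *reinterpret_cast<std::uint32_t*>(&f); // undefined behavior

    // Well-defined: copies the bits of f into a new std::uint32_t object
    std::uint32_t bits = std::bit_cast<std::uint32_t>(f);
    std::cout << std::hex << bits << '\n'; // 3f800000
}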

Wrapping up

Historically, the way the C-style cast operator was split into four C++ operators follows three simple rules:

  • If you need to check the types dynamically, then use dynamic_cast.
  • If you can check the types statically, then use static_cast.
  • In any other case, it is reinterpret_cast or const_cast that you need, but this is very dangerous.

I'll add to that that, in any situation, you should not perform a reinterpret_cast or const_cast unless you know what you are doing. You should never, ever perform these casts just because the other ones did not work.

RTTI in its wholeness is a useful (but totally optional) feature. But it is not simple to master.

In modern C++, we want to perform checks as much as possible at compile time (for security and performance), so when we are able, we want to use static features instead of dynamic ones.

Of course, you should not force static code where dynamic code would be better, but you should always think of a static solution before a dynamic one.

Author: Chloé Lourseyre
Editor: Peter Fordham

Addenda

Notes

1. As much as I consider myself an expert in the C++ language, my knowledge of the C language is much more limited. There may be errors in this subsection. If so, please tell me in the comments so I can edit the article.

2. You can also cast away the const qualifier through the pointer cast (https://onlinegdb.com/8HIJIeonA) but I don’t think it’s a whole different way to cast.

3. For instance, if px is a pointer to an object of type X (implemented as a B), and B is a derived class of X that has a method g, you need to write ((B*)px)->g() to call g through px. A simpler syntax could have been px->B::g().

[History of C++] Templates: from C-style macros to concepts

Author: Chloé Lourseyre
Editor: Peter Fordham

Introduction: Parametrized types

Templates are the C++ feature (or group of features, as the word is used in several contexts) that implements parametrized types.

The notion of a parametrized type is very important in modern programming and consists of using a type as a parameter of a feature, which means you can use that feature with different types, the same way you use a feature with different values.

The simplest example is std::vector. When you declare a vector as such: std::vector<int> foo;, the type int is a parameter. You could have put another type there, like double, void*, a user-defined class, or even another vector, instead of int.

It is a way to achieve metaprogramming, a programming technique in which programs treat other programs (or themselves) as data.

For the rest of the article I will use the word “template” to refer to either the notion of parametrized types or the C++ template implementation (unless I want to explicitly make the distinction).

Before Templates

Before the creation of templates, in early C++, people had to write C-style macros to emulate them.

One way of doing this was as follows:

// foobar.h
void foobar(FOOBAR_TYPE my_val);

// foobar.cpp
void foobar(FOOBAR_TYPE my_val)
{
    // do stuff
}

// main.cpp
#define FOOBAR_TYPE int
#include "foobar.h"
#include "foobar.cpp" // Only do this in a source file
#undef FOOBAR_TYPE

#define FOOBAR_TYPE double
#include "foobar.h"
#include "foobar.cpp" // Only do this in a source file
#undef FOOBAR_TYPE

int main()
{
    int toto = 42;
    double tata = 84;
    foobar(toto);
    foobar(tata);
}

Don't do this at home, though! This is not something we ought to do nowadays (especially the #include "foobar.cpp" part). Also note that this code takes advantage of function overloading and therefore does not compile in C.

With our modern eyes, this seems very limited and error-prone. But the interesting thing is that, even before C++ templates were implemented, the design teams could use this macro approach to gain experience with parametrized types.

Timing

Templates were introduced in Release 3.0 of the language, in October 1991. In The Design and Evolution of C++, Stroustrup reveals that it was a mistake to introduce this feature so late, and that in retrospect it would have been better to do so in Release 2.0 (June 1989) at the expense of less important features, like multiple inheritance:

Also, adding multiple inheritance in Release 2.0 was a mistake. Multiple inheritance belongs in C++, but is far less important than parameterized types — and to some people, parameterized types are again less important than exception handling.

Bjarne Stroustrup, The Design and Evolution of C++, chapter 12: Multiple Inheritance, §1 – Introduction

From today's perspective, it is clear that Stroustrup was right and that templates have impacted the landscape of C++ much, much more than multiple inheritance.

This addition came late because it was really time-consuming to explore the design and implementation issues.

Needs and goals

The original need for templates was to express parametrization of container classes. But macros were too limited for that job: they fail to obey scope and type rules and don't interact well with tools (especially debuggers). Before C++ templates, it was very hard to maintain code that used parametrized types; you had to work at the lowest level of abstraction and add each parametrized type manually.

The first concerns regarding templates were whether they would be easy to use (and templated objects as easy to use as hand-coded ones), whether compilation and linking speed would be significantly impacted, and whether they would be portable.

The build process of templates

Syntax

The angle brackets

Designing the syntax of a feature is not an easy job, and requires extensive discussion and refinement.

The choice of the brackets <...> for the template parameters was made because, even though parentheses would have been easier to parse, they are overused in C++, and brackets are (empirically) more pleasant for a human reader.

However, this causes a problem for nested brackets, such as:

List<List<int>> a;

In the code above, in earlier C++, you would get a compilation error: the closing >> is seen by the compiler as operator>> and not as two closing brackets.

A lexical trick was added later to the language (in C++11¹) so that this is no longer seen as a syntax error.

The template argument

Initially, the template argument would have been placed just after the object name:

class Foo<class T>
{
    // ...
};

However, that caused two major issues:

  • It is a bit too hard to read for parsers and humans. Since the template syntax is nested within the syntax of the class, it is a bit tough to detect.
  • In the case of function templates, the templated type can be used before it is declared. For instance, in this declaration: T at<class T>(const std::vector<T>& v, size_t index) { return v[index]; }, since T is the return type, it is parsed before we even know it is a template parameter.

Both issues are resolved if we put the template argument before the declaration, and this is what was done:

template<class T> class Foo
{
    // ...
};

template<class T> T at(const std::vector<T>& v, size_t index) { return v[index]; }

Constraints of template parameters

In C++, the constraints on template arguments are implicit².

A dilemma occurred over whether the constraints should be explicit in the template argument list (like below) or deduced from usage. An example of such an explicit constraint would look like this:

template < class T {
        int operator==(const T&, const T&); 
        T& operator=(const T&);
        bool operator<(const T&, int);
    };
>
class Foo {
    // ...
};

But this was judged way too verbose to be readable, and it would require many more templates for the same number of features. Moreover, it over-restricts the class you're implementing, giving constraints that exclude some implementations that would have been perfectly fine and correct without them³.

However, having explicit constraints was not off the table; it is just that function signatures are too specific a way to express them.

This could have been achieved through derivation: by specifying that your templated type must derive from another class, you can put explicit constraints on this type.

template <class T>
class TBase {
    int operator==(const T&, const T&); 
    T& operator=(const T&);
    bool operator<(const T&, int);
};

template <class T : TBase>
class Foo {
    // ...
};

However, this generates more issues. Programmers are thereby encouraged to express constraints as classes, leading to an overuse of inheritance. There is a loss in expressivity and semantics, because "T must be comparable to an int" becomes "T must inherit from TBase". In addition to that, you could not express constraints on types that can't have a base class, like ints and doubles.

This is mainly why, for a long time, we did not have explicit constraints on template parameters in C++⁴.

However, the discussion on template constraints was revived in the late 2010s, and a new notion made its appearance in C++20: Concepts (cf. Modern evolutions – Concepts below).

Templated object generation

How templates are compiled is very simple: for every set of template parameters that is used on a templated object, the compiler generates an implementation of this object using exactly those parameters.

So, writing this:

template <class T> class Foo { /* ... do things with T ... */ };
template <class T, class U> class Bar { /* ... do things with T  and U... */ };

Foo<int> foo1;
Foo<double> foo2;
Bar<int, int> bar1;
Bar<int, double> bar2;
Bar<double, double> bar3;
Bar< Foo<int>, Foo<long> > bar4;

Is the same thing as writing this:

class Foo_int { /* ... do things with int ... */ };
class Foo_double { /* ... do things with double ... */ };
class Foo_long { /* ... do things with long ... */ };
class Bar_int_int { /* ... do things with int  and int... */ };
class Bar_int_double { /* ... do things with int  and double... */ };
class Bar_double_double { /* ... do things with double  and double... */ };
class Bar_Foo_int_Foo_long { /* ... do things with Foo_int  and Foo_long... */ };

Foo_int foo1;
Foo_double foo2;
Bar_int_int bar1;
Bar_int_double bar2;
Bar_double_double bar3;
Bar_Foo_int_Foo_long bar4;

… only it is more verbose (even more in real code) and less readable.

Class templates

Templates were imagined and designed primarily for classes, mostly to allow for the implementation of standard containers. They were designed to be as simple to use as standard classes and as efficient as macros. These two goals were set so that low-level arrays could be abandoned wherever they were not specifically needed (i.e. outside low-level programming) and templatized containers would be preferred at the higher levels.

In addition to type arguments, templates can take non-type arguments, like this:

template <class T, int Size>
class MyContainer {
    T m_collection[Size];
    int m_size;
public:
    MyContainer(): m_size(Size) {}
    // ...
};

This was introduced in order to allow for static sizing of containers. Carrying the size in the type information makes the implementation more efficient, because you don't have to track the size separately and you don't lose it through pointer decay as you do with C-style arrays.

class Foo {};

int main()
{
    Foo fooTable[700];            // low-level container
    MyContainer<Foo, 700> fooCnt; // high-level container, as efficient as the previous one
}

Function templates

The idea of function templates comes from the need for templated class methods and from the idea that function templates are the logical extension of class templates.

Today, the most obvious examples we can provide are the STL algorithms (std::find(), std::find_first_of(), std::merge(), etc.). Though the STL algorithms did not exist at templates' creation, these are the kinds of functions that inspired function templates (the most emblematic being sort()).

The main issue with function templates was deducing the function template arguments and return type, so we don’t have to explicitly specify them at each function call-site.

In this context, it was decided that template arguments could be deduced (when possible) and specified (when needed). This was extremely useful for specifying return types, because they cannot always be deduced, as in this example:

template <class TTo, class TFrom>
TTo convert(TFrom val)
{
    return val;
}

int main()
{
    int val = 4;
    convert(val);              // Error: TTo is ambiguous
    convert<double, int>(val); // Correct: TTo is double; TFrom is int
    convert<double>(val);      // Correct: TTo is double; TFrom is deduced as int
}

As you can see in the last call, trailing template arguments can be omitted, just as trailing function arguments can (when they have a default value).

The way templates are generated (see the Templated object generation section above) works perfectly fine with function overloading. The only subtlety is when there are both non-templated and templated overloads of a function. Then, the non-templated overload is called if there is a perfect match; otherwise, the templated overload is called if a perfect match is possible; otherwise, ordinary overload resolution applies.

Template instantiation

At the beginning, explicit template instantiation was not intended, because it would create hard issues in some specific circumstances, for example if two unrelated parts of a program both request the same instantiation of a templated object, which would have to be handled without code replication and without disturbing dynamic linking. This is why implicit template instantiation was preferred at first.

The first automatic implementation for template instantiation was as follows: when the linker is run, it searches for missing template instantiations. If found, the compiler is invoked again to produce the missing code. This process is repeated until there are no more missing template instantiations.

However, this process had several problems, one of them being very poor compile and link performance.

It is to mitigate this that optional explicit template instantiation is allowed.
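
For illustration, here is what the explicit instantiation syntax looks like (a minimal sketch with a hypothetical function template):

template <class T>
T twice(T v) { return v + v; }

// Explicit instantiations: the code for these two versions is generated here,
// regardless of whether anyone calls them in this translation unit.
template int twice<int>(int);
template double twice<double>(double);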

The development of the template instantiation process had many more issues, such as the point of instantiation (the "name problem", i.e. pinpointing which declaration the names used in a template definition refer to), dependency problems, solving ambiguities, etc. Discussing all of these would require a dedicated article.

Modern evolutions

Templates are a feature that continued to evolve even as we entered the Modern C++ era (beginning with C++11).

Variadic templates

Variadic templates are templates that have at least one parameter pack. In C++, a parameter pack is a way to say that a function or a template has a variable number of arguments.

For example, the following function template declaration uses a parameter pack:

template <typename... Args>
void foobar(Args... args);

And it can be called with any number of arguments:

foobar(1);
foobar(42, 666);
foobar(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16);

Variadic templates allow you to have a variable number of arguments that can be of different types.

With that, we can write more generic functions. For instance:

#include <iostream>

struct MyLogger 
{
    static int log_counter;

    template<typename THead>
    static void log(const THead& h)
    {
        std::cout << "[" << (log_counter++) << "] " << h << std::endl;
    }

    template<typename THead, typename ...TTail>
    static void log(const THead& h, const TTail& ...t)
    {
        log(h);
        log(t...);
    }
};

int MyLogger::log_counter = 0;

int main()
{
    MyLogger::log(1,2,3,"FOO");
    MyLogger::log('f', 4.2);
}

This generates the following output:


[0] 1
[1] 2
[2] 3
[3] FOO
[4] f
[5] 4.2

It is safe to assume that a significant motivation for the addition of variadic templates and parameter packs was to allow the implementation of more generic functions, even if it may lead to voluminous generated code (for instance, in the previous example, the MyLogger class has 8 instantiations of the function log⁵).

Full details are available here: Parameter pack(since C++11) – cppreference.com.

Concepts

Concepts are a C++20 feature that gives the developer a way to declare constraints over template parameters. This leads to clearer code (with a higher level of abstraction) and clearer error messages (if any).

For instance, here is a concept declaration:

template<typename T_>
concept Addable = requires(T_ a, T_ b)
{
    a + b;
};

And here are examples of its usage:

template<typename T_>
requires Addable<T_>
T_ foo(T_ a, T_ b);

template<typename T_>
T_ bar(T_ a, T_ b) requires Addable<T_>;

auto l = []<typename T_> requires Addable<T_> (T_ a, T_ b) {};

Before that, template-related errors were barely readable. Concepts were a highly anticipated feature of C++20.

A good overview of concepts can be found on Oleksandr Koval’s blog: All C++20 core language features with examples | Oleksandr Koval’s blog (oleksandrkvl.github.io)

Deduction guides

Template deduction guides are a C++17 feature; they are patterns associated with a templated object that tell the compiler how to translate a set of parameters (and their types) into template arguments.

For instance:

template<typename T_>
struct Foo
{
  T_ t;
};
 
Foo(const char *) -> Foo<std::string>;
 
Foo foo{"A String"};

In this code, the object foo is a Foo<std::string> and not a Foo<const char*>, and thus foo.t is a std::string. Thanks to the deduction guide, the compiler understands that when we use a const char*, we want to use the std::string instantiation of the template.

This is particularly useful for objects such as vectors, whose iterator-pair constructor can have this kind of deduction guide:

template<typename Iterator> vector(Iterator b, Iterator e) -> vector<typename std::iterator_traits<Iterator>::value_type>;

This way, if we call the vector constructor with a pair of iterators, the compiler will be able to deduce the template parameter of the vector.

Substitution Failure Is Not An Error

Substitution Failure Is Not An Error, SFINAE for short, is a rule that applies during overload resolution of function templates.

It basically means that if the substitution of the (deduced or explicitly specified) template arguments fails, the specialization is discarded from the overload set instead of causing a compile error.

For instance, take the following code:

struct Foo {};
struct Bar { Bar(Foo){} }; // Bar can be created from Foo
 
template <class T>
auto f(T a, T b) -> decltype(a+b); // 1st overload
 
Foo f(Bar, Bar);  // 2nd overload
 
Foo a, b;
Foo x3 = f(a, b);

Instinctively, we could think that it is the first overload that is called on the last line (because the template instantiation using Foo as T is a better overload than the second one, which requires a conversion).

However, the expression (a+b) is ill-formed with Foo. Instead of generating an error, the overload auto f(Foo a, Foo b) -> decltype(a+b); is discarded. Thus, it is the other overload that is called, with an implicit conversion.

This kind of substitution occurs in all types used in the function type and all types used in the template parameter declarations. Since C++11, it also occurs in all expressions used in the function type and all expressions used in a template parameter declaration. Since C++20, it also occurs in all expressions used in the explicit specifier.

The full documentation about SFINAE can be found here: SFINAE – cppreference.com.

Other features in C++20

Templates continue to evolve. Here is a small list of the C++20 template features I couldn't fit into this article:

  • Template parameter lists for generic lambdas. Sometimes generic lambdas are too generic. C++20 allows using the familiar template function syntax to introduce type names directly.
  • Class template argument deduction for aggregates. In C++17, to use aggregates with class template argument deduction, we needed explicit deduction guides; that's unnecessary now.
  • Class types in non-type template parameters. Non-type template parameters can now be of literal class types.
  • Generalized non-type template parameters. Non-type template parameters are generalized to so-called structural types.
  • Class template argument deduction for alias templates. Class template argument deduction now works with type aliases.

Exceptions and templates: two sides of the same coin

I did not talk about exceptions in this article, but for Stroustrup, exceptions and templates are complementary features:

To my mind, templates and exceptions are two sides of the same coin: templates allow a reduction in the number of run-time errors by extending the range of problems handled by static type checking; exceptions provide a mechanism for dealing with the remaining run-time errors. Templates make exception handling manageable by reducing the need for run-time error handling to the essential cases. Exceptions make general template-based libraries manageable by providing a way for such libraries to report errors.

Bjarne Stroustrup, The Design and Evolution of C++, chapter 15: Templates, §1 – Introduction

So, by design, templates and exceptions are closely intermingled, in addition to raising the level of abstraction of error handling.

However, exceptions and templates (especially templates) have evolved greatly since then, so I think this may not be true anymore.

Wrapping up

In my opinion, templates are the biggest fish in C++’s metaphorical pond. We will never talk enough about them, and I suspect they will continue to evolve for decades.

This is so because one of the key ideas of modern C++ is to write intentions instead of actions. We want higher levels of abstraction and more metaprogramming. It is only natural that templates are at the heart of the modern evolutions of the language.

Author: Chloé Lourseyre
Editor: Peter Fordham

Addenda

Notes

1. I managed to locate this change in the GCC compiler at release 6 (https://godbolt.org/z/vndGdd7Wh), suggesting that this indeed occurred with C++14. I observed the same thing with the Clang compiler at release 6 (https://godbolt.org/z/ssfxvb4cM), which confirms it.

2. This is called duck typing, from the saying “if it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck”.

3. I have no concrete example to provide and I’m pretty much paraphrasing Stroustrup in his retrospective, but the idea is that by having user-defined constraints, you close some doors that you didn’t even know existed and that others could have exploited. I’ve done and seen very interesting things using templates, and the fact that the only constraint is that the templated code must make sense with its given parameters opens up as many possibilities as we can imagine.

4. There were other attempts to devise a way to specify constraints, but to no avail. More details in section §15.4 of Stroustrup’s The Design and Evolution of C++.

5. These instantiations are (according to the assembly – Compiler Explorer (godbolt.org)):

  • log(int);
  • log(char[4]);
  • log(char);
  • log(double);
  • log(int,int,int,char[4]);
  • log(int,int,char[4]);
  • log(int,char[4]);
  • log(char,double);

[History of C++] Explanation on why the keyword `class` has no more reason to exist

Author: Chloé Lourseyre
Editor: Peter Fordham

Introduction to the new concept: History of C++

A few months back (at the start of this blog), I was thinking about interesting things you can find in C++ when I realized one thing: the keyword class doesn’t really have a good reason to exist!

This may seem a bit harsh put that way, but I will explain my statement later in the article.

Anyway, studying the reason for the word class to exist led me to look into the history of the language. It was very interesting and I really enjoyed it.

So I decided to write a mini-series of articles about the history of C++ aiming to present concepts that may be outdated today, in C++20, and explain why some strange or debatable things were made that way in the past.

Sources

For this miniseries, I have three main sources:

  • The Design and Evolution of C++ (Bjarne Stroustrup)
  • A History of C++: 1979–1991 (Bjarne Stroustrup)
  • Sibling Rivalry: C and C++ (Bjarne Stroustrup)

I have a few things to say about them.

First, Sibling Rivalry: C and C++ and A History of C++: 1979-1991 are both fully available on Stroustrup’s website (just follow the links).

Second, I find it quite sad that I could only find sources from one author. Sure, Bjarne Stroustrup is probably the best individual to talk about his own work, but I would have liked the insight of other authors (if you know any, please tell me in the comments).

Why could the keyword class simply not exist in C++20?

Today, in C++20, we have two very similar keywords that work almost exactly the same: class and struct.

There is one and only one difference between class and struct: by default, the members and the inheritance of a struct are public, whereas those of a class are private.
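
To make this concrete, here is a minimal sketch of that equivalence (the type names are arbitrary):

// These two definitions are strictly equivalent...
struct A {
private:
    int x;
};
 
class B {
    int x; // private by default
};
 
// ...and so are these two:
struct C {
    int x; // public by default
};
 
class D {
public:
    int x;
};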

Here are three reasons why this tiny difference is not worth a separate keyword1:

  • In practice, the default access specifier is almost never relied upon. In my experience, most developers don’t use the default and prefer to specify the access explicitly.
  • In 2021, good code is clear code. Explicitly writing the access specifier is, in that regard, better than leaving it implicit. This may be arguable on solo or small projects, but when you start to develop with many peers, it is better to write a few more characters to be sure that the code is clear for everyone.
  • Having two keywords is more ambiguous than having one. I have often encountered developers who think there are more differences than that between class and struct, and they have sometimes even told me what they thought these additional differences were. If there were only one keyword, this confusion wouldn’t exist.

I can hear some people’s counter-arguments already. The ones I hear most often are:

  • This is syntactic sugar2.
  • They actually use the implicit access specifier3.
  • There are semantics behind the use of each keyword that go beyond the sole technical meaning4.

All in all, what I’m trying to say is that the language would be practically the same if the keyword class did not exist. So, in the mindset of C++20, we can ask ourselves: “what is the purpose of adding a keyword that is neither needed nor useful?”.

I know one thing: class is one of the oldest C++-specific keywords. Let’s dive into the history of C++ to understand why it exists.

History of the keyword class

Birth

The first official appearance of the keyword class was in Classes: An Abstract Data Type Facility for the C language (Bjarne Stroustrup, 1980), which was actually not about C++ but about what we call C with Classes.

What is “C with Classes”? I think I’ll dedicate a whole article to this subject, so I’ll keep it short here. It is C++’s immediate predecessor, started in 1979. The original goal was to add modularity to the C language, inspired by Simula5 classes. At first it was a mere tool, but it soon evolved into a whole language. However, since C with Classes was only a mild success while needing continuous support, Bjarne decided to abandon it and create a new language, using his experience of C with Classes, that aimed to be more popular. He called this new language C++.

The choice of the word “class” directly comes from Simula, and the fact that Stroustrup dislikes inventing new terminology.

You’ll find more about C with Classes in the book The Design and Evolution Of C++ (Bjarne Stroustrup), where a whole section is dedicated to it.

So the keyword class was actually born within the predecessor of C++. In terms of design, it’s one of the oldest concepts of the language, and it was even the motivation behind the creation of C with Classes.

Original difference between struct and class

In C with Classes, struct and class are quite different.

Structures work just like in C: they are simple data structures. It is within classes that the concept of methods was introduced.

At that time, there was a real difference between structures and classes, hence the distinction6.

Into C++

Two of the greatest features of early C++ were virtual functions and function overloading.

In addition to that, namespace rules were introduced to define how scope names would behave. Among those rules:

  • Names are private unless they are explicitly declared public.
  • A class is a scope (implying that classes nest properly).
  • C structure names don’t nest (even when they are lexically nested).

These rules make structures and classes behave differently in terms of scopes and names.

For instance, this was legal:

struct outer {
    struct inner {
        int i;
    };
};

struct inner a = { 1 }; // legal in early C++: C structure names don't nest

But replacing struct with class caused a compilation error.

In later C++, the code above doesn’t compile (it needs to be outer::inner a = { 1 };).

“Fusion” with the keyword struct

It’s difficult to say when this occurred specifically, because none of the sources I found clearly state “this is when structures and classes became the same concept”, but we can investigate.

According to The C++ Programming Language – Reference Manual (Bjarne Stroustrup, 1984), the first published document specifically about C++:

(listing derived types)

classes containing a sequence of objects of various types, a set of functions for manipulating these objects, and a set of restrictions on the access of these objects and function;

structures which are classes without access restrictions;

Bjarne Stroustrup, The C++ Programming Language – Reference Manual, §4.4 Derived types

Moreover, if we take a look at the feedback Stroustrup gives about virtual functions and the object layout model (concepts introduced in 1986):

At this point, the object model becomes real in the sense that an object is more than the simple aggregation of the data members of a class. […] Then why did I not at this point choose to make structs and classes different notions?

My intent was to have a single concept: a single set of layout rules, a single set of lookup rules, a single set of resolution rules, etc. […]

Bjarne Stroustrup, The Design and Evolution of C++, §3.5.1 The Object Layout Model

Even though it seems that structures couldn’t hold private members at that time (the private keyword didn’t even exist then!), we can safely say this was the moment when structures and classes were “fused”.

It was at the creation of C++ that the structures from C and the classes from Simula merged together.

But when did they actually become the same?

Technically, though, structs and classes became what they are today the moment the keyword private was invented. It seems that this happened at the same time the keyword protected was introduced, in Release 1.2 in 1987.
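
Here is a small sketch of what this unification means in today’s C++ (the type is my own example):

// Since private exists, a struct can hold private members
// and public methods exactly like a class would
struct S
{
public:
    int value() const { return secret; }
private:
    int secret = 42;
};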

From then to now

Despite all that, and despite the fact that class is technically useless, there is a lot more to it.

The keyword class has acquired semantics.

Indeed, nowadays, writing the keyword class means that you are implementing a class that is not a data bucket, whereas the keyword struct is mainly used for data buckets and similar data structures. Technically, these keywords do not differ, but through usage, because these keywords have a history, they have acquired more than their sole technical meaning.
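
A minimal sketch of this acquired convention (the types are my own illustration):

// struct for a passive data bucket...
struct Point
{
    double x;
    double y;
};
 
// ...class for a type with an invariant and behavior
class Circle
{
public:
    Circle(Point center, double radius)
        : center_(center), radius_(radius) {}
 
    double area() const { return 3.141592653589793 * radius_ * radius_; }
 
private:
    Point center_;
    double radius_;
};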

Go see Jonathan Boccara’s very relevant article The real difference between struct and class – Fluent C++ (fluentcpp.com) for more details. That article is itself inspired by the Core Guidelines.

The fact that class has now lived for more than forty years makes it very different from the class of 1980, and from the class it would be if it had been introduced in 2020.

But the question that immediately pops up in my head is the following: should we continue to use class this way? Should we stick to the semantics it has acquired, or should we seek evolution towards a more modern meaning? The answer is simple: it’s up to each of us. We, C++ devs, are the ones who make the language evolve, every day, in every single line of code.

The Core Guidelines tell us how we should use each feature today, but maybe tomorrow someone (you?) will come up with a better, safer, clearer way to code. Understanding what structs and classes were in the past and what they are today is the first step to defining what they will be tomorrow.

Wrapping up

The best way to summarize the history of these two keywords is this: “The structures from C and the classes from Simula merged together at the creation of C++.” But we can also say that, thanks to that history, the two keywords carry different meanings despite representing the same feature.

This article is not a pamphlet against class, and I will not conclude it with a half-educated, half-authoritative argument like I often do7. Instead, I will tell you that I realized how important it is to contextualize articles like the ones I publish on this blog with the version of C++ that is used and with the articles and books that serve as examples and inspirations.

I think it’s important to understand history to be able to judge the practices we have today. Do we do something out of habit, or is there a real advantage to it? It’s a question that needs to be asked every day, or else we’ll end up writing outdated code in an outdated mindset.

How C++ developers think8 evolves from decade to decade. In each era, developers have different mindsets, different goals, different issues, a different education, and so on. I don’t blame how people coded in the past, but I do blame those who code now like we coded a decade ago, and in the future I hope to see my peers blaming me when I code in “old C++”.

Thanks for reading and see you next week!

Author: Chloé Lourseyre
Editor: Peter Fordham

Addenda

To go further

Here are two thoughts that are a bit off-topic.

Namespaces in C

Something that C++ did not inherit from C is the struct namespace.

Indeed, in C, the namespace containing struct names isn’t the same as the global C namespace: in C, struct foo and foo don’t refer to the same entity. In C++, assuming that foo is a structure, struct foo and foo name the same type.
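
Here is a minimal sketch of the C++ side of this behavior:

struct foo { int i; };
 
int main()
{
    struct foo a = { 1 }; // C-style elaborated type specifier, still valid in C++
    foo b = { 2 };        // in C++, foo alone names the very same type
}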

There is a way, in C, to link these two namespaces, using typedef. To learn more about that, read this article: How to use the typedef struct in C (educative.io)

The class keyword in templates

Maybe you have already seen this syntax:

template <class C_>
void foo(C_ arg)
{
    // ...
}

What does class mean in this context?

It actually means nothing more than “a type, whichever one”.

This is a bit confusing, because we may think that, as a type, C_ is supposed to be a class. But it is not. The typename keyword was later introduced to lift this confusion, but we can still use class if we want to.
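
For instance, here is a minimal sketch showing that both spellings accept any type (the function names are arbitrary):

// class and typename are interchangeable in this position
template <class C_>    void foo(C_ arg) {}
template <typename T_> void bar(T_ arg) {}
 
int main()
{
    foo(42);   // C_ is int, which is not a class at all
    bar(3.14); // T_ is double
}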

Annotations

1. Since struct and class are so similar, I chose to consider class the keyword in excess, simply because struct exists in C while class does not, and because it is the evolution of the keyword class that brought the two so close together.

2. This argument is not true. By essence, syntactic sugar is supposed to make the code easier to read or write. Since struct/class is just the substitution of one keyword for another, there is no gain in clarity whatsoever. The only reason the code may seem easier to read is the semantics these keywords hold, not the syntax itself.

3. Yes, I know, some people actually use the implicit access specifier. I do too. But what you forget when you say such a thing is that most of the software industry doesn’t think like you or write code like you. The statement of the struct/class duplication comes from an empirical observation. Our own individual practices can never be an argument against that fact.

4. While this is factually true, the reasoning is upside-down. It’s because of their history that they have different semantic meanings. If they were created today, from nothing, they would not have those semantics, only their redundancy. I’ll talk about that near the end of the article.

5. Simula is the name of two simulation programming languages, Simula I and Simula 67, developed in the 1960s at the Norwegian Computing Center in Oslo; it is considered the first object-oriented programming language. Simula is largely unknown among the community of developers, but it has greatly influenced other famous languages. Simula-type objects are reimplemented in Object Pascal, Java, C#, and, of course, C++.

6. At that time, you could emulate any structure with a class, but it was still worthwhile, especially in the mindset of the time, to make the distinction.

7. I tend to always agree with the C++ Core Guidelines (isocpp.github.io), even though I always try to be critical of our habits and practices. But keep in mind that the guidelines we have today may be different from the ones we’ll have tomorrow.

8. I think this statement is actually true for every language, but C++ is the perfect archetype, as it is one of the oldest widely-used languages on the market today, in 2021.


Off-topic: Feedspot’s Top 30 C++ Programming Blogs and Websites

Recently, Feedspot made a top-list of the best C++ programming blogs and websites, and Belay the C++ ended up in 9th position!

You can find the top-list here: Top 30 C++ Programming Blogs and Websites You Must Follow in 2021 (feedspot.com)