[History of C++] The genesis of casting.

Author: Chloé Lourseyre
Editor: Peter Fordham

C-style casts

First of all, to understand the rationale behind the design of C++ casts, I think it’s important to recall how C-style casts work, both in C and in C++.

In C [1]

In C, you have two ways to cast:

  • You perform a value cast, an arithmetic conversion from one numeric type into another numeric type. You can lose data if the target type is narrower than the source type (for instance, when you cast float into long, or long into int).
  • You perform a pointer cast, which converts a pointer to one type into a pointer to another type. This can work well, as in this example (https://onlinegdb.com/sYFCGeZmH), but it can quickly become error-prone, as in this example (https://onlinegdb.com/pWovM17X4) where the types are not exactly the same, and in this example (https://onlinegdb.com/HHjNS9NSb) where one structure is bigger than the other [2].

Though it is a C feature that has its uses and misuses, in the C++ language this is a behavior we want to avoid.
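As a minimal sketch of the two kinds of cast described above (written so that it compiles as C++; the names truncate_to_int and first_byte are made up for illustration):

```cpp
#include <cassert>

// Value cast: converting a floating-point value to an integer type
// discards the fractional part (truncation toward zero).
int truncate_to_int(double value)
{
    return (int)value;
}

// Pointer cast: the same memory is viewed through a pointer of another type.
// Inspecting an object's bytes through unsigned char* is permitted;
// most other pointer casts give no such guarantee.
unsigned char first_byte(const unsigned int* number)
{
    return *(const unsigned char*)number;
}

void demo_c_casts()
{
    assert(truncate_to_int(3.9) == 3);    // data loss: the .9 is gone
    assert(truncate_to_int(-3.9) == -3);

    unsigned int zero = 0u;
    assert(first_byte(&zero) == 0);       // assumes unsigned int has no padding bits
}
```

The value cast loses information silently, and the pointer cast is only as safe as the programmer's knowledge of what the memory really contains.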

In C++

The C-style cast does not work exactly the same way in C++ as it does in C (even though the final behaviors are similar).

When you perform a C-style cast in C++, the compiler tries the following cast operations, in order, until it finds one that compiles:

  1. const_cast
  2. static_cast
  3. static_cast followed by const_cast
  4. reinterpret_cast
  5. reinterpret_cast followed by const_cast

This process is not appreciated by C++ developers (to say the least) because the cast that is performed is not explicit and does not catch potential errors at compile time.
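To illustrate the lookup order listed above, here is a small sketch in which the same C-style syntax ends up meaning two very different things. The types Sensor and Logger are hypothetical and only serve the example.

```cpp
#include <cassert>

struct Sensor { int reading; };
struct Logger { int level; };   // unrelated to Sensor

void demo_c_style_cast_in_cpp()
{
    Sensor sensor{42};
    const Sensor* read_only = &sensor;

    // Resolved as a const_cast: the const qualifier silently disappears.
    Sensor* writable = (Sensor*)read_only;
    writable->reading = 7;   // fine only because `sensor` itself is not const
    assert(sensor.reading == 7);

    // Resolved as a reinterpret_cast: the two types are unrelated,
    // yet the compiler accepts the conversion.
    Logger* bogus = (Logger*)&sensor;
    static_cast<void>(bogus);   // never dereferenced: using it as a Logger would be undefined behavior

    // The named casts refuse both conversions:
    // static_cast<Sensor*>(read_only);   // error: casts away const
    // static_cast<Logger*>(&sensor);     // error: unrelated types
}
```

Nothing in the source code tells the reader which of the five steps was actually selected, which is precisely the complaint.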

You can find more information about this behavior on the following page: Explicit type conversion – cppreference.com

Run-Time Type Information

Original idea and controversies

What we call Run-Time Type Information, often shortened to RTTI, is a mechanism that allows the type of an object to be determined during program execution.

This is used in polymorphism, where you can manipulate an object through its base class interface (thus, you don’t know at compile-time which derived class you are manipulating).

RTTI for C++ was drafted from the earliest version, but its development and implementation were postponed in the hope that it would prove unnecessary.

Some people, at that time, raised their voices against the feature, saying that it would need too much support, was too heavy to implement, too expensive, complicated and confusing, “inherently evil” (against the spirit of the language), the beginning of an avalanche of new features, etc.

However, Bjarne Stroustrup finally decided that it was worth implementing, for three reasons: it is important to some people, it is harmless to those who won’t use it, and without it libraries will develop their own RTTI anyway, leading to inconsistency.

In the end, RTTI was implemented in three parts:

  • The dynamic_cast operator, allowing a pointer to a derived class to be obtained from a pointer to the base class, but only if the pointed-to object is actually of the derived class.
  • The typeid operator, allowing identification of the exact type of an object given an object of the base class.
  • The type_info structure, giving additional run-time info on a given type.
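The three parts listed above can be seen working together in a short sketch. The Shape hierarchy below is a made-up example, not code from the original design documents.

```cpp
#include <cassert>
#include <typeinfo>

struct Shape { virtual ~Shape() = default; };   // polymorphic base
struct Circle : Shape {};
struct Square : Shape {};

void demo_rtti()
{
    Circle circle;
    Shape* shape = &circle;   // static type Shape*, dynamic type Circle

    // dynamic_cast: succeeds only if the object really is a Circle.
    assert(dynamic_cast<Circle*>(shape) == &circle);
    assert(dynamic_cast<Square*>(shape) == nullptr);

    // typeid: identifies the exact (dynamic) type of the object
    // and hands back a std::type_info describing it.
    const std::type_info& info = typeid(*shape);
    assert(info == typeid(Circle));
    assert(info != typeid(Shape));
}
```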

Early in the process, Stroustrup detected numerous misuses of the feature (this is the main reason he decided to wait until RTTI was needed before implementing it), and some people even labelled it a “dangerous feature”.

However, there is a major difference between a feature that can be misused and a feature that will be misused. That difference resides in education, design, testing, etc. But this has a cost, and the real question is: are the benefits of such a dangerous feature worth the effort necessary to keep misuses at an anecdotal level?

The final decision was yes: it is worth a shot. But not all developers agreed at that time.

Syntax

Since casts couldn’t be inherently made safe, Stroustrup wanted to provide a syntax that both signaled the use of an unsafe feature and discouraged its use when there were alternatives.

The C++ crew originally considered either using Checked<T*>(p); for run-time checked conversion and Unchecked<T*>(p); for unchecked conversion, or using (virtual T*)p for dynamic cast only.

But given the constraints and the fact that dynamic_cast and “standard” casts (which we now call static_cast) are two wholly different operations, the old syntax was abandoned in favor of more verbose unary operators. These are the operators we know today, dynamic_cast<T*>(p) and static_cast<T*>(p) (and, later, the other casts).

typeid() and type_info

The first implementation of RTTI only provided dynamic_cast. However, people soon wanted to know more about the types they were dynamically manipulating, leading to the creation of typeid() and type_info.

The typeid operator can be applied to any polymorphic object and returns a reference to a type_info that holds the needed information. The reason that it returns a reference and not a pointer is to avoid pointer comparison and arithmetic on it.

type_info objects are non-copyable, polymorphic, comparable and sortable (so they can be used as keys in lookup tables and such), and they hold the name of the type.
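A short sketch of these properties follows. Note that std::type_index is a later addition (C++11) that wraps a type_info so it can be copied and hashed; it is used here as one convenient way to get a lookup table, and the Animal hierarchy and sound_of function are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

struct Animal { virtual ~Animal() = default; };
struct Dog : Animal {};
struct Cat : Animal {};

// Hypothetical lookup table keyed by dynamic type.
std::string sound_of(const Animal& animal)
{
    static const std::unordered_map<std::type_index, std::string> sounds = {
        {std::type_index(typeid(Dog)), "woof"},
        {std::type_index(typeid(Cat)), "meow"},
    };
    // typeid yields a const std::type_info&, which cannot be copied;
    // std::type_index wraps it so that it can be stored and hashed.
    const auto found = sounds.find(std::type_index(typeid(animal)));
    return found != sounds.end() ? found->second : "?";
}

void demo_type_info()
{
    Dog dog;
    const Animal& animal = dog;
    assert(typeid(animal) == typeid(Dog));        // comparable
    assert(sound_of(animal) == "woof");           // usable as a key through type_index
    assert(typeid(animal).name() != nullptr);     // the name is implementation-defined
    assert(!typeid(Dog).before(typeid(Dog)));     // before() is an ordering, so it is irreflexive
}
```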

Uses and Misuses

Now, there are two categories of types: those that have type information at run time and those that don’t. It was decided that only polymorphic classes, i.e. the classes that can be manipulated through base classes, would hold RTTI.

At first, people wondered whether this would cause problems (and frustration), because it is sometimes hard to tell (as a developer) whether a class is polymorphic or not. But this is not a big issue, because the compiler is able to tell at compile time whether a given use of the RTTI operators is legal or not.

The main issue that was anticipated was the over-use of RTTI where it isn’t needed. For instance, we can expect such code:

void rotate(const Shape& r)
{
    if (typeid(r) == typeid(Circle)) 
    {
        // do nothing
    }
    else if (typeid(r) == typeid(Triangle)) 
    {
        // rotate triangle
    }
    else if (typeid(r) == typeid(Square)) 
    {
        // rotate square
    }
}

However, this is a broken use of RTTI because it does not correctly handle classes derived from the ones mentioned. A better way to do this would be to use virtual functions.
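A minimal sketch of the virtual-function alternative is shown here. The angle member, the degrees parameter and the RoundedSquare class are assumptions added for the example; the point is only that a class derived from Square is handled correctly without any type query.

```cpp
#include <cassert>
#include <typeinfo>

struct Shape
{
    virtual ~Shape() = default;
    virtual void rotate(int degrees) = 0;
};

struct Circle : Shape
{
    void rotate(int) override {}   // rotating a circle changes nothing
};

struct Square : Shape
{
    int angle = 0;
    void rotate(int degrees) override { angle = (angle + degrees) % 90; }
};

// A class derived from Square, which a typeid comparison would miss.
struct RoundedSquare : Square {};

void rotate(Shape& shape, int degrees)
{
    shape.rotate(degrees);   // dispatched on the dynamic type, no RTTI query
}

void demo_virtual_rotate()
{
    RoundedSquare rounded;
    Shape& shape = rounded;

    // The typeid test from the if/else chain would not recognise this object...
    assert(typeid(shape) != typeid(Square));

    // ...whereas virtual dispatch reaches Square::rotate as intended.
    rotate(shape, 30);
    assert(rounded.angle == 30);
}
```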

Another misuse would be with unnecessary type-checking, like in the following code:

Crate* foobar(Crate* crate, MyContainer* cont)
{
    cont->put(crate);

    // do things...

    Object* obj = cont->get();
    Crate* cr = dynamic_cast<Crate*>(obj);
    if (cr)
        return cr;
    // else, handle error
    return nullptr;
}

Here, we manually check the type of the object in MyContainer, although it would be better to use a templated version of the container, like so:

Crate* foobar(Crate* crate, MyContainer<Crate>* cont)
{
    cont->put(crate);

    // do things...

    return cont->get();
}

Here, no need to check for errors and, most of all, no use of RTTI.

These two misuses of C++ RTTI are most commonly made by developers following the guidelines of other languages (like C, Pascal, etc.) where such code is accepted, even encouraged. But it doesn’t fit the C++ design.

Abandoned features

Here is a list of features that have been considered for the C++ RTTI, but not adopted in the end:

  • Meta-Objects: this replaces type_info with a mechanism (the meta-object) that can accept, at run time, requests to perform any operation that can be requested of an object of the language. However, it requires embedding an interpreter for the complete language, which is a threat to efficiency.
  • The Type-inquiry Operator: an alternative to dynamic_cast was an operator that tells whether an object is of a derived class or not. If so, we could then cast it (old-style) to the derived class in order to use it. However, dynamic_cast and static_cast can both be applied to pointers and yield different results, so the distinction had to be made explicit, because old-style casts on pointers would not always give the expected result. Plus, decoupling the check from the cast can cause mismatches between the two.
  • Type Relations: using comparison operators (such as < and <=) was suggested, but it was judged “too cute” (meaning it is a non-mathematical interpretation of the operator, giving meaning to an operation that has no mathematical meaning). Plus, this has the same check/cast decoupling problem as the type-inquiry operator.
  • Multi-methods: this is the ability to select a virtual function based on more than one object. Such a mechanism may be useful to people who develop binary operators. However, at that time, Stroustrup was not familiar with the concept and decided it would be implemented only if needed later.
  • Unconstrained Methods: this is the mechanism that allows a polymorphic object to call any method that could be called, checking at run time whether it can actually be called and handling errors accordingly. However, with dynamic_cast we can check this ourselves, which is more efficient and type-safe.
  • Checked Initialization: this is the ability to initialize a derived class object from a polymorphic object, checking at run time whether the types actually match. However, there were syntax complications and error-handling uncertainties, and it can be done using, again, a dynamic_cast.

C++-style casts

Problems and consequences

The C-style cast is (quoting B.S.) “a sledgehammer”. It means that when you write (B)expr you say “make expr a B, and whatever happens happens”. This can become very unfortunate when const or volatile qualifiers are involved.

In addition to that, the syntax is simplistic: it is hard to see and hard to parse, and you need an excess of parentheses when you want to call a derived-class method in a polymorphic context [3].

Thus, it was decided to separate the different ways to cast into separate operators. This way, when you write a cast, you write how you want to cast. Plus, this adds some verbosity to the operation which makes parsing easier and warns the reader that a potentially dangerous operation is happening.

Since C-style casts include some really bad behaviors (from the C++ point of view), some C++-specific cast operators are meant to be avoided and kept isolated from the “good” cast operators. These behaviors were not removed from the language because they can be useful in some specific contexts, but they need to be separated from the others so that they cannot be used by accident and so that it is obvious when they are used.

The different casting operators

dynamic_cast

I won’t talk much about dynamic_cast, since this operator is covered in the previous section (about RTTI). Just keep in mind that the keyword dynamic_cast is the one associated with the RTTI solution.

dynamic_cast makes a conversion that is checked dynamically, i.e. at run-time. If you want a static check, i.e. at compile-time, you would prefer static_cast.
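As a brief sketch of the run-time check, dynamic_cast has two forms that report failure differently: the pointer form returns a null pointer, and the reference form throws std::bad_cast. The Document hierarchy and the two helper functions are hypothetical.

```cpp
#include <cassert>
#include <typeinfo>   // std::bad_cast

struct Document { virtual ~Document() = default; };
struct Invoice : Document { int total = 100; };
struct Memo : Document {};

// Pointer form: failure is reported as a null pointer.
int total_or_zero(const Document* doc)
{
    if (const Invoice* invoice = dynamic_cast<const Invoice*>(doc))
        return invoice->total;
    return 0;
}

// Reference form: there is no null reference, so failure throws std::bad_cast.
int total_or_throw(const Document& doc)
{
    return dynamic_cast<const Invoice&>(doc).total;
}

void demo_dynamic_cast()
{
    Invoice invoice;
    Memo memo;
    assert(total_or_zero(&invoice) == 100);
    assert(total_or_zero(&memo) == 0);
    assert(total_or_throw(invoice) == 100);

    bool threw = false;
    try { total_or_throw(memo); }
    catch (const std::bad_cast&) { threw = true; }
    assert(threw);
}
```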

static_cast

The static_cast can be described as the inverse operation to the implicit conversion. If A can be implicitly converted to B, then B can be static_casted to A. The operator can also perform any conversion that can be implicitly done.

This alone covers the majority of conversions that do not require dynamic type checking.

static_cast respects constness (making it safer than C-style casts) and is static (any error will be detected at compile time).

Whenever it is relevant, a static_cast to a user-defined type looks for any single-parameter constructor that matches the conversion (if you try to statically cast a Foo into a Bar, the compiler will look for the Bar(Foo) constructor) or any relevant conversion operator. See user-defined conversion function – cppreference.com for more info.

Also, you cannot perform a static_cast to or from a pointer to an incomplete type (which can be done using another C++-style cast).
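The following sketch gathers the main uses of static_cast described in this section. All the types (Base, Derived, Meters, Feet, Color) are invented for the example.

```cpp
#include <cassert>

struct Base { int id = 1; };
struct Derived : Base { int extra = 2; };

struct Meters { double value; };
struct Feet
{
    double value;
    // Single-parameter constructor, found by static_cast<Feet>(some_meters).
    explicit Feet(Meters m) : value(m.value * 3.28084) {}
};

enum class Color { Red, Green };

void demo_static_cast()
{
    Derived derived;

    // Derived* -> Base* is implicit, so Base* -> Derived* is a valid static_cast.
    // There is no run-time check: the caller must know the object really is a Derived.
    Base* base = &derived;
    Derived* back = static_cast<Derived*>(base);
    assert(back == &derived && back->extra == 2);

    // T* -> void* is implicit, so void* -> T* is a valid static_cast.
    void* erased = &derived;
    assert(static_cast<Derived*>(erased) == &derived);

    // Conversions that could also happen implicitly, made visible.
    assert(static_cast<int>(2.75) == 2);

    // Scoped enumeration to integer.
    assert(static_cast<int>(Color::Green) == 1);

    // User-defined conversion through the Feet(Meters) constructor.
    Feet feet = static_cast<Feet>(Meters{1.0});
    assert(feet.value > 3.28 && feet.value < 3.29);

    // Constness is respected:
    // const Base* ro = &derived;
    // static_cast<Derived*>(ro);   // error: casts away const
}
```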

reinterpret_cast

The reinterpret_cast holds the “unsafe” part of the C-style cast. With it, you can convert a pointer to a class into a pointer to another, unrelated class, or convert to and from a pointer to an incomplete type.

This conversion basically reinterprets the argument it’s given. You can thus also convert between pointer-to-function types and between pointer-to-member types.

This is inherently unsafe and must be performed with great caution. Wherever you write or see a reinterpret_cast, you know you must be extra careful. Using reinterpret_cast is almost as unsafe as C-style casts.

reinterpret_cast can easily lead to undefined behavior if not used following a specific set of rules (which you can find on its documentation page: reinterpret_cast conversion – cppreference.com).

For instance: if you use reinterpret_cast to go from one pointer type to another and then dereference that pointer to access its content, that’s likely undefined behavior.
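Here is a sketch of two uses of reinterpret_cast that stay within the rules, next to one that does not (left commented out). Packet is a hypothetical type, and std::uintptr_t is assumed to be available, which is the case on mainstream platforms even though the standard makes it optional.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Packet { int id; };

void demo_reinterpret_cast()
{
    Packet packet{5};

    // Round trip: pointer -> integer -> same pointer type gives back the original pointer.
    std::uintptr_t address = reinterpret_cast<std::uintptr_t>(&packet);
    Packet* same = reinterpret_cast<Packet*>(address);
    assert(same == &packet && same->id == 5);

    // Inspecting the object representation through unsigned char* is allowed.
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&packet.id);
    int sum = 0;
    for (std::size_t i = 0; i < sizeof(int); ++i)
        sum += bytes[i];
    assert(sum == 5);   // one byte holds 5, the others 0, whatever the endianness

    // Going to an unrelated type compiles, but reading through the result is undefined behavior:
    // float* wrong = reinterpret_cast<float*>(&packet.id);
    // float value = *wrong;   // violates the aliasing rules
}
```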

const_cast

The goal of this operator is that the const and volatile qualifiers are never silently cast away.

To perform this operation, the source and destination types must be the same, except for the const and volatile qualifiers, which can differ.

This is a very dangerous operation and must be used with great caution. Always remember that modifying an object originally defined as const, after casting away its constness, is undefined behavior.
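A small sketch of where const_cast is defensible and where it is not. The function legacy_length is a hypothetical stand-in for an old API whose signature forgot the const qualifier.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical legacy function: it only reads its argument,
// but its parameter is not declared const.
std::size_t legacy_length(char* text)
{
    return std::strlen(text);
}

std::size_t length_of(const char* text)
{
    // Acceptable: we know legacy_length never writes through the pointer.
    return legacy_length(const_cast<char*>(text));
}

void demo_const_cast()
{
    assert(length_of("cast") == 4);

    int counter = 0;               // the object itself is NOT const
    const int& view = counter;
    const_cast<int&>(view) = 3;    // well-defined: the underlying object is mutable
    assert(counter == 3);

    // const int fixed = 0;
    // const_cast<int&>(fixed) = 1;   // undefined behavior: `fixed` was defined const
}
```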

bit_cast

Not really historical (it was introduced in C++20) but std::bit_cast was basically made to replace the std::memcpy() manual conversion.

The behavior of bit_cast can be undefined if there is no value of the destination type corresponding to the value representation produced (just like with memcpy).

Unlike with reinterpret_cast, where going from one pointer type to another and then dereferencing the pointer is likely undefined behavior, reinterpreting the bits of an object with bit_cast is not undefined behavior if you know for sure that those bits are a valid representation of the target type. The difference is subtle, but it allows the compiler to make lots of cases work efficiently and to do the right thing in more complex cases without invoking undefined behavior. A typical use case is serialization.
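As a sketch of the memcpy replacement, the hypothetical helper bits_of below uses std::bit_cast when the standard library provides it (C++20) and falls back to the older memcpy idiom otherwise, so the two can be compared side by side. It assumes a 32-bit float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

#if defined(__has_include)
#  if __has_include(<bit>)
#    include <bit>   // declares std::bit_cast in C++20
#  endif
#endif

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Returns the bit pattern of a float as an unsigned integer.
std::uint32_t bits_of(float value)
{
#if defined(__cpp_lib_bit_cast)
    return std::bit_cast<std::uint32_t>(value);   // C++20: one expression
#else
    std::uint32_t bits;                            // before C++20: manual memcpy
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
#endif
}

void demo_bit_cast()
{
    // The expected patterns hold when float is an IEEE 754 binary32.
    if (std::numeric_limits<float>::is_iec559)
    {
        assert(bits_of(0.0f) == 0u);
        assert(bits_of(1.0f) == 0x3F800000u);
    }
}
```

Both branches copy the object representation into a fresh object of the target type, which is why neither runs into the aliasing problem of a reinterpret_cast followed by a dereference.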

Wrapping up

Historically, the way the C-style cast was split into four C++ operators follows three simple rules:

  • If you need to check the types dynamically, then use dynamic_cast.
  • If you can check the types statically, then use static_cast.
  • In any other case, it is reinterpret_cast or const_cast that you need, but this is very dangerous.

I’ll add that, in any situation, you should not perform a reinterpret_cast or a const_cast unless you know what you are doing. You should never ever perform these casts only because the other ones did not work.

RTTI as a whole is a useful (but totally optional) feature. But it is not simple to master.

In modern C++, we want to perform checks as much as possible at compile time (for security and performance), so when we are able, we want to use static features instead of dynamic ones.

Of course, you should not force static code where dynamic code would be better, but you should always think of a static solution before a dynamic one.

Addenda

Notes

1. As much as I consider myself an expert in the C++ language, my knowledge of the C language is much more limited. There may be errors in this subsection. If so, please tell me in comments so I can edit the article.

2. You can also cast away the const qualifier through the pointer cast (https://onlinegdb.com/8HIJIeonA) but I don’t think it’s a whole different way to cast.

3. For instance, if px is a pointer to an object of type X (implemented as B) and B is a derived class of X that has a method g, you need to write ((B*)px)->g() to call g from px. A simpler syntax could have been px->B::g().
