Dealing with integer overflows

Author: Chloé Lourseyre

Integer overflows are a pain. There is no consistent and easy way to detect them, let alone to detect them statically, and they will make your program behave inconsistently.

This article covers both signed and unsigned integer overflow, without distinction. Even though signed overflow is undefined behavior while unsigned overflow is not, 99.9% of the time you want neither of them.

How to detect integer overflows?

While they are few, there are ways to detect integer overflows. They are just either inconsistent or not easy to use.

I will present a few ways to detect overflows in this section. If you happen to know other ways to do so, feel free to share them in the comments, so that everyone can benefit.

UBSan

Undefined Behavior Sanitizer, UBSan for short, is a runtime undefined behavior checker.

It can detect integer overflows when you enable the corresponding compilation options (though it is supposed to check for UB, it also does us the favor of checking unsigned overflows):

clang++ -fsanitize=signed-integer-overflow -fsanitize=unsigned-integer-overflow

UBSan is implemented in both clang and GCC, although the unsigned-integer-overflow check is specific to clang.

I may be wrong, but I did not see any integration in other compilers.

The main downside of a dynamic checker like this one is that you have to do a full compilation and an exhaustive test run of your program to detect overflows. If your test run is not exhaustive enough, you run the risk of letting an overflow slip through.
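As a minimal sketch of what this looks like in practice, consider the hypothetical function below (add_bonus and the file name example.cpp are made up for illustration). Compiled with something like clang++ -fsanitize=signed-integer-overflow -g example.cpp, the sanitizer should report a runtime error when the addition overflows, but only if a test actually calls the function with values that are big enough.

```cpp
#include <cstdint>

// Hypothetical function, used only to illustrate what UBSan instruments.
std::int32_t add_bonus(std::int32_t score, std::int32_t bonus)
{
    // When built with -fsanitize=signed-integer-overflow, this addition is
    // checked at runtime. Nothing is reported for inputs that do not overflow,
    // which is why the test run has to exercise large values.
    return score + bonus;
}
```

A call such as add_bonus(1, 2) runs silently under the sanitizer. Only a call whose sum does not fit in 32 bits triggers a report.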

Write adequate unit tests

If your project implements unit tests (I could argue that every project should, but it’s not always up to the developers), then you have a pretty direct way to check for overflows.

For the data and functions that can accept very high numbers, just throw big ints at them. You can then check whether each feature performs a correct evaluation, throws an exception, returns an error code, or does whatever it is supposed to do in these cases.

If you don’t know what result to expect when you put a big int into them, because the result would be too big and there is no error handling, then your feature is unsafe. You must put some error handling in it to ensure no overflow will happen.

Sometimes, detecting a potential overflow this way will require heavy refactoring. Don’t be afraid to do it: better safe than sorry.
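Here is a rough sketch of what such error handling and its test could look like. The names add_quantities and throws_on_overflow are hypothetical, and throwing std::overflow_error is just one possible policy (an error code would work as well).

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical feature under test: adds two quantities and refuses to overflow.
std::int32_t add_quantities(std::int32_t a, std::int32_t b)
{
    constexpr std::int32_t kMax = std::numeric_limits<std::int32_t>::max();
    constexpr std::int32_t kMin = std::numeric_limits<std::int32_t>::min();

    // Check the operands before adding, so the overflow itself never happens.
    if ((b > 0 && a > kMax - b) || (b < 0 && a < kMin - b))
    {
        throw std::overflow_error("add_quantities: result out of range");
    }
    return a + b;
}

// Hypothetical test helper: returns true if the call reports the overflow.
bool throws_on_overflow(std::int32_t a, std::int32_t b)
{
    try
    {
        add_quantities(a, b);
    }
    catch (const std::overflow_error&)
    {
        return true;
    }
    return false;
}
```

A unit test can then throw the biggest ints at the function and expect throws_on_overflow to return true, while values that still fit must produce the exact sum.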

Don’t put your code in a situation prone to overflows

The best way to be sure there is no overflow is to prevent overflows. If you ensure that no overflow can arise from the code you write, you won’t have to detect them.

The next section will give a few practices to help you in that regard.

How to prevent integer overflows?

Use 64-bit integers

One very good way to prevent integer overflows is to use int64_t to implement integers. In most cases, 64-bit ints will not overflow, unlike their 32-bit counterparts.

There are actually very few downsides to using int64_t instead of int32_t. Most of the time, you won’t care about the performance gap or the size gap between them. Only if you work on embedded software or on processing algorithms may you care about performance or data size constraints (by “processing algorithms” I mean algorithms supposed to have top-notch performance, like in signal processing).

Just note that longer integers do not always mean slower calculations, and with the full set of pointers / references / forwarding, you don’t copy whole data structures very often.

And in any case, even if you are performance and/or size sensitive, always remember:

First, be safe. Then, be fast.
(quote by me)

Over-optimizing at the risk of an integer overflow is always worse than writing safe code and then using tools to pinpoint where the program must be optimized.

So I recommend that, unless your integer has a hard limit and is very small, you use an int64_t.
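To give an idea of how quickly 32 bits run out, here is a small sketch built around a hypothetical milliseconds_in_days helper. A day contains 86 400 000 milliseconds, so 25 days (2 160 000 000 ms) already exceed the maximum of an int32_t, while an int64_t holds centuries without trouble.

```cpp
#include <cstdint>

// Hypothetical helper: number of milliseconds in a given number of days.
// Every operand is 64 bits wide, so the product is computed in 64 bits.
std::int64_t milliseconds_in_days(std::int64_t days)
{
    const std::int64_t ms_per_day = 24LL * 60 * 60 * 1000; // 86 400 000
    return days * ms_per_day;
}
```

With int32_t in place of int64_t, the same function would overflow for any input of 25 days or more.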

About int64_t performance

I ran a few basic operations (addition, multiplication and division) on both int32_t and int64_t to see if there were any significant differences.

Using clang, with two different optimization levels, here are the results:

This may not surprise you, but in every case division is the slowest operation. If you don’t use divisions, you will notice no performance gap between 32 bits and 64 bits. However, if you do use divisions, 32-bit integers are more suitable (but if you’re using divisions, you won’t have top-notch performance anyway).
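The benchmark code itself is not shown here. As a rough sketch of how such a measurement could be set up (this is an assumption, not the code behind the results above), a hypothetical time_divisions helper can time a loop of divisions for any integer type:

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical micro-benchmark: times `iterations` integer divisions.
// The accumulated value is written to `sink` to make it harder for the
// compiler to drop the loop. `iterations` must be smaller than the maximum
// of Int, otherwise the loop counter itself would overflow.
template <typename Int>
std::chrono::nanoseconds time_divisions(Int iterations, Int& sink)
{
    const auto start = std::chrono::steady_clock::now();

    Int acc = 0;
    for (Int i = 1; i <= iterations; ++i)
    {
        acc += iterations / i; // the operation being measured
    }
    sink = acc;

    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

Calling time_divisions<std::int32_t> and time_divisions<std::int64_t> with the same iteration count, at several optimization levels, gives a first idea of the gap. A serious measurement would need many repetitions and a proper benchmarking framework.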

Gentle reminder about basic types

If you wonder why I used the types int32_t and int64_t instead of the plain old int and long, it’s because these two types don’t have a pre-defined size.

The only size constraints that the C++ standard imposes on int and long are the following:

int is at least 16 bits.
long is at least 32 bits.
The size of long is greater than or equal to the size of int, bool and wchar_t, and is less than or equal to the size of long long.
The size of int is greater than or equal to the size of short and is less than or equal to the size of long.

Because of that, please refrain from using plain old ints and longs when you want to avoid overflows.
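These guarantees can be spelled out in code. The sketch below uses static_assert to state what holds on every platform, plus a hypothetical bits_of helper to inspect what the current platform actually provides.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: width in bits (value bits plus sign bit) of an
// integer type on the current platform.
template <typename T>
constexpr int bits_of()
{
    return std::numeric_limits<T>::digits +
           (std::numeric_limits<T>::is_signed ? 1 : 0);
}

// Fixed-width types: the width is part of the type's definition.
static_assert(bits_of<std::int32_t>() == 32, "int32_t is exactly 32 bits");
static_assert(bits_of<std::int64_t>() == 64, "int64_t is exactly 64 bits");

// Plain types: only lower bounds and an ordering are guaranteed.
static_assert(bits_of<int>() >= 16, "int is at least 16 bits");
static_assert(bits_of<long>() >= 32, "long is at least 32 bits");
static_assert(sizeof(int) <= sizeof(long), "long is at least as big as int");
```

On a typical 64-bit Linux system bits_of<long>() is 64, while on 64-bit Windows it is 32, which is exactly why long is a poor choice when the range matters.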

Don’t assume that because a value is in range, it’s overflow-safe

Let’s say you have an int32_t my_val that represents a piece of data whose max value is one billion (1 000 000 000). Since the max value of an int32_t is 2³¹-1 (2 147 483 647), you may think it won’t cause an overflow.

But one fateful day, a random dev unknowingly writes this:

#include <cstdint>

const int32_t C_FOO_FACTOR = 3;

int32_t evaluate_foo(int32_t my_val)
{
    // foo is always C_FOO_FACTOR times my_val
    return my_val * C_FOO_FACTOR;
}

You guessed it? Integer overflow. Indeed, any value of my_val above 715 827 882 causes an overflow when multiplied by 3.

So whose fault is it? Should we check for an overflow when we add or multiply? How could we do this?

Well, there is one simple practice that can help you avert most of the cases similar to this example. When you have to store an integer that is relatively big, even if it can’t overflow by itself, just put it in a bigger data type.

For instance, I never put a value that can be bigger than 2¹⁵ in a data type that can only hold values up to 2³¹. This way, even if we multiply the value by itself, it doesn’t overflow (2¹⁵ × 2¹⁵ = 2³⁰).

With this method we can keep smaller data types in smaller structures with no side effect. C_FOO_FACTOR can stay an int32_t: the result will be adequately promoted when it is used in an operation that includes a bigger type.

E.g.:

#include <cstdint>

const int32_t C_FOO_FACTOR = 3;

int64_t evaluate_foo(int64_t my_val)
{
    // foo is always C_FOO_FACTOR times my_val
    return my_val * C_FOO_FACTOR; // The result of the multiplication is an int64_t
}
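When widening the type is not an option, the earlier question of how to check a multiplication still needs an answer. One possible approach, sketched here with a hypothetical checked_multiply helper, is to compute the product in 64 bits (where two int32_t operands cannot overflow) and verify that it fits before converting it back.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: multiplies two int32_t values and reports whether the
// product fits in an int32_t. `result` is only written on success.
bool checked_multiply(std::int32_t lhs, std::int32_t rhs, std::int32_t& result)
{
    // The product of two 32-bit values always fits in 64 bits.
    const std::int64_t wide = static_cast<std::int64_t>(lhs) * rhs;

    if (wide > std::numeric_limits<std::int32_t>::max() ||
        wide < std::numeric_limits<std::int32_t>::min())
    {
        return false; // the 32-bit multiplication would have overflowed
    }

    result = static_cast<std::int32_t>(wide);
    return true;
}
```

With my_val at one billion, checked_multiply(my_val, 3, out) returns false instead of silently overflowing, whereas a factor of 2 succeeds.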

Use auto

Yes, auto can sometimes be a lifesaver when you’re not 100% sure of the types you’re dealing with.

For instance:

#include <cstdint>

int32_t  C_FOO = 42;

int64_t compute_bar();

int main()
{
    // Big risk of overflow here
    int32_t foo_bar = compute_bar() + C_FOO;

    // Probably no overflow here
    auto better_foo_bar = compute_bar() + C_FOO;
}

Here, auto is useful because it prevents the error committed on line 10, where the result of the operation compute_bar() + C_FOO, which is an int64_t, is converted back to an int32_t. On line 13, auto becomes int64_t, so no overflow will occur because of the assignment.

(Note: integer demotion, meaning converting back to a smaller type, is effectively an overflow whenever the value does not fit in the smaller type.)
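If a demotion really cannot be avoided, it can at least be made explicit and checked. The sketch below assumes two hypothetical helpers, fits_in_int32 and narrow_to_int32. Throwing std::range_error is an arbitrary choice of error policy.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: true if the 64-bit value can be stored in an int32_t
// without losing information.
bool fits_in_int32(std::int64_t value)
{
    return value >= std::numeric_limits<std::int32_t>::min() &&
           value <= std::numeric_limits<std::int32_t>::max();
}

// Hypothetical helper: converts to int32_t, or throws if the value would
// be altered by the conversion.
std::int32_t narrow_to_int32(std::int64_t value)
{
    if (!fits_in_int32(value))
    {
        throw std::range_error("narrow_to_int32: value does not fit");
    }
    return static_cast<std::int32_t>(value);
}
```

Writing narrow_to_int32(compute_bar() + C_FOO) instead of relying on the implicit conversion turns a silent truncation into a visible error.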

There is also another specific case, one that doesn’t occur often, where auto can be useful. Consider the following code:

#include <cstdint>

int32_t  C_FOO = 42;

int32_t compute_bar();

int main()
{
    // Big risk of overflow here
    auto foo_bar = compute_bar() + C_FOO;
}

There, the return value of compute_bar() is an int32_t. But later, the author of this function, seeing that the type is too small, changes it to an int64_t, like this:

#include <cstdint>

int32_t  C_FOO = 42;

int64_t compute_bar();

int main()
{
    // No narrowing here: foo_bar is now deduced as int64_t
    auto foo_bar = compute_bar() + C_FOO;
}

Here, auto was automatically “promoted” to int64_t, avoiding an implicit conversion. If the type had been int32_t instead of auto from the beginning, there would have been a risk that the developer who edited the signature of compute_bar() would not correct the type of the variable foo_bar, and no compilation error or warning would have been raised. So using auto in this case made us dodge the bullet.
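If you want the compiler to confirm what auto deduced, a static_assert can pin down the expected width. This is only a sketch: the body of compute_bar() and the wrapper compute_foo_bar() are stand-ins added so that the example is complete.

```cpp
#include <cstdint>
#include <type_traits>

std::int32_t C_FOO = 42;

// Stand-in definition (the examples above only declare this function).
std::int64_t compute_bar()
{
    return 5000000000LL; // deliberately too big for an int32_t
}

// Hypothetical wrapper around the expression discussed above.
std::int64_t compute_foo_bar()
{
    auto foo_bar = compute_bar() + C_FOO;

    // Compilation fails if compute_bar() is ever changed back to a 32-bit
    // return type, because foo_bar would then be deduced as a 32-bit int.
    static_assert(std::is_same<decltype(foo_bar), std::int64_t>::value,
                  "foo_bar is expected to be 64 bits wide");

    return foo_bar;
}
```

The assertion costs nothing at runtime and documents the assumption right next to the variable.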

Wrapping up…

Always take care, when you’re dealing with big integers, to use big data types. Use auto when you’re unsure of what you’re dealing with, and use analyzers if you think the code may hold an overflow. And, as always, write good unit tests.

If you personally know of other ways to detect and/or prevent integer overflows, feel free to share them in the comments.

This is overall the best you can do to avoid and detect integer overflows.

Thanks for reading and see you next week!