Author: Chloé Lourseyre
Recently, Joe Groff @jckarter tweeted a very interesting behavior inherited from C:
Obviously, it’s a joke, but we’re gonna talk more about what’s happening in the code itself.
So, what’s happening?
Just to be 100% clear, double(2101253) does not actually double the value of 2101253. It’s a cast from int to double.
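To make that point concrete, here is a minimal sketch showing that the function-style cast keeps the numeric value and only changes the type. The helper name as_double is made up for illustration; it is not part of the original tweet.

```cpp
#include <type_traits>

// Hypothetical helper (illustration only): double(x) is a function-style
// cast, so it returns a double holding the same numeric value as x.
constexpr double as_double(int x) {
    return double(x);
}

// The expression has type double...
static_assert(std::is_same<decltype(double(2101253)), double>::value,
              "double(int) yields a double");
// ...the value is unchanged (nothing gets multiplied by two)...
static_assert(as_double(2101253) == 2101253.0, "value is preserved");
// ...and it matches what static_cast<double> produces.
static_assert(as_double(2101253) == static_cast<double>(2101253),
              "same result as static_cast");
```

If this compiles, all three static_assert checks hold.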
If we write this differently, we can obtain this:
#include <cstdio>

int main() {
    printf("%d\n", 666);
    printf("%d\n", double(42));
}
With the x86_64 gcc 11.2 compiler, the output is as follows:
666
4202506
So we can see that the value 4202506 has nothing to do with either 666 or 42.
In fact, if we run the same code with the x86_64 clang 12.0.1 compiler, things are a little bit different:
666
4202514
You can see the live results here: https://godbolt.org/z/c6Me7a5ee
You may have guessed it already, but this comes from the second printf call, where we print a double as an int. But this is not some kind of conversion error (of course your computer knows how to convert from double to int, and it would do it just fine if that were what was happening); the issue comes from somewhere else.
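As a quick sanity check that the conversion itself is not the culprit, here is a small sketch in which the double is explicitly converted to int before it reaches the %d specifier. The helper format_as_int is hypothetical and only exists for this illustration.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: converts the double to int *explicitly*, so the
// argument type really matches the "%d" conversion specifier.
std::string format_as_int(double value) {
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%d", static_cast<int>(value));
    return buffer;
}
```

With this, format_as_int(double(42)) gives "42", and format_as_int(3.9) gives "3" because the conversion truncates toward zero. The computer converts just fine when it is actually asked to.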
The truth
If we want to understand why it behaves that way, we’ll have to take a look at the assembly code (https://godbolt.org/z/5YKEdj73r):
.LC0:
        .string "%d\n"
main:
        push    rbp
        mov     rbp, rsp
        mov     esi, 666
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     rax, QWORD PTR .LC1[rip]
        movq    xmm0, rax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 1
        call    printf
        mov     eax, 0
        pop     rbp
        ret
.LC1:
        .long   0
        .long   1078263808
(Use this Godbolt link to see more clearly how the C++ code maps to the assembly instructions: https://godbolt.org/z/5YKEdj73r)
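For the curious, the two .long values stored under .LC1 are nothing mysterious: they are the bit pattern of the double 42.0, split into two 32-bit words. Here is a minimal sketch that reproduces them. It assumes an IEEE 754 64-bit double, which is the case on x86_64, and the names Words and split_double are made up for illustration.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: double is a 64-bit IEEE 754 value (true on x86_64).
static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "this sketch assumes a 64-bit IEEE 754 double");

// Hypothetical type holding the two 32-bit halves of a double.
struct Words {
    std::uint32_t low;   // least significant 32 bits
    std::uint32_t high;  // most significant 32 bits
};

// Copies the raw bits of the double into an integer and splits them.
Words split_double(double value) {
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return Words{static_cast<std::uint32_t>(bits & 0xFFFFFFFFu),
                 static_cast<std::uint32_t>(bits >> 32)};
}
```

Calling split_double(42.0) gives low == 0 and high == 1078263808 (0x40450000), which are exactly the two .long values in the listing, low word first.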
In the yellow zone of the assembly code (lines 6 to 9, the equivalent of printf("%d\n", 666);) we can see that everything’s fine: the 666 value is put in the esi register and then the printf function is called. So it’s an educated guess to say that when the printf function reads a %d in the string it is given, it’ll look in the esi register for what to print.
However, in the blue part of the code (lines 10 to 14, the equivalent of printf("%d\n", double(42));) the value is put in another register: xmm0. On x86_64 Linux, the calling convention passes floating-point arguments in the xmm registers and integer arguments in general-purpose registers such as esi. Since printf is given the same string as before, it’s a reasonable guess that it will look into the esi register again and print whatever happens to be in there.
We can prove that statement pretty easily. Take the following code:
#include <cstdio>

int main() {
    printf("%d\n", 666);
    printf("%d %d\n", double(42), 24);
}
It’s the same code, with an additional integer that is printed in the second printf call.
If we look at the assembly (https://godbolt.org/z/jjeca8qd7):
.LC0:
        .string "%d\n"
.LC1:
        .string "%d %d\n"
main:
        push    rbp
        mov     rbp, rsp
        mov     esi, 666
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     rax, QWORD PTR .LC2[rip]
        mov     esi, 24
        movq    xmm0, rax
        mov     edi, OFFSET FLAT:.LC1
        mov     eax, 1
        call    printf
        mov     eax, 0
        pop     rbp
        ret
.LC2:
        .long   0
        .long   1078263808
The double(42) value still goes into the xmm0 register, and the integer 24, logically, ends up in the esi register. Thus, this is the output:
666
24 0
Why? Well, since we asked for two integers, the printf call will look into the first integer register (esi) and print its content (24, as we stated above), then look in the following integer register (edx) and print whatever is in it (incidentally 0).
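To illustrate why printf behaves like this, here is a heavily simplified sketch of a variadic function that understands only %d. It is not how any real printf is implemented; tiny_format is a hypothetical name. The point is that the callee has no way to check what the caller actually passed: it simply trusts the format string and fetches the next int each time.

```cpp
#include <cstdarg>
#include <string>

// Hypothetical, heavily simplified stand-in for printf: it only knows "%d"
// and returns the formatted text instead of printing it.
std::string tiny_format(const char* format, ...) {
    std::va_list args;
    va_start(args, format);
    std::string out;
    for (const char* p = format; *p != '\0'; ++p) {
        if (p[0] == '%' && p[1] == 'd') {
            // Blind trust: the format says "int", so the next int-sized
            // argument is fetched, whatever the caller really passed.
            out += std::to_string(va_arg(args, int));
            ++p;  // skip the 'd'
        } else {
            out += *p;
        }
    }
    va_end(args);
    return out;
}
```

Used correctly, tiny_format("%d %d\n", 24, 0) returns "24 0\n". Passing a double where an int is expected would be undefined behavior here too, for the same reason as with the real printf.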
In the end, the behavior we see is a consequence of how arguments are passed to functions on the x86_64 architecture. If you want to learn more about that, follow these links:
What does the doc say?
The truth is that according to the reference (printf, fprintf, sprintf, snprintf, printf_s, fprintf_s, sprintf_s, snprintf_s – cppreference.com):
If a conversion specification is invalid, the behavior is undefined.
And this same reference is unambiguous about the %d conversion specifier:
converts a signed integer into decimal representation [-]dddd.
Precision specifies the minimum number of digits to appear. The default precision is 1.
If both the converted value and the precision are ​0​ the conversion results in no characters.
So, giving a double to printf where a signed integer is expected is UB. It was our mistake to write this in the first place.
This actually generates a warning with clang. But with gcc, you’ll have to activate -Wall to see any warning about that.
Wrapping up
The C language is a very, very old language. It’s older than C++ (obviously), which is itself very old. As a reminder, the first edition of the K&R was printed in 1978. That was thirteen years before my own birth. And unlike us humans, programming languages don’t age well.
I could have summarized this article with a classic “don’t perform UB”, but I think that would miss the point this time. So I’ll go ahead and say it: don’t use printf at all.
The problem is not with printf itself, it’s with using a feature from another language¹ that was originally published forty-three years ago. In short: don’t write C code.
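As one possible alternative (a sketch, assuming iostreams are acceptable in your codebase), the stream insertion operator picks the formatting from the static type of the argument, so there is no format string that can disagree with it. The helper to_text is hypothetical and only there to make the result easy to check.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats any streamable value through operator<<.
// The overload is chosen from the argument's type, so an int is formatted
// as an int and a double as a double, with no specifier to get wrong.
template <typename T>
std::string to_text(const T& value) {
    std::ostringstream out;
    out << value;
    return out.str();
}
```

Here to_text(666) gives "666" and to_text(double(42)) gives "42". Writing std::cout << 666 << '\n' << double(42) << '\n'; directly works the same way.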
Thanks for reading and see you next week!
1. Yeah, like it or not, C and C++ are different languages. Different purposes, different intentions, different meta. That is exactly why I always decline job offers tagged “C/C++”: they obviously can’t pick a side.
So .LC1 holds some value that is moved to rax, which is then moved to xmm0. And this value in xmm0 is not really being used. But why is xmm0 involved here?
“This actually generates a warning with clang. But with gcc, you’ll have to activate -Wall to see any warning about that.”
gcc 9.3.0 prints a warning by default:
warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘double’ [-Wformat=]
I tried with GCC 9.3 here : https://godbolt.org/z/W4TjbsK8h
The warning doesn’t show. I don’t have an explanation (yet), but maybe it depends on the platform?
It doesn’t give a warning unless you ask for one with “-Wall” or “-Wformat”.
I’m not sure why I thought it would generate one by default. My mistake, apparently.
The example does not compile because it is not valid C.
What example does not compile?
It’s not C, it’s C++.
double(2101253) is not a cast from int to double.
It’s not?
https://floooh.github.io/2018/06/02/one-year-of-c.html
https://floooh.github.io/2019/09/27/modern-c-for-cpp-peeps.html
What’s the point of this article? You’re actually misusing printf from a C++ program, this is not C.
In C, you can’t have #include <cstdio>. So, let’s fix that first and use stdio.h. Then compile it with a C compiler, and what you get is an error (this is not valid C!):
error: expected expression before ‘double’
5 | printf("%d\n", double(42));
I don’t say that this is C code, I say that `printf()` is a C feature, and try to explain why we shouldn’t use a C feature in C++.