Home is abroad. Un Pierre à l'étranger.


Thursday 12 July 2012

My very first compiler bug report

I must say I am surprised, stunned even. Is anyone out there using clang for any C++? For I have found a bug in the compiler. And not a subtle, use-the-very-latest-features kind of bug! No! The bug is triggered by plain old C++, and is probably the result of two different optimizations interacting.

The bug

Here is some sample code that demonstrates it:

// File test_assign_minim.cpp
struct A
{
  A() : a(0), b(true) { }
  long a;
  bool b;
};

struct B : public A
{
  B() : A(), c(0xDEAD) { }
  unsigned short c;
};

void assign(A& dst, const A& src)
{
  dst = src;
}

int main()
{
  A a;
  B b;
  assign(b, a);
  return (b.c != 0xDEAD);
}

If you build this program with a buggy compiler (at the time of writing, any version of clang will do), it will exit with an error code. The simplest way to test:

$ clang++ -o test_assign_minim test_assign_minim.cpp
$ ./test_assign_minim; echo $?

This should print "1" if the error is there, "0" otherwise.


So what happens? First, we will assume a few things (that hold true for Intel platforms and the clang compiler):

  1. An n-byte value must be aligned on an n-byte boundary, up to the word size. In other words, a 2-byte integer needs to be at an even address and a 4-byte integer at an address that is a multiple of 4, but a 32-byte value only needs to be at a multiple of 8 bytes on a 64-bit machine and of 4 bytes on a 32-bit machine.
  2. "long" is the size of a word (i.e. 4 bytes on a 32-bit machine and 8 bytes on a 64-bit machine)
  3. "short" is 16 bits (or 2 bytes)
  4. "bool" is 1 byte long

A first consequence is that the size of a structure must be a multiple of its largest alignment requirement. Indeed, suppose you have a 4-byte integer in your structure. This value always needs to be at an address that is a multiple of 4. Within the structure, the compiler will make sure that the variable starts at a multiple of 4 from the start of the structure. But this means the structure itself must be 4-aligned. Because in arrays the address of the next element is obtained by incrementing the pointer by the size of the structure, the structure's size must be a multiple of the required alignment. So, on a 32-bit machine (I am lazy, but it would be almost the same for a 64-bit one), the structure A looks like this:

struct A:
0 1 2 3  4  5 6 7
| a      | b|//////|

Where "//" is some padding. From there, if you were to write an assignment operator, you might be tempted (and it seems the clang team was) by something like:

A& A::operator=(const A& other)
{
  memcpy(this, &other, sizeof(A));
  return *this;
}

That's really fast, it works whatever the structure is, and you don't care about padding and other nasty features. Now, let's look at the structure B:

struct B:
0 1 2 3  4  5  6 7
| a      | b|//| c  |

To optimize for space, the compiler replaced some of the padding with the extra variable c, which has the nice property that B is no bigger than A (i.e. you got a variable "for free"). Using the above assignment operator, you can see the problem: the garbage at the end of A will be copied where c is in B.


The easy way around this is to avoid ending your data structure with a small object. Just make sure the biggest object is the last one. In the case of A, you would end up with this structure:

struct A1:
0  1 2 3  4 5 6 7
| b|//////| a      |

which has no padding at the end, so the new B would be:

struct B1:
0  1 2 3  4 5 6 7  8 9  10 11
| b|//////| a      | c  |////|

On the downside, the structure is now bigger, but there won't be any problem.


It took me quite a long time to find the bug. In my case, the extra variable was a reference counter, which ended up reset to a random value every time such an assignment happened. Structures were deleted many times, reference counts reached incomprehensible values ... and of course I never suspected a compiler error.

I think this is a very simple error. I am really puzzled to be the first one to find it (or at least to report it). The only reason I can think of is that very few teams are actually using this compiler for C++.

Of course, I have submitted a bug report here: Bug report 13329. Now, let's hope they will correct these small issues, as I would really like to test my code on a different compiler, if only to make sure I am not relying on bugs or extensions of g++.



After I submitted the bug report and sent an email to the developer mailing list, the bug has now been fixed. The response was very quick, and the compiler now generates valid code for my program. The patch still has to be integrated into the trunk, but thanks to the clang team for being so responsive.

Thursday 7 February 2008

On the readability of languages

So for once, a post in English, as this is relevant to both French and English speakers.

For some time now I have been telling people that, even though neither English nor French is a transparent language (1), French spelling gives you more information about how words are pronounced than English spelling does. Most people disagree, always citing the same three French examples of one spelling pronounced two ways.

But now I have found support in "Hugo in 3 months - French", a book for English speakers learning French. Here is an excerpt from the "Pronunciation" chapter of the book:

"Although French spelling may appear complicated, it stills remains a better guide to how words are actually spoken than English spelling is."

Voilà! A small victory! For reference, the book is written by Ronald Overy and Jacqueline Lecanuet.

Tchô !

(1) A language is said to be transparent if there is a unique correspondence between writing and speech. For example, Spanish and Polish are transparent, but German is not.

Tuesday 25 December 2007

Merry Christmas

That's it! The one day of this month of December that (almost) everybody is waiting for is here. Santa Claus spent the whole night flying from house to house delivering presents to all the worthy children of the world. I hope you were worthy enough for him to visit your own house (going through the chimney?). If this is not the case, either you were not waiting for him (in which case he usually doesn't bother to come by ... there are already plenty of people waiting for him anyway) or you were too naughty this year and did not deserve a present! (If the latter applies, try harder next year.)

At least in my house everybody was worthy of Santa's presents! The base of the Christmas tree is hidden behind the pile of presents he left. We'll open them at noon, when my grandma comes to eat with us.

Christmas tree

Finally, some big news (in case you are language-blind): this blog will now contain posts in English! This first post is even bilingual (there is a roughly equivalent post in French), but I really don't know how things will evolve ... time will tell!

I wish you a very merry Christmas! And for those who were not waiting for Santa, happy holidays!