How Big is a Size and Why Should You Care?

While browsing the sources of some programs, I often notice parts that make assumptions that may initially seem safe, but tend to turn around and bite you in the ass in pretty bad ways some time later. One of the most frequently occurring ones is the assumption that an int is big enough to hold the size of a C object, as in snippets like the following:

struct foo {
    /* Some huge structure */
};

/* Then further down... */
int bytes, elements;
struct foo *ptr;

elements = 100000;
bytes = elements * sizeof(struct foo);
ptr = malloc(bytes);

This may look safe if you happen to test it on just one platform, using only one compiler. Then a day comes when you have to port this program to a 64-bit architecture, which happens to have a minor quirk: objects can be huge, far larger than your average 32-bit int can express, if you have the virtual memory to store them.

That’s exactly where the fun begins. Strange crashes and memory corruption start happening. You have to spend hours in a debugger, tracking down where something goes wrong. Then you find out that the value passed to malloc() is not really what you expected it to be. A quick test program is whipped up to find out when and where this happens:

     1  #include <stdio.h>
     2  #include <stdlib.h>
     3  
     4  #define ELEMENTS        100000
     5  #define ELT_SIZE        1000000
     6  
     7  int
     8  main(void)
     9  {
    10          int int_size;
    11          size_t size_t_size;
    12  
    13          printf("sizeof int = %zu bytes\n", sizeof(int));
    14          printf("sizeof size_t = %zu bytes\n", sizeof(size_t));
    15  
    16          int_size = (int)ELEMENTS * ELT_SIZE;
    17          size_t_size = (size_t)ELEMENTS * (size_t)ELT_SIZE;
    18  
    19          printf("int size = %d, size_t size = %zu\n", int_size, size_t_size);
    20          return EXIT_SUCCESS;
    21  }

This is run on a 32-bit system, and you see something that is definitely not equal to a hundred billion bytes. The product 100000 × 1000000 = 100000000000 needs 37 bits, so in 32-bit arithmetic it wraps around modulo 2^32, leaving 100000000000 mod 4294967296 = 1215752192:

sizeof int = 4 bytes
sizeof size_t = 4 bytes
int size = 1215752192, size_t size = 1215752192

Then you test this on an amd64 system, and something different is printed there, depending on whether you build a 32-bit or 64-bit binary!

$ cc -xarch=386 foo.c 
"foo.c", line 16: warning: integer overflow detected: op "*"
$ ./a.out 
sizeof int = 4 bytes
sizeof size_t = 4 bytes
int size = 1215752192, size_t size = 1215752192
$ cc -xarch=amd64 foo.c 
"foo.c", line 16: warning: integer overflow detected: op "*"
$ ./a.out 
sizeof int = 4 bytes
sizeof size_t = 8 bytes
int size = 1215752192, size_t size = 100000000000
$

There are two bugs at work here, two of the most dangerous bugs that you can create when calculating the size of objects in C:

  • The int type is not the right type for storing the size of objects; the size_t type is.
  • Even size_t can overflow, so an explicit check is needed to avoid that.

The first one is easy to fix: never use a plain int to store the size of a C object. This includes the size of structs, the size of unions, the size of any sort of array, the size of data buffers, etc. The reason this is necessary is obvious in the amd64 64-bit case above. A plain int cannot hold the value of the largest size_t available on that system and build type, because an int uses 4 bytes (i.e. 32 bits) to store its value, but a full-blown size_t needs 8 bytes (for its 64 bits of value). On a system like this it may be ok to store int values in size_t objects. The reverse is not always true though :-)
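
Here is a minimal sketch of the opening snippet with this first fix applied; struct foo and the element count are just placeholders carried over from the example above, and the wrapper function name alloc_foos is made up for illustration:

#include <stdlib.h>

struct foo {
        char payload[1000];     /* stand-in for some huge structure */
};

static struct foo *
alloc_foos(void)
{
        size_t bytes, elements;         /* size_t, not int */
        struct foo *ptr;

        elements = 100000;
        bytes = elements * sizeof(struct foo); /* all-size_t arithmetic */
        ptr = malloc(bytes);
        return ptr;                     /* NULL when the allocation fails */
}

Note that this cures only the first bug: with a large enough element count, the size_t multiplication itself can still wrap around.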

Avoiding the overflow that can happen even with size_t is a bit trickier. You have to guard against it and make sure it cannot happen before doing a multiplication. Using the SIZE_MAX constant from the <stdint.h> header, you can add a safety check before multiplying:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define ELEMENTS        100000
#define ELT_SIZE        1000000

int
main(void)
{
        size_t bytes;

        /* Guard against size_t overflow before multiplying. */
        if (SIZE_MAX / ELT_SIZE < ELEMENTS) {
                printf("Overflowed.\n");
                return EXIT_FAILURE;
        }

        bytes = (size_t)ELEMENTS * ELT_SIZE;
        printf("size = %zu bytes\n", bytes);
        return EXIT_SUCCESS;
}

This program works correctly both when built as a 32-bit binary and when built as a 64-bit binary:

$ cc -xarch=386 foo.c
$ ./a.out 
Overflowed.
$ cc -xarch=amd64 foo.c
$ ./a.out 
size = 100000000000 bytes
$

“Correctly” being, in this case, behavior that does not include a crash, a memory buffer overrun, or random heap corruption caused by a miscalculated byte count: all those interesting things we see in many programs written in a slightly careless, insecure manner.
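
Both fixes fit naturally into a single helper. The following is a minimal sketch of such a checked allocator; the name alloc_array is made up for this example, but the guard is the same one used above:

#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate room for nmemb elements of size bytes each, or return NULL
 * when the total byte count would not fit in a size_t.
 */
static void *
alloc_array(size_t nmemb, size_t size)
{
        if (size != 0 && nmemb > SIZE_MAX / size)
                return NULL;            /* nmemb * size would overflow */
        return malloc(nmemb * size);
}

If you would rather not roll your own, calloc(nmemb, size) performs an equivalent overflow check in modern C libraries (and zero-fills the memory), and several BSD-derived systems, as well as recent glibc versions, provide reallocarray() with the same guard.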
