Danger – unsigned types used here!
This time I’m going to talk about using unsigned types in C/C++. Modern programming languages such as C# and Java avoid unsigned types, with good reason. Java has no unsigned integer types apart from its 16-bit char. C# does have unsigned types, but they exist mainly for interfacing to code written in other languages, such as COM objects; they are not CLS-compliant, and the public .NET Framework API largely avoids them.
To illustrate the dangers of using unsigned types, I invite you to consider this example (in which uint16_t and int32_t are typedef’d to be unsigned 16-bit and signed 32-bit types respectively) and decide whether it makes safe use of unsigned values:
```c
void foo(const uint16_t* reading, const size_t size)
{
    size_t i;
    for (i = size - 1; i >= 0; --i) {
        doSomething(reading[i]);
        if (i > 0) {
            int32_t diff = reading[i] - reading[i - 1];
            doSomethingElse(diff);
        }
    }
}
```
Why are unsigned types dangerous in C/C++? Here are some reasons:
1. C/C++ provide implicit type conversions between signed and unsigned values. Unlike Ada, they perform no runtime check to make sure the value is representable in the new type. For example, you can readily “convert” a negative signed value to an unsigned value.
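2. If you mix signed and unsigned operands in an arithmetic operation or comparison, the signed operand is converted to unsigned unless the signed type has a higher rank and can represent every value of the unsigned type (for example, long long combined with unsigned int on typical platforms). In the common case of int mixed with unsigned int, the int is converted to unsigned. This may not be what you wanted.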
3. In arithmetic expressions, operands whose type is smaller than int (such as char, short and unsigned short) are first promoted: to int if int can represent all values of the original type, otherwise to unsigned int. This complicates the situation, especially if the result is implicitly converted again, because the implicit type conversions may not be happening where you think they are.
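4. It is very easy to underflow the minimum value of an unsigned type (zero), because counts and indices often sit at or near zero, and subtracting 1 from an unsigned zero silently wraps around to the type’s maximum value. It is much more difficult to underflow or overflow the minimum or maximum value of a signed int, especially on a 32-bit (or greater) platform, where those limits are more than two billion away from zero.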
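Each of the four points can be seen in a few lines of C. The sketch below is illustrative only: the function names are hypothetical, and promoted_difference assumes a platform where int is wider than short, which holds on typical 32-bit and 64-bit targets.

```c
#include <limits.h>
#include <stdbool.h>

/* Point 1: a negative int converts silently to unsigned int; the result is
   the value modulo UINT_MAX + 1, so -1 becomes UINT_MAX. */
unsigned int negative_to_unsigned(int x)
{
    return x;               /* implicit conversion, no runtime check */
}

/* Point 2: int and unsigned int have the same rank, so a is converted to
   unsigned int before the comparison, and -1 < 1u evaluates to false. */
bool minus_one_less_than_one_u(void)
{
    int a = -1;
    unsigned int b = 1u;
    return a < b;
}

/* Point 3: assuming int is wider than short, both unsigned short operands
   are promoted to int, so this subtraction is signed and can be negative. */
long promoted_difference(unsigned short x, unsigned short y)
{
    return x - y;
}

/* Point 4: decrementing an unsigned zero wraps to the maximum value. */
unsigned int decrement(unsigned int u)
{
    return u - 1u;
}
```

Compiling this with GCC or Clang using -Wall -Wextra typically produces a sign-comparison warning for minus_one_less_than_one_u, which is exactly the kind of diagnostic worth keeping switched on.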
How many problems did you spot in the example? There are at least three.

The first is the use of size - 1 as the initial value of the loop counter i. Since size has type size_t, which is an unsigned type, size - 1 wraps around if size is zero. In that case the loop counts down from the maximum value of a size_t instead of not executing at all.

The second problem is the test i >= 0 in the for-loop header, again with i of type size_t. Because i can never be negative, the condition is always true and the loop never terminates: once i reaches zero, --i wraps around to the maximum value of a size_t, and reading[i] then indexes far outside the array.

The third problem concerns the assignment of reading[i] - reading[i - 1] to diff. Suppose reading[i - 1] is greater than reading[i], for example by one. Will diff end up as -1? Unfortunately, that depends on your compiler and target platform. If int is 32 bits wide, uint16_t maps to unsigned short and int32_t maps to int, then yes: both readings are promoted to int (which can represent every unsigned short value) before the subtraction, so the result is -1 directly. But on a platform with a 16-bit int, where uint16_t maps to unsigned int and int32_t maps to long, we get a different result. No promotion takes place, so the subtraction is done in unsigned int and yields 0xFFFF, which is converted to 0x0000FFFF (65535) for the assignment to diff.
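The first and third problems can be reproduced with an ordinary desktop compiler. In this sketch the function names are hypothetical, and diff_with_16bit_int does not run on a real 16-bit-int target: it assumes that the wrap a 16-bit unsigned int subtraction would produce can be modelled by casting the result to uint16_t. The second problem is not demonstrated, because the loop would never end; GCC’s -Wtype-limits warning (enabled by -Wextra) typically reports the always-true comparison instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Problem 1: when size is 0, size - 1 wraps to SIZE_MAX, so a loop that
   starts there begins at the largest possible index. */
size_t first_index(size_t size)
{
    return size - 1;
}

/* Problem 3 with a 32-bit int: uint16_t operands are promoted to int,
   so the subtraction is signed and a decrease gives a negative result. */
int32_t diff_with_32bit_int(uint16_t current, uint16_t previous)
{
    return current - previous;
}

/* Problem 3 with a 16-bit int (simulated): there the subtraction would be
   done in 16-bit unsigned int and wrap modulo 65536. Casting to uint16_t
   models that wrap on a host whose int is wider. */
int32_t diff_with_16bit_int(uint16_t current, uint16_t previous)
{
    uint16_t wrapped = (uint16_t)(current - previous);
    return wrapped;         /* 0xFFFF is preserved as 65535, not -1 */
}
```

With readings of 101 followed by 100, the first version of the difference gives -1, while the simulated 16-bit version gives 65535.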
If you regularly use static analysis on your C/C++ programs (as I hope you do), you might like to check whether your static analyzer reports all three problems for this example.
What’s the best way to manage the dangers inherent in using unsigned types? One strategy is Just Say No: don’t use unsigned types, except in very special situations. Is this really feasible? On a 32-bit (or greater) platform, I think it is. You’ll need to define a signed size type (i.e. a signed integral type with the same size as size_t) to use as the natural type for array indices, string lengths and sizes. On most platforms, ptrdiff_t is a suitable type to start from. Whenever you call a library function that returns a size_t, or use a sizeof expression, you should immediately cast the result to your signed size type. You may have difficulties if you use an array or string that takes up more than half the address space of the processor, but are you ever going to do that?

The other thing you’ll need to do is convert unsigned data read from the hardware to signed data as soon as it is read in, after any necessary shifting and masking. If you read an unsigned value from a 16-bit A-to-D converter, you’ll need to store each value as a 32-bit signed integer, because values up to 65535 don’t fit in a 16-bit signed type. Alternatively, if you need to store lots of them, you can exempt that data from Just Say No and store it as 16-bit unsigned values, converting each one to a 32-bit signed int whenever you pick it out of the store.
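Here is one way the Just Say No conventions might look, as a sketch only. index_t comes from the article; to_index, adc_sample, stored_sample and last_index are hypothetical helper names, and the cast in to_index assumes no object exceeds half the address space.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Signed size type for indices, lengths and sizes, built on ptrdiff_t. */
typedef ptrdiff_t index_t;

/* Hypothetical helper: convert a size_t (from a library call or sizeof)
   to index_t straight away. Assumes the object occupies less than half
   the address space, so the value fits. */
static inline index_t to_index(size_t n)
{
    return (index_t)n;
}

/* Hypothetical helper: widen a raw 16-bit A-to-D reading to a signed
   32-bit value as soon as it is read (0..65535 does not fit in int16_t). */
static inline int32_t adc_sample(uint16_t raw)
{
    return (int32_t)raw;
}

/* Hypothetical helper for the storage exemption: keep bulk samples as
   uint16_t and convert each one to int32_t when it is picked out. */
static inline int32_t stored_sample(const uint16_t *store, index_t i)
{
    return (int32_t)store[i];
}

/* Hypothetical example: index of the last character of s, or -1 if s is
   empty. With a signed index type, an empty string cannot wrap around. */
static inline index_t last_index(const char *s)
{
    return to_index(strlen(s)) - 1;
}
```

Because last_index returns -1 for an empty string rather than SIZE_MAX, a loop such as for (i = last_index(s); i >= 0; --i) simply doesn’t execute in that case.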
Here’s my original example using Just Say No, assuming that the readings came from a 15-bit (or narrower) A-to-D converter, so that every reading fits in an int16_t, with index_t standing for our signed size type:
```c
void foo(const int16_t* reading, const index_t size)
{
    index_t i;
    for (i = size - 1; i >= 0; --i) {
        doSomething(reading[i]);
        if (i > 0) {
            int32_t diff = reading[i] - reading[i - 1];
            doSomethingElse(diff);
        }
    }
}
```
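To check the Just Say No version’s behaviour, it can be compiled together with stand-ins for the callbacks. In this sketch doSomething and doSomethingElse are hypothetical stubs that merely record what they receive, and index_t is defined from ptrdiff_t as suggested earlier in the article.

```c
#include <stddef.h>
#include <stdint.h>

typedef ptrdiff_t index_t;          /* signed size type */

/* Hypothetical stubs standing in for the article's callbacks: they only
   record how often they were called and the most recent difference. */
static int calls;
static int32_t last_diff;

static void doSomething(int16_t value) { (void)value; ++calls; }
static void doSomethingElse(int32_t diff) { last_diff = diff; }

void foo(const int16_t* reading, const index_t size)
{
    index_t i;
    for (i = size - 1; i >= 0; --i) {   /* size 0 gives i = -1: no iterations */
        doSomething(reading[i]);
        if (i > 0) {
            /* int16_t operands are promoted to int: signed subtraction */
            int32_t diff = reading[i] - reading[i - 1];
            doSomethingElse(diff);
        }
    }
}
```

Calling foo with a size of 0 makes no calls at all, and descending readings produce a negative diff, because every value involved is signed.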
The only change I made is to replace unsigned types with signed types. You’ll still need to use unsigned values where you do bitwise operations, such as shifting and masking, but you probably only need to do these operations on raw incoming data, before storing the result as signed data. Also, if you are using bit fields, any fields that store data of Boolean or enumeration type should be declared unsigned: a 1-bit signed field can hold only 0 and -1, and whether a plain int bit field is signed is implementation-defined.
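A sketch of the two exceptions just mentioned follows. The register layout (a 12-bit sample in bits 4 to 15 of a 32-bit word) and the names struct status, enum mode and extract_sample are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

enum mode { MODE_OFF = 0, MODE_ON = 1, MODE_AUTO = 2 };

/* Boolean and enumeration bit fields declared unsigned. With a 1-bit
   signed field, storing 1 is implementation-defined and typically reads
   back as -1. */
struct status {
    unsigned int enabled : 1;
    unsigned int mode    : 2;       /* holds an enum mode value (0..3) */
};

/* Hypothetical register layout: a 12-bit unsigned sample in bits 4..15 of
   a raw 32-bit word. Shift and mask in unsigned arithmetic, then convert
   the result to signed for storage and further processing. */
static int32_t extract_sample(uint32_t raw)
{
    return (int32_t)((raw >> 4) & 0x0FFFu);
}
```

The unsigned arithmetic is confined to extract_sample; everything downstream of it sees an ordinary signed value.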
If Just Say No is too radical for you, then the alternative is to embrace unsigned values, but be very careful using them. You must use a good static analyzer to detect possible problems. MISRA C 2004 rule 10.1 prohibits implicit signed/unsigned conversions (amongst others), so a static analyzer that enforces MISRA compliance should catch them. Even better, use a formal tool such as eCv. Here’s a safer version of our example that still uses unsigned types:
```c
void foo(const uint16_t* reading, const size_t size)
{
    size_t i;
    for (i = size; i != 0; ) {
        --i;
        doSomething(reading[i]);
        if (i > 0) {
            int32_t diff = (int32_t)(reading[i]) - (int32_t)(reading[i - 1]);
            doSomethingElse(diff);
        }
    }
}
```
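I’ve changed the loop to count down from size instead of size - 1, changed the termination condition to i != 0, and moved the decrement of i to the start of the loop body, so that i is only decremented after the test has confirmed it is nonzero. I’ve also cast the unsigned readings to signed before subtracting them, so the subtraction is done in signed arithmetic whatever the width of int on the target platform.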
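The revised unsigned version can be exercised with a small harness. In this sketch doSomething and doSomethingElse are hypothetical stubs that record what they receive; only foo itself comes from the article.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stubs standing in for the article's callbacks. */
static size_t visited;
static int32_t last_diff;

static void doSomething(uint16_t value) { (void)value; ++visited; }
static void doSomethingElse(int32_t diff) { last_diff = diff; }

void foo(const uint16_t* reading, const size_t size)
{
    size_t i;
    for (i = size; i != 0; ) {
        --i;                        /* decrement only after i != 0 was checked */
        doSomething(reading[i]);
        if (i > 0) {
            /* both readings cast to int32_t: the subtraction is signed */
            int32_t diff = (int32_t)(reading[i]) - (int32_t)(reading[i - 1]);
            doSomethingElse(diff);
        }
    }
}
```

With a size of 0 the loop body never runs, and for readings {7, 5, 6} every element is visited once, with the final difference (5 - 7) coming out as -2 rather than a large unsigned value.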