Verifying programs that use ‘sizeof’
Consider the following code snippet (based on a real example of critical embedded software), whose purpose is to serialize some data and send it to another piece of hardware:
enum errorCode sendData(uint32_t address, const void * data, uint8_t size)
pre(size <= data.lim);

enum errorCode receiveData(uint32_t address, void * data, uint8_t size)
pre(size <= data.lim);

struct S {
    int32_t a;
    uint16_t b;
    char c[40];
} myVar;

...
sendData(someAddress, &myVar, sizeof(struct S));
...
In order to verify this code, eCv needs to prove that sizeof(struct S) <= maxof(uint8_t), otherwise the size argument in the call to sendData could overflow. [Recall that eCv provides the maxof() type operator, which yields the maximum value that can be stored in that type.] The problem is that the size of a struct depends not only on the sizes of its fields, but also on any padding inserted by the compiler. eCv does know the sizes of primitive types (these are part of the compiler/platform configuration that the user is required to provide), but it does not know how much padding the compiler will insert. The C standard places no upper limit on the amount of padding.
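To see concretely why this matters, here is a small illustrative sketch in plain C (the oversized struct is hypothetical and has nothing to do with the example above): if a struct were larger than maxof(uint8_t), i.e. 255, the conversion to uint8_t would silently wrap and the wrong size would be sent.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical struct larger than 255 bytes, purely for illustration. */
struct Big { char payload[300]; };

int main(void)
{
    uint8_t size = (uint8_t)sizeof(struct Big);   /* 300 wraps modulo 256 to 44 */
    printf("sizeof = %zu, truncated size = %u\n", sizeof(struct Big), (unsigned)size);
    return 0;
}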
What can be done about this? First, we should note that the code is non-portable. If the receiving hardware uses a different processor than the sending hardware, or its software was compiled using a different C compiler, then the receiving software may represent struct S using different padding rules from the sending software, and data from the sender will not be faithfully reproduced in the receiving software. In a non-embedded system, we’d probably transmit the data using XML or some other hardware-independent encoding. However, this adds considerable complexity, and is likely to be overkill for embedded software.
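For completeness, here is a hedged sketch (not part of the original design) of what a hardware-independent encoding might look like in plain C: each field is written into a byte buffer with an explicit byte order, so the wire format no longer depends on the compiler’s padding or endianness. The packS name and the 46-byte format are purely illustrative.

#include <stdint.h>
#include <string.h>

struct S { int32_t a; uint16_t b; char c[40]; };   /* repeated here so the sketch is self-contained */

/* Illustrative only: encode struct S into a fixed 46-byte, little-endian wire format,
   independent of any padding the compiler may insert. */
static void packS(const struct S *s, uint8_t buf[46])
{
    uint32_t a = (uint32_t)s->a;
    buf[0] = (uint8_t)(a & 0xFF);
    buf[1] = (uint8_t)((a >> 8) & 0xFF);
    buf[2] = (uint8_t)((a >> 16) & 0xFF);
    buf[3] = (uint8_t)((a >> 24) & 0xFF);
    buf[4] = (uint8_t)(s->b & 0xFF);
    buf[5] = (uint8_t)((s->b >> 8) & 0xFF);
    memcpy(&buf[6], s->c, sizeof s->c);
}

As noted, this kind of explicit marshalling adds code and complexity, which is why sending the raw struct is often preferred in embedded work.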
How about simply changing the type of the size parameter to some larger type? It may appear that size_t is the natural type to use; however, the size of a size_t is itself platform-dependent, so it’s safer to use a fixed-width type such as uint32_t instead. But what if we want to continue using uint8_t? In the original software on which this example is based, the message protocol encoded the size in a single byte, and if we wish to maintain compatibility with existing hardware, we can’t change that.
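For reference, the widened declaration might look like this (a sketch only, keeping the same eCv precondition as before):

enum errorCode sendData(uint32_t address, const void * data, uint32_t size)
pre(size <= data.lim);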
In this particular case, the software designer has carefully declared the fields of struct S in decreasing order of size. So it is highly likely that the compiler will not insert any padding between fields (note that this is not necessarily the case if the struct has any field of a primitive type whose size is not a power of 2, such as the 80-bit long double type on x86 processors). The compiler may still add padding at the end. Here, the total size of the fields of struct S is 46 bytes. On most 32-bit architectures it is likely to be padded out to 48 bytes, so that if we declare an array of these structs, the int32_t field is always aligned on a 4-byte boundary.
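Independently of eCv, if the toolchain supports C11 then this expectation can also be pinned down on the actual build target with a compile-time check (a sketch, assuming struct S is already in scope), so the build fails if the compiler ever lays the struct out differently:

#include <assert.h>   /* provides static_assert in C11 */

/* Sketch only: fail the build if the target compiler does not give struct S
   the 48-byte layout we are expecting. */
static_assert(sizeof(struct S) == 48, "unexpected padding in struct S");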
We can force eCv to assume a particular value for sizeof(struct S) by using an assume directive:
struct S {
    int32_t a;
    uint16_t b;
    char c[40];
} myVar;
assume(sizeof(struct S) == 48)
However, that’s not very robust. An alternative is to use the min_sizeof() type operator that eCv provides. This returns the minimum possible size of its type argument, based on the primitive type sizes that the user specified when running eCv, assuming the following (a worked example follows the list):
- no padding in structs (either between fields or at the end);
- bit fields may straddle word boundaries (i.e. no padding between bit fields);
- the bit field designator ": 0" only skips the remainder of the current char.
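As a quick worked check under these rules, using the field sizes assumed earlier (4-byte int32_t, 2-byte uint16_t):

/* min_sizeof(struct S) under the rules above:
       int32_t a    ->  4 bytes
       uint16_t b   ->  2 bytes
       char c[40]   -> 40 bytes
       no padding   ->  0 bytes
   giving min_sizeof(struct S) == 46                        */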
These rules should ensure that sizeof(T) >= min_sizeof(T) for any conforming C implementation and any type T. So we can provide a more robust assumption, such as:
assume(sizeof(struct S) <= min_sizeof(struct S) + 4)
or even:
assume(sizeof(struct S) <= 2 * min_sizeof(struct S))
This is exactly what we added to the program on which this example is based. It seems extremely unlikely that the size of a struct would be more than double its minimum size, unless it consists mostly of short bit fields that each fit in a single char and the storage unit for bit fields is bigger than 2 chars.
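For the example at hand, both forms leave plenty of headroom: with min_sizeof(struct S) == 46, the assumptions bound sizeof(struct S) as follows (sketched arithmetic only):

/* assume(sizeof(struct S) <= min_sizeof(struct S) + 4)   allows up to 46 + 4 == 50 bytes
   assume(sizeof(struct S) <= 2 * min_sizeof(struct S))   allows up to 2 * 46 == 92 bytes
   Either comfortably covers the 48 bytes expected on a typical 32-bit target. */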