The Taming of the Pointer – Part 2
In last Wednesday’s post I mentioned three ways in which pointers are troublesome in C/C++, and I introduced the null eCv keyword to mitigate one of them. Now I’ll turn to the second issue: the fact that given (say) a variable or parameter of type int*, the type does not allow us to determine whether it refers to a single int, or to an array of ints – nor, if it refers to an array, can we find how many elements the array contains.
There are two solutions to this. The best one is not to use arrays at all, except for string literals and static tables of constant data. If we are writing in C++, then we can use generic collection classes instead of arrays. I find the most useful classes are an Array class that mimics an array containing N elements of type T, and a Vector class that contains a variable number of up to N elements. [The Standard Template Library collection classes are more flexible, but rely on dynamic memory allocation, so are usually not appropriate in critical embedded systems.] If you’re new to C++, you might be interested to learn that arrays are rarely used by good C++ programmers (look up arrays are evil in a search engine if you want to know more).
I’ll say more about the Array and Vector classes in another post. If you’re stuck with C or with an “embedded C++” compiler that doesn’t support generics, or if you’re trying to implement the Array or Vector classes themselves, then you need a different solution.
eCv provides some features to tame array pointers. First, it requires them to be qualified with the keyword array. Here’s an example :
void copyError(const char * array msg, char * array dst, int dstSize) { ... }
The presence of the array keyword tells eCv that the msg and dst parameters point to an arrays rather than single values. If you leave it out, then eCv will not allow you to perform indexing or any other sort of pointer arithmetic on those parameters. When you compile the code, array becomes another macro with an empty expansion, so the compiler doesn’t notice it.
In this example, we’ve passed the size of the destination buffer in a separate int parameter, so that the code can limit how many characters it writes. However, in writing specifications, we often need to talk about the size of the array that a pointer points to even when we don’t have it available in a separate parameter. How do we do this? Well, eCv considers an array pointer to be a struct comprising three values: the pointer itself, the lower bound (i.e. index of the first element), and the limit (i.e. one plus the index of the last element). To refer to the lower bound or limit of dst we use the syntax dst.lwb or dst.lim respectively. We also allow dst.upb (for upper bound), which is defined as (dst.lim – 1). Of course, you can’t refer to these fields in code, but you can use them in specification constructs (e.g. preconditions, invariants, assertions) as much as you like. We call them ghost fields because they aren’t really there.
For example, let’s specify that when the copyError function is called, it assumes that dst points to an array with at least dstsize elements available. Here’s how we can specify that:
void copyError(const char * array msg, char * array dst, int dstSize) pre(dst.lim >= dstSize) { ... }
An array pointer in C/C++ may only address the first element of an array, or one-past-the-last element, or any element in between. If p addresses the first element of an array of N elements, then p.lwb == 0 and p.lim == N. If it addresses one-past-the-last element, then p.lwb = -N and p.lim = 0. So p has implicit invariant p.lwb <= 0 && p.lim >= 0.
Where does all of this get us? Well, within the body of copyError, eCv will attempt to prove that all accesses to msg and dst are in bounds. For example, the expression dst[i] has precondition dst.lwb <= i && i < dst.lim. Also, wherever copyError is called, eCv will attempt to prove that the precondition holds. So anywhere that buffer overflow is possible, there will be a corresponding a failed proof. If all the proofs succeed, and provided that no function with a precondition is ever called by external unproven code, we know that buffer overflow will not occur.