Taming Pointers in C/C++ - Part 1
When doing verification or deep static analysis of C and C++, pointers are troublesome in several ways:
- Zero (i.e. NULL) is an allowed value of every pointer type in the language. Occasionally we want to allow null pointers, for example in the link field of the last element of a linked list. More usually, we don’t want to allow null. Verification requires that anywhere we use the * or [ ] operator on a pointer, we can be sure that it is not null.
- C and C++ do not distinguish between pointers to single variables and pointers to arrays. So, where we have a parameter or variable of type T*, we can’t tell whether it is supposed to point to a variable or an array. If it points to a single variable, then we mustn’t do pointer arithmetic or indexing on it. The verifier must be able to check this.
- Array parameters in C/C++ are passed as pointers. Aside from the problem that we can’t distinguish array pointers from pointers to single variables, we also have the problem that there is no size information contained in an array pointer.
- Anywhere we use pointers to mutable data, there is the possibility of aliasing. In other words, there may be more than one pointer to the same data. The verifier needs to take account of the fact that changes to data made through one pointer may affect the values subsequently read through another pointer.
Although pointers are less troublesome in Ada, the aliasing problem still exists. The SPARK verifiable subset of Ada handles this by banning pointers altogether. Unfortunately, this isn’t an option in a C/C++ subset for critical systems, because pointers are the only mechanism for passing parameters by reference.
I’ll deal with the issue of unwanted nullability first. One solution is to add invariants stating that particular variables of pointer type cannot be NULL. Similarly, where a function takes parameters of pointer type, we can write preconditions that these parameters cannot be NULL. Here are a couple of examples:
struct Status1 { const char* message; ... invariant(message != NULL) } void sum(int *arr, int size) pre(arr != NULL) { ... }
The problem with this approach is that you need a lot of these invariants and preconditions because, more often than not, it makes no sense to allow NULL. So in eCv we take the opposite approach. We assume that pointers are not allowed to be NULL except where you say otherwise. In the above examples, you can leave out the precondition and invariant if you don’t want to allow either pointer to be NULL.
To tell eCv that a pointer is allowed to be NULL, you flag it with the null attribute, like this:
struct Status2 { const char* null message; ... } void setMessage(const char * null msg) { ... }
This greatly reduces the amount of annotation needed, because the null annotation is more concise than a precondition or invariant, and it is needed less often. As you might expect, null is another macro defined in ecv.h that expands to emptiness when you compile the code. Syntactically, it behaves like const or volatile.
That’s all for today – I’ll discuss how we handle the other problems with pointers later.