Expressing the Inexpressible
When writing preconditions, postconditions and other specifications for C programs, sometimes we need to write expressions that can’t be expressed in plain C. That’s why formal verification systems based on annotated programming languages almost always augment the expression sublanguage with forall and exists expressions. In previous posts, I’ve introduced eCv’s implementations of these. For example, the following expression yields true if all elements of the array arr are between 0 and 100 inclusive:
forall ind in arr.indices :- arr[ind] >= 0 && arr[ind] <= 100
Here, ind is declared as a bound variable that ranges over the values in the expression that follows the keyword in, which in this case is all the indices into arr.
Similarly, this expression:
exists ind in 0..9 :- arr[ind] >= 0 && arr[ind] <= 100
yields true if at least one of the first ten elements is in the range 0 to 100. I’ve used one of eCv’s special operators here: the range operator “..”, which yields a sequence of values from its first operand up to its second operand. In fact, in the first example, arr.indices is defined as arr.lwb .. arr.upb, so I was implicitly using the range operator there too.
Providing forall and exists to quantify over finite sets and sequences lets us express many sorts of specification, but isn’t always enough. Very occasionally, we need to use forall or exists to quantify over a potentially infinite type, which eCv also allows. For example, the following expression yields true if p(x) is true for all values of x belonging to type T:
forall T x :- p(x)
However, there are many cases in which a specification still cannot be expressed. For example, consider the following function for averaging a set of readings stored in an array:
int16_t average(const int16_t * array readings, size_t numReadings) pre(readings.lwb == 0; readings.lim == numReadings) pre(numReadings != 0) post(result == ? /* sum of elements of readings */ /numReadings ) { int sum = 0; size_t i; for (i = 0; i < numReadings; ++i) keep(i <= numReadings) keep(sum == ? /* sum of first i elements of readings */ ) decrease(numReadings - i) { sum += readings[i]; } return (int16_t)(sum/numReadings); }
I’ve included some eCv annotations (highlighted like this) in this example, to specify that the valid indices of readings are 0..(numReadings – 1), and that numReadings isn’t zero so that the final division operation will be valid. I’ve also provided a loop invariant and loop variant (see my posts on verifying loops if you aren’t familiar with these). However, in the postcondition I’ve put a question mark where I need to express “sum of all elements of readings“, and in the loop invariant I’ve put another question-mark where I want “sum of the first i elements of readings“. How can we express these quantities?
In this case, there is an easy way, and another way that is less easy but more general. Let’s start with the easy way. eCv is derived from Perfect Developer, and when writing eCv specifications you can use most of the library types and expression types provided by PD. In particular, type seq of T from PD is treated as equivalent to T[ ] in C. So we can use Perfect sequence operations on C arrays. A list of member functions of seq of T can be found in the Perfect Developer Library Reference. In PD, most of these functions are available for use anywhere, since code can be generated for them; but when used in eCv, they are of course all “ghost” functions – that is, functions that can be used in specifications only.
The particular Perfect expression type we need here is the over expression, which expresses repeated application of an operator over the elements of a collection. Those with a background in functional programming may recognise it as left-fold. Our postcondition can be written:
post(result == (+ over readings.all)/numReadings)
We’ve had to use readings.all rather than just readings as the operand of +over, because readings is an array pointer, and we need to provide a genuine array to +over. Hence eCv provides the array pointer type with ghost member all. You can think of readings.all as yielding the sequence readings[readings.lwb], readings[readings.lwb + 1] and so on up to readings[readings.upb]. We’d have liked to use *readings to mean the array that readings points to, but of course in C that just yields the value of the first element.
For the loop invariant expression, we can use +over again, but we need to apply it to the first i elements of readings rather than all elements. The seq of T class provides member take for this purpose, allowing us to use:
keep(sum == + over readings.all.take(i))
That’s enough to verify our function, apart from dealing with a potential integer overflow when summing the elements, which I’ll return to in a later post. Next time I’ll demonstrate how we can use a ghost function to define the notion of summation without recourse to over-expressions.