Verifying loops – Part 2
Last time I showed how it was possible to analyse a loop-free and recursion-free program or function to determine its semantics (i.e. its weakest precondition and its postcondition), but that this didn’t work when we have loops. To make it possible to analyze loops thoroughly, eCv (along with other formal systems) requires you to provide loop invariants.
What exactly is a loop invariant? Well, it’s a Boolean expression that depends on all the variables modified by the loop, and is true when the loop is first entered, during each iteration of the loop, and when the loop terminates. Typically, it comprises two parts.
The most important part of the loop invariant is a generalization of the state that the loop is intended to achieve. It needs to be written so that it is easy to establish by suitable initialization of variables, yet it becomes exactly the desired state when the loop terminates. I’ll refer to the desired state when the loop terminates as the loop postcondition.
Let’s take a simple example. Suppose we have an array a of integers, and we want to set every element in that array to k. Here’s a function to do that:
void setArray(int * array a, size_t size, int k) pre(a.lwb == 0) pre(a.lim == size) post(forall j in 0..(size - 1) :- a[j] == k) { size_t i; for (i = 0; i != size; ++i) { a[i] = k; } }
The first precondition says that a is a regular pointer to the start of an array. The second one says that the number of elements is given by size. The postcondition says that when the function returns, for all indices j in the range 0 to (size – 1), a[j] is equal to k.
In this case, the loop comprises the entire body of the function, so the loop postcondition is the same as the function postcondition. In fact, loop postconditions are very frequently forall expressions, especially for loops that iterate over an array.
We need to generalize the forall expression in the postcondition so that it is true at the start of every iteration of the loop. What is the state when we are about to commence the ith iteration? Well, at that point we’ve already set all elements from zero to i-1 inclusive to k, but not the elements from i onwards. We can express this with a slight modification to the forall expression:
forall j in 0..(i - 1) :- a[j] == k)
Do you see what I’ve done? I’ve changed the upper bound of the forall from (size – 1) to (i – 1). This gives us exactly what we need for the loop invariant:
- The loop initialization sets i to zero, so the initial bounds of the forall are 0 .. -1 (that is, from zero up to minus one). This is an empty range (because -1 is less than 0), and a forall over an empty range is true. So the loop invariant is established at the start of the first iteration.
- During each iteration, we set the ith element to k and we increment i. So the invariant is preserved. For example, after the first iteration, the forall has range 0..0 so it just tells us that a[0] == k, which is exactly right. After the second iteration, the range is 0..1 so it says that a[0] and a[1] have value k, which is again exactly right.
- The loop terminates when i == size, because I wrote the while-condition as i != size. If we replace i by size in the invariant, we get exactly the desired postcondition.
eCv expects the loop invariant to be written between the loop header and the body, like this:
void setArray(int * array a, size_t size, int k)
pre(a.lwb == 0)
pre(a.lim == size)
post(forall j in 0..(i - 1) :- a[j] == k)
{ size_t i;
for (i = 0; i != size; ++i)
keep(forall j in 0..(i - 1) :- a[j] == k)
{ a[i] = k;
}
}
eCv uses the keyword keep to introduce the invariant, because we’re keeping the invariant true.
Is this enough to allow eCv to verify the loop? Unfortunately not. Using the above, eCv reports that it is unable to prove that a[j] is in-bounds in the loop invariant, or that a[i] is in-bounds in the loop body. So we need another component of the loop invariant, to ensure that these accesses are always in bounds. To make sure that a[i] is in bounds, we need to constrain i to be in the range 0..(size - 1) at that point. But we can’t constrain i to this range in the invariant, because when the loop terminates, i ends up with the value size, which is just outside this range (remember, the invariant has to be true not only at the start of every iteration, but also when the loop terminates). So instead we constrain i to the range 0..size. The body isn’t executed when i == size, so that constraint is sufficient to guarantee that a[i] is in bounds in the body. The program then looks like this:
void setArray(int * array a, size_t size, int k) pre(a.lwb == 0) pre(a.lim == size) post(forall j in 0..(i - 1) :- a[j] == k) { size_t i; for (i = 0; i != size; ++i) keep(i in 0..size) keep(forall j in 0..(i - 1) :- a[j] == k) { a[i] = k; } }
I’ve put this new keep clause before the original one. The reason is that eCv reported that it couldn’t prove that a[j] was in bounds in the original keep clause. Putting the new keep clause first allows eCv to assume that i is in 0..size in the second keep clause. That is enough for it to prove that a[j] is in bounds, because j takes values from 0 to i – 1.
Have we finished? Well, eCv is able to prove that the above function meets its specification, if the loop terminates. To prove the loop terminates, we’ll need to provide a loop variant too. I’ll introduce loop variants next time.