Verifying a binary search - Part 1
In the last post, I covered some different levels of formal verification that you may be interested in, and showed how to add minimum annotation to the linearization example to allow eCv to prove predictable execution. The example provided a prototype for the binary search function it called, to which we added a minimal postcondition, so that it looked like this:
size_t bSearch(const LinEntry* array table, size_t nElems, uint16_t key) post(result <= nElems);
Let’s develop a verified implementation of this binary search function. We could write the code and then try to get eCv to verify it – either for predictable execution only, or for full functional correctness. However, it’s quite hard to get a binary search function right. So for this example, it’s more productive to use formal specification to develop an implementation that is correct by construction.
First, we need to specify exactly what bSearch is supposed to return, and what its preconditions are. Let’s start with the preconditions. We’re passing the number of elements of table in nElems, so we need to specify this:
size_t bSearch(const LinEntry* array table, size_t nElems, uint16_t key)
pre(table.lwb == 0; table.lim == nElems)
post(result <= nElems);
The precondition says that the lower bound (the lowest legal index) into table is zero – in other words, table is a pointer to the first element of an array – and that the limit of table (one plus the highest legal index) is nElems. However, this isn’t enough: the raw values need to be in increasing order too. A standard way of specifying “in increasing order” is to say that for any two indices a and b, if b > a then the element at b is greater than the element at a. Here’s that expressed as a precondition:
size_t bSearch(const LinEntry* array table, size_t nElems, uint16_t key)
pre(table.lwb == 0; table.lim == nElems)
pre(forall a in table.indices; b in table.indices
:- b > a => table[b].raw > table[a].raw)
post(result <= nElems);
The ghost member indices of an array pointer is defined by eCv as the sequence of all valid indices into the array. So table.indices means the same as table.lwb .. table.upb. The => symbol means “implies”, so b > a => table[b].raw > table[a].raw means the same as !(b > a) || table[b].raw > table[a].raw but is clearer.
Now we need to specify what the function returns. In developing linearize in the previous post, I said I was assuming a return value of 0 means the raw value is off the bottom of the table, nElems means it is off the top, otherwise the table entry indexed by the return value and the previous table entry bracket the raw value. Actually, we can simplify this. If we say that any elements with indices below the returned value have raw values less than or equal to the key, and any elements with indices at or above the returned value have raw values greater than or equal to the key, then this covers the cases of returning 0 or nElems as well. So let’s express this single constraint in a postcondition:
size_t bSearch(const LinEntry* array table, size_t nElems, uint16_t key)
pre(table.lwb == 0; table.lim == nElems)
pre(forall a in table.indices; b in table.indices
:- b > a => table[b].raw > table[a].raw)
post(result <= nElems)
post(forall i in 0..(result - 1) :- key >= table[i])
post(forall i in result..(nElems - 1) :- key <= table[i]);
This specification isn’t precise about the value we return when there is a table entry whose raw value exactly matches the key – it allows us to return the index of either that entry or the next entry. My original informal specification wasn’t precise either – it said that the two entries “bracket” the key. Given that our precondition forbids duplicate entries, we could be more precise if we want, e.g. by changing <= in the final postcondition to <.
Now we can work on the implementation. Note that when you have a function prototype declaration separate from the implementation (as here), eCv expects you to put the function specification in the prototype only, so we don’t need to repeat the preconditions and postconditions in the implementation. Here’s a rough sketch of what we want to do:
size_t bSearch(const LinEntry* array table, size_t nElems, uint16_t key) {
size_t low = 0, high = nElems;
while (high != low) {
/* increase low or decrease high, such that high remains >= low,
all elements below low have raw values <= the key,
and all elements at or above high have raw values >= the key */
}
return low;
}
Let’s express the text in the comment as a loop variant and a loop invariant (see my post on verifying loops, if you’re not familiar with these):
size_t bSearch(const LinEntry* array table, size_t nElems, uint16_t key) { size_t low = 0, high = nElems; while (high != low) keep(high >= low) keep(forall i in 0..(low - 1) :- table[i].rawValue <= key) keep(forall i in high..(nElems - 1) :- table[i].rawValue >= key) decrease(high - low) { /* do something */ } return low; }
Before we go any further, let’s check that this makes sense. eCv will need to prove that the loop invariants (the keep clauses) are true at the start of the loop. That’s easy – the initialization of low and high ensures that high >= low (which satisfies the first invariant), and that there are no table indices below low and no table indices at or above high (which satisfies the second and third invariants, because forall over an empty range is always true). Also, because we are returning low when the loop terminates, eCv will need to prove that the value of low at the end of the loop meets the postcondition on the return value. We know that when the loop terminates, the loop invariants will be true and the while-clause will be false. Therefore, we can substitute result for low (because we are returning low) and result for high (because high == low is the inverse of the while-clause) in the keep-clauses, to find out what is known about result. When we do this, the last two keep clauses magically turn into the last two postconditions – which is exactly what we want. However, this doesn’t help with the first postcondition, which requires result <= nElems. Adding an extra loop invariant low <= nElems or high <= nElems will do the trick, since substituting result for both low and high will then give us the required term. I’ll choose high <= nElems, because it is stronger and I don’t intend that high should ever exceed nElems.
We can use eCv to check that the design is OK so far, even before we code the loop body. As well as adding the new loop invariant, we’ll need to tell eCv that we intend to assign to low and high low within the loop body, which we can do like this:
size_t bSearch(const LinEntry* array table, size_t nElems, uint16_t key) { size_t low = 0, high = nElems; while (high != low) keep(high >= low) keep(high <= nElems) keep(forall i in 0..(low - 1) :- table[i].rawValue <= key) keep(forall i in high..(nElems - 1) :- table[i].rawValue >= key) decrease(high - low) { low = ?; high = ?; } return low; }
eCv allows you to use ? in place of an expression to mean you haven’t decided what goes there yet. Naturally, eCv won’t be able to prove that the loop preserves its invariants or decreases its variant. Sure enough, if we run eCv on this example, we get 6 unproven verification conditions that refer to the loop body: one for each of the loop invariants and two for the variant. However, eCv reports that everything else is OK, including that the loop invariants are satisfied by the initialization and that the return value meets all three postconditions.
So all we need to do now is to write loop body code that assigns low and/or high such that the four loop invariants are preserved and the loop variant high – low decreases. But this post is already quite long, so I’ll do that next time.