## Verifying pointer arithmetic

Today I’ll look at whether code that uses pointer arithmetic is any harder to verify than equivalent code that does not use pointer arithmetic.

Consider this function for copying an array (or part of an array) into another array (or part of another array):

voidarrayCopy(const int* src,int* dst, size_t num) { size_t i;for(i = 0; i < num; ++i) { dst[i] = src[i]; } }

To verify it, we first need to specify what it is supposed to achieve in a postcondition and add any necessary preconditions. Then we need to add a loop invariant so that eCv can verify that the loop achieves the desired state when it terminates, and a loop variant so that eCv can prove that the loop terminates:

voidarrayCopy(const int*arraysrc,int*arraydst, size_t num)writes(dst.all)pre(src.lim >= num; dst.lim >= num)pre(disjoint(src.all, dst.all))post(foralliin0..(num - 1) :- dst[i] == src[i]) { size_t i;for(i = 0; i < num; ++i)keep(i <= num)keep(foralljin0..(i - 1) :- dst[j] == src[j])decrease(num - i) { dst[i] = src[i]; } }

This is fairly straightforward. As usual, the main loop invariant
(the **keep**(**forall**…) clause) is a
generalization of the desired postcondition, and we need another invariant
(the **keep**(i <= num) clause) to ensure
that the main invariant and the body don’t violate the preconditions of the indexing operator.
These invariants and the loop variant (the **decrease**(…) clause)
are sufficient for eCv to verify this function.

Many C programmers would write the body as **dst++ = *src++;* instead
of explicitly indexing *dst *and
*src*. If we make this change, then because we
are changing *src *and
*dst* with
in the loop, we need to describe their values in the invariant.
Here is how we can do that:

voidarrayCopy(const int*arraysrc,int*arraydst, size_t num)writes(dst.all)pre(src.lim >= num; dst.lim >= num)pre(disjoint(src.all, dst.all))post(foralliin0..(num - 1) :- dst[i] == src[i]) { size_t i;for(i = 0; i < num; ++i)keep(i <= num)keep(foralljin0..(i - 1) :- (olddst)[j] == (oldsrc)[j])keep(src == (oldsrc) + i; dst == (olddst) + i)decrease(num - i) { *dst++ = *src++; } }

Applying the keyword **old **to an expression in a
loop invariant or loop variant yields
the value of that expression just before the first iteration of the loop. I’ve had to replace
*src *in the second loop invariant by
* old src* and similarly
for

*dst*, because in that invariant I am referring to array elements computed relative to the original values of

*src*and

*dst*. I’ve described the modification to

*src*and

*dst*by adding a third invariant, that says that at the start and end of any iteration,

*src*has its original value plus

*i*, and similarly for

*dst*. The post-increment operators applied to

*src*and

*dst*in the body ensure that these relationships are maintained. Once again, the function is verifiable by eCv.

What if we go further and get rid of the loop counter *i *as well?
We can do that if we precompute the end pointer *src + num* at the
start, and then iterate until *src *reaches this point. We still need to
describe the values of *src *and
*dst *in the invariant, but we must do so without the
benefit of *i*. Here’s one way of doing it:

voidarrayCopy(const int*arraysrc,int*arraydst, size_t num)writes(dst.all)pre(src.lim >= num; dst.lim >= num)pre(disjoint(src.all, dst.all))post(foralliin0..(num - 1) :- dst[i] == src[i]) {const int*array constsrcEnd = src + num;while(src != srcEnd)keep(src.base ==old(src.base))keep(0 <= src - (oldsrc))keep(src - (oldsrc) <= n)keep(foralljin0..((src - (oldsrc)) - 1) :- (olddst)[j] == (oldsrc)[j])keep(dst == (olddst) + (src - (oldsrc)))decrease(srcEnd - src) { *dst++ = *src++; } }

Here’s how I arrived at the new loop annotations:

- I’ve taken the three
**keep**-clauses from the previous version and substituted*(src – (*for the loop counter**old**src))*i*. That’s how I got the 3rd, 4th and 5th**keep**-clauses. For the last of the three, I also removed the resulting tautology*src == (*, leaving just the component that talks about**old**src) + (src – (**old**src))*dst*. - The original first invariant
placed an upper bound on**keep**(i <= n)*i*, thereby ensuring that*src[i]*and*dst[i]*in the original code or**src*and**dst*in the modified code were in bounds. I didn’t need to put a lower bound on*i*because its type was*size_t*which is an unsigned type, so its lower bound was zero implicitly. But I’ve now replaced*i*by*(src – (*, which has no such constraint. So I need to put a zero lower bound on this expression, which is what the second**old**src))**keep**-clause is for. - Unfortunately, this isn’t quite enough. The expression
*src –*uses the pointer-difference operator, which has the precondition that the operands are pointers into the same array. In general, when we re-assign an array pointer such as**old**src*src*, there is no requirement that it points into the same array as before. In this case, all we ever do to*src*is increment it; so it obviously continues to point into the same array as long as it is not over-incremented (which eCv verifies). However, eCv isn’t yet clever enough to spot this and generate an implicit loop invariant. So I’ve added the first invariant*src.base ==*right at the beginning of the loop. In the expression**old**(src.base)*src.base*,*base*is a ghost field of*src*, just like*lim*and*lwb*. It subtracts*src.lwb*from*src*, yielding a pointer to the very start of the array that*src*points into. So to state that two array pointers point into the same array, I just need to say that their bases are equal. - Rather than just substitute for
*i*in the loop variant, I’ve changed it to say that I expect the difference between*srcEnd*and*src*to decrease on each iteration.

I hope this example demonstrates to you that verification of C code that uses pointer arithmetic can be done, although it is likely to require more loop invariants than equivalent code that does not use pointer arithmetic, and getting them right may be a little harder. The number of verification conditions to be proved may be higher too – there were 24 from the original code, 32 from the second version, and 35 from the last version. So indexing should generally be preferred to explicit pointer arithmetic – as recommended by MISRA-C 2004 rule 17.4 – when writing verifiable code.

Finally, we could implement this function using *memcpy*:

voidarrayCopy(const int*arraysrc,int*arraydst, size_t num)writes(dst.all)pre(src.lim >= num; dst.lim >= num)pre(disjoint(src.all, dst.all))pre(num *sizeof(int) <= maxof(size_t))post(foralliin0..(num - 1) :- dst[i] == src[i]) { memcpy(dst, src, num *sizeof(int)); }

I’ve had to add another precondition to ensure that the computation of the number of bytes
*num * sizeof(int)* doesn’t overflow a

*size_t*. If eCv were to assume that the number of bytes in any array cannot exceed

*– as is the case for many implementations – then it could infer that*

**maxof**(size_t)*dst.lim <=*and we would not need the precondition. Unfortunately, the C standard makes no such promise.

**maxof**(size_t)/**sizeof**(int)The call to *memcpy *involves implicit conversions of
its first two parameters from * int** to

*and from*

**void****to*

**const int****. In general, pointer conversions should be avoided when writing for eCv, because such conversions break eCv’s strong type model. However, eCv knows enough about the semantics of*

**const void****memcpy*to prove the 10 verification conditions generated by this version too.