Perhaps there is some reason to provide a mechanism for asserting, in a particular patch of code, that the compiler is free to make optimistic assumptions about the kinds of aliasing that can occur. I don’t know any acceptable way of changing the language specification to express the possibility of this kind of optimization, and I don’t know how much performance improvement is likely to result. I would encourage compiler-writers to experiment with extensions, by #pragma or otherwise, to see what ideas and improvements they can come up with, but I am certain that nothing resembling the noalias proposal should be in the Standard. – Dennis Ritchie, 1988

 

From the beginning of the C standards process, the standards committee has wanted to  break C pointer semantics in the cause of never quantified compiler optimizations.  In C, pointers are memory addresses with types to permit both pointer arithmetic scaled to the size of the data type and checking for unintentional misapplication – such as assigning the result of dereferencing an integer pointer to a structure.  Pointer aliasing (multiple pointers addressing  the same or overlapping memory objects) and “type punning” (pointing to the same memory block with pointers of different types) are fundamental operations in C. For example, the C library sorting utility relies on converting pointers to an array of some type to pointer “void *” so it can sort arrays of abitrary type.  Similarly the C library memory allocators return pointers of type “void *” that are generally converted to non-void typed pointers, e.g. “struct mystruct *m = malloc(sizeof(struct mystruct))”. In a sequence, “x  = malloc(n); … free(x)… y = malloc(m) …” that same pointer value may be reused multiple times to point to objects of different type. The memory copy utility is passed “void *”  pointers too, and then internally decides where it should copy byte by byte or using larger chunks of memory.

Compiler developers prefer to limit both pointer alias  and  type-punning.  The “noalias” discussion was ostensibly driven by the difficulty of vectorizing code given overlapping pointers – an issue that has been considered in FORTRAN almost since the dawn of time.  Essentially pointer-aliasing creates a classical consistency issue, like ones we have in distributed computing. Suppose we have code:

 *x += *y; *(x+1) += *(y+1)

Given a two element vector load, store, and addition instructions, the compiler could potentially get this done in parallel: maybe
“load2 v,x” to load *x and *(x+1) into the vector register and “add2 v,y” for the addition of (*y,*(y+1))  and “store v x” would complete it. But maybe the bad programmer has an overlapping alias and x = y+1. If y points to [10,20,30] then the step-by step produces [10,30,20] and then [10,30,50] but the parallel operation produces [10,40,40].  This gets more depressing for large scale matrix operations – which is where it was first encountered in FORTRAN.  So compiler writers would like to have unaliased pointers, but in C, aliased pointers are very useful. To move all the elements of a string left by one place is a simple application: “b = c+1; while( (*c++ = *b++))”.   Another obvious example is computing a checksum for a packet. Instead of worrying about the possibly complex structure of the packet, the programmer might want to think of it as a sequence of 64bit unsigned ints and just xor them, but this requires pointers to overlapping memory object that don’t have the same type. So these kinds of pointer operations are useful and customary in C code and, to make it more peculiar, the claimed optimizations are of such marginal utility and so dangerous that the Linux project specifically disables them (fno-strictalias).

If the only reason is to permit compilers to use vector instructions in loops, it seems like a simple opt-in could be used. C restrict is a solution, even though the standard description of it is terrible. The only other substantive optimization found in examples is where the compiler can substitute “memcpy” (which assumes non-overlapping pointers) for a loop, but surely that’s something the programmer should do where it’s an advantage.  Maybe there are other reasons for wanting unaliased pointers (the examples in Regehr’s post are not all that compelling)  and someone can tell me about them.

The compiler developers and standards authors, however, don’t want to leave it to the programmers. Instead they have made use of a loophole created in C99 which can be read as permitting compilers to do anything at all in the presence of “undefined behavior” and another change to the C standard that allows pointer casting and aliasing to be treated as undefined behavior.

Anyone who thinks the C Standard has a clear or even excusable model of type-punning and pointer alias could read this horrifying discussion    which ends with the understatement “The conclusion for the de facto semantics here is not completely clear. ” ( via @comex in twitter)  or look at John Regehr’s authoritative post with the even more understated title:

The Strict Aliasing Situation is Pretty Bad

And now, we can see that the standards committee is attempting to go way beyond these to make pointer rules even more opaque.

See this for a more optimistic take.
Pointer alias analysis in C
Tagged on: