The C programming language is designed to let programmers work at both low and high levels. In particular, for applications like operating system development, cryptography, numerical methods, and network processing, programmers can go back and forth between treating data as abstract types and as machine-addressable sequences of bits, between abstract operations and low-level, machine-dependent operations. The language has, by any account, been spectacularly successful and remains the essential systems programming language. Over 45 years one would expect some limitations to show up, but C currently suffers not just from its own limitations but from the combination of being profoundly out of fashion (see the somewhat mixed-up critique here for an example) and the misguided bureaucratic attentions of the ISO standards process, which has had a negative effect on the development of C compilers. At the heart of the problem is that the C standard process is muddying C semantics while also facilitating dubious “optimizations” from the larger compiler projects. The key aspect of these “optimizations” is that they break C programs that work at lower levels of “optimization” or with earlier compilers. The situation was concisely summarized by Linus Torvalds: “the C standard is _clearly_ bogus shit”. And Torvalds is not the only person to object.

  • An alarming CERT security alert was issued in 2008: “Some C compilers optimize away pointer arithmetic overflow tests that depend on undefined behavior without providing a diagnostic (a warning). Applications containing these tests may be vulnerable to buffer overflows if compiled with these compilers.”
  • Linux developers forced GCC and Clang compiler developers to provide an ad-hoc collection of compiler flags, amounting to an unstable non-Standard dialect of C, that turn off some of the “optimizations” supposedly permitted by the C standard.
  • A warning from the chief Clang architect that compiler changes mean that “huge bodies of C code are land mines just waiting to explode” (followed later by a suggestion that programmers switch to other languages).
  • A paper from the foremost researcher in cryptography asking for a version of C that would be “safe” from dangerous “optimizations”.
  • A polite and completely ignored academic proposal by Regehr et al. for a “friendly” C compiler and standard.
  • An excellent critical academic analysis by M. Ertl, also ignored; see also this paper.
  • A practitioner critique by graphics programmer Eskil Steenberg.

As an example of what provoked these reactions, consider a small C program that, when passed through the current gcc and Clang compilers at their highest optimization levels, can no longer recognize a negative number as a negative number, with no warning. The same program works correctly when compiled at lower levels of “optimization”, with earlier versions of clang and gcc, or even currently with the Intel C compiler. And consider the following idiomatic C code, transformed into an infinite loop that never gets to the emergency shutdown.

// keep opening the valve until pressure is down, or do emergency_shutdown
for (int i = 0; i >= 0; i++) {   // try it a bunch of times
    if (getpressure() > DANGER)
        openvalve();
}
if (getpressure() > DANGER)
    emergency_shutdown();

You can see how this works in the GCC 8 generated machine code for x86, a processor family that natively wraps arithmetic overflow and so does the right thing here. The increment, the check, and the call to emergency shutdown are all deleted by the compiler, as an “optimization”.

; Machine code output of GCC 8 at the -O3 optimization level
f:
sub rsp, 8
.L4:
call getpressure
cmp eax, 10
jle .L4
call openvalve
jmp .L4

Of course, this code is perhaps not the best C code, but it accords with the customary and expected behavior of the language over decades of practice. Earlier versions of GCC compile the code correctly, as does the Intel compiler, and even later versions compile it correctly unless the higher optimization levels are invoked. So tested, working, safety-critical code, moved to a new version of the same compiler, will fail with no warning (whoops, sorry about the refinery explosion). And all this apparently meets the requirements of the C standard.
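For comparison, here is a minimal sketch of how the same intent could be expressed without relying on signed overflow at all. The function name relieve_pressure and the MAX_TRIES bound are illustrative assumptions, not part of the original example; DANGER is taken to be 10 to match the constant visible in the generated assembly above.

extern int  getpressure(void);        /* assumed to be declared elsewhere */
extern void openvalve(void);
extern void emergency_shutdown(void);

#define DANGER    10                  /* matches the "cmp eax, 10" above */
#define MAX_TRIES 1000000u            /* illustrative, explicit retry bound */

void relieve_pressure(void) {
    for (unsigned int i = 0; i < MAX_TRIES; i++) {   /* bounded, unsigned counter: no signed overflow to "optimize" on */
        if (getpressure() > DANGER)
            openvalve();
    }
    if (getpressure() > DANGER)
        emergency_shutdown();
}

How did we get here? Two things went wrong.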

  1. The ISO C standard committee has spent decades on a botched effort to strengthen the C type system, in a way that has produced a contradictory and complex set of rules under which a vast body of C code can be considered to embody “undefined behavior” (often with no remedy in the Standard). Not only is it not possible to access “device memory”, but it is impossible to write a memory allocator, let alone a virtual memory system, check for arithmetic overflow, reliably check for null or invalid pointers, checksum network packets, encrypt arbitrary data, catch hardware interrupts, copy arbitrary data structures, use type punning, or, most likely, implement threads. Application programs written in C face the same problems and worse. For example, the standard POSIX function “mmap” appears to be intrinsically “undefined behavior” (see the sketch after this list). This is only going to get worse; see, for example, recent proposals to add pointer “provenance” to the standard.
  2. The primary open source compiler developer groups have seized on the sloppy definition of undefined behavior in the Standard to justify a series of increasingly dangerous silent code transformations that break working code. For reasons explained below, I will call these Beyond The Guideline (BTG) transformations.
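To make the mmap point concrete, here is a sketch of a perfectly ordinary POSIX idiom. The function name map_ints is illustrative, and MAP_ANONYMOUS is a widespread extension rather than a guarantee of any standard; the point is that the Standard’s type rules give mmap none of the special dispensation that malloc gets, so using the mapped memory as an array of int arguably has no defined meaning at all.

#include <stddef.h>
#include <sys/mman.h>

/* Map anonymous memory and use it as an array of int: everyday systems code,
 * but the Standard's effective-type rules only bless memory from malloc and
 * friends (via a footnote), not memory that arrives from mmap. */
int *map_ints(size_t count) {
    void *mem = mmap(NULL, count * sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (mem == MAP_FAILED) ? NULL : (int *)mem;
}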

The key passage in the C11 Standard is the following:


undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
  • 2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
  • 3 EXAMPLE An example of undefined behavior is the behavior on integer overflow.

The compiler developers have seized on “imposes no requirements” to claim a license for arbitrary program transformations in the presence of “undefined behavior”. As pointed out by Eskil Steenberg, the C99 standard changed the wording of this provision to “possible” where the C89 standard used the word “permissible”, which would imply that compilers must restrict themselves to the stated range of behaviors. The C99 Standard also moved the sentence on the range of behaviors into a note, which in the arcane world of ISO standards transforms a rule into a “non-normative” suggestion. This has been taken to license reasoning such as: since overflow of signed integers is “undefined behavior” and the standard imposes no requirements, the compiler can assume that signed overflow never happens and delete a test for overflow as an optimization. GCC/Clang developers, with the tacit acquiescence of the ISO standards committee, now claim the compiler is free to assume undefined behavior “can’t happen” and that the compilers can silently remove programmer checks for it, even while generating code that produces the condition. This reasoning, such as it is, clearly violates the Standard’s definition of “optimization”, but nobody seems to care.
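To make the pattern concrete, here is a minimal sketch of the kind of overflow test at issue; the function name is illustrative rather than taken from any particular code base.

/* A check that relies on wrapping behavior. Under the BTG reading of the
 * Standard, signed overflow "can't happen", so a compiler may fold the
 * comparison to 0 and silently delete the test, even while emitting an add
 * instruction that wraps on the target processor. */
int increment_would_overflow(int x) {
    return x + 1 < x;   /* true only if x + 1 wrapped around */
}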

Clang architect Chris Lattner believes there is nothing the compiler developers can do: 

The important and scary thing to realize is that just about *any* optimization based on undefined behavior can start being triggered on buggy code at any time in the future. Inlining, loop unrolling, memory promotion and other optimizations will keep getting better, and a significant part of their reason for existing is to expose secondary optimizations like the ones above. To me, this is deeply dissatisfying, partially because the compiler inevitably ends up getting blamed, but also because it means that huge bodies of C code are land mines just waiting to explode.

You’d imagine that faced with warnings like this, complaints from influential C practitioners like Torvalds and DJB, a CERT advisory, and a number of academic papers, there would be some concern at the WG14 committee that develops the C standard – but no. 

When I visited the ISO WG14 C Standards committee meeting in Pittsburgh 2018 to ask for some corrective action, I found a peculiar lack of urgency.  The most interesting comments were from a long-time committee member who seemed to believe that C had become a hopeless cause and from another attendee who suggested that fixing contradictions in the standard was not urgent because there are so many of them.

I refer to these strange transformations as “Beyond the Guideline”, because they violate the range of possible/permissible behaviors in the standard definition of undefined behavior.

At one time, it was generally understood that the undefined status of signed integer overflow meant that the native arithmetic operations of the target processor determined the result, so that “i + j” in C can often be compiled to a single machine instruction. If the result of the addition overflowed the processor’s representation of a signed integer, anything could happen: the burden was on the programmer, the processor, and the execution environment to detect and handle the problem. This is either “ignoring the situation” or “behaving during [..] program execution in a documented manner characteristic of the environment”; the compiler was not required to do anything about the overflow. In fact, standard modern processors wrap around as expected, although there are some processors that generate traps and some embedded devices with things as odd as saturating arithmetic. None of that is a problem under the customary compilation of the language: the compiler can just leave it up to the underlying machine, the execution environment, and the programmer to sort out. The “beyond the guideline” (BTG) approach is relatively new, is still unknown to many practitioners, and involves major conflicts with the ANSI C definition in the second edition of the Kernighan and Ritchie book, which is the definitive specification. You can follow Linux’s lead and block the overflow “optimization” with “-fwrapv”, which forces wrapping semantics (in some places), but that’s not even necessarily what programmers want, and there are many other places where C code is vulnerable to similar transformations. For example, on machines with linear addressing, it is not unusual to check that a pointer is in a range with code like:

if (p < startptr || p >= endptr) error();

This code is straightforward and easy to compile on the dominant processor architectures of the day: ARM, x86, MIPS, and similar. It might be a problem on, say, segmented architectures or elaborately typed architectures such as the not-lamented Itanium. It is also possible that, depending on how the pointer “p” is generated, it might violate the complex, contradictory, and hard-to-understand type rules for pointer comparisons and fall into the “undefined behavior” bucket. In that case, the compiler developers claim that they can silently delete the check: after all, undefined behavior “can’t happen”. All the code that depends on these types of checks, checks that are reasonable C code and that have worked for decades, may encounter an unpleasant surprise at any future time when the compilers get around to “optimizing” those checks away. And, as usual, there is no guaranteed way in the Standard to “fix” this check. Suggestions involving converting pointers to integers run into other undefined behavior and the optional status of the required data type.
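Here, for reference, is a minimal sketch of the integer-conversion workaround alluded to above; the function and parameter names are illustrative. As noted, it is not actually guaranteed by the Standard either: uintptr_t is an optional type and the pointer-to-integer mapping is only implementation-defined.

#include <stdint.h>

/* Compare the pointers as integers instead of as pointers. On the flat-address
 * machines discussed above this does what the programmer expects, but nothing
 * in the Standard promises that the integer ordering means "inside the range". */
int in_range(const char *p, const char *startptr, const char *endptr) {
    uintptr_t ip = (uintptr_t)p;
    return ip >= (uintptr_t)startptr && ip < (uintptr_t)endptr;
}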

Since common sense has not sufficed, both the compiler development groups and the standards committee need an intervention. The decision to ignore the guideline is easily repaired, mostly by reversing the wording and formatting changes.

undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements other than that: permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Undefined behavior is common in C programs and translators should document operation in the presence of undefined behavior. It is not acceptable to change the semantics of correct program structures due to undefined behavior elsewhere.

I’d add the following note as well

NOTE: Undefined behavior, particularly due to non-portable constructs, is a normal component of C programs. Undefined behavior due to errors that the translator can detect should produce a diagnostic message.

One of the common arguments made by some of the compiler developers who champion BTG transformations is based on a false dichotomy: that without license for BTG transformations, compilers would have to provide an intrusive run-time that guaranteed predictable results, and the Standard would have to spell out every possible behavior. This speaks to a fundamental misunderstanding of the purpose and use of C:

  • How can the compiler distinguish between an erroneous read of uninitialized memory, which will produce unspecified results (or even a trap on some architectures), and a syntactically identical chunk of code that reads a memory-mapped device control register that cannot be written? The designers of C solved the problem by leaving it up to the programmer and the execution environment: the compiler produces a load operation; the semantics are not its problem, they are “undefined behavior”.
  • How can the compiler or standard distinguish between a mistaken use of an integer pointer to access a floating point number and a numerical methods expert’s shortcut divide-by-2, which works by changing the binary representation of the exponent (see the sketch after this list)? That is not the responsibility of the compiler.
  • How can the compiler tell the difference between a null pointer dereference in user space, where it will usually cause a trap on memory-mapped systems, and one where the zero page is accessible memory? It cannot: the programmer and operating system can handle it. The best the compiler can do is warn.
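As an illustration of the second point, here is a minimal sketch of the kind of exponent trick meant above. The function name is illustrative and the code assumes an IEEE 754 float; under the current type rules the integer pointer aliasing a float is exactly the sort of construct that gets branded “undefined behavior”.

#include <stdint.h>

/* Halve a positive, normalized float by decrementing its biased exponent.
 * The int-pointer view of a float is the classic numerical-methods idiom;
 * under the strict-aliasing rules it is "undefined behavior". */
float half_of(float f) {
    uint32_t *ip = (uint32_t *)&f;   /* integer pointer into a float */
    *ip -= (uint32_t)1 << 23;        /* the exponent field starts at bit 23 */
    return f;
}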

What would be nice is if the compilers could warn about possible errors and if the Standard provided opt-in methods of locking down access methods and preventing common errors like array boundary and type violations. C developers increasingly rely on sophisticated static analysis tools and test frameworks to detect common errors and have embraced methodologies to limit the kinds of haphazard programming that were once common. But all of that is very different from retroactive, silent language changes based on opaque and often ill-considered rules.

The developers of the C Standard were forced to add a hurried character-pointer exception to their type rules when they discovered they had inadvertently made it impossible to write the fundamental memcpy function in C. The exception makes the type rules ineffective while still being cumbersome (Brian Kernighan made exactly this criticism of Pascal decades ago). And it certainly did not fix the problem. For example, the Standard only makes malloc/free/calloc and similar work within the type system by treating them as special exceptions to the type rules (in a footnote, no less), and apparently nobody realized that those functions could not then be implemented in Standard C.
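For reference, here is a minimal sketch of a memcpy-style copy written the only way the type rules allow, through the character-pointer exception; the name my_memcpy is illustrative.

#include <stddef.h>

/* A byte-by-byte copy that leans entirely on the character-pointer exception:
 * unsigned char may alias any object, so this is permitted where the same copy
 * written through, say, int pointers would not be. */
void *my_memcpy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}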

Lattner is absolutely correct about the “land mines” which this approach to “optimization” produces, but he is wrong to imply that the compilers are required to adopt the BTG approach to undefined behavior or that code triggering undefined behavior is necessarily “buggy”. C compilers could optimize on undefined behavior, as they always have, without violating the guideline suggested in the standard, as Intel’s ICC still does. The choice to go beyond the guideline is an engineering choice.

The invisible advantages of BTG optimizations 

Some of the defenders of BTG optimizations argue that code producing undefined behavior is not actually C code and so doesn’t have a meaning. That kind of pedantic legalism has no support in the history of the language, the behavior of compilers (which have no trouble compiling and producing expected behavior from the same code prior to “optimization”), or the language of the standard; it’s just a pompous way of excusing crappy engineering. The C Standard is a train wreck on its own, but it does not mandate the bizarre, paradoxical undefined behavior GCC and Clang are choosing to haphazardly implement.

The supposed motivation for BTG transformations is  that they enable optimizations. There are, however, only a bunch of dubious anecdotes to support the claim of great optimizations – not a single example of careful measurement.  And there are studies that show the opposite. Here’s the justification from one of the LLVM developers:


This behavior enables an analysis known as “Type-Based Alias Analysis” (TBAA) which is used by a broad range of memory access optimizations in the compiler, and can significantly improve performance of the generated code. For example, this rule allows clang to optimize this function:
float *P;
 void zero_array() {
   int i;
   for (i = 0; i < 10000; ++i)
     P[i] = 0.0f;
 }
into “memset(P, 0, 40000)“. This optimization also allows many loads to be hoisted out of loops, common subexpressions to be eliminated, etc. This class of undefined behavior can be disabled by passing the -fno-strict-aliasing flag, which disallows this analysis. When this flag is passed, Clang is required to compile this loop into 10000 4-byte stores (which is several times slower), because it has to assume that it is possible for any of the stores to change the value of P, as in something like this:
int main() {
  P = (float*)&P;  // cast causes TBAA violation in zero_array.
  zero_array();
}
This sort of type abuse is pretty uncommon, which is why the standard committee decided that the significant performance wins were worth the unexpected result for “reasonable” type casts.

This is a ridiculously contrived example, of course. The program, as written, makes no sense and will crash on any conceivable system. If not set up to fail, any half-decent C programmer could have used a local variable (or parameter) for the pointer, in which case -fno-strict-aliasing has no effect (see the sketch after the list below), or called memset directly. These are the kinds of things that C programmers find and fix using profiling. Oddly, the compiler misses the big optimization, which is to do nothing, since nothing is done with the array before the program exits. So here we have the usual elements of an example of why C compilers supposedly need to make “unexpected” BTG results out of undefined behavior:

  1. A minor or unnecessary optimization
  2. of a contrived example that shows nothing about actual applications
  3. that is possibly an undefined-behavior optimization but not a BTG transformation
  4. that is accompanied by absolutely nothing in the way of measurement
  5. and that ignores actual optimization possibilities.
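Here is a minimal sketch of the rewrite referred to above, which gets the memset-style code generation without leaning on type-based alias analysis at all; the global P and the loop bound are taken from the quoted example.

float *P;

/* Copy the global pointer into a local first. The local's address is never
 * taken, so stores through it cannot be assumed to modify it, and the compiler
 * can turn the loop into a single memset-style operation with or without
 * strict aliasing. */
void zero_array(void) {
    float *p = P;
    for (int i = 0; i < 10000; ++i)
        p[i] = 0.0f;
}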

A talk by Google’s Clang expert, Chandler Carruth, on this topic has an even worse example. Carruth gives an example of array indexing that has all of the first four characteristics above, plus the bonus that, if you look at the generated code, the form that supposedly takes advantage of undefined behavior for optimization is not really any better than the original. He also explains how the compiler cannot determine whether programs are correct (!) and makes an unfortunate analogy between compiler behavior and APIs. Imagine if operating systems, for example, were to caution developers that “anything can happen” if the parameters passed to an OS system call are not as expected. Programming languages should minimize surprise.

In the real world, the justification for BTG transformations is convenience for the compilers, which are increasingly driven not only by artificial benchmarks but by their support for C++ and Java, languages that have very different semantics. Operating on intermediate code that has been stripped of C-specific semantics, it is difficult for the compilers to safely apply optimization passes without making dangerous C transformations. Rather than addressing the weakness of this internal representation, it is easier to find loopholes that allow the customary semantics to be ignored.

 

C is not Java

Here’s something more from Lattner:

Violating Type Rules: It is undefined behavior to cast an int* to a float* and dereference it (accessing the “int” as if it were a “float”). C requires that these sorts of type conversions happen through memcpy: using pointer casts is not correct and undefined behavior results. The rules for this are quite nuanced and I don’t want to go into the details here (there is an exception for char*, vectors have special properties, unions change things, etc). This behavior enables an analysis known as “Type-Based Alias Analysis” (TBAA) which is used by a broad range of memory access optimizations in the compiler, and can significantly improve performance of the generated code.

To me, “The rules for this are quite nuanced and I don’t want to go into the details here (there is an exception for char*, vectors have special properties, unions change things, etc)” means “we mucked up the standard and we are going to cause many systems to fail as these nuanced rules confuse and surprise otherwise careful and highly expert programmers”. Unexpected results are a catastrophe for a C programmer. Limitations on compiler optimizations based on second-guessing the programmer are not catastrophes (and nothing prevents compiler writers from adding suggestions about optimizations). Programmers who want the compiler to optimize their algorithms using clever transformations should use programming languages that are better suited to large-scale compiler transformations, where type information is a clear indication of purpose. As Chris Lattner notes:

It is worth pointing out that Java gets the benefits of type-based optimizations without these drawbacks because it doesn’t have unsafe pointer casting in the language at all.

Java hides the memory model from the programmer to make programming safer and also to permit compilers to do clever transformations, because the programmer is not permitted to interact with the low-level system. Optimization strategies for C compilers should take into account that C does not put the programmer inside a nice abstract machine. The C compiler doesn’t need to force C to become a fifth-rate Pascal or Swift. Unexpected results are a far more serious problem for C than missed minor optimizations.
