Doing it wrong: Poul-Henning Kamp

Kamp writes to explain that virtual memory behavior can be important for program performance. That is, he explains

From the programmer’s standpoint, the working set of information is the smallest collection of information that must be present in main memory to assure efficient execution of his program.

But that note is actually from Peter Denning’s classical article (http://portal.acm.org/citation.cfm?doid=363095.363141 and http://cs.gmu.edu/cne/pjd/PUBS/WSModel_1968.pdf ) published in 1968 in Communications of the ACM – the same publication in which Kamp’s article appeared more than 40 years later. Kamp’s main result is that a search algorithm over a very large set of objects will perform poorly if it encounters a large number of page faults (if the working set is too large for the physical memory allocated to the process) ! Not since a researcher  explained during a talk at a serious academic conference that memory speeds could rival or outweigh processor speed in determining performance have I been so shocked but since that researcher was from Harvard, he had an excuse . The publication of Kamp’s article in ACM Communications shows something about the culture of computer science as an academic field these days.

Kamp’s article is easy to mock but, at least he’s doing some performance analysis and measurement – things that are definitely out of fashion in the field.  And he cares about performance.

What good is an O(log2(n)) algorithm if those operations cause page faults and slow disk operations? For most relevant datasets an O(n) or even an O(n^2) algorithm, which avoids page faults, will run circles around it.

Indeed.

A classic blunder in optimization was committed some years ago by one of the major software vendors. We had their first interactive timesharing system, and it was a “learning experience” in a variety of ways. One such experience was that of the FORTRAN compiler group. Now any compiler writer knows that the larger the hash table you use for a symbol table, the better your performance on lookup will be. When you are writing a multipass compiler in a 32K mainframe, you end up using a relatively small symbol table, but you create a really, really good hash algorithm so that the probability of a hash collision is reduced (unlike a binary seach, which is log n, a good hash table has constant performance, up to a certain density, so as long as you keep the density below this threshold you might expect that you will typically have an order 1 or 2 cost to enter or lookup a symbol, on the average. A perfect hash table (which is usually precomputed for constant symbols), has a constant performance between 1.0 and 1.3 or thereabouts; if it gets to 1.5 you rework the hashing to get it lower). So anyway, this compiler group discovers that they no longer have 32K, or 128K, or 512K. Instead, they now have a 4GB virtual address space. “Hey, let’s use a really big hash table!” you can hear them saying, “Like, how about 1 MB table?” So they did. But they also had a very sophisticated compiler technology designed for small and relatively dense hash tables. So the result was that the symbols were fairly uniformly distributed over the 256 4K pages in that 1MB, which meant that every symbol access caused a page fault. The compiler was a dog. When they finally went back to a 64K symbol table, they found that although the algorithm had poorer “absolute” performance from a purely algorithmic viewpoint (taking many more instructions to look up a symbol), because it did not cause nearly as many page faults, it ran over an order of magnitude faster. So third-order effects do matter. http://www.flounder.com/optimization.htm

You also have to worry about that constant factor hidden in the Big-O notation.

What’s interesting is how basic knowledge in computer science keeps being lost. There is no culture of measurement, education ignores basic technology, and engineering methodologies are discarded.