Pointers draft

Despite pedants, pointers are useful and interesting and if you don’t understand them, you don’t really get what’s going on in a computer. Think of memory as a function $M: N\to N$ where $N$ is the set of non-negative integers. When we build physical memories, there are limits on both domain and range, but let’s ignore that for now. We can call the elements of the domain “addresses” and the elements of the range “contents”, but  they are both just non-negative integers.   A list of numbers, maybe phone numbers, could be in memory as a sequence. Perhaps $M(x_0), M(x_0 +1) \dots M(x_0 +k)$ is the list.

 x0 2125551212 x0+1 3015551212 x0+2 7125551212 … x0+k 5125551212

Alternatively, that data could be scattered around memory in pairs of phone number and address of next phone number. Address and contents are just numbers. The memory may have contents $M(x_0)$ that is an address of other contents: $M(M(x_0))$ is a perfectly reasonable expression. Here we are using the contents at one address to “point to” contents at another address. If $y_0$ is the address of the first such pair, the next pair is at $y_1 = M(x_0 +1)$ and the next after that is at $y_2 = M(x_1 +1)$. To make this more comprehensible define $Y(0) = y_0$ and $Y(n+1) = M(Y(n) +1)$ Then $Y(i)$ is the address of the $i^{th}$ pair and the $i^{th}$ phone number is $M(Y(i))$. The list of phone numbers is $M(Y(0)),\dots M(Y(k))$. It’s easy to screw this up: if $M(Y(i)+1)$ is not have the right address of the next element, following the chain could end up making a random tour of memory that would pull up who-knows-what as phone numbers.

How business works in the real world: Apple and GT and Beelzebub

GT Advanced declared bankruptcy and blamed Apple for its problems. Apple called GT Advanced’s story “defamatory”.  I have no idea about the specifics in this case but I do know about  big companies pushing insanely onerous and self-defeating terms on small ones.  Here’s the original claim by GT Advanced:

At the start of negotiations, Apple offered to buy 2,600 sapphire growing furnaces from GT Advanced, which GT Advanced would operate on behalf of Apple, the “ultimate technology client to land,” according to Squiller.

“In hindsight, it is unclear whether Apple even intended to purchase any sapphire furnaces from GTAT,” he wrote.

But after months of hard negotiating, Apple offered a deal under which it would shift away economic risk by lending GT Advanced the money to build the furnaces and grow the sapphire, and then sell it exclusively to Apple for less than market value, Squiller wrote.

GT Advanced was effectively forced to accept the unfair deal in October 2013 because its intense negotiations with Apple had left it unable to pursue deals with other smartphone makers, he said.

Back when we were selling a real-time OS, I contacted an ex-boss who now had a high position in a telecommunications company to see if he could help us sell into it. His response was “don’ t touch this place, it is expert in destroying small vendors.” And, whatever the actual story with GT and Apple, the storyline is not at all unusual. The elements of a smaller company spellbound by prospects of a huge deal/giant customer, followed by time consuming negotiations, followed by onerous demands – been there, done that. We once were negotiating with a huge semiconductor company about a big deal that, over time, got worse and worse for us. We “finalized” with some terms that we thought might be survivable. And then the semiconductor company negotiators, one of whom by this point our negotiators were privately calling “Beelzebub”, announced they had to take the deal “to management” and came back with much more absurd demands. We were able to walk away but we saw other small companies make deals with Beelzebub and then fail. There is a strong impulse in some big companies, among some business units, to squeeze small company vendors way too far.

Once, after a sales visit to a big Wall Street Firm, two experienced sales people for a second supplier told me they were shocked by what I had said to the customer and advised me never to make the same error. What was my mistake? I had told the customer that we had produced a product that worked a lot better than what they were currently using, cost them a lot less, and was highly profitable to us. “Never do that”, counseled my colleagues, “they want to believe that, at best, you are breaking even on the deal, otherwise they think they left money on the table.”

The absurdity of these kinds of “negotiations” is that they are highly unprofitable for big companies. The potential savings are usually negligible for the bigger company. Negotiating purchase agreements is expensive for big companies. If they are even in the position of dealing with a small vendor, there must be some compelling business reason for getting whatever the small vendor is selling. This also means that a deal that would be unprofitable, perhaps damaging, to the vendor would put a critical part of the supply chain at risk for the purchaser. But instead of closing the deal and moving on, some purchasing groups in some companies want to grind the vendor down to “show value” to their management or maybe even just out of habit. Sometimes this requires the smaller company to walk away, sometimes to go over the heads of the purchasing group to business units waiting for the product. We’ve probably lost some deals that would have worked out well in the end by refusing to accept onerous terms, but we’ve also walked away from deals that would have been fatally unprofitable. The good thing is that walking away from an overbearing big customer is usually a good starting point for a sales effort aimed at its competitors.

Telecom is 10 years behind wall street on time synchronization

This is an interesting paper, but Telecom has not yet come to grips with the problems and advantages of fast shared commodity ethernet interconnect.

North American service providers are in the process of upgrading their radio access networks with next generation LTE equipment. They arefinalizing a 4G rollout that involves highly stringent timing requirements, but in many cases theyare relying on sole-source synchronization byusing Global Navigation Satellite System (GNSS). Natural occurring disturbances, as well as unintentional radio frequency jamming, intentional jamming, and spoofing, make GNSS vulnerable to interference.

This article presents a novel approach for addressing the issue of GNSS vulnerability by introducing a standard means of providing a redundant packet-based synchronization source for LTE base stations. It also describes how this new approach can mitigate noise caused by asymmetry and transit delay variation in packet networks.

Project Roseline

Accurate and reliable knowledge of time is fundamental to cyber-physical systems for sensing, control, performance, and energy efficient integration of computing and communications. This simple statement underlies the RoseLine project.  Emerging CPS [Cyber Physical Systems – vy]  applications depend on precise knowledge of time to infer location and control communication.  There is a diversity of semantics used to describe time, and quality of time varies as we move up and down the system stack.  System designs tend to overcompensate for these uncertainties and the result is systems that may be over designed, in-efficient, and fragile.  The intellectual merit derives from the new  and fundamental concept of time and the holistic measure of quality of time (QoT) that captures metrics including resolution, accuracy, and stability.

The project will build a system stack that enables new ways for clock hardware, OS, network services, and applications to learn, maintain and exchange information about time, influence component behavior, and robustly adapt to dynamic QoT requirements, as well as to benign and adversarial changes in operating conditions.  Application areas that will benefit from Quality of Time will include: smart grad, networked and coordinated control of aerospace systems, underwater sensing, and industrial automation.  The broader impact of the proposal is due to the foundational nature of the work which builds a robust and tunable quality of time that can be applied across a broad spectrum of applications that pervade modern life. Roseline.

Why is computer science education so horrible

This is a post about CS education. It is prompted by a seriesofposts by Mark Guzdial in which he criticizes the pervasive belief among CS educators that when it comes to programming, there’s not much an instructor can do: some students get it, others don’t; it’s all in genetics or other external factors that CS educators can’t influence (aka the Geek Gene Hypothesis). I am with Mark on this, but I feel a bit stronger about it than he does. I think that attitude is bullshit, and the studies supporting it are unsound by means of making conjectures ignoring an enormous number of confounding factors.  Cristina (Crista) Videira Lopes

A lot of it is not even teaching method, it is  attitude.  When CS was “professionalized” in the 1970s, all the stupid gatekeeper ambitions and pointless competition from the traditional “hard sciences” and engineering were imported.  When I taught intro CS, I found that the most important part of my job was sustaining the morale of the students who came in without programming background (and many bad ideas).

Harvey Mudd, as Vidiera points out, has actually tried to address these problems and

Our instructors had private conversations with students who were using up a disproportionate amount of air time in class talking about arcane details and asked them to have those conversations with the instructors in private because other students found their level of knowledge intimidating. [US News]

Not being as nice, smart, or hardworking as the people as Harvey Mudd, I used to just tell those students to shut up and learn something in my class. And I spent a lot of time telling other students not to worry, that the “experts” didn’t actually know anything near what they thought they did. The common attitude among CS teachers, who are generally former arcane-air-time students themselves, is quite different.

JP Morgan Security Breach

“Faced with the rising threat of online crime, JPMorgan has said it plans to spend \$250 million on digital security annually, but had been losing many of its security staff to other banks over the last year, with others expected to leave soon.” – New York Times.

Why is it that NY banks can spend so much money on computing infrastructure and find themselves short of actual talent so frequently? There are very sharp people working in finance technology, but …

Paths versus Recursion

/*
* Iterative DepthFirst file list (c) Victor Yodaiken 2013
*
* "Not the way we do it in Brooklyn" - Dave "Kinch" Arnow.
*
*
* Data structure is P - the current path, with some aux data
*
* Two basic operations:
* 1) Lp(P) - starts at path P and extends it to the leftmost
* reachable file/directory
* 2) IterateDF(P) iterates by advancing a path to the next in depth
* first order
*
* So program is
*
* Initialize(P);
* do{
* PrintPath(P);
* }while(IterateDF(P) != EOF)
*
*
* Horribly inefficient - can be cured by caching positions in directory
* entries.
*
* */
Link to Code

What happens when you do not use enterprise quality technology.

News from last year.
Aug. 26, 2013 11:52 a.m. ET

A glitch in time synchronization caused German exchange operator Deutsche BoerseAG DB1.XE -0.16% ‘s Eurex Exchange arm to halt trading for slightly more than an hour early Monday, the latest in a thread of technical issues at global exchanges.

The Eurex derivatives trading market was halted at 0620 GMT, 20 minutes after trading started, “in order to protect the integrity of the market,” Deutsche Boerse said.

“An incorrect time synchronization within the system” triggered the market halt. The problem was solved, pre-trading started at 0720 GMT and, as of 0730 GMT all products were again tradable, the exchange said.

A person close to the matter said the glitch was caused by a problem with the system clock, not extreme data load, noting that current trading volumes are far below previous peaks.

Wall Street Journal

From what we understand, the problem was due to a number of GPS time servers that dropped leap year adjustments, so jumping the time back 36 seconds. Then PTP2 software adjusted server time to jump backward 36 seconds. It’s not the first failure of this sort (or the last). The legacy technology for time synchronization lacks the cross check, failover, and alerting that are built into TimeKeeper. That technology was not designed with enterprise in mind and although it has been heavily modified over time, it is very difficult to engineer a solution to basic architectural mismatch. For example, TimeKeeper is architected to treat all time sources the same, but software like PTPd and NTPd is designed for a specific protocol. So TimeKeeper can use a feed from one protocol to crosscheck another, but that requires a clumsy grafting process in one of the protocol specific time synchronization programs.  As another example, the PTP standard has a completely inappropriate fault-tolerance technique baked into the standard – a technique that is a holdover from the PTP standard origin in device control. The standard was designed for systems with really simple networks and time consumers. A single cable with some welding machines on the end is perhaps a typical intended use case. The idea was that the time sources would advertise how good they were and the consumers would simply pick the one that said it was the best. This is an absurd policy for an enterprise network with super-powerful server computers receiving time packets across complex networks dotted with routers and switches. TimeKeeper was designed to ignore this “best master” protocol for fault tolerance and to utilize the smarts of the consumers to detect problems in the feed and to select among alternatives.

Fischer Lynch Patterson and timeouts

There is a widely cited (over 1400 cites in CiteseerX ) result called the Fischer-Lynch-Patterson theorem about consensus – a key issue in distributed databases or any system where data is either distributed or replicated or both.

In this paper, we show the surprising result that no completely asynchronous
consensus protocol can tolerate even a single unannounced process death. We do
not consider Byzantine failures, and we assume that the message system is reliable it
delivers all messages correctly and exactly once. Nevertheless, even with these
assumptions, the stopping of a single process at an inopportune time can cause any
distributed commit protocol to fail to reach agreement. Thus, this important
problem has no robust solution without further assumptions about the computing
environment or still greater restrictions on the kind of failures to be tolerated!
Crucial to our proof is that processing is completely asynchronous; that is, we
make no assumptions about the relative speeds of processes or about the delay
time in delivering a message. We also assume that processes do not have access to
synchronized clocks, so algorithms based on time-outs, for example, cannot be
used. (In particular, the solutions in [6] are not applicable.) Finally, we do not
postulate the ability to detect the death of a process, so it is impossible for one
process to tell whether another has died (stopped entirely) or is just running very
slowly.

The Wikipedia summary is similar:

In a fully asynchronous system there is no consensus solution that can tolerate one or more crash failures even when only requiring the non triviality property.[11] This result is sometimes called the FLP impossibility proof. The authors Michael J. Fischer, Nancy Lynch, and Mike Paterson were awarded a Dijkstra Prize for this significant work. The FLP result does not state that consensus can never be reached: merely that under the model’s assumptions, no algorithm can always reach consensus in bounded time. In practice it is highly unlikely to occur.

As far as I can tell the “surprising” result is that unbounded delays are not bounded. Without timeouts, we can not distinguish between a site B that will eventually send a message to site A and a site B that has failed (crashed) and will never send a message.

Why is clock synchronization so important in big data

Distributed transactions have historically been implemented by the database community in the manner pioneered by the architects of System R* [22] in the 1980s. The primary mechanism by which System R*-style distributed transactions impede throughput and extend latency is the requirement of an agreement protocol between all participating machines at commit time to ensure atomicity and durability. To ensure isolation, all of a transaction’s locks must be held for the full duration of this agreement protocol, which is typically two-phase commit.

The problem with holding locks during the agreement protocol is that two-phase commit requires multiple network round-trips between all participating machines, and therefore the time required to run the protocol can often be considerably greater than the time required to execute all local transaction logic. If a few popularly accessed records are frequently involved in distributed transactions, the resulting extra time that locks are held on these records can have an extremely deleterious effect on overall transactional throughput. [ Calvin: Fast Distributed Transactions for Partitioned Database Systems. Alexander Thomson et al]

This is one reason why we have all the new databases without transaction support – because high transaction rates in a distributed environment (e.g. web or IOT applications in the Cloud) cannot be scaled in face of lock overhead.