Time Synchronization in the Cloud

Synchronizing clocks in the cloud, especially on virtual machines, is beyond the capabilities of ordinary synchronization methods. Tests show that cloud virtual machines relying on NTPd can drift from the reference time by tens of minutes in a single day. Even bare-metal cloud platforms are significantly worse than dedicated machines. This matters because distributed applications are becoming more and more dependent on tight time synchronization, and a large number of existing applications need at least millisecond-level synchronization. In the cloud, the design limitations of the alternative time protocol, PTP, mean it cannot be relied upon to provide better performance either. PTP was initially designed for servos and data acquisition devices on a single shared Ethernet. Weaknesses like its dependence on multicast and its top-down failover methods are a problem in the general enterprise, but they become real impediments in the cloud. That's why we took a semantic approach, fixing time distribution above the level of the protocol. The low level of bits and packets cannot deliver the sophisticated time analysis, fault recovery, and management needed for complex environments like the cloud.
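
To put the drift figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes 10 minutes per day as the low end of "tens of minutes" and a 1 ms error budget for "millisecond level" synchronization. Both numbers, and the function names drift_ppm and max_free_run_seconds, are illustrative assumptions, not measurements or code from our tests.

```python
SECONDS_PER_DAY = 86_400

def drift_ppm(drift_seconds: float, interval_seconds: float = SECONDS_PER_DAY) -> float:
    """Average frequency error, in parts per million, implied by a drift over an interval."""
    return drift_seconds / interval_seconds * 1_000_000

def max_free_run_seconds(budget_seconds: float, ppm: float) -> float:
    """Longest time a clock with this frequency error can run uncorrected within the budget."""
    return budget_seconds / (ppm / 1_000_000)

# Assumed inputs: 10 minutes of drift per day, 1 ms error budget.
rate = drift_ppm(10 * 60)
hold = max_free_run_seconds(0.001, rate)

print(f"average frequency error: {rate:.0f} ppm")
print(f"time to exceed 1 ms uncorrected: {hold:.3f} s")
```

Under these assumptions the average error is roughly 6,900 ppm, so a clock drifting at that rate would exceed a 1 ms budget in well under a second if left uncorrected.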