SCM disrupting data centers. Now CPUs could be bottleneck

Google data center

Computing and software development used to assume that CPUs are fast and storage is slow, so CPUs would spend much of their time waiting for far slower hard disks. Data center architecture is built on that assumption. However, this is changing rapidly. Storage Class Memory (SCM), the kind of non-volatile flash memory found in smartphones, is now so fast that it can end up waiting for the CPU. This is a classic case of a new technology disrupting existing systems.

SCM isn’t just a little bit faster. New SCM devices are 1000x faster than hard disks.

Their performance is improving faster than that of CPUs, and they are close to inverting the I/O gap, the long-standing situation in which storage devices struggle to keep CPUs busy.

“Today’s PCIe-based SCMs represent an astounding three-order-of-magnitude performance change relative to spinning disks (~100K I/O operations per second versus ~100),” the authors state. “For computer scientists, it is rare that the performance assumptions that we make about an underlying hardware component change by 1,000x or more.”
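To make the quoted figures concrete, here is a rough back-of-the-envelope sketch of how much time a system has per I/O at each rate. The IOPS numbers come from the quote above; the 3 GHz clock speed and the function name `per_io_budget` are assumptions chosen purely for illustration.

```python
def per_io_budget(iops, cpu_hz=3e9):
    """Return (seconds per I/O, CPU cycles per I/O) for a device at `iops`.

    cpu_hz is an assumed clock speed (3 GHz), not a figure from the article.
    """
    seconds = 1.0 / iops
    return seconds, seconds * cpu_hz


# IOPS figures taken from the quote: ~100 for spinning disk, ~100K for PCIe SCM.
disk_s, disk_cycles = per_io_budget(100)
scm_s, scm_cycles = per_io_budget(100_000)

print(f"disk: {disk_s * 1e3:.0f} ms per I/O, ~{disk_cycles:,.0f} cycles")
print(f"SCM:  {scm_s * 1e6:.0f} us per I/O, ~{scm_cycles:,.0f} cycles")
print(f"ratio: {disk_s / scm_s:,.0f}x")
```

Under these assumptions, one core has roughly 30 million cycles to spare per disk I/O but only about 30 thousand per SCM I/O, which is why software overhead that was once negligible now matters.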

SCM is expensive compared to hard disks, so the challenge for data centers is to keep these devices busy enough to justify their cost. That means vastly more CPUs. Hard disks can’t simply be swapped for SCM, because doing so just moves the performance problem elsewhere in the system. The entire system needs to be re-engineered.
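The "vastly more CPUs" point can be sketched with simple arithmetic: the number of cores needed to saturate a device is its I/O rate multiplied by the CPU time spent on each request. The 50 microseconds of CPU work per request below is a hypothetical value picked for illustration, not a measurement from the article, and `cores_to_saturate` is an illustrative name.

```python
def cores_to_saturate(device_iops, cpu_us_per_io):
    """Cores needed to keep a device busy, given CPU microseconds per request.

    cpu_us_per_io is a hypothetical per-request software cost.
    """
    return device_iops * cpu_us_per_io / 1_000_000


CPU_US_PER_IO = 50  # assumed software overhead per request, in microseconds

disk_cores = cores_to_saturate(100, CPU_US_PER_IO)      # spinning disk, ~100 IOPS
scm_cores = cores_to_saturate(100_000, CPU_US_PER_IO)   # PCIe SCM, ~100K IOPS

print(f"disk needs {disk_cores} cores, SCM needs {scm_cores} cores")
```

With that assumed overhead, a hard disk keeps only a tiny fraction of one core occupied, while a single SCM device would need about five cores working full time just to issue requests fast enough.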

1. The age-old assumption that I/O is slow and computation is fast is no longer true: this invalidates decades of design decisions that are deeply embedded in today’s systems.

2. The relative performance of layers in systems has changed by a factor of a thousand times over a very short time: this requires rapid adaptation throughout the systems software stack.

3. Piles of existing enterprise datacenter infrastructure—hardware and software—are about to become useless (or, at least, very inefficient): SCMs require rethinking the compute/storage balance and architecture from the ground up.