AWS, Google Cloud and Azure offer disaggregated accelerators to their IaaS cloud customers. What does this mean and why is it important?
To review, an instance type describes a group of identical physical servers. A type defines the basic configuration and regional availability for a class of virtual servers at a cloud provider. It also specifies non-configurable specifications, such as processor vendor, brand, model, speed and maximum available cores.
Instance type sizes define configurable specifications, such as the exact regional availability and pricing for rentable virtual machines (VMs). Each running VM is called an instance, which is where all this cloud naming complexity began.
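The type/size split above can be sketched as a simple data model. The names and figures below are illustrative placeholders, not any provider's actual catalog or pricing:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InstanceType:
    # Non-configurable specifications, fixed for the whole type
    name: str
    processor: str      # vendor, brand and model
    max_cores: int

@dataclass(frozen=True)
class InstanceSize:
    # Configurable specifications, chosen per size
    instance_type: InstanceType
    size_name: str
    vcpus: int
    memory_gib: int
    regions: tuple      # where this size is rentable
    hourly_price_usd: float

# Hypothetical example: one type offered in two sizes
c5 = InstanceType(name="c5", processor="Intel Xeon Platinum", max_cores=36)
c5_large = InstanceSize(c5, "c5.large", vcpus=2, memory_gib=4,
                        regions=("us-east-1", "eu-west-1"),
                        hourly_price_usd=0.085)
c5_xlarge = InstanceSize(c5, "c5.xlarge", vcpus=4, memory_gib=8,
                         regions=("us-east-1",),
                         hourly_price_usd=0.17)
```

Every size of a type inherits the type's fixed hardware specifications; only the per-size fields (vCPUs, memory, regions, price) vary.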
Some cloud instance types include dedicated accelerators. Dedicated accelerators are compute acceleration chips, such as GPUs and FPGAs, that have a hardwired connection to the type’s processors.
That means each dedicated accelerator type offered to customers must have specific accelerator hardware installed in each regional cloud data center where the type is offered. The hardware capital expense for the accelerator chips used in a dedicated accelerator type is permanently tied to the rest of that server’s hardware expense.
Dedicated accelerator sizes are priced based on the model and number of accelerator chips included.
Liftr Insights uses the term “disaggregated” to describe connecting at runtime a processor-only instance running in one physical server to an accelerator-only instance running in a different physical server. The two instances are connected at runtime using a cloud data center’s network.
Separating processor from accelerators over a network gives AWS and GCP a different economic basis to provide compute acceleration to customers. It decouples new processor innovation and generational buying from new accelerator innovation. So, AWS and GCP can offer the latest GPU or custom accelerator products to cloud customers without tying those purchases to specific processors.
By definition, disaggregation lets AWS and GCP offer customers considerable flexibility in mixing and matching processor-only and disaggregated acceleration types. However, disaggregation also has downsides.
Clouds offering accelerator disaggregation must write network drivers to dynamically bind each model of accelerator to each type of processor-only server. In practice, AWS and GCP currently bind disaggregated accelerators only to Intel Xeon processors, which reduces the complexity of writing and maintaining those drivers. But that limits customer choice of processor-only instances as AMD EPYC-based instance types become more popular.
Disaggregated accelerators also have different performance characteristics from dedicated accelerators. For workloads that depend on tight coupling and fast response times between processor and accelerator to get better performance, network-induced communications latencies to and from a disaggregated accelerator may be a deal-breaker.
Alibaba Cloud does not yet offer disaggregated accelerator instance types. Azure has previewed five “Brainwave” neural network model accelerator instance types, but they are more PaaS than IaaS, as they run fixed models. AWS recently expanded the number of types it offers in its Elastic Inference service, currently powered by unspecified GPUs. AWS also pre-announced plans to launch its own Inferentia deep learning inferencing chip into the service by the end of 2019.
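With AWS Elastic Inference, the runtime binding described above is requested when the processor-only instance is launched. A minimal sketch of the launch parameters follows, assuming boto3-style naming; the AMI ID and accelerator size are illustrative placeholders:

```python
# Sketch: requesting a disaggregated accelerator at launch time with
# AWS Elastic Inference. The AMI ID and sizes here are placeholders.
launch_params = {
    "ImageId": "ami-0123456789abcdef0",   # placeholder AMI
    "InstanceType": "c5.large",           # processor-only instance
    "MinCount": 1,
    "MaxCount": 1,
    # The accelerator lives in a different physical server and is
    # bound to this instance over the data center network at runtime.
    "ElasticInferenceAccelerators": [
        {"Type": "eia1.medium"}
    ],
}

# A real launch would pass these parameters to boto3:
#   import boto3
#   ec2 = boto3.client("ec2")
#   ec2.run_instances(**launch_params)
```

Note that the processor instance type and the accelerator type are specified independently, which is the economic decoupling discussed earlier.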
Liftr Insights tracks processor-only, dedicated accelerator and disaggregated accelerator instance types and sizes at the top public clouds worldwide.