An anonymous reader quotes a report from Ars Technica: On Thursday, an Amazon AWS blog post announced that the company has moved most of the cloud processing for its Alexa personal assistant off of Nvidia GPUs and onto its own Inferentia Application Specific Integrated Circuit (ASIC). Amazon dev Sebastien Stormacq describes the Inferentia’s hardware design as follows: “AWS Inferentia is a custom chip, built by AWS, to accelerate machine learning inference workloads and optimize their cost. Each AWS Inferentia chip contains four NeuronCores. Each NeuronCore implements a high-performance systolic array matrix multiply engine, which massively speeds up typical deep learning operations such as convolution and transformers. NeuronCores are also equipped with a large on-chip cache, which helps cut down on external memory accesses, dramatically reducing latency and increasing throughput.” When an Amazon customer (usually someone who owns an Echo or Echo Dot) makes use of the Alexa personal assistant, very little of the processing is done on the device itself. […] According to Stormacq, shifting this inference workload from Nvidia GPU hardware to Amazon’s own Inferentia chip resulted in 30-percent lower cost and a 25-percent improvement in end-to-end latency on Alexa’s text-to-speech workloads. Amazon isn’t the only company using the Inferentia processor: the chip powers Amazon AWS Inf1 instances, which are available to the general public and compete with Amazon’s GPU-powered G4 instances. Amazon’s AWS Neuron software development kit allows machine-learning developers to use Inferentia as a target for popular frameworks, including TensorFlow, PyTorch, and MXNet.
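
For readers unfamiliar with the term, the "systolic array matrix multiply engine" in the quote can be illustrated with a small simulation. The sketch below is a generic textbook output-stationary systolic array written in plain Python; it is not Inferentia's actual design, and the function name systolic_matmul and its cycle count are assumptions made purely for illustration. Each cell of the grid holds one running sum, while values of A move one cell to the right and values of B move one cell down on every cycle.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a @ b.

    Hypothetical illustration only: a generic textbook scheme, not a
    description of any real chip's internals.
    """
    n, k_dim, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]    # one accumulator per processing element
    a_reg = [[0] * m for _ in range(n)]  # A values travelling left to right
    b_reg = [[0] * m for _ in range(n)]  # B values travelling top to bottom
    cycles = n + m + k_dim - 2           # enough cycles for the last operands to arrive

    for t in range(cycles):
        # Walk the grid from bottom-right to top-left so each cell reads its
        # neighbour's value from the previous cycle before it is overwritten.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j == 0:
                    # Left edge: row i of A is fed in, delayed by i cycles.
                    k = t - i
                    a_in = a[i][k] if 0 <= k < k_dim else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    # Top edge: column j of B is fed in, delayed by j cycles.
                    k = t - j
                    b_in = b[k][j] if 0 <= k < k_dim else 0
                else:
                    b_in = b_reg[i - 1][j]
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
                acc[i][j] += a_in * b_in  # multiply-accumulate in place
    return acc, cycles


if __name__ == "__main__":
    result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(result, cycles)  # [[19, 22], [43, 50]] 4
```

The point of the arrangement is that each operand is fetched once at the edge of the grid and then reused by every cell it passes through, which is the same reason the quote gives for the on-chip cache: fewer trips to external memory.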

Read more of this story at Slashdot.
