Three years ago, at the SC14 supercomputing conference, I worked with a team that demonstrated portable supercomputing at the edge by exploiting mobile processors and GPUs at massive scale; scaling them efficiently, however, requires low-latency communication among the GPUs. This year, at Supercomputing 2017 in Denver, I followed that up with a demonstration for the U.S. Naval Research Laboratory, together with the Silicon Valley startup PointR Data. In the new demo, we showed supercomputing capability with 12 GPUs (NVIDIA Tegra X2) connected through a PCIe fabric, delivering 18 teraflops of half-precision (FP16) performance along with a full software stack for low-latency AI inference. All of that compute horsepower fit within a power envelope of 110 watts.
The PointR Data appliance is designed as an inference engine built from commercial off-the-shelf (COTS) mobile GPUs connected via PCIe. PCIe provides the high-speed, low-latency communication needed to sustain the system's 18 teraflops under heavy data traffic. For AI workloads, that is roughly equivalent to 21 conventional servers, or about half a server rack.
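That server comparison is easy to sanity-check from the article's own numbers. The sketch below backs out the implied per-server throughput; note that the per-server value is derived here for illustration, not a quoted specification:

```python
# Implied per-server FP16 throughput behind the "21 servers" comparison.
# Only the 18-teraflop aggregate figure is quoted above; the per-server
# number below is derived, not a vendor specification.
AGGREGATE_TFLOPS = 18.0
EQUIVALENT_SERVERS = 21

tflops_per_server = AGGREGATE_TFLOPS / EQUIVALENT_SERVERS
print(f"~{tflops_per_server:.2f} FP16 TFLOPS per conventional server")  # ~0.86
```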
PointR Data’s proprietary Minnow AI framework executed a cascade of AI models running on a mix of the Caffe deep learning framework and the TensorFlow open-source library.
As PointR Data CTO Burcak Beser explained, the appliance is “achieving one of the highest teraflops per wattage, in a form factor that can be carried in the overhead bin of an airplane.” The platform is designed for portable, battery-driven applications. At under 10 watts per teraflop, image processing becomes practical in cars, airplanes, boats, and other settings where space and power are at a premium.
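That efficiency claim can be checked with the figures already quoted above (18 FP16 teraflops within a 110-watt envelope); a minimal arithmetic sketch:

```python
# Back-of-the-envelope efficiency check using the figures quoted in
# this article: 18 TFLOPS of FP16 within a 110-watt power envelope.
PEAK_TFLOPS_FP16 = 18.0   # aggregate half-precision throughput
POWER_WATTS = 110.0       # total power envelope of the appliance

watts_per_teraflop = POWER_WATTS / PEAK_TFLOPS_FP16
print(f"{watts_per_teraflop:.1f} W per FP16 teraflop")  # ~6.1, well under 10
```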
One inferencing demonstration used high-definition images of people on the show floor, who were “fingerprinted” to generate a unique identifier (a hash code) for the face, torso, and lower body of each person in the crowded exhibition area. By distributing the images across the GPU fabric, we analyzed 2,000 images and generated fingerprints for every individual detected.
This let us verify the system’s scalability under heavy data traffic: inference latency never degraded, staying under 300 milliseconds even as images of all subjects were processed in parallel.
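To make that fan-out-and-time pattern concrete, here is a minimal Python sketch. It is not PointR Data’s Minnow code: the process pool stands in for the 12-GPU fabric, and the region split and hash-based fingerprint are hypothetical placeholders for real detection and embedding models.

```python
import hashlib
import time
from concurrent.futures import ProcessPoolExecutor

NUM_WORKERS = 12  # stands in for the 12-GPU fabric; the mapping is illustrative

def fingerprint(region_pixels: bytes) -> str:
    """Hypothetical stand-in: derive a stable hash-code identifier
    from the pixels of one body region (face, torso, or lower body)."""
    return hashlib.sha256(region_pixels).hexdigest()[:16]

def process_image(image: bytes) -> dict:
    """Run the fingerprint cascade for one image and time it. A real
    deployment would run GPU model inference here; this stub only
    illustrates the per-image latency accounting."""
    start = time.monotonic()
    # Placeholder "detection": split the frame into three fixed regions.
    third = max(1, len(image) // 3)
    regions = {"face": image[:third],
               "torso": image[third:2 * third],
               "lower_body": image[2 * third:]}
    ids = {name: fingerprint(pixels) for name, pixels in regions.items()}
    latency_ms = (time.monotonic() - start) * 1000.0
    return {"ids": ids, "latency_ms": latency_ms}

if __name__ == "__main__":
    images = [bytes([i % 256]) * 3000 for i in range(2000)]  # dummy frames
    # Fan the 2,000 images out across workers, one pool slot per "GPU".
    with ProcessPoolExecutor(max_workers=NUM_WORKERS) as pool:
        results = list(pool.map(process_image, images))
    worst = max(r["latency_ms"] for r in results)
    print(f"processed {len(results)} images, worst latency {worst:.1f} ms")
```

The design point this illustrates is the one the demo relied on: each image is an independent unit of work, so fingerprinting parallelizes cleanly across the fabric and per-image latency can be bounded rather than growing with the batch.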
“The PointR solution comes with low-latency inferencing capability at the edge and with flexibility of executing multiple AI models concurrently or switching them in near real time on a need basis,” explained Gabriel Sidhom, VP Technology and Development at Orange Silicon Valley. “This will enable multi-mission capabilities for embedded supercomputers. We see this as an evolution for supercomputing at the edge with high energy efficiency and agility for any specific mission.”
That leap forward was our original goal when we started work on our early prototype in 2014. The idea was a multi-mission-capable, programmable AI supercomputer built out of COTS embedded GPUs that can be deployed anywhere, combining extreme energy efficiency with on-the-fly programmability. Users can train new AI models and deploy them “live” to a portable field supercomputer, reconfiguring mission parameters in real time. We see this as a definitive proof of concept for AI (super)computing at the edge.
Editor’s note: To read more about what Orange Silicon Valley was doing at Supercomputing 2017 in Denver, see our most recent Medium post about on-demand supercomputing.