Nvidia Tesla P4 And P40 GPUs Boost Deep Learning Inference Performance
Two new GPUs for deep learning have just been announced, the Nvidia Tesla P4 and P40. They succeed the Tesla M4 and M40, respectively, but are more powerful. The Pascal-based Tesla P100 added support for 16-bit (FP16) precision, while the Tesla P4 and P40 go further with 8-bit INT8 support, as researchers have learned that deep learning inference does not need especially high precision once a network has been trained.
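To make the INT8 idea concrete, here is a minimal sketch of symmetric 8-bit weight quantization in Python with NumPy. This is a generic illustration of the reduced-precision principle, not Nvidia's actual TensorRT calibration pipeline; the function names are our own.

```python
import numpy as np

def quantize_int8(weights):
    """Map FP32 weights onto the signed 8-bit range [-127, 127].
    One scale factor per tensor (hypothetical, simplified scheme)."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an FP32 approximation of the original weights."""
    return q.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(weights)
error = np.max(np.abs(weights - dequantize(q, scale)))
print(f"max round-trip error: {error:.4f}")  # small relative to the weight range
```

The round-trip error is tiny compared with the spread of the weights, which is why a trained network can usually tolerate running its multiply-accumulate operations at 8 bits.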
Of the two new GPUs, the Nvidia Tesla P4 is the lower-end part. It is aimed at scale-out servers that need high-efficiency GPUs. The Tesla P4 draws between 50W and 75W and peaks at 5.5 FP32 TeraFLOP/s and 21.8 INT8 TOP/s. In Nvidia's AlexNet image-processing benchmark, the Tesla P4 is claimed to be 40 times more energy-efficient than an Intel Xeon E5 CPU.
The Tesla P40's performance gains come from the Pascal architecture as well as the shift from the 28nm planar process to the 16nm FinFET process. Nvidia claims this GPU is 4x faster than the previous-generation Tesla M40. It peaks at 12 FP32 TeraFLOP/s and 47 INT8 TOP/s, roughly twice the throughput of the Tesla P4, with a maximum power draw of 250W.
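A quick back-of-the-envelope calculation with the figures quoted above shows the trade-off between the two cards: the P40 offers roughly twice the raw INT8 throughput, while the P4 delivers more throughput per watt. The TDPs used here are board power limits, so sustained real-world efficiency will differ.

```python
# Efficiency comparison from the announced spec figures.
p4_int8_tops, p4_watts = 21.8, 75.0     # Tesla P4 at its upper 75W limit
p40_int8_tops, p40_watts = 47.0, 250.0  # Tesla P40

print(f"P4:  {p4_int8_tops / p4_watts:.2f} INT8 TOP/s per watt")    # ~0.29
print(f"P40: {p40_int8_tops / p40_watts:.2f} INT8 TOP/s per watt")  # ~0.19
print(f"P40 vs P4 raw throughput: {p40_int8_tops / p4_int8_tops:.1f}x")  # ~2.2x
```

That split is consistent with the positioning: the P4 for dense, power-constrained scale-out servers, and the P40 where absolute throughput per card matters most.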
So far, Nvidia has been comparing its products to Intel's general-purpose CPUs, but Intel's main deep learning product is the Xeon Phi line, a many-core design built from modified Atom cores. We have yet to see how the new GPUs stack up against it, though we can imagine Nvidia would come out on top in that comparison as well, given the advantages GPUs hold over many-core CPUs for this kind of workload.
Still, as far as comparisons go, a more realistic matchup would pit the Xeon Phi against Nvidia's GPUs.