Inside Nvidia Pascal GP100 Silicon: GPU Features 3,840 CUDA Cores, 4096-bit Wide HBM2 Memory
Unveiled at GTC 2016, the Tesla P100 is Nvidia’s latest compute monster, powered by its fastest GPU yet: the Pascal GP100. The Tesla P100 features a slightly cut-down version of the GP100 GPU, delivering 5.3 TFLOPS of 64-bit (double-precision) floating-point performance, 10.6 TFLOPS at 32-bit, and 21.2 TFLOPS at 16-bit.
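Those three throughput figures fall straight out of the chip's configuration. As a rough sanity check (assuming the P100's 3,584 enabled CUDA cores, the 1,480 MHz boost clock quoted later in this piece, and one fused multiply-add, i.e. 2 FLOPs, per core per cycle):

```python
# Sanity check of the Tesla P100 throughput figures.
# Assumptions: 3,584 enabled CUDA cores (P100 ships a cut-down GP100),
# 1,480 MHz boost clock, and one FMA (2 FLOPs) per core per cycle.
CUDA_CORES = 3584
BOOST_CLOCK_HZ = 1480e6

fp32_tflops = CUDA_CORES * BOOST_CLOCK_HZ * 2 / 1e12  # one FMA per FP32 lane
fp64_tflops = fp32_tflops / 2   # GP100 pairs one FP64 unit with every two FP32 cores
fp16_tflops = fp32_tflops * 2   # two packed FP16 operations per FP32 lane

print(f"FP32: {fp32_tflops:.1f} TFLOPS")  # 10.6
print(f"FP64: {fp64_tflops:.1f} TFLOPS")  # 5.3
print(f"FP16: {fp16_tflops:.1f} TFLOPS")  # 21.2
```

The 1:2:4 ratio across FP64, FP32, and FP16 reflects GP100's dedicated double-precision units and its packed half-precision math.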
From what we know so far, the Pascal GP100 features 15.3 billion transistors on a 610mm² 16nm die, paired with 16GB of HBM2 memory. It has 4MB of L2 cache and a 14MB register file. Nvidia claims the GPU delivers 65 percent higher speed, roughly twice the transistor density, and 70 percent lower power than the previous 28HPM process.
Recently, the folks over at TechPowerUp got their hands on the block diagram of the Pascal GP100 silicon. According to the details shared, the GP100 is a multi-chip module comprised of a large GPU die, four HBM2 memory stacks, and a silicon interposer acting as the substrate for the GPU and memory stacks.
Design-wise, the GP100 follows the top-level hierarchy of other Nvidia GPUs, with the addition of two key interfaces: bus and memory. This next-gen GPU has six graphics processing clusters (GPCs), each featuring 10 streaming multiprocessors (SMs). Each SM holds 64 CUDA cores, which means each GPC holds a total of 640 CUDA cores, and the entire GP100 chip packs 3,840 CUDA cores.
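The core-count hierarchy described above multiplies out cleanly:

```python
# GP100's CUDA core hierarchy: GPCs -> SMs -> CUDA cores.
GPCS = 6
SMS_PER_GPC = 10
CORES_PER_SM = 64

cores_per_gpc = SMS_PER_GPC * CORES_PER_SM  # cores in one GPC
total_cores = GPCS * cores_per_gpc          # full GP100 die

print(cores_per_gpc)  # 640
print(total_cores)    # 3840
```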
As for clock speeds, the Tesla P100 runs at an impressively high 1,328 MHz core clock, with a GPU boost frequency of 1,480 MHz, within a TDP of 300W. The GP100 accesses its HBM2 memory across a 4096-bit wide interface; the P100 is rated at 720GB/s of memory bandwidth, while the HBM2 standard scales up to 1TB/s.
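The bandwidth figures follow from the bus width and the per-pin data rate. A quick sketch (assuming roughly 1.4 Gbps per pin for the P100's shipping configuration and HBM2's 2 Gbps per-pin ceiling):

```python
# HBM2 bandwidth = bus width (bits) x per-pin data rate (Gbps) / 8 bits-per-byte.
BUS_WIDTH_BITS = 4096

def bandwidth_gb_s(pin_rate_gbps):
    """Aggregate bandwidth in GB/s across the full 4096-bit interface."""
    return BUS_WIDTH_BITS * pin_rate_gbps / 8

print(bandwidth_gb_s(1.4))  # ~716.8 GB/s, close to the P100's rated 720GB/s
print(bandwidth_gb_s(2.0))  # 1024.0 GB/s, HBM2's ~1TB/s ceiling
```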
In addition to this, the Pascal GP100 GPU also integrates NVLink, Nvidia’s new high-speed, high-bandwidth interconnect for maximum application scalability, which Nvidia says delivers 5x the bandwidth of today’s best-in-class solution.
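The 5x figure lines up with the per-link math, assuming the comparison Nvidia typically draws against PCIe 3.0 x16: each NVLink 1.0 link carries 20GB/s in each direction, and the P100 exposes four links.

```python
# NVLink 1.0 vs PCIe 3.0 x16, aggregate bidirectional bandwidth in GB/s.
# Assumption: comparison baseline is PCIe 3.0 x16 (~16 GB/s per direction).
NVLINK_LINKS = 4
GB_S_PER_LINK = 40    # 20 GB/s each direction per link
PCIE3_X16_GB_S = 32   # ~16 GB/s each direction

nvlink_total = NVLINK_LINKS * GB_S_PER_LINK  # 160 GB/s
speedup = nvlink_total / PCIE3_X16_GB_S

print(nvlink_total)  # 160
print(speedup)       # 5.0
```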
“Up to eight Tesla P100 GPUs can be interconnected with NVLink to maximize application performance in a single node, and IBM has implemented NVLink on its POWER8 CPUs for fast CPU-to-GPU communication,” Nvidia revealed during their GTC 2016 Keynote.
The Nvidia Tesla P100 GPU is aimed at hyperscale data center workloads crunching deep-learning AI and HPC applications. Servers featuring the chip are set to hit the shelves in Q1 2017.
Gohar is the lead editor at TechFrag. He has a wide range of interests when it comes to tech but he's currently spending a big chunk of his time writing about privacy, cyber security, and anything policy related.