A closer look at Intel’s 72-core x86 Knights Landing chip and the race to exascale computing

On Tuesday, Intel revealed Knights Landing, its next-gen up-to-72-core Xeon Phi supercomputing chip. The main change is that Knights Landing will be a standalone processor, rather than a slot-in coprocessor that must be paired with a standard Xeon CPU.

Furthermore, Knights Landing will have up to 16GB of 3D-stacked DRAM on-package, providing up to 500GB/sec of memory bandwidth (along with up to 384GB of DDR4-2400 mainboard memory). Knights Landing will debut in 2015 on Intel’s 14nm process, and with a promise of 3 teraflops (double precision) per socket it will almost certainly be used to build some monster 100+ petaflop x86 supercomputers, and eventually to push toward exascale.

The current version of Xeon Phi (Knights Corner) is a PCIe expansion board with an up-to-61-core Intel MIC (Many Integrated Core) chip. These cores are based on the original P54C Pentium core, just like those of the stillborn Larrabee before it, but with a lot of modern additions, such as 64-bit support and 512-bit vector registers.

Knights Landing is a major revision of Knights Corner, making sweeping changes across almost the entire platform. Gone are the P54C cores, replaced with up to 72 out-of-order Silvermont (Atom) cores. These new cores will implement AVX-512 (AVX 3.1 instructions).

Perhaps most importantly, though, Knights Landing will be a standalone CPU, with an integrated six-channel DDR4-2400 memory controller, up to 16GB of on-package 3D stacked RAM, and 36 PCIe 3.0 lanes.
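
To put the two memory tiers in perspective, here is a rough back-of-the-envelope sketch of the peak bandwidth of the six-channel DDR4-2400 controller compared with the quoted 500GB/sec for the on-package RAM. It assumes the standard 64-bit (8-byte) DDR4 channel width and theoretical peak transfer rates; the function name is purely illustrative, and real sustained bandwidth will be lower.

```python
def ddr_peak_bandwidth_gbs(channels, megatransfers_per_sec, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/sec for a DDR memory controller.

    Assumes a standard 64-bit data bus per channel (an assumption, not a
    figure from Intel's announcement).
    """
    bytes_per_transfer = bus_width_bits // 8
    # MT/s * bytes per transfer = MB/s per channel; divide by 1000 for GB/s.
    return channels * megatransfers_per_sec * bytes_per_transfer / 1000


# Figures quoted in the article.
ddr4_gbs = ddr_peak_bandwidth_gbs(channels=6, megatransfers_per_sec=2400)
on_package_gbs = 500

ratio = on_package_gbs / ddr4_gbs
print(f"DDR4 peak: {ddr4_gbs:.1f} GB/sec")
print(f"On-package RAM is roughly {ratio:.1f}x faster")
```

Under those assumptions, the mainboard DDR4 tops out at a little over 115GB/sec, so the on-package memory offers more than four times the bandwidth.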

All of these changes equate to a theoretical performance of 6 teraflops of single precision math, or 3 teraflops of double precision math. By comparison, Haswell maxes out at around 500 gigaflops of double precision math.

Power-wise, Knights Landing should manage between 14 and 16 gigaflops per watt. It’s a rough comparison at this early stage, since one figure is for a chip and the other for a whole system, but the most efficient supercomputers currently max out at around 4 gigaflops per watt. With 16GB of on-package RAM boasting a bandwidth of 500GB/sec, there should be significant latency gains, too.
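
As a quick sanity check on these numbers, the sketch below inverts the efficiency figure to estimate what it implies for per-socket power draw. This is not a figure Intel has stated: it assumes the 14 to 16 gigaflops per watt refers to the 3-teraflop double precision peak, and the variable names are illustrative only.

```python
# Figures quoted in the article.
PEAK_DP_GFLOPS = 3000            # 3 teraflops double precision per socket
EFFICIENCY_GFLOPS_PER_WATT = (14, 16)
BEST_SUPERCOMPUTER_GFLOPS_PER_WATT = 4
HASWELL_DP_GFLOPS = 500

# Implied socket power, ASSUMING the efficiency figure applies to DP peak.
implied_watts = [PEAK_DP_GFLOPS / eff for eff in EFFICIENCY_GFLOPS_PER_WATT]
max_watts, min_watts = implied_watts  # lower efficiency -> higher power

# How far ahead of today's most efficient systems, and of Haswell.
efficiency_gain = [eff / BEST_SUPERCOMPUTER_GFLOPS_PER_WATT
                   for eff in EFFICIENCY_GFLOPS_PER_WATT]
haswell_speedup = PEAK_DP_GFLOPS / HASWELL_DP_GFLOPS

print(f"Implied socket power: {min_watts:.1f} to {max_watts:.1f} W")
print(f"Efficiency gain: {efficiency_gain[0]:.1f}x to {efficiency_gain[1]:.1f}x")
print(f"Speedup over Haswell (DP): {haswell_speedup:.0f}x")
```

If that assumption holds, the chip would land somewhere around 190 to 215 watts per socket, at 3.5 to 4 times the efficiency of today’s greenest supercomputers.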

Later in 2015, there will also be a special version called Knights Landing-F that integrates a 100Gbps Cray HPC interconnect on 32 of those PCIe 3.0 lanes, allowing supercomputer makers to connect up Knights Landing chips via standard QSFP optical links.

Xeon Phi competes directly with Tesla, Nvidia’s GPU-based coprocessor add-in boards. Tesla currently dominates the HPC accelerator/coprocessor market, appearing in 38 of the top 500 supercomputers. Xeon Phi is a major component of the world’s most powerful supercomputer (Tianhe-2), but adoption is generally lower (just 13 of the top 500). By becoming an actual CPU, rather than an add-in card that must be controlled by a “normal” CPU (Haswell, Opteron, etc.), Knights Landing will make it possible to build supercomputers entirely out of Xeon Phi. That is a huge change: it will reduce the complexity and cost of building supercomputers and, thanks to the unified architecture, make it a lot easier to write software that takes full advantage of the hardware.

At 3 teraflops per socket, assuming four sockets per 1U server, we’re looking at roughly 500 teraflops (half a petaflop) of peak double precision performance in a single 42U rack (3 × 4 × 42 = 504 teraflops). If the 100 petaflops barrier hasn’t been broken by 2015, it will almost certainly be a Knights Landing-based supercomputer that does it first, and the chip should be a serious contender in the race to exascale (1000+ petaflops) computing.
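
The rack arithmetic is easy to extend to the 100 petaflop and exascale targets. The following is a minimal sketch using only the figures above; it assumes every one of the 42 rack units holds a four-socket compute server (no space for switches, storage or power gear) and it counts theoretical peak rather than sustained performance, so real machines would need more racks.

```python
import math

# Figures from the article.
TFLOPS_PER_SOCKET = 3      # double precision peak
SOCKETS_PER_1U = 4         # the article's assumption
UNITS_PER_RACK = 42        # ASSUMES every unit is a compute server

sockets_per_rack = SOCKETS_PER_1U * UNITS_PER_RACK
tflops_per_rack = TFLOPS_PER_SOCKET * sockets_per_rack


def racks_needed(target_petaflops):
    """Racks required to reach a peak target, rounded up to whole racks."""
    target_tflops = target_petaflops * 1000
    return math.ceil(target_tflops / tflops_per_rack)


racks_100pf = racks_needed(100)
racks_exascale = racks_needed(1000)

print(f"Per rack: {sockets_per_rack} sockets, {tflops_per_rack} teraflops")
print(f"100 petaflops: {racks_100pf} racks, "
      f"{racks_100pf * sockets_per_rack} sockets")
print(f"Exascale: {racks_exascale} racks, "
      f"{racks_exascale * sockets_per_rack} sockets")
```

On these idealized numbers, 100 petaflops takes about 200 racks and exascale takes nearly 2,000, which gives a sense of how much further per-socket performance and efficiency will have to improve.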

Image Credit: VR-Zone