Starting with Sandy Bridge, Intel made next-generation vectorisation capabilities a priority. Sandy Bridge (SNB) introduced the AVX instruction set for floating-point operations and widened the CPU’s vector registers to 256 bits, up from SSE’s 128.
AVX2, which just launched with Haswell, extended 256-bit operation to integers as well. AVX also introduced a non-destructive three-operand form, which allows operations like C = A + B rather than SSE’s two-operand A = A + B (thus preserving the original value of “A”). SIMD (single instruction, multiple data) extensions like these are a major part of boosting total processor efficiency and instructions per clock cycle (IPC).
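To make the distinction concrete, here is an illustrative sketch in plain C (not actual AVX intrinsics) of what the two operand forms mean for a vector add. A 256-bit AVX register holds eight 32-bit floats, so one vector instruction operates on eight lanes at once; the function names and `LANES` constant are our own illustration, not Intel API names:

```c
#include <stddef.h>

#define LANES 8  /* 256 bits / 32-bit float = 8 lanes per AVX register */

/* SSE-style destructive form: A = A + B, overwriting the first source. */
void vec_add_destructive(float a[LANES], const float b[LANES]) {
    for (size_t i = 0; i < LANES; i++)
        a[i] += b[i];            /* original contents of A are lost */
}

/* AVX-style non-destructive form: C = A + B, both sources preserved. */
void vec_add_nondestructive(float c[LANES],
                            const float a[LANES], const float b[LANES]) {
    for (size_t i = 0; i < LANES; i++)
        c[i] = a[i] + b[i];      /* A and B remain intact for reuse */
}
```

In real code the non-destructive form matters because it saves the register-to-register copies a compiler would otherwise emit to preserve an input that is still needed later.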
Intel has just announced a future version of AVX, dubbed AVX-512. As the name implies, AVX-512 widens vector registers out to 512 bits. Whether or not this constitutes an actual widening of anything, however, depends on the product in question.
Intel says the first product to support AVX-512 will be the next-generation Xeon Phi, dubbed Knights Landing. Knights Landing is built on 14nm technology, will integrate on-package memory, will be capable of operating as a standalone processor, and reportedly will break the 3TFLOPS barrier. PCIe 3.0 and DDR4 are also said to be on the menu.
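The 3TFLOPS claim is plausible back-of-the-envelope arithmetic for a 512-bit design. The sketch below shows the standard peak-FLOPS calculation; only the 512-bit vector width comes from the announcement, while the core count, clock speed, and vector-unit count in the usage example are hypothetical placeholders, not disclosed Knights Landing specifications:

```c
/* Peak double-precision throughput for a 512-bit vector machine.
   Only VECTOR_BITS reflects the AVX-512 announcement; everything fed
   into peak_dp_gflops() below is a hypothetical placeholder. */

#define VECTOR_BITS 512
#define DP_BITS     64
#define DP_LANES    (VECTOR_BITS / DP_BITS)  /* 8 doubles per register */
#define FMA_FLOPS   2   /* a fused multiply-add counts as 2 FLOPs per lane */

/* Peak GFLOPS = cores x clock (GHz) x lanes x FLOPs-per-lane x vector units */
double peak_dp_gflops(int cores, double ghz, int vec_units) {
    return cores * ghz * DP_LANES * FMA_FLOPS * vec_units;
}
```

For example, a hypothetical 60-core part at 1.5GHz with two 512-bit FMA units per core would reach 60 × 1.5 × 8 × 2 × 2 = 2,880 GFLOPS, right around the reported 3TFLOPS mark.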
Some of Intel’s remarks imply that AVX-512 automatically implements AVX and AVX2 support as well; the company’s blog post states that AVX-512 instructions can be mixed with AVX instructions without penalty. It’s possible that this won’t apply to Knights Landing – Intel could save full backwards compatibility for a chip like Skylake or Skymont – assuming that the AVX 3.2 instruction set mentioned in the slide below is actually analogous to AVX-512.
Current Xeon Phi processors support 512-bit vectorisation, but through their own instruction set rather than AVX, and Intel has stated in the past that it would unify vector capability across its various products at some point in the future. Presumably that’s going to happen in the 2015 timeframe with these new chips. Unifying on AVX-512 also allows Intel to continue closing the gap between CPU double-precision floating point performance and what might be considered “good enough” relative to discrete GPU offload.
Discrete cards will always have an advantage in this area as far as absolute performance is concerned, but if Intel can continue evolving Core at the present rate, it may blunt the impact of AMD’s HSA initiative. The degree to which ordinary workloads will be able to leverage these benefits is something we’ll have to measure when the time comes, but the impact on high-performance computing (HPC) workloads will be significant.