Intel’s next-generation architecture, codenamed Haswell, isn’t just another “tock” in Intel’s tick/tock cadence; it’s a serious threat to both AMD and Nvidia. For the first time, Intel is poised to challenge both companies in the mainstream graphics market while simultaneously eroding Nvidia’s edge in the GPGPU business. Its low-power, 10 Watt TDP ULV parts will challenge the price/performance ratio of AMD’s second-generation Brazos SoC (codenamed Kabini) as well as any ARM-based Windows 8 notebooks that companies like Qualcomm might bring to market.
Let’s take a look at the architecture, starting with the CPU.
Wider, deeper, faster
Haswell is a logical extension of the micro-architectural improvements Intel first introduced in Sandy Bridge. The new chip adds support for Intel’s second-generation Advanced Vector eXtensions (AVX2) alongside new fused multiply-add (FMA) instructions; the FMA units are what double the core’s peak FPU throughput. L1 and L2 bandwidth have been doubled to ensure the execution units stay fed, and the integer and FPU register files have all been enlarged. Branch prediction efficiency also gets a boost. Haswell’s real-world single-threaded performance in unoptimised code is expected to improve by 10 to 15 per cent. In optimised AVX2 code, the leap will be much larger; AVX2 includes support for integer vectorisation that AVX lacks.
The increased FPU capability and additional AVX2 functionality make a huge difference in Haswell’s floating-point performance. The CPU is capable of up to 32 single-precision and 16 double-precision floating-point operations per clock cycle per core. That’s twice what Sandy Bridge could achieve; a theoretical eight-core Haswell clocked at 3.8GHz would offer a peak of 972.8 gigaflops of SP and 486.4 gigaflops of DP performance. While it’s true that current GPUs exceed these levels, x86 compatibility is one heck of a carrot.
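The peak figures quoted above are simple arithmetic: cores, multiplied by clock speed, multiplied by floating-point operations per cycle per core. The short Python sketch below reproduces them; the function and constant names are ours, and the Sandy Bridge values are simply half the Haswell ones, following the “twice what Sandy Bridge could achieve” statement rather than any separate source.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in gigaflops: cores x clock (GHz) x FLOPs per cycle per core."""
    return cores * clock_ghz * flops_per_cycle


# Per-core, per-cycle figures quoted in the article for Haswell.
HASWELL_SP_PER_CYCLE = 32
HASWELL_DP_PER_CYCLE = 16

# Assumption: Sandy Bridge is exactly half of Haswell, per the article's "twice" claim.
SANDY_SP_PER_CYCLE = HASWELL_SP_PER_CYCLE // 2
SANDY_DP_PER_CYCLE = HASWELL_DP_PER_CYCLE // 2

# The hypothetical eight-core, 3.8GHz part discussed in the text.
eight_core_sp = peak_gflops(8, 3.8, HASWELL_SP_PER_CYCLE)
eight_core_dp = peak_gflops(8, 3.8, HASWELL_DP_PER_CYCLE)

# The same configuration with Sandy Bridge-class per-cycle throughput.
sandy_equiv_sp = peak_gflops(8, 3.8, SANDY_SP_PER_CYCLE)

print(f"Haswell 8C @ 3.8GHz: {eight_core_sp:.1f} SP / {eight_core_dp:.1f} DP gigaflops")
print(f"Sandy Bridge-class equivalent: {sandy_equiv_sp:.1f} SP gigaflops")
```

These are theoretical ceilings; sustained throughput depends on how well the caches can keep the vector units fed.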
Intel’s “good enough” argument sank the big iron RISC vendors of the 1990s and early 2000s, and it’s a real threat to Nvidia’s GPGPU momentum. The chip’s L1/L2 cache bandwidth is vastly increased from current levels; the L1 bus is twice as wide as well. The massive amounts of additional bandwidth are what the chip needs to keep the AVX2 units busy; Haswell should be able to hit a relatively high percentage of its peak theoretical gigaflops rate in real-world scenarios.
While Team Green will likely retain the overall performance advantage, a quad-core Haswell with a 4GHz Turbo mode will offer 256 gigaflops of double-precision floating point (512 gigaflops single-precision). That level of single-precision performance is right in the neighbourhood of Nvidia’s GT 640. Because Nvidia has historically hobbled double-precision performance on consumer cards, quad-core Haswell could well outperform Nvidia’s GTX 680 and possibly pace the GTX 580 in DP operations.
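To see why Nvidia’s consumer cards are vulnerable in double precision, it helps to put rough numbers side by side. The sketch below is illustrative only. The Haswell figures come from the paragraph above, but the GPU single-precision peaks and DP-to-SP ratios are not from this article; they are commonly quoted specifications that we are treating as assumptions, and should be checked against Nvidia’s own documentation.

```python
def cpu_peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical CPU peak: cores x clock (GHz) x FLOPs per cycle per core."""
    return cores * clock_ghz * flops_per_cycle


# Quad-core Haswell at a 4GHz Turbo clock, as described in the article.
haswell_sp = cpu_peak_gflops(4, 4.0, 32)
haswell_dp = cpu_peak_gflops(4, 4.0, 16)

# ASSUMED figures, not from the article: approximate SP peak (gigaflops)
# and the fraction of SP throughput available in double precision.
ASSUMED_GPUS = {
    "GTX 680": {"sp_gflops": 3090.0, "dp_ratio": 1 / 24},
    "GTX 580": {"sp_gflops": 1581.0, "dp_ratio": 1 / 8},
}


def gpu_dp_gflops(spec):
    """Estimated DP peak from an SP peak and a DP:SP throughput ratio."""
    return spec["sp_gflops"] * spec["dp_ratio"]


print(f"Haswell 4C @ 4GHz: {haswell_sp:.0f} SP / {haswell_dp:.0f} DP gigaflops")
for name, spec in ASSUMED_GPUS.items():
    dp = gpu_dp_gflops(spec)
    print(f"{name}: ~{dp:.0f} DP gigaflops, Haswell/GPU = {haswell_dp / dp:.2f}x")
```

Under those assumptions, the CPU’s theoretical DP peak sits above both cards, which is consistent with the claim in the text. Real workloads will not track peak figures exactly on either side.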
Nvidia could win the battle at the high end, only to lose the war at other price points if Intel chooses to make an issue of it. Worse, there’s the fact that every single Nvidia-equipped HPC system comes with an Intel solution by default. Make no mistake, Intel is playing up the potential Xeon Phi connection; three of the company’s IDF seminars addressed vectorisation for both Haswell and Xeon Phi.
Haswell’s GPU turns the screws on Nvidia, AMD
Haswell’s GPU is a tweaked version of the cores currently deployed in Ivy Bridge. What’s really changing is the shader loadout; Intel will offer Haswell in 10, 20, and 40-shader flavours (GT1, GT2, and GT3). The chip will also be offered in variants that include up to 128MB of on-package RAM, a feature that provides the GPU with a small dedicated pool of memory. Intel isn’t talking much about the GPU changes, but the company has stated that the new GT3 configuration offers “up to 2x” the performance of Ivy Bridge’s HD 4000 graphics.
Even a conservative take on that promise spells trouble for AMD and Nvidia. According to figures from Anandtech, Trinity’s GPU is an average of 18 per cent faster than Llano’s across a range of 15 popular titles. Compared to Sandy Bridge, Trinity was almost 80 per cent faster. Against Ivy Bridge, it’s just 20 per cent faster. Given what we know of Haswell’s GPU shader counts and performance targets, it shouldn’t be hard for Intel to deliver a 30 to 50 per cent performance boost in real-world gaming. If it does, Trinity goes from the fastest integrated GPU on the market to an also-ran, and AMD loses the superior graphics hole card it’s been playing since it launched the AMD 780G chipset four years ago.
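The relative-performance argument above can be worked through with a few lines of Python. This is a back-of-the-envelope sketch: it assumes the averaged percentages can be chained multiplicatively, which is a simplification because the figures are averages across game suites. The variable and function names are ours.

```python
# Ratios quoted in the article (Trinity's GPU relative to Intel's).
TRINITY_VS_IVY = 1.20    # Trinity is 20 per cent faster than Ivy Bridge
TRINITY_VS_SANDY = 1.80  # Trinity was almost 80 per cent faster than Sandy Bridge


def haswell_vs_trinity(uplift_over_ivy):
    """Haswell's GPU performance relative to Trinity, given its uplift over Ivy Bridge."""
    return (1.0 + uplift_over_ivy) / TRINITY_VS_IVY


# The 30 to 50 per cent range suggested in the text, plus Intel's "up to 2x" claim.
for uplift in (0.30, 0.50, 1.00):
    ratio = haswell_vs_trinity(uplift)
    print(f"+{uplift:.0%} over Ivy Bridge -> {ratio:.2f}x Trinity")

# Uplift Haswell needs over Ivy Bridge just to draw level with Trinity.
break_even_uplift = TRINITY_VS_IVY - 1.0

# Ivy Bridge's gain over Sandy Bridge implied by the two Trinity comparisons.
implied_ivy_vs_sandy = TRINITY_VS_SANDY / TRINITY_VS_IVY

print(f"Break-even uplift: {break_even_uplift:.0%}")
print(f"Implied Ivy Bridge vs Sandy Bridge: {implied_ivy_vs_sandy:.2f}x")
```

On these numbers, anything beyond a 20 per cent gain over Ivy Bridge puts Haswell ahead, and a 30 to 50 per cent gain would leave it roughly 8 to 25 per cent faster than Trinity.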
Sunnyvale has virtually no room to manoeuvre. The company’s 28nm Kaveri APU, with its next-generation graphics core based on the Radeon HD 7000 series and the new Steamroller CPU, reportedly has yet to tape out. That means it could be the tail end of 2013 before we see the chips, assuming that production goes smoothly. AMD will likely offer a “Trinity 2.0” update to stave off Haswell’s onslaught, but slightly higher clocks aren’t going to be enough to keep Intel from matching AMD’s performance.
Barring a major screw-up, it’ll be Haswell, not Kaveri, that crosses the “good enough for enthusiasts” line first. The 10 Watt parts won’t directly compete with any potential Tegra 4-based tablets; that’s what Bay Trail, Intel’s out-of-order 22nm Atom SoC, is for.
No, Haswell won’t drive AMD out of business or frighten Nvidia into dumping Tesla – but unless Intel completely blows its roadmap, it’ll drive both companies further towards the margins of computing. For AMD, that move is very literal; the company is being forced into the low-end products that Intel doesn’t want to bother with.
For Nvidia, it means scrambling to convince OEMs to allocate space for discrete GPUs at a time when Intel’s marketing dollars and consumer preferences are aligned against it. Enthusiast preferences, Intel’s historically weak driver support, and Nvidia’s own brand recognition will help, but the IT industry is littered with the bones of companies that treated their brand as an unassailable bulwark rather than a sand castle. Enthusiasts who care about performance tend to follow it, wherever it goes.