Intel has officially announced its next-generation product for the high performance computing crowd: Knights Corner, the latest implementation of the company's 'Many Integrated Cores' architecture. We caught up with Intel at the International Supercomputing Conference in Hamburg to find out what's going on.
Intel is justifiably proud of its presence in the small yet lucrative HPC market: back at ISC 2010 Kirk Skaugen, general manager of Intel's Data Centre Group, boasted that 82 per cent of the TOP500 list of supercomputers were based around Intel processors. It was an impressive figure, but one that masks a worrying trend: Intel's chips aren't always doing the heavy lifting.
In recent years, there has been a shift away from the CPU as the number cruncher in a supercomputer. The advent of GPGPU programming technologies such as Nvidia's CUDA, OpenCL, and Microsoft's DirectCompute allows tasks to run on the massively parallel processors of a high-end graphics card, spreading parallel tasks across hundreds, rather than dozens, of processing cores.
With no high-performance graphics expertise to speak of, that's a trend that has Intel worried. Thankfully, it has a plan: Many Integrated Cores, or MIC. Based on the company's exascale computing research and the defunct Larrabee graphics technology programme, MIC promises to put Intel back on top.
Unlike Intel's CPU technologies, which struggle to hit double figures of cores, Intel's upcoming MIC implementation - dubbed Knights Corner - will offer more than 50 processing cores per unit, giving HPC users the kind of massively parallel processing capabilities that would previously have had them turning to graphics chips from Nvidia or AMD.
"We'll be launching on 22 nanometres and with tri-gate technology," Intel's Tony Neal-Graves explained to thinq_ during a briefing session, referring to a technology the company claims allows for the creation of a more efficient '3D' semiconductor. "That means that we're going to be able to leverage cheaper, and faster, and more power-efficient silicon."
Intel's figures are certainly convincing: with a software development platform it launched last year, codenamed Knights Ferry and based on a non-tri-gate 45nm process, the company's partners have been able to push performances of up to 772 gigaFLOPS on a sample Knights Ferry board, while a demonstration system at Intel's ISC booth featuring eight Knights Ferry boards in a single 4U server is able to push 7.4 teraFLOPS.
That demonstration system offers an insight into the direction Intel is taking with the MIC programme: rather than replacing the CPU, Knights Corner is designed to augment it. Delivered as a PCI Express add-in board covered with a large heatsink - and looking for all the world like a high-end graphics card from Nvidia or AMD - Knights Corner is designed to be a drop-in replacement for the GPGPU boards used in today's supercomputers, with one major difference: programmability.
"We're going to bring programming capabilities to end users in a way that they've never been able to do before," claimed Neal-Graves. "We're really focusing on a common programming model that's really well-known in the industry, based around x86, and extending our co-processor - MIC, or Knights Corner - with that same capability. So, the things that you can do today on the CPU you'll be able to do tomorrow with Knights Corner, and you'll have a very familiar programming model with which to do it."
That's a major breakthrough, and one that promises to help push adoption of the MIC architecture in a world which has increasingly looked to GPGPU programming for increased performance on highly parallel tasks. "If you can program a Xeon, you can program a MIC co-processor," Neal-Graves explained. "It uses the same tools, the same compilers, it has the same programming model that exists on x86 today."
Mike Showerman, Technical Program Manager at the NCSA's Innovative Systems Labs, agrees. "MIC offers a sort of fast-path to quickly get codebases up and running on the device, and we're able to continue to use methods of task parallelisation that we've used on shared memory systems for a decade," he explained to thinq_. His lab has had early access to two Knights Ferry-based systems, and has already achieved a great deal.
"It was real easy to do the initial port over to running on the MIC itself," Showerman explained. "Then we were able to take some of the MIC extensions and start to prove some of the performance, and demonstrate that we are able to achieve the specifications with the equipment. We also thought of, and have done, a full astronomy application, based on code which was originally written with accelerator-type platforms in mind so it fits relatively well to this programming method."
For proof of the accessibility of the MIC platform to x86 programmers, Intel's booth at ISC 2011 includes a demonstration of hybrid LU factorisation running on a Knights Ferry platform. "The key thing about this is that it's demonstrating heterogeneity," Neal-Graves enthused, "because you'll have code that's running on the Xeon and code that's running on a MIC, taking advantage of whatever it is that that particular application needs in terms of either the scalar performance that you have with the Xeon or the high parallel aspect of the code that's running on the MIC co-platform."
Another demonstration at Intel's booth, of hybrid computing with SGEMM, shows eighteen lines of code that can run on a Xeon or a MIC untouched. "We take care of it in the tools and in the compilers," Neal-Graves explained, "to make sure that it runs appropriately on the MIC environment."
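For readers unfamiliar with the benchmark, SGEMM is the single-precision general matrix multiply routine from the BLAS library, which computes C = alpha * A * B + beta * C. The sketch below is a plain Python reference version of that operation, offered only to show what is being computed. It is not Intel's eighteen-line demo, which is not reproduced here, and the function name `sgemm_reference` is our own. Python floats are double precision, so this illustrates the arithmetic rather than single-precision performance.

```python
def sgemm_reference(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("columns of a must match rows of b")
    if len(c) != rows or any(len(row) != cols for row in c):
        raise ValueError("c must have the shape of a @ b")
    result = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            # Each output element is an independent dot product, which is
            # why the operation spreads so well across many cores.
            dot = sum(a[i][k] * b[k][j] for k in range(inner))
            out_row.append(alpha * dot + beta * c[i][j])
        result.append(out_row)
    return result


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
print(sgemm_reference(2.0, a, b, 0.5, c))
```

Because no output element depends on any other, the two outer loops can be divided among as many cores as the hardware offers, which is what makes SGEMM a popular showcase for many-core parts.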
Intel's partners are also well represented at the event in Hamburg, with a protein folding application from Forschungszentrum Juelich being of particular interest. "They were able to move this application over to our SDP in three days with very little change to their software," claimed Neal-Graves. "They got it up and running and they were able to demonstrate this application very quickly."
LRZ, another Knights Ferry partner which is demonstrating its TifaMMy matrix multiplication implementation at the event, is an even greater success story. "They were able to get this application up and running in just three hours, and then with tuning work they've been able to achieve 650 gigaFLOPS of performance on their particular application," we were told.
The story from all of Intel's MIC partners seems resolutely upbeat, in fact. Talking to thinq_, SGI's Michael Woodacre claimed that he believes MIC and similar technologies represent the future of computing. "As we, like many others, look towards getting to the exaFLOP level within the decade, it's a real challenge to scale up to that level of performance - we're looking at 500x over the next eight years. Even with Moore's Law giving us more transistors per chip, that's only going to give us about a 40x improvement.
"We're really looking to the MIC technology to give us this additional order-of-magnitude performance increase that we need to be able to deliver systems at this scale. From our experience and early access to the technology, we're pretty convinced and comfortable with the path that Intel is on, enabling us to deliver this extreme scale of performance. It's important, too, to remember it's obviously exciting to scale up to the exaFLOP level, looking at that capability, but it's also important to scale down too, and MIC technology does that by allowing you to get this order-of-magnitude performance density improvement into your local compute room, onto your desk.
"So, from an SGI perspective, yeah, we're very excited by these two elements: ease of use with the familiar programming environment and the compute density that this technology is bringing to the market," he concluded.
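Woodacre's numbers are easy to sanity-check. The sketch below takes his three figures (500x needed, 40x from Moore's Law, eight years) and assumes smooth compound growth across that window, which is our simplification rather than SGI's model. The function names are our own.

```python
import math

YEARS = 8            # Woodacre's time horizon
TARGET_GAIN = 500.0  # improvement he says is needed to reach the exaFLOP level
MOORE_GAIN = 40.0    # improvement he attributes to Moore's Law alone


def annual_rate(total_gain, years):
    """Compound yearly growth factor that yields total_gain after `years`."""
    return total_gain ** (1.0 / years)


def doubling_time(total_gain, years):
    """Years per doubling implied by total_gain over `years`."""
    return years / math.log2(total_gain)


# Factor that has to come from somewhere other than transistor scaling.
shortfall = TARGET_GAIN / MOORE_GAIN

print(f"required growth per year: {annual_rate(TARGET_GAIN, YEARS):.2f}x")
print(f"Moore's Law growth per year: {annual_rate(MOORE_GAIN, YEARS):.2f}x")
print(f"implied doubling time: {doubling_time(MOORE_GAIN, YEARS):.2f} years")
print(f"shortfall to be made up elsewhere: {shortfall:.1f}x")
```

The shortfall works out at 12.5x, which matches the "additional order-of-magnitude" that Woodacre is looking to MIC to deliver.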
While nobody was willing to talk specific release dates, Neal-Graves confirmed that Knights Corner is on track for a release in 2012. Others readily admitted that the technology behind MIC has a way to go. Intel senior engineer James Reinders explained: "We've consistently heard from people using this platform today - which is essentially a prototype, a software development platform - that the auto-vectorisation capabilities of our compiler on MIC are not doing everything that we do in our products for Xeon today. That's something I expect to see a lot of progress on over the next year before we go into product with this," he claimed, admitting that the demos at ISC have required a fair bit of manual tuning using intrinsics to extract the peak performance from the Knights Ferry boards.
Intel's MIC programme is clearly a response to a growing threat, both from established GPGPU vendors like AMD and Nvidia and from relative newcomers to the market like many-core pioneer Tilera and co-processor specialist Adapteva. With an accelerator product of its own pushing high performance on highly parallel tasks, Intel will finally be able to offer a competitive solution.
It's going to have to hurry, however. With AMD promising a unified address space on future implementations of its Fusion architecture, the concept of GPGPU computing is far from dead in the HPC market.
One interesting omission from Intel's booth at ISC this year is Itanium, the company's once-flagship high performance computing architecture. With Intel pushing the capabilities of a Xeon and MIC hybrid, it's hard to see where Itanium fits into the space, suggesting that Intel is finally giving up on what many in the industry see as one of the company's biggest blunders.