Epiphany promises CPU performance breakthrough

While chip giants ARM, Intel, and AMD battle for control of your smartphones and PCs, a small company in Massachusetts called Adapteva is starting a revolution: many-core processors that offer a significant performance boost over anything currently on the market. We chat to its founder Andreas Olofsson to find out what's going on.

Initially, it's easy to dismiss Adapteva's chances of success as slim to none: in markets where a single processor architecture holds overwhelming dominance, such as x86 in mainstream computing and ARM in the world of smartphones, Olofsson has decided to develop an entirely new architecture.

Olofsson readily admits that history is littered with companies who have tried the same approach and failed. "In the mobile space there's MIPS, for example," he explained to thinq_ during our interview. "That's another architecture, which has been around for 25, 30 years now - you know, it's a great architecture, and yet they have very low traction. Everybody - 98 per cent of the market - uses ARM."

To avoid falling into the same trap, Olofsson has a new idea - or, specifically, a variation on an old one. In the early days of personal computing, it was common for a central processor to have a 'math co-processor' chip alongside it - a secondary processor which was designed specifically to carry out floating point arithmetic at speeds significantly faster than the main processor. Intel had its 8087, Motorola its 68881, and AMD the stand-alone 9511.

Over time, however, these chips became integrated into the processor itself - evolving into the high-performance floating-point units, or FPUs, that are a feature of all modern central processors. Olofsson's big idea, he explained, is to bring the days of the math co-processor back - and he promises some major performance-per-watt gains for those who make the step.

"I think there's little room for a new architecture to be the master," he admitted - but went on to explain that his company's aim is to use its new technology, dubbed Epiphany, to aid existing processors, rather than replace them. "I don't think we add that much value that we should replace a main processor.

"The idea is that we solve a specific class of problem, and we need something other that's more general-purpose, that's more flexible, to be the master in the system. So, we can be an augmenting slave to an FPGA, to an Intel processor in a desktop or in a server, or to an ARM processor in a smartphone."

While the flexibility of general-purpose architectures is a positive in terms of doing many tasks well, Olofsson argues that it's a very wasteful approach - even for reduced instruction set architectures like ARM. "The more features you have, the less energy efficient you become, the heavier the machine becomes. So, we basically said 'you can't have that kind of architecture and do very efficient floating-point math' - the kind of math you would do in a RADAR application, or in an image processing application, or in a speech recognition application. So, there you need something special."

That something special, Olofsson claims, is Adapteva's Epiphany architecture. "What people have done in the past is to create hardware accelerators - they've basically put the gates down, and hardened it, and put it on a chip. It does one thing - which is a big problem today, because chips are very expensive, and it becomes very expensive if you want an accelerator for every function," Olofsson claimed. "If you think of a cellphone, for example, if you were to put in an accelerator for audio, for video, for the camera, for speech recognition, you'll be stuck with a very large chipset.

"We took the approach that we want something software programmable, we want something very energy efficient. So, that's kind of how the company started. What we did in terms of the architecture standpoint: we started from scratch."

That's a risky move, but Olofsson claims that from a developer's perspective, the instruction set used in the hardware is significantly less important than in the early days of computing.

"The instruction set architecture is very important if you're running an operating system, like ARM and x86, but for people writing code - not the firmware developers, not the low-level guys that are doing Linux drivers - they don't care about instruction sets. That's what we've found. Everybody writes in C today, or some other high-level language."

That's perhaps the key feature that pushes Epiphany above alternative systems such as GPGPU computing, where a system's graphics processor acts as a massively parallel offload engine using OpenCL, Nvidia's CUDA, or Microsoft's DirectCompute. "GPUs, from Nvidia especially, they have great energy performance compared to x86," Olofsson admitted, "so they sit as a co-processor - and I think they've done very well. Where we're a little bit different is that we're truly C programmable.

"A guy straight out of college who's done a course in C programming can take a program and run it on our machine. There's no new constructs to run - you can take a program with legacy code and run it straight out of the box on our machine, and you can't do that on GPU."
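To give a sense of the kind of code Olofsson is describing, here is a minimal sketch of a floating-point signal-processing kernel in plain C. It is an illustration only: the function name fir_filter and its parameters are made up for this example, it uses no Adapteva SDK or Epiphany-specific calls, and nothing here says how such code would actually be built for or split across Epiphany cores.

```c
#include <assert.h>
#include <stddef.h>

/* Sliding multiply-accumulate, the inner loop of an FIR filter.
 * Computes out[i] = sum over k of taps[k] * in[i + k], with the
 * taps applied in forward order. Writes n_in - n_taps + 1 outputs.
 * Hypothetical example code, not taken from any vendor library. */
void fir_filter(const float *in, size_t n_in,
                const float *taps, size_t n_taps,
                float *out)
{
    assert(n_taps > 0 && n_in >= n_taps);

    size_t n_out = n_in - n_taps + 1;
    for (size_t i = 0; i < n_out; ++i) {
        float acc = 0.0f;
        for (size_t k = 0; k < n_taps; ++k) {
            acc += taps[k] * in[i + k]; /* one multiply-add per tap */
        }
        out[i] = acc;
    }
}
```

Loops like this, dominated by floating-point multiply-adds with no operating-system dependencies, are typical of the radar, image processing, and speech recognition workloads mentioned earlier.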

The big problem, of course, is persuading OEMs to add the chips to their hardware designs - but the small size and high performance of Epiphany co-processors make a convincing argument. A 64-core Epiphany implementation takes up around the same space as a dual-core ARM processor, while offering a significant performance boost for floating-point arithmetic - some 50 times faster than a traditional multi-core system-on-chip design, the company claims.

Each 1GHz Epiphany core is capable of pushing two gigaflops of sustained performance, while the technology's current implementation offers up to 50 gigaflops of performance per watt of energy consumed.

Olofsson has spent the last week showing off his creation to the great and good of the semiconductor world at the ESC Multicore Expo in San Jose, with a 16-core 1GHz Epiphany reference design based on a 65nm process - and which is already capable of pushing 35 gigaflops per watt, despite its somewhat ancient process size.
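As a rough back-of-envelope check on those figures, the arithmetic can be sketched in a few lines of C. The helper names below are invented for this example, and the calculation assumes the quoted per-watt figure applies when every core is running at its full two gigaflops, which the company's numbers do not explicitly state.

```c
#include <assert.h>

/* Aggregate throughput: number of cores times per-core gigaflops. */
double total_gflops(int cores, double gflops_per_core)
{
    assert(cores > 0);
    return cores * gflops_per_core;
}

/* Power implied by a throughput and an efficiency figure.
 * Assumption: the gigaflops-per-watt figure holds at this throughput. */
double implied_watts(double gflops, double gflops_per_watt)
{
    assert(gflops_per_watt > 0.0);
    return gflops / gflops_per_watt;
}
```

Calling total_gflops(16, 2.0) gives 32 gigaflops for the 16-core reference design, and implied_watts(32.0, 35.0) works out to roughly 0.91 watts under that assumption.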

"We were late into 65nm," Olofsson admitted during the interview. "It's hard to be one process size to two process sizes behind - people don't give you credit for doing an elegant architecture, they just care about the raw performance." That's all set to change, however, with a 28nm implementation just around the corner. "We have a great relationship with our foundry, and they gave us early access to 28nm - we're right there."

With Epiphany's 65nm chips already gaining interest, many customers will be interested to see what the technology can do at 28nm - and Olofsson plans to borrow ideas from both x86 giant Intel and RISC champion ARM when it comes to monetising his technology.

"There are certain markets where we're looking to license IP, and certain markets where we're looking to build chips," he explained. "If you look at the lower volume markets, like the military market with UAV drones, the supercomputing market where people want massively parallel server types of configurations - for those, we're going to build custom chips, and we're going to continue doing that. That's going to be our technology development vehicle.

"At the same time, we really think this a great place for mobile platforms, smartphones, tablets - but you only get in if you're a tier-one semiconductor company, so we're looking to partner there," Olofsson claimed - taking his lead from ARM, a company which produces no hardware but instead licenses its intellectual property to third party manufacturers such as Samsung and Qualcomm.

While it's easy to dismiss Olofsson as at best a dreamer and at worst about thirty years too late, he certainly has the experience to see his new venture through. An ex-employee of Analog Devices and later Texas Instruments, Olofsson's work on digital signal processors gave him the epiphany which led to Epiphany. "At Analog Devices, I did a lot of DSP design - Digital Signal Processors, which is basically a microprocessor tuned for signal processing. You could say that this is the next generation - although I had to leave the company to start it!

"Today we see that there are a lot of DSPs and microprocessors that are trying to scale up from one big core to multi-core, and they find that they can only really fit four or eight cores on a single chip before they just run out of silicon area. We took the approach that to scale up to many more cores and get the performance that we need, we needed to start from scratch - design everything specifically for multi-core design. So, we designed the cores, the instruction set architecture, the memory interface - everything for multi-core. That's the reason we can get an order of magnitude more cores on a single chip."

Olofsson isn't exaggerating, either: while the current reference implementation uses 16 cores, he plans to bring out a 64-core version once the 28nm production run begins - and sees the technology scaling considerably higher than that.

"If you look at an FPGA company, they have all kind of sizes of FPGAs - and we're going to take the same approach. You never know what the market is going to want, so what you want to do is put out a whole selection of processors and let the market choose. We're going to have flavours from 16 cores up to 4,000 cores."

While field-programmable gate arrays are often used to accelerate tasks thanks to their software-definable nature, Olofsson doesn't see them as a competitor to Epiphany - but rather as another possible customer. "FPGAs are heavily used in infrastructure communications for the military, and it's basically a series of programmable gates - and we try to be a co-processor to them, even though an FPGA is not really a processor by itself."

It's something which has already caught the eye of FPGA and DSP specialist BittWare, which has licensed Adapteva's Epiphany technology for use in its own Altera-based FPGAs under the name Anemone. "We believe that Adapteva's Epiphany architecture represents a flash of brilliance that reintroduces some much needed creativity into the embedded signal processing world," BittWare's chief Jeff Milrod crowed at the launch of Anemone. "The resulting Anemone chip, combined with Altera's family of FPGAs, creates an insanely cool solution for complex signal processing tasks that can be optimised for power, performance, and productivity."

Although Olofsson denies trying to replace existing architectures such as x86 and ARM, he readily admits that his vision is of a huge quantity of Epiphany processing cores finding their way into all computing markets.

"Everybody always wants more performance - it never seems to be enough. In most platforms today, energy efficiency is a big deal - obviously, in mobile phones and tablets it's everything, but in datacentres thermal design issues are a big problem. In every field we looked at that needs high performance, power consumption and energy efficiency is a big concern.

"The only way to get there," Olofsson claims, "is to specialise - you can't do it by putting down more and more general purpose processors. It doesn't scale."

That's something with which the established giants of the semiconductor world might disagree - but if Olofsson's technology can scale to the extent he claims, it might not be long before his dream of an Epiphany co-processor in every system comes true.