AMD Detonates Trinity: Behold Bulldozer’s Second Coming

It's been a turbulent 12 months for AMD. Since the company launched Llano, its first mainstream "Fusion" part, it has replaced its CEO, brought in multiple new executives, debuted a disappointing architecture, delayed its next-generation Brazos parts by a full year and outlined a comprehensive vision of the future that de-emphasises cutting-edge process node transitions in favour of reusable IP blocks that can be shared between multiple SoCs (systems-on-a-chip).

When it launched last year, Bulldozer ran hot, scaled poorly, and was less efficient than its predecessor. So, when it came to building Llano's successor, AMD clearly had its work cut out for it.

We're guessing that Trinity (the code name) is a nod to the fact that Trinity (the APU) contains a new CPU, new GPU, and new interconnect structure. There's also a handy reference back to the first atomic bomb test in July 1945 (where Oppenheimer famously said "I am become death, destroyer of worlds") and, of course, to the Holy Trinity. These are both big shoes to fill, so let's tackle what AMD has come up with, starting with the CPU core. We'll only be addressing the CPU and GPU here.

AMD claims to have done a great deal of low-level optimisation to clean up Bulldozer's mess. Piledriver's branch prediction is better, its integer and FPU scheduling makes better use of shared resources, and larger L1 TLBs (Translation Lookaside Buffers) reduce the chance that the CPU will "miss" when searching translated virtual addresses.

Piledriver also adds support for two additional instruction-set extensions, FMA3 (Fused Multiply-Add) and F16C. FMA3 is a different form of the FMA4 instruction Bulldozer supported. AMD has beaten Intel to the punch on this one; Intel's own FMA3 support will debut in 2013, with Haswell. Both FMA forms can improve code execution efficiency by fusing a multiply and an add into a single instruction with a single rounding step, but neither FMA3 nor FMA4 is expected to provide significant speed boosts. F16C converts between 32-bit and 16-bit floating-point values, so data can be stored in half the space. AMD might make use of this for the GPU (GPUs have a native 16-bit floating point shader capability), but that's an unknown as well.
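To illustrate what these two extensions do, here is a minimal Python sketch. It does not execute FMA3 or F16C instructions; it emulates a single-rounding multiply-add using exact rational arithmetic, and uses the standard `struct` module's half-precision format to mimic a 32-bit to 16-bit conversion. The helper names `fused_multiply_add` and `to_half_and_back` are made up for this example and are not part of any AMD or Intel API.

```python
import struct
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a*b + c with one rounding at the end (illustrative only)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no rounding so far
    return float(exact)  # the single rounding step


def to_half_and_back(x: float) -> float:
    """Round-trip a value through the 16-bit IEEE 754 half-precision format."""
    packed = struct.pack("<e", x)  # 'e' is struct's 2-byte half-precision code
    return struct.unpack("<e", packed)[0]


# Inputs chosen so the exact product needs more bits than a double can hold.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

unfused = a * b + c                  # product is rounded, then the sum is rounded
fused = fused_multiply_add(a, b, c)  # only the final result is rounded

print("unfused:", unfused)
print("fused:  ", fused)
print("0.1 via half precision:", to_half_and_back(0.1))
print("bytes per half value:", len(struct.pack("<e", 0.1)))
```

The unfused version loses the small residual entirely because the product is rounded before the add, while the fused version keeps it. The half-precision round trip shows the other side of the trade: half the storage, at the cost of precision.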

Nearly all of the listed changes are small, but combined, they could make a significant difference in the chip's overall efficiency. I'm particularly curious about the unspecified "L2 efficiency improvements," having long suspected that high cache latencies fundamentally sabotaged Bulldozer last autumn.

One major feature Piledriver doesn't change is the number of instructions decoded per clock cycle (four per module, for a total of eight in a dual-module / quad-core design). That's significantly fewer than Llano (12 per quad-core) or Sandy Bridge (16). With Bulldozer, it was never clear how much of a role this played in the chip's lower-than-expected performance.
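The decode figures above are easier to compare per core. The short sketch below simply divides the quoted quad-core totals by four; it is a naive average, since a Piledriver module shares its decoder between two cores, so a lone thread is not limited to half of it. The dictionary keys and the `decode_per_core` helper are illustrative names only.

```python
# Total instructions decoded per clock for a quad-core part, as quoted above.
DECODE_PER_CLOCK = {
    "Trinity (two Piledriver modules)": 8,
    "Llano": 12,
    "Sandy Bridge": 16,
}
CORES = 4  # every configuration compared here is a quad-core design


def decode_per_core(total: int, cores: int = CORES) -> float:
    """Naive per-core average of the total decode width."""
    return total / cores


for name, total in DECODE_PER_CLOCK.items():
    print(f"{name}: {total} total, {decode_per_core(total):.0f} per core")
```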

CPU Performance

To call AMD's CPU performance data "cherry picked" is an understatement; virtually every performance score the company provided is GPU-centric or leverages the GPU heavily. The only non-GPU performance data AMD released was in PCMark Vantage and PCMark 7. Those aren't bad choices for total system productivity - while they don't rely entirely on the CPU, they're probably much more relevant to the end-users AMD is courting with these new designs.

Unfortunately, even here, data is extremely limited. AMD's vaunted claim that Trinity delivers 2x the performance per watt of Llano is based solely on PCMark Vantage's overall score. AMD claims that a dual-core, 17W Trinity at 2.6GHz essentially ties a quad-core Llano at 2.3GHz, but states elsewhere that an A10-4600M (Trinity, quad-core, 3.2GHz) is only 28.5 per cent faster than an A8-3500M (Llano, quad-core, 2.4GHz). Incidentally, the company's claim to deliver 28.5 per cent higher x86 performance is based solely on the latter figure.
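One way to sanity-check these figures is to normalise them, as in the rough sketch below. It assumes both chips actually sustain the quoted clock speeds for the whole benchmark, which AMD's data does not confirm. The 35W Llano power figure is an assumption added purely for illustration; it is not given in AMD's material as quoted here. The function names are hypothetical.

```python
def per_clock_ratio(speedup: float, new_ghz: float, old_ghz: float) -> float:
    """Relative performance per clock: overall speedup divided by the clock ratio."""
    return speedup / (new_ghz / old_ghz)


def perf_per_watt_ratio(score_ratio: float, new_watts: float, old_watts: float) -> float:
    """Relative performance per watt for two parts with the given score ratio."""
    return score_ratio * old_watts / new_watts


# A10-4600M (3.2GHz) is quoted as 28.5 per cent faster than the A8-3500M (2.4GHz).
clock_for_clock = per_clock_ratio(1.285, new_ghz=3.2, old_ghz=2.4)
print(f"Trinity vs Llano, per clock: {clock_for_clock:.2f}x")

# 17W Trinity "essentially ties" a quad-core Llano, so the score ratio is about 1.0.
ASSUMED_LLANO_WATTS = 35.0  # ASSUMPTION: not stated in the article
efficiency = perf_per_watt_ratio(1.0, new_watts=17.0, old_watts=ASSUMED_LLANO_WATTS)
print(f"Trinity vs Llano, per watt (assumed {ASSUMED_LLANO_WATTS:.0f}W Llano): {efficiency:.2f}x")
```

A per-clock ratio close to 1.0 means the 28.5 per cent gain is explained almost entirely by the higher clock speed rather than by architectural improvement.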

With dubious and rather contradictory data, our best guess is that Trinity improves on Llano's overall positioning and offers equivalent performance, clock-for-clock. This will translate into better CPU performance in some SKUs. More important, from AMD's perspective, was the need to bring Bulldozer's power consumption down to something that would fit into mainstream and "ultra-light" form factors. Trinity accomplishes this. It won't compete with Ivy Bridge - matching Llano means it won't even compete particularly well with Sandy Bridge in CPU-centric workloads - but AMD is pricing these parts into markets well below Intel's target for IVB and ultrabooks.

AMD's Cayman - Trinity's linchpin

In most literature, AMD refers to Trinity's GPU as a "Northern Islands" class part without bothering to explain whether it's based on Barts (a modest step forward from the old 5000 series) or Cayman (the high-end GPU that AMD confined to the 6900 series). Officially, it's branding the new core as part of the Radeon 7000 family, which isn't accurate either.

AMD's branding has become murky at best. The 7000M brand is now polluted with three types of GPUs - 40nm rebrands of 6000 parts that are based on Barts/Turks, 28nm parts based on GCN (Graphics Core Next), and 32nm APUs based on Cayman. Trying to hash out which GPUs are the best match for the APU is a task better left to saints and madmen than poor journalists; we'll leave the topic of paired graphics for another day.

Unlike Llano, whose integrated GPU was nearly identical to AMD's discrete "Redwood" part, there's no easy point of comparison for Cayman. The new GPU features an array of six SIMD clusters of 64 cores each; Llano had five SIMDs of 80 cores. The new GPU is slightly smaller than its predecessor, with 384 cores instead of 400, but one of the features AMD introduced with Cayman was a VLIW4 architecture that was significantly more efficient than the VLIW5 designs that preceded it. AMD has also increased the number of texture units, to a maximum of 24, up from Llano's 20. The total number of ROPs remains the same, at eight.
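The core counts above hide one detail worth working out: because VLIW4 groups cores in fours rather than fives, the smaller GPU actually has more independent VLIW units. The sketch below derives this from the figures quoted in the paragraph; the `GpuConfig` class and its field names are illustrative, and the VLIW5 width for Llano is taken from its Redwood lineage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    name: str
    simds: int
    cores_per_simd: int
    vliw_width: int      # cores grouped into one VLIW unit (4 or 5)
    texture_units: int

    @property
    def cores(self) -> int:
        return self.simds * self.cores_per_simd

    @property
    def vliw_units(self) -> int:
        return self.cores // self.vliw_width

    @property
    def texture_units_per_simd(self) -> float:
        return self.texture_units / self.simds


llano = GpuConfig("Llano", simds=5, cores_per_simd=80, vliw_width=5, texture_units=20)
trinity = GpuConfig("Trinity", simds=6, cores_per_simd=64, vliw_width=4, texture_units=24)

for gpu in (llano, trinity):
    print(
        f"{gpu.name}: {gpu.cores} cores, {gpu.vliw_units} VLIW units, "
        f"{gpu.texture_units_per_simd:.0f} texture units per SIMD"
    )
```

Both designs work out to the same number of texture units per SIMD, so the jump from 20 to 24 follows directly from the extra SIMD cluster.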

When it comes to game performance, AMD is more willing to share the goodies.

  • All game tests run at 1,920 x 1,080

The company has a bad habit of switching back and forth between desktop parts and laptop parts, and there's no Ivy Bridge comparison data. Still, things look good. This was very nearly a given; Cayman made a number of efficiency improvements that were logical fits for Trinity, and all of AMD's APU demos in the run-up to launch focussed on the GPU.

Trinity moves AMD forward, buys time for 2013 launches

After talking with AMD and reading over the company's presentations, our educated bet is that Trinity is a qualified success. Piledriver may not move the bar very much on the CPU side of the equation, but between power consumption, temperature, and performance, AMD had to fix the first two to have any chance of launching a mobile part based on the architecture. If Piledriver can match Llano clock-for-clock (or within the same TDP), that's still significantly more than Bulldozer managed when compared to Istanbul/Thuban.

Will it compete effectively against Ivy Bridge? No. But it was never intended to. AMD's goal with Trinity is to position the CPU as a successor to Llano, a further fulfilment of the company's "Fusion" vision, and as an anchor in the popular mid-range segment. Based on what we've seen and a few educated guesses, it's got a fair chance of pulling it off - short term.

No matter how successful Trinity is in 2012, it doesn't change the fact that AMD has no traction in tablets or sub-10W designs at a time when companies like Qualcomm have given notice that they intend to move into PCs. That's fine for the moment, because Windows 8 won't drop until the latter half of the year, and it'll take four to six months past that point for some of the traditional smartphone/tablet players to make moves into the low-end PC space.

AMD needs a quick jump to 28nm Brazos and a fast refresh on Trinity. In theory, the new chips - Kabini and Kaveri - will be ready in 2013. The company has yet to put a quarter on that number, or to even comment on where the parts are being made. Trinity may be a good beginning, but it's only that; AMD has a long way to go when it comes to carving out its own territory in between Intel at the top of the market and an onslaught of ARM-based hardware at the bottom.

  • Published under license from Ziff Davis, Inc., New York, All rights reserved.
  • Copyright © 2012 Ziff Davis, Inc.