AMD's 2012 Mobile Strategy Revealed: Trinity Becomes Fusion
5/21/2012 by: Marcus Pollice
AMD recently introduced the processor codenamed Trinity. At first, Trinity will power a series of "ultrathins" and other mobile designs, while the desktop version will follow soon afterwards. In this article, we go deep into what Trinity brings to the market.
The Trinity die is manufactured on the 32nm SOI process at GLOBALFOUNDRIES in Dresden, Germany. It packs 1.3 billion transistors, slightly down from Llano's 1.45 billion. Yet the die measures 246 mm², while Llano comes in at 228 mm². Why is that? One possible explanation lies in the Piledriver design, which is an evolution of Bulldozer. Bulldozer drew criticism for being inefficient in terms of transistor density. Trinity's density is somewhat worse than Llano's, but still quite good thanks to the very dense GPU design (more on that later).
Still, this is an interesting point to make, since the original design goal of Bulldozer and its successors was to improve transistor efficiency by sharing certain components between the CPU cores. It appears this strategy has backfired so far. AMD never officially explained why that is the case. Rumor has it that this is a side effect of relying on automated silicon design tools rather than designs "handcrafted" by engineers. Given Bulldozer's lackluster performance, it remains to be seen how long AMD will stay on that train. AMD has since promised to improve performance significantly with each passing generation. The revised Piledriver cores will soon show how these promises translate into real-world clock-for-clock performance.
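To put the density comparison in numbers, here is a minimal sketch that uses only the transistor counts and die areas quoted above. The function and variable names are our own, chosen for illustration.

```python
# Die figures as quoted in the article: (transistor count, die area in mm^2).
DIES = {
    "Llano": (1.45e9, 228.0),
    "Trinity": (1.30e9, 246.0),
}


def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


densities = {name: density_mtr_per_mm2(*spec) for name, spec in DIES.items()}

# How much lower Trinity's density is, relative to Llano.
relative_drop = 1.0 - densities["Trinity"] / densities["Llano"]

for name, value in densities.items():
    print(f"{name}: {value:.2f} MTr/mm^2")
print(f"Trinity is {relative_drop:.1%} less dense than Llano")
```

With the quoted figures this works out to roughly 6.4 million transistors per mm² for Llano and 5.3 million for Trinity, a drop of about 17%.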
AMD touts a 29% increase in productivity and a 56% increase in visual performance. These claims are based on the A10-4600M (Trinity) vs. the A8-3500M (Llano) in PCMark Vantage (productivity) and 3DMark Vantage (visual). The productivity increase looks less impressive considering the difference in clock speed. The 3500M is a 1.5GHz (2.4GHz Turbo) quad-core Husky CPU (K10.5 architecture with minor enhancements), while the 4600M is a 2.3GHz (3.2GHz Turbo) quad-core Piledriver CPU (two modules, each containing two integer cores and one shared FP unit). So the application performance boost comes mainly from clock speed.
This is not a good sign for Piledriver performance going forward. Bulldozer was already unable to match its predecessor clock for clock, much less Intel’s offerings. It seems this will repeat with Piledriver. We don't want to put Piledriver in a bad light, but we don't like this trend. Piledriver does bring some improvements over Bulldozer, namely instruction set enhancements such as FMA3 support and 16-bit floating point support (F16C).
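The clock speed argument can be checked with a quick back-of-the-envelope calculation. The sketch below uses only the clocks and the 29% claim quoted above. The "implied per-clock ratio" rests on our own simplifying assumption that the benchmark score scales linearly with CPU clock, which real workloads rarely do.

```python
# CPU clocks in GHz as quoted in the article.
A8_3500M_GHZ = {"base": 1.5, "turbo": 2.4}   # Llano
A10_4600M_GHZ = {"base": 2.3, "turbo": 3.2}  # Trinity
CLAIMED_GAIN = 0.29  # AMD's productivity claim (PCMark Vantage)


def uplift(new: float, old: float) -> float:
    """Relative increase of new over old, e.g. 0.5 for +50%."""
    return new / old - 1.0


clock_uplift = {
    state: uplift(A10_4600M_GHZ[state], A8_3500M_GHZ[state])
    for state in A8_3500M_GHZ
}

# ASSUMPTION: score scales linearly with clock. A ratio below 1.0 would
# then mean less work done per clock than on Llano.
implied_per_clock = {
    state: (1.0 + CLAIMED_GAIN) / (1.0 + gain)
    for state, gain in clock_uplift.items()
}

for state in clock_uplift:
    print(f"{state}: clock +{clock_uplift[state]:.1%}, "
          f"implied per-clock ratio {implied_per_clock[state]:.2f}")
```

The clock uplift is about 53% at base and 33% at turbo, so the claimed 29% gain is smaller than the clock increase in either state. Under the linear-scaling assumption, that puts the implied per-clock ratio below 1.0 in both cases.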
The AMD Piledriver CPU core improves on many fronts compared to its predecessor, the Bulldozer core.
Additionally, AMD claims IPC improvements without quantifying them in detail. The L1 TLB now holds 64 entries, up from 32, which should help performance. As the slide from AMD shows, a lot of other small things have been improved. So Piledriver might be more efficient per clock than Bulldozer, but the first official benchmark numbers indicate that this improvement will be only minor. Still, independent benchmarks covering a wide range of workloads should be evaluated before passing final judgment.
Despite our skepticism regarding the CPU performance, it is still remarkable that AMD is able to fit these cores into ever smaller thermal envelopes. The company cited leakage reduction as a Piledriver improvement as well. The notebook models of Trinity will come in 17W, 25W and 35W variants. That’s down from 35W and 45W with Llano. In its first briefing, AMD gave a frequency range of 2.0GHz to 3.8GHz for the CPU and 424MHz to 800MHz for the GPU. But beware: this range also includes desktop models, which are planned to launch in 65W and 100W bins during Computex Taipei 2012, held from June 5-9 in Taiwan.
The following table gives an overview of the notebook models launched, including the 25W quad-core and 17W dual-core parts. We will cover the detailed specs of the different GPUs in a separate table later.
|Model||Clock (Turbo)||Cores||L2 Cache||GFX||TDP|
|A10-4600M||2.3 (3.2) GHz||4||4 MB||HD 7660G||35W|
|A8-4500M||1.9 (2.9) GHz||4||4 MB||HD 7640G||35W|
|A6-4400M||2.7 (3.2) GHz||2||1 MB||HD 7520G||35W|
|A10-4655M||2.0 (2.8) GHz||4||4 MB||HD 7620G||25W|
|A6-4455M||2.1 (2.6) GHz||2||2 MB||HD 7500G||17W|
The GPU part of Trinity is much more exciting, even though it consists of already known technology. Concretely, the GPU features the VLIW4 architecture, which was previously used only in the high-end discrete GPU segment for Radeon HD 6900 series cards. Before the APUs transition to AMD's current GCN architecture, this gives VLIW4 a second opportunity to shine. VLIW4 was specifically designed to utilize resources more efficiently than the VLIW5 architecture found in Llano, the Radeon HD 5000 series, parts of the 6000 series (excluding 69x0) and even some Radeon 7000 series OEM products that are simply relabeled older parts. AMD specifically touts improved performance per mm² of silicon. We believe that this helped offset some of the deficiencies of the Bulldozer design approach criticized earlier.
Trinity's 3D engine is one of the three parts that gave Trinity its name: Piledriver CPU + Northern Islands GPU + Southern Islands Video
Compared to the Radeon HD 6900 series GPUs codenamed Cayman, the GPU in Trinity is of course much smaller. AMD specifies it to have up to 384 stream processors. This is a bit less than in Llano, but due to higher clock speeds Trinity packs more punch. While the Llano GPU topped out at 600 MHz, Trinity raises the clock up to 800 MHz. Due to the different internal architecture of the units comprising the GPU, Trinity has more texture units, which improves texel fill rate. The number of ROPs has stayed the same, so pixel fill rate will only scale with clock speed.
The following table gives an overview of the different bins that are used in the mobile Trinity models. For comparison purposes, the specs of the HD 6620G found in the Llano-based A8-3500M are included. The GFLOP figures are not comparable to AMD's marketing, as AMD includes the FP performance of the CPU part as well.
|Model||HD 6620G||HD 7660G||HD 7640G||HD 7520G||HD 7620G||HD 7500G|
|Clock||444 MHz||497 MHz||497 MHz||497 MHz||360 MHz||327 MHz|
|Turbo Clock||-||686 MHz||655 MHz||686 MHz||497 MHz||424 MHz|
|Shader Units||400 (80 x5)||384 (96 x4)||256 (64 x4)||192 (48 x4)||384 (96 x4)||256 (64 x4)|
|GFLOPs||355.2||381.7 (526.8)||254.5 (335.4)||190.8 (263.4)||276.5 (381.7)||167.4 (217.1)|
|Pixel Fillrate||3.55 GPix/s||3.98 (5.49) GPix/s||3.98 (5.24) GPix/s||1.99 (2.74) GPix/s||2.88 (3.98) GPix/s||2.62 (3.39) GPix/s|
|Texel Fillrate||8.88 GTex/s||11.93 (16.46) GTex/s||7.95 (10.48) GTex/s||5.96 (8.23) GTex/s||8.64 (11.93) GTex/s||5.23 (6.78) GTex/s|
Similar to Llano, there will also be lower SKUs with fewer shader processors, texture units and ROPs. Leaked information points to additional SKUs with only 128 shader processors, but for now this is everything AMD has launched. The number of texture units is the shader count divided by 16, while the number of ROPs will probably be 4 for some of the lower-end models, similar to Llano. The ROP number is marked with a question mark where we weren't 100% sure at press time whether it's correct. We are fairly sure the 192 shader version (and below) will only come with 4 ROPs and the 256 shader version will retain all 8 units, but this is only an educated guess.
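The throughput figures in the table follow from simple unit-count-times-clock arithmetic, sketched below. Several inputs are assumptions: the ROP counts are the educated guesses described above, the 20 texture units for the HD 6620G are inferred from its texel rate in the table, and the formulas (2 FLOPs per shader per clock, one pixel per ROP and one texel per texture unit per clock) are the usual conventions for such peak figures, not something AMD has confirmed to us. All names in the code are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GpuBin:
    shaders: int
    texture_units: int
    rops: int                        # assumed, not confirmed by AMD
    base_mhz: int
    turbo_mhz: Optional[int] = None  # None where no GPU turbo is listed


def gflops(shaders: int, mhz: int) -> float:
    # Each shader counted as one multiply-add (2 FLOPs) per clock.
    return shaders * 2 * mhz / 1000


def gpix_per_s(rops: int, mhz: int) -> float:
    # One pixel per ROP per clock.
    return rops * mhz / 1000


def gtex_per_s(texture_units: int, mhz: int) -> float:
    # One texel per texture unit per clock.
    return texture_units * mhz / 1000


def vliw4_bin(shaders: int, rops: int, base_mhz: int, turbo_mhz: int) -> GpuBin:
    # Texture units = shader count / 16, as stated in the article.
    return GpuBin(shaders, shaders // 16, rops, base_mhz, turbo_mhz)


BINS = {
    "HD 6620G (Llano)": GpuBin(400, 20, 8, 444),
    "384 shaders, 497/686 MHz": vliw4_bin(384, 8, 497, 686),
    "192 shaders, 497/686 MHz": vliw4_bin(192, 4, 497, 686),
}


def summarize(gpu: GpuBin) -> dict:
    """Return (GFLOPs, GPix/s, GTex/s) per clock state, rounded like the table."""
    clocks = {"base": gpu.base_mhz}
    if gpu.turbo_mhz is not None:
        clocks["turbo"] = gpu.turbo_mhz
    return {
        state: (
            round(gflops(gpu.shaders, mhz), 1),
            round(gpix_per_s(gpu.rops, mhz), 2),
            round(gtex_per_s(gpu.texture_units, mhz), 2),
        )
        for state, mhz in clocks.items()
    }


for name, gpu in BINS.items():
    print(name, summarize(gpu))
```

Run against the three bins shown, the results match the corresponding table columns, which suggests the table was built with 8 ROPs for the 384 shader bin and 4 ROPs for the 192 shader bin.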
The way the GPU accesses memory seems to be the same as with Llano. The memory controller is organized as two unganged 64-bit controllers that can handle requests independently when both the CPU and the GPU need to be served. The GPU gets a fixed amount of dedicated memory, which can be configured at the BIOS level. Some notebooks might not allow the user to change the amount though. There are now also memory P-states, meaning the memory frequency is dropped whenever the memory is not being utilized, in order to conserve power.
In terms of display connectivity, AMD now supports DisplayPort 1.2, which allows the APU to drive up to 4 displays. Other than that the platform didn't change much. There will be a new version of the Fusion Controller Hub (FCH), namely the A85X on the desktop. Legacy PCI will be thrown out the door. There are two additional SATA 6Gbps ports, for a total of eight. The controller will also include RAID5 support. This seems to be the way to go for upcoming FM2 motherboards. As a reminder, socket FM2 won't be pin-compatible with its predecessor FM1.
Another feature that got a major overhaul is Turbo Core, which arrives in its third incarnation in Trinity. Trinity supports different boost steps not only for the CPU, but also for the GPU. The APU manages its TDP budget and optimizes for the specific workload it is given. AMD gave a good overview of how it might work out in practice. If only a single CPU core is loaded, the maximum boost frequency is selected. In the case of the 4600M used as an example, this would be 3.2GHz, which is quite high for a notebook.
The New Turbo Core
In a workload using more than one CPU core, the boost frequency is reduced to 2.7GHz. Finally, in a 3D-heavy application the CPU frequency goes down to the stock 2.3GHz, while the GPU gets clocked up to 685MHz (up from 496MHz). It appears that GPU boost and CPU boost are mutually exclusive, but this needs to be verified in independent tests.
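The three cases AMD described for the A10-4600M can be written down as a simple lookup. This is only a toy model of the behavior as presented, not AMD's actual power-management algorithm. The workload categories and names are our own, and keeping the GPU at its 496MHz base clock in the CPU-bound cases is an assumption that follows from the apparent exclusivity of CPU and GPU boost.

```python
from enum import Enum


class Workload(Enum):
    SINGLE_THREAD = "single CPU core loaded"
    MULTI_THREAD = "more than one CPU core loaded"
    GPU_HEAVY = "3D-heavy application"


# (CPU clock in GHz, GPU clock in MHz) for the A10-4600M example.
# ASSUMPTION: the GPU stays at its 496 MHz base clock in CPU-bound cases.
TURBO_STATES = {
    Workload.SINGLE_THREAD: (3.2, 496),
    Workload.MULTI_THREAD: (2.7, 496),
    Workload.GPU_HEAVY: (2.3, 685),
}


def select_clocks(workload: Workload) -> tuple:
    """Return the (CPU GHz, GPU MHz) pair this toy model picks for a workload."""
    return TURBO_STATES[workload]


for workload in Workload:
    cpu_ghz, gpu_mhz = select_clocks(workload)
    print(f"{workload.value}: CPU {cpu_ghz} GHz, GPU {gpu_mhz} MHz")
```

A real implementation would presumably react to measured power and temperature rather than to fixed workload classes, so the table above only captures the end states from AMD's example.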
Earlier we mentioned that AMD disclosed a 56% performance increase in 3DMark Vantage for the 4600M over the 3500M. The former uses the Radeon 7660G, while the latter has the part called Radeon 6620G. Looking at our spec sheet a few paragraphs up, it becomes evident that this performance increase doesn't come from either shaders or texture units. The big difference maker here is the GPU Turbo. As AMD detailed on the Turbo slide, when running graphically intensive workloads the GPU clocks up to 685MHz. That's a 54.3% clock speed advantage over the Llano model it is compared to. The remaining 2% or so can be attributed to efficiency improvements of the VLIW4 architecture and possibly the higher CPU speed.
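The split between clock speed and everything else can be reproduced in a few lines. The sketch below uses the 444MHz Llano GPU clock from the spec table and the 685MHz turbo clock from AMD's slide. It shows the leftover both as a simple difference in percentage points and as a multiplicative factor, the latter being our own addition for comparison.

```python
LLANO_GPU_MHZ = 444          # HD 6620G, no GPU turbo
TRINITY_GPU_TURBO_MHZ = 685  # HD 7660G under 3D load, per AMD's Turbo slide
CLAIMED_GAIN = 0.56          # AMD's 3DMark Vantage claim

# Share of the gain explained by GPU clock alone.
clock_gain = TRINITY_GPU_TURBO_MHZ / LLANO_GPU_MHZ - 1.0

# Leftover as a plain difference in percentage points.
residual_points = CLAIMED_GAIN - clock_gain

# Leftover as a multiplicative per-clock factor.
residual_factor = (1.0 + CLAIMED_GAIN) / (1.0 + clock_gain) - 1.0

print(f"clock gain:          {clock_gain:.1%}")
print(f"left over (points):  {residual_points:.1%}")
print(f"left over (factor):  {residual_factor:.1%}")
```

Either way the leftover is small, between roughly 1% and 2%, which supports the reading that GPU Turbo accounts for nearly all of the claimed gain.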
This brings us to the interesting part of this preview: performance projections. Trinity seems to scale almost linearly compared to Llano in 3D performance. Of course memory speed has to be taken into account, as Llano's GPU performance scaled nicely with faster memory. As things look, Trinity will be no different. We expect to see 3DMark Vantage scores of around 5550 to 5750 and 3DMark 11 scores of 1400 to 1490. This is for the maxed-out desktop SKU with a GPU clock of 800MHz (assuming it won't be able to boost GPU clocks even further) and a memory speed of DDR3-1600. This is up ~33% from Llano. On the notebook side of things the gains will be more in the 50% range due to GPU turbo. The same should be true for 65W models on the desktop.
AMD promises a "Premium High-Resolution Display Experience". Can it deliver on that promise?
Real-world game performance may vary, as different engines behave differently given certain GPU characteristics. This needs to be evaluated by reviewers in different settings. Even though it packs quite some punch, Trinity won't bring Full HD gaming to APUs. Some less demanding games might run reasonably smoothly on Trinity in 1080p, but the vast majority will still need to be played at 1280x720. Some games might afford 1680x1050, which is popular among budget gamers. At least in 720p, Trinity should let players experience these games in their full DX11 glory.
We indicated earlier that the launch on Tuesday focuses on mobile SKUs only. On the desktop, Trinity should emerge only in the third quarter, with the public announcement at Computex. As a huge chunk of the market for APUs is notebooks, this is a sensible approach. It should also give Intel's Ivy Bridge a run for its money. AMD directly targets Intel's 17W Ultrabook SKUs, and big OEMs like HP have already announced very sleek designs based on Trinity. AMD wants to move these ultrathins (they are not allowed to use the Ultrabook moniker, as Intel trademarked it) more towards the mainstream price band, which AMD considers to span from $400 to $700.
© 2009 - 2011 Bright Side Of News*, All rights reserved.