If you have followed AMD closely over the past couple of years, you know that the company has been preaching the "Fusion" mantra for quite some time. However, AMD didn’t deliver the Fusion APU [Accelerated Processing Unit] when it promised [May 2009], since the company decided to abandon its complete CPU roadmap back in late 2006. Given that AMD traditionally needs (at least) five years to deliver a new CPU architecture, that move.

Unfortunately for AMD, the company completely botched its messaging to partners and the media, who all thought Bulldozer was a single architecture. In reality, Bulldozer V1 was scrapped for V2, a change we could call pretty radical. Bulldozer and Bobcat are two new CPU architectures which will be combined with Northern Islands GPUs, but the first Fusion processor is based on neither the Bulldozer nor the Bobcat CPU core, nor on the Northern Islands GPU core. Meet "K10.6": the STARS architecture [Agena/Deneb] gets its third manufacturing process in the form of the 32nm Llano.

Meet Llano’s 32nm CPU core – 9.69mm² of smart processing goodness
To discuss AMD’s Llano core, we spoke with Sam Naffziger, Senior Fellow. For starters, it was explained to us that the x86 core AMD is talking about at ISSCC 2010 does not have a code name. We were told to call it an "x86 core using 32nm SOI". We’ll use Llano and Llano core instead.

Single core inside AMD's Fusion APU: Llano brings 512KB L1 and 4MB of L2 cache, all compactly packed

A single x86 CPU core inside Llano measures 9.69mm² and packs 35 million transistors. Given that Llano does not feature unified L2 cache memory [on AMD CPU designs, L3 cache is unified, L2 usually isn’t], the 35 million figure only includes core logic and L1 cache in its usual quantity [64KB Instruction and 64KB Data]. Do bear in mind that going to 32nm enabled AMD to start putting 1MB of L2 cache per core, rather than the 512KB of today, and to kill off the L3 cache. Sam told us that AMD improved execution performance, "cleaned up the instruction set" and built Llano to feature "true power management".

"We’re particularly happy about the implementation we’ve pulled off on core power gating. This is a first for AMD: we’ve integrated the ability to completely disconnect any one or all of the cores from the power supply to, essentially, cut the power consumption to negligible levels." How AMD achieved this is very interesting, however, as "[AMD] didn’t need to add any special metal layers or leakage devices into the core level. Essentially, we’re taking full advantage of what Silicon On Insulator provides us. It’s a synergetic effect that is not possible on bulk technology and we’re using the package layers to send power around."
The term "power aware clock grid design" was also thrown around. According to AMD, the 32nm Llano comes with a precisely tuned clock grid construction, for which the company claims an 80% reduction in clock grid metal capacitance and half the number of power buffers.

Interesting insight into how AMD enhanced the clock gate design: Digital APM solved a lot of issues related to power-saving techniques in the past
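To put the claimed 80% reduction in clock grid metal capacitance into perspective, here is a minimal sketch based on the textbook CMOS switching-power relation P = a·C·V²·f. That formula is our assumption, not something AMD presented, and the function names and normalized values are purely illustrative. Metal capacitance is also only one part of the total clock load, so the result is best read as an upper bound on the saving for the grid wiring alone.

```python
def dynamic_power(capacitance, voltage, frequency, activity=1.0):
    """Textbook CMOS switching power: P = a * C * V^2 * f (assumed model)."""
    return activity * capacitance * voltage ** 2 * frequency


def clock_grid_power_ratio(cap_reduction=0.80, voltage=1.0, frequency=3e9):
    """Ratio of new to old clock grid switching power.

    cap_reduction is the claimed cut in metal capacitance (0.80 = 80%).
    Capacitance is normalized to 1.0; voltage and frequency are held
    constant, so they cancel out of the ratio.
    """
    old = dynamic_power(1.0, voltage, frequency)
    new = dynamic_power(1.0 - cap_reduction, voltage, frequency)
    return new / old
```

Under this model, `clock_grid_power_ratio()` comes out at about 0.2: at the same voltage and frequency, the grid wiring would burn roughly a fifth of the switching power it did before.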

The operating voltage ranges between 0.8 and 1.3V. Sam called this core the "first true mobile design", as it follows design guidelines very similar to those of mobile GPUs, with multiple power planes able to shut off parts of the chip as they become unnecessary, all in an effort to reduce power consumption to a minimum. To shut off power to those parts efficiently, Llano features a Digital APM Module, a power management controller which checks roughly 100 signals per core and, depending on that data, cuts power to the parts of the chip which are not under load. This should enable AMD to finally challenge Intel for notebook dominance in terms of battery life. Given that AMD’s GPUs are much more potent than Intel’s, the company won’t have a tough time competing on that front. Then again, Intel has a somewhat unexpected partner in nVidia’s Optimus technology, which simply works.
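The idea of a controller that samples per-core activity and gates whatever is idle can be illustrated with a toy model. This is a sketch under heavy assumptions: AMD did not disclose how the Digital APM Module makes its decisions, so the `Core` class, the 0/1 signal encoding and the threshold rule below are all hypothetical. Only the figure of roughly 100 signals per core comes from the article.

```python
from dataclasses import dataclass

SIGNALS_PER_CORE = 100  # "roughly 100 signals per core", per the article


@dataclass
class Core:
    signals: list          # hypothetical activity samples, 1 = busy, 0 = idle
    powered: bool = True   # whether the core is connected to the supply


def update_power_gates(cores, idle_threshold=0):
    """Power-gate every core whose count of busy signals is at or below
    idle_threshold; keep the others powered. The rule is an assumption."""
    for core in cores:
        busy = sum(core.signals)
        core.powered = busy > idle_threshold
    return [core.powered for core in cores]
```

For example, a fully idle core (`Core([0] * SIGNALS_PER_CORE)`) would be gated, while a core with even a single busy signal stays powered under the default threshold.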

A single CPU core draws between 2.5 and 25 Watts, which is why this kind of power management is quite important. According to AMD, the 25W figure applies when the core is running at 3GHz, meaning a desktop design will probably fit into the 125W maximum power envelope, with 25W allotted to each CPU core and to the one GPU core, while notebook designs will have to obey the 35W CPU + 35W GPU limits.
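The desktop envelope follows from simple arithmetic, sketched below. The four-core count is the one the article gives for the full die; the function names are ours, and the even split of the notebook CPU budget across cores is an illustrative assumption, not an AMD figure.

```python
CORE_MIN_W = 2.5            # lowest per-core power quoted by AMD
CORE_MAX_W = 25.0           # per-core power at 3GHz, per AMD
CPU_CORES = 4               # x86 cores on the die, per the article
NOTEBOOK_CPU_LIMIT_W = 35.0
NOTEBOOK_GPU_LIMIT_W = 35.0


def desktop_total_w(cpu_cores=CPU_CORES, gpu_blocks=1, per_block_w=CORE_MAX_W):
    """Total power if every CPU core and the GPU each get per_block_w."""
    return (cpu_cores + gpu_blocks) * per_block_w


def notebook_per_core_w(cpu_limit_w=NOTEBOOK_CPU_LIMIT_W, cpu_cores=CPU_CORES):
    """Average per-core budget if the notebook CPU limit is split evenly
    (an assumption made only for illustration)."""
    return cpu_limit_w / cpu_cores
```

With the defaults, `desktop_total_w()` gives 125.0, matching the 125W envelope, while `notebook_per_core_w()` gives 8.75, well inside the 2.5 to 25 Watt per-core range.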

AMD Llano – Fusion takes shape
Even though AMD is not talking about Llano as a complete part, over the course of the past few months we managed to learn that Llano incorporates a graphics core from the Evergreen family, not the Northern Islands one. This mirrors the policy deployed on the CPU side: using tried and tested components to produce a unique product. The explanation is really interesting: Llano will feature the world’s first SOI [Silicon-on-Insulator] GPU, ahead of GPU production moving from TSMC exclusively to a mixture of TSMC and GlobalFoundries as the 28nm process takes off. AMD ditched the old concept of MCM [Multi-Chip Module], which Intel utilizes oh-so-well to maximize its profits [and yes, it works with no questions asked], and went for the native mono-die approach: one silicon to rule them all.

This single die will feature four x86 cores, 512KB of L1 and 4MB of L2 cache, an Evergreen GPU core and a unified memory controller, and will use the same CPU socket AMD is selling today. Compatibility may be an issue, as AMD will need wires to connect the display outputs or a connecting chip with all the display logic.
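The die-level cache totals follow directly from the per-core figures quoted earlier in the article (64KB instruction plus 64KB data L1, and 1MB of L2 per core). The short sketch below reproduces that arithmetic and also derives a per-core transistor density from the 35 million and 9.69mm² figures; the constant and function names are ours.

```python
L1_INSTR_KB = 64             # per-core L1 instruction cache
L1_DATA_KB = 64              # per-core L1 data cache
L2_PER_CORE_KB = 1024        # 1MB of L2 per core at 32nm
CPU_CORES = 4                # x86 cores on the die
CORE_AREA_MM2 = 9.69         # area of one core (logic + L1)
CORE_TRANSISTORS = 35_000_000


def cache_totals_kb(cores=CPU_CORES):
    """Return (total L1, total L2) in KB across all cores."""
    l1_total = cores * (L1_INSTR_KB + L1_DATA_KB)
    l2_total = cores * L2_PER_CORE_KB
    return l1_total, l2_total


def transistors_per_mm2():
    """Transistor density of one core (logic + L1, L2 excluded)."""
    return CORE_TRANSISTORS / CORE_AREA_MM2
```

`cache_totals_kb()` returns `(512, 4096)`, i.e. the 512KB of L1 and 4MB of L2 listed for the die, and the density works out to roughly 3.6 million transistors per mm².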

In any case, Llano is looking quite promising. While it cannot compete with Intel in terms of CPU performance, packing a good GPU onto the same silicon as the CPU, doing so more efficiently, and having access to the latest manufacturing techniques from GlobalFoundries all bode well for AMD.