How AMD's Fusion A8-3850 APU Changes Personal Computing
8/25/2011 by: Marcus Pollice
With the introduction of the A-series APU, also known by its codename Llano, AMD brought its Fusion technology to mainstream notebooks and desktops. In this review we take a closer look at the desktop version of Llano. First we go into detail about the technical side of things, then we continue with various benchmarks and measurements and evaluate the APU as a whole.
Llano marks the introduction of a Fusion product for mainstream PCs and notebooks. The first Fusion designs, codenamed Ontario and Zacate, targeted ultra-mobile and embedded usage models. For those, AMD came up with a dedicated low-power architecture, in some ways similar to Intel's Atom line of CPUs. That architecture would not deliver enough performance for the mainstream market, so AMD simply took its K10.5 architecture, which up to this point has been used in its mainstream and high-end offerings for desktops and servers.
For the GPU part of Llano, AMD took the Redwood design, which was employed in the Radeon 5550, 5570 and 5670 products, all of which were launched in early 2010. As a descendant of the Evergreen generation, the GPU features AMD's VLIW5 architecture. However, it was updated with the most recent UVD3 video decoder, adding support for Blu-ray 3D, MPEG2 and MPEG4 ASP (DivX / Xvid) bitstream decode. AMD also implemented more aggressive power gating than in the original Redwood design to optimize power consumption. Instead of the usual memory controller in the GPU, AMD designed special interfaces to the northbridge and the coherent request queues.
The CPU part received only minor optimizations. AMD claims a 6% IPC increase over previous K10.5 CPUs, which can mostly be attributed to the larger L2 cache, which is now 1MB per core. That is double the 512KB per core of the Propus design used in Athlon II X4 models. The DDR3 memory controller got updates as well and now supports speeds up to DDR3-1866. This speed is only available if you use no more than two DIMMs. DIMMs with capacities up to 16GB are supported, so with four DIMMs Llano tops out at a maximum configuration of 64GB RAM.
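To put the memory figures above into perspective, here is a minimal sketch of the theoretical peak bandwidth and the maximum capacity. It assumes a dual-channel configuration with a 64-bit (8-byte) bus per channel and four DIMM slots; the function names are our own and purely illustrative, and real-world transfer rates will be lower than these theoretical peaks.

```python
def peak_bandwidth_gbs(data_rate_mts, channels=2, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s.

    Assumptions (not measured values): dual-channel operation and a
    64-bit (8-byte) wide bus per channel.
    """
    return data_rate_mts * 1e6 * bus_bytes * channels / 1e9


def max_capacity_gb(dimm_slots=4, dimm_gb=16):
    """Maximum RAM, assuming four slots filled with 16GB DIMMs."""
    return dimm_slots * dimm_gb


if __name__ == "__main__":
    for rate in (1333, 1600, 1866):
        print(f"DDR3-{rate}: {peak_bandwidth_gbs(rate):.1f} GB/s peak")
    print(f"Maximum capacity: {max_capacity_gb()} GB")
```

Because the integrated GPU shares this bandwidth with the CPU cores, the step between memory speed grades matters more here than on a platform with a discrete graphics card.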
We mentioned earlier that the GPU has two different interfaces to other parts of the chip. The direct link to the northbridge is used to access the UMA memory that is dedicated to the GPU at boot time. The size of this UMA memory can be configured in the BIOS, with options being 256MB, 512MB and 1GB. The other interface is used to access general RAM and the address space of other devices. Basically all communication from the rest of the system to the GPU is handled this way.
Platform-wise there are a few changes too. Ever since the introduction of the K8-based Athlon 64 CPU, AMD has employed a HyperTransport link to a northbridge providing a PCI-Express interface to dedicated graphics. This northbridge was either accompanied by a southbridge providing storage and other connectivity, or the southbridge functionality was integrated directly into the northbridge, as seen on some NVIDIA chipsets. With the integration of most of the traditional northbridge features into the CPU, this kind of setup no longer makes much sense for notebook and desktop platforms.
Instead of a HyperTransport interface, Llano directly integrates a PCIe interface. Sixteen lanes are reserved for a discrete graphics card and the remaining four are used to connect to the Fusion controller hub. This connection is actually called Unified Media Interface, but the technology behind it is basically PCIe. This is remarkably similar to what Intel did with their Sandy Bridge generation. For the controller hub, there are two options available, the A75 and the A55. The A55 only provides SATA 3Gb/s support, while the A75 allows for 6Gb/s over the six ports it features. On top of that, the A75 integrates four USB 3.0 ports, something AMD is very proud of and even Intel has to acknowledge. The chipsets are codenamed Hudson D3 (A75) and Hudson D2 (A55) respectively, and both are manufactured on a 65nm process at TSMC.
The Llano chip is manufactured using Globalfoundries' 32nm SOI process in Dresden, Germany. There are a few remarkable things about that. Llano is actually the first design that entered volume production on this process and started shipping in the second quarter of this year. Also it is the first time AMD managed to manufacture a relatively current GPU architecture on a CPU manufacturing process, all integrated into a single die. The Bobcat-based Fusion products launched earlier this year were all manufactured on bulk silicon. The Llano die measures 228mm² and consists of roughly 1 billion transistors.
AMD supplied us with their top-of-the-line A8-3850 APU, which comes with the CPU clocked at 2.9GHz and the GPU, including all 400 shader cores, clocked at 600MHz. Similar to previous AMD CPUs, the APU will also reduce its operating frequency and voltage to save power when it is not fully utilized. The following table lists all available P-States and the default voltages used for them.
As you can see, the default operating voltages are quite high, considering the chip is manufactured using the 32nm manufacturing process. As we have published earlier, the chip lends itself to heavy undervolting. We'd like to note that several tools can't properly read the core temperatures of the chip. According to Tamas Miklos, programmer of AIDA64 at FinalWire, at this point there is nothing that can be done about it. He told us that the readout basically works the same way as for previous CPUs from AMD, but in this case the values are not correct. He went on to point out that this is not the first time something like this has happened. AMD's Athlon 64 “Brisbane” chip, for example, also reported wrong temperatures, which could be corrected by an offset.
The readout of the properties and/or temperature of the GPU with GPU-Z was erroneous as well at the time of the review. Since Llano was still very new, it could very well be that those issues will be corrected in future releases of this software.
Part of our Llano test kit was a Gigabyte A75M-UD2H micro-ATX mainboard. It is one of the more expensive Llano mainboards, which is due to its very comprehensive feature set. With regard to display connectivity, all options are available: D-Sub, DVI, HDMI and DisplayPort. When connecting multiple displays, please note that not every combination is possible. D-Sub and HDMI as well as DVI and DP are mutually exclusive. If you enable Dual Link DVI, all other ports are disabled. It is also not possible to hotplug another display while the computer is running. Changing the display connection requires a reboot.
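The display restrictions described above can be written down as a small check. This is only a sketch of the rules as we observed them on this board; the port labels and the function name are our own, and any further limits (such as the maximum number of simultaneous displays) are not modelled.

```python
# Port pairs that cannot be used together, as observed on the A75M-UD2H.
EXCLUSIVE_PAIRS = [
    frozenset({"D-Sub", "HDMI"}),
    frozenset({"DVI", "DP"}),
]


def is_valid_combination(outputs, dual_link_dvi=False):
    """Return True if the requested outputs can be active together.

    `outputs` is an iterable of port labels ("D-Sub", "DVI", "HDMI", "DP").
    With Dual Link DVI enabled, every port other than DVI is disabled.
    """
    active = set(outputs)
    if dual_link_dvi:
        return active <= {"DVI"}
    return not any(pair <= active for pair in EXCLUSIVE_PAIRS)


if __name__ == "__main__":
    print(is_valid_combination(["DVI", "HDMI"]))        # allowed
    print(is_valid_combination(["D-Sub", "HDMI"]))      # mutually exclusive
    print(is_valid_combination(["DVI", "HDMI"], True))  # Dual Link DVI blocks HDMI
```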
The mainboard supports controlling the fan speeds if the respective setting is enabled in BIOS. Whether you use 3-pin or 4-pin fans doesn't matter, they will be automatically detected.
Updating the BIOS was easily done using the Q-Flash utility, which can be invoked via F11 at bootup. The BIOS file simply needs to be put on a USB pen drive and can then be selected from within Q-Flash. The BIOS contains a plethora of settings which allow tuning almost every aspect of the platform. There are a few options that don't work properly though. There are settings to increase the multiplier of the CPU and the GPU clock beyond the default specifications. These only result in higher frequencies being reported by tools like CPU-Z, but the chip will not perform any faster. However, we'd like to note that AMD is readying an unlocked Black Edition A-series APU, where these options might come in handy.
With the F3 BIOS update, a new option to enable “C6 mode” was added to the BIOS setup. Apparently this option enables the use of the C6 sleep state, as idle power consumption dropped by almost 4W after the BIOS update.
Kingston supplied us with a KHX2000C9AD3T1K2/4GX kit, a pair of 2GB modules specified to work up to DDR3-2000. This would be more than ample to supply the APU with the necessary bandwidth. However, with both the shipping BIOS (F2) and a beta BIOS (F3b), we weren't able to operate this memory at the DDR3-1866 setting. The slower DDR3-1333 and DDR3-1600 modes worked fine. After Gigabyte released the F3 BIOS update, the faster DDR3-1866 mode also worked without a hitch.
The memory SPD is programmed with conservative timings of 9-9-9-24 (CL-RCD-RP-RAS) at DDR3-1333 and 8-8-8-22 at DDR3-1066 for the default operating voltage of 1.5V. The SPD also contains Intel Extreme Memory Profiles (XMP) for the high-performance settings, which require a memory voltage of 1.65V but are unusable on an AMD-based motherboard/CPU. They contain settings with CL9 up to DDR3-2000, CL8 up to DDR3-1776 and CL7 up to DDR3-1554. For DDR3-1333 there is even a CL6 setting available.
On Llano the XMP profiles are not supported by the BIOS, so we had to configure the timings manually to get the best performance. This can be considered a minor hassle, but once you have found a working configuration you won't need to touch the timings again.
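When picking timings by hand, it helps to convert the CAS latency from clock cycles into nanoseconds, since a higher CL at a higher memory speed is not necessarily slower. The following sketch does this for the SPD and XMP settings listed above; the helper name is our own, and it relies only on the fact that DDR memory performs two transfers per clock cycle.

```python
def cas_latency_ns(cl_cycles, data_rate_mts):
    """Convert a CAS latency in clock cycles into nanoseconds.

    DDR memory transfers data twice per clock, so the memory clock in MHz
    is half the data rate and one cycle lasts 2000 / data_rate nanoseconds.
    """
    return cl_cycles * 2000.0 / data_rate_mts


# (source, CL, data rate) for the settings stored on the Kingston modules.
SETTINGS = [
    ("SPD", 8, 1066),
    ("SPD", 9, 1333),
    ("XMP", 6, 1333),
    ("XMP", 7, 1554),
    ("XMP", 8, 1776),
    ("XMP", 9, 2000),
]

if __name__ == "__main__":
    for source, cl, rate in SETTINGS:
        print(f"{source} DDR3-{rate} CL{cl}: {cas_latency_ns(cl, rate):.2f} ns")
```

All four XMP settings work out to roughly 9 ns of CAS latency, while the SPD default of CL9 at DDR3-1333 comes to about 13.5 ns. This is why manual tuning is worth the effort on this platform.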
Overclocking Llano can be quite a daunting task. The only way to overclock Llano is via the reference clock, which is 100MHz by default and also serves as the PCIe interface clock. When raising the reference clock, the clocks of all interfaces like SATA are raised as well, potentially leading to instability.
At certain clocks dividers kick in to bring the interface clock speeds back to normal levels. For example a setting of 120MHz might be unstable, but a setting of 133MHz might again work without a hassle. With a reference clock of 133MHz the CPU is already operating at 3857MHz and the GPU at 798MHz. Depending on cooling and the quality of the chip, this may or may not run stable.
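The clock figures quoted above follow directly from scaling the stock clocks with the reference clock. Here is a minimal sketch of that arithmetic; the function name is ours, and the interface dividers mentioned above are deliberately not modelled.

```python
STOCK_REF_MHZ = 100.0  # default reference clock
STOCK_CPU_MHZ = 2900.0  # A8-3850 CPU clock
STOCK_GPU_MHZ = 600.0   # A8-3850 GPU clock


def overclocked_mhz(stock_mhz, ref_clock_mhz, stock_ref_mhz=STOCK_REF_MHZ):
    """Scale a stock clock linearly with the reference clock."""
    return stock_mhz * ref_clock_mhz / stock_ref_mhz


if __name__ == "__main__":
    for ref in (100, 120, 133):
        cpu = overclocked_mhz(STOCK_CPU_MHZ, ref)
        gpu = overclocked_mhz(STOCK_GPU_MHZ, ref)
        print(f"Reference {ref}MHz: CPU {cpu:.0f}MHz, GPU {gpu:.0f}MHz")
```

Because the CPU and GPU scale together, a reference clock overclock stresses both parts of the chip at once, which makes finding a stable setting harder than with an unlocked multiplier.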
In our own testing we weren't able to overclock the chip at all. This could be due to various reasons, which we weren't able to pinpoint because of time constraints. Our colleagues at Overclockers.com list a few reasons why overclocking Llano might fail. However, we were able to reach a stable undervolt of -0.275V, which clearly shows that there is a lot of potential in either direction, so we assume something else was holding us back.
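To get a feeling for what an undervolt of this size can mean, here is a rough sketch based on the common first-order rule that dynamic power scales with the square of the voltage at a fixed frequency. This is a textbook approximation that ignores leakage, not a measurement. The default voltage of 1.40V used below is a hypothetical placeholder, not a value taken from our P-state readings.

```python
def relative_dynamic_power(default_v, undervolt_v):
    """Dynamic power after undervolting, relative to the default voltage.

    First-order CMOS model: P ~ C * V^2 * f, with the frequency held
    constant. Leakage and voltage regulator losses are ignored.
    """
    new_v = default_v - undervolt_v
    return (new_v / default_v) ** 2


if __name__ == "__main__":
    HYPOTHETICAL_DEFAULT_V = 1.40  # placeholder, not a measured value
    UNDERVOLT_V = 0.275            # stable undervolt reached in our testing
    ratio = relative_dynamic_power(HYPOTHETICAL_DEFAULT_V, UNDERVOLT_V)
    print(f"Estimated dynamic power: {ratio:.0%} of default")
```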
At this point we'd like to state that we don't think the A-series APUs based on Llano are a particularly good choice for overclockers. Despite the theoretically superior 32nm manufacturing, the 45nm chips of the Athlon II and Phenom II lines overclock a bit better on similar cooling solutions. If you are gunning for performance, you are most certainly not interested in the integrated GPU either. So if you want to overclock for greater performance, we'd suggest getting a Phenom II and a discrete GPU corresponding to your needs instead. For Llano we'd rather try to minimize power consumption by undervolting. As always, your mileage may vary.
While this test uses only DirectX 10, it's still quite a demanding test, especially for weaker hardware. Llano delivers excellent results considering it has an integrated GPU. It also becomes evident that the APU is highly dependent on memory performance. There is a big gap when going from DDR3-1333 to DDR3-1600. DDR3-1866 can further improve performance, but the advantage is not that prominent at that point.
This test shows a similar result to its predecessor with respect to memory scaling. In general it is quite an impressive result, definitely the fastest integrated GPU to date.
AIDA64 Memory Tests
These tests show how additional memory bandwidth affects actual transfer rates. Also the latency is considerably lower as we move up in frequency.
Here we can see that the memory also affects CPU performance. The impact is not as big as for 3D applications where the GPU is stressed too, but these are nice gains nonetheless.
As this is a rather memory-dependent test, it is to be expected that it scales very well as we improve memory performance.
We also tested a few actual games on the APU to see how well it fares in practice. In general we believe that Llano is very well suited for less demanding games, which are played more casually. This is in line with the expectations that AMD has for Llano without any discrete graphics added to the mix. For example, we had a good experience with MMOs like Guild Wars or World of Warcraft. Current games like Dirt 3 also ran reasonably well. More demanding games like Metro 2033 didn't work that well at all and would likely require dual graphics with the addition of a discrete card.
While the GPU inside the A-series APU is quite capable, you have to make a few compromises regarding visual quality. We found that the optimal resolution for the APU is 1280x720 for gaming with high quality settings. If you want to go higher, you'd have to reduce visual quality settings to achieve playable framerates. Of course these things also depend a lot on personal preference. Just keep in mind that AMD can deliver a decent discrete GPU option starting at around $99.99 for the Radeon 6770.
World of Warcraft
We set the resolution to 1280x720, all the settings to the highest possible setting (High or Ultra respectively), except Shadows, which was set to 'Good'. We also enabled 4xAA and 8xAF. We used the Troll intro scene displayed after creating a fresh troll character as a benchmark sequence and used Fraps to measure the frames per second.
As you can see Llano is more than capable of delivering good performance in WoW in the graphics setting we used. Just consider that in crowded cities or raid content, the figures would go down a bit, so having some headroom is always a good idea. As with the other benchmarks we can see good scaling with memory. Especially the jump from DDR3-1333 to DDR3-1600 is quite prominent.
We set the resolution to 1280x720 and all the detail settings to High or Highest, respectively. 4xAA was enabled as well. We used the built-in benchmark test as a performance measure.
The A8-3850 delivers a very good show here. It might not be the most demanding game, but Dirt 3 still is a very recent title that can make use of DirectX 11 features. Again, DDR3-1600 delivers a lot of extra performance, while DDR3-1866 only has a minor incremental impact.
Power consumption can be looked at in many different ways. Historically, most reviewers simply provided figures for idle and load power consumption. Load power consumption is sometimes divided between CPU, GPU and combined power draw. Load is usually generated using special programs that strain the respective hardware subsystems to their very limits, which means temperatures and power consumption are maximized.
We consider this approach as valid and will provide the results of our measurements. CPU load was generated using Prime, while GPU load was generated using MSI Kombustor.
As you can see the majority of the power budget is allocated for the CPU cores. This somewhat contradicts the design approach AMD outlined, though that refers to silicon space and not actual power consumption. Still it's something to take into consideration.
However, we'd like to point out that while this correctly shows what is going on at minimum and maximum workloads, it doesn't reflect power consumption in practice at all. Back at E3, AMD put up an interesting slide explaining that in a quad-core CPU, depending on the workload, all four cores are rarely used at once. According to the figures from AMD, even when you are doing video editing, no more than two cores are utilized over half of the time.
While one could debate these exact figures, the point to take away is that often fewer than four cores are utilized, and those that are might not be loaded to 100%. AMD used it to explain how the turbo feature comes in handy. We use it here to make a point about real-world power use. Since not all cores are usually loaded to the max, the power figures are much lower for practical workloads.
Therefore, we will informally give a few additional power figures to give you an overview of how Llano fares when used for various tasks. When browsing the web, the computer is usually not strained to its limits. Even when using lots of tabs and watching 1080p videos on YouTube, Llano not only performs well, it does so with only 75W-90W of power draw. When playing actual games like World of Warcraft or Dirt 3, power consumption is in the 130W to 150W range.
After all the tests we carried out, the final question remains: Does AMD's A-series mainstream APU deliver? We think it does, at least in the target segment it is marketed for. The CPU delivers enough performance for the majority of users and the GPU lives up to the claim of discrete-level performance, at least at the entry level. Compared to Intel's Sandy Bridge lineup, the AMD A-series APUs are not only cheaper but also deliver a better-rounded package, provided you don't plan on using a discrete GPU.
Of course there are some things about Llano that don't shine so bright at this moment. While AMD heavily touts GPGPU applications as one of the reasons this chip delivers superior performance, in practice we are not quite there yet when it comes to the availability of software that uses GPU computing to its full potential. If we were, Llano would wipe the floor with other CPUs in performance comparisons, which it currently does not, except for anything 3D. There need to be more applications for the APU beyond MotionDSP's software.
There is one point we can't stress enough regarding Llano: Don't be cheap on memory, you will regret it! Our testing has shown that going from DDR3-1333 to DDR3-1600 provides a tangible performance increase for both the CPU and the GPU. DDR3-1866 improves performance even more, but the increase is not as prominent as the former. Considering current RAM prices, you should really get at least DDR3-1600 RAM in conjunction with Llano; it is worth the money. Regarding DDR3-1866, we have to say that due to the rather steep price increase it is generally not worth it at this point. In the future this could change of course, since DDR3-1866 is now a JEDEC-ratified speed bin and there are even faster bins planned.
Going forward, Trinity, the successor to Llano, should start shipping around the second quarter of 2012. It will feature two next-generation Bulldozer modules, i.e. 4 processing cores, but only 2 FPUs. The GPU will be updated to a Cayman descendant based on its VLIW4 architecture. Compared to the approach AMD took with the slightly upgraded Redwood GPU inside Llano, this strikes us as a bit odd, as there is no low-end GPU design with the Cayman architecture yet. This could very well change once AMD releases the 28nm Radeon HD 7000 series, which is expected to feature the same architecture. Overall the chip is projected to deliver a 50% performance increase over Llano.
As for platform compatibility, nothing is known at this point. Trinity is said to use an FM2 socket, which is different from the current FM1. But it wouldn't be the first time that AMD has managed to keep products compatible over generations; just consider the incremental updates from socket AM2 to AM2+ to AM3. It's just something I wouldn't expect at this point.
Editor's Note (Anshel)
Based on this review, we've concluded that we'd like to give AMD an award for their A8-3850 Llano APU by granting them our Editor's Choice for Mainstream award.
© 2009 - 2011 Bright Side Of News*, All rights reserved.