AMD's Radeon HD5970 Hemlock - Is Wickedly Fast
11/17/2009 by: Sean Kalinich



Not too long ago AMD took the lead in the GPU world. It was not a massive performance lead at first; it was a simple yet enormously important move: the move to 40nm. That move was the opening move in a new game of chess between AMD and its rival, nVidia, and it spelled out something that was either overlooked or ignored.

The company wasn't afraid to jump to the latest process node. Just as AMD skipped 65nm and went straight from 80nm to 55nm for its Radeon 3800 series, it worked in secrecy to push out a new generation of GPUs on TSMC's new 40nm process. These became the 5800 series, the first DirectX 11 GPUs in the world, and they also brought the speed crown back to AMD. For the first time in a long time AMD did not sit back and wait. Instead it forged ahead and is now pushing out the fastest single card on the planet: a dual-5870-GPU card formerly known as Hemlock. Rather than the expected 5870X2, AMD named the product ATI Radeon HD 5970. We took one of these for a quick spin and are eager to tell you about it.




New and Improved
The 5970 improves on previous generations of "X2" cards. For starters, AMD has [like nVidia] moved away from the X2 designation, and we would expect AMD's CPUs to ditch the X2/X3/X4 nomenclature as well. AMD has also changed some things under the hood. It approached the 5970 the way it should: this is an enthusiasts' card after all, and it deserves a little more attention. AMD developed the product for a 300Watt envelope [8-pin + 6-pin + PCIe x16 slot], but knowing that many gamers would push it beyond that, it made sure the card was capable of taking a 400Watt load. In a conversation with Devon Nekechuk, Product Manager for Hemlock, and Jay Marsden, PR Manager for Graphics at AMD, it was carefully explained to us that while the card is designed to run at 400Watts, it has not been validated at that level, and running it there will exceed the PCIe specification. To go that far you need a top-of-the-class PSU that tolerates amperage overdraw [check out some overclocking forums for these; we know that the Silverstone Zeus 1200W is one of those "golden babies", Ed.]. In fact, AMD told us that the highest successful power draw it has seen was a massive 450W, but that is probably in the domain of LN2 cooling and gunning for the world's fastest 3DMark scores.
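For those wondering where the 300Watt figure comes from, the PCI Express power limits add up neatly: 75W from the x16 slot, 75W from a 6-pin connector and 150W from an 8-pin connector. The short Python sketch below is our own illustration (the function names are hypothetical, not from any AMD tool); it totals that budget and shows how far the 400W design target and the 450W record sit beyond it.

```python
# Illustrative sketch: PCI Express power limits per source, in watts.
PCIE_LIMITS_W = {"x16_slot": 75, "6_pin": 75, "8_pin": 150}


def board_power_budget(aux_connectors):
    """Specified budget: the x16 slot plus every auxiliary power connector."""
    return PCIE_LIMITS_W["x16_slot"] + sum(PCIE_LIMITS_W[c] for c in aux_connectors)


def overdraw_w(load_w, aux_connectors):
    """Watts drawn beyond the specified budget (0 when the load is within spec)."""
    return max(0, load_w - board_power_budget(aux_connectors))


HD5970_AUX = ["8_pin", "6_pin"]

print(board_power_budget(HD5970_AUX))   # 300
for load in (300, 400, 450):            # design envelope, design target, AMD's record
    print(load, "W ->", overdraw_w(load, HD5970_AUX), "W over spec")
```

Anything the card pulls past the budget has to come through the same connectors, which is why AMD points overclockers at PSUs that tolerate overdraw.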

   


To overcome this thermal challenge AMD cherry-picked the ASICs that go on each of these cards from the highest speed bins [tested for leakage and speed], and added Volterra programmable digital power regulators, real-time power management features, crazy expensive ceramic "SuperCapacitors" and GDDR5 rated at 5Gbps. It also improved the cooling by running a vapor chamber along the full length of the card, allowing it to handle up to a 400Watt heat load. By comparison, the ATI Radeon HD 4870X2 came with only a small vapor chamber cooler on GPU0, while GPU1 used a regular copper cooler. With Hemlock, AMD really did its homework and has put together what looks like a highly overclockable card; in fact, AMD refers to it as "Unlocked". Due to time constraints we were not able to run any overclocking tests on our 5970, but we will follow up with a deeper look at overclocking, power and heat with this new card.
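Memory rated at 5Gbps matters because peak memory bandwidth is simply bus width times per-pin data rate, divided by eight bits per byte. A minimal sketch, assuming a 256-bit memory interface per GPU and a stock memory data rate of 4Gbps (neither figure comes from AMD's briefing to us), shows the headroom those chips leave for overclocking:

```python
def peak_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gbit/s) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


BUS_WIDTH_BITS = 256   # assumption: 256-bit GDDR5 interface on each GPU
STOCK_GBPS = 4.0       # assumption: stock HD 5970 memory data rate
RATED_GBPS = 5.0       # rating of the GDDR5 chips, per AMD

for label, rate in (("stock", STOCK_GBPS), ("rated", RATED_GBPS)):
    per_gpu = peak_bandwidth_gbs(BUS_WIDTH_BITS, rate)
    # Each GPU has its own frame buffer, so the combined figure is not pooled memory.
    print(f"{label}: {per_gpu:.0f} GB/s per GPU, {2 * per_gpu:.0f} GB/s combined")
```

Under those assumptions, running the memory at its rated speed would be worth roughly 25% more peak bandwidth per GPU.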

   


AMD has not forgotten about power efficiency either. As we mentioned, the HD5970 was designed for a 300Watt envelope, which means that at stock speeds under 100% load it should not pull more than about 275-285 Watts at 23 Amps. When the card does not need that much juice, it can shut down one of its GPUs by putting it into a sleep state. This handy little feature brings idle power draw down to about 42Watts. This reduction in power [and current] draw also makes for longer product life.
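The wattage and amperage figures hang together if you assume the load is carried on the 12V rails: power is voltage times current, so 23 Amps at 12 Volts comes to 276 Watts. The sketch below works through that arithmetic; the 12V assumption is ours, not AMD's, and the helper names are hypothetical.

```python
RAIL_VOLTS = 12.0  # assumption: the load is carried on the 12 V rails


def watts(current_a, volts=RAIL_VOLTS):
    """Power P = V x I."""
    return volts * current_a


def current_from_power(power_w, volts=RAIL_VOLTS):
    """Current I = P / V."""
    return power_w / volts


LOAD_AMPS = 23   # quoted full-load current
IDLE_W = 42      # quoted idle draw with one GPU asleep

load_w = watts(LOAD_AMPS)
print(load_w)                          # 276.0, inside the quoted 275-285 W range
print(current_from_power(IDLE_W))      # 3.5 A at idle
print(f"idle is {IDLE_W / load_w:.0%} of full load")
```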

One other very welcome improvement is that you no longer have to force Crossfire on the card. With the old 4870 X2 and 4850 X2 you had to enable Crossfire to make sure you were using both GPUs. Later driver revisions did this automatically, but it often had to be disabled and then re-enabled before it worked properly. The new HD5970 simply runs in Crossfire mode right out of the box. In fact, CCC [Catalyst Control Center] sees it as a single card, and you cannot turn Crossfire off.

Last on the list of new features is, unfortunately, one we will have to test for you later. According to AMD, Crossfire now works with Eyefinity, so you can use up to three monitors for your gaming pleasure and still get the benefit of both GPUs. Note that this feature is not enabled for discrete cards in Crossfire, i.e. it doesn't work on two 5850s or three 5770s. We weren't told which Catalyst version will bring the "Crossfire <3 Eyefinity" feature to discrete configurations.

All in all, not a bad list of improvements.

The full specifications of the HD 5970 compared to the rest of the 5800 family are shown below.





Performance

Test systems and comments

We are going to break from our usual review style here and head right into what you want to know: how fast is the new HD5970? There are a few things we have to get out of the way first, but we will dive into that in just a second.

First you need to know the system that all of this was run on.

Intel Core i7 Extreme 975 [3.33GHz] [provided by Intel]
ASRock X58 Extreme [P130 BIOS] [provided by ASRock]
6GB Kingston KHX12800D3T1K3/6GX [provided by Kingston]
128GB Patriot Torqx SSD [Provided by Patriot]
Ultra X4 1200Watt Fully Modular PSU [Provided by Ultra]
Microsoft Windows 7 Ultimate x64 [With all patches up to 11/12/2009]

AMD GPUs and drivers used
Radeon HD 5970 [provided by AMD] 8.663.1_Beta4_Hemlock_VistaWin7_Nov6
Radeon HD 4850 X2 [provided by Sapphire] Catalyst 9.10
Radeon HD 4890 [Provided by Gigabyte] Catalyst 9.10

nVidia GPUs used
Zotac AMP! Edition GTX 285 [clocked down to stock speed for reference] [provided by Zotac] Forceware 191.07

   


Unfortunately, at the time of writing we were not able to test with a GTX 295 [a closer match for the HD5970] or any other dual-GPU configuration. Our reference motherboard and a set of Zotac GTX 280s were damaged by a Topower Tiger 1200Watt PSU [all three were completely destroyed], and we still have not been able to resolve the issue with Topower. In fact, we have not heard from them on the matter since October 22nd, when we asked [at their suggestion] for independent testing of the cards and the PSU. We will try to get additional GPUs in for future tests, but were unable to in time for the lifting of the NDA.

For the performance section we are not going to walk through each game as we normally would; we will just comment on the results. Instead, we will cover some general information here. We used a mix of synthetic and real-world tests. All of our real-world testing was actual in-game testing: we used FRAPS version 3.0.1 to capture frame rates during actual gameplay. For each game we selected a level and ran that same level three times, recording the minimum, maximum and average frame rates. The average of these three runs was used as our final result. Settings for each game are shown below. Each synthetic test is introduced with a brief description where needed; otherwise they are self-explanatory.
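For readers who want to reproduce our method, the averaging step is simple. Here is a minimal Python sketch of it, assuming each FRAPS run has already been reduced to a minimum, average and maximum frame rate; the numbers are placeholders, not results from this review, and summarize_runs is a hypothetical helper name.

```python
from statistics import mean


def summarize_runs(runs):
    """Average each statistic (minimum, average, maximum FPS) across repeated runs.

    Each run is a dict with 'min', 'avg' and 'max' keys taken from one FRAPS pass.
    """
    return {key: mean(run[key] for run in runs) for key in ("min", "avg", "max")}


# Placeholder numbers for illustration only, not results from this review.
runs = [
    {"min": 30.0, "avg": 45.0, "max": 60.0},
    {"min": 33.0, "avg": 48.0, "max": 63.0},
    {"min": 27.0, "avg": 42.0, "max": 57.0},
]
print(summarize_runs(runs))
```

Averaging over three passes of the same level smooths out run-to-run variation in what is, after all, live gameplay rather than a scripted demo.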

Now on with the testing results!

Gaming
This is what you really want to know about so let’s get into it.

Call of Duty 6: Modern Warfare 2

 




Impressive is one word I would use here; overkill is another. Playing MW2 on the HD 5970 was quite a treat. In fact, I ended up finishing the game, all in the name of testing of course.


Crysis

 




Ah, the age-old question: will it play Crysis? I believe it was first postulated by the greeks… um, I meant geeks. Well, the answer is hell yes. I was able to max out the game in terms of settings and resolution and still get 48 frames per second [more than double what any of the other cards we tested could manage]. Finally, after two years of toil, we have a card that can play Crysis in DX10 mode without dying or stuttering! Then again, a similar story played out with the original Far Cry.


Borderlands

 





Well, color me surprised: the GTX 285 is able to eke out a win here. Granted, the margin of victory is only 0.190 frames per second, so let's call it a tie, but it is still an interesting result. It looks like the game is loaded with long shaders, which play into the hands of the GTX 285's 240 cores rather than the 640 "fat" units inside Hemlock's two GPUs.


BattleForge [Play4Free]






BattleForge is an EA title heavily sponsored by AMD; during the opening "credits" we are treated to a nice animated AMD banner, just like the annoying TWIMTBP banners from NV. So it is no wonder the GTX 285 is simply trounced here. This test was hands down an embarrassment for the 285, and that is putting it mildly.

Then again, this online strategy game carries the title of the world's first DirectX 11 application.

Dragon Age: Origins





Dragon Age: Origins is the latest hit title from BioWare. It seems to scale nicely with multiple GPUs, as both the 5970 and the 4850 X2 perform exceptionally well in this game. The margin of victory here is simply mind-numbing.


The Synthetics
These are demos that are all static in nature, so they are easily repeatable. New ones that we have added will be briefly introduced.


3DMark Vantage
This test really needs no introduction; if you do not know what this is then you are probably reading this by accident.




The HD5970 scores 21,000+ without breaking a sweat. That is a score I remember chasing with multiple cards not too long ago, and it is the fastest stock single-card score I have ever seen. I would love to see what a pair of these could do with a little overclocking. As you can see in the graph above, there are two entries for the GTX 285: one run has PhysX enabled and the other does not. With PhysX enabled, 3DMark Vantage's physics test runs on the GPU, which explains that run's 49,000+ CPU score.

Unigine Heaven DX11 Benchmark
The first DX11 benchmark, Unigine’s Heaven, is visually impressive, even if the audio loop does get annoying. It is very good for seeing the differences between DX10 and DX11 rendering, as you can see from the images below.

 
DirectX11 rendering by Unigine's Heaven Bench
 
DirectX10 rendering by Unigine's Heaven Bench








Simply put, the raw power of the HD 5970 flattens the rest of the field in DX10.



Furmark
Just like 3DMark, this benchmark really needs no introduction. In fact, the only introduction it needs is this: Dave Baumann of AMD has called it a "power virus".









In FurMark the HD 5970 still wins comfortably, though its lead over the GTX 285 is not huge. At the same time, it scores a little less than double what the 4xxx series cards do.


S.T.A.L.K.E.R. Call of Pripyat Benchmark
This one is a static test of scene rendering using the DX11 engine of S.T.A.L.K.E.R.: Call of Pripyat [COP]. It has some interesting options and features, and it shows off the improved efficiency of DirectX 11 very well, even at high resolutions.

 





01 = Ultra preset, 1920x1200, DX11 renderer, 4x MSAA, SSAO mode = HDAO, SSAO quality = Ultra, CSVersion, tessellation and hard shadows
02 = Ultra preset, 1920x1200, DX10 renderer, 4x MSAA, SSAO mode = HDAO, SSAO quality = Ultra, CSVersion DX 10


Here we see something very interesting: the GTX 285 cannot keep up with even the older HD4850 X2. This is not entirely unexpected, as the developer appears to have written the code to keep as many of the MADD units inside AMD's GPUs busy as possible. The result is 1600 [HD4850 X2] and 3200 [HD5970] fully utilized shader cores, against which nVidia's 240 shaders simply cannot keep up.
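A rough way to think about "fully utilized" is to count how many of the five slots in each Vec5D unit the compiler manages to fill. In this simplified model (our assumption, not AMD's documentation), the HD4850 X2 has 2 x 160 Vec5D units and the HD5970 has 2 x 320, and the number of busy shaders scales with the average number of slots filled per bundle. The function names are hypothetical.

```python
SLOTS_PER_UNIT = 5  # Vec5D: one "fat" slot plus four "lite" slots per shader unit


def marketed_shaders(gpus, units_per_gpu):
    """Shader count as marketed: every slot of every Vec5D unit on every GPU."""
    return gpus * units_per_gpu * SLOTS_PER_UNIT


def busy_shaders(gpus, units_per_gpu, avg_slots_filled):
    """Shaders doing useful work when, on average, only avg_slots_filled of 5 slots are used."""
    if not 1 <= avg_slots_filled <= SLOTS_PER_UNIT:
        raise ValueError("a Vec5D bundle holds between 1 and 5 operations")
    return gpus * units_per_gpu * avg_slots_filled


HD4850_X2 = (2, 160)  # two RV770 GPUs, 160 Vec5D units each
HD5970 = (2, 320)     # two Cypress GPUs, 320 Vec5D units each

print(marketed_shaders(*HD4850_X2), marketed_shaders(*HD5970))   # 1600 3200
for filled in (5, 3, 1):
    print(filled, "slots filled ->", busy_shaders(*HD5970, filled), "busy shaders on the HD5970")
```

When code is written the way this benchmark appears to be, every slot is filled and the marketed count is the real count; when only one slot per bundle can be used, the HD5970 drops to 640 busy shaders.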


DirectCompute Benchmark
DirectCompute is a new part of the DirectX 11 API that ships with Windows 7. It allows compute shaders to provide parallel processing power for GPGPU computation. What does that mean? Simply put, DX11's DirectCompute gives developers more access to the power behind today's modern GPUs. To show this off we used the DirectCompute Benchmark, created by a forum member of NGOHQ.com.





As you can see, the HD5970's compute performance massively exceeds even the horsepower of our i7 975, but it is still unable to reach the performance of the nVidia GTX 285. There are a few reasons for this; we will cover one of them now and leave the rest for a later article. AMD's design is Vec5D, meaning each shader unit contains one "fat" and four "lite" units. The fat unit handles MULL and the lite units handle MADD. So AMD's claim of 1600 shaders per GPU really means 320 Vec5D units [320 fat and 1280 lite]. When you run something like the DirectCompute bench, which loads the most complex units, you are only using 320 shaders [640 if you are actually using both GPUs, which in this case you are not], because the lite units cannot handle the complex instructions. Compare that to the 240 shaders on the GTX 285, which can handle both complex [fat] and simple [lite] instructions. Now I know you are thinking that AMD still has more of them, and you are right; the difference comes in clock speed. The GTX 285 [at stock speeds] has a graphics clock of 648MHz and a shader clock of 1476MHz, which means its 240 shader cores run at roughly twice the speed of the 320 on the HD5970. Fortunately for AMD, the people looking to pick up one of these will probably be interested not in GPGPU work but in some serious gaming.
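The clock-speed argument can be put into numbers with a crude units-times-clock model that treats every shader as retiring one instruction per clock. The sketch assumes the HD5970's stock core clock of 725MHz, which this review does not state, and ignores memory, scheduling and everything else that matters in real code; it only illustrates why 240 fast shaders can beat 320 slower ones.

```python
def issue_rate(units, clock_mhz):
    """Billions of instructions per second if every unit retires one instruction per clock."""
    return units * clock_mhz / 1000


HD5970_CLOCK_MHZ = 725  # assumption: stock HD5970 core clock (not stated in this review)

gtx285 = issue_rate(240, 1476)                        # 240 shaders at the 1476MHz shader clock
hd5970_fat_one = issue_rate(320, HD5970_CLOCK_MHZ)    # fat units only, one GPU
hd5970_fat_two = issue_rate(640, HD5970_CLOCK_MHZ)    # fat units only, both GPUs
hd5970_all_one = issue_rate(1600, HD5970_CLOCK_MHZ)   # every lane busy, one GPU

print(f"GTX 285:                   {gtx285:.1f} G/s")
print(f"HD5970 fat only, one GPU:  {hd5970_fat_one:.1f} G/s")
print(f"HD5970 fat only, two GPUs: {hd5970_fat_two:.1f} G/s")
print(f"HD5970 all lanes, one GPU: {hd5970_all_one:.1f} G/s")
print(f"shader clock ratio: {1476 / HD5970_CLOCK_MHZ:.2f}")
```

In this model a single GPU limited to its fat units falls behind the GTX 285, while the same GPU with every lane busy pulls far ahead, which is the gap that simple, well-packed code could close.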


The rest of the story
So now you know a little about what is new and have a good baseline for the kind of performance you can expect from the new AMD Radeon HD 5970. But as with most things, there is always more to the story. We will try to cover that here.

Price
The Radeon HD5970 is not going to be a value card, and rightly so. It is a top-end part in all respects, with the best speed, performance and power usage in its class [granted, it is the only one in that class, but still]. If you want a Ferrari you can always go for a used one, but if you want a custom-tailored one, down to choosing the stitching, that's when you cash out. However, at $599.99 I am not sure it is a great deal. Before you jump on me for that comment, hear me out. Our problem is not that the 5970 isn't powerful, fast or nicely made; it is that in a couple of months that price will drop, probably drastically, so a consumer who buys right now will lose money on the purchase. This is not AMD's fault; after all, AMD has to compete. It is a fault of the market and the rapid refresh of products. When you buy a top-end GPU you are guaranteed to lose money on the deal; there is just no doubt about that in the current market. This fact, plus the looming launch of consumer-class NV100 GPUs from nVidia [even if they arrive late in 1Q 2010], will put pressure on the HD5970's price and force AMD to cut it to compete with nVidia. Of course, you could argue that this is a part for the enthusiast and as such should command top dollar. Still, we are just not sure we can call the HD5970 a good deal despite its outstanding performance.

Our professional advice is, as always, the same: when a new API appears, go for the high-end part, because it keeps you safe until the next API kicks in. This held true for DirectX 8 [GeForce3], DirectX 9 [Radeon 9700] and DirectX 10 [GeForce 8800], and it is pretty much the same story now. In 3-4 years' time your high-end part will still have more horsepower than low-end parts, and that's how the GPU food chain works.

Availability [or Robbing Peter to Pay Paul]
We asked AMD directly about launch availability and were told that "thousands" of cards would be on shelves and available for purchase on launch day. I then asked how the HD 5970, being made up of hand-picked 58xx ASICs, would impact HD5870 availability, and Devon told us that the HD 5970 will not affect either 5870 or 5850 volume and supply. We were further told to expect HD 5870 and 5850 volume to slowly return to normal levels of availability. It was this comment that stuck in my head, and in a moment of clarity I realized that AMD had robbed the HD58xx store to stock the HD5970 one. The shortage over the last couple of months was not [only] about TSMC volume; it was also about stocking up for the HD5970. This was an amazingly clever move on AMD's part. By pulling from existing stock of 5870 parts to build a surplus of 5970s, AMD has helped maintain demand and ensured it has sufficient product for a hard launch of the 5970. AMD knows there is not going to be a major rush on the 5970, so it can run on the existing stock for a couple of months while it rebuilds its stock of 5870s and 5850s. That way, in a few weeks, it can have stock of all three products available and even out distribution to cover demand better. Great move for the AMD beancounters!

Naturally, this is not so great for the AIBs [add-in board partners], which are now openly hostile towards AMD. From the conversations we had with those partners, it looks like the olive branch was the fact that most of the HD5700 series goes to AIBs, who are focusing on these boards instead of the 4800 [stock robbed by Apple] or the 5800.

DX11 gaming
As mentioned, the HD 5970 is a DirectX 11 part. When DX10 came out, AMD was one of the first out of the gate with DX10-compatible parts complete with working drivers that used the new features of DX10 [Edit: nVidia actually had the first DX10 card on the market with the 8800GTX. However, it did not have DX10 drivers for it until a full two months after launch, and even then it did not have full, working DX10 support until around the launch of the 2900XT]. Unfortunately, DX10 adoption was not as fast as it could have been. Too many game developers were not interested in DX10 and kept coding for DX9. This, combined with a very poor adoption rate for Windows Vista, made DX10 a losing battle for both AMD and nVidia. It was not until DX10.1 hit that we really started to see games that took advantage of DX10 and DX10.1 [again, AMD offered support for DX10.1 while NV lagged behind]. With DX11 things are a little different. At the time of this writing there are already two DX11 titles on the market [S.T.A.L.K.E.R.: Call of Pripyat and BattleForge], with a third planned for December 1st [DiRT 2]. We also hear that 1Q 2010 will bring three more, including one triple-A title and a candidate for Game of the Year 2010: Battlefield: Bad Company 2, Aliens vs. Predator and Lord of the Rings Online. That is an excellent adoption rate for a new API, and it shows great promise for Windows 7 and DX11.

Eyefinity, OpenCL and Direct Compute
I put these three last because they are of only marginal significance to gamers, and of the three only Eyefinity and DirectCompute matter right now. Eyefinity allows up to three monitors to be driven from one board in an extended desktop mode. On the surface this is nothing new; after all, I could do that with my HD4850 X2, and in fact I could run up to four monitors on that board. Where Eyefinity differs is in its broad range of possible configurations. Before, you were limited to very simple choices: either you extended your desktop or you mirrored it. With Eyefinity you can do more than that. For example, you can span two monitors into a single surface and keep a third independent, with both still part of the same extended desktop. Examples are shown below.



We asked AMD about supporting six displays on a dual-GPU card, and were told that for now the only option is up to three monitors, the exception being the 5870 Eyefinity 6 edition.
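One way to picture Eyefinity's mixed layouts is as groups of displays, where each group becomes one desktop surface. This hypothetical Python model (Display and desktop_surfaces are our own names, not part of any AMD API) assumes three 1920x1200 monitors side by side and computes the surface sizes for a two-plus-one layout and for a full three-wide span.

```python
from dataclasses import dataclass


@dataclass
class Display:
    name: str
    width: int
    height: int


def surface_size(group):
    """One desktop surface: the displays in the group are spanned side by side."""
    if len({d.height for d in group}) != 1:
        raise ValueError("this sketch only spans a non-empty group of equal-height displays")
    return sum(d.width for d in group), group[0].height


def desktop_surfaces(groups):
    """Each group of displays becomes one surface of the extended desktop."""
    return [surface_size(group) for group in groups]


left, centre, right = (Display(n, 1920, 1200) for n in ("left", "centre", "right"))

print(desktop_surfaces([[left, centre], [right]]))   # [(3840, 1200), (1920, 1200)]
print(desktop_surfaces([[left, centre, right]]))     # [(5760, 1200)]
```

A plain extended desktop is just the case where every group holds a single display.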


OpenCL [Open Computing Language] is an open standard for general-purpose parallel computing on GPUs and other processors. It allows your GPU to become a massively parallel computational engine, and AMD supports it on the 5xxx series. Uses include rendering, video and audio transcoding, and other media-related functions. This new standard offers great promise, and you should expect to see more software take advantage of its capabilities in the near future.

DirectCompute is, as we briefly covered above, part of DX11 and Windows 7. Simply put, it is a way to unlock the potential of new GPU designs for heavy computational loads. It can be applied to many gaming features, including AI calculations and ray tracing [for future games], and it offers expanded potential for future games and software. Unfortunately, AMD has a slight design hindrance here: as we mentioned above, if the OpenCL or DirectCompute instructions are complex, the AMD GPU is limited to the shaders that can execute them [320 per GPU, down from 1600]. This puts AMD at a slight disadvantage to nVidia unless the workload is coded with simple instructions, so that the full shader count can be used.

Conclusion
The AMD Radeon HD5970 is one wickedly fast card overall, and in a couple of tests it simply stomped on everything else. In many cases it was able to double the performance of its nearest competitor in our lineup. Granted, we did not have nVidia's GTX 295 on hand [which would have been a better match and test], but we can still see the HD5970's potential here. There were a few downsides to the 5970, though. In the game of GPGPU, AMD is still lagging behind and has not caught up just yet. This is an issue with the basic design of the GPU, not something that can be corrected with drivers. There is a solution, though: AMD needs to work with developers to optimize code for its architecture. If it can get developers to write simple instructions, it will be able to "unlock" those 1280 extra shaders for computational power. That would change the game for AMD and the way its stream processors perform. Fortunately for AMD, computational power is not high on the list of needs for the market looking at the HD5970. But there is another side to that design that could come into play. As games become more complex, the instructions sent to the GPU will become more complex as well. If AMD can again convince developers [as it did with BattleForge] to code small, simple instructions instead of large, complex ones, its GPUs will consistently outperform nVidia's lower-shader-count cards. The same is true of physics [not PhysX]: one reason we have not seen it fully functional on AMD GPUs is the complex nature of the physics code in use at the time of this writing. That code would have to be broken up into smaller, more manageable units to run efficiently on AMD GPUs.

But enough doom and gloom: we cannot call the HD5970 anything other than what it is, the fastest single card manufactured to date. Its image quality was superb and its performance clean, powerful and fluid. It is one seriously quick graphics card and without doubt worthy of our Editor's Choice award in the Prosumer/Enthusiast category. It is also a heavy contender for our GPU of the Year award.






© 2009 - 2010 Bright Side Of News*, All rights reserved.