Just as Intel rules the high-end CPU market, nVidia holds a Microsoft-like dominance in the workstation GPU arena. Here it's not just raw GPU speed that matters, but card-level advantages: gigantic memory for visualizing large datasets; certified, tested drivers and application environments for precise, error-free OpenGL display of complex engineering objects in real-time 3D; and, of course, well-tuned OpenGL capabilities beyond what you usually see on consumer-class GPUs. And yes, Linux is just as important here as Windows, if not more so – look at how many professional engineering, scientific and multimedia apps run under Linux, often faster thanks to a more direct OpenGL pipeline [an advantage Microsoft walked away from in Vista and Win7].

So, if you are involved in high-end visualization for engineering, architectural, scientific or multimedia use, the added price of a Quadro is justifiable: certified application behavior, at a relatively small percentage of the total system price, buys you not just performance but also peace of mind.

While ATI has made some progress in getting into this market, nVidia still firmly leads here, as Jon Peddie Research, IDC and other research companies will tell you. At the same time, Intel's new Nehalem-EP "Gainestown" Xeons provide enormous memory bandwidth coupled with eight cores and 16 threads of CPU power per dual-socket system [you can read our four-part review of the Nehalem-EP W5580, the 3.2 GHz processor, here: Part I, Part II, Part III, Part IV]. Thus, we decided to combine the two heavyweights and configure a workstation with unrivaled performance. On the graphics side, we fitted nVidia Quadro FX5800 4 GB professional GPU cards – these cards carry the most video memory per GPU of any board on the market, and that looks set to stay true through the rest of 2009, until the GT300-based successor arrives, probably in April or August 2010. The boards are based on 55nm GT206 nVidia GPUs, and the surprising part is their power consumption: thanks to advanced idle-power techniques, the Quadro FX 5800 consumes less power than a GeForce GTX 280, even though it has four times the memory. Bear in mind that you should only use the Quadro FX5800 with a 64-bit operating system. The card alone carries as much on-board memory as a 32-bit OS can address in total, and it requires at least 4GB of system memory on top of that. Thus, for best performance, the Quadro FX5800 is targeted at systems with 8GB of memory and above.
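The 64-bit requirement comes down to simple address-space arithmetic. A minimal sketch (the mapping of MMIO windows and OS reservations is simplified away here; only the 4 GiB limit and the card's stated memory figures come from the text above):

```python
# Back-of-the-envelope: why a 4GB frame buffer is awkward for a 32-bit OS.
GIB = 1024 ** 3

addr_space_32bit = 2 ** 32   # total bytes a 32-bit OS can address: 4 GiB
frame_buffer = 4 * GIB       # Quadro FX5800 on-board memory
min_system_ram = 4 * GIB     # minimum system RAM the card calls for

# The card's memory alone equals the entire 32-bit address space,
# before any system RAM is accounted for.
print(frame_buffer >= addr_space_32bit)        # True
print((frame_buffer + min_system_ram) // GIB)  # 8 GiB wanted, twice the limit
```

Which is exactly why the card only makes sense in an 8GB-and-up, 64-bit machine.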

nVidia Quadro FX5800 4GB... times two.
Quadro FX5800 4GB SLI: 480 cores, 2.12 TFLOPS, 103.4 GTexel/s, 8GB RAM, 204GB/s bandwidth…want more?
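The caption's SLI totals are just two cards' worth of specs added together. A quick sanity check in Python – the per-card numbers here are half the caption's totals (and the 240 cores / 4GB / 102GB/s figures match the single card described above):

```python
# Sanity-check the SLI caption figures by doubling single-card FX5800 specs.
fx5800 = {
    "cuda_cores": 240,
    "tflops": 1.06,
    "gtexels_per_s": 51.7,
    "ram_gb": 4,
    "bandwidth_gb_s": 102,
}

# Two cards in SLI: every headline spec simply doubles.
sli = {spec: value * 2 for spec, value in fx5800.items()}
print(sli["cuda_cores"], sli["tflops"], sli["ram_gb"], sli["bandwidth_gb_s"])
```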

In order to feed the graphics cards, we needed the best workstation CPUs out there, and the answer was easy: two Intel Xeon W5590 3.33 GHz processors. Intel's Turbo step-up is enabled and cranks the cores to 3.47 GHz! The third part of our configuration is a whopping 48 GB of Samsung's DDR3-1333 ECC RAM on a Supermicro X8DAi or Tyan S7010 motherboard. The thing is, if you decide to overclock the Xeons [just like the W5580, the W5590 comes with an unlocked multiplier], the Tyan S7010 is the board for you. If you prefer to keep things at default clocks, then the Supermicro X8DAi is a good option too. We did most of our tests on Supermicro's baby, then switched to the Tyan S7010 for maximum-performance testing.

Intel's Turbo mode relies heavily on internal sensors that keep temperature in check, so for high-performance testing we used Asetek's silent yet powerful LCLC sealed liquid-cooling system with high-speed fans on the radiator, keeping the overall machine temperature low and letting Turbo mode work its magic on all eight cores.
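Nehalem Turbo Boost works in 133 MHz multiplier bins, which is where the 3.33 to 3.47 GHz step quoted earlier comes from – one extra bin active across all cores. A quick illustrative sketch (the multiplier and bin count here are inferred from those two clock figures, not taken from Intel's datasheet tables):

```python
# Nehalem Turbo Boost raises the core clock in 133 MHz multiplier bins.
BCLK_MHZ = 133.33  # Nehalem base clock (QPI reference)

def turbo_ghz(base_multiplier: int, bins: int) -> float:
    """Core clock in GHz with `bins` extra turbo multiplier steps."""
    return round((base_multiplier + bins) * BCLK_MHZ / 1000, 2)

w5590_mult = 25  # 25 x 133.33 MHz ~= 3.33 GHz base clock
print(turbo_ghz(w5590_mult, 0))  # 3.33 GHz at stock
print(turbo_ghz(w5590_mult, 1))  # 3.47 GHz with one bin on all cores
```

Keeping the sensors happy with good cooling is what lets that extra bin stay engaged under full load.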

Supermicro case and motherboard play host to the Intel Xeon W5590, the fastest workstation CPU on the market, and two Quadro FX5800 cards
Supermicro’s case can fit up to four GPUs… note the X25-E SSDs hidden in the hot-swap 3.5" bays

Wow, this is a powerful beast – it could easily cache a big chunk of Google Earth at high zoom level in the local system memory, and display it in full real time as far as the eye can see :)

BSN* Ultimate Workstation Configuration – Air-cooled
2x Intel Xeon W5590 3.33 GHz CPU
2x nVidia Quadro FX5800 4GB GPU
48GB Samsung Registered ECC DDR3-1333 SDRAM [12x4GB]
Supermicro X8DAi motherboard
Supermicro SC747TQ-R1400B chassis
256GB Intel X25-E Enterprise SSD [4x64GB in RAID0]

BSN* Ultimate Workstation Configuration – Liquid-cooled
2x Intel Xeon W5590 3.33 GHz CPU
Asetek LCLC liquid cooling for Xeon-DP system
2x nVidia Quadro FX5800 4GB GPU
48GB Kingston KHX1600C9D3K3/12GX, Non-ECC DDR3-1600 SDRAM [4 Kits]
Tyan S7010 motherboard
Supermicro SC747TQ-R1400B chassis
256GB Intel X25-E Enterprise SSD [4x64GB in RAID0]

Now, one may ask: what's the real benefit of using a Quadro versus a similar-generation GeForce in professional apps? Well, the full enhanced OpenGL functionality in the Quadro series – enabled and dearly paid for by nVidia through certification processes with every major application vendor (yes, these can cost hundreds of thousands of dollars per app) – does make a great difference in some programs; see the accelerated AutoCAD 2010 functionality, for instance.

In Parts 2 and 3, tomorrow and the day after, we'll look at benchmarks: the usual suspects like SPECviewperf 10, CineBench R10 OpenGL and 3DMark Vantage, plus more specific applications like AutoCAD 2010 and Lightwave 3D 9.6, and the benefits of using a Quadro in these apps beyond raw performance. I decided to give the Quadro some serious competition in the form of a factory-overclocked GigaByte GTX285 card with a respectable 2GB of RAM, the most you can get per GPU in the consumer market. By minimizing the memory-size difference, I hope to see how much the added OpenGL functionality and driver optimizations help the Quadro win in the SPECviewperf 10 benchmarks, as well as in DirectX benchmarks like 3DMark Vantage – in Extreme mode, of course.

Part 4 will include a test of TWO Quadro FX5800 cards on a branded Xeon machine, a requirement of nVidia's Quadro SLI rules: only a handful of certified, branded system configurations are allowed to run two Quadros in SLI mode. While that is understandable given the high system performance and bandwidth demands of a high-end Quadro SLI setup, I believe nVidia should offer an "uncertified SLI with disclaimer" mode so that Quadro users in general can experience the speed-up, which for a polygon-rich engineering app can actually be higher than in a game! Since more performance means saved time, it would also be a good tool for nVidia to widen the sales scope for multi-Quadro SLI workstations. After all, at US $3,000 apiece, these aren't cheap cards, and two per workstation is even better :)