BSN* The Ultimate Workstation: Of Virtual Cities and Walkthroughs...
9/14/2009 by: Nebojsa Novakovic
Editor's note: Welcome to the second part in our Ultimate Visual Workstation series. You can read the first part of the series here: BSN* Presents The Ultimate Workstation.
In our search for the Ultimate Workstation, out of all the components on the market, the choice boiled down to Supermicro's 8HDA3i and Tyan's S7010. If you want to run a system with dual graphics cards, Supermicro is the better choice, but the board is not without its negatives.
Since we couldn't enable Quadro SLI due to the certified system restriction [nVidia sells the Quadro SLI certification to selected vendors, just like SLI on P55/X58 motherboards, Ed.], I decided to try the FX5800, one at a time, on another Intel Dual Xeon Nehalem-EP motherboard, the S7010 from Tyan. It has only one PCIe x16 Gen2 slot, but more x8 and x4 slots for other peripherals.
But what makes this board special is one very important fact: the S7010 runs the memory stably at DDR3-1333 even with all 12 DIMMs populated. This gives you the option to run 48 GB of RAM at 1333 MHz, i.e. 48 GB of memory at seven to nine cycles of latency with almost 64 GB/s of memory bandwidth. Since Supermicro doesn't [yet] allow DDR3-1333 operation, we cannot declare the Supermicro the best workstation motherboard out there: if you don't plan on GPU scaling, the Tyan yields better performance. You think that's great? Well, watch out for the Westmere-EP [Gulftown-based] 6-core, 12 MB cache Xeons some five months from now (or maybe earlier); you should be able to "officially" run all six memory channels on that baby at DDR3-1600 at least. And on the Tyan, the Quadro FX5800 had the sole long PCIe slot all to itself.
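For readers who want to check the "almost 64GB/s" figure, here is a minimal Python sketch of the usual peak-bandwidth arithmetic. It assumes a standard 64-bit (8-byte) DDR3 channel and six channels in total across the two Nehalem-EP sockets; the function name is ours, for illustration only. These are theoretical peaks, not measured throughput.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes every channel is a 64-bit DDR3 channel moving
    bus_width_bits / 8 bytes per transfer.
    """
    bytes_per_transfer = bus_width_bits // 8
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


# Six channels in total, as on the dual-socket board discussed above.
ddr3_1333 = peak_bandwidth_gbs(channels=6, mega_transfers_per_s=1333)
ddr3_1600 = peak_bandwidth_gbs(channels=6, mega_transfers_per_s=1600)

# 48 GB spread over 12 DIMM slots works out to 4 GB per DIMM.
gb_per_dimm = 48 / 12

print(f"DDR3-1333, 6 channels: {ddr3_1333:.1f} GB/s")
print(f"DDR3-1600, 6 channels: {ddr3_1600:.1f} GB/s")
print(f"Capacity per DIMM:     {gb_per_dimm:.0f} GB")
```

By the same arithmetic, the DDR3-1600 operation expected on the upcoming Xeons would lift the theoretical peak to roughly 77 GB/s.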
Compared to the Supermicro, the Tyan board also adds on-board power and reset buttons, handy for open-bench testing. We got the version with the built-in LSI SAS1068E controller, while Tyan's website lists the S7010 as a Nehalem-EP board without the controller.
Now, this board, with two Xeon W5590 processors and "only" 24 GB of RAM [we had to share it with the Supermicro], ran the complete SPECviewperf 10 benchmark at 1280x1024 resolution with all the effects, in 64-bit multithreaded mode of course. Take a look at the results, which are pretty much record single-GPU OpenGL scores, compared with the same system fitted with a Gigabyte GeForce GTX285 OC 2 GB: nominally faster shaders and a respectable memory size, but no fine-tuned OpenGL. What happens?
No questions here: optimized drivers and the enormous 4 GB frame buffer make for a "heaven and earth" sized difference in performance. A look at FSAA [Full Scene Anti-Aliasing] performance reveals that even with 16x FSAA turned on, you don't experience a significant drop in performance:
FSAA results are in - no major performance drop in AA modes
In some tests, 16x AA came ahead of the no-AA results, giving us a clear view of the shader power. The fact of the matter is that application optimization on professional graphics cards results in more headroom and a minuscule performance drop.
To finish, we leave you with the multithreaded performance index - good scaling! It is a shame that SPECviewperf features only single-, two- and four-thread benchmarks - the Nehalem-EP platform could make use of both eight- and sixteen-thread benchmark modes. We hope that SPECviewperf 11 will utilize multi-core systems even further, but for now, here are the 4-thread benchmark results:
Multi-Threaded performance scaling reveals that Quadro FX5800 asks for even more CPU power...
CAD - Computer Aided Design
The search for the ultimate workstation has to feature an in-depth look at CAD tools. In my case, I used the 64-bit version of Autodesk's AutoCAD 2010. Nvidia has a specially optimized AutoCAD 2010 Quadro performance driver, which helps both 2D and 3D operations in the world's most popular CAD package. So, even though Autodesk went with Direct3D as the primary visualization renderer in this version, you can still get a substantial productivity benefit from using a Quadro on large models. Its large 4 GB frame buffer can help with display lists and other content that can be offloaded to the GPU in most graphics applications with large models. In order to test the card to its limits, I used my complex 3D city model, which contains over 12 million polygons. Our test consisted of navigating through the city and checking for framerate drops during playback. The city was rendered fully in real time, wherever the camera went. The Quadro FX5800 passed the test with flying colors, unlike cards with less frame buffer memory.
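If you want to turn that kind of walkthrough check into numbers, one possible approach is to log a timestamp for every rendered frame and flag the slow ones. The sketch below is hypothetical: the function name, the 30 fps threshold and the sample timestamps are made up for illustration and are not data from our test.

```python
def find_framerate_drops(timestamps_s, min_fps=30.0):
    """Return (frame_index, fps) for each frame rendered slower than min_fps.

    timestamps_s is a list of frame presentation times in seconds,
    in increasing order.
    """
    drops = []
    for i in range(1, len(timestamps_s)):
        frame_time = timestamps_s[i] - timestamps_s[i - 1]
        # Guard against duplicate timestamps to avoid dividing by zero.
        fps = 1.0 / frame_time if frame_time > 0 else float("inf")
        if fps < min_fps:
            drops.append((i, fps))
    return drops


# Made-up sample log: one long frame (67 ms) in the middle of a smooth run.
sample_log = [0.000, 0.016, 0.033, 0.100, 0.116]
drops = find_framerate_drops(sample_log, min_fps=30.0)

for index, fps in drops:
    print(f"frame {index}: {fps:.1f} fps")
```

A per-frame check like this catches short stutters that an average frame rate would hide, which is what matters when flying a camera through a large model.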
3D model of City of Johor, Malaysia: 12 million polygons. Design by Nebojsa Novakovic
The application tuning window in the Nvidia Quadro control panel has a very different set of apps compared to the GeForce. You will see, yes, AutoCAD, but also far more potent high-end software like CATIA, ProEngineer, 3ds Max, and even Photoshop optimizations. We are very happy to see nVidia dropping the Quadro CX and offering CS4 optimization for all the Quadro cards. But then, it was logical as the Quadro users are paying top dollar for this kind of system and they expect one-click certified application optimization.
When you're not benchmarking the hardware, setting Error reporting to "On" is advisable. If any errors occur during rendering, it is better to have data on when they happened.
The option "Force 10 bits per component" isn't going to win you any benchmarks, but you can expect commercial-grade quality renders, especially if you adopt ray-tracing applications for professional 3D. Seeing this option made our video guys go into overdrive. You can connect the Quadro FX5800 to SDI Capture and Output add-in cards, and with this amount of local video memory, you can expect serious acceleration of your pipeline.
Feeding the GPU
One of the things we have to emphasize is that the Quadro FX5800 requires a lot of CPU horsepower to be able to eat up and render all the polygons. Until software applications adopt DirectX 11 and its multi-threaded rendering capabilities, or the OpenGL 3.x counterpart features, CPU horsepower is what you need.
Linpack score on Intel Xeon W5590 processors
In the case of our processing platform, two Intel Xeon W5590 processors at 3.33 GHz each [3.46 GHz in Turbo] deliver 99.01 GFLOPS of usable processing power, and the scores were a bit better than those of the Xeon W5580 [3.2 GHz] processors.
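To put the 99.01 GFLOPS in context, the sketch below compares it with a theoretical peak. It assumes 4 double-precision FLOPs per core per cycle (the usual figure for Nehalem's 128-bit SSE units) and the 3.33 GHz base clock rather than Turbo; both are our assumptions, not numbers from the Linpack run itself.

```python
def peak_gflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle=4):
    """Theoretical double-precision peak in GFLOPS.

    flops_per_cycle=4 is an assumption for Nehalem-class cores
    (one 2-wide SSE add plus one 2-wide SSE multiply per cycle).
    """
    return sockets * cores_per_socket * flops_per_cycle * clock_ghz


measured_gflops = 99.01  # Linpack score quoted in the article

# Two quad-core Xeon W5590 processors at the 3.33 GHz base clock.
peak = peak_gflops(sockets=2, cores_per_socket=4, clock_ghz=3.33)
efficiency = measured_gflops / peak

print(f"Theoretical peak:   {peak:.2f} GFLOPS")
print(f"Linpack efficiency: {efficiency:.1%}")
```

Under these assumptions the measured score lands at roughly 93% of peak; using the Turbo clock as the baseline instead would give a somewhat lower percentage.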
If you can't stand the heat...
One point I have to emphasize here is the issue of heat. This might be the key reason why only system integrators get a Quadro SLI certification. The Quadro FX5800 is the only card in the whole Quadro FX line-up featuring a double-sided PCB and no less than 32 pieces of 1Gbit GDDR3 [128MB] memory chips, which add up to the 4 GB frame buffer. The fact that this is quite a unique board could explain why the Quadro FX5800 is a bit of a hot card. The backplate is made of plastic, which covers a flat metal surface that "cools" the backside memory chips. For the next generation of products, we would be much happier if nVidia went with a completely metal backplate. Truth be told, the Quadro FX5800 doesn't get as hot as the high-end ATI FirePro cards, but you should take care of the system ventilation if you are building a DIY system.
Besides the comparison with GTX285, the next part will also cover more apps, both the professional OpenGL applications and consumer tests such as the people's favorite, 3DMark Vantage. Stay tuned.
© 2009 - 2011 Bright Side Of News*, All rights reserved.