For a long time, dual-processor systems intrigued high-end desktop users: adding a second CPU propelled your personal PC into the exotic "parallel processing" realm of workstations and HPC [High Performance Computing], without jumping to the overly complex and expensive quad-socket and larger systems. And two-socket systems can, for the most part, still fit within the usual high-end PC size and power envelope.
Remember the dual Celeron 300s running on many overclockers' rigs at 500 MHz and above, stable for years? That trend continued with the Intel D5400XS Skulltrail, the "extreme desktop" version of Intel's dual-Xeon workstation motherboards. The attraction only got stronger as the multi-core push led operating systems, applications and even games to support multi-thread parallelism better than before.
However, at least on Intel systems, the shared FSB did pose a bit of an MP scaling bottleneck in any memory-intensive app. Dual FSB1600 on systems like Skulltrail and other Seaburg-chipset platforms did help somewhat, when combined with a four-channel memory configuration and cache snoop filters in the chipset North Bridge. At that point, even in the STREAM-like memory benchmarks that had often eluded Intel, results came substantially closer to those of AMD's DP Opteron platform. But it wasn't enough to win back the memory performance crown.
Intel's new workstation platform paired with an ASUS Radeon HD4870X2. Mind the number of SAS cables going to the enterprise SSD drives
Now, the upcoming Nehalem-EP "Gainestown" platform is expected to take the dual-processor performance crown in pretty much every category conceivable from March 30th onwards, finally balancing the performance act. Whether we're talking about raw integer and FP performance from its sixteen threads (four cores per chip, running at up to 3.2 GHz, with two threads per core), the bidirectional QPI bandwidth of 25.6 GB/s between the two processors, or the triple-channel DDR3-1333 integrated memory controller on each CPU, this new dual-processor sister platform of the Core i7 shows promise.
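That 25.6 GB/s QPI figure is easy to sanity-check; a back-of-the-envelope sketch (assuming the usual 16-bit-per-direction QPI payload, not vendor measurements):

```python
# Back-of-the-envelope check of the quoted QPI bandwidth.
# A QPI link at 6.4 GT/s carries 16 data bits (2 bytes) per transfer,
# per direction, and the link is full-duplex.
transfer_rate_gt = 6.4    # giga-transfers per second
bytes_per_transfer = 2    # 16-bit payload per direction
directions = 2            # full-duplex: send and receive simultaneously

per_direction_gbs = transfer_rate_gt * bytes_per_transfer   # 12.8 GB/s
bidirectional_gbs = per_direction_gbs * directions          # 25.6 GB/s
print(per_direction_gbs, bidirectional_gbs)
```

Which lands exactly on the 25.6 GB/s Intel quotes for the 6.4 GT/s link between the two sockets.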
Cutting to the chase - enough has already been said about the Nehalem architecture - here is an early look at one of the first Nehalem DP workstation reference platforms: Intel's reference system based on the Super Micro X8DAi motherboard. In this intro, the first part of a multi-part series extending into next week, we will take a glance at the complete system setup, with some comments on how it might be improved. The following part will then highlight some of the interesting BIOS features that differentiate this power monster from your typical high-end PC.
Opening up the side panel, you'll see the two LGA1366 CPU sockets - yes, the same as the Core i7's, except that the second QPI link between the two CPUs is enabled here. Intel is finally standardizing socket commonality between its higher-end processors. The similarities include the heat sink mounting, so the UP and DP platforms can at last share the same cooling solutions, including the more exotic liquid, TEC, fridge and freezer stuff from the high-end desktop world - as long as, of course, they fit in pairs.
A closer look at the coolers shows that Intel has finally moved in the right direction as far as CPU cooling goes... these are not screamers.
Under the two aluminum heatsinks with front-mounted fans, each socket holds the fastest Nehalem-EP at launch: the W5580 Xeon, running at the same 3.2 GHz clock and 6.4 GT/s QPI speed as the Core i7 965. Note that the initial D0 stepping of this chip is the same as that of the expected Core i7 Extreme 975, a superb overclocker with high "performance enhancement" margins. So, if the multipliers were unlocked, the W5580 could give out some "naughty" numbers too.
Two CPUs means double the memory channels in total - most workstation boards will be happy with a total of twelve DIMM slots, two per channel, and that includes the X8DAi. With Samsung's recently announced 16 GB DDR3 R-DIMMs, you could pack a humongous 192 GB of RAM in this box. Given the pricing of the parts, you could start thinking about having a supercomputer on your desktop, instead of the usual offloading of demanding tasks to large servers or supercomputers. The test configuration was equipped with twelve 4 GB ECC R-DIMMs, also from Samsung, for a total of "only" 48 GB of memory - still the largest-memory PC ever to enter my lab.
ECC memory is nice, but can we get something faster?
Now, I was curious whether Super Micro would block me from running standard Core i7 DDR3 DIMMs, like the fast DDR3-2000 3-channel kit from Kingston. After all, these are very fast modules - CL8 at DDR3-2000 is achievable on the ASUS Rampage Extreme with the newest BIOS - but they are of the unbuffered, non-ECC desktop kind. So, I replaced the entire DRAM complement with six HyperX modules, one per memory channel on each CPU. Guess what: they worked! At only DDR3-1333 speed, though, as the BIOS option for "Forced DDR3-1600" didn't seem to take effect.
Can you imagine: if this memory worked at its native speed of 2 GT/s, this system would have 96 GB/s of bandwidth for the CPUs alone.
Nevertheless, as you can see in the photo, this big baby can take your favorite Core i7 desktop memory and spread it nicely across no fewer than six channels! It will be lovely to compare once benchmark time comes next week. In theory, this system should give you just a little below 64 GB/s of memory bandwidth - more than most low-end and mainstream graphics cards have available!
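Both theoretical figures fall out of the same simple calculation (a sketch assuming the standard 64-bit DDR3 channel width; real STREAM numbers will come in lower):

```python
# Peak theoretical memory bandwidth for the 2P Nehalem-EP setup:
# each DDR3 channel is 64 bits (8 bytes) wide, three channels per CPU,
# two CPUs = six channels total.
def peak_bandwidth_gbs(mt_per_s, channels, bytes_per_channel=8):
    """Theoretical peak in GB/s: transfers per second times bytes per transfer."""
    return mt_per_s * bytes_per_channel * channels / 1000.0

ddr3_1333 = peak_bandwidth_gbs(1333, channels=6)  # the "little below 64 GB/s" case
ddr3_2000 = peak_bandwidth_gbs(2000, channels=6)  # if the HyperX kit ran at spec
print(ddr3_1333, ddr3_2000)
```

With the modules stuck at DDR3-1333 that's just under 64 GB/s aggregate, and 96 GB/s if the DDR3-2000 kit ever ran at its rated speed.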
Super Micro's design calls... good or bad?
The Tylersburg 36D chipset IO Hub [North Bridge chip, Ed.] is the dual-QPI sister of the desktop X58, the only major difference being the second QPI channel, which lets it talk either to two CPUs at the same time, or to one CPU and - get this - another Tylersburg bridge in a dual-IOH configuration with, say, four independent PCIe x16 paths. This board has only one IOH, so we're limited to two PCIe Gen2 x16 slots and one PCIe Gen1 x4 slot. Why isn't the x4 slot running at Gen2 speed, since the IOH supports it? Well, Super Micro, in a questionable decision, allocated the IOH's Gen2 x4 lanes to an optional on-board SAS controller chip from LSI, which our board doesn't have - you need the X8DA3 flavor of the motherboard for that. The x4 slot's lanes come instead from the ICH9 South Bridge chip, which only supports PCIe Gen1 speed and is further limited by the ICH-to-IOH connection bandwidth.
Now, if you use a higher-end SAS RAID controller with a local processor and cache for, say, your SSD array, the doubled bandwidth of PCIe Gen2 would come in handy. So, the IOH's Gen2 lanes should have been routed to that empty slot instead, and the optional on-board SAS relegated to the ICH PCIe lanes. I've added Intel's own SAS RAID controller here, with their kind help, and we will see how much the Gen1 speed limits it when driving a quad-SSD RAID 0 array. The other interfaces - dual Gigabit Ethernet, on-board SATA and USB ports, integrated audio, plus two legacy serial ports and, luckily, PS/2 keyboard and mouse connectors - round out the I/O. Nothing overly exciting there as far as interfaces go.
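To see why the Gen1 x4 slot is a real concern for that SSD array, a rough sketch (the per-SSD throughput below is a hypothetical round number, not a measured result; both PCIe 1.x and 2.0 use 8b/10b line encoding):

```python
# Usable bytes/s per PCIe lane: line rate (GT/s) * 8/10 encoding efficiency,
# divided by 8 bits per byte. Protocol overhead is ignored in this sketch.
def pcie_lane_mbs(gt_per_s):
    return gt_per_s * (8 / 10) / 8 * 1000  # MB/s per lane

gen1_x4 = pcie_lane_mbs(2.5) * 4   # PCIe 1.x: 250 MB/s/lane -> 1000 MB/s
gen2_x4 = pcie_lane_mbs(5.0) * 4   # PCIe 2.0: 500 MB/s/lane -> 2000 MB/s

# Hypothetical figure: four SSDs sustaining ~250 MB/s each in RAID 0.
array_mbs = 4 * 250
print(gen1_x4, gen2_x4, array_mbs >= gen1_x4)
```

Under those assumptions, the quad-SSD array alone can already saturate a Gen1 x4 link, while a Gen2 x4 slot would leave comfortable headroom - which is exactly why the IOH's Gen2 lanes belonged on that slot.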
The system is equipped with two 800 W PSUs [Power Supply Units], together supplying enough power to feed everything, including two of the fastest graphics cards you could imagine for 2009. A single PSU, as I found in testing, couldn't even feed an ASUS HD4870X2 triple-fan "Harley-lookalike" card together with the rest of the components. Two PSUs, though, handle it fine.
Super Micro's beta BIOS probably needs more work...
What could Super Micro improve at the board level, before we go into the BIOS? First off, the BIOS showed rather unusual CPU temperature readings, above the 60°C level - either Super Micro's temperature sensors need to be checked, or the heat sinks need to be replaced. Since the casing is fairly spacious, a heat sink replacement with higher-end units from the desktop LGA1366 market is a sure option.
Secondly, the PCIe slot layout could be reworked: a 7-slot configuration with two x16 and two x4 PCIe slots (one Gen1 and one Gen2 for the latter), plus a x1 PCIe slot for a proper audio card instead of the on-board software solution, rounded off by two spare PCI and/or PCI-X slots on the side, would use the chipset resources better and provide more flexible expansion.
Third, as the IOH North Bridge does heat up quite a bit, Super Micro could replace that thin aluminum heatsink, or at least provide an easy mounting option for a slim local fan to take care of it - positioned so that it isn't blocked by, say, a long graphics card.
Then, of course, some board real estate could be saved by using Intel's 82576 dual-port GbE controller, with its quite decent amount of TCP/IP offload, instead of the two 82573V chips being used now. Yes, the 82576 is more expensive, but you also save a PCIe lane and valuable board space.
12 DIMMs can take up to 192GB of DDR3-1333 memory, provided that your pockets are deep enough...
Nevertheless, it's quite an impressive board feature-wise. ASUS and a few others claim to have even more impressive, or simply faster, Nehalem-EP workstation motherboards, but we'll reserve that opinion until we actually test those boards too. We'll follow up with an article that thoroughly explores the BIOS options...
UPDATE, March 30th, 2009 01:26 UTC - We have published a follow-up containing more details about this exciting new platform. You can find Nehalem-EP Workstation Preview Part II if you click here.
UPDATE, March 31st, 2009 22:58 UTC - The third part of the review, containing various benchmarks, is now published. You can find our Nehalem-EP Part III: Benchmarks if you click on this link.
UPDATE, April 1st, 2009 15:58 UTC - By popular demand, we have put together a short video with an inside view of the beast. You can view the video below: