Turning Hyper-Threading off in SiSoft Sandra 2009
Turning HT off will cost you around 14 GIPS and a massive 58 GFLOPS!
For Sandra's tests, leave HT on at all costs, or be punished...
Turning HT off caused a relatively small loss of performance in integer apps and a 35 MPixel/s deficit in floating-point. But take a look at the superiority of just eight Nehalem cores over 16 real cores from AMD in FP operations.
Bear in mind that this AMD system is radically more expensive than our tested system. Our Nehalem-EP platform would retail for around 6000 USD (sans SSDs). In stark contrast, a 4-socket 2.7 GHz quad-core Opteron system would set you back almost 10,000 USD! These results are the reason why AMD is worried. The AMD Opteron 8000 is nothing short of a golden goose, and even the dual-socket Nehalem-EP is able to compete against it! True, not when you disable Hyper-Threading...
3DMark Vantage... sort of
For the first time in Vantage, we get a triple-zero score: a clean 17K of single-card awesomeness.
Unfortunately, we had to skip a complete 3DMark Vantage run in this part: despite three tries, the CPU portion of the famous 3D benchmark couldn't complete with SMT on. Back in the day, we spoke with Oliver from Futuremark, and he told us that 3DMark Vantage can handle 16 threads, as can PCMark Vantage.
So, we come to SMT off, or HT off, call it whatever you like. We were curious to see how this would impact performance, and yes, it did lower synthetic benchmark performance, but it propped up the real-app results, at least for compute-intensive runs with heavy threading.
Cinebench R10 was always a stronghold of the Core architecture
No room for playing around... Cinebench didn't exactly profit from Hyper-Threading, and as you can see, GPU performance even suffers when HT is on.
Look what it does for Cinebench R10, not to mention Linpack.
Sandra CPU tests are noticeably lower. In one case, the old Xeon X5492 overtakes the new one; my suspicion is that this is due to cache latency and size issues in L1 and L2. Remember that, as mentioned, Turbo was on for the W5580, so pretty much all the tests actually ran the CPU cores at 3.33 GHz.
Cinebench seems to do better now, and Linpack simply shines: 95 GFLOPS of actual double-precision performance, beating the double-precision throughput of every GeForce GPU below the GTX 295, the dual-GT200b part. Note that Linpack is quite sensitive to memory latency too. If you ever decide to run DDR3-1333 memory modules at CAS6 latency, you could easily see the system reach 100 GFLOPS, very near its theoretical peak.
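To put "very near its theoretical peak" into numbers, here is a rough back-of-the-envelope sketch. It assumes two quad-core W5580 sockets (8 physical cores; HT threads share the same FP units), 4 double-precision FLOPs per core per cycle (one 2-wide SSE add plus one 2-wide SSE multiply), and all cores held at the 3.33 GHz Turbo clock mentioned above. The function names are ours, purely for illustration.

```python
# Hypothetical back-of-the-envelope sketch (not the benchmark's own math).
# Assumptions: 2 sockets x 4 cores, 4 DP FLOPs per core per cycle
# (one 2-wide SSE add + one 2-wide SSE multiply), cores at 3.33 GHz Turbo.

def peak_gflops(sockets: int, cores_per_socket: int, ghz: float,
                flops_per_cycle: int = 4) -> float:
    """Theoretical peak in GFLOPS = cores x clock (GHz) x FLOPs per cycle."""
    return sockets * cores_per_socket * ghz * flops_per_cycle


def efficiency(measured_gflops: float, peak: float) -> float:
    """Fraction of the theoretical peak that a measured result reaches."""
    return measured_gflops / peak


if __name__ == "__main__":
    peak_turbo = peak_gflops(2, 4, 3.33)  # about 106.6 GFLOPS
    peak_base = peak_gflops(2, 4, 3.2)    # about 102.4 GFLOPS at nominal clock
    print(f"Peak at 3.33 GHz: {peak_turbo:.1f} GFLOPS, at 3.2 GHz: {peak_base:.1f} GFLOPS")
    for measured in (95.0, 100.0):
        share = efficiency(measured, peak_turbo)
        print(f"{measured:.0f} GFLOPS is {share:.1%} of the Turbo peak")
```

Under these assumptions, the measured 95 GFLOPS lands at roughly 89% of the Turbo-clock peak, and a 100 GFLOPS run would be around 94%.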
Of course, this is just a start.
We will follow up this article soon, not just with other workstation and desktop apps, but also with faster memory, other mainboards that promise better tuning, and of course, Linux environments.
Nehalem-EP, as the new Xeon DP series leader, lays down an impressive processor and platform base for Intel's new dual-processor systems, whether it's a 3D workstation, an extreme desktop PC, a mainstream server or an HPC cluster node. The near-perfect balance between CPU, memory and I/O resources will help a lot in many real-life apps, aside from all these benchmarks. It is up to AMD to fight back and match this, not just by adding cores as in the six-core Istanbul, but also by combining them with HyperTransport 3.1 links plus DDR3 memory, and they had better make it real quick.
We also heard that AMD plans to bring a 256-bit memory controller with the 12-core Magny-Cours in 2010, and if that turns out to be true, AMD may be able to stay in the game. But for 2009, it is up to Intel to show how far it wants to go in promoting and gaining market share with the Gainestown chips. Bear in mind that this is not the end. Intel plans to introduce a socket drop-in upgrade: a 32nm six-core follow-up with 12 MB of L3 cache, based on the Westmere "tick" die shrink.
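Why a wider memory controller matters can be sketched with simple arithmetic: theoretical peak bandwidth is the bus width in bytes times the transfer rate. The sketch below assumes Nehalem-EP's three 64-bit DDR3 channels (192 bits total) and models the rumored 256-bit Magny-Cours controller as four 64-bit channels. Running both at DDR3-1333 is our assumption for a like-for-like comparison, not a confirmed AMD specification.

```python
# Hypothetical sketch: theoretical peak DRAM bandwidth per socket.
# Peak = (bus width in bytes) x (transfers per second).
# Assumptions: Nehalem-EP = 3 x 64-bit channels (192-bit); the rumored
# 256-bit Magny-Cours controller = 4 x 64-bit channels; both at DDR3-1333.

def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers: float) -> float:
    """Peak bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers * 1e6 / 1e9


if __name__ == "__main__":
    ddr3_1333 = 1333.0  # mega-transfers per second
    nehalem_ep = peak_bandwidth_gbs(3 * 64, ddr3_1333)   # about 32.0 GB/s
    magny_cours = peak_bandwidth_gbs(4 * 64, ddr3_1333)  # about 42.7 GB/s
    print(f"192-bit @ DDR3-1333: {nehalem_ep:.1f} GB/s per socket")
    print(f"256-bit @ DDR3-1333: {magny_cours:.1f} GB/s per socket")
```

At equal memory speed, the wider controller would offer about one third more peak bandwidth per socket; real-world gains would depend on latency and the memory speeds AMD actually supports.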
The promise of a balanced baseboard platform with suitable derivatives, not to mention tune-ups and software stacks addressing the workstation and server as well as the extreme PC all at once, has finally come true.
In our Video Production Studio, we are working on Intel V8 65nm, Skulltrail 45nm and 45nm Mac Pro systems, and there was just one major issue: the memory subsystem was getting choked by the requirements of the REDCODE stream. With the Nehalem-EP platform, this bottleneck is finally removed, and now Intel can truly shine and start replacing all those Opteron workstations that offered insane bandwidth [yes, a dual-socket dual-core Opteron is a better choice than a dual-socket quad-core Harpertown. Go figure]. Intel probably doesn't even have a clue what it did with this platform: if a dual-socket Opteron and an nVidia Quadro FX 3000 SDI enabled the creation of Battlestar Galactica, Nehalem-EP with Quadro FX 5800 cards in SLI should usher us into a world of TV shows with better effects than the Hollywood movies of Anno Domini 2008. With proper pricing, you can assemble a machine that will eat up everything that comes its way, be it 128, 256 or even 512 instruments in an audio production. You can now emulate complete orchestras in real time... Overall, Intel has become the king of content production. Hats off to all the guys and girls who made this possible.
© 2009 - 2013 Bright Side Of News*, All rights reserved.