During the recently held nVidia GPU Technology Conference, we spoke with a number of supercomputer vendors and learned an interesting figure: roughly 90% of supercomputer orders now include at least one GPU [to evaluate GPUs in an HPC environment]. In many cases, an order contains both ATI and nVidia GPUs for a fair evaluation.

Tianhe [which translates to Milky Way] currently carries the unofficial title of the world’s most powerful supercomputer. China’s National University of Defense Technology recently unveiled it as the first PFLOPS computer outside the United States of America.

Tianhe supercomputer - you can see a node consisting of four Xeon CPUs and two ATI Radeon HD 4870X2 cards. Picture Credit: Xinhua/He Shuyuan

Working on a budget of 600 million RMB [Yuan], which translates to 87.88 million USD, Chinese scientists created a supercomputer consisting of 24,576 Intel Xeon cores [6144 CPUs: 3072 Nehalem-based Xeon E5540s and 3072 Harpertown-based Xeon E5450s] and 5120 AMD RV770 GPUs [2560 ATI Radeon HD 4870 X2 2GB cards]. Together, the 6144 Intel CPUs and 2560 AMD graphics cards reach a theoretical speed of 1.206 PFLOPS.
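
For readers who want to check the component counts, here is a minimal sketch of the arithmetic behind them. It assumes four cores per Xeon and two RV770 GPUs per Radeon HD 4870 X2 card; neither is spelled out in the figures above, and the function names are purely illustrative.

```python
# Figures quoted in the article.
CPU_COUNT = 6144
GPU_CARD_COUNT = 2560
BUDGET_RMB_MILLION = 600.0
BUDGET_USD_MILLION = 87.88

# Assumptions (not stated in the article): each Xeon is a quad-core part,
# and each Radeon HD 4870 X2 card carries two RV770 GPUs.
CORES_PER_CPU = 4
GPUS_PER_CARD = 2


def component_totals():
    """Return (total CPU cores, total GPUs) implied by the counts above."""
    return CPU_COUNT * CORES_PER_CPU, GPU_CARD_COUNT * GPUS_PER_CARD


def implied_exchange_rate():
    """Return the RMB-per-USD rate implied by the two budget figures."""
    return BUDGET_RMB_MILLION / BUDGET_USD_MILLION


if __name__ == "__main__":
    cores, gpus = component_totals()
    print(f"CPU cores: {cores}")
    print(f"GPUs: {gpus}")
    print(f"Implied rate: {implied_exchange_rate():.2f} RMB per USD")
```

Under those assumptions the totals line up with the 24,576 cores and 5120 GPUs quoted for the system.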

This kind of performance took Chinese scientists by surprise. Zhang Yunquan, a researcher with the Institute of Software of the Chinese Academy of Sciences [CAS], was quoted as saying: "I was shocked at the milestone breakthrough, which was beyond expectation, I previously forecast China’s first petaflop computer no earlier than the end of 2010."

Unfortunately for the researchers involved, due to SPEC rules and regulations GPUs aren’t allowed to run Linpack [SOPE-optimized code is banned], so only CPU scores are taken into account. The 24,576 cores achieve 563.1 TFLOPS in Linpack, which should be enough to place in the top 10 of the world’s most powerful supercomputers. Top500.org will announce the new list this month, during the Supercomputing 09 conference held in Portland, Oregon, November 14-20.
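
To put the two headline numbers side by side, the short sketch below expresses the Linpack score as a fraction of the theoretical peak. Note the caveat: as reported here, the peak counts CPUs and GPUs together while the Linpack score is attributed to the CPUs, so the ratio is only a rough comparison and not a true efficiency figure. The function name is illustrative.

```python
# Figures quoted in the article, both in TFLOPS.
THEORETICAL_PEAK_TFLOPS = 1206.0  # 1.206 PFLOPS, CPUs and GPUs combined
LINPACK_TFLOPS = 563.1            # reported Linpack result


def linpack_fraction(measured_tflops, peak_tflops):
    """Return the measured score as a fraction of the theoretical peak."""
    if peak_tflops <= 0:
        raise ValueError("peak_tflops must be positive")
    return measured_tflops / peak_tflops


if __name__ == "__main__":
    fraction = linpack_fraction(LINPACK_TFLOPS, THEORETICAL_PEAK_TFLOPS)
    print(f"Linpack score is {fraction:.1%} of the theoretical peak")
```

By this measure the Linpack result comes to a little under half of the quoted theoretical speed.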

Naturally, scientists care less about Linpack numbers than about putting both the CPUs and GPUs to work in their research. According to Dr. Zhang Yulin, President of NUDT, this system will be used for designing spaceships and similar aerospace vehicles such as satellites [hence the name Milky Way One], processing seismic data for oil exploration and earthquake research, bio-medical analysis and much more. The parallel nature of GPU architecture should let CFD [Computational Fluid Dynamics] and volumetric calculations [seismic research] complete much faster than serial CPUs are capable of.

The sheer size of this system is nothing short of breathtaking: 103 large racks occupy an area of over 1,000 m² [10,763 sq. ft.], the whole installation weighs 155 tons and yes, you do need a lot of electricity to power this kind of system.