At the recent Intel Developer Forum, we heard about moves the company is making on all fronts: desktop chipsets are gaining features to fend off AMD's 2013 Virgo platform, mobile chips are continually touted as equal to the competition (even though a recent Qualcomm video readily disputes that), and, more importantly, the company is pushing to displace NVIDIA's line of HPC products, the Tesla GPGPU parts.

We have spoken with senior representatives from financial and scientific institutions, and their responses to our queries were quite mixed. On one hand, legacy code will have an easier ride on the Xeon Phi, even though you have to recompile if you want to use the new features. Naturally, once you need to recompile, work ahead is guaranteed, no matter whether you are targeting CUDA, OpenCL or the MIC code path.

Intel Xeon Phi at a glance

To sweeten the Xeon Phi pill, Intel deployed both standard and some non-standard tactics to win business. Besides free samples for evaluation, the company offered very affordable pricing to its first customers. However, we expected the company to move to realistic pricing once deployments reached hundreds to thousands of cards. Anything else could be viewed as price dumping and anti-competitive practice, for which the company has already been the target of multiple antitrust investigations by the FTC (Federal Trade Commission) and the EC (European Commission).

Imagine our surprise, then, when we learned the prices Intel gave the Texas Advanced Computing Center (TACC), which is currently in the final stages of building its supercomputer. A year ago, TACC won a $27.5 million award from the NSF to build a 10 PFLOPS machine named "Stampede".

TACC Stampede at a glance: 6,400 x86 compute nodes, almost 6,400 Xeon Phi boards, over 250 TB of DDR3 main memory, 16 TB shared memory nodes and 144 Tesla boards.

To reach the 10 PFLOPS target, the computational power in Stampede is split into two parts. The first 2 PFLOPS come from 6,400 nodes, each carrying two Intel Xeon E5 series processors (Sandy Bridge-EP) and 32GB of DDR3 memory. The second part comes from the "countless" MIC cards (now known as Xeon Phi), which were supposed to deliver 8 PFLOPS. As it turns out, the countless MIC coprocessors fell a bit short of the target: TACC expects more than 7 PFLOPS, but less than 8 PFLOPS. The third part of the Stampede system consists of 16 memory nodes with 1TB of DDR3 memory and two NVIDIA Tesla K20 boards. Furthermore, Tesla K20 boards are installed in 128 of the 6,400 compute nodes for computational purposes, bringing the total number of K20 boards to 144. This number pales in comparison to the roughly 6,500 Xeon Phi boards. The ScaleMP virtual SMP solution is used to create a shared memory environment spanning all 16TB of memory; this part will mostly target "big data".

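
A quick back-of-the-envelope check of the performance split, using only the figures quoted above. These are peak numbers as stated by TACC, and the helper names are our own, hypothetical ones; this is a sketch, not TACC's accounting.

```python
# Rough split of Stampede's peak performance, using the article's figures.
# All values are stated peak numbers, not measured results.
HOST_PFLOPS = 2.0               # 6,400 dual Xeon E5 (Sandy Bridge-EP) nodes
HOST_NODES = 6_400
PHI_PFLOPS_RANGE = (7.0, 8.0)   # TACC expects more than 7 but less than 8
TARGET_PFLOPS = 10.0


def host_gflops_per_node(total_pflops: float, nodes: int) -> float:
    """Average share of the host peak per node, in GFLOPS (1 PFLOPS = 1e6 GFLOPS)."""
    return total_pflops * 1e6 / nodes


def system_range(host: float, phi_range: tuple) -> tuple:
    """Whole-system peak range: host part plus the low/high Xeon Phi estimate."""
    low, high = phi_range
    return host + low, host + high


if __name__ == "__main__":
    low, high = system_range(HOST_PFLOPS, PHI_PFLOPS_RANGE)
    per_node = host_gflops_per_node(HOST_PFLOPS, HOST_NODES)
    print(f"Host share per node: {per_node:.1f} GFLOPS")
    print(f"Whole system: {low:.0f}-{high:.0f} PFLOPS vs. {TARGET_PFLOPS:.0f} PFLOPS target")
```

On these numbers the machine lands between 9 and 10 PFLOPS, i.e. at or slightly below the original target.
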
While the Intel Xeon E5 systems and the Tesla boards were supplied at special but still realistic prices, we were quite surprised to learn that the computing center paid only around $400 per Xeon Phi board. Given that the competing Tesla K20 boards retail for $3,199 (available in December), this can be viewed as price dumping. Bear in mind that TACC had only $2.4 million for Xeon Phi boards, and reaching 8 PFLOPS (or rather, 7+ PFLOPS) requires around 6,000 to 7,000 boards. At $400, it is quite a steal.
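
The budget arithmetic behind that claim can be sketched as follows. The inputs are the article's figures; the $3,199 K20 figure is a retail list price, so real volume pricing would differ, and the function names are hypothetical.

```python
# Rough budget arithmetic behind the "$400 per Xeon Phi board" figure.
# The K20 price is retail; actual volume pricing would be lower.
PHI_BUDGET_USD = 2_400_000
PHI_PRICE_USD = 400
K20_RETAIL_USD = 3_199


def boards_for_budget(budget: int, unit_price: int) -> int:
    """Whole boards a fixed budget can buy at a given unit price."""
    return budget // unit_price


def cost_at_price(boards: int, unit_price: int) -> int:
    """Total cost of a number of boards at a given unit price."""
    return boards * unit_price


if __name__ == "__main__":
    phi_boards = boards_for_budget(PHI_BUDGET_USD, PHI_PRICE_USD)
    print(f"Xeon Phi boards for $2.4M at $400: {phi_boards}")
    print(f"Same count at K20 retail: ${cost_at_price(phi_boards, K20_RETAIL_USD):,}")
    print(f"K20 boards for $2.4M at retail: {boards_for_budget(PHI_BUDGET_USD, K20_RETAIL_USD)}")
```

At $400 the budget covers 6,000 boards, the low end of the range needed; the same number of boards at K20 retail would cost roughly $19.2 million.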

We reached out to Intel for comment and received the following reply:

"We do not comment on any rumors or speculations related to product pricing before it is actually launched.

We have not yet announced any pricing details of Xeon Phi and you may expect to get more on this when we announce the product later this year."

The $400 price may thus be explained by the fact that TACC took most of the pre-production Xeon Phi boards. At the time of writing, it is not known whether these boards are pre-production Knights Corner parts or first samples of Knights Ferry. All we know is that the boards are specced at 300W TDP and feature 61 mini-cores, a 512-bit memory interface and 8GB of GDDR5 memory.

Regardless of the price, numerous universities in the TACC ecosystem will benefit heavily from the Stampede supercomputer. According to TACC representatives, several hundred projects will be running on the supercomputer from day one, projected for around January 7th, 2013. Clemson University, Cornell University, Indiana University, Ohio State University and the University of Colorado (Boulder, CO) all teamed up with the University of Texas at Austin and at El Paso.

The story of Stampede doesn't end here. A second generation is already in the works, featuring 2014-class Xeon Phi processors, with the goal of adding another 5 PFLOPS of compute power. Given that there is enough space in the supercomputer, we expect the second generation might bring it to the desired 15 PFLOPS. This upgrade should coincide with the additional $24 million planned to support the Stampede project through 2017 and beyond.