On Wednesday, Intel showed reporters a 48-core processor nicknamed the SCC [single-chip cloud computer] that consumes about the same power as today’s desktop processors. With 1.3 billion transistors, the chip boasts 10 to 20 times the processing power of the top of Intel’s Nehalem product line.

Intel’s CTO Justin Rattner proudly showed a multi-die manufacturing wafer. He said the team is already past the wafer stage and has packaged, running parts. He laughingly admitted that this is not the typical Intel approach of "flash the wafer and then wait six months."

Justin Rattner, Intel’s CTO, explains the details behind SCC

Rattner said that SCC’s 48 IA-32 [Intel Architecture, 32-bit] cores are simple, in-order designs rather than sophisticated out-of-order processors, closer to an Atom-like core than to a Nehalem-class design. He said the 48 fully programmable processing cores are the most Intel has ever put on a single silicon chip.

Talking about similarities between the Atom core and the SCC... The single-chip cloud computer is another variation of an in-order core, recently returned to Intel by DoD

Rattner said that a data center could replace a rack full of equipment with one or more high-core-count processors like the SCC. The SCC can operate from 1GHz to 3GHz and draw from 25 watts up to 125 watts. Rattner said that this is an experimental chip that will never become a product. Intel’s lab plans to hand-build about 100 of these experimental chips for academic research and specialized software development. Rattner said that putting real silicon and hardware in front of people speeds the development cycle compared with working from emulators.

Demo machine with 48 IA-32 cores on a single die

The SCC measures about 567 square millimeters, about the size of a postage stamp. It is fabricated using a 45nm CMOS high-k metal gate process. The design arranges its 24 dual-core modules in a 6-by-4 2D mesh network and includes four DDR3 memory channels. The modules communicate by means of a software-configurable message-passing scheme using 384KB of on-die shared memory.
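To make the layout concrete, here is a minimal Python sketch of a 6-by-4 mesh holding 24 two-core modules, counting how many mesh links a message crosses between two cores. The core numbering (cores 2k and 2k+1 on module k, modules numbered row by row), the helper names, and the X-then-Y routing order are illustrative assumptions, not details Intel gave here.

```python
from __future__ import annotations

# Hypothetical model of the SCC layout: 24 two-core modules in a
# 6-column by 4-row mesh. The numbering scheme and X-then-Y routing
# are illustrative assumptions.
COLS, ROWS = 6, 4
CORES_PER_MODULE = 2
NUM_CORES = COLS * ROWS * CORES_PER_MODULE  # 48


def module_of(core: int) -> int:
    """Assumed mapping: cores 2k and 2k+1 share module k."""
    if not 0 <= core < NUM_CORES:
        raise ValueError(f"core {core} is outside 0..{NUM_CORES - 1}")
    return core // CORES_PER_MODULE


def position(module: int) -> tuple[int, int]:
    """(x, y) mesh coordinates, numbering modules row by row."""
    return module % COLS, module // COLS


def xy_route(src_core: int, dst_core: int) -> list[tuple[int, int]]:
    """Mesh positions visited moving along X first, then along Y."""
    x, y = position(module_of(src_core))
    dx, dy = position(module_of(dst_core))
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def hops(src_core: int, dst_core: int) -> int:
    """Number of mesh links a message crosses."""
    return len(xy_route(src_core, dst_core)) - 1


if __name__ == "__main__":
    print(NUM_CORES, hops(0, 1), hops(0, 47))
```

Under these assumptions the two farthest cores are (6 - 1) + (4 - 1) = 8 hops apart, while the two cores sharing a module need no mesh hops at all.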

The SCC is the second-generation successor to the 80-core "Polaris" chip that Intel’s Tera-Scale research project showed in 2007. Polaris was only a proof of concept, whereas the SCC is based on the Intel Architecture [IA-32], so it runs standard x86 software.

Earlier this year, Tilera, a startup spun out of MIT [Massachusetts Institute of Technology], promised a 100-core processor. Tilera predicted that the chip would be fabricated in a 40nm process and be available early next year.

Rattner said the team found only one significant bug, and it was fixed with just a metal-layer change. Because the cores are IA-32 [x86] cores, SCC systems can run Windows and Linux. Clearly, the major hurdle for multi-core architectures is traditional single-threaded applications.

Rattner said that programmers need the tools and experience to develop applications with independent tasks running in parallel. The SCC has been extensively tested using JavaScript, which Intel says has been underutilized because it lacks support for multiple threads. Treating the experimental SCC chip as a "server farm" lets Intel divide the work involved in calculating complex renderings across its cores.

During the megahertz race of years past, processor clock frequencies kept rising, letting single-threaded applications run faster. Then ever-higher heat and power consumption pushed designers toward multicore CPUs as the way to keep increasing computing power.

The SCC uses message passing, an architectural change from the traditional cache-coherency approach. Tim Mattson of Intel Research explained that message passing shares data by sending messages directly to other processors over a network rather than by reading and writing a pool of shared memory. The approach depends on extremely low latency and high bandwidth.
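The difference from shared memory can be illustrated with a small Python sketch in which threads stand in for cores and each "core" owns a private inbox. A running total travels around a ring only as messages; no core reads or writes another core's variables. The ring pattern, the function name, and the use of `queue.Queue` as a mailbox are assumptions for illustration, not the SCC's programming interface.

```python
import queue
import threading


def ring_sum(num_cores: int = 8) -> int:
    """Pass a running total once around a ring of message-passing 'cores'.

    Each core has a private inbox; the only way data moves between
    cores is by putting a message into a neighbor's inbox.
    """
    inboxes = [queue.Queue() for _ in range(num_cores)]
    results = queue.Queue()

    def core(rank: int) -> None:
        right = (rank + 1) % num_cores
        if rank == 0:
            inboxes[right].put(0)          # core 0 starts the message
            results.put(inboxes[0].get())  # and waits for it to return
        else:
            partial = inboxes[rank].get()       # receive from the left neighbor
            inboxes[right].put(partial + rank)  # add own id, pass it on

    threads = [threading.Thread(target=core, args=(r,)) for r in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results.get()


if __name__ == "__main__":
    print(ring_sum(8))
```

With eight cores, the message returns to core 0 carrying 0 + 1 + ... + 7 = 28. Each hop is a blocking send and receive, which is why low per-message latency matters so much in this style of design.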

Mattson said that each core communicates over a mesh-network fabric instead of relying on cache coherency as Larrabee does, which requires each core to know the state of the caches in the other cores. That requirement limits the number of cores that can be tied together. Another reason for choosing message passing for the SCC was to find out how the theory works in the real world.

The SCC’s power management can independently control eight variable-voltage and 28 variable-frequency areas of the chip. Through an API, a programmer can set break points for changing the frequency and power consumption of the cores. The linked video shows the experimental chip divided into eight cores that are graphically represented on the screen.
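Why separate voltage control matters can be sketched with the standard first-order model for dynamic CMOS power, P ≈ C·V²·f: lowering frequency alone cuts dynamic power roughly linearly, while lowering voltage as well adds a quadratic saving. The Python below ignores leakage, assumes constant switched capacitance, and uses hypothetical domain settings, not measured SCC values.

```python
from __future__ import annotations

# First-order dynamic power model: P_dyn ~ C * V^2 * f.
# Leakage is ignored and C is assumed constant; all settings are hypothetical.


def relative_dynamic_power(v_scale: float, f_scale: float) -> float:
    """Dynamic power relative to a baseline at (V0, f0)."""
    return v_scale ** 2 * f_scale


def chip_relative_power(domains: list[tuple[float, float]]) -> float:
    """Average relative power over equally sized domains,
    each given as (voltage scale, frequency scale)."""
    return sum(relative_dynamic_power(v, f) for v, f in domains) / len(domains)


# Hypothetical example: eight voltage domains, two left at full voltage
# and speed, six slowed to half frequency at 70% voltage.
example = [(1.0, 1.0)] * 2 + [(0.7, 0.5)] * 6

if __name__ == "__main__":
    print(round(chip_relative_power(example), 3))
```

In this hypothetical setting, the chip's dynamic power falls to about 0.43 of the all-out figure, and most of that saving comes from the voltage term rather than the frequency term.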

Microsoft showed its Visual Studio development environment with extensions to control 2 through 48 cores. As the number of cores increased, the rendering of a fractal image on screen sped up. This let a programmer see how their code could be refined to distribute the parallel workload more evenly across all the cores.
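The value of spreading fractal work evenly can be sketched in Python by measuring the work in each image row as Mandelbrot iterations and comparing two ways of handing rows to cores: contiguous blocks and round-robin interleaving. The image size, iteration limit, and helper names are hypothetical, and this is not a description of how Microsoft's demo divided its work. The imbalance ratio (busiest core's load divided by the average) is one simple measure of how long the slowest core keeps the others waiting.

```python
from __future__ import annotations

# Hypothetical image and iteration settings for the illustration.
WIDTH, HEIGHT, MAX_ITER = 90, 60, 100


def mandel_iters(c: complex) -> int:
    """Iterations before z escapes |z| > 2 (capped at MAX_ITER)."""
    z = 0j
    for i in range(MAX_ITER):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return MAX_ITER


def row_cost(row: int) -> int:
    """Total iterations for one image row: a proxy for its compute time."""
    y = -1.2 + 2.4 * row / (HEIGHT - 1)
    return sum(mandel_iters(complex(-2.0 + 3.0 * col / (WIDTH - 1), y))
               for col in range(WIDTH))


def assign_block(num_rows: int, cores: int) -> list[list[int]]:
    """Contiguous chunks of rows; trailing cores may get none."""
    size = -(-num_rows // cores)  # ceiling division
    return [list(range(k * size, min((k + 1) * size, num_rows)))
            for k in range(cores)]


def assign_interleaved(num_rows: int, cores: int) -> list[list[int]]:
    """Round-robin: core k gets rows k, k + cores, k + 2*cores, ..."""
    return [list(range(k, num_rows, cores)) for k in range(cores)]


def imbalance(assignment: list[list[int]], costs: list[int]) -> float:
    """Busiest core's load divided by the mean load (1.0 is perfect)."""
    loads = [sum(costs[r] for r in rows) for rows in assignment]
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    costs = [row_cost(r) for r in range(HEIGHT)]
    for cores in (2, 8, 48):
        block = imbalance(assign_block(HEIGHT, cores), costs)
        inter = imbalance(assign_interleaved(HEIGHT, cores), costs)
        print(f"{cores:2d} cores: block {block:.2f}, interleaved {inter:.2f}")
```

Because the rows through the middle of the set need the most iterations, contiguous blocks tend to load a few cores heavily, while interleaving spreads the expensive rows across every core.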

Intel said it would start delivering its experimental SCC chips in the spring of 2010. By this time next year, we should be hearing how researchers are doing with their powerful new parallel x86 environment.