During Mobile World Congress 2011 in Barcelona, NVIDIA announced the Tegra 3 SoC (System-on-a-Chip), codenamed Kal-El. The company also disclosed a roadmap that boldly promises 100x the performance of a current Tegra 2 chip by 2014. Bear in mind that NVIDIA factors both CPU and GPU performance into this number, so it's not easy to compare with other chips. However, NVIDIA also provided a CoreMark performance comparison with a Core 2 Duo CPU, and it was exactly those benchmark results that stirred up quite a controversy.

It didn't take long for someone to spot a nasty little detail in the performance numbers. According to the compiler versions and settings listed for the CoreMark runs, the Core 2 Duo numbers were produced with GCC 3.4 and only the standard set of optimizations (-O2), while the Tegra 3 numbers came from the more recent GCC 4.4 with aggressive optimizations (-O3). The Il Sistemista website took a Core 2 Duo T7200 and re-ran the benchmark compiled with GCC 4.4 and the same optimization settings. The results were no longer in NVIDIA's favor: the Core 2 chip scored about 15,200 points, compared to the Tegra 3's 11,352.

CoreMark benchmark comparing NVIDIA Tegra 3 at 1GHz to various real and hypothetical products

So did NVIDIA outright lie about performance? Probably not, as they included the compiler information in the comparison. Also, that's not the whole story. On their blog, company representatives disclosed that the Tegra 3 chip was clocked at a meager 1.0GHz. Leaked roadmaps showed that shipping versions of the chip, dubbed T30, are expected to run at 1.5GHz; an NVIDIA employee even mentioned that he could have run the benchmark on a 1.5GHz version. Since CoreMark scales almost perfectly linearly with clock speed, a hypothetical Tegra 3 at 1.5GHz would have scored 17,028 points (11,352 × 1.5), again beating the Core 2 Duo with the same compiler settings. If we extend the projection to a hypothetical 2.5GHz Cortex-A9 chip, we arrive at 28,380 CoreMarks, which is the very least we should expect from Qualcomm's recently announced Cortex-A15 based chip at 2.5GHz.
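The projections above are simple linear clock scaling. A minimal Python sketch, assuming (as the text does) that the score scales proportionally with frequency at the same core count and compiler settings; the function name project_score is ours, purely for illustration:

```python
# Linear clock-scaling projection of a CoreMark score.
# Assumption: the score scales proportionally with clock frequency
# (same core count, same compiler and flags, no memory bottleneck).

def project_score(measured_score: float, measured_mhz: float, target_mhz: float) -> float:
    """Project a CoreMark score to another clock speed by linear scaling."""
    return measured_score * (target_mhz / measured_mhz)


TEGRA3_SCORE_1GHZ = 11_352  # NVIDIA's published Tegra 3 result at 1.0GHz

for target in (1500, 2500):
    projected = project_score(TEGRA3_SCORE_1GHZ, 1000, target)
    print(f"{target} MHz: {projected:,.0f} CoreMarks")
# prints:
# 1500 MHz: 17,028 CoreMarks
# 2500 MHz: 28,380 CoreMarks
```

The 2.5GHz figure still assumes Cortex-A9 cores, so it is a floor for an A15-class chip rather than a prediction of its score.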

We also ran our own CoreMark experiments to extend the comparison to more recent Intel CPU cores. The scores marked with an asterisk are projections. Note that the CoreMark database contains a Core i7 720QM result that shows abnormally high scaling. This is probably due to Turbo Boost, which raises the clock above its nominal frequency and therefore distorts any per-clock comparison. All scores are based on GCC 4.4 with -O3 and CPU-specific optimizations (for the Core i3-330M we used -march=native).
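To see why Turbo Boost skews a per-clock comparison, consider a chip whose score is produced at a higher effective clock than the nominal clock used as the divisor. The sketch below uses made-up, hypothetical clock and efficiency values (not the Core i7 720QM's actual behavior) to show that the reported index is inflated by the ratio of effective to nominal clock:

```python
# Why a clock boost distorts a CoreMark/MHz index.
# All numbers below are hypothetical, chosen only for illustration.

def per_mhz_index(score: float, mhz: float) -> float:
    """CoreMark/MHz computed against a given clock."""
    return score / mhz

def turbo_inflation(nominal_mhz: float, effective_mhz: float) -> float:
    """Factor by which dividing by the nominal clock overstates the index."""
    return effective_mhz / nominal_mhz

TRUE_INDEX = 3.0       # hypothetical true CoreMark/MHz of the chip
NOMINAL_MHZ = 1600     # hypothetical clock used as the divisor
EFFECTIVE_MHZ = 2000   # hypothetical clock actually sustained under Turbo

score = TRUE_INDEX * EFFECTIVE_MHZ          # what the benchmark would measure
reported = per_mhz_index(score, NOMINAL_MHZ)
print(f"true {TRUE_INDEX:.2f}, reported {reported:.2f}, "
      f"inflation x{turbo_inflation(NOMINAL_MHZ, EFFECTIVE_MHZ):.2f}")
# prints: true 3.00, reported 3.75, inflation x1.25
```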

At this point it makes more sense to use another metric: CoreMark/MHz. Tegra 3 comes out with an index of 11.35 (11,352 points at 1,000MHz), while the Core 2 Duo scores 7.6 (about 15,200 points at 2,000MHz). At the chip level, Tegra 3 therefore does almost 50% more work per clock than the Core 2 CPU. However, with twice the number of cores, this is not that impressive. Per core, the Cortex-A9 based Tegra 3 comes out at 2.83 CoreMark/MHz, while the Core 2 Duo scores 3.79 CoreMark/MHz. Bear in mind that the Core 2 Duo is based on a five-year-old architecture. First-generation Core i7 (Nehalem) chips score about 4.53 CoreMark/MHz per core (with Hyper-Threading factored in). All these numbers are based on the same major version of GCC (4.4).
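The CoreMark/MHz arithmetic can be reproduced in a few lines. This sketch assumes the 2.0GHz nominal clock of the Core 2 Duo T7200 and four Cortex-A9 cores for Tegra 3, and it uses the rounded 15,200 score; the per-core results therefore land at 2.84 and 3.80, and the small differences from the quoted 2.83 and 3.79 come down to rounding of inputs and outputs:

```python
# CoreMark/MHz efficiency index, at chip level and per core.
# Assumed configurations: Tegra 3 at 1.0GHz with 4 cores,
# Core 2 Duo T7200 at 2.0GHz with 2 cores.

chips = {
    "Tegra 3 @ 1.0GHz": (11_352, 1000, 4),
    "Core 2 Duo T7200": (15_200, 2000, 2),  # score is rounded ("about 15,200")
}

def index_per_mhz(score: float, mhz: float) -> float:
    """Chip-level CoreMark/MHz."""
    return score / mhz

def index_per_core(score: float, mhz: float, cores: int) -> float:
    """CoreMark/MHz divided by the number of cores."""
    return score / mhz / cores

for name, (score, mhz, cores) in chips.items():
    print(f"{name}: {index_per_mhz(score, mhz):.2f} per MHz, "
          f"{index_per_core(score, mhz, cores):.2f} per MHz per core")

ratio = index_per_mhz(11_352, 1000) / index_per_mhz(15_200, 2000)
print(f"Tegra 3 vs Core 2 Duo per clock: {ratio:.2f}x")
# prints:
# Tegra 3 @ 1.0GHz: 11.35 per MHz, 2.84 per MHz per core
# Core 2 Duo T7200: 7.60 per MHz, 3.80 per MHz per core
# Tegra 3 vs Core 2 Duo per clock: 1.49x
```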

Comparing Tegra 3 (Codename: Kal-El) to its predecessor, Tegra 2 and competing CPUs from Intel using our CoreMark/MHz efficiency index
The CoreMark/MHz index shows how many CoreMarks a particular chip extracts per MHz of clock frequency

This diagram shows the CoreMark/MHz performance of different chips using their full core count. Kal-El's performance is impressive, but still dwarfed by high-end x86 architectures. Once power consumption is factored in, however, the ARM-based chip utterly destroys any high-end x86 chip.

The point is that Intel's x86 chips still offer higher IPC (Instructions Per Cycle), albeit within a much higher power envelope. While Intel's Core 2 Duo and newer chips consume 35W and upwards, ARM-based SoCs consume only a couple of watts, most likely in the 1-4W range. Still, to beat current high-end x86 chips, an ARM-based CPU needs to be either clocked about twice as high or use about twice as many cores. It's also important to note that the upcoming Cortex-A15 architecture includes further microarchitecture-level optimizations that increase IPC.
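A rough performance-per-watt comparison makes the power argument concrete. The wattages below are assumptions taken from the estimates in the text (4W, the top of the 1-4W range, for Tegra 3; 35W for the Core 2 Duo), not measured power during the benchmark, and they are deliberately chosen to favor Intel:

```python
# Rough CoreMark-per-watt comparison.
# Assumptions: Tegra 3 at 4W (top of the estimated 1-4W range),
# Core 2 Duo at 35W (the lower bound quoted for Intel's chips).
# These are envelope estimates, not measured benchmark power.

def coremarks_per_watt(score: float, watts: float) -> float:
    """CoreMark score divided by the assumed power draw."""
    return score / watts

tegra3 = coremarks_per_watt(11_352, 4.0)
core2 = coremarks_per_watt(15_200, 35.0)
print(f"Tegra 3: {tegra3:,.0f} CoreMarks/W")
print(f"Core 2 Duo: {core2:,.0f} CoreMarks/W")
print(f"ratio: {tegra3 / core2:.1f}x")
# prints:
# Tegra 3: 2,838 CoreMarks/W
# Core 2 Duo: 434 CoreMarks/W
# ratio: 6.5x
```

Even under these Intel-friendly assumptions, the ARM chip delivers roughly 6.5 times as many CoreMarks per watt.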

There is another aspect that these CoreMark number games can't capture. The Tegra 3 chip also comes with an integrated 12-core GPU of an as-yet unannounced architecture (NVIDIA refused to answer our question "Is Tegra 3 based on a GeForce 6/7-class architecture like Tegra 2, or on the newer GeForce 8/9?"). As mentioned earlier, when talking about chip performance, NVIDIA includes the capabilities of the GPU too. In the real world this does matter, as smartphone and tablet UIs and video playback wouldn't work very well without a capable GPU. For now, we can't factor in GPU performance in any comparable way. If there is anything to be learned from the Tegra 3 CoreMark numbers, it's that compiler version and settings matter. Intel should watch out, though: two generations from now, these chips could come dangerously close.