Integer Performance
Although it might not always appear to be the case, all computing is the processing of numbers. From the words of a love letter, to the glistening dew drops on a rose, to Johnny Cash’s anguished, repentant voice, to Gordon Freeman’s apocalyptic universe, to the ruby slippers on Dorothy’s feet, all are simply numbers to a computer.
For most chores, the only numbers that matter are integers. Integers are the natural counting numbers like 1, 2, 3 and their negative counterparts plus zero. With the exception of 3D gaming and some types of video and still-image rendering, encoding and manipulation, the vast bulk of day-to-day computing is integer-based. The integer test results we look at here can give us insight into typical system performance across chores like word processing and web browsing.
EEMBC CoreMark
The Embedded Microprocessor Benchmark Consortium [EEMBC] released a benchmark that is freely available to anyone. Dubbed CoreMark, this test provides a quick way to compare CPU performance across entirely different processor architectures.
We compiled CoreMark on each platform using GCC version 4.3.3 and the following flags:
-O3 -DMULTITHREAD=4 -DUSE_FORK=1 -DPERFORMANCE_RUN=1 -lrt
We chose to generate four threads to ensure scaling across a variety of systems featuring multiple cores and/or Hyper-Threading, such as the Intel Atom.

As you can see from the graph above, the ARM Cortex-A8 is very competitive on EEMBC CoreMark, running almost as fast as the Athlon and Nano. The Atom pulled ahead thanks to Hyper-Threading combined with its 25 percent clock speed advantage over the other chips. Unfortunately, there aren’t many more overall wins for the Atom ahead; please note, however, that most of the remaining tests are single-threaded.
OpenSourceMark miniBench
"miniBench" is a diverse benchmark that I’ve been working on for several years. It’s part of my OpenSourceMark benchmarking project. miniBench contains a wide variety of popular tests and runs quickly from the command-line. I also have a GUI-based version that I wanted to use for this report but could not do so because the Qt tool chain would not install completely on the ARM system. Instead, I used the excellent and relatively lightweight Code::Blocks IDE to create and manage the necessary C++ project files for a command-line binary.
You can download the x86 Code::Blocks project here. An x86 Linux binary compiled with static libraries is here. A similar ARM Cortex-A8 Linux binary is here. Both the x86 Linux project and the ARM Cortex-A8 project will eventually be uploaded to the OpenSourceMark SourceForge page, along with GUI adaptations of these benchmarks.

The ARM Cortex-A8 struggles on three of the five tests in this first miniBench chart. Heap Sort is the worst result for the A8, most likely because the test is significantly affected by memory bandwidth. The i.MX515 system is saddled with very poor memory bandwidth, as already demonstrated in this report. Integer Matrix Multiplication is another memory-bandwidth-sensitive test where the ARM chip comes up short.
However, the ARM Cortex-A8 is extremely impressive on the Integer Arithmetic test, blowing away the Athlon and doubling the Atom’s performance. The Integer Arithmetic test does exactly what you’d expect it to do: it performs a large number of very simple integer arithmetic calculations.
Also notice that the 800MHz ARM Cortex-A8 beats the 1GHz Intel Atom N450 on the ubiquitous Dhrystone benchmark despite the fact that the ARM chip spots the Atom a 25 percent clock speed advantage. ARM advertises that we should be able to get 1,600 Dhrystone MIPS from an 800MHz Cortex-A8. In our tests, the 800MHz ARM Cortex-A8 achieved 1,680 Dhrystone MIPS.
It’s clear that the ARM Cortex-A8 is aggressively optimized for Dhrystone performance, a point underscored by how prominently ARM touts the chip’s Dhrystone throughput.
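The Dhrystone figures quoted above are easier to compare when expressed per megahertz. The following is a minimal sketch of that arithmetic using only the numbers given in the text (1,600 advertised DMIPS, 1,680 measured DMIPS, 800MHz); the function name `dmips_per_mhz` is made up for illustration and is not part of miniBench.

```python
def dmips_per_mhz(dmips, clock_mhz):
    """Dhrystone MIPS normalized by clock speed."""
    return dmips / clock_mhz

CLOCK_MHZ = 800          # Cortex-A8 clock speed quoted in the article
ADVERTISED_DMIPS = 1600  # ARM's advertised figure at 800MHz
MEASURED_DMIPS = 1680    # result measured for this report

advertised = dmips_per_mhz(ADVERTISED_DMIPS, CLOCK_MHZ)
measured = dmips_per_mhz(MEASURED_DMIPS, CLOCK_MHZ)
# How far the measured result sits above the advertised one, in percent.
surplus_percent = (MEASURED_DMIPS / ADVERTISED_DMIPS - 1) * 100

print(f"advertised: {advertised:.2f} DMIPS/MHz")
print(f"measured:   {measured:.2f} DMIPS/MHz")
print(f"surplus:    {surplus_percent:.1f} percent")
```

In other words, the advertised figure works out to 2.0 DMIPS/MHz and the measured result to 2.1 DMIPS/MHz, about 5 percent above ARM's claim.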

On the second set of miniBench integer tests, the ARM Cortex-A8 holds its own against the brawnier x86 CPUs. The ARM Cortex-A8 even beat the VIA Nano L3050 on the Sieve test. More remarkably, the Cortex-A8 is very close to parity with the Atom across all of these tests, save for one, if the Atom’s 25 percent clock speed advantage is considered.
Notice, though, that the ARM chip could not run the String Concatenation test. This is an important indication of the relatively immature state of ARM’s GNU/Linux software support. Ubuntu as a whole was often flaky. Doubtless, this will improve with time.

The VIA Nano L3050 obliterates all of the competition on the hashing tests because the Nano features hardware support for these important security functions.
However, the 800MHz ARM Cortex-A8 is remarkably good at hashing: it thoroughly beats the 1GHz Atom on both tests and is only slightly slower than the Athlon.

The VIA Nano L3050 enjoys its biggest triumph on the miniBench cryptography tests because the Nano is equipped with robust hardware support for AES ECB encryption and decryption.
Again, the ARM Cortex-A8 remains very close to the Intel Atom if the Atom’s 25 percent clock speed advantage is considered.
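The "clock speed advantage" adjustment used throughout this section can be sketched as follows. The clock speeds come from the article; the two scores are hypothetical placeholders, NOT values read from the charts, and the function names are invented for illustration.

```python
ARM_CLOCK_MHZ = 800    # Cortex-A8 clock speed from the article
ATOM_CLOCK_MHZ = 1000  # Atom clock speed from the article

def clock_advantage(fast_mhz, slow_mhz):
    """Fractional clock advantage of the faster chip (0.25 means 25 percent)."""
    return fast_mhz / slow_mhz - 1.0

def clock_adjusted_ratio(fast_score, slow_score, fast_mhz, slow_mhz):
    """Score ratio after dividing out the clock ratio.

    A result of 1.0 means the two chips do the same work per clock cycle;
    below 1.0 means the slower-clocked chip is ahead per clock.
    Assumes higher scores are better.
    """
    return (fast_score / slow_score) / (fast_mhz / slow_mhz)

# Hypothetical scores for illustration only (higher is better).
atom_score = 500.0
arm_score = 400.0

advantage = clock_advantage(ATOM_CLOCK_MHZ, ARM_CLOCK_MHZ)
ratio = clock_adjusted_ratio(atom_score, arm_score, ATOM_CLOCK_MHZ, ARM_CLOCK_MHZ)
print(f"Atom clock advantage: {advantage:.0%}")
print(f"clock-adjusted Atom/ARM ratio: {ratio:.2f}")
```

With these made-up scores the Atom leads by 25 percent in raw terms, which is exactly its clock advantage, so the adjusted ratio comes out at parity.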
HardInfo CPU Benchmarks
HardInfo is one of the few CPU benchmarks available from within Ubuntu’s repositories.
The ARM Cortex-A8 doesn’t perform quite as well on HardInfo as it did on miniBench, possibly because I used very aggressive optimization flags for both platforms when compiling miniBench. Nevertheless, the ARM Cortex-A8 stays within spitting distance of the x86 CPUs except on the FPU Ray-tracing test, which is a floating-point test rather than an integer test.
Floating-Point Performance
Gaming, scientific computing, certain spreadsheet workloads such as financial simulations, and some image and video manipulation tasks involve fractional and irrational numbers. Computers represent these as "floating-point" numbers, so called because the decimal or radix point can float around among the significant digits of a number. Floating-point performance has become increasingly important in modern computing.
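To make the "floating" radix point concrete, here is a small sketch using Python's standard `math.frexp`, which splits a number into a significand and a power-of-two exponent. The three sample values are chosen for illustration; they share the same significand and differ only in where the radix point sits.

```python
import math

# Each value equals significand * 2**exponent, with 0.5 <= significand < 1.
samples = (0.40625, 6.5, 208.0)
decomposed = [math.frexp(value) for value in samples]

for value, (significand, exponent) in zip(samples, decomposed):
    print(f"{value:>9} = {significand} * 2**{exponent}")
```

All three values carry the same significant digits (0.8125); only the exponent changes, which is what moves the radix point.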
However, good floating-point performance is relatively hard to engineer and requires a substantial number of additional transistors, which drives up power usage. Typically, floating-point-intensive operations consume more power than pure integer tasks. In fact, miniBench’s LINPACK test was the worst-case power consumer on the VIA Nano, something Centaur discovered while I worked there as head of benchmarking. That ranking excludes "thermal virus" programs such as the absolute worst-case program developed by Glenn Henry, Centaur’s president.
Integrated floating-point [FP] hardware is a fairly new addition to ARM processors and even though the Freescale i.MX515 ARM Cortex-A8 features two dedicated floating-point units, there are still severe limitations. The faster of the two FP units is the "Neon" SIMD engine, but it only supports 32-bit single-precision [SP] numbers. Single-precision numbers are too imprecise for many types of calculations.
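A quick way to see why single precision falls short for some calculations is to round ordinary double-precision values to 32 bits and look at what is lost. This is a minimal sketch using only Python's standard `struct` module; the helper name `to_single` is invented here and the example says nothing about how Neon itself behaves beyond the 32-bit format it uses.

```python
import struct

def to_single(x):
    """Round a Python float (64-bit double) to the nearest 32-bit single."""
    return struct.unpack("f", struct.pack("f", x))[0]

# 2**24 + 1 fits exactly in a double but not in a single, whose
# significand holds only 24 bits.
exact = 16_777_217.0
rounded = to_single(exact)
print(rounded)  # 16777216.0: the trailing +1 is lost

# Even a simple decimal fraction picks up a much larger error.
tenth_error = abs(to_single(0.1) - 0.1)
print(f"extra error in 0.1 after rounding to single: {tenth_error:.2e}")
```

Errors of this size are harmless for most graphics work, but they accumulate quickly in long-running sums or financial calculations, which is why double precision matters for those workloads.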
Hardware support for 64-bit, double-precision, floating-point calculations is provided by the "Vector Floating-Point" [VFP] unit, a pretty weak coprocessor. And despite being called a "vector" unit, the VFP can only really operate on scalar data [one value at a time], although it does support SIMD instructions, which help improve code density.
Oddly enough, during our performance optimization experiments, Neon generated the same level of double-precision performance as the VFP, while doubling the VFP’s single-precision performance. When we asked ARM about this, company representatives replied, "NEON improves FP performance significantly. The compiler should be directed to use NEON over the VFP."
We therefore compiled miniBench to leverage Neon for this report. Note that while the Neon compiler flag was used for the ARM chip, none of the tests are explicitly SIMD optimized: the x86 version of miniBench used in this report does not include hand-coded SSE or SSE2 routines and the ARM Cortex-A8 version of miniBench does not include similar Neon code.

In the miniBench MFLOPS tests, the ARM Cortex-A8 looks pretty bad except on division. While the VIA Nano has the best DP [double-precision] performance, note how well the Intel Atom N450 handles SP calculations.
It is also worthwhile to recognize the very good floating-point division performance of the ARM Cortex-A8’s Neon. Unlike all of the x86 chips that I have ever tested, the Cortex-A8 delivers identical throughput for both floating-point division and multiplication. Division is much slower on x86 processors than multiplication. Consequently, the Cortex-A8 keeps up very well with the x86 CPUs in this report on DP division, more than doubling the Atom’s performance when the Atom’s clock speed advantage is considered. In single-precision division, the ARM Cortex-A8 beats ALL of the x86 microprocessors it’s pitted against here.
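For readers unfamiliar with the metric, MFLOPS is simply millions of floating-point operations completed per second. The sketch below shows how such a figure is derived and how multiplication and division rates might be compared. It is not miniBench's code; the function names, operands and loop count are all assumptions, and because Python's interpreter overhead dominates the timing, the printed numbers say little about the underlying hardware. A real test would be written in a compiled language.

```python
import time

def mflops(operations, seconds):
    """Millions of floating-point operations per second."""
    return operations / seconds / 1e6

def time_loop(op, n):
    """Apply op(x, y) n times to fixed operands and return elapsed seconds."""
    x, y = 1.000001, 0.999999
    start = time.perf_counter()
    for _ in range(n):
        op(x, y)
    return time.perf_counter() - start

N = 200_000  # arbitrary iteration count for illustration
mul_rate = mflops(N, time_loop(lambda a, b: a * b, N))
div_rate = mflops(N, time_loop(lambda a, b: a / b, N))

print(f"multiply: {mul_rate:.2f} MFLOPS")
print(f"divide:   {div_rate:.2f} MFLOPS")
print(f"divide/multiply throughput ratio: {div_rate / mul_rate:.2f}")
```

A ratio near 1.0 corresponds to the behavior described for the Cortex-A8, where division and multiplication deliver the same throughput; on the x86 chips the text leads us to expect a ratio well below 1.0.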

The ARM Cortex-A8 continues to languish on the remaining miniBench floating-point tests with two notable exceptions. The Cortex-A8 is fairly strong on FFT calculations, an extraordinarily important algorithm for many, many tasks. The ARM chip is also competitive with the Atom on the Double Arithmetic test.
Observe how the old Barton-core Mobile Athlon demolishes all of the other chips on Trig. AMD has historically provided industry-leading performance on transcendental calculations, while the same area has always been a big weakness for VIA’s CPUs. ARM really needs to bolster its chips’ performance on transcendental operations like the trigonometry functions exercised in this test.
The takeaway from this section is that the ARM Cortex-A8 does not deliver acceptable floating-point performance for netbooks, notebooks or desktops compared with x86 CPUs. This is an area ARM must address if the company plans to compete toe-to-toe with x86 microprocessors.
© 2009 - 2013 Bright Side Of News*, All rights reserved.