The ARM Cortex-A8 versus x86
Like the Intel Atom, the ARM Cortex-A8 is a superscalar design. In other words, the Cortex-A8 can execute multiple instructions (up to two, as with the Atom) during each clock tick, but it can only execute instructions in the order they arrive, unlike the VIA Nano and all current AMD and Intel chips besides the Atom. The Nano, for instance, can shuffle instructions around and execute them out of order, which improves processing efficiency by roughly 20-30% over superscalar in-order chips.
The immediate predecessor of the Cortex-A8 is the ARM11, which found a home in the original Apple iPhone and countless other dumbphones and smartphones. The ARM11 is a simple, scalar, in-order microprocessor, so the best it can ever do is execute one instruction per clock cycle. Just as the Cortex-A8 is roughly comparable to the Intel Atom, the ARM11 is somewhat similar to the VIA C7.
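The difference between a scalar core such as the ARM11 and a dual-issue in-order core such as the Cortex-A8 or Atom can be illustrated with a toy Python model. The `Instr` class and the `in_order_cycles` function are hypothetical; the model assumes 1-cycle latencies and ignores caches, branches and the real pipeline details of all three chips.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    latency: int = 1              # cycles until the result can be used
    deps: tuple[int, ...] = ()    # indices of earlier instructions it reads


def in_order_cycles(program: list[Instr], width: int) -> int:
    """Cycles to finish `program` on an idealized in-order core that can
    issue up to `width` instructions per clock, strictly in program order."""
    ready_at = [0] * len(program)  # cycle at which each result is available
    cycle = i = 0
    while i < len(program):
        issued = 0
        while i < len(program) and issued < width:
            if any(ready_at[d] > cycle for d in program[i].deps):
                break  # in-order: a waiting instruction blocks all later ones
            ready_at[i] = cycle + program[i].latency
            i += 1
            issued += 1
        cycle += 1
    return max(ready_at, default=0)


independent = [Instr("add"), Instr("sub"), Instr("and"), Instr("orr")]
chain = [Instr("a"), Instr("b", deps=(0,)), Instr("c", deps=(1,)), Instr("d", deps=(2,))]

print(in_order_cycles(independent, width=1))  # 4: scalar, one per clock
print(in_order_cycles(independent, width=2))  # 2: dual issue, two per clock
print(in_order_cycles(chain, width=2))        # 4: dependencies defeat dual issue
```

Dual issue halves the time for independent instructions, but a chain in which each instruction needs the previous result gains nothing, which is one reason in-order superscalar chips rarely sustain their peak rate.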
In-order chips suffer a performance hit because processing can come to a screeching halt when they encounter an instruction that takes a long time to complete. Out-of-order chips, on the other hand, can shuffle instructions around so that forward progress can usually be made while a lengthy instruction is being processed.
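The stall described above, and how out-of-order issue hides it, can be sketched with a toy Python model. The `Instr` class, the `issue_cycles` function, the 20-cycle load latency and the 8-entry window are illustrative assumptions, not parameters of any real core; setting `window=1` makes the model strictly in-order.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    latency: int = 1              # cycles until the result can be used
    deps: tuple[int, ...] = ()    # indices of *earlier* instructions it reads


def issue_cycles(program: list[Instr], window: int, width: int = 1) -> int:
    """Cycles to finish `program` when the core may issue any ready instruction
    located within `window` slots of the oldest not-yet-issued one.
    window=1 is strictly in-order; a larger window models out-of-order issue."""
    n = len(program)
    ready_at = [None] * n  # None = not issued yet, else cycle result is ready
    oldest = cycle = 0
    while oldest < n:
        count = 0
        for i in range(oldest, min(oldest + window, n)):
            if count == width:
                break
            if ready_at[i] is not None:
                continue  # already issued
            if all(ready_at[d] is not None and ready_at[d] <= cycle
                   for d in program[i].deps):
                ready_at[i] = cycle + program[i].latency
                count += 1
        while oldest < n and ready_at[oldest] is not None:
            oldest += 1
        cycle += 1
    return max((t for t in ready_at if t is not None), default=0)


# A load that misses in cache (assumed 20 cycles), one instruction that needs
# its result, then four instructions that do not.
program = [Instr("ldr", latency=20), Instr("add", deps=(0,)),
           Instr("sub"), Instr("and"), Instr("orr"), Instr("eor")]
print(issue_cycles(program, window=1))  # 25: everything waits behind the miss
print(issue_cycles(program, window=8))  # 21: independent work overlaps the miss
```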
The Intel Atom partially overcomes this problem by implementing Hyper-Threading, Intel’s brand name for its version of simultaneous multithreading [SMT]. As with a few other Intel CPUs [and the three IBM PowerPC-based cores in the Xbox 360’s Xenon], the operating system [OS] sees the Atom as having more processing cores than it actually has. In the case of the single-core Atom N450, the OS sees two "virtual" cores, and it can accordingly schedule a separate thread [an independently running task or program] on each one. Consequently, the Atom often churns through two unrelated instruction streams simultaneously, so even if one gets blocked by a slow, "high latency" instruction, the other thread can usually still be processed.
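A toy model shows why a second hardware thread helps an in-order core. The sketch below assumes a single-issue, in-order core that checks its hardware threads round-robin each cycle (the real Atom is dual-issue and its thread arbitration is more involved); the `smt_cycles` function, the 10-cycle stall and the thread contents are hypothetical. Feeding both software threads through one hardware thread stands in for a core without Hyper-Threading.

```python
from collections import deque


def smt_cycles(contexts: list[list[tuple[str, int]]]) -> int:
    """Toy single-issue, in-order core with len(contexts) hardware threads.

    Each context is an in-order queue of (software_thread, latency) pairs.
    Every instruction depends on the previous instruction of the same
    software thread. Each cycle the core issues at most one ready instruction,
    checking hardware threads round-robin. Returns the cycle when all work is done.
    """
    queues = [deque(c) for c in contexts]
    ready_at: dict[str, int] = {}  # software thread -> when its last result is ready
    cycle = start = finish = 0
    while any(queues):
        for k in range(len(queues)):
            idx = (start + k) % len(queues)
            q = queues[idx]
            if q and ready_at.get(q[0][0], 0) <= cycle:
                thread, latency = q.popleft()
                ready_at[thread] = cycle + latency
                finish = max(finish, ready_at[thread])
                start = (idx + 1) % len(queues)  # next thread gets first pick
                break
        cycle += 1
    return finish


# Thread A hits a long-latency operation (assumed 10 cycles); thread B does not.
a = [("A", 1), ("A", 10), ("A", 1), ("A", 1)]
b = [("B", 1)] * 6
print(smt_cycles([a + b]))  # 19: one hardware thread, B waits behind A's stall
print(smt_cycles([a, b]))   # 14: two hardware threads, B fills A's stall
```

With one hardware thread, B cannot start until A's stalled instructions have issued; with two, B's instructions fill the cycles A spends waiting.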
While Hyper-Threading doesn’t help much on single-threaded tasks - and a vast amount of modern computing remains single-threaded - it helps a great deal with instruction streams heavy in slow input/output [I/O], since I/O operations can take an eternity from the CPU’s vantage point and can block even an out-of-order core. For instance, the Atom boots Windows 7 relatively quickly compared with even superscalar, out-of-order, single-core chips like the VIA Nano, because the Atom can keep processing a second thread instead of frequently stopping to wait on the vast number of I/O operations encountered during boot-up.
Intel chose to equip the Atom with Hyper-Threading instead of making the chip out-of-order because Hyper-Threading is simpler and consumes less power. Intel’s Austin design team created the Atom especially for low-power environments.
However, the benefits of Hyper-Threading diminish when multiple cores are available. The newer ARM Cortex-A9 MPCore is designed to be deployed as two or more cores, so SMT is less important in that multi-core setting. For instance, the NVIDIA Tegra 250 boasts two ARM Cortex-A9 cores. Moreover, the A9 is superscalar and out-of-order, with speculative execution, which puts it on equal footing with the newer x86 chips, at least superficially.
Keep in mind that modern x86 microprocessors tend to be very rich in execution units and, after decades of development, are extremely refined in terms of low instruction latencies and feature sets. Perhaps most importantly, the supporting x86 "ecosystems" are unmatched. "Ecosystem" is the current buzzword that refers to the surrounding chip set, memory, I/O, interconnect and peripheral infrastructure.
Moreover, ARM chips are RISC cores. RISC is an acronym for "Reduced Instruction Set Computer", and ARM CPUs typify this genre in many ways.
In general, RISC chips are leaner and usually support fewer instructions than CISC, or "Complex Instruction Set Computer", designs. While today’s x86 CPUs wield a decidedly CISC-style instruction set, the underlying hardware has absorbed most of the advantages of RISC while implementing many complex instructions in microcode. For instance, the VIA C3 bolted a CISC x86 frontend onto a very MIPS-like core.
An issue to watch out for when comparing ARM CPUs against x86 microprocessors is the size of binary files. Historically, RISC machines have produced larger executables because they often need more instructions than CISC-derived systems to do the same work. If binary sizes differ significantly, this places greater pressure on cache sizes, RAM size and memory bandwidth. Disk space, by contrast, is not a concern: with today’s terabyte-scale mass storage devices, binary bloat is insignificant, since the vast majority of drive space is consumed by video and other multimedia data.
The table above shows that ARM Cortex binaries are indeed larger than x86 binaries, but only by about 10-15 percent. If this sampling is representative of both platforms, binary size differences will rarely matter. Even so, ARM L1i and L2 caches should be at least as large as those found on x86 microprocessors, but that is not currently the case, as will be discussed shortly.
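The cache argument is simple arithmetic: if ARM code for the same program is 10-15 percent larger, an ARM instruction cache must be that much larger to hold the same working set. The Python sketch below assumes a hypothetical 32 KiB x86 instruction cache and that the size ratio applies uniformly to the hot code, which real programs need not follow; the function names are illustrative.

```python
def arm_cache_needed(x86_cache_kib: float, size_ratio: float) -> float:
    """ARM cache size (KiB) that holds the same code as an x86 cache of
    x86_cache_kib, if ARM code is size_ratio times larger (assumed uniform)."""
    return x86_cache_kib * size_ratio


def x86_equivalent(arm_cache_kib: float, size_ratio: float) -> float:
    """Amount of x86-equivalent code (KiB) an ARM cache of arm_cache_kib holds."""
    return arm_cache_kib / size_ratio


for ratio in (1.10, 1.15):  # the 10-15 percent range from the table
    print(f"ratio {ratio:.2f}: {arm_cache_needed(32, ratio):.1f} KiB ARM "
          f"matches 32 KiB x86; 32 KiB ARM holds ~{x86_equivalent(32, ratio):.1f} KiB x86")
```

Across the 10-15 percent range, matching a 32 KiB x86 instruction cache would take roughly 35-37 KiB on the ARM side.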
ARM representatives responded with the following:
The binary size of the ARM benchmarks is significantly lowered with the Thumb-2 hybrid instruction set. Expected results are 20-30% lower code size at equivalent or better performance. The 10.0x version of Ubuntu Linux has been optimized for Thumb-2. [The version as tested was Ubuntu 9.04]
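ARM's figure can be combined with the table's in a back-of-the-envelope estimate. Assuming, purely for illustration, that the 20-30% Thumb-2 reduction applies to the same benchmarks whose ARM builds were 10-15 percent larger than x86 (the tested Ubuntu 9.04 builds do not allow this to be checked), the new ARM-to-x86 size ratio is the product of the two factors. The `thumb2_ratio` function is a hypothetical helper.

```python
def thumb2_ratio(arm_vs_x86: float, thumb2_reduction: float) -> float:
    """ARM/x86 binary size ratio after shrinking the ARM code by thumb2_reduction
    (assumes the reduction applies uniformly to the same binaries)."""
    return arm_vs_x86 * (1.0 - thumb2_reduction)


best = thumb2_ratio(1.10, 0.30)   # smallest ARM penalty, largest Thumb-2 saving
worst = thumb2_ratio(1.15, 0.20)  # largest ARM penalty, smallest Thumb-2 saving
print(f"Thumb-2 ARM binaries: {best:.2f}x to {worst:.2f}x the x86 size")
```

Under those assumptions, Thumb-2 binaries would land at roughly 0.77 to 0.92 times the x86 size; whether real Thumb-2 builds behave this way would have to be measured.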
Of course, the real story in the battle between ARM and x86 is how they measure up against each other in the performance arena. In this report, we’ll take a close look at competitive performance across a broad range of tests and also take a peek at power usage.
© 2009 - 2014 Bright Side Of News*, All rights reserved.