Benchmarking NXP i.MX 6UltraLite and Marvell PXA166, PXA168 ARM CPUs

Technology has become more than just a battle of sheer numbers. It has morphed into something of a “less is more” philosophy: seeing what’s possible with less. We pitted these three ARM CPUs against each other to find out whether a CPU with a slower clock speed can still come out ahead.

The results are promising: in most tests, the 696 MHz i.MX 6UltraLite held a clear advantage over the 800 MHz PXA166 and even the 1066 MHz PXA168. Let’s take a quick look at our findings.


Here, we used the multi-threaded Sysbench benchmark tool to measure how many prime numbers each CPU calculates per second. You’ll notice an obvious difference in performance: the i.MX 6UltraLite comes out ahead at about 24 primes per second, compared with about 13 primes per second for the PXA168 and 10 primes per second for the PXA166.
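To make the idea of a prime-counting CPU test concrete, here is a minimal sketch of one: trial division over a fixed range, timed from start to finish. This is not Sysbench's implementation, and its numbers are not comparable with the figures above. The function names and the default limit of 10,000 are assumptions chosen purely for illustration.

```python
import time


def is_prime(n: int) -> bool:
    """Trial division: test odd divisors up to the square root of n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def count_primes(limit: int) -> int:
    """Count the primes in the range [2, limit]."""
    return sum(1 for n in range(2, limit + 1) if is_prime(n))


def primes_per_second(limit: int = 10_000) -> float:
    """Time one pass over [2, limit] and report primes found per second."""
    start = time.perf_counter()
    found = count_primes(limit)
    elapsed = time.perf_counter() - start
    return found / elapsed


if __name__ == "__main__":
    # The limit is an arbitrary illustrative value, not a Sysbench setting.
    print(f"{primes_per_second():.0f} primes per second")
```

Because the work is pure integer arithmetic in a tight loop, a test of this kind mostly reflects how much the core gets done per clock rather than memory or storage speed.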

Next, we used Sysbench to test RAM transfer speeds (using 1K block sizes). You’ll see from the graph above that the obvious winner here was the i.MX 6UltraLite with about 135 MB/s. The PXA168 was around 55 MB/s, and the PXA166 was about 41 MB/s.

Next, we tested read speeds from the eMMC flash using dd, a utility that converts and copies files. We found that the i.MX 6UltraLite read at about 34 MB/s, while the PXA166 read at about 12 MB/s and the PXA168 at about 11.5 MB/s. Note that the PXA166 outperformed the PXA168 because the PXA166 runs the RAM slightly faster than the PXA168.
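A read-throughput figure like the ones above is simply bytes read divided by elapsed time. The sketch below shows that calculation by reading a file or device sequentially in fixed-size blocks. It is an illustration of the measurement, not the dd command we ran; the function names, the 1 MiB block size, and the choice of 1 MB = 10^6 bytes are all assumptions.

```python
import sys
import time


def timed_read(path: str, block_size: int = 1024 * 1024) -> tuple[int, float]:
    """Read `path` sequentially in blocks; return (bytes read, seconds taken)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 disables Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            total += len(block)
    elapsed = time.perf_counter() - start
    return total, elapsed


def mb_per_second(total_bytes: int, seconds: float) -> float:
    """Convert a byte count and duration into MB/s (1 MB = 10**6 bytes)."""
    return total_bytes / seconds / 1e6


if __name__ == "__main__" and len(sys.argv) > 1:
    # The path to read is supplied by the user on the command line.
    nbytes, secs = timed_read(sys.argv[1])
    print(f"{mb_per_second(nbytes, secs):.1f} MB/s")
```

One caveat with any measurement of this kind: data already held in the operating system's page cache is served from RAM, so repeated runs over the same file can report speeds well above what the flash itself delivers.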

Finally, let’s throw some OpenSSL crypto benchmarks at the CPUs. Here, the PXA168 performed slightly better than the i.MX 6UltraLite in some of the tests, and vice versa in the rest. While there is a difference, it’s not significant. Still, it is impressive for a CPU with a slower clock speed to keep up with, or beat, the others.

Conclusion

These days, a processor’s clock rate is not a very good measure of performance. The important differences are more complex than clock rate alone, but essentially, newer architectures are able to do more per clock.
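One rough way to see this in our own numbers is to divide each Sysbench prime result by the CPU's clock speed. The sketch below uses the approximate figures quoted earlier in this article; the dictionary layout and names are just for illustration, and a per-MHz figure from a single benchmark is a crude proxy, not a true instructions-per-clock measurement.

```python
# Approximate Sysbench prime results and clock speeds quoted in this article.
results = {
    "i.MX 6UltraLite": {"mhz": 696, "primes_per_s": 24},
    "PXA166": {"mhz": 800, "primes_per_s": 10},
    "PXA168": {"mhz": 1066, "primes_per_s": 13},
}


def per_mhz(entry: dict) -> float:
    """Benchmark score normalized by clock speed."""
    return entry["primes_per_s"] / entry["mhz"]


# How much more work per MHz the i.MX 6UltraLite does than each CPU.
baseline = per_mhz(results["i.MX 6UltraLite"])
advantage = {name: baseline / per_mhz(entry) for name, entry in results.items()}

for name, entry in results.items():
    print(f"{name}: {per_mhz(entry):.4f} primes/s per MHz "
          f"(i.MX 6UltraLite advantage: {advantage[name]:.2f}x)")
```

With these figures, the i.MX 6UltraLite does roughly 2.8 times as much work per MHz as either PXA part in this particular test.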

The PXA166/8 uses a customized ARM core called PJ1/Mohawk, originally from Intel/XScale, which follows the ARMv5TE architecture. It is compatible with the standard ARM9 core but includes further customizations from Marvell.

The i.MX 6UltraLite is a 696 MHz Cortex-A7, which follows the more recent ARMv7-A architecture. The Cortex-A7 is partially dual-issue, while the PXA166/8 is single-issue. This means the PXA166/8 will execute at most one instruction per clock, but the i.MX 6UltraLite will execute multiple instructions in parallel when possible.

The Cortex-A7 also includes NEON instructions, which support SIMD (single instruction, multiple data) operations. For code performing math on large blocks of data, this allows one instruction to operate on several data elements at once rather than using many instructions to iterate over that same block. Users typically have to go out of their way to use NEON code in their applications, but some performance-sensitive libraries such as FFTW will use NEON without users needing to change their code.

The i.MX 6UltraLite includes a floating-point unit, while the PXA166/8 does not. The i.MX 6UltraLite offloads floating-point operations to its VFPv4-D32 unit. The PXA166/8 runs armel/EABI code, in which floating-point math is done in software. Floating-point-heavy workloads will see massive gains going to an i.MX 6UltraLite.

There are many more nuanced differences between the CPU architectures, but ultimately performance has improved significantly in most cases going to this newer architecture, suggesting that less really can be more.
