Friday, May 24, 2013

OPINION: Are Benchmarks Worthless?




After reading the benchmarking piece by Rob Enderle, one of the striking messages was that we should no longer use current benchmarks and that a new type of benchmarking should be adopted, one that would emulate the real world. As it turns out, Rob was inspired by a recent AMD-organized event in Austin, TX where select press and analysts gathered for a Trinity Review Day. Note that BSN* was not invited and as such, we are free to fully disclose the performance of the Trinity APU (page two please).

Over the past 13 years, I have tested over 3000 components, and as patterns emerged I developed an aversion to overly optimized benchmarks. Instead, I personally, and the team here at BSN*, try to test products in situations as close to real use as possible, which means countless hours of work go into reviewing a single part. I wish to thank the respected members of AMD, Intel, NVIDIA, Futuremark, Microsoft, ORACLE, Blizzard, CroTeam, Samsung, Epic Games and many others who worked with me for over a decade and explained what I did not understand at the very beginning.

What's wrong with Benchmarks today?
If we take a look at performance measurement today, most companies rely on a ubiquitous benchmark for their respective field of interest: in the case of computing, we have consumer, prosumer, workstation, server and mobile benchmarks. Performance benchmarks are divided into synthetic and real world testing.

In this article, we will focus on consumer/prosumer systems, which attract the majority of public interest but are also subject to the most scrutiny (and the most money invested by the companies that have an interest in them). The players in the benchmark field are BAPCo on the business side, Futuremark on the consumer and business side, and a couple of players that are irrelevant in the grand scheme of things. There are also real world tests, which involve professional applications (Adobe, Blackmagic, CyberLink, etc.) and video games from major companies like Activision, Bethesda, Electronic Arts, Take 2 and so on. Video games, as we have found, tend to be the most intensive measure of a computing platform’s capabilities and stability.

BAPCo: The one-sided marching band that wins billions for one side
We start off with probably the most controversial organization of them all. BAPCo is an industry body originally founded by Intel. In fact, for a good amount of time, the allegedly independent organization operated out of Intel's offices in Silicon Valley. As Intel began to focus more and more on optimizing the benchmarks in its own favor, more and more companies left the organization. The companies that left (in alphabetical order) are AMD, Apple, Microsoft, NVIDIA, SanDisk, VIA and many more. A good example of how Intel controlled the benchmark body is an organization called ARCIntuition, which always voted in Intel's favor. As our confidential source and a former board member of BAPCo said, "Intel has many ways in which it influenced the outcome of BAPCo votes. One glaring example is ARCintuition, a shell company that has no other purpose than providing services to BAPCo. The company is basically an Intel sock-puppet."

A good example of the wrong side of the benchmarking ethos was EEcoMark. Roughly three years ago, BAPCo wanted to submit EEcoMark to the EPA (Environmental Protection Agency) to become part of the EnergyStar certification process. This would have influenced every computer on the planet, since EEcoMark overnight "would have become the most important benchmark ever created by far. For instance, many organizations will not purchase systems that do not bear the EnergyStar logo. Organizations in Europe and Asia were already lined up to follow the EPA’s lead, so EEcoMark’s reach would have been worldwide. Board members proved that, if the EPA were to adopt EEcoMark, only systems containing Intel microprocessors would ever earn EnergyStar stickers. It was the only time BAPCo had ever stood up to Intel and said ‘no.’ In the end, to save face after even their most loyal OEM allies turned against them, Intel was forced to vote ‘no’ on their own benchmark!"

EEcoMark was also one of many reasons why Apple left the BAPCo organization. Thus, when we take a look at business magazines and see that BAPCo benchmarks have been used, with authors using those benchmark scores to decide whether you should purchase brand A or B…

You can easily see why such a benchmark would be flawed. Yet today, billions of dollars of revenue are made based upon test results provided by MobileMark, EEcoMark and SYSmark. SYSmark 2012 is the prime example of how an allegedly independent benchmark is deliberately manipulated in order for one side to win. In creating the benchmark, BAPCo used older versions of 3rd party software which did not support hardware acceleration (otherwise, for example, an AMD Bobcat, Fusion E-Series APU would rival Intel's Core i7 "Sandy Bridge" processor). The interesting bit is that in the majority of cases, the software revision used was the last one before an accelerated version was made available.

Here at BSN*, you won't ever see us using BAPCo's products. We know how BAPCo operates and, in our opinion, given the amazing performance increases we have seen from Intel over the years, these tactics really should not be needed. Yet, what is done, is done.

Benchmarks - Real World - 0:1

The Curious Case of Futuremark Corporation; One Benchmark to Rule Them All?
Futuremark Corporation is a company with an interesting past, since it was started by several members of Future Crew, the legendary demo group from the 1990s. The company split into multiple entities over the course of the past decade, but their products - 3DMark, PCMark and PowerMark are an interesting example of synthetic benchmarks mimicking real life.

For example, 3DMark is a synthetic 3D engine in which leadership swings from one generation of hardware to another, but the company has not supported Intel's graphics hardware since 2006 and the arrival of DirectX 10. 3DMark Vantage was a native DirectX 10 benchmark, while 3DMark11 is a native DirectX 11 benchmark. Up until Ivy Bridge, which Intel is releasing soon as the "3rd Generation Core" processor, Intel did not have a graphics part that fully supported Microsoft's DirectX 11 API (Application Programming Interface), and as such it was not available for a direct comparison against contemporary graphics architectures from AMD and NVIDIA.

Even though the 3D engine was originally "synthetic", Futuremark released Shattered Horizon, a fully-featured 3D game which utilized the 3DMark Vantage engine. As such, the game only works on Windows Vista and Windows 7, something that publishers did not particularly like. At the end of the day, the only viable competitor for 3DMark is Unigine's Heaven benchmark, which originally had ties to AMD. However, today the Unigine team is pretty much on its own, since Heaven is a difficult benchmark for both AMD and NVIDIA, with the two trading victories blow by blow across segments (low end, mainstream, high end, etc.).

Moving on from the 3D world there is a benchmark, PCMark, that combines elements from 3DMark with a large number of system-stress benchmarks which are based upon Windows behavior (Windows startup pattern for the hard drive, for example), confidential file encryption and decryption, video encoding and decoding and so on, and so forth. However, going head to head against SYSmark proved to be a difficult task, unlike 3DMark which practically has no competition in the synthetic 3D benchmark space.

There is also a third benchmark which is a recent newcomer to the field, PowerMark. PowerMark goes head to head against MobileMark, and it will be interesting to see whether it can gain a foothold in the market. We have benchmarked an Ultrabook from Acer using both PowerMark and MobileMark and decided that, going forward, we will use PowerMark as the base for our productivity/battery suite for mobile computers.

Their fourth benchmark is Peacekeeper, a browser-based benchmark with a somewhat ironic name, since it caused a lot of PowerPoint Wars between practically every company that has a product that connects to the Internet. Even though Peacekeeper has its shortcomings, it is an overall good benchmark used for all devices in the field (smartphone, tablet, notebook, desktop, workstation etc.).

This year, Futuremark will complicate the matter further, since the company plans on releasing 3DMark for Apple iOS, Google Android, Microsoft Windows 8 and Windows RT, creating a unique benchmark that works on all platforms. This benchmark will mean that for the first time, you will be able to compare a smartphone to a desktop PC in terms of high performance. Futuremark plans to do the same thing with PCMark and PowerMark, so expect a lot of fireworks.

In terms of our Benchmark vs. Real World score, Futuremark brings the score roughly back to 1:1. Onto the real world...


© 2009 - 2013 Bright Side Of News*, All rights reserved.
