Recently, there's been a bit of a stir in the online tech community over the battery life numbers being reported for laptops, notebooks, and netbooks. While this isn't the first time such numbers have been questioned, the issue has suddenly resonated with many consumers, and now everyone seems to be up in arms about the lack of realistic battery benchmarks.

Much of this fuss stems from a handful of blog posts by people over at AMD and the Twitter community surrounding them, but that's merely where the spark originated. The current disdain for outrageous battery life claims is certainly not confined to AMD supporters, or even to the PC community. In fact, the trigger seems to have been an Apple benchmark putting the battery life of the latest MacBook at over 7 hours. Critics quickly pointed out that a number of the benchmark's conditions did not seem realistic, such as disabling wireless and setting the screen to 60 nits of brightness. AnandTech ran a real-world test of Apple's latest MacBook batch and saw battery results of well over five hours. That is brilliant in the real world, but not in the paper world, where manufacturers promise figures like that and then fall short by an hour or two.

AMD's Patrick Moorhead decided to see whether such claims were realistic, did some real-world testing of devices on his own, and posted the results on his blog along with a critique of MobileMark 2007. Moorhead wasn't even the first voice raised at AMD; others such as Nigel Dessau and Hal Speed had been talking about it as well. But what really resonated with people was the confusion over the terminology, and the hands-on testing that produced real-world numbers nowhere near the benchmark figures.

But just as the outrage over ludicrously inflated battery life claims is not confined to AMD, the practice of unrealistic battery benchmarking does not originate with Apple. The problem is pervasive throughout the industry, with many manufacturers claiming battery life hours longer than what consumers report seeing in real life. So the focus has turned to the BAPCo benchmark, MobileMark 2007, which most notebook manufacturers use as the standard for determining battery life. It's MobileMark 2007 that actually sets the standard at 60 nits, not Apple. It also tests a particularly narrow set of applications, and not continuously but intermittently, according to BAPCo's own whitepaper [BAPCo MobileMark 2007 Whitepaper PDF download].

What Is a Nit and Why Be Picky About It?

One of the key issues is the lack of understanding of terminology such as nits. A quick Wikipedia search will give anyone the definition, "candela per square metre", which is really only helpful for those who do things by candlelight in countries using the metric system. Unfortunately, those conditions are not typical in many notebook-purchasing markets. For everyone else, a comparison to real-world situations is more helpful. For example, an office table in a room lit by artificial light is about 60 nits, and a sheet of white paper on it about 80. Road signs need about 30 to 150 nits to be visible at night, and about 1,000 if the sign is in direct sunlight.

But we still have a bit of a problem, because a nit, or cd/m² as many monitor manufacturers write it, is an absolute measure of brightness. Unfortunately, most people have eyes that dynamically adjust to their environment. A 300-nit screen may be blindingly bright in a pitch-black room, but completely unreadable in direct sunlight. So when MobileMark 2007 sets a value of 60 nits for benchmarking, that's useful in that the devices tested will use about the same amount of energy to light their screens. But it's completely useless to a consumer who does not use their laptop in the dark most of the time.
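To see why that matters to a real buyer, consider where 60 nits lands on a typical brightness slider. The panel range and the linear-backlight assumption in this sketch are ours, purely for illustration; real panels vary widely:

```python
# Where a 60-nit test setting lands on a brightness slider, assuming a
# hypothetical panel and a linear backlight. Both endpoints are assumptions.
PANEL_MIN_NITS = 20.0    # assumed luminance at the 0% slider position
PANEL_MAX_NITS = 250.0   # assumed luminance at the 100% slider position

def slider_percent(target_nits: float) -> float:
    """Slider position needed to hit target_nits on the assumed panel."""
    span = PANEL_MAX_NITS - PANEL_MIN_NITS
    return 100 * (target_nits - PANEL_MIN_NITS) / span

print(f"{slider_percent(60):.0f}%")  # ~17%, far dimmer than most people run
```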

For those who are curious about the number of nits flying around their environment but don't want to buy an expensive meter, Kodak has a guide to getting a crude estimate using virtually any digital camera's built-in exposure meter. The only other thing you'll need is a calibrated object to point the camera at, such as an 18% Kodak Gray Card, which typically sells for under $20.
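For the mathematically inclined, the trick reduces to the standard reflected-light meter equation, L = K·N²/(t·S), where N is the f-number, t the shutter time in seconds, S the ISO speed, and K the meter calibration constant (about 12.5 for most cameras). A minimal sketch, with example settings chosen purely for illustration:

```python
# Estimate luminance in nits (cd/m^2) from the exposure a camera meters off
# an 18% gray card, via the reflected-light equation L = K * N^2 / (t * S).
def estimate_nits(f_number: float, shutter_s: float, iso: float,
                  k: float = 12.5) -> float:
    """Approximate luminance of the metered gray card in cd/m^2."""
    return k * f_number ** 2 / (shutter_s * iso)

# Example: the camera meters the card at f/4, 1/30 s, ISO 100
print(round(estimate_nits(4.0, 1 / 30, 100)))  # ~60 nits
```

Fill the frame with the gray card, let the camera meter it, read off the aperture, shutter speed, and ISO it chose, and plug them in.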

Several online polls have been run in response to concerns that laptops are not tested under conditions similar to how people actually use them. Most of the polls so far show that the majority of users have their screen brightness set at 80-100% virtually all the time. The reason is obvious: screens don't become uncomfortably bright in most situations, so people turn the brightness up as high as it goes to improve readability. The main exception is when they are desperately trying to save power, in which case they usually set it to minimum. Settings in the middle are rarely used, because people are rarely in a situation where they will trade comfort for run time. Those who are will often just buy a larger or second battery.

How to Win Benchmarks and Influence People

As long as people make decisions based on quick and easy numbers, scores, and pro/con lists, there will be benchmarks. And as long as there are benchmarks, there will be attempts to cheat on them, or at least bend the rules favorably. This has been true since time immemorial. People are busy; they don't want to read a thousand words, they want a number or a graphic that tells them how a product compares to what they have or to its competitors. Reviewers constantly struggle to appear relevant yet unbiased, but the reality is that no review is truly objective. Hence the popularity of benchmarks, which claim to be truly objective voices in the comparison of products.

Knowing the Enemy… MobileMark 2007

If we take a look at the benchmark in question, BAPCo's MobileMark 2007, there are some interesting things about it. For one, its application suite would likely raise eyebrows among even the most overworked employees. Two of the benchmark's modules are simply DVD playback using InterVideo WinDVD and reading files in Adobe Reader, but the one most commonly used in testing is the productivity section, which uses the following applications:

  • MS Project 2003
  • MS Excel 2003
  • MS Outlook 2003
  • MS PowerPoint 2003
  • MS Word 2003
  • WinZip Pro 10.1
  • Adobe Photoshop CS2
  • Adobe Illustrator CS2
  • Adobe Flash 8

Now, what we need to ask is whether this is a reasonable span of applications to simulate an average user's workload. Even if we ignore the nearly $3,000 price tag for that set of software, it seems pretty unlikely. For one thing, there's not a single web browser on the list, which is the primary application for many people. Second, it seems implausible that a worker who spends much of the day in Photoshop, Illustrator, and Flash would also be doing much with Excel, Project, and Word. Graphic design apps like Photoshop and Illustrator are often the primary concern for their users, but those users form a niche group whose tasks look nothing like those of people who live in spreadsheets like Excel or do heavy typing in Word. Absent from the list are any multimedia apps or games, which can be excused if not forgiven given the "productivity" label and the fact that DVD playback is a separate test. Still, their absence means the benchmark measures a specific type of task rather than simulating a real person's real-world use, as it claims to.

Also, none of the applications on this list are network-centric; they all tend to run primarily on local data. They don't fetch pages like a browser, and they don't stream heavily over a LAN connection like remote desktop. This is a significant issue, because it means a laptop trying to look good on the benchmark can effectively disable its network adapters during the test, or at least keep them in power-saving mode. This is a big factor, especially with high-speed wireless networks, which can consume a large amount of power. Most notebook users have wireless active virtually all the time, usually for browsing but also for things such as automatic updates, streaming audio or video, and network-based collaboration.
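To get a feel for how much that matters, here is a back-of-the-envelope sketch. Both wattage figures below are assumptions chosen for illustration, not measurements of any particular adapter:

```python
# Rough effect of leaving the wireless radio active during a battery test.
# All figures are illustrative assumptions, not measured numbers.
BATTERY_WH = 55.0   # assumed battery capacity
BASE_WATTS = 10.0   # assumed average system draw with the radio powered down
WIFI_WATTS = 1.5    # assumed extra draw with the radio active and busy

radio_off = BATTERY_WH / BASE_WATTS
radio_on  = BATTERY_WH / (BASE_WATTS + WIFI_WATTS)
print(f"{radio_off:.1f} h radio off vs {radio_on:.1f} h radio on")
# ~5.5 h vs ~4.8 h: about 40 minutes gained just by idling the adapter
```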

Finally, another major issue comes from BAPCo's own design of the benchmark, which according to their whitepaper incorporates significant amounts of "think time". This is time when the system is left idle, to simulate a user reading the screen, leaving their desk, or performing other tasks. While this sounds like a good idea at first, excessive idle time is not a good measure of laptop battery use, because during genuinely idle stretches most users would suspend or hibernate, or close the lid and turn off the backlight. That changes the power usage significantly, and most people do not count suspended time towards what they consider the "usable time" of a laptop. A machine idling for 4-5 hours does not have a perceived five-hour battery life; it has a perceived one-hour battery life, because there's no point in counting time during which the machine is not being used. It also turns out that this kind of sporadic usage pattern is an extremely favorable case for batteries, due to factors we'll discuss in Part 2.
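The arithmetic makes the incentive obvious. In this sketch every number is a made-up assumption, but the shape of the result holds for any plausible values:

```python
# How idle "think time" stretches a benchmark score. All figures below are
# illustrative assumptions, not MobileMark parameters or measured data.
BATTERY_WH   = 55.0   # assumed battery capacity
ACTIVE_WATTS = 18.0   # assumed draw while actually running the workload
IDLE_WATTS   = 8.0    # assumed draw while sitting idle at the desktop

def runtime_hours(idle_fraction: float) -> float:
    """Battery life when idle_fraction of the run is idle, the rest active."""
    avg_watts = idle_fraction * IDLE_WATTS + (1 - idle_fraction) * ACTIVE_WATTS
    return BATTERY_WH / avg_watts

print(f"{runtime_hours(0.0):.1f} h at constant activity")  # ~3.1 h
print(f"{runtime_hours(0.7):.1f} h with 70% think time")   # ~5.0 h
```

Two extra hours on the spec sheet, without the battery, the silicon, or the workload changing at all.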