Editor’s note: This is the third part of our Batterygate Analysis; we recommend that you also read the first two parts:
Part I: The nits picking begins 
Part II: Lies, Damned Lies and Statistics

Analogies are like a box of chocolates – a mismatched array of ideas someone has tried to package as a whole by sugar-coating them with the same truth, which usually turns out a bit too dark and bitter to be palatable, and usually better suited to the taste of the one giving it than the one receiving it.

If you’ve been reading about the computing world long enough, chances are you’ve been run over by a car analogy a few hundred times. Many people hate this tendency to constantly draw a parallel between PCs and automobiles, and there are several theories about why it is so common. Some say it’s "a guy thing": since computers and cars are traditionally seen as male-oriented pastimes, the association is inevitable. Yet few PC geeks are also motorheads. Perhaps it’s the order-of-magnitude difference in cost between the two, but cheap [and usually barely functional] cars are everywhere, much like computers in the average Linux user’s basement apartment.

The reality of the situation is that if you have to pick two expensive and overly complex machines that most people hate yet regard as a necessary evil and use daily with barely any understanding of their function, you’re probably only going to come up with a car and a computer. Virtually no other objects in everyday use for the majority of people combine the same sense of awe, power, fear, frustration, and dependence in modern life.

And so, the inevitable car analogy barrels down on us like a runaway express train down the deck of a four-stack steamship sailing towards an iceberg on a doomsday asteroid rendered with 64-bit VLIW processors for the crappier of the two disaster movies on the exact same theme that seem to appear like clockwork every few summers.

Analogy Ho!

In the automobile industry, energy efficiency is not given as a simple abstract number based on flawed tests under conditions that are impossible for consumers to duplicate in the real world.

They use two numbers.

Ok, let’s be fair. The city/highway MPG numbers aren’t so bad really. Well, not for the last year or two. Gone are a number of tricks like disabling features [no air conditioner], unrealistic environmental controls [no wind, regulated temperature], special hand-built ringer models for testing, and using pure fuel without additives that consumers can’t buy [most gas in the US is E10, which means it has ethanol added], all of which were responsible for a pretty serious discrepancy between sticker and real-world values in the past. After a bit of outrage when Consumer Reports found over 90% of all cars underperformed their ratings, the tests were revised a bit. So yeah, they’re better now. All it took was about four years of public shaming of an entire industry by consumer advocate groups doing rigorous real-world testing.

Where were we? Oh yes, the analogy: laptop battery tests that disable features [like wireless], use controlled conditions [current rate, temperature], cherry-picked review units, and unrealistic benchmark simulations. Sound familiar?

A New Label

So, given the strength of the analogy, it’s probably inevitable that we’ll soon be seeing laptops with some kind of energy label that attempts to give buyers useful information about battery performance. Since the most glaring difference between the two situations is single versus multiple numbers, we can probably expect to see at least two numbers on the new rating system.
But what should those numbers try to tell us? Ideally, we want numbers that reflect something like a real-world scenario – kind of like the idea behind the city and highway MPG ratings, in that people can estimate performance based on their pattern of use. But to do that, we need to define exactly what our equivalents of "city" and "highway" are going to be. For that to be effective, we need a reasonable compromise of test conditions, since it’s not possible to test every particular scenario.

Since computing is so ubiquitous these days, the types of jobs that make use of it, and thus the types of work done on computers, vary massively. Condensing this to a set of conditions representing the average is trickier than it looks. Even if we were to poll thousands of people, the fact that a majority do task A and a majority do task B does not mean a majority do A+B.

For example, if one poll shows that a majority use a word processor, and another shows a majority use streaming audio, that doesn’t mean a majority of users listen to audio while typing. It could be that only a few do, and the two majorities share only a small overlap. We still have to consider whether it’s reasonable for certain applications to run simultaneously. Audio and browsing? Possibly. Photo editing and page layout? Probably. Spreadsheets and 3D modeling? Not so much.
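The overlap arithmetic is worth spelling out: by inclusion-exclusion, if a fraction of users do task A and another fraction do task B, the share doing both can be anywhere between max(0, A + B − 1) and min(A, B). A quick sketch (the poll percentages below are invented purely for illustration):

```python
# Bounds on the fraction of users doing BOTH tasks, given the fraction
# doing each task individually (simple inclusion-exclusion arithmetic).
def overlap_bounds(p_a: float, p_b: float) -> tuple:
    """Return (minimum, maximum) possible fraction doing both A and B."""
    lo = max(0.0, p_a + p_b - 1.0)  # forced overlap when groups can't be disjoint
    hi = min(p_a, p_b)              # one group entirely nested in the other
    return lo, hi

# Hypothetical poll results: 60% use a word processor, 55% stream audio.
lo, hi = overlap_bounds(0.60, 0.55)
print(f"Users doing both: anywhere from {lo:.0%} to {hi:.0%}")
```

So two comfortable majorities are compatible with as few as 15% of users actually combining the tasks, which is why "majority does A, majority does B" never justifies testing A+B by itself.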

The fact of the matter is we’re just not going to please all of the people all of the time. Not even close. So it’s probably going to take a few rounds of trial and error and a few revisions to get it right. There are going to be lots of proposals in this area in the near future. Here’s ours.

kWh Rating [gas tank size]
First off, there should be at least some indication of the pure kilowatt-hours of the battery itself. In the last part we showed how this number often can’t be trusted, so it might seem like a bit of an odd thing to want. But there’s one good reason to have the actual kWh value of the battery shown where people can see it – manufacturers offer different batteries on the same laptop.

Just like a car can have a different sized gas tank as an option, batteries come in different sizes. And even though tank size is an irrelevant comparison from car to car [other than cost per fill-up, which is analogous to recharge time], it’s one consumers often care about, since as an option it gives a bit of wiggle room to trade off weight and run time.

If there are going to be tests of laptops, it’s important to know which battery was installed. Otherwise, a really quick way to render the whole idea moot is for manufacturers to always send out review machines with the largest optional battery installed to stack the battery life test. Seeing the same laptop make and model with different battery life ratings would be very confusing, so promoting the total kWh of the battery as a visible stat is pretty important, and it makes comparisons of the same laptop with different battery options easier.
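The arithmetic behind "gas tank size" is trivial but worth making explicit: estimated runtime is just capacity divided by average draw, so the same laptop with a larger optional battery scores proportionally longer. A back-of-the-envelope sketch (all capacities and draw figures below are hypothetical, not measurements of any real machine):

```python
# Estimated runtime (hours) = battery capacity (watt-hours) / average draw (watts).
# All numbers here are invented for illustration only.
def estimated_runtime_h(capacity_wh: float, avg_draw_w: float) -> float:
    return capacity_wh / avg_draw_w

standard_wh = 48.0   # hypothetical standard battery option
extended_wh = 96.0   # hypothetical extended battery option
work_draw_w = 16.0   # hypothetical average draw under a "Work Mode" load

for name, wh in [("standard", standard_wh), ("extended", extended_wh)]:
    print(f"{name} battery: {estimated_runtime_h(wh, work_draw_w):.1f} h")
```

This is exactly why the capacity has to be printed next to any runtime score: the extended pack doubles the headline number without the laptop itself getting one bit more efficient.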

Work Mode [City MPG]
This mode is designed to simulate average working conditions: a reasonably well-lit indoor room or shaded outdoor area. Active system use [suspend for breaks longer than 5 min] and high-speed wireless connectivity such as 3G/4G or WiFi [with encryption, please] are assumed. Ambient temperatures comfortable for hours of work [20-30C/68-86F] and a cloth resting surface should be used [do we need a spec for calibrated test pants?]. An external mouse is likely as a peripheral, but an optical drive should not be assumed or required here.

  • Screen 100% bright – Let’s get this out of the way. If you’re reading a lot of text, you’re not going to be messing around with turning the screen down so you can squint at it. If anything, performance is far more likely to be throttled back before screen brightness, and this isn’t going to change until screens can exceed 1000 nits, well into full-sunlight-visible territory. While it’s tempting to say consumer education programs to "just turn it down one setting" might be effective if pursued [and perhaps they should be], the possibility of manufacturers cheating with a non-linear brightness scale that creeps upward is too likely, and the test is designed to err on the high side of power draw where applicable.
  • Browser – Multiple tabs, at least one multimedia heavy (YouTube, Hulu, a Flash site). At least one with periodic autorefresh (a forum, or an intranet status page). The browser is pretty much the most universal application type in use today. As for which browser, I’m sure there will be a lively debate.
  • Streaming audio – Doesn’t sound like a "productivity" thing, but let’s face it: a lot of people prefer to zone out to their own music and find it helps them work and avoid distraction. This may not be so common in a cubicle farm, but if we’re discussing laptop use, then it’s more likely work is being done in a more private environment. Yes, many people have portable music players, but generally if there’s network connectivity, people want to hear new stuff, and that implies streaming. Most portables aren’t optimal for that, and many are currently crippled in this regard due to greed on the part of cellular network providers, so streaming audio on a laptop is a high-probability concurrent task.
  • Constant network traffic, 1 Mbps minimum – Is this realistic? Yes! Between audio, sites with heavy multimedia content, and the annoying tendency of every program under the sun to gorge itself on the network "checking for updates", network use is nearly constant on any heavily used system. Operating systems, browsers, email, social network clients, bloated camera and printer drivers, and game "launchers" are all guilty. Do we even need to mention the prevalence of bittorrent addiction? The network is in constant use, and with a laptop that means wireless in the vast majority of scenarios. Wireless is never off and almost never idle for most real-world users.
  • At least one heavy, network-dependent application – What does this mean? Most people who work on a computer have a "work app" that they focus around: some form of editor they produce their work in. It may be a word processor, photo editor, spreadsheet, HTML editor, video editor, or compiler IDE. Chances are very good this application is network-dependent; that is, most of the data files are loaded from and saved to some network somewhere. Maybe a LAN, maybe the Internet, maybe a VPN. The point is that generally large files are transferred with relatively low frequency, as opposed to a browser-like pattern of small files with high frequency. Which application to use will probably be a big source of contention in the upcoming debate on the battery testing issue.

Video Mode [Highway MPG]
This mode is designed to simulate the opposite extreme: relatively low power use in what are most likely single-task scenarios. The particular scenario in mind is one many laptop users are familiar with – entertainment during a long flight. This implies certain environmental conditions, such as disabled wireless features and a lower ambient light level that makes lower screen settings more comfortable. Also, the lack of potential AC sources means that conscious optimization for battery life by the user is more likely.

  • Screen 100 nits – Generally, laptops are more likely to be in a darker environment when used for entertainment such as video watching. For one thing, work is typically done during sunlit hours, while after-work relaxation often happens in a venue with lower brightness [like many restaurants or pubs]. And finally, the airline flight scenario mentioned above typically has dimmed lighting, and work is less likely in this environment because of the lack of network connectivity and the modern dependence on it for most productive tasks. But seriously, 100 nits is a reasonable value. Nobody uses 60 nits this side of Pluto.
  • Video playback – This is a lot more complex than it sounds. Do we mean playback of a commercial disc like a DVD or BRD? Or from a file? The resolution and bitrate can be factors in power consumption, as can features like GPU-accelerated decode. The disc case implies use of an optical drive, which many devices now lack, since once software is installed most data comes from a network or flash media. But in this scenario the network is not considered an option, and there is no mass-market distribution of video on a flash medium at this time.

To form a compromise in this regard, we propose a designation of the media used in the test. The video application can use DVD or BRD playback from an optical disc [either integrated or an external model], or local file playback of SD [720×480] or HD [1280×720p] h.264-encoded video, at the discretion of the manufacturer. A subscript logo of DVD/Blu-Ray/SD/HD is then used to indicate which media format was used for testing.

While it is unlikely that power demand for disc and file playback will be similar in even a majority of cases, the main focus is on the usage pattern of the majority of consumers. DVD remains far more common in use than BRD and as such, most people using an optical drive to watch media will likely use a DVD, but manufacturers of premium models would prefer to advertise BRD support.

Also, while many smaller devices like netbooks do not include optical drives, and one might expect a user to favor a video file, many of these products currently cannot decode heavily compressed video smoothly, and the practical alternative is an external optical drive and a DVD, since most users do not have the technical skill to circumvent this limitation. There is also the issue of smaller laptops not having screens with sufficient resolution to display HD, rendering playback of such a sample an unfair performance demand that would not be meaningful to most consumers.

Splitting the test into multiple numeric results would likely be more confusing than just an additional indicator of the standard used in this case, which also serves to inform the buyer what kind of medium the manufacturer considers the laptop most optimized to display. Manufacturer designation of preferred video medium also allows hardware with special features [e.g. GPU accelerated h.264 decode] to make use of them for optimal battery life.

To Game or Not to Game?

And now we come to an even stickier issue. Gaming.

Gaming is the bane of the notebook industry. Most applications these days use only a fraction of the available power of modern hardware. But games continue to be popular, and continue to demand ever higher performance for 3D graphics and physics simulation. System designers are faced with a dilemma: do they use high-performance, power-hungry parts that can drag down battery life when not fully used? Or do they risk lackluster reviews from gaming-centric reviewers influencing their sales? Usually, the solution is to have "gaming" laptops in the product lineup and point to them whenever the game performance of the mainstream models is criticized, taking advantage of the fact that most users who buy a machine for gaming rarely care about battery life, since most will want to be plugged in for maximum performance anyway.

But the question is, if our goal is to make a fair, realistic, and universal test for battery life, do we leave out an activity many users engage in? Or do we risk penalizing machines that were never designed to do this task? What about those which simply can’t, such as netbooks without support for the demanding 3D features modern games or benchmarks require?

Well, what good is an analogy if it doesn’t guide you when you’re lost? We already have a gas tank size, city MPG, and a highway MPG, so where does gaming fit? Is there an automotive analogy for an application that virtually all vehicles can be used for, but which is often ill advised or unpleasant for those not specifically built for it?

Sure there is! It’s called Off-Road!
Right now, the EPA does not have a separate indicator on the fuel efficiency label for off-road vehicles. This is due to a number of factors. The majority of car owners rarely drive off-road, and those that do rarely care about fuel economy in such a scenario – or if they do, they prefer to test it themselves or rely on word of mouth from those with a similar driving environment, since off-road environments vary so widely.

So, does that mean we can’t have a Gaming score in the indicator? Absolutely not. However, one suggestion is to make sure it’s clearly distinguished as an extra score. This also means it’s probably better to only score laptops designed for gaming on this sub-test.

How do we decide if a laptop is designed for gaming? Well, we could take the manufacturer’s word for it, but that’s what got us into this battery life mess in the first place! So instead we should use some kind of objective qualification. If you ask people who review laptops what usually indicates whether a laptop is designed for gaming, the knee-jerk answer is usually "Intel Graphics". This, however, is not fair to Intel, as there exist other integrated graphics chipsets upon which gaming is unbearable. Plus, chances are that even if an integrated chipset is suitable for modest gaming now, it won’t be for long, and it will only manage high quality settings on games several years old.

So the real answer to "To Game or Not to Game?" is itself a question: "Do you have discrete graphics?" This will likely draw ire from certain companies touting their integrated chipsets as "gaming worthy" [they’re not]. They should be ignored. Anyone actively looking for a gaming system is going to insist on a discrete part, and any system that has a discrete part will not show its true power use until that part is heavily loaded. So, in fact, skipping the gaming test on laptops that have discrete graphics but don’t claim to be gaming machines is also unfair, as it effectively means some very power-hungry hardware is never exercised [you can put your hand down, Photoshop CS4 guy, we know what laptop you’re buying anyway no matter what the benchmarks say].

So what’s in the Gaming test? This is another can of worms. But to be honest, say what you want about 3DMark: if it has done one thing, it’s created a good stress test for systems. Hopefully issues such as the lack of network traffic or enemy AI will be solved to allow it to become an even more accurate indicator of real-world performance, but for battery testing the impact of missing network traffic is likely negligible [wireless should still be enabled]. So…

Gaming Mode [Off-Road]
Simulates gaming. Really, what more is there to say? The fact that it’s a laptop and we’re testing battery life and not performance negates most of the traditional reasons to fight over which benchmark to use.

  • Screen 100% – Do we even need to say this?
  • Wireless enabled – If you’re gaming on a laptop, you’re likely not that performance crazy that you care about wireless vs. wired latency. If you were you’d rent a forklift to haul your vapor phase rig down to the LAN instead of buying a laptop.
  • 3DMark – Latest version that runs on the hardware. Or if another benchmark becomes dominant, go with the most universally popular one. For battery testing, performance results are not important, and the test should just loop until the battery is exhausted.
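Pulling the whole proposal together, the label reduces to a small record: capacity, the two core mode scores, a media designation for the video test, and an optional gaming score that applies only to machines with discrete graphics. A minimal sketch of that structure (the field names and every value below are our own invention, not part of any actual standard):

```python
from dataclasses import dataclass
from typing import Optional

# A sketch of the proposed battery label as a data record.
# Field names and values are hypothetical, not any real standard.
@dataclass
class BatteryLabel:
    capacity_kwh: float    # "gas tank size"
    work_mode_h: float     # "city MPG" analogue
    video_mode_h: float    # "highway MPG" analogue
    video_medium: str      # one of "DVD", "BRD", "SD", "HD" (subscript logo)
    gaming_mode_h: Optional[float] = None  # "off-road"; discrete graphics only

    def summary(self) -> str:
        line = (f"{self.capacity_kwh:.3f} kWh | work {self.work_mode_h:.1f} h | "
                f"video {self.video_mode_h:.1f} h [{self.video_medium}]")
        if self.gaming_mode_h is not None:
            line += f" | gaming {self.gaming_mode_h:.1f} h"
        return line

# Hypothetical gaming laptop with the larger optional battery fitted.
label = BatteryLabel(0.096, 5.5, 8.0, "HD", gaming_mode_h=1.7)
print(label.summary())
```

Note the asymmetry: the gaming field is optional, mirroring the argument above that the off-road score should simply be absent, rather than zero, on machines without discrete graphics.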

Will There Ever Be Standards?

Probably. The way it works is that people decide they need a "fair" or "objective" way to compare things. Various benchmarks are created, and their fans fight it out. Eventually, reviewers who actually have to spend hours running these things get tired of pleasing different groups and force a de facto standard. With only one benchmark to worry about, hardware companies begin probing for ways to "optimize" [cheat]. Benchmark software developers compete to flatter OEMs in an attempt to maintain dominance or displace the standard. OEMs pressure reviewers to use their "approved" versions of benchmarks. Scores drift away from the real world, and people start complaining about it. Reviewers shrug and ask what they’re expected to do. People get riled up and decide they need a "fair" or "objective" way to compare things…

Once consumers take matters into their own hands and this industry comes of age, it’s merely a matter of time until we find a standardized method of battery life testing that satisfies the market.