We are currently faced with a great many questions, and everyone from the scientific community to us regular people is looking for answers. They range from the human lifespan and how we can prevent diseases to our place in the universe and what it looks like… and in order to find the answers to these perplexing questions, we need A LOT of computing power.
Enter the realm of GPGPU, or GPU Computing. Our passion for playing in virtual worlds has led companies such as ATI and Nvidia to create graphics chips that can outperform general-purpose chips by orders of magnitude, with speedups of 10x, 100x and in some cases even 1500x. As a result, GPUs are becoming household names in the world of scientific computing. Naturally, with every competition comes the ultimate question: “Who has the best GPU Computing chip out there?” The answer to that question has sparked a lot of controversy, and there is no easy answer.
To give you at least part of the answer, we looked at what communities can do to accelerate distributed computing efforts. We interviewed Gipsel, a.k.a. Andreas Przystawik, from the Planet3DNow! community. This bright programmer caused quite a stir when he optimized the Milkyway@Home client for GPU Computing. Currently, only Milkyway@Home, Einstein@Home and SETI@Home are known to offer their source code, which was the basis for Gipsel’s GPGPU effort. If you know of other projects, let us know by email.
BSN*: Greetings, Mr. Przystawik. For our very first question, could you tell us about your background and programming skills before you optimized the Milkyway@Home client?
Gipsel: My very first contact with programming was some Basic on a KC85 at the age of 11, but I guess that doesn’t count. Actually, I am not a very experienced programmer. I learned some Pascal in computer science classes at school. Later, at university, I attended a one-semester lecture, “Algorithms and Data Structures”, at the Information Technology faculty. During that time I started doing some C programming in my spare time. No big projects, it was just for fun, mostly quite low-level stuff (involving assembler), as I was mainly interested in the hardware and its possibilities. But I didn’t have much time for it during my studies (Physics), so I didn’t do anything like that for about five years. And then came Milkyway@Home.
BSN*: What kind of hardware did you have available at the time when you optimized the client?
Gipsel: I started with my normal PC: an old AMD Athlon XP 1800+ on an AMD760-based motherboard with an Nvidia GeForce4 MX graphics card. Later, when I needed to test some SSE2/SSE3 binaries, I switched to a Phenom X4 9750 on a 780G chipset (using the integrated graphics, i.e. the Radeon 3200). I’m still working with this box, and all the GPU work was also done on it. Unfortunately, I don’t have a recent graphics card (I never needed one so far), so I can’t run and test the GPU applications myself.
BSN*: What performance gain was achieved after you completed the optimization work on the code?
Gipsel: It depends on what you are comparing. The original source code of Milkyway@Home was grossly inefficient; it simply wasted a lot of time. The first steps were not optimizations in the usual sense; the algorithm Milkyway@Home uses had to be cleaned up. In the meantime, most of my suggested improvements have been implemented in the sources maintained by Milkyway@Home, which brought the calculation time per workunit (WU) down massively.
With my CPU-optimized code, a 65nm Core 2 or a Phenom running at 3GHz takes just slightly over four minutes to crunch one of today’s short WUs. The stock applications distributed by the project are slower, taking roughly 10-18 minutes. In November 2008, the same WUs would have taken a full day on the same CPUs (MW uses longer WUs now). Taking my optimizations into account, Milkyway@Home has seen a speedup by a factor of 100 on the CPU alone.
But I think readers are mostly interested in the GPU application. An ATI Radeon HD4870 completes the same WUs in only nine seconds. Since a quad-core CPU processes four WUs at once, a 3GHz quad will complete about four WUs in four minutes, even with the fastest code. In the same time, a Radeon HD4870 will complete 25 WUs - six times the throughput for about the same price. Even a last-generation Radeon HD3800 will complete 8-10 WUs in four minutes, still more than double what a fast quad-core CPU can do. If you add up all the improvements, a single HD4870 is now doing more science than the whole project did a couple of months ago! Comparing the beginning of the project with today’s situation, you could claim a gain from “one WU a day” on a single Core 2 processor @3GHz to almost 10,000 WUs a day with an HD4870 [this is a living testament to what code optimization can achieve - imagine if every application had such a dedicated code optimizer - Ed.].
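To make the throughput arithmetic above concrete, here is a small back-of-the-envelope sketch in Python (chosen only for illustration; this is not Milkyway@Home code). It assumes each of the quad-core’s four cores takes exactly four minutes per WU and the HD4870 takes nine seconds per WU, with no scheduling or data-transfer overhead, so its results are idealized upper bounds. The rounder figures quoted above (25 WUs, six times, almost 10,000 a day) leave some room for such overhead. The names `wus_in_window`, `cpu_quad` and `hd4870` are hypothetical.

```python
# Idealized throughput comparison based on the per-WU times quoted in the interview.
# Assumption (hypothetical model): WUs run back-to-back with zero overhead.

SECONDS_PER_DAY = 24 * 60 * 60
WINDOW_S = 4 * 60  # the four-minute window used in the comparison


def wus_in_window(seconds_per_wu: float, concurrent: int = 1,
                  window_s: float = WINDOW_S) -> float:
    """WUs finished in window_s when `concurrent` WUs run side by side."""
    return concurrent * window_s / seconds_per_wu


# Quad-core CPU: four cores, each needing ~4 minutes per WU.
cpu_quad = wus_in_window(seconds_per_wu=4 * 60, concurrent=4)
# Radeon HD4870: one WU at a time, 9 seconds each.
hd4870 = wus_in_window(seconds_per_wu=9)

print(f"Quad-core CPU: {cpu_quad:.1f} WUs per 4 min")
print(f"HD4870:        {hd4870:.1f} WUs per 4 min ({hd4870 / cpu_quad:.1f}x)")
print(f"HD4870 per day: {SECONDS_PER_DAY / 9:.0f} WUs")
```

Running it prints 4.0 WUs for the quad-core against about 26.7 for the HD4870 (roughly 6.7x) and 9600 WUs per day, consistent with the interview’s “six times” and “almost 10,000 WUs a day” once some overhead is allowed for.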
© 2009 - 2013 Bright Side Of News*, All rights reserved.