Hidden Dimensions - Everyone Needs a Supercomputer, Part I

by John Martellaro
June 19th, 2006

"The definition of a civilized computing platform: the time required to learn how to use it is less than the time required to complete a major simulation on it."

- Bill Buzbee, Arctic Region Supercomputing Center

The Beginnings

It was the summer of 2003, and university staff members in the computer science department at Virginia Polytechnic Institute in Blacksburg, Virginia had just heard Steve Jobs announce the G5 at WWDC. The new Power Mac G5 had a rather masculine, sharp-edged enclosure, and it contained not one but two IBM PowerPC 970 processors running at 2 GHz. The Virginia Tech people weren't the only ones who went crazy hearing the G5 announcement, but they were the only ones who started thinking about building a thousand-node [1] supercomputer with the G5. That was because the PPC 970 announced that day had a very favorable metric: CPU performance per watt of heat output per dollar spent. It also had some very attractive architectural features.
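To put that figure of merit in concrete terms, here is a minimal sketch in Python of how one might rank candidate cluster nodes by performance per watt per dollar. The numbers are entirely made up for illustration; they are not Apple's, IBM's, or Virginia Tech's actual figures.

    # Hypothetical comparison of candidate cluster nodes on the figure of
    # merit described above: sustained GFLOPS per watt of heat per dollar.
    # All numbers are illustrative placeholders, not measured data.

    def perf_per_watt_per_dollar(gflops, watts, dollars):
        """Higher is better: GFLOPS delivered per watt per dollar spent."""
        return gflops / (watts * dollars)

    # (GFLOPS, watts, dollars) per dual-CPU node -- made-up values.
    candidates = {
        "hypothetical PPC 970 node": (8.0, 250.0, 3000.0),
        "hypothetical alternative node": (6.0, 400.0, 4000.0),
    }

    for name, (gflops, watts, dollars) in candidates.items():
        score = perf_per_watt_per_dollar(gflops, watts, dollars)
        print(f"{name}: {score:.2e} GFLOPS per watt per dollar")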

Of course, we all know how that project turned out. It was like a short-handed, winning goal in game seven of the Stanley Cup finals. Virginia Tech's supercomputer, "System X," was built in record time for a mere $5.2M, and while it had some minor initial problems, it has to be considered, in hindsight, a major success, a milestone in what is formally called High Performance Computing (HPC).

By the way, HPC is defined as the branch of computer science that concentrates on developing supercomputers and the software that runs on them. HPC typically works at the highest technical level of computing and tackles the most difficult computational problems known to man.

However, when Apple was first contacted by Virginia Tech about building a supercomputer utilizing Power Macs with this new IBM chip, there was some appropriate concern. Apple is, after all, a personal computer manufacturer. The prospect of a university connecting over a thousand Macs together with a rather new and exotic switching fabric called InfiniBand could be risky. A failure could damage Apple's image even if Apple did nothing but sell the computers and keep its distance. Apple wasn't sure if Virginia Tech could pull it off.

In time, Apple executives became convinced that the Virginia Tech people knew what they were doing and that the great PR for this new generation of IBM-powered Macs would outweigh the small risk of failure. In fact, VT had the technical talent to ensure success and had a very well-thought-out plan for construction, power management, cooling, emergency shutoff [2], and InfiniBand assistance from Mellanox.

System X was stood up (in HPC language) with the help of pizza labor: "You work all day stacking servers, we feed you pizza." This went a very long way towards keeping the cost down. By November 2003, Virginia Tech's "Terascale Computing Facility" was completed and System X was benchmarked at 10.3 teraflops. That made it the fastest university computer on the planet. The only supercomputer in the United States that was faster (using the commonly accepted but also controversial LINPACK benchmark) was at the Los Alamos National Laboratory in New Mexico. There was, to put it mildly, a considerable buzz at the annual SuperComputing 2003 Conference held in Phoenix that year, and HPCwire, an important publication in the HPC community, gave Apple several awards for its hardware technology at that event.

Apple was off and running in the world of High Performance Computing. There was a lot of glory and image associated with the VT project. To the mild irritation of many and to the great delight of a few in the HPC industry, the third fastest supercomputer on the planet was running on Macintoshes. The impact on HPC scientists at universities and in the federal government was enormous. Shortly thereafter, the U.S. Army purchased an even larger Apple supercomputer.

I must insert here a minor detail in the interest of technical accuracy. Supercomputing is typically done with rack-mounted units [3], and those units require, for the kinds of calculations they do, something called error correction code (ECC) memory. Virginia Tech switched from Power Macs to 1U Xserves with ECC memory shortly thereafter. Why they didn't just wait for the G5 Xserve to ship is something I can't get into.

So far, so good.

Backspace

To understand what happened next, we have to go back and look at Apple's culture and its development as a manufacturer of brilliantly designed personal computers based on the UNIX operating system. For example, there was a time in 1999-2001 when there was a fairly healthy market for Macintoshes as scientific workstations running UNIX, but that UNIX was not Mac OS X. Mac OS X wouldn't ship until March 2001. Rather, there were some Apple salesmen who were promoting Yellow Dog Linux (YDL) on G4 desktops. (YDL, at that time, was a port of Red Hat Linux for the PowerPC supplied by a great company in Colorado, Terra Soft Solutions.) Those salespeople were happy to sell G4 Macs, and the customers were happy to install YDL after the fact.

This early customer use of UNIX on the Mac, in my experience, was very successful because it appealed to scientists who took great pride in the quality of their work and their tools. UNIX, of any kind, running on a beautiful G4 tower (with a vector processor!) in 2000 was substantially cooler, more technically capable, and more productive than Windows 98. As a result, the technical and marketing seeds were sown for Apple's introduction of its own BSD UNIX OS. When that was combined with the fabulous PPC 970 chip in 2003, many scientists all over the world basically went into a severe state of euphoria. The Power Mac G5 started selling very well at major research institutions and in the federal government.

In concert with that, a considerable amount of effort was put into marketing to scientists and convincing Apple developers to bring their scientific software to Mac OS X. Apple marketing worked long hours, built a fabulous Web site, participated with a booth and presentations at the most important bioinformatics conferences, and started to put more and more energy into the premier event in HPC, the SuperComputing conference held every November in a different city.

The Market Reaction

Being a marketing- and image-driven company, Apple leveraged this initial success very strongly and, as a result, sent a message to customers that Apple was serious about HPC.

So it was not surprising that in 2003-2004, the HPC scientists I engaged with were starting to express an interest in Apple as a company that could not only supply inexpensive, capable supercomputers but also make HPC life easier for them.

The largest supercomputers are like dragons. Metaphorically speaking, they're cranky, breathe fire, belch, and need an army of servants to keep them happy, fed, and productive. To make matters worse, to ensure the consistency and migration of huge code bases, software is written in Fortran [4] and often edited with, can you believe, "vi". Command line FTP is common. That's all because the only universal UNIX interface that spans every platform is the shell. Government scientists at conferences I attended spoke of missing Christmas and their kids' birthdays due to the awkwardness of their tools and the demands and workloads of critical national defense programs.

This huge outpouring of good faith was something new for Apple. Typically, in the office environment, Apple has to play catch-up, focus on specific segments, or just grab the crumbs. But UNIX is the lingua franca of supercomputers, and so Apple fit right in with the HPC community and its customary vendors: Cray, IBM, and SGI. [5]

Moreover, there was the hope that if Apple could do for supercomputing what they did for ease of use on the Macintosh desktop, the lives of scientists would improve. I sometimes heard suggestions that some HPC vendors were complacent and new blood would be good for all concerned.

Another thing supercomputing facilities demand is careful, long-range planning. The men and women who build and operate supercomputers at universities and government facilities can spend their entire careers dedicated to a computational facility or project. Many have doctorates because the level of understanding of advanced scientific computation required to support Grand Challenge-class problems is enormous. So when they're thinking about building the next-generation system, not only do they need to know what's for sale, they need to know what's coming down the road to make sure their million-line programs can migrate and remain certified accurate [6]. There is a huge amount of planning, technical analysis, and perhaps simulation to be done before a supercomputer is built and its successor contemplated.

Apple, of course, traditionally keeps its computer roadmaps secret. We all know the obvious advantages of this from a marketing standpoint. But an additional advantage is that such secrecy and resistance to promised roadmaps give Apple great flexibility to change course. Specifically, if Apple had contractually promised government computing customers, say, for the sake of argument, dual- or quad-core G5s in the Xserve and a related five-year IBM roadmap, it would never have had the flexibility to convert its product line to Intel CPUs as quickly as it did.

The near-term result of the Intel switchover has been something of a lull in Apple's supercomputing efforts and the Xserve line. The G5 Xserve was unveiled by Apple in January 2004, and the only significant upgrade since has been an increase in clock speed from 2.0 to 2.3 GHz. Some scientists have wondered about this lull and what they can expect from Apple and HPC in the future.

Where might Apple be going next in supercomputing? And what are the opportunities and challenges? I'll continue that discussion in Part II next time.




[1] A node is a single rack-mounted unit. It may contain one or more CPUs.

[2] With 1100 Macs pulling an average of 2 amps each, a cooling failure would mean that the ambient temperature would rise from 65F to well over 100F in just a few seconds, creating a safety issue. This is typical of any supercomputing facility. In fact, Xserves run cooler and require less power than most other similar products.
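As a back-of-the-envelope illustration of the heat load behind that note, here is a minimal sketch in Python. The node count and per-node current come from the note above; the 120-volt line voltage and the assumption that essentially all of the electrical power ends up as heat in the room are my own simplifications.

    # Back-of-the-envelope heat load for the cluster in footnote [2].
    # 1100 nodes at ~2 amps each are from the article; the 120 V line
    # voltage is an assumed value, and all electrical power is treated
    # as ending up as heat in the machine room.

    nodes = 1100
    amps_per_node = 2.0      # from the article
    line_voltage = 120.0     # assumed; actual facility wiring may differ

    heat_load_kw = nodes * amps_per_node * line_voltage / 1000.0
    print(f"Approximate heat load: {heat_load_kw:.0f} kW")   # roughly 264 kW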

[3] Rack heights are measured in vertical 1.75-inch units, so a unit 1.75 inches high is "1U", one 3.5 inches high is "2U", etc.

[4] A lot of it is still Fortran 77 for the sake of compatibility and migration. Fortran and C are the primary languages for supercomputing problems; the investment must amount to hundreds of millions of lines of code.

[5] Sun Microsystems has faded in this market. SGI is in bankruptcy. Cray is not exactly swimming
in money, but things are improving.  IBM is doing well thanks to the Blue Gene system. 

[6] I'll say more about that challenge next time.