Apple has chosen the Intel Xeon 5500 series (8 cores across two quad core CPUs) and 3500 series (a single quad core) CPUs, code named "Nehalem", for its newest generation Mac Pro. These CPUs use a revolutionary new architecture that paves the way for processors of the future and gives the early 2009 Mac Pros a hefty computational punch.
Nehalem die. Credit: Chip Architect
The Nehalem series, based on a 45 nanometer (nm) process, has several notable technical improvements that will be of interest to Mac Pro users, namely:
- Turbo Boost Technology. During heavy computational loads, the CPU can boost its speed in increments of 133 MHz until the thermal threshold is reached. Individual cores can be boosted, or the entire suite of cores can be accelerated at once. This is no substitute for the developer exploiting all the cores fully, but it is a useful stopgap. Apple says that a 2.93 GHz system can boost to 3.33 GHz (3 increments).
- Simultaneous Multithreading (SMT). Also called Hyper-Threading, this feature allows each core to manage two virtual threads. Intel first experimented with this technology in the Pentium 4. So, under the best of circumstances, an eight core Mac Pro could be running 16 simultaneous threads. Developers are going to need all the help they can get from Apple's Grand Central technology to fully exploit a system like this.
- QuickPath Technology and Integrated Memory Controller. QuickPath is a high-speed, point-to-point interconnect that does away with the legacy Front Side Bus (FSB). Previous experience has shown that as the number of cores increases and they compete for bandwidth on the FSB, what's called "bus contention" occurs, slowing down the system. Intel's Nehalem architecture does away with that bus contention and can triple the memory bandwidth compared to the older Intel Xeon 5400.
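The Turbo Boost and SMT figures above come down to simple arithmetic. Here is a minimal Python sketch of it. The function and constant names are made up for illustration, and the 133 MHz step and two threads per core are taken from the description above rather than from any Intel API.

```python
# Illustrative arithmetic only; all names here are hypothetical.
TURBO_STEP_MHZ = 133     # boost increment cited in the article
THREADS_PER_CORE = 2     # SMT: two virtual threads per core


def boosted_clock_mhz(base_mhz: int, increments: int) -> int:
    """Clock speed in MHz after a number of Turbo Boost increments."""
    return base_mhz + increments * TURBO_STEP_MHZ


def hardware_threads(cores: int) -> int:
    """Number of simultaneous threads with SMT enabled."""
    return cores * THREADS_PER_CORE


if __name__ == "__main__":
    boosted = boosted_clock_mhz(2930, 3)
    print(f"2.93 GHz plus 3 increments: {boosted / 1000:.2f} GHz")
    print(f"8 cores with SMT: {hardware_threads(8)} threads")
```

Run as a script, this reports 3.33 GHz and 16 threads, matching the figures quoted above.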
Some Technical Specifications
Here are some additional technical specifications of interest:
- Quad core Nehalem has a total of 731 million transistors
- Average power consumption for the X55xx series is 95 watts. However, at idle, the Nehalem uses about half the power of its predecessor, Harpertown, and about 30 percent less power than Harpertown for the same overall performance.
- Up to three memory channels. That's why the Mac Pro base system ships in increments of 3 slots x 1 GB. However, the trade-off between the slight speed gain of a three channel 3 GB configuration and the extra capacity of, say, a 4 GB configuration hasn't yet been quantified.
- 32 KB L1 instruction cache and 32 KB L1 data cache per core.
- 256 KB L2 cache per core.
- 8 MB L3 cache shared by all cores on a processor.
- SSE4 (4.2) SIMD (Single Instruction, Multiple Data) Vector Processor, Intel's equivalent of the old AltiVec/Velocity Engine in the IBM and Freescale CPUs. Put simply, this is a suite of instructions that allows a single instruction to operate on an array of data.
- Estimated speed of the dual processor (8 core) Mac Pro: 90 gigaflops. That's about as fast as the fastest Cray supercomputer from 1993.
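For a rough sense of where an estimate near 90 gigaflops could come from, theoretical peak throughput is cores x clock x floating point operations per cycle. The sketch below assumes 4 double precision operations per core per cycle (one 2-wide SSE add plus one 2-wide SSE multiply) and the 2.93 GHz clock mentioned earlier. Both are assumptions made for illustration, not necessarily the method behind the figure above.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in gigaflops: cores x GHz x FLOPs per cycle."""
    return cores * clock_ghz * flops_per_cycle


if __name__ == "__main__":
    # Assumed inputs: 8 cores, 2.93 GHz, 4 double precision FLOPs per cycle.
    estimate = peak_gflops(cores=8, clock_ghz=2.93, flops_per_cycle=4)
    print(f"Theoretical peak: {estimate:.1f} gigaflops")
```

Under those assumptions the result is about 94 gigaflops, in the same range as the 90 gigaflops estimate. Real code rarely reaches the theoretical peak.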
Benchmarks, Short Version
Benchmarks designed for typical Mac users are easy to find. In tests built around Photoshop and Cinebench, improvements over the previous generation Mac Pro ranged from 12 percent for Cinebench to over 2x for a Photoshop CS4 test of 25 actions on a 500 MB image. Apple's own tests suggest about twice the speed of the previous generation Mac Pro in specific tests.
A while back, Macintouch compared the previous generation Mac Pros to the older G5 Power Macs using a wide variety of tests. Taken together, these tests provide an overall feel for the span from the dual processor 2.0 GHz Power Mac G5, late 2003, to the current Mac Pros.
Benchmarks test a wide range of a system's capabilities: the memory bandwidth, I/O bandwidth, scalar processor performance, vector processor performance, utilization of threads, and so on. As a result, it's not easy to compare systems directly, and one has to dig to find benchmarks that put all systems on common ground. That's a subject for a future article.
One of my own favorite tests is the single threaded, minimal I/O, floating point computational punch when computing 100 million sines. Here's the data I've accumulated over the years, using a common Perl script and run with processor speed on high and power connected. It's presented for informational purposes only.
- PowerBook G4, Titanium, 500 MHz ... 171 sec
- PowerBook G4, 17-inch, 1.5 GHz ... 88 sec
- PowerMac G5, dual processor, 2.0 GHz ... 40 sec
- MacBook Pro, unibody, 2.4 GHz ... 20 sec
- Mac Pro, early 2009, quad, 2.66 GHz ... 15 sec
As I mentioned, this test is highly focused on a single threaded, scalar, floating point calculation that ignores many (most) components of overall system design. However, the test does put the simple number crunching capability of all the Macs I've tested in perspective.
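The original Perl script isn't reproduced here, but a test of this kind can be sketched in a few lines. The Python version below is a stand-in under stated assumptions: the function names are hypothetical, the sine arguments (0, 1, 2, ...) are a guess at what such a script might use, and the demo count is scaled down from 100 million so it finishes quickly. It also computes relative speedups from the timings listed above.

```python
import math
import time

# Timings in seconds, copied from the list above.
ARTICLE_TIMINGS_SEC = {
    "PowerBook G4, Titanium, 500 MHz": 171,
    "PowerBook G4, 17-inch, 1.5 GHz": 88,
    "PowerMac G5, dual processor, 2.0 GHz": 40,
    "MacBook Pro, unibody, 2.4 GHz": 20,
    "Mac Pro, early 2009, quad, 2.66 GHz": 15,
}


def time_sines(count: int) -> tuple[float, float]:
    """Compute `count` sines in a single thread.

    Returns (elapsed seconds, sum of the sines). The sum is returned
    so the loop's work is actually used.
    """
    total = 0.0
    start = time.perf_counter()
    for i in range(count):
        total += math.sin(i)
    elapsed = time.perf_counter() - start
    return elapsed, total


def speedups(timings: dict[str, float]) -> dict[str, float]:
    """Speed of each machine relative to the slowest one in `timings`."""
    slowest = max(timings.values())
    return {name: slowest / seconds for name, seconds in timings.items()}


if __name__ == "__main__":
    # Assumption: 1 million iterations for a quick demo; the article used 100 million.
    elapsed, _ = time_sines(1_000_000)
    print(f"1,000,000 sines: {elapsed:.2f} sec")
    for name, factor in speedups(ARTICLE_TIMINGS_SEC).items():
        print(f"{name}: {factor:.1f}x")
```

By the listed timings, the early 2009 Mac Pro comes out about 11.4 times faster than the 500 MHz PowerBook G4. Timings from this Python sketch are not comparable with the Perl figures above, since interpreter overhead differs.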
Mac Pro, Early 2009
The Nehalem technology is a significant step forward for Intel and Apple. While there will always be future technological developments, such as the next generation "Westmere" with a 32 nm process, Apple customers who've been looking to replace aging Power Mac G5s and even first generation Mac Pros stand to benefit substantially from these early 2009 Mac Pros.