I cannot believe...
...that such a big deal is being made over these particular benchmarks. After thinking about the nature of these benchmarks, I could have predicted the results. And the guys at Bare Feats should know this too: if they are going to be in the business of comparing performance, they should be aware of which aspect of the system is being tested by any particular benchmark.
First of all, a special note to all of you who seem to think that these benchmarks prove the DDR implementation is worthless: remember, independent of the DDR, the new machine does have a 25% faster system bus, so it can definitely talk to the RAM 25% faster than the old machine, and as the Bare Feats article mentioned, this didn't seem to make a difference either. Doesn't that clue you in that there is something about these benchmarks that makes them less than ideal for measuring effective RAM bandwidth?
Let's take the Altivec Carbon Fractal benchmark first, which yielded identical times across the board. I'm the author of a fractal program myself, and I know what's involved. This doesn't surprise me at all.
What a fractal program does, in simple terms, is walk across a set of points, an image if you will. For each pixel in the image, it repeats a simple set of arithmetic calculations over and over again; for some pixels this can be repeated hundreds of times. So, for instance, you might have something like:
newx = x*x - y*y + cx
newy = 2*x*y + cy
x = newx
y = newy
in a loop repeated many times for each point.
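To make that concrete, here is a minimal sketch in Python of the classic escape-time loop (my own illustration, not code from any of the benchmarked programs; the function and variable names are made up):

```python
def mandelbrot_iterations(cx, cy, max_iter=256):
    """Escape-time iteration count for one point of the Mandelbrot set."""
    x, y = 0.0, 0.0
    for i in range(max_iter):
        # The same two lines of arithmetic, repeated over and over.
        x, y = x*x - y*y + cx, 2*x*y + cy
        if x*x + y*y > 4.0:   # the point has escaped; stop iterating
            return i
    return max_iter           # the point is (probably) in the set

# Walk across a small grid of points, as a fractal renderer would:
image = [[mandelbrot_iterations(-2.0 + 3.0*px/40, -1.5 + 3.0*py/40)
          for px in range(40)] for py in range(40)]
```

Notice that the inner loop reads and writes nothing but a handful of local variables; the image itself is only touched once per pixel, to store the result.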
For something like this, 99.999% of the time is spent by the CPU doing calculations, not accessing memory; in fact, the code and a large part of the image will reside in the cache throughout the calculation. And I'm not even talking about the L3 cache: most of the time the CPU is accessing its registers and the on-chip cache!
Well, you can see why this would be a very good test of CPU power, and floating point speed in particular. You couldn't come up with something more CPU intensive, or less memory-bandwidth intensive. This kind of algorithm spends a minuscule fraction of its total time reading from or writing to RAM; if you made the memory bandwidth infinite, it would barely make a difference.
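The two workload shapes are easy to contrast side by side. This toy Python comparison is my own sketch (a pure-Python memory sweep mostly measures interpreter overhead rather than true RAM bandwidth, so treat it as an illustration of the shapes, not as a real bandwidth test):

```python
import array
import time

def crunch(n_iter, cx=-0.2, cy=0.1):
    # CPU-bound: the same two lines of fractal arithmetic, touching only
    # a couple of local variables (registers / on-chip cache, effectively).
    x, y = 0.0, 0.0
    for _ in range(n_iter):
        x, y = x*x - y*y + cx, 2*x*y + cy
    return x, y

def stream(buf):
    # Memory-bound: touch every element of a large buffer once, minimal math.
    total = 0.0
    for v in buf:
        total += v
    return total

# ~32 MB of doubles, far larger than any cache of the era.
big = array.array('d', [1.0]) * (4 * 1024 * 1024)

t0 = time.perf_counter(); crunch(1_000_000); t_cpu = time.perf_counter() - t0
t0 = time.perf_counter(); stream(big);       t_mem = time.perf_counter() - t0
print(f"arithmetic loop: {t_cpu:.2f}s, memory sweep: {t_mem:.2f}s")
```

Faster RAM helps the second function; only a faster CPU helps the first.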
The other benchmarks suffer from a similar criticism. These tasks are all quite CPU intensive, although not quite as single-mindedly as fractal calculation, and they spend a small amount of time moving data into the cache compared to the time they spend crunching the data once it is there. Apple likes to show off CPU-intensive software like Photoshop filters, at least when they have been Altivec-optimized, because it shows off the speed of the Altivec unit. But for that very reason, these tasks don't really test the performance of the rest of the system, JUST the CPU. The only thing that would speed them up is increasing the clock speed of the CPU.
So why bother with anything other than raising the CPU clock speed? Because we don't spend our entire lives running fractal programs and Photoshop filters. It is great to have these things sped up, because they are so time consuming, but most of the time we are doing other things, like web browsing, text editing, etc.
So where would we see the 25% system bus increase? Well, if you are a Photoshop user, maybe something as simple as scrolling through a very large image would seem faster. That is a task that requires a lot of data to be shifted through memory and relatively little CPU execution time. That's the sort of thing that might not seem much faster with a 4GHz CPU, but would seem faster with higher memory bandwidth.
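If you did want to measure that side of the system, you'd time bulk copies rather than arithmetic. A rough Python sketch of the idea (my own, hypothetical function name; bulk slice assignment runs as a C-level copy inside the interpreter, so it is dominated by memory traffic rather than interpreter overhead):

```python
import time

def copy_bandwidth_mb_s(mb=64, repeats=4):
    """Rough sustained copy bandwidth in MB/s via repeated bulk copies."""
    src = bytes(mb * 1024 * 1024)      # large source buffer, bigger than cache
    dst = bytearray(len(src))
    t0 = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src                   # bulk copy, effectively a memcpy
    elapsed = time.perf_counter() - t0
    return mb * repeats / elapsed

print(f"~{copy_bandwidth_mb_s():.0f} MB/s")
```

A test like this would actually show a difference between the bus speeds, where the fractal benchmark cannot.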
And where would we see the improvement from Apple's DDR implementation, since it doesn't increase bandwidth to the CPU? Well, since it does allow other parts of the system to do DMA without necessarily stealing bandwidth from the CPU (or from other parts of the system), maybe you would see it when, say, doing a big file transfer while simultaneously watching a full-screen movie without a glitch in the movie. (You can probably do that on the old dual 1GHz too, but if you keep adding real-time tasks you can bring any system to its knees; perhaps the DDR implementation in the new machines pushes that point further out.) So the DDR may enable the system to have more things happen reliably in real time when some of them don't require intense CPU intervention.
That seems like a worthwhile improvement to me, but how do you benchmark it? There probably is a way, maybe by running several benchmarks simultaneously, some of which exercise the I/O; but just running CPU-intensive benchmarks is not going to do it.
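The shape of such a combined test might look like this sketch in Python (my own hypothetical harness, not an existing benchmark): time a CPU-bound loop alone, then time it again while a second process hammers the disk, and see how much the contention costs.

```python
import multiprocessing as mp
import os
import tempfile
import time

def crunch(n_iter):
    # CPU-bound: fractal-style arithmetic on a point inside the set.
    x, y = 0.0, 0.0
    for _ in range(n_iter):
        x, y = x*x - y*y - 0.2, 2*x*y + 0.1
    return x

def hammer_io(path, mb, stop):
    # I/O-bound: repeatedly rewrite a large file until told to stop.
    block = bytes(1024 * 1024)
    while not stop.is_set():
        with open(path, "wb") as f:
            for _ in range(mb):
                f.write(block)

if __name__ == "__main__":
    n = 2_000_000
    t0 = time.perf_counter(); crunch(n); alone = time.perf_counter() - t0

    stop = mp.Event()
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        path = tmp.name
    p = mp.Process(target=hammer_io, args=(path, 64, stop))
    p.start()
    t0 = time.perf_counter(); crunch(n); contended = time.perf_counter() - t0
    stop.set(); p.join(); os.unlink(path)

    print(f"alone: {alone:.2f}s  with I/O load: {contended:.2f}s")
```

On a system where DMA doesn't steal bandwidth from the CPU, the two times should stay close; the more the I/O load slows the compute loop, the more the subsystems are fighting over memory.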
(I think I might send a copy of this to Bare Feats and see what they have to say).