Faster Is Better
C is fast. Faster than most (all?) other popular programming languages. Fast enough that, earlier this year, Daniel Jalkut called it “the new assembly language” (courtesy Daring Fireball), by which he probably meant that when you need some part of your app to be especially fast (say, a critical graphics function), you write that part in C, but otherwise you use a scripting language, or at least an OO language like ObjC or C++.
It’s easy to think that C++ is just as fast, because “the compilers are so good these days, they can turn my C++ code into machine code that’s probably just as speedy.” But that will be true only if you mix C into your C++ to the right extent. Using C++ as the OO system it was intended to be is not a recipe for C-like speed.
Here’s a simple array access in C:
x[i]
When compiled, how does it translate into machine code? Something like this fictional (pseudocode) assembly:
LOAD REG1 with i
LOAD REG2 with (x) indexed by REG1
Pretty simple.
But what if x is an OO array? How does this:
x[i].value
translate into machine code? My guess is that it becomes something like this:
LOAD REG1 with i
LOAD REG2 with x-relative index of x’s upper-bound value
LOAD REG3 with (x) indexed by REG2
COMPARE REG1 with REG3
BRANCH-IF-GREATER to ErrorHandler
LOAD REG2 with x-relative index of x’s lower-bound value
LOAD REG3 with (x) indexed by REG2
COMPARE REG1 with REG3
BRANCH-IF-LESS to ErrorHandler
LOAD REG2 with index of x’s element-location array
LOAD REG3 with (REG2) indexed by REG1
LOAD REG2 with (REG3) indexed by #[element-relative position of value]
(The above code assumes that the OO array x contains an element-location array. If instead the array is a linked list in which each element includes a pointer to the next element, then the above pseudocode is very overoptimistic.)
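To make the comparison concrete, here is a minimal C++ sketch of the two kinds of access. Everything named here (Element, CheckedArray, raw_read, checked_read) is hypothetical and invented for illustration, and the layout, a vector of pointers to separately allocated elements, is only one assumption about what an "element-location array" might look like. What a real compiler emits for either function depends on the compiler and its optimization settings, so treat this as a picture of where the extra work comes from, not as a measurement.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical element type; the text only says each element has a `value`.
struct Element {
    int value;
};

// Hypothetical bounds-checked array that stores pointers to its elements,
// mirroring the element-location array assumed in the pseudocode.
class CheckedArray {
public:
    explicit CheckedArray(std::size_t count) {
        slots_.reserve(count);
        for (std::size_t i = 0; i < count; ++i) {
            // Element i holds the value i * 10, purely as sample data.
            slots_.push_back(
                std::make_unique<Element>(Element{static_cast<int>(i) * 10}));
        }
    }

    Element& operator[](std::size_t i) {
        // Upper-bound check. The index is unsigned and the lower bound is 0,
        // so no separate lower-bound check is needed in this sketch.
        if (i >= slots_.size()) {
            throw std::out_of_range("CheckedArray index out of range");
        }
        // One load to fetch the element's address, then a dereference.
        return *slots_[i];
    }

private:
    std::vector<std::unique_ptr<Element>> slots_;
};

// Plain C-style access: a single indexed load, no checks.
int raw_read(const int* x, std::size_t i) {
    return x[i];
}

// OO-style access: bounds check, pointer fetch, then the field load.
int checked_read(CheckedArray& x, std::size_t i) {
    return x[i].value;
}
```

With a plain array `int raw[4] = {0, 10, 20, 30};`, `raw_read(raw, 2)` does one load. With `CheckedArray arr(4);`, `checked_read(arr, 2)` returns the same number after a comparison and an extra pointer hop, and `checked_read(arr, 4)` throws instead of reading past the end.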
Assuming, arguendo, that each machine instruction takes about the same amount of time to execute, the OO-generated code takes about 6 times as long to execute (12 vs. 2 instructions). Or to put it another way, 83% of the processing power is being wasted on OO overhead.
How fast do processors need to get before it’s OK for your app to be wasting 83% of its processing power dealing with OO structures? Answer: A lot faster than they are today. Or will be tomorrow. We’re always coming up with new ways to challenge the speed of our current processors, and there’s no sign that we won’t continue to do so far into the future.
OK, you might say, but surely there are many common apps that don’t need to squeeze super speed out of the processor, right? For example, how fast does a word processor really need to be? Maybe it needs maximum speed when searching large documents for a specified phrase, but when you’re just merrily typing some text, the app is probably idle most of the time, so who cares how efficient the code is, right?
Wrong. When the user is typing text in a word processor, the time the user spends thinking about what to type next is indeed a processing eternity, and no special speed is needed for that. But no code is needed for that either, because an idle app isn’t running code. So it doesn’t matter how achingly long the delay is before the user presses a key — the relevant question is: What happens when the user does press a key?
Keep in mind that any half-ass touch-typist user is going to be pressing another key in maybe 0.1 seconds. Ideally, to make the typing process seem snappy and responsive to that user, the word processing app should update the window contents within a fraction of that 0.1 seconds: say, one quarter of it, which would be 0.025 seconds. So, even if the just-pressed key causes a cascading re-format of the entire paragraph and other text below the insertion point, the change should be almost instantaneous: all finished in just 0.025 seconds. And this must happen even as screen resolutions increase dramatically, as they seem about to do in the next several years.
Even an allegedly processor-unhungry app like a word processor needs all the speed it can get. And so does every other app — really, the user wants everything to happen instantly.
Faster is better. Faster processors don’t change that equation; they just draw more tasks into the category of “tolerably fast” that previously resided in the “intolerably slow” bucket. Plus, I don’t think I’ve seen an app yet that didn’t exhibit a noticeable delay when under some computationally stressful situation.
And even if your app really doesn’t do anything processor-demanding enough to need C for near-instantaneous reaction to all user input, wouldn’t it be nice if your app left more processing power for other apps and OS processes to use? And what happens if your app is sharing processing power with some other, processor-ravenous app? Every cycle counts.
Update 2008.01.07 — Fixed grammar of “extent” sentence.
