Programming for Efficiency
foot shooting, dunce-hats, and failure

Sure, you want your program to run fast. So you learn all the secret magic that makes programs run fast, and apply it faithfully.

You tell everybody how well you have applied all the magic speed stuff. And wow it’s so fast! It must be, because of all that magic!

Then somebody comes along with a program that does the same thing twenty times faster. Clearly they cheated somehow.

learning programming on the playground

Most programmers pick up their magic speed tricks on the playground, like dirty secrets passed from one generation of urchins to the next. Most of these gems are grotesquely wrong, for various reasons. Here are a few that I've heard earnestly proclaimed by many innocents.

- Function calls are slow.
- Division is slow.
- For speed, one should use preprocessor macros (in C, C++).
- The memory of the computer is really sequential, so for efficiency, one should program in terms of arrays.
- Programming language X is slow, where X is not in {Assembler, FORTRAN}.
- Programming methodology X is slow, where X is in {structured, object-oriented, (or really, anything)}.

Pop quiz: which of these is even remotely correct?

Only one is. (Read on, to find the answer!)

All the others are various miscomprehensions, prejudices derived from some ancient mistake, partial truths, or…simply sour grapes.

Don’t believe me? Here’s the difference between me and you. You believed what somebody told you. I did some experiments.

why it’s so bad

Two years later you need to make a small change, but your code was built for speed, not for maintenance. You waste months of your precious life making the little change, and inadvertently introduce a bug.

Your code is really quite slow, and everybody knows that you are living in a fantasy world about its speed. The most charitable of your colleagues feels sorry for you; the others think you are a fool.

“Premature optimization is the root of all evil.”

—attributed to Sir Tony Hoare

de-bunking

Function calls are slow

Well, function calls can incur some overhead (although modern inlined functions incur none). How much overhead? The answer is: unless you do something stupid, the time of a few arithmetic operations per call, depending on how much data is passed in the arguments and on the language.

You might think twice before calling a non-inlined function in the innermost loop of a number-crunching code. However, to use this small overhead as a reason for avoiding function calls in code that does hundreds of millions of operations is sheer foolishness. It wastes a great deal of programmer's time, for no benefit whatever.
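To put the overhead in perspective, here is a minimal C sketch (the scale helper is hypothetical, purely for illustration). Any mainstream compiler at -O2 will inline the call away entirely; even un-inlined, the call overhead would vanish next to the loop's memory traffic.

```c
#include <stddef.h>

/* A tiny helper; any modern compiler inlines this at -O2,
   so the "call" costs nothing at all. */
static inline double scale(double x, double factor) {
    return x * factor;
}

/* Even if the call were NOT inlined, its overhead would be a few
   cycles per iteration -- noise next to the loop's memory traffic. */
void scale_all(double *a, size_t n, double factor) {
    for (size_t i = 0; i < n; i++)
        a[i] = scale(a[i], factor);
}
```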

Where did this priceless piece of wisdom come from? The story I heard was that, many decades ago, on machines that have long since ceased to exist, there were some broken compilers that failed to optimize within functions.

Division is slow

Well, yeah. Floating-point division is in fact quite slow (about 10 times slower than multiplication) on many modern CPUs (notably Intel's), due to choices made in the design of the instruction set. Some compilers will do various tricks, such as replacing a division by a multiplication by the reciprocal. As things stand, yes, one should be on the look-out for floating-point division in compute-intensive loops.
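The reciprocal trick is easy to apply by hand when a loop divides by the same value throughout. A minimal sketch; note the caveat in the comment, which is why compilers only do this for you under flags like GCC's -ffast-math:

```c
#include <stddef.h>

/* Naive: one floating-point division per element. */
void normalize_slow(double *a, size_t n, double d) {
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] / d;
}

/* Hoisted: one division, then n (much cheaper) multiplications.
   The results may differ in the last bit, which is why compilers
   only make this transformation under flags like -ffast-math. */
void normalize_fast(double *a, size_t n, double d) {
    double r = 1.0 / d;
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] * r;
}
```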

For speed, one should use preprocessor macros (in C, C++)

I don't have time to re-hash all the reasons not to use preprocessor macro functions. Do your own web search.

And—welcome to the 21st century!

The memory of the computer is really sequential, so for efficiency, one should program in terms of arrays.

There are times and places for arrays, and all modern languages feature this data structure somehow.

Although the memory in most modern computers is accessed via an integer address, this pronouncement is a gross simplification of the efficiency issues of memory access.

First off, modern CPUs all feature multiple levels of memory cache, into which data from RAM is always read before processing. Further, these caches are organized so as to effectively cache many different areas of RAM at once, non-sequentially.

So, while for some extremely simple kinds of calculation data might flow most quickly under sequential access, the infrastructure is designed to access multiple areas of RAM quickly.

For most applications, however, the fastest algorithms rely on sophisticated data structures that are far easier to represent by means other than arrays. The insistence on this one data structure is crippling.
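A minimal C sketch of the point about caches: both functions below touch exactly the same "sequential" array, yet on typical hardware the column-order traversal is often several times slower. It is cache behavior, not raw addressing, that dominates.

```c
#include <stddef.h>

#define N 4096

/* Row order: consecutive accesses fall in the same cache line. */
double sum_rows(double (*m)[N]) {
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

/* Column order: every access jumps N * sizeof(double) bytes,
   missing the cache almost every time. Same array, same
   arithmetic, same "sequential" memory -- often several times
   slower in practice. */
double sum_cols(double (*m)[N]) {
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += m[i][j];
    return s;
}
```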

Programming language X is slow, where X is not in {Assembler, FORTRAN}

It depends! These days, the compiler manufacturers build all their compilers on a single base, so there is very little speed difference between similar code written in FORTRAN and C, and compiled with compilers from the same suite.

Everybody knows Java is slow. But how slow is it?

These days, typically, low-level code (e.g. numerical array arithmetic) runs some 2 to 4 times slower in Java than the equivalent code in optimized C. This depends strongly on the Java Virtual Machine (JVM) on which the code is running, however. The old IBM Java 2 1.3.1 virtual machine ran arithmetic code at speeds almost the same as C.

Of course, Java simply does more than C. It always does array bounds checking, for example, and that takes a little time. But this is not why Java is slower: the reason for that is social.

Programming methodology X is slow, where X is in {structured, object-oriented, (or really, anything)}

This is simple philistinism.

Programming methodologies were developed to tackle real-life, tough programming problems. Any experienced programmer has had personal experience with those problems and appreciates the effort to address them.

Any programming methodology is better than no programming methodology.

advice

flexibility first

If you know you will have a specific performance issue, investigate various possible solutions, and leave the possibility of implementing any particular solution open.

Do not build everything around a particular solution—to do so is a form of inappropriate association. (Ask yourself: is the program about the particular algorithm, or about getting the problem solved effectively?)

read

Pick up some books on effective programming and read them.

inline functions not macros

There is no longer any reason to use macro functions in programming. All C and C++ compilers of the last decade provide function inlining, which has all the advantages of macros without their pitfalls.

Macro programming is now to be considered the worst style: sloppy, error-prone, obfuscatory, counterproductive, and ultimately, slow.
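The classic pitfall, as a sketch: the macro pastes its argument in twice, while the inline function behaves like any other function.

```c
#include <stdio.h>

/* The classic trap: the argument is textually pasted in twice. */
#define SQUARE(x) ((x) * (x))

/* The inline function: same generated code, no surprises. */
static inline int square(int x) { return x * x; }

int main(void) {
    int i = 3, j = 3;
    /* Expands to ((i++) * (i++)): i is modified twice between
       sequence points -- undefined behavior. */
    printf("%d\n", SQUARE(i++));
    /* Prints 9; j becomes 4, exactly as a reader would expect. */
    printf("%d\n", square(j++));
    return 0;
}
```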

use a profiler

If your program runs slowly, learn how to use a profiler. A pass over a slow program with a profiler will usually show that the bulk of the time is being taken by something easily fixed, often having nothing to do with the primary function of the program.
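One common workflow, assuming gcc and gprof on a Unix-like system (Linux's perf tool works similarly):

```
gcc -pg -O2 -o myprog myprog.c    # build with profiling instrumentation
./myprog                          # run normally; writes gmon.out
gprof myprog gmon.out             # flat profile: where the time actually went
```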

write clean code

If you write the most numerically intensive parts of your code cleanly, the compiler has the best chance of doing a good job of optimization for you. And of course it will then be much easier to see if you’re doing something stupid.
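For instance (a sketch): written plainly, the loop below is a textbook reduction that a modern compiler can vectorize and unroll for the target CPU. Hand-"optimized" pointer tricks would only obscure that, from both the compiler and the reader.

```c
#include <stddef.h>

/* A plain dot product: the compiler can recognize the reduction
   and vectorize/unroll it. It is also obvious at a glance that
   the code is correct. */
double dot(const double *a, const double *b, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}
```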

optimize from the top

The biggest improvements in speed are to be had not at the level of the innermost loops, but at the level of the overall algorithm. By writing clean, flexible code, you will find it much easier to replace a top-level algorithm, and in fact, much easier to implement any particular algorithm.
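A minimal sketch of what a top-level swap can buy (the "find" interface here is hypothetical): replacing a linear scan with a binary search on sorted data turns O(n) into O(log n), a far bigger win than any inner-loop tweak.

```c
#include <stdlib.h>

/* O(n): fine for small n, hopeless at scale. */
const int *find_linear(const int *a, size_t n, int key) {
    for (size_t i = 0; i < n; i++)
        if (a[i] == key)
            return &a[i];
    return NULL;
}

static int cmp_int(const void *p, const void *q) {
    int a = *(const int *)p, b = *(const int *)q;
    return (a > b) - (a < b);
}

/* O(log n) on sorted data: same interface, different algorithm.
   Because callers see only "find", the swap costs them nothing. */
const int *find_sorted(const int *a, size_t n, int key) {
    return bsearch(&key, a, n, sizeof *a, cmp_int);
}
```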

consider the economics of optimization

A 10% improvement that takes you a year to implement is unlikely to impress anybody.

A factor-of-10 improvement in speed, in code that is run only once and finishes before the screen redraws, is a huge waste of your time.

Before you spend days implementing a speed-improvement measure, run some tests to see whether, and by how much, your efforts will really pay off.

Make sure your efforts result in a net savings of time for people (when your own time for the effort is included).

social stuff

the reason for FORTRAN’s speed

The community using FORTRAN cares more about that last bit of efficiency than other communities do. Also, FORTRAN intrinsically does less than most other languages, so at some level it is easier to implement code optimization.

For these reasons, FORTRAN compilers have historically generated rather faster code than C and C++ compilers. Recently, however, the difference has largely disappeared, as each manufacturer now builds all its compilers on a common base.

numerical math

The numerical mathematics community built almost all their algorithms around a computer model in which memory was one big array. For many simple problems, such algorithms are probably in fact optimal. But for many problems, especially complex ones, no easily implemented algorithm involving just arrays is going to be very efficient.

FORTRAN 77 was a great language for its time (1977), but by modern standards, among its many weaknesses is an extremely limited notion of data structure (essentially, arrays of numbers). The unfortunate fact is that the vast bulk of numerical algorithms are built around this limited notion: a case of fitting problems to a solution.

“If the only tool you have is a hammer,
you tend to see every problem as a nail.”

—attributed to Abraham Maslow

supercomputers

The biggest, “fastest” computers in the world are usually specified by people who have the least idea of what would really make a computer fast. So they naturally say, give me the fastest clock speed available!

The press and the politicians are very impressed by how loud the machine is, and how uncomfortable the computer room is. That must be a really powerful computer!

But of course, if the primary application being run is intrinsically bound by, say, bus speed, or network speed, or something else, it will run no faster with the fastest CPUs than with CPUs 20% slower but half the price. (But the administrators and the politicians couldn't care less.)