Sam Gentle.com

Efficiency

In many cases you can guarantee that there will be a solution to a given problem, as long as it falls within certain bounds. For example, the problem of "my computer should do this thing" for most values of "thing" is guaranteed to have a solution. Indeed, perhaps the biggest frustration in software development is when someone tells you "look, I know that a solution to this problem would look roughly like this, so why can't you just do that?" It's hard to explain that the challenge of software isn't finding any solution, it's finding a good solution within an infinite space of bad ones.

I like to make analogies between software and physical engineering a lot, because people have better intuitions about physical problems and tools. One hard problem in architectural engineering is making a really tall building. But that's not hard at all! Anyone can make a tall building given enough resources. Here's one solution: take a lot of material and keep putting bits of it on top of other bits. Every time the resulting structure is unstable, add more material to the sides. Bonus: the material will add itself to the sides if you make it too tall.

It turns out what you really want from your skyscraper is a bit more complicated than just making it tall. You want one that is not too wide because you have limited land area to work with. You also have limited materials and time to work with. Most importantly, you want a building that doesn't cost too much. Many problems are trivial to solve if you can just commit infinite resources to them, but practicality dictates that you have to work with less. What you want is the solution that gives you the most of what you want for the least resources. Which is to say, the best solution is the most efficient solution.

In software, efficiency is mostly used to talk about resources like processor time, storage space, and, more recently, energy usage. Those are the resources the software consumes when it's running. However, when creating software, it makes sense to think about the resources that are used for its creation. Those resources are developer time, developer working memory, and amount of code.
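To make the first kind of efficiency concrete, here's a minimal sketch (the function names are my own invention) of two ways to solve the same small problem. Both give the same answer, but they spend the runtime resources differently: one trades processor time for less storage, the other trades storage for less processor time.

```python
def count_dupes_quadratic(items):
    """Compare each item against everything before it:
    O(n^2) processor time, O(1) extra storage."""
    return sum(1 for i, x in enumerate(items) if x in items[:i])

def count_dupes_linear(items):
    """Remember what we've already seen:
    O(n) processor time, O(n) extra storage."""
    seen, dupes = set(), 0
    for x in items:
        if x in seen:
            dupes += 1
        seen.add(x)
    return dupes
```

Which one is "more efficient" depends on which resource you're short of, which is exactly the kind of trade-off the next few paragraphs apply to the resources of creation instead.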

Developer time is probably the most well-understood and agreed-upon resource in software development. If you write one function, it will take an hour. If you write two functions, it will take two hours, that kind of thing. For a long time, it was thought that you could draw a linear relationship between the size of a problem and the number of developer hours needed to solve it. Unfortunately, you can't, for reasons mostly related to the next two resources.

Developer working memory is less well-appreciated, but it is a significant mechanism behind the non-linear slowdown in building larger systems. Once a system goes above a certain size, you can no longer fit the entire thing in your mind at once. To understand it, you need to break it down into subsystems and think about one subsystem at a time. This adds a switching cost between subsystems and introduces a new source of errors and design problems, because nobody can reason about the entire system and all of its subsystems at the same time.
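As a toy illustration of working with subsystems (a hypothetical report pipeline, with names I've made up), here's code split so that each piece fits in your head on its own. You can reason about parsing, filtering, and formatting separately, and only the last function needs to know how they fit together:

```python
def parse_row(line):
    """Parsing subsystem: raw "name, amount" text -> (name, amount)."""
    name, amount = line.split(",")
    return name.strip(), float(amount)

def over_threshold(rows, threshold):
    """Filtering subsystem: keep only the rows with large amounts."""
    return [(name, amount) for name, amount in rows if amount > threshold]

def format_report(rows):
    """Formatting subsystem: rows -> display strings."""
    return [f"{name}: {amount:.2f}" for name, amount in rows]

def report(lines, threshold=100.0):
    """The only place that has to hold the whole pipeline in mind."""
    rows = [parse_row(line) for line in lines]
    return format_report(over_threshold(rows, threshold))
```

The switching cost is still there, of course: to change what "large" means you have to load the filtering subsystem back into your head, and a mistake at the boundary between two pieces is exactly the kind of error no single piece reveals on its own.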

But both of these resources are dwarfed by the last resource, which is amount of code. If you can only concentrate on one resource, it should be this one. It is the equivalent of the amount of material in the skyscraper. It's not just that more is worse, it's that too much fundamentally changes the kind of building you can make. A skyscraper, by its nature, has to be built out of a structure that has a high strength but a low weight, because that structure has to support the whole rest of the building as well as itself. In other words, a skyscraper is only possible if you use material efficiently.

Similarly, certain kinds of software are only possible if you use code efficiently. A tangled mess of spaghetti code won't get in the way too much when your scope is small and your requirements are modest, the same way you can build a small house out of sticks and mud. However, as the amount of complexity your system needs to support scales up, the efficiency of your code becomes more important. It's vital that your code is able to support the weight of the problem's complexity while introducing as little complexity itself as possible.
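One way to picture code efficiency, sketched with hypothetical order-totalling functions of my own devising: in the inefficient version, every new requirement adds a near-copy, so the code grows in step with the problem. In the efficient version, one mechanism carries the weight of all the cases:

```python
# Inefficient: each new currency adds another near-identical function,
# so the amount of code grows with every requirement it supports.
def total_in_usd(orders):
    return sum(o["amount"] for o in orders if o["currency"] == "USD")

def total_in_eur(orders):
    return sum(o["amount"] for o in orders if o["currency"] == "EUR")

# Efficient: one parameterised mechanism supports every currency, so new
# complexity in the problem adds almost no new complexity to the code.
def total_in(orders, currency):
    return sum(o["amount"] for o in orders if o["currency"] == currency)
```

At two currencies the difference is trivial, like the small house of sticks and mud. At fifty currencies, each with its own rounding rules, only the second structure can still support the load.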

Now, I say that, but obviously you can just naively build a really big codebase the way you might naively build a really big building: just add stuff on top of other stuff until it gets big enough. Assuming you have an unlimited number of developers and an unlimited amount of time, this is a perfectly reasonable way to go about things.

But, assuming you don't, too much code is a problem you can't afford to have.