Sam Gentle.com

Abstractions

If there's one thing I love as a software developer, it's a good abstraction. It takes a large, complex set of things and turns them into a smaller, simpler set. Maybe you have thousands of different colours that are hard to reason about until you realise you can represent them all as mixtures of red, green and blue. Or you have all these different chemical elements whose properties seem to be assigned at random, until you realise you can lay them out periodically by atomic number and the properties line up.

Except for hydrogen, which kind of doesn't behave properly. And helium. And there are some ambiguities with the transition metals. It's not even clear that there is any fundamental physical basis for preferring the current layout over other options. And, come to think of it, RGB actually misses out on a significant number of colours and is generally a bad fit for human vision.

I've heard abstractions that don't completely encompass the things they're meant to represent described as leaky, with the understanding that all abstractions leak. To me, that is perhaps a bit of an abstraction-centric view. I like to think of it in terms of information theory: there is some fundamental amount of information that you are trying to compress down into a smaller amount of information. The extent to which you can do that depends on how much structure is in the underlying information, and how much you know about that structure.

If I give you a piece of paper with a list of a million numbers written on it that look like 2, 4, 6, 8, and so on, I have provided you with no more information than if the paper said "even numbers up to 2 million". The abstraction, in that case, was really just a more efficient way of representing the information. On the other hand, if I gave you that same piece of paper and it was mostly the even numbers up to 2 million but some numbers were different, you've got a hard choice to make. Either you keep track of the (potentially large number of) exceptions, or you just remember "it's even numbers up to 2 million" and accept being wrong some of the time.
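To make that choice concrete, here is a rough sketch in Python. The function names (`even_numbers`, `compress`, `decompress`) and the two altered positions are my own inventions for illustration. The lossless version stores the rule plus a table of exceptions; the lossy version keeps only the rule and lives with the errors.

```python
def even_numbers(limit):
    """The rule: a one-line description that regenerates the whole list."""
    return list(range(2, limit + 1, 2))


def compress(data, limit):
    """Lossless: keep only the positions where the data disagrees with the rule."""
    rule = even_numbers(limit)
    return {
        i: actual
        for i, (actual, expected) in enumerate(zip(data, rule))
        if actual != expected
    }


def decompress(exceptions, limit):
    """Rebuild the original list from the rule plus the stored exceptions."""
    data = even_numbers(limit)
    for i, value in exceptions.items():
        data[i] = value
    return data


limit = 2_000_000
data = even_numbers(limit)   # a million numbers: 2, 4, 6, ...
data[10] = 7                 # hypothetical exceptions, chosen arbitrarily
data[500_000] = 3

# Option 1: remember the rule and every exception (exact, but bigger).
exceptions = compress(data, limit)
restored = decompress(exceptions, limit)

# Option 2: remember only the rule (tiny, but sometimes wrong).
lossy = even_numbers(limit)
errors = sum(1 for guess, actual in zip(lossy, data) if guess != actual)

print(len(exceptions), "exceptions stored;", errors, "errors if we ignore them")
```

The size of the `exceptions` table is the price of being exactly right. With two exceptions it's obviously worth paying; with a few hundred thousand, the rule stops buying you much.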

It's this kind of lossy compression that represents most abstractions in the real world, for the simple reason that most real-world problems are too complex for the amount of space we're willing to give them. You can prove all sorts of interesting results about probabilities of coin flips, but you have to ignore the possibility that the coin will land on its edge. These simplifications throw away information in the hope that you can compress your understanding much more at the cost of only occasional errors.

So I don't believe that all abstractions leak. I think we often choose to make our abstractions imperfect to save space, or because we don't know enough of the underlying structure to describe it succinctly. However, it is possible to make a perfect abstraction; we just don't think of it as an abstraction. An abstraction that completely describes the underlying information is just the truth.