The unbearable rightness of CouchDB

As I mentioned recently, this website is built on CouchDB. CouchDB is in many ways a very innovative but still very simple database, and it has the unique quality of genuinely being a "database for the web", as the marketing copy claims. However, lately what I mostly feel about CouchDB is not joy but a kind of frustration at how close – how agonisingly close – it is to being amazing, while never quite getting there.

The first thing that really gets me is CouchApps. They're so close to being a transformative way of writing software for the web. Code is data, data is code, so why not put it all in one big code/database? Then people can run their own copies of the code locally, have their own data, but sync it around as they choose. Years before things like unhosted or serverless webapps were even on anyone's radar, CouchDB already had a working implementation.
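To make that concrete: a CouchApp is, at bottom, just a design document. Here's a minimal sketch of the shape of one (the contents are invented for illustration, and the attachment data is base64; only the structure is CouchDB's):

```json
{
  "_id": "_design/app",
  "views": {
    "recent": {
      "map": "function (doc) { emit(doc.date, null); }"
    }
  },
  "shows": {
    "post": "function (doc, req) { return '<h1>' + doc.title + '</h1>'; }"
  },
  "_attachments": {
    "index.html": {
      "content_type": "text/html",
      "data": "PGh0bWw+Li4uPC9odG1sPg=="
    }
  }
}
```

The whole app (views, server-side render functions, static assets) lives in that one replicable document.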

Well, kind of. Unfortunately CouchApps never really had first-class support in CouchDB. The process of setting one up involves making a big JSON document with all your code in it, but the admin UI was never really designed to make that easy. The rewriting engine (what in a conventional web framework you might call a router) is hilariously primitive, so there are certain kinds of structures your app just can't have, and auth is a total disaster too. The end result is that most of the time you need to tack some extra auth/rewriting proxy service on the front of your gloriously pure CouchApp. What a waste.
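For a sense of how primitive the rewriter is: the rewrites section of a design document is just a flat, first-match-wins list of patterns, something like this (the routes here are invented for illustration):

```json
{
  "_id": "_design/app",
  "rewrites": [
    { "from": "/",         "to": "index.html" },
    { "from": "/post/:id", "to": "_show/post/:id" },
    { "from": "/*",        "to": "*" }
  ]
}
```

Each request is matched top to bottom against those patterns, and anything you can't express as a flat substitution like that simply can't be a URL in your app.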

There are other similarly frustrating missed opportunities too. CouchDB had a live changes feed long before "streaming is the new REST" realtime databases like Firebase showed up, but never went as far as a full streaming API or Redis-style pub/sub. It has a great inbuilt versioning model that it uses for concurrency, which could have meant you magically get versioned data for free – but didn't. It has a clever master-master replication system that somehow never translated into being able to build indexes in parallel.
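The changes feed, at least, is delightfully simple to use. A sketch, assuming a local CouchDB and a database named site ("somedoc" is a stand-in document id):

```sh
# follow every change as it happens; the connection stays open
curl 'http://127.0.0.1:5984/site/_changes?feed=continuous&since=now&include_docs=true'

# revision history is right there too, but old revision bodies vanish
# on compaction, which is why versioned-data-for-free never materialised
curl 'http://127.0.0.1:5984/site/somedoc?revs_info=true'
```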

I should say that, although it frustrates me to no end, I really do respect CouchDB. At the time it came out, there were no other real NoSQL databases and a lot of the ones that have come since went in a very different direction. Compared to them, I admire CouchDB's purity and the way its vision matches the essential design of the web. But in a way I think that's exactly what makes it so frustrating. That vision is so clearly written in the DNA of CouchDB, and it's such an amazing, grandiose vision, but the execution just doesn't live up to it.

Are categories useful?

I remember reading some time ago about the Netflix Prize, a cool million dollars available to anyone who could substantially improve on Netflix's movie recommendation algorithm at the time. Of course, the prize led to all sorts of interesting techniques, but one thing that came out of it was that neither the serious contenders nor the original algorithm (i.e. the actual Netflix recommendation engine) used genres, actors, release years or anything like that. They all just relied on raw statistics, of which the category information was a very poor approximation.

So I wonder: if it's true for Netflix, is it true for everything? The DSM-5, effectively the psychiatry bible, attracted a bit of controversy at least partly because of its rearrangement of diagnostic categories. What was once Asperger's is now low-severity autism, and many other categories were split further or otherwise changed. However, the validity of a particular treatment for particular symptoms hasn't changed (or, if it has, not because the words in the book are different now).

Medical diagnosis seems to mostly be a process of naming the disease, and then finding solutions that relate to that name. However, that process can take a long time and doesn't always work. Maybe it would be better if we got rid of the names and used some kind of predictive statistical model instead. You'd just put in as much information as you can and be told which interventions are most likely to help. The medical landscape would certainly look pretty interesting, but I suspect not in a way that doctors or patients would find reassuring, even if it did result in better outcomes.

Ultimately, that reassurance seems to be the point of categories. Compared to other methods they're not good for prediction, and often they're plagued by disagreements over whether a particular edge case fits the category or not. However, the alternative would mean putting our faith in pure statistics, and I'm not sure people are ready for that.

Can you imagine a world where we don't categorise things? Where you don't need to determine if something is a chair or not, just whether it's likely you can sit on it? You wouldn't be considered a cat person, just someone statistically likely to be interested in a discussion about feline pet food. Maybe we could all get used to predicting outcomes, rather than needing to understand the internal system that leads to those outcomes. It sure would make life a lot simpler.

But I doubt that's going to happen any time soon.

Wet floors

An amusing anecdote from the first time I met a good friend of mine: He was writing some code to dedupe files on his fileserver and needed to pull some logic out of a loop to run it somewhere else. He copy-pasted it rather than abstracting it out into a function, saying "oh man, I bet this is going to come back to haunt me". Literally ten seconds later he changed the logic in the body of the loop without changing it in the place he'd copied it to, hitting the exact problem he was worried about.
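The fix he skipped is the boring, obvious one: extract the logic so there is only one copy to change. A minimal sketch (the details are invented, this isn't his actual code):

```javascript
const crypto = require("crypto");

// The shared logic lives in exactly one place, so changing it for the
// loop *is* changing it for everywhere else that calls it.
function isDuplicate(seen, contents) {
  const hash = crypto.createHash("sha256").update(contents).digest("hex");
  if (seen.has(hash)) return true;
  seen.add(hash);
  return false;
}

const seen = new Set();
for (const contents of ["aaa", "bbb", "aaa"]) {
  console.log(isDuplicate(seen, contents) ? "duplicate" : "new");
}
```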

I think of those situations as wet floors, after a time I was in a KFC and I saw the workers behind the counter skidding around on an oily floor right next to the deep fryers. I spent a long time thinking about how one of those kids was going to slip and put their hand in boiling oil before I even realised I could do something to prevent that outcome. Of course, when I went up to warn them the response was "oh, yeah, that is dangerous". I'm fairly certain they didn't actually clean the floor.

It occurs to me that this is a consistent pattern in software development and elsewhere: you see a problem just waiting to happen, but instead of doing something about it you say "that's going to be a problem". Later on, when it is a problem, you can even say "I knew that was going to be a problem". Though that is a deft demonstration of analytical and predictive ability, it could perhaps have been put to better use.

It sometimes seems like the drive to understand things can be so strong that you lose sight of the underlying reality. "I understand how this works" can be so satisfying that it makes "I should change how this works" unnecessary. Or perhaps it's just that understanding is always a positive; it's often not that difficult, and it feels good when you do it. Whereas acting in response to your understanding can be a lot of effort and doesn't always work the way you want.

There is also an element of confidence. Something you believe in a consequence-free way is very different from something that has serious costs if you're wrong. I've heard it said that the hardest job is being responsible for the Big Red Button. When you press the Big Red Button, it brings everything to a halt and costs hundreds of thousands of dollars, but not pressing it costs millions, maybe destroys the whole company, and definitely your career. It must take enormous confidence to press that button when necessary.

A related technique that I quite like is the pre-mortem, where you pretend something has gone wrong and explain why you think it happened. What's considered powerful about it is that it removes the negative stigma from predicting failure, but I think there's something else as well: a pre-mortem directly connects your knowledge of failure to the reality of failure. That is, it forces you to imagine the eventual result of "this is going to be a problem": an actual problem.

Perhaps all that is required to defeat wet floors is to drive up your own confidence in that belief, or associate it strongly enough with the actual failure it predicts.

Website

Well, getting back up to speed took slightly longer than I thought. However, as of this post I am now officially writing in the future, which is fairly exciting. It seems like as good a time as any to go into a little bit of detail on the website itself.

The whole thing is a CouchApp being served and rendered entirely by CouchDB. Each post is created as a JSON document in the database. Here's this post, for example. All documents of a certain type are then rolled up into the bytype view. You can then query that view to get recent posts, for example all of the posts in September. Finally, those views and documents are rendered by some database-side JavaScript (yes, really) using Mustache templates into the amazing website you see before you.
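For the curious, the map function behind a view like bytype can be tiny. A sketch (the field names are my guess at the document shape, not a dump of the real design document):

```javascript
function (doc) {
  // index every document under [type, date], so one view can answer
  // both "recent posts" and "posts in September" with a key range
  if (doc.type && doc.date) {
    emit([doc.type, doc.date], null);
  }
}
```

Fetching September is then just a key-range query against that view, something like ?startkey=["post","2017-09-01"]&endkey=["post","2017-09-30"] (dates illustrative), with the rows presumably pushed through the Mustache templates by a _list function.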

Obviously a lot of this stuff is really tightly coupled with the CouchDB philosophy. I think Couch has a lot of qualities that make it really great for a site like this, not least of which is that I can have my own local copy of the website and, through the magic of replication, the site just copies itself into production when I'm ready. In fact, you can copy it too! Just point your CouchDB's replicator at the API endpoint.
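That copy is a single request to the replicator (the source URL below is a stand-in for the site's real endpoint):

```sh
curl -X POST 'http://127.0.0.1:5984/_replicate' \
  -H 'Content-Type: application/json' \
  -d '{"source": "https://example.com/site", "target": "site-copy", "create_target": true}'
```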

I've also finally gotten around to putting the code up on GitHub. I'm not sure why that would necessarily be useful to you, but in case you're curious, there it is. Various parts have been floating around since 2011 or so, which is at least four stack trends ago. Feels good to put it up at last.

Prototype discipline

As I've started to make more of a habit of prototyping, I've noticed that the difficulty isn't so much in making the prototypes themselves. On the contrary, making prototypes is usually fun and interesting in equal parts. Instead, the big difficulty is making prototypes the right way, so that you get something useful out of them, and so that they stay light and exploratory.

The first thing I've noticed is that it's important to have a particular direction in mind. I've heard it said that prototypes should answer a question, but I'm not sure that's necessarily true. There's definitely a place for that kind of specific question-answering prototype, but for me I've found the most benefit in using prototypes just to explore. That said, the exploration goes a lot better if it's focused on a specific idea-space.

Another important thing is keeping the scope and the expectations small. It seems to be particularly easy for new ideas to creep in – which is great, in a way, that's the point – but you have to be able to figure out what to say no to. The other risk is starting to treat the code like something that has to be perfect and complete, with all the trappings of a kind of project that it isn't. I've also heard similar-but-not-quite-right advice on this front: that it's okay for prototype code to be bad. I think you lose a lot by writing code you're not happy with, even in the short term. The trick is letting it be good prototype code and not something else. The goal is exploration, not to make a polished final product.

I'm beginning to see prototypes as an essential component of the continuous everywhere model: if you can decrease the size of the gap between having an idea and seeing a working version of that idea, it gives you a lot more information and a lot more flexibility in which ideas you explore and how.