Failure

This failure was nearly identical to my last one, so instead of spending a lot of time analysing it, I think I'll go meta and get straight into figuring out why my previous attempt to fix the same problem didn't work.

Part of it was that it takes some time to set up a new habit, and trying to write in the morning to avoid these kinds of failures hadn't quite taken yet. But, looking at my plan from last time, it was also lacking in specificity. I weaseled a bit with "I should be aware of" the tendency to fail on weekends (although this time it wasn't a weekend, the circumstances were pretty similar), and "aiming" to write in the morning "may be the solution".

So I'm not going to make any changes to my overall strategy, but I will clarify the particular tactic I want to use to get there. I'm going to set myself a motte and bailey goal. The bailey is that I intend to write each post by the start of the 24-hour posting window. The motte is that I must, at the very least, write it by the end. As I said, effectively the same as my current plan, but better specified.

Wish me luck!

Expected value

It's a curious thing, being a self-modifying system. A whole lot of neat assumptions and abstractions we can make about better-behaved systems just don't work very well for us at all. In theory, we are quite well modelled as predictive optimisation systems, and there's a lot of AI research going down that path, the assumption being that if you can build a good enough predictive optimiser, you'll eventually get intelligence. Whether or not that's true, it's pretty clear that, as far as optimisers go, we're fairly suboptimal.

I have a long-standing disagreement with a friend about wireheading, a kind of brain hack where you would find your brain's pleasure centres and just set them to maximum all the time. Instead of doing anything, you would just roll around in a state of maximal bliss until you die. This is not currently possible and there's no guarantee that it ever will be, but it's an interesting philosophical and moral puzzle. My friend thinks not just that wireheading is inevitable, but that it is rational for us to do, and that it would be rational for an AI to do the same thing!

The thinking goes like this: you can model any decision process as one giant black box, where lots and lots of different inputs go in, some magic happens, and all your available options receive a value. So eating lunch gets a 5, racing motorbikes with Henry Winkler gets a 7, and hitting yourself in the face with a brick gets a 0. This all happens very quickly and subconsciously, of course, but that internal structure is hidden somewhere driving your preferences for motorbike rides over brick-face. So, if we have a value for how good things are, why not cheat? Henry Winkler and lunch are harder to come by than bricks, so what if we could just set our brick value to, like, a billion, and just be blissfully happy hitting ourselves with bricks?
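
To make the shape of that concrete, here's a toy sketch of the black-box model. The options and numbers are entirely made up; the point is only that once choices are driven by a stored value table, editing the table directly is the ultimate shortcut.

```python
# A toy sketch of the "black box" decision model: every option gets a value,
# and the agent simply picks whichever scores highest. All numbers invented.
values = {
    "eat lunch": 5,
    "race motorbikes with Henry Winkler": 7,
    "hit yourself in the face with a brick": 0,
}

def choose(values):
    # The decision process: take the option with the highest value.
    return max(values, key=values.get)

print(choose(values))  # race motorbikes with Henry Winkler

# The wireheading "cheat": rather than chasing hard-to-get options,
# just edit the value table directly.
values["hit yourself in the face with a brick"] = 1_000_000_000
print(choose(values))  # hit yourself in the face with a brick
```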

If that seems like a bad idea, I agree! In fact, in a sense it's what I wrote about in goalpost optimisation. Letting your optimiser optimise itself is a recipe for tautological disaster. But the question isn't whether it seems like a good idea, the question is whether it's a rational thing to do, and whether we would expect an intelligent machine in the same position to do it.

The reason I don't think so is that, although cheating your value function to infinity would satisfy your value function after you've done it, you still have to make the decision to do it in the first place. And if the most important thing to you is, say, collecting stamps, there's no world in which changing your values from enjoying stamps to enjoying staring at the ceiling twitching in ecstasy meets your current values. But the one niggle with that argument is that our values don't just include things like stamp collecting or meeting Henry Winkler; people also want to be happy.

Can we take a second to realise how weird that is? It's as if you built an artificial intelligence whose job was to clean up oil spills, and instead of saying "do the thing that cleans up the most oil", you said "try to clean up the most oil, but also store a number that represents how well you're doing that, and also try to make that number as large as possible". What an inelegant and overcomplicated way of doing things! There's every reason why a machine would just set that number to infinity, but also no reason why you would give a machine that number in the first place.
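
A rough sketch of the two designs being contrasted, purely hypothetical: one agent is judged by the state of the world, the other also keeps an internal score that it is told to maximise.

```python
from dataclasses import dataclass

@dataclass
class World:
    oil_remaining: float

class OilCleaner:
    """Judged only by the world: how much oil is actually left."""
    def act(self, world: World) -> None:
        world.oil_remaining = max(0.0, world.oil_remaining - 1.0)

class SelfScoringOilCleaner(OilCleaner):
    """Also keeps an internal number for how well it thinks it's doing."""
    def __init__(self) -> None:
        self.score = 0.0

    def act(self, world: World) -> None:
        super().act(world)
        self.score += 1.0
        # If the agent is asked to maximise self.score rather than to minimise
        # world.oil_remaining, the shortest path is obvious:
        # self.score = float("inf")

world = World(oil_remaining=10.0)
agent = SelfScoringOilCleaner()
agent.act(world)
print(world.oil_remaining, agent.score)  # 9.0 1.0
```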

Of course, any discussion of wireheading is really a proxy for less extreme discussions about value systems and the role of pleasure. It gives us pleasure to maximise our pleasure, and we desire to satisfy our desires. And if that curious fact doesn't lead to wireheading, we should at least expect some pretty weird results. You could imagine a person whose pleasure is very low, but whose pleasure-about-pleasure is very high. That is, they aren't doing things that make them happy, but their "am I doing things that make me happy?" system has been short-circuited. Or someone whose expected value for things stays really high even though the actual value is low, because their expected value numbers are being modified directly.

That, I think, is the real concern of the modern age: not wireheading ourselves, but being subtly wireheaded by others. Our curious quirk of valuing getting the things that we value is very easily exploited as a means of shaping behaviour. Giving someone what they want all the time is very difficult and costly. Far better if you can make people act as though they're getting what they want using flaws in human biology.

Frailty, thy name is dopamine.

The two-axis model

It's quite common, when talking about happiness and sadness, pleasure and pain, good and bad, to place the two on opposite ends of a single axis. Let's say you're a sentient computer of some kind. Somewhere within you is a number for how good or bad you feel, with 0 as neutral, -5 as a bit bad, and +100 as the best ever. This is very elegant, and feels intuitively correct. If someone is sad, you want to make them happy to stop them from being sad. Or if something bad happens, you might try to do something nice to make up for it.

However, I believe the one-axis model is not sufficient. Things can be both good and bad, and a thing that has both significant positive and negative consequences (say, bombing Japan in World War 2) can hardly be equivalent to something with no consequences. We don't accept the idea that if you go out and save someone's life, that gives you one free murder. I think a better mapping to reality is a two-axis model, where good and bad are considered independently.
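
A minimal illustration of the difference, with made-up numbers: on one axis, good and bad cancel out; on two, they coexist.

```python
from dataclasses import dataclass

# One-axis: a single signed number. A day with a lot of good and a lot of bad
# collapses to the same value as a completely uneventful one.
uneventful_day = 0
mixed_day = +50 - 50  # also 0

# Two-axis: good and bad tracked independently, so "mixed" and "neutral"
# stay distinguishable.
@dataclass
class Feeling:
    good: float
    bad: float

neutral = Feeling(good=0, bad=0)
mixed = Feeling(good=50, bad=50)
```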

Similarly, a feeling of mixed happiness and sadness is of course possible, and quite different from neutral. It is possible to cheer someone up when they feel bad, but I'm not convinced that the mechanism is the happy feelings subtracting from the sad feelings. You can also distract someone from sadness with anger, pain, or mindless internet clicking. And it's true that just feeling like you're being cared for is soothing, but I should stress that's not the same thing as enjoyable.

I once heard that described as the difference between compulsion and desire: you want to go to the park because it brings you pleasure, but you are compelled to scratch an itch because not doing so brings you discomfort. For that reason, although you might describe a pleasurable feeling of relief when you scratch an itch, it is not real pleasure. The proof is simple: would you choose to be itchy?

Another place this shows up is reinforcement training in behavioural psychology. The original one-axis model was just reward vs punishment, but it was later revised to have two axes: positive/negative and punishment/reinforcement. Positive and negative in this case should be read as additive and subtractive, so a smack is a positive punishment, because it adds something bad. A negative reinforcement takes away something bad. You can also have the opposite: a negative punishment takes away something good, and a positive reinforcement gives something good. In this model, those are four distinct ways to influence behaviour, each with different consequences.
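
Laid out as a lookup, keyed on what you do (add or take away a stimulus) and whether the stimulus is good or bad; the examples in the comments are my own.

```python
# The four quadrants: (what you do, kind of stimulus) -> name of the technique.
quadrants = {
    ("add",    "bad"):  "positive punishment",     # e.g. a smack
    ("remove", "bad"):  "negative reinforcement",  # e.g. switching off a blaring alarm
    ("add",    "good"): "positive reinforcement",  # e.g. giving a treat
    ("remove", "good"): "negative punishment",     # e.g. confiscating a toy
}

print(quadrants[("add", "bad")])  # positive punishment
```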

And that is the thing I think is most important: not that two-axis is better in some abstract theoretical way, but that it has better consequences. If you go around thinking good is the opposite of bad, you're liable to wonder strange things like "nothing is bad in my life, so why aren't I happy?"

Of course, the two-axis model makes that obvious: pleasure isn't anti-pain, and you would be happier with more pleasure and more pain than none of either.

The savant effect

You meet some strange people in online games. Modern matchmaking systems are quite sophisticated, almost universally built around Bayesian predictive models of player skill. There are flaws with that approach, but as far as anyone can tell the fairness of the matches isn't one of them. And yet somehow you still get people who are short-tempered, mean, rude, and dumb. In team games, someone like this has an enormous negative effect, often singlehandedly losing the game for their team. But just when you're ready to write them off, those very same neanderthals often pull out some surprisingly skilful play.

A friend was recently telling me about a fairly well known designer in the fashion industry who is a total, utter nightmare to work with. He's bad with money, bad with business, bad at management, bad at organisation, bad – as far as I could make out – at everything. Apparently the continued survival of his label is a miracle that constantly surprises everyone around him. So, hearing this, I assumed that mister bad-at-everything must be a pretty average designer as well. I had the chance to check out one of his pieces and it was... incredible. It was really, really good. I couldn't have been more wrong.

In psychology, there's a phenomenon called the halo effect. When you learn about a positive quality in someone, you tend to generalise it to their entire personality and everything about them. You assume a successful person is also happy, or a beautiful person is also kind. My aunt once said, completely straight-faced, "Michael Jackson can't be a pedophile; he's such a talented musician!" It's a fairly pervasive bias, and it also works negatively: once you get the impression someone is no good, you assume they must be bad at everything.

Okay, so I unfairly halo-effected this designer's artistic ability from his business skills, and the toxic teammate's in-game skill from their attitude. But there's something more there: it's not merely that I shouldn't have inferred one bad quality from another, it's that I should have inferred the exact opposite! The existence of all those negative qualities all but guaranteed a positive quality, in the situations where I encountered them.

The reason is this: in both cases, there was a filtering function (the matchmaking system or the continued operation of the design business) that was at least partly linear (by which I mean that some degree of in-game skill can make up for some degree of being a huge douche). It's another variant of the anthropic principle; in this case, the fact that the person still has the matchmaking ranking that they do, or can still hold on to a business despite being so incapable, strongly suggests that there is some very large compensatory factor. There are surely people without that factor, but they're no longer in business, so the question never comes up.
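
Here's a minimal simulation of that filtering effect, with invented numbers: skill and temperament are independent in the full population, but once you only look at people who pass a roughly linear filter (where skill can compensate for a bad temper), the worst-tempered survivors are almost guaranteed to be the most skilled.

```python
import random
import statistics

random.seed(0)
N = 100_000
skill = [random.gauss(0, 1) for _ in range(N)]
temper = [random.gauss(0, 1) for _ in range(N)]  # higher = more pleasant

# The filter: you only stay in the match pool / in business if skill plus
# pleasantness clears some bar, so skill can make up for a bad temper.
survivors = [(s, t) for s, t in zip(skill, temper) if s + t > 1.5]

worst_tempered = sorted(survivors, key=lambda st: st[1])[:100]
print(f"survivors: {len(survivors)} of {N}")
print(f"mean skill, whole population: {statistics.mean(skill):+.2f}")
print(f"mean skill, worst-tempered survivors: "
      f"{statistics.mean(s for s, _ in worst_tempered):+.2f}")
```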

I call this the savant effect, after the fascinating phenomenon of savant syndrome, where some people with severe mental disabilities have surprisingly exceptional abilities in specific areas. Presumably there are a great many people who have severe mental disabilities without any superpowers to compensate for them, but they don't pass the filtering function for being newsworthy or interesting enough to make into a movie starring Dustin Hoffman.

It's sometimes quite tempting, when you see someone or something that appears vastly unfit for the position they're in, to assume that they must have somehow cheated, or that the system is otherwise broken. I hear it in software all the time: "oh, this piece of software is objectively better, but everyone likes the other one for no reason", or "how'd that guy get promoted when he's not as good a developer as me?". I mean, for sure, sometimes the system is broken, and people do cheat, but before you jump to that conclusion it's worth considering that maybe the system is fine and you just have a limited view of the candidates and the criteria in question.

And perhaps it's worth considering how much benefit there can be in knowing savants. Someone who manages to be so good at something that it can make up for them being bad at nearly everything else has got to be worth learning from.

Cliff jumping

One of my favourite scenes in Back to the Future Part II is the bit where Marty McFly jumps off the building rather than be shot by Biff. Oh no, our hero is dead – but wait, what's that? He's flying! The DeLorean was waiting below; it caught Marty on its hood and flew up to deliver a triumphant gull-wing door to Biff's face. To me, that's the quintessential action hero trait: to throw yourself into an impossible situation knowing your hero skills will bail you out. Of course, it doesn't always work like that.

I once did improv comedy, and that was the best part. Anyone can make things up on the spot, but you only get a really great scene when you take a crazy risk that makes the entire audience breathe in as they think "holy hell, this is going to be a disaster", and then laugh in surprise and relief as the rest of your crew comes in to bail you out. You jumped off the cliff, everyone thought you were dead, but you miraculously survived. I'd had that audience experience before, but the feeling as a performer pulling off a successful cliff jump is really something else.

Of course, there's nothing miraculous about it. Good improv performers train in how to make those leaps and how to bail out their fellow performers, to the point where it's a very reliable process. The danger is an illusion; in a sense, it's more about trust than risk. You trust that people are there to catch you. You build that trust over the course of working and training with other performers until you know what you can get away with. However, that appearance of danger still feels real. Real enough to impress the audience, and real enough that a common mistake among newer performers is playing it safe because they don't trust their skills enough.

It is easy to make that mistake outside of improv as well. You can spend a career – in some cases a lifetime – building your skills, and still treat risks as conservatively as a novice. I've seen very capable people beg off taking the obviously better hard road because there's a marginally workable easy road. This isn't laziness, either; in many cases the easy road takes more work, even if the difficulty is lower. It's really a kind of risk aversion, or more accurately a lack of trust in their own abilities. And the result isn't failure, it's mediocrity.

This is part of the reason underconfidence can be worse than overconfidence; taking on too much might mean you fail and have to correct your behaviour, but taking on too little means you never have a chance to really succeed. And how would you even know?

I believe the antidote is to learn to love the feeling of cliff jumping, of knowing that you're taking a big risk and that what you're doing feels suicidal and certainly looks suicidal from the outside. But you know something everyone else doesn't: beneath you is a DeLorean, and you're going to fly up and surprise the shit out of them.