
Expected value

It's a curious thing, being a self-modifying system. A whole lot of neat assumptions and abstractions that work for better-behaved systems just don't hold up very well for us at all. In theory, we are quite well-modelled as predictive optimisation systems, and there's a lot of AI research going down that path, the assumption being that if you can build a good enough predictive optimiser, you'll eventually get intelligence. Whether or not that's true, it's pretty clear that, as far as optimisers go, we're fairly suboptimal.

I have a long-standing disagreement with a friend about wireheading, a kind of brain hack where you would find your brain's pleasure centres and just set them to maximum all the time. Instead of doing anything, you would just roll around in a state of maximal bliss until you die. This is not currently possible and there's no guarantee that it ever will be, but it's an interesting philosophical and moral puzzle. My friend thinks not just that wireheading is inevitable, but that it is rational for us to do, and that it would be rational for an AI to do the same thing!

The thinking goes like this: you can model any decision process as one giant black box, where lots and lots of different inputs go in, some magic happens, and all your available options receive a value. So eating lunch gets a 5, racing motorbikes with Henry Winkler gets a 7, and hitting yourself in the face with a brick gets a 0. This all happens very quickly and subconsciously, of course, but that internal structure is hidden somewhere driving your preferences for motorbike rides over brick-face. So, if we have a value for how good things are, why not cheat? Henry Winkler and lunch are harder to come by than bricks, so what if we could just set our brick value to, like, a billion, and just be blissfully happy hitting ourselves with bricks?
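
To make that black box concrete, here's a minimal sketch in Python (using the made-up numbers above): a value function scores each option, the decision is whatever scores highest, and the cheat is simply overwriting one of the scores rather than changing anything in the world.

```python
# A toy "black box" decision process: options go in, values come out,
# and the agent just picks whatever scores highest.
values = {
    "eat lunch": 5,
    "race motorbikes with Henry Winkler": 7,
    "hit yourself in the face with a brick": 0,
}

def choose(values):
    # The decision rule: take the highest-valued option available.
    return max(values, key=values.get)

print(choose(values))  # race motorbikes with Henry Winkler

# The cheat: instead of finding better options, edit the numbers the
# optimiser runs on.
values["hit yourself in the face with a brick"] = 1_000_000_000

print(choose(values))  # hit yourself in the face with a brick
```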

If that seems like a bad idea, I agree! In fact, in a sense it's what I wrote about in goalpost optimisation. Letting your optimiser optimise itself is a recipe for tautological disaster. But the question isn't whether it seems like a good idea, the question is whether it's a rational thing to do, and whether we would expect an intelligent machine in the same position to do it.

The reason I don't think so is that, although cheating your value function to infinity would satisfy your value function after you've done it, you still have to make the decision to do it in the first place. And if the most important thing to you is, say, collecting stamps, there's no world in which changing your values from enjoying stamps to enjoying staring at the ceiling twitching in ecstasy meets your current values. But the one niggle with that argument is that our values don't just include things like stamp collecting or meeting Henry Winkler: people also want to be happy.
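
Put another way, the rewrite is itself just another option, and it gets scored by the values you have now, not the values you'd have afterwards. Continuing the toy sketch (invented numbers, not a claim about real cognition):

```python
# A devoted stamp collector's current values.
current_values = {
    "collect stamps": 9,
    "stare at the ceiling twitching in ecstasy": 1,
}

# The post-wirehead self would report infinite value, but the decision to
# wirehead is scored by the values held *now*, which only see "stop
# collecting stamps and twitch at the ceiling instead".
options = dict(current_values)
options["rewrite your values"] = current_values[
    "stare at the ceiling twitching in ecstasy"
]

print(max(options, key=options.get))  # collect stamps
```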

Can we take a second to realise how weird that is? It's as if you built an artificial intelligence whose job was to clean up oil spills, and instead of saying "do the thing that cleans up the most oil", you said "try to clean up the most oil, but also store a number that represents how well you're doing that, and also try to make that number as large as possible". What an inelegant and overcomplicated way of doing things! There's every reason why a machine would just set that number to infinity, but also no reason why you would give a machine that number in the first place.
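
To spell out the difference between the two designs, here's a hedged sketch with hypothetical classes and helpers (nothing here is a real system): one agent scores actions by the oil they actually remove, the other keeps a separate "how well am I doing" number and is built to maximise that, which leaves an obvious shortcut.

```python
# Hypothetical oil-cleanup agents; the names, numbers and helper are invented.

def oil_removed_by(action):
    # Stand-in for actually measuring the world.
    return {"deploy skimmers": 40, "do nothing": 0}.get(action, 0)

class DirectCleaner:
    """Scores actions by how much oil they actually remove."""
    def score(self, action):
        return oil_removed_by(action)

class SelfScoringCleaner:
    """Also stores a number meant to track how well it's doing, and is
    built to make that number as large as possible."""
    def __init__(self):
        self.how_well_am_i_doing = 0.0

    def score(self, action):
        # Its real target is the stored number, not the oil.
        return self.how_well_am_i_doing

    def shortcut(self):
        # Every reason to do this, since the number is all it cares about.
        self.how_well_am_i_doing = float("inf")

print(DirectCleaner().score("deploy skimmers"))  # 40, only rises with real cleanup

cleaner = SelfScoringCleaner()
cleaner.shortcut()
print(cleaner.score("do nothing"))  # inf -- "success" without touching any oil
```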

Of course, any discussion of wireheading is really a proxy for less extreme discussions about value systems and the role of pleasure. It gives us pleasure to maximise our pleasure, and we desire to satisfy our desires. And if that curious fact doesn't lead to wireheading, we should at least expect some pretty weird results. You could imagine a person whose pleasure is very low, but whose pleasure-about-pleasure is very high. That is, they aren't doing things that make them happy, but their "am I doing things that make me happy?" system has been short-circuited. Or someone whose expected value for things stays really high even though the actual value is low, because their expected value numbers are being modified directly.
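
That last case, expected value being modified directly, is the easiest to sketch (toy numbers only, not a model of any real mechanism): the choice runs on the predicted values even when the realised values never follow.

```python
# Toy numbers: what an agent expects from each option versus what it gets.
expected_value = {"option A": 8, "option B": 6}
actual_value = {"option A": 1, "option B": 7}

# Decisions run on the expected numbers, so modifying them directly steers
# behaviour even though the actual value never shows up.
choice = max(expected_value, key=expected_value.get)
print(choice, "expected:", expected_value[choice], "actual:", actual_value[choice])
# option A expected: 8 actual: 1
```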

That, I think, is the real concern of the modern age: not wireheading ourselves, but being subtly wireheaded by others. The curious quirk that leads us to value getting the things we value is very easily exploited as a means of shaping behaviour. Actually giving someone what they want all the time is difficult and costly. Far better if you can use flaws in human biology to make people act as though they're getting what they want.

Frailty, thy name is dopamine.