I was thinking today about the way that reposts survive on sites like Reddit. You might think that there's no value in posting something that's already been posted earlier, and thus the existence of reposts reveals a flaw in the ranking system in some way. However, I'm not convinced this is necessarily the case.
Firstly, people can sometimes appreciate being reminded of something, like an old joke that you've forgotten still makes you laugh when you hear it again. I'd call that individual forgetfulness. But there's also a second kind: over time new users join the site, older users drift away, and often users will miss new content as it comes in. The result is that even if individuals had perfect memory, the group would lose information. I'd call that population forgetfulness.
We have a fairly robust system for managing individual forgetfulness: spaced repetition. You review each thing you want to remember at exponentially growing intervals, and you tune the growth rate for each item depending on how difficult it is to remember. This is a promising idea for managing population forgetfulness; could we generalise it to groups?
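To make the individual case concrete, here's a rough sketch of that kind of schedule. The function names, the one-day starting gap and the step sizes are all made up for illustration; real spaced repetition systems use more careful update rules.

```python
def review_schedule(first_gap_days, multiplier, reviews):
    """Return the gap in days before each of the next `reviews` repetitions.

    Each gap is the previous gap times `multiplier`, so gaps grow exponentially.
    """
    gaps = []
    gap = first_gap_days
    for _ in range(reviews):
        gaps.append(gap)
        gap *= multiplier
    return gaps


def adjust_multiplier(multiplier, remembered, step=0.15, floor=1.1):
    """Nudge an item's multiplier up after a success and down after a failure.

    The step and floor values are arbitrary assumptions, not tuned numbers.
    """
    if remembered:
        return multiplier + step
    return max(floor, multiplier - step)


# An easy item (large multiplier) spreads out much faster than a hard one.
easy_gaps = review_schedule(1.0, 2.5, 5)
hard_gaps = review_schedule(1.0, 1.5, 5)
```

The only per-item state is the multiplier, which is what makes the idea look portable to a group: you'd keep one multiplier per piece of content rather than per person.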
To do that you'd first need a way to effectively measure how well the population remembers something. For something like Reddit, you could possibly rely on the repost's score. In the more general case, you could probably use a sampling technique and survey people on their memories. Either way, you'd then need some extra statistical trickery to turn that into the right factor for your spaced repetition exponent. Presumably the tuning would then look like targeting a certain level of confidence that a given fraction of the population still remembers.
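Here's one guess at what that statistical trickery could look like for the survey case. I'm assuming a simple yes/no survey and using the lower end of a Wilson score interval as the "confident" estimate of recall; the 80% target, the step size and the function names are placeholders I picked for the sketch.

```python
import math


def wilson_lower_bound(remembered, sampled, z=1.96):
    """Lower end of the Wilson score interval for the fraction who remember.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if sampled == 0:
        return 0.0
    p = remembered / sampled
    denom = 1 + z * z / sampled
    centre = p + z * z / (2 * sampled)
    margin = z * math.sqrt(p * (1 - p) / sampled + z * z / (4 * sampled * sampled))
    return (centre - margin) / denom


def next_multiplier(multiplier, remembered, sampled, target=0.8, step=0.2, floor=1.1):
    """Wait longer before the next repeat only if we're confident recall beats target."""
    if wilson_lower_bound(remembered, sampled) >= target:
        return multiplier + step
    return max(floor, multiplier - step)
```

With these numbers, 95 out of 100 people remembering lengthens the gap, while 8 out of 10 shortens it even though the raw fraction matches the target, because a sample that small doesn't give enough confidence.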
Anyway, it occurs to me that once you start thinking about the more general problems you can solve, a lot of them turn out to be pretty unsavoury. For example, how often you should show someone an ad seems like it would be modelled fairly well by population spaced repetition. Similarly, how often should you repeat a message as a repressive government in order to indoctrinate people? I guess it would work for that too.
Well, hopefully it can be used more for good than for evil. Or, if not, I hope we come up with some decent defenses against it.
A friend remarked the other day that if you want to make a lot of things, it's worth spending a lot of time on your tools. With my recent prototyping kick I've been noticing how often I seem to be repeating a fairly similar sequence of setup steps. I've been mostly messing around with shiny web technology things, so the setup mostly involves local webservers and CoffeeScript build scripts, but each kind of project tends to have its own standard setup process.
It occurs to me that depending on the balance of new vs existing projects in your work, the total cost of setup would be vastly different. If you tend to work on multi-year-long ongoing projects, really any degree of setup cost is unlikely to matter. On the other hand, a 37signals-style web consultancy business will probably see multiple new projects a month. So it's important to keep that cost down. However, even that is a very different calculation compared to creating a new project each day, or even multiple per day.
It might sound excessive, but I think making multiple projects per day can actually be a pretty good way to do things. If you're looking for new ideas and trying a few different designs, or you want to write code in a highly decoupled (dare I say microservice) style, or you want to validate your assumptions with some throwaway code before you go all-in – all of these are great reasons to create new projects early and often.
But for that to make sense you need your new project creation process to be really efficient. If it takes, say, 15 minutes I think that's still too long. Ideally it'd be under a minute from deciding to make a new project to being able to start meaningfully working on it. I'm nowhere near that point at the moment, but I think it could be feasible with the right set of creative tools.
I think the biggest improvement would be something like a palette of semi-reusable code chunks. When I find myself doing the same thing a few times in different projects I could drop a copy of that repeated code in the palette and then pull it out the next time I need it. I'd want to be able to do that at different scales – from single lines of code to whole files all the way up to multiple files spread across different directories.
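As a very rough sketch of the file-level end of that idea, assume the palette is just a directory where each named chunk is a folder of files stored at the paths they should land at in a project. The function names and layout here are invented for illustration, and it doesn't attempt the single-line scale at all.

```python
import shutil
from pathlib import Path


def save_chunk(palette_dir, name, project_dir, relative_paths):
    """Copy the given files from a project into the palette under `name`."""
    chunk_dir = Path(palette_dir) / name
    for rel in relative_paths:
        dest = chunk_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(Path(project_dir) / rel, dest)


def apply_chunk(palette_dir, name, project_dir):
    """Copy every file in a chunk into a project, keeping relative paths.

    Returns the relative paths that were written, in sorted order.
    Existing files at those paths are overwritten.
    """
    chunk_dir = Path(palette_dir) / name
    written = []
    for src in sorted(chunk_dir.rglob("*")):
        if src.is_file():
            rel = src.relative_to(chunk_dir)
            dest = Path(project_dir) / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            written.append(rel.as_posix())
    return written
```

Because the chunks are plain copies rather than shared libraries, each project is free to edit its copy afterwards, which is what I mean by semi-reusable.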
There'd be a lot of tricky work involved to make something like that work well, but I think it'd be pretty useful. The less friction for creating a new project, the easier it is to create and the more experimental you can be.
Things can seem pretty difficult at times. I'm worried about money, or my work isn't going well, or I'm just in a bad mood. I think about what life must be like for people who don't have these problems, and I feel envious of that. How easy it must be to be wealthy, happy and successful! Perhaps you are immediately jumping to say "but everyone has problems!" Yes, perhaps. But is it so difficult to accept that for some people life is just better than it is for you? Is it so impossible that there is someone out there who, for no good reason, has your life plus a little bit more?
I think the more interesting response is to ask: how weak is your imagination that the only kind of better you picture is you plus a million dollars? What about you plus a thousand limbs? Plus a brain the size of a planet? What about you plus a galaxy of robot servants capable of rendering a creation so expansive that the sum of today's humanity couldn't even comprehend it? What about your mind modified to be in a state of pure, absolute bliss without beginning or end?
It seems evident in these moments that there is so much – infinitely much – that we don't have. That we will never have. And yet the thought that I may never rule the universe or transcend time and space itself doesn't really bother me on a day-to-day basis. And if that doesn't bother me, perhaps there is no reason to be bothered that my life isn't better in other ways either.
There are a lot of things out there, and to distinguish them from each other it's necessary to have some kind of reference that identifies them. I'd like to make an argument that there are really only two ways to do that: Names and IDs. I believe they are distinctly different, even opposite, and that attempts to mix them in forms like usernames or logins result in name-like IDs that are inelegant, ineffective and user-hostile.
An ID is something that uniquely identifies a resource. It is designed primarily for the use of machines and consequently it is not necessary that it be human-readable or meaningful in any way. A name, by contrast, is a representation of how humans identify things. A thing can have more than one name, a name can refer to different things depending on context, and the right name for something depends on the person.
I first encountered this dichotomy ages back on Freenet, where there were two main kinds of identifiers: Content Hash Keys (CHKs) and Keyword Signed Keys (KSKs). The former are defined by their content, and are thus a perfect kind of ID: that content will always have that ID, and that ID can only ever be assigned to that content. The latter, on the other hand, used keys derived from a simple word like "gpl.txt". That was more like a name, but unfortunately still had some ID-like semantics. A KSK was expected to map to exactly one resource, even though anyone could define one. The predictable thing happened and someone eventually replaced "gpl.txt" with goatse.
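The difference is easy to show in miniature. This is not Freenet's actual key format, just an illustration that uses a plain SHA-256 hash as the content-derived ID and an ordinary dictionary as the keyword store; both names are made up.

```python
import hashlib


def content_id(content: bytes) -> str:
    """An ID derived purely from the content: same bytes, same ID, always."""
    return hashlib.sha256(content).hexdigest()


class KeywordStore:
    """A keyword maps to exactly one resource, but anyone may set it."""

    def __init__(self):
        self._by_keyword = {}

    def put(self, keyword: str, content: bytes) -> None:
        # Last writer wins: nothing ties the keyword to the original content.
        self._by_keyword[keyword] = content

    def get(self, keyword: str) -> bytes:
        return self._by_keyword[keyword]


store = KeywordStore()
store.put("gpl.txt", b"GNU GENERAL PUBLIC LICENSE ...")
original_id = content_id(store.get("gpl.txt"))

# Someone else reuses the keyword for something else entirely.
store.put("gpl.txt", b"something else entirely")
replaced_id = content_id(store.get("gpl.txt"))
```

The keyword still resolves, but to different content, and the content-derived IDs are the only thing that reveals the swap.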
Web addresses are another example of names vs IDs. DNS maps human-meaningful domain names to machine-meaningful IP addresses. An IP address is fairly ID-like, though not totally unique (the same site can usually be reached through multiple IPs, though there are interesting ideas to change this). However, what really lets down the internet is the structure of domain names.
Domain names, much like Freenet's KSKs, are an attempt to define a human-readable but still unique name-like ID. The result is something that doesn't function well as either. To prevent the GPL-to-goatse problem, we maintain at great expense a global database of domain-to-IP mappings and treat them as purchasable property. The result? Valve, the multi-billion-dollar videogame company and creators of Steam, the biggest online game store, own neither valve.com nor steam.com. In fact, at the time of this writing, neither site has anything on it. What a total failure at representing name-like semantics.
On the other hand, Googling "Steam" or "Valve" will give you the right result. Not only that, but if you're a plumber and you spend all day searching for plumber things, you'll get personalised results that are more likely to suit your plumbery interests. The result is that Google is a better name system than DNS, and user behaviour reflects it. It's very common now to type the name of a website into Google to find it, with sometimes hilarious results.
I believe this is part of the reason for the runaway success of Google. Search, as it's usually defined, is a process of finding information. Queries like "best Beatles album" or "Kim Kardashian's baby" are examples of finding information. But we also use search as a lookup, the same way you would have once looked up a name in a phone book, or a book in a library index. What set Google apart was that it was fast enough, and simple enough, to function not just as a search engine, but as a universal name lookup service.
Other systems can learn a lot from search engines in their name handling. For example, on Facebook you don't really use usernames to refer to people. Instead, you have a user ID that's just a big meaningless string of numbers, and your real-world name. When you want to look someone up, you either have a link to their profile (using their ID) or, more commonly, you just type their name in the search box and they appear. Those names aren't unique, and they're not universal - they change depending on the context and the person doing the searching.
But the thing you have to give up with names is a sense of ownership. If your name is Chris and you join a group with another Chris, there's no trademark dispute resolution process, you just both get called Chris. Or maybe one of you, whoever's better known, gets to be Chris and the other becomes "Other Chris". But these rules aren't designed to protect the primacy of your name as a piece of intellectual property, they're fluid and based around whoever is associated most strongly with the name in a given situation.
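Here's a small sketch of what name handling along those lines might look like: IDs are unique and meaningless, names are shared, and a lookup ranks the candidates by how strongly the person searching is associated with each one. The class, its methods and the idea of an interaction count as the ranking signal are all my own assumptions, not how Facebook or Google actually do it.

```python
from collections import defaultdict


class NameIndex:
    """Many-to-many mapping between names and IDs, ranked per searcher."""

    def __init__(self):
        self._ids_by_name = defaultdict(set)
        # (searcher_id, target_id) -> how strongly the searcher is tied to the target
        self._affinity = defaultdict(float)

    def add_name(self, name, target_id):
        """Names are not unique: several IDs can share one name."""
        self._ids_by_name[name.lower()].add(target_id)

    def record_interaction(self, searcher_id, target_id, weight=1.0):
        self._affinity[(searcher_id, target_id)] += weight

    def lookup(self, name, searcher_id):
        """Return every ID with this name, most relevant to the searcher first."""
        candidates = self._ids_by_name.get(name.lower(), set())
        return sorted(
            candidates,
            key=lambda t: (-self._affinity.get((searcher_id, t), 0.0), t),
        )


index = NameIndex()
index.add_name("Chris", 1001)
index.add_name("Chris", 1002)
index.record_interaction(searcher_id=7, target_id=1002)
```

Nobody owns "Chris" here. Searcher 7 gets 1002 first because that's the Chris they know, and anyone else just gets both in a neutral order.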
Most of our name-like IDs, things like usernames and domain names, are compromises because of weaknesses in computers or human-computer interfaces. In many cases, the computing power and system sophistication just wasn't there in the early days of software to allow for handling names properly. Usernames date back to the Unix logins of the 70s, and DNS to the 80s. Back then it would have been not only computationally difficult to do proper name searching, but difficult to build UI for doing name lookups that would be responsive enough. And if your only method for exchanging IDs in the real world is writing them down or saying them out loud, it's important that they be memorable.
However, those restrictions are way out of date now, and we have more than enough resources to revisit those compromises. Modern multi-user desktops let you select a user from a list rather than type a login. Modern website lookups are mostly done through Google. And I think other name-like IDs will also lose their relevance as we build new systems that supplant the old. The next big frontier is website logins, which a lot of different companies are trying to own.
My hope is that once this particular internet turf war is over and we leave behind our current balkanised mess for a universal notion of identity, we can take on the big dog of ugly name-like IDs: email. Can you imagine if, instead of messaging an arbitrary series of characters, you just message a person? What a triumph of what over how!
I got partway through writing my post last night, but it turned out to be significantly longer than I expected. This has happened a couple of times and normally I just eventually cut my losses and start over with something shorter, but I ran out of time to do that.
I usually have some idea of what will turn out to be a long post rather than a short one, so I think the right way to prevent this in future is to give those some consideration in advance, and maybe set aside time specifically for the ones that will be longer. I have a few half-finished longer posts from earlier, so I think it would be helpful for resurrecting those as well.