Sam Gentle.com

Distributed data pattern

Diagram of the distributed data pattern

I've recently been working on yet another distributed system, and I noticed a pattern that I've seen sometimes but that I wish I could see more of. I think it provides a useful lens for thinking about and designing distributed systems.

To start with, you have local data and remote data. You address data using a scope, which is some way of addressing that data. Think URLs, database queries, document ids, that kind of thing.

On top of that, you have events. These are messages attached to a particular scope that you access using a pub/sub pattern.

Lastly, you have updates that are implemented in terms of events. That is to say, updates are a special kind of event to synchronise the data associated with that scope.

Here are some varying levels of this pattern in real systems:

IRC
Channels are scopes. IRC protocol messages are events. JOIN/PART/NAMES are update events to synchronise user lists. TOPIC/MODE synchronise other channel state.
CouchDB
Database names and document ids or view functions are scopes. Replication (the _changes feed) implements update events. However, no other events are possible so there's no way to send messages to other people watching the same document.
RabbitMQ
Queues and topics are scopes. Events are implemented as publish/subscribe. However, there's no notion of persistent data attached to the scope.
Redis
Database numbers and keys are scopes. PUBLISH/SUBSCRIBE commands implement events, but they are not filtered by scope. The various update commands are not implemented as events, so there's no way to watch for database changes.
HTTP/Web
URLs are scopes. You can use server-sent events or websockets to implement events, but they're not scoped and not pub/sub. There's also no connection between HTTP documents and those streams - they accept different urls and you have to manage that mapping manually.

Ultimately, most things I work on that don't implement this pattern end up needing it to be reimplemented in some way, either by implementing events on top of the database (as in Couch), building your own mapping between events and data (as in Redis), or just doing whatever and hoping it works out (as in the web)