Open Library architecture

You’ve no doubt already heard about the Open Library demo site from the Internet Archive, brainchild of Brewster Kahle and Aaron Swartz. I think it’s a really exciting project, and I’m sure I’ll have more to say about it soon.

One thing that struck me as interesting is a technical detail. On the “About the technology” page, there’s this tidbit:

We wanted a database that could hold tens of millions of records, that would allow random users to modify its entries and keep a full history of their changes, and that would hold arbitrary semi-structured data as users added it. Each of these problems had been solved on its own, but nobody had yet built a technology that solved all three together.

So we created ThingDB (tdb), a new database framework that gives us this flexibility. ThingDB stores a collection of objects, called “things”. For example, on the Open Library site, each page, book, author, and user is a thing in the database. Each thing then has a series of arbitrary key-value pairs as properties. […] Each collection of key-value pairs is stored as a version, along with the time it was saved and the person who saved it. This allows us to store full semi-structured data, as well as travel back thru time to retrieve old versions of it.
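
To make that description a little more concrete, here is a minimal Python sketch of the idea as I read it: things that carry arbitrary key-value pairs, with every save kept as a version alongside its timestamp and author. This is not ThingDB's actual code or API. The names `Thing`, `Version`, `save`, `latest`, and `at` are all hypothetical, the history lives in an in-memory list rather than a real database, and the sample book data is made up.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Version:
    """One saved snapshot of a thing's key-value pairs (hypothetical)."""
    properties: Dict[str, Any]
    saved_at: datetime
    saved_by: str


@dataclass
class Thing:
    """An object with a full edit history (illustrative, not ThingDB's API)."""
    name: str
    versions: List[Version] = field(default_factory=list)

    def save(self, properties: Dict[str, Any], saved_by: str,
             saved_at: Optional[datetime] = None) -> int:
        """Store a new version and return its 1-based revision number."""
        when = saved_at or datetime.now(timezone.utc)
        # Copy the dict so later changes by the caller cannot rewrite history.
        self.versions.append(Version(dict(properties), when, saved_by))
        return len(self.versions)

    def latest(self) -> Dict[str, Any]:
        """Return a copy of the most recently saved properties, or {} if none."""
        return dict(self.versions[-1].properties) if self.versions else {}

    def at(self, when: datetime) -> Dict[str, Any]:
        """Return the properties as they stood at `when`."""
        candidates = [v for v in self.versions if v.saved_at <= when]
        if not candidates:
            return {}
        return dict(max(candidates, key=lambda v: v.saved_at).properties)


# Made-up sample data: two edits to one book record by two users.
t1 = datetime(2007, 7, 1, tzinfo=timezone.utc)
t2 = datetime(2007, 7, 2, tzinfo=timezone.utc)
book = Thing("example-book")
book.save({"title": "Example Title"}, "alice", t1)
book.save({"title": "Example Title", "pages": 200}, "bob", t2)

print(book.latest())  # current state, including the "pages" key
print(book.at(t1))    # state after the first edit only
```

Because each save appends a whole snapshot instead of overwriting the previous one, "travelling back in time" amounts to picking the newest version at or before the requested moment.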

This sounds really interesting. It also reminds me very much of Maya’s u-forms (pdf), aside from the fact that the identifiers aren’t UUIDs. I’m not really database-savvy enough to know much about the underlying infrastructure that makes any of this happen, so my interest is something like an ape staring at a power drill, but I still thought it worth noting.
