I really like CouchDB. It is a NoSQL database, but really in the "not only SQL" sense. It is not designed to work across thousands of nodes like Cassandra, nor to handle petabytes of data like Hadoop. But it works really well for providing real-time results of moderately complex queries run on data that seldom changes.

Indexed views

At its root it is a simple document store, where JSON documents are stored against a simple ID:

document_1 => { key: "value 1", count: 10 }
document_2 => { key: "value 2", count: 5 }

But its real value comes from indexed views. A view is the result of a map/reduce query run on the stored documents, which allows quite complex processing. It is indexed on keys, to allow efficient retrieval of results as well as cursor navigation.
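To make the idea concrete, here is a minimal sketch in Python (not CouchDB's own JavaScript view functions) of a view as a key-sorted list of rows produced by a map step and queried with a reduce over a key range. The names map_doc, build_view and query, and the extra document_3, are assumptions made for illustration only.

```python
from bisect import bisect_left, bisect_right

# Hypothetical documents, mirroring the two shown above plus one more.
documents = {
    "document_1": {"key": "value 1", "count": 10},
    "document_2": {"key": "value 2", "count": 5},
    "document_3": {"key": "value 1", "count": 3},
}

def map_doc(doc):
    """Map step: emit (key, value) rows for one document."""
    yield doc["key"], doc["count"]

def build_view(docs):
    """Run map over every document and keep the rows sorted by key."""
    rows = [row for doc in docs.values() for row in map_doc(doc)]
    return sorted(rows)

def query(view, start_key, end_key, reduce_fn=sum):
    """Reduce the values whose key lies in [start_key, end_key]."""
    keys = [k for k, _ in view]
    lo = bisect_left(keys, start_key)   # first row with key >= start_key
    hi = bisect_right(keys, end_key)    # one past the last row with key <= end_key
    return reduce_fn(v for _, v in view[lo:hi])

view = build_view(documents)
print(query(view, "value 1", "value 1"))  # prints 13
print(query(view, "value 1", "value 2"))  # prints 18
```

Because the rows are sorted by key, a range query only has to locate two positions in the index, which is the property that makes retrieval and cursor navigation cheap.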

When a document is updated, all views are patched incrementally:

  1. map is run on the new version of the document
  2. events emitted on the previous version of this document are discarded
  3. the tree of results is reduced again only for the branches of the B-tree with modified events, untouched branch results are kept identical

This keeps results up to date without triggering the map/reduce process manually, and is very efficient if documents do not change too often.
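The update steps can be sketched as a toy model. This is an illustration in Python under simplifying assumptions, not CouchDB's actual implementation: instead of a real tree, it caches one reduction per key, so a "branch" here is just a key. The IncrementalView class and its reduce_calls counter are hypothetical names.

```python
from collections import defaultdict

class IncrementalView:
    """Toy view: rows grouped by key, with one cached reduction per key."""

    def __init__(self, map_fn, reduce_fn):
        self.map_fn = map_fn
        self.reduce_fn = reduce_fn
        self.rows_by_doc = {}                   # doc id -> rows it emitted
        self.values_by_key = defaultdict(list)  # key -> emitted values
        self.reduced = {}                       # key -> cached reduction
        self.reduce_calls = 0                   # counts re-reductions

    def update(self, doc_id, doc):
        dirty = set()
        # Discard the rows emitted by the previous version of the document.
        for key, value in self.rows_by_doc.pop(doc_id, []):
            self.values_by_key[key].remove(value)
            dirty.add(key)
        # Run map on the new version of the document.
        rows = list(self.map_fn(doc))
        self.rows_by_doc[doc_id] = rows
        for key, value in rows:
            self.values_by_key[key].append(value)
            dirty.add(key)
        # Reduce again only the keys that changed; others keep their cache.
        for key in dirty:
            values = self.values_by_key[key]
            if values:
                self.reduced[key] = self.reduce_fn(values)
                self.reduce_calls += 1
            else:
                self.reduced.pop(key, None)
                del self.values_by_key[key]

view = IncrementalView(lambda d: [(d["key"], d["count"])], sum)
view.update("document_1", {"key": "value 1", "count": 10})
view.update("document_2", {"key": "value 2", "count": 5})
calls_before = view.reduce_calls
view.update("document_2", {"key": "value 2", "count": 7})
print(view.reduced)                       # prints {'value 1': 10, 'value 2': 7}
print(view.reduce_calls - calls_before)   # prints 1
```

Updating document_2 re-reduces only "value 2"; the cached result for "value 1" is left untouched.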

Prepared result set

The first use case I personally like is that the result of a complex query can be computed once. A recent case I had was recording weights as suggested by The Hacker’s Diet: averaging over the last week to smooth out variations.

In many other systems the result would have to be recomputed on each and every query (for example here an avg() combined with a WHERE clause in SQL), even though the content does not change that often. Another solution is to put a cache on top of the store, which then needs to be pruned on updates and will redo the whole computation for all entries.

With CouchDB, updates cost only a small amount of CPU time because only the entries in the averaging range are touched, and this happens automatically.
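As a rough illustration of why an update is cheap, the following Python sketch keeps one precomputed 7-day average per day and, when a weight is recorded or corrected, refreshes only the averages whose window contains that day. It models the reasoning, not CouchDB internals; the weights, the dates, and the names window_average and record are made up for the example.

```python
from datetime import date, timedelta

WINDOW = 7  # days in the smoothing window

def window_average(weights, end_day):
    """Average of the recorded weights in the WINDOW days ending on end_day."""
    days = [end_day - timedelta(days=i) for i in range(WINDOW)]
    values = [weights[d] for d in days if d in weights]
    return sum(values) / len(values)

def record(weights, averages, day, weight):
    """Store a weight and refresh only the averages whose window contains it."""
    weights[day] = weight
    touched = [day + timedelta(days=i) for i in range(WINDOW)]
    for d in touched:
        if d in weights:
            averages[d] = window_average(weights, d)
    return touched

weights, averages = {}, {}
start = date(2024, 1, 1)
for i, w in enumerate([80.0, 80.4, 79.8, 80.2, 79.6, 79.9, 79.5, 79.3]):
    record(weights, averages, start + timedelta(days=i), w)

# Correcting the first entry touches 7 averages at most; day 8 is not recomputed.
touched = record(weights, averages, start, 81.0)
print(len(touched))
```

However many days are stored, a single correction never touches more than WINDOW averages.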

Augmented result set

A second use case is access to computed properties. I recently found myself thinking that it would be great to have a collision-resistant hash of a model.

In most systems the solution is to pollute the functional model with additional, purely technical properties that have to be updated on every model change, and that may become obsolete as the system evolves.

With CouchDB, a view can be created that adds the computed property: nobody will ever forget to update it, and if a new version of the system needs it in another form, the view can be modified accordingly. The model is kept clean.
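A minimal sketch of the idea, written in Python rather than as an actual CouchDB view function: the map step emits the stored document together with a hash derived from its content, so the stored model never carries the technical field. The choice of SHA-256 over a canonical JSON serialization, and the names map_with_hash and content_hash, are assumptions for illustration.

```python
import hashlib
import json

def map_with_hash(doc_id, doc):
    """Emit the document together with a hash derived from its content."""
    # Sorted keys and fixed separators give a stable serialization,
    # so the same content always yields the same digest.
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    yield doc_id, {**doc, "content_hash": digest}

stored = {"key": "value 1", "count": 10}   # stored model: no hash field
(_, augmented), = map_with_hash("document_1", stored)

print("content_hash" in stored)      # prints False
print("content_hash" in augmented)   # prints True
```

If a later version needs another hash function or another set of fields, only the map function changes; the stored documents stay as they are.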

Limitations

So of course CouchDB has limitations: replication is mostly manual, sharding is not possible due to the design itself, and, most important to me, there is no way to chain map/reduce operations (for that, the Cloudant clone can be used).

But it has its use cases.