I found an interesting talk by Justin Sheehy on distributed systems via Travis Swicegood’s blog. The presentation is supposed to be about Riak, but it explains general concepts that are often misused.

The first part is about NoSQL in Riak, and justifies the choices they made for replication, relations between objects, the use of map/reduce, and the REST interface.

The first concept covered is the CAP theorem. It is important to remember that the theorem states that we can only guarantee two of consistency, availability and partition tolerance simultaneously. That means we must make compromises on how much of each we guarantee. If the application allows it, these levels can even be changed at runtime, but we never throw one of the three away entirely.
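One way to picture such a tunable compromise is the quorum arithmetic used by Dynamo-style stores. The sketch below is my own illustration, not something taken from the talk: `n`, `r` and `w` are assumed names for the number of replicas, the replicas that must answer a read, and the replicas that must acknowledge a write.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum shares a replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    # With r + w > n, a read always reaches at least one replica
    # that acknowledged the latest successful write.
    return r + w > n


def tolerated_failures(n: int, r: int, w: int) -> dict:
    """Replicas that may be unreachable while reads or writes still succeed."""
    return {"read": n - r, "write": n - w}
```

With three replicas, `r=2, w=2` leans toward consistency (quorums overlap, one failure tolerated), while `r=1, w=1` leans toward availability (two failures tolerated, but a read may miss the latest write). Neither setting discards a property completely; it only moves the dial.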

The second concept is scalability, a word that says nothing on its own: it needs to be qualified by the kind of load the system can handle, and at what cost. There is also an interesting thought about being able to scale down.

The next buzzword explained is distributed. His description is closer to an ideal view of distributed systems, with the capacity to incorporate or lose nodes very easily. In a way, it looks like a P2P system. I think, however, that this comes at a cost, and that a more rigid view may be more useful for some applications.

Another demystified word is reliability: the system can always crash, whatever we do. We just want the system to be as resilient as possible, and we must specify which kinds of failure it is resilient to.

The last expression is eventual consistency. His main point here is that if the compromise imposed by the CAP theorem is made on consistency, there must be application-level logic that defines how to merge data when several versions are found, and that overwriting one version with another is not acceptable as a general solution at the database level.
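To make the difference concrete, here is a minimal sketch contrasting the two approaches. Everything in it is hypothetical: the `Cart` type and the shopping-cart merge rule are assumptions chosen for illustration, not Riak's API or an example from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cart:
    items: frozenset


def last_write_wins(siblings):
    # Database-level shortcut: keep one version and silently drop the others.
    return siblings[-1]


def merge_carts(siblings):
    # Application-level rule (an assumption for this example): the merged cart
    # contains every item that was added on any replica.
    merged = frozenset().union(*(s.items for s in siblings))
    return Cart(items=merged)


# Two replicas accepted different writes during a partition.
a = Cart(frozenset({"book"}))
b = Cart(frozenset({"pen"}))

overwritten = last_write_wins([a, b])  # the "book" is lost
merged = merge_carts([a, b])           # both items survive
```

The union rule only makes sense because the application knows what a cart means; it would, for instance, bring back an item the user had removed on one replica. That is precisely why the merge decision belongs to the application rather than to the database.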