- Andrew Kos
- Bill Burlein
- Bryan Williams
- Christian Vozar
- Jeff Brown
- John Kraus
- Joseph Mak
- Josh Durbin
- Mark Daugherty
- Matt Van Bergen
- Melissa Geoffrion
- Michael Kang
- Michael Chan
- Michael Hodgdon
- Mike Motherway
- Molly McDaniel
- Nadia Maciulis
- Pat McLoughlin
- Paul Michelotti
- Puru Hemnani
- Rohit Srinath
- Ryan Lunka
- Tom Kelly
All Blogs
CITYTECH Blogroll:
How I learned to say ‘No’ to SQL
Wednesday, September 30, 2009
This is the story of how I learned to say ‘No’ to SQL and to cope with my wife’s addiction to Coach bags.
The answer is support groups. I kid. No, I don’t.
There are a number of alternatives to relational databases. However, for the purposes of this post, I am focusing on persistent, distributed key/value stores.
It all seems to have started with Amazon’s Dynamo.
In order to understand how today’s key value stores are borrowing heavily from Dynamo, it is worth briefly going over some of its characteristics.
- Fault Tolerance – The data is replicated and/or partitioned.
- Distributed Hash Table (DHT)
- Consistent Hashing – The mechanism behind distributed hash tables.
- Eventually Consistent
- Distributed Consistency
Consistent Hashing
Consistent hashing is a means of looking up a node based on a key. The idea is that a key is generated for each node. When you do a get/put/delete, you use the key to look up the key of the closest node.
For proper descriptions of consistent hashing check out these blog posts:
- Consistent Hashing
- Programmer’s Toolbox Part 3: Consistent Hashing
- Consistent Hashing in memcache-client
Eventually Consistent
CAP Theorem
- Consistent – The notion that all clients have the same view of the data at any given time.
- Available – The notion that the data can always be viewed by all clients at any given time.
- Partition Tolerance – The notion that the system can tolerate physical network partitions at any given time.
The theorem states that you can only have two of these properties at a time. This is interesting because I’ve found that a number of distributed key value stores are sacrificing consistency for availability and partition tolerance.
This is where the notion of eventually consistent comes into play. The idea is that if there are no further updates, the data will eventually become consistent.
For proper descriptions of the CAP theorem check out these blog posts:
For a proper description of eventually consistent check out this blog post by Werner Vogels (CTO – Amazon).
Distributed Consistency
- Locking
- Multi-Version Concurrency Control (MVCC) - The idea is that each user views a snapshot of the data based on a point in time.
- Vector Clock – The idea is that each node maintains a local copy of the system state (node event counters), as it knows it, and relies on messaging for synchronization.
- Read Repair – The idea is that rather than having the system prevent conflicts, it will rely on the client to resolve them.
Serialization & Protocols
- Byte Array
- JSON (Binary)
- Java Serialization
- protobuf (Google Protocol Buffers)
- Thrift
- HTTP/REST (JSON)
- memcached
Key Value Stores
There seem to be two types of variations.
- Key/Value Store – The value is arbitrary.
- Document Store – The value is semi structured data (e.g. JSON).
Project Voldemort – LinkedIn
Voldemort is a key value store. It has, by far, the best documentation.
- Replicated & Partitioned
- Consistent Hashing
- Eventually Consistent
- Versioned
- Vector Clock with Read-Repair
- JSON (Binary/Typed), String, Java, protobuf (Google Protocol Buffers), byte[]
- Other
- Java
Tokyo Cabinet/Tyrant – Mixi
Tokyo Cabinet is a key value store. Tyrant is the server that provides network access to Cabinet.
While I wouldn’t consider Cabinet to be distributed, I had to add it because it is ridiculously fast.
About 1 million inserts in 0.4 seconds fast.
- Replicated (Tyrant – Asynchronous)
- Locking – Read/Write (Transactions are available via Cabinet, but not Tyrant.)
- Write Ahead Logging/Shadow Paging
- byte[], memcached, HTTP/REST
- Other
- Sorted Keys (B Tree Index)
- C – pwrite, pread, mmap
- Thread Safe (pthreads)
- Cache Flushing – Writes are persisted to an in memory buffer. Once the buffer is full, they are written to disk.
The general consensus is that if your data can fit onto a single server, use Tokyo Cabinet/Tyrant. If not, use something like Voldemort.
Notes
This blog post provides additional information with respect to Tokyo Cabinet/Tyrant. And here is the presentation.
CouchDB is a document store, and because of that it supports search functionality.
- Replicated (Incremental, Shared Nothing, Asynchronous)
- Eventually Consistent
- HTTP/REST (JSON)
- MVCC
- Other
- Sorted keys (B Tree Index)
- Erlang – Concurrency
- Index/Query (Map/Reduce via JavaScript)
Notes
This blog post provides a nice comparison of the differences between CouchDB and Tokyo Cabinent/Tyrant.
Apache Cassandra – Facebook
Cassandra is really a BigTable clone. Think of it as a completely denormalized database. However, it too borrows heavily Dynamo and uses a DHT.
- Replicated & Partitioned
- Eventually Consistent
- Consistent Hashing
- Read-Repair
- Thrift
Notes
You can find two (useful) presentations on Cassandra here and here.
Conclusion
Why are distributed key/value stores gaining momentum? It is simple: performance and scalability.
I like to think of it as an exercise in simplicity. A typical key/value store only supports 3 operations:
- put(key, value)
- get(key)
- delete(key)
That is it!
The problem with databases is that they are difficult to scale horizontally. One solution is sharding. One problem with sharding is that you will likely lose the ‘relational’ aspect due to the performance cost of running a query across several nodes. The other problem is the cost of rebalancing the nodes if additional sharding is needed. Ultimately you will likely end up denormalizing the data. The is the first step on the path to key/value stores.
That being said, there are still use cases where you want to use a database. One case might be search. Typical key/value stores are not going to provide search functionality. However, document stores (CouchDB) and BigTable clones (Cassandra) will. The other resolves around consistency. As mentioned before, these distributed key/value stores are relaxing on consistency in favour of availability and partition tolerance. A common example is banking transactions. I don’t think anyone wants relax on consistency when working with banking transactions.
On the other hand, here is a use case from my own experience where I think a key/value store would be more appropriate.
I used to work on an application for generating and processing insurance applications. There were two steps to this process. The first step was determining if insurance was even available to the applicant. The next step was building a dynamic application for the applicant. For the first step, we maintained a complex set of business rules in our database. Essentially we used the applicant’s zip code and business type to determine if insurance was even available. For the second step, we maintained a list of questions based on the applicant’s insurance type and the carriers available.
The SQL for these operations was quite complex and included stored procedures as a result. A better alternative might have been to denormalize the database and push both the business rules and the applications to a key/value store. That is not to say that we wouldn’t continue to use a database. We would. It is just that the database would be used to enter the business rules/applications. The key/value store would be used to retrieve the business rules/applications. Ultimately, the business rules could be collapsed to a key that is the hash of the applicant’s business type and zip code. The value could be a serialized application. Now we just call get(key) and if the value is null, then there is no insurance available. If it is not, then insurance is available and this is the serialized application. On top of that, the applicant’s can save their application and update it at a later date. So, we might as well persist the application instance to a key/value store as well.
Of course, I have simplified things quite a bit here. However, my point is that this process is not transactional in the sense that banking transactions are. The only writes are for updating the answers, and for all intensive purposes this will be done a single client (the applicant). Concurrency is not really a problem here.
Notes:
This blog post provides a nice summary of the issues surrounding sharding.
Additional Reading – General
Anti-RDBMS: A list of distributed key-value stores
Performance comparison: key/value stores for language model counts
Some Notes on Distributed Key Stores
Evaluating key-value and document stores for short read data
NoSQL: If Only It Was That Easy
Building Scalable Databases: Denormalization, the NoSQL Movement and Digg
Drop ACID and Think About Data
Readings in Distributed Systems – Great List of Papers
Shane Johnson
Recent Posts
- Descriptive JMX Beans in AEM/CQ
- Invisible requirements within Business requirements
- Building a better Options Predicate
- Javascript, This, and You.
- Extensionless URLs with Adobe Experience Manager
- The Life of a Tester in Adobe CQ World!
- Limitations of the CQ Parsys Model and the Implementation of a Nested Paragraph System
- Google Analytics and AEM: No JavaScript? No Problem.
- Using Apache FOP to generate a PDF document based on a form submission data
- Configuring SAML in AEM 5.6