CouchDB – The Schemaless Database

Most corporate databases used today are relational in nature. You have tables, columns, rows, column indexes, relationships between tables defined through key columns, and so on. All of this means you are dealing with structured data. Think of banking systems, mortgage systems, and customer and order tracking systems. Relational databases serve these systems well.

Now think about the systems that many of us deal with nowadays: unstructured or semi-structured data such as that found on social sites (Facebook), wikis, blogs, and news sites. This data is clearly not relational in nature; it is document-centric. While some RDBMSs can store this type of data, it is not a natural fit for them.

Such sites also frequently need to change the structure of their documents and add new metadata to them. In the relational world you would have to add new columns to existing tables, which generally affects all the rows that already exist. If you've been there, you know what I mean. Another aspect that most of these document databases encourage is running a distributed database on cheap commodity hardware. When you apply a processing model such as MapReduce on top of such a distributed database, you can get very impressive response times, because the work is split across many machines and runs in parallel.

In comes a family of “databases” that are document-oriented and schema-less. Schema-less here means just that: no fixed tables or columns, no predefined structure, no foreign keys. Forget all of that. With names such as CouchDB, MongoDB, Cassandra, and SimpleDB, they definitely catch one's attention. One such database is CouchDB. Let's take a look at it using some real examples.

Documents in CouchDB

In CouchDB you store documents. A document is represented using JSON. An example is:

{
  "blogname": "Introduction to CouchDB",
  "publishdate": "5/5/2011",
  "categories": ["schemaless", "document-oriented", "nosql"],
  "blogtext": "blog text goes here"
}

Each entry has the form "<field>": <value>, and entries are separated by commas.

Your application will have many such documents, each with a unique id. Let's say you have 100,000 of them, and then a new requirement means you need to add 10 more fields to the document above. You can simply include those fields in all new documents going forward; the existing 100,000 documents do not need to change. This is a big difference from the relational world.
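A minimal Python sketch of what "no migration" means on the application side, assuming documents are handled as plain JSON objects. The second document and its "tags" field are hypothetical, standing in for the new fields; the older document is never rewritten, and code that reads it simply defaults the missing field.

```python
import json

# An older document, stored before the new requirement existed.
old_doc = json.loads("""{
  "blogname": "Introduction to CouchDB",
  "publishdate": "5/5/2011",
  "categories": ["schemaless", "document-oriented", "nosql"],
  "blogtext": "blog text goes here"
}""")

# A newer document with an extra (hypothetical) "tags" field.
# Nothing about old_doc has to change for this document to live alongside it.
new_doc = {
    "blogname": "Views in CouchDB",
    "publishdate": "6/1/2011",
    "categories": ["nosql"],
    "blogtext": "more blog text",
    "tags": ["mapreduce", "views"],
}


def tags_of(doc: dict) -> list:
    """Read the optional "tags" field, treating older documents as having none."""
    return doc.get("tags", [])


for doc in (old_doc, new_doc):
    print(doc["blogname"], tags_of(doc))
```

The trade-off is that handling of "missing" fields moves from the database schema into the code that reads the documents.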

While the data is unstructured, you often need to query it, for example for reporting. CouchDB provides views for this, written in JavaScript and executed by the SpiderMonkey JavaScript engine. The one big area where CouchDB stands out is the way you access the database: it provides an HTTP-based RESTful API to the datastore. The standard HTTP methods are used: GET to retrieve, PUT to insert or update a document with a known id, POST to insert a document with a server-generated id, and DELETE to delete data. Responses are returned in JSON, which makes sense since the data itself is stored in JSON format. Finally, CouchDB is a distributed database. Each peer can hold a full copy of the database, and replication synchronizes changes between servers, transferring only what changed (the deltas). This is actually quite amazing.

 

Install CouchDB on Ubuntu 11.04

To install on Ubuntu 11.04, run:

>> sudo apt-get install couchdb

This installs and starts the CouchDB server. To test it, open the URL http://127.0.0.1:5984 in your browser. You should get a JSON response like:

{"couchdb":"Welcome","version":"1.0.1"}

Next, go to http://127.0.0.1:5984/_utils to open Futon, the web-based administration application. You can also send the test request from the terminal using curl:

>> curl http://127.0.0.1:5984

If curl is not installed, install it with “sudo apt-get install curl”.
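The same check can be scripted. This is a minimal Python sketch using only the standard library, assuming the server listens on the default address used in this post; the function names are hypothetical. It performs the same GET as the curl command and confirms that the reply is CouchDB's welcome message.

```python
import json
import urllib.request

COUCH_URL = "http://127.0.0.1:5984"  # default address used throughout this post


def parse_welcome(body: str) -> str:
    """Return the server version from CouchDB's welcome JSON."""
    info = json.loads(body)
    if info.get("couchdb") != "Welcome":
        raise ValueError(f"not a CouchDB welcome response: {info!r}")
    return info["version"]


def check_server(url: str = COUCH_URL) -> str:
    """GET the server root, the same request as `curl http://127.0.0.1:5984`."""
    # urlopen raises urllib.error.URLError if nothing is listening on the port.
    with urllib.request.urlopen(url) as resp:
        return parse_welcome(resp.read().decode("utf-8"))
```

With the server running, check_server() should return the version string reported by your installation.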

 

Work With CouchDB

For the rest of this post I will use curl to interact with CouchDB. From a Java application you could use a JAX-RS client instead. For starters, we need to create a database (of course).

  • Create the database: curl -X PUT http://127.0.0.1:5984/blogs
    • Response: {"ok":true}
  • To delete the database, send the same request with -X DELETE: curl -X DELETE http://127.0.0.1:5984/blogs
  • Insert a new document: curl -X PUT http://127.0.0.1:5984/blogs/nosqlblog -d '{}'
    • Response: {"ok":true,"id":"nosqlblog","rev":"1-967a00dff5e02add41819138abb3284d"}
  • You just created an empty document with the unique id "nosqlblog". CouchDB also created a revision id for you. CouchDB never updates a document in place: if you issue an update for "nosqlblog", CouchDB stores a new revision of it with a new revision id, and reads always return the latest revision. Older revisions can be discarded by compacting the database.
  • If you want a generated unique identifier, run 'curl -X GET http://127.0.0.1:5984/_uuids' to get one and use it in your PUT request.
  • Retrieve the document: curl -X GET http://127.0.0.1:5984/blogs/nosqlblog
    • Response: {"_id":"nosqlblog","_rev":"1-967a00dff5e02add41819138abb3284d"}
  • Update the document (supply the current revision in _rev; a stale revision is rejected with 409 Conflict, and the request body replaces the whole document): curl -X PUT http://127.0.0.1:5984/blogs/nosqlblog -d '{"_rev":"1-967a00dff5e02add41819138abb3284d","blogname":"Introduction to CounchDB"}'
    • Response: {"ok":true,"id":"nosqlblog","rev":"2-3b492c6ac21d09bf35830457848773f2"}
  • Retrieve the document: curl -X GET http://127.0.0.1:5984/blogs/nosqlblog
    • Response: {"_id":"nosqlblog","_rev":"2-3b492c6ac21d09bf35830457848773f2","blogname":"Introduction to CounchDB"}
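The curl steps above translate directly to any HTTP client. Below is a minimal Python sketch using urllib.request, assuming a local server at the default address and a fresh database (re-creating an existing database fails with 412 Precondition Failed). The helper names are hypothetical. The key point it encodes is the revision rule: read the document first, then send the full document back with its current _rev.

```python
import json
import urllib.request

BASE = "http://127.0.0.1:5984"  # assumed local server, as in the curl examples


def couch_request(method: str, path: str, doc=None) -> urllib.request.Request:
    """Build (without sending) a request like the curl commands above."""
    data = None if doc is None else json.dumps(doc).encode("utf-8")
    req = urllib.request.Request(BASE + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(req: urllib.request.Request) -> dict:
    """Send a request and decode CouchDB's JSON reply.

    urlopen raises urllib.error.HTTPError for error statuses, e.g. 409 Conflict
    when an update carries a stale _rev.
    """
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def with_changes(current: dict, **fields) -> dict:
    """Body for an update: a PUT replaces the whole document, so start from the
    full current document (including _id and _rev) and overlay the changes."""
    body = dict(current)
    body.update(fields)
    return body


def create_and_update() -> dict:
    """Replay the steps above: create the database, add an empty document,
    read it back, then update it using the revision just read."""
    send(couch_request("PUT", "/blogs"))
    send(couch_request("PUT", "/blogs/nosqlblog", {}))
    current = send(couch_request("GET", "/blogs/nosqlblog"))
    send(couch_request("PUT", "/blogs/nosqlblog",
                       with_changes(current, blogname="Introduction to CouchDB")))
    return send(couch_request("GET", "/blogs/nosqlblog"))
```

Building the body from the document just read, rather than from a hard-coded revision, is what keeps the update from being rejected as a conflict.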

 

Using Views in CouchDB

Next, let's try out views to actually query this data. Views are to CouchDB what SQL is to relational databases, and they are written in JavaScript. Before we create our own view, here is how to run one that is available out of the box.

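Query the built-in _all_docs view to list every document's id and current revision, along with the total row count (the output below also includes a second document, someblog, that was added to the database):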

>> curl -X GET http://127.0.0.1:5984/blogs/_all_docs

Response:

{"total_rows":2,"offset":0,"rows":[
{"id":"nosqlblog","key":"nosqlblog","value":{"rev":"2-3b492c6ac21d09bf35830457848773f2"}},
{"id":"someblog","key":"someblog","value":{"rev":"1-fb5af6a59c9624a7908315f43900116d"}}
]}
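Because the response is plain JSON, pulling out the current revision of each document takes only a few lines. A small Python sketch that parses the response shown above (embedded as a string, so no server is needed); the function name is hypothetical.

```python
import json

# The _all_docs response shown above.
SAMPLE = """{"total_rows":2,"offset":0,"rows":[
{"id":"nosqlblog","key":"nosqlblog","value":{"rev":"2-3b492c6ac21d09bf35830457848773f2"}},
{"id":"someblog","key":"someblog","value":{"rev":"1-fb5af6a59c9624a7908315f43900116d"}}
]}"""


def current_revisions(body: str) -> dict:
    """Map each document id in an _all_docs response to its current revision."""
    result = json.loads(body)
    return {row["id"]: row["value"]["rev"] for row in result["rows"]}


revs = current_revisions(SAMPLE)
for doc_id, rev in revs.items():
    print(doc_id, rev)
```

By default _all_docs returns only ids and revisions; adding ?include_docs=true to the URL also returns each document's body in a "doc" field of every row.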

Let's create a temporary view using Futon. Navigate to the database and select the option to create a temporary view (temporary views are not recommended for production). You will be presented with a MapReduce entry screen. The map function is called for every document in the database and generates key-value pairs; it can skip documents that are not of interest simply by not emitting anything for them. Reduce then works off that key-value set and aggregates it, for example into a count or a sum.
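To make the map/reduce split concrete, here is a small Python simulation of what a view computes, grouping rows by key the way a view query with group=true does. It is only an illustration of the model: real CouchDB map and reduce functions are written in JavaScript, and CouchDB stores their results in an index rather than recomputing them. The documents and the per-category count are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents shaped like the blog document in this post.
docs = [
    {"_id": "nosqlblog", "categories": ["schemaless", "nosql"]},
    {"_id": "someblog", "categories": ["nosql"]},
    {"_id": "draft"},  # no categories: the map step emits nothing for it
]


def map_doc(doc):
    """Map step: called once per document, emits (key, value) pairs."""
    for category in doc.get("categories", []):
        yield category, 1


def reduce_values(values):
    """Reduce step: aggregate all values emitted under one key (here, a sum)."""
    return sum(values)


def run_view(documents):
    """Run map over every document, group rows by key, then reduce each group."""
    rows = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            rows[key].append(value)
    # Keys come back sorted, as rows in a CouchDB view are ordered by key.
    return {key: reduce_values(rows[key]) for key in sorted(rows)}


print(run_view(docs))
```

Note that the filtering happens in the map step (the "draft" document contributes nothing), while reduce only aggregates.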

To turn this into a permanent view, save it as a design document using the “Save As...” button in Futon.
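A design document is itself just a JSON document whose id starts with "_design/", so it can also be created without Futon. The Python sketch below builds one, assuming a hypothetical design document named "blogs" with a hypothetical view "by_category"; the map function is JavaScript stored as a string, and "_count" is CouchDB's built-in reduce that counts rows per key.

```python
import json

# JavaScript map function, stored as a string inside the design document.
MAP_BY_CATEGORY = """function(doc) {
  if (doc.categories) {
    doc.categories.forEach(function(category) {
      emit(category, null);
    });
  }
}"""

# Hypothetical design document "_design/blogs" with one view, "by_category".
design_doc = {
    "_id": "_design/blogs",
    "language": "javascript",
    "views": {
        "by_category": {
            "map": MAP_BY_CATEGORY,
            "reduce": "_count",  # built-in reduce: number of rows per key
        }
    },
}


def view_url(base: str, db: str, ddoc: str, view: str, group: bool = True) -> str:
    """URL for querying a saved view; group=true reduces per distinct key."""
    url = f"{base}/{db}/_design/{ddoc}/_view/{view}"
    if group:
        url += "?group=true"
    return url


# Print the design document as JSON so it can be redirected into a file.
print(json.dumps(design_doc))
```

Assuming the script is saved as make_design.py (a hypothetical name), running python make_design.py > design.json and then curl -X PUT http://127.0.0.1:5984/blogs/_design/blogs --data-binary @design.json should store the design document. Afterwards, a GET on view_url("http://127.0.0.1:5984", "blogs", "blogs", "by_category") should return one row per category with its count.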

Additional References: