
Create an ultra fast, distributed & resilient datastore with Riak

In one project I am working on, I needed to store a *large* amount of data. My main constraints were that the store should hold data as a simple key/value pair collection and be blazing fast, quick to learn & highly resilient. I also tend to avoid Swiss Army knives and prefer simple tools that do one simple thing, but do it extremely well.

Enter Riak, from the Basho team. What is Riak?

“A Riak cluster is masterless, automatically redistributes data when you scale, and keeps data available when physical machines fail. It stores data as key/value pairs, has a simple operational model, and comes with an HTTP API and many client libraries.” – Basho.com

Riak is an open source, distributed database written in Erlang. Its main features are Availability, Fault-Tolerance, Operational Simplicity and Scalability. Riak is used by companies like BestBuy, Ideeli, OpenX, Bump, Kiip, Yammer and the list goes on.

PART 1/ RIAK (FIRST) NODE INSTALL

First, we will install Riak on Debian 6.0. You can check Basho’s installation instructions for more details. It’s actually very simple: first, get the signing key from Basho via curl:

curl http://apt.basho.com/gpg/basho.apt.key | apt-key add -

Then you just have to add the Basho repository to your apt sources list:

bash -c "echo deb http://apt.basho.com $(lsb_release -sc) main > /etc/apt/sources.list.d/basho.list"
apt-get update

Now you are ready to install Riak:

apt-get install riak

Now we should be ready to start Riak:

riak start

You can check that Riak is running with the following command. Riak should reply “pong” if your node is running, or “pang” otherwise:

riak ping

You can also check that Riak is working using this simple command:

curl -v http://127.0.0.1:8098/riak/test

For more commands, don’t forget to check Basho’s quick start doc here.

At this point, your Riak installation is bound to localhost, but we want our Riak node to be reachable from other machines. Everything in Riak is configured in the app.config file, including the addresses the node listens on. If, like me, you encounter the “ulimit” warning, please follow instructions here.

You might want a web interface to manage your nodes and your datastore content. For that, you can install Rekon to browse your datastore. It takes only one command line and a second to install; simply type the following command:

curl -s -L rekon.basho.com | sh

And we’re done: you can now access your content through a web interface. No web server needed.

PART 2/ LET’S CREATE OUR CLUSTER A.K.A “THE RING”

Figure: Riak cluster architecture

A Riak cluster is architected as a ring. In our example, we will set up 5 nodes. This is what the Riak team suggests as a minimum for a Riak cluster. We’ll see in Part 3 why 5 is a good minimum. As you can see on the diagram, the ring covers a key space of 2^160 positions, split into partitions that are shared among the nodes (for comparison, there are “only” 2^128 possible IPv6 addresses). For the time being, let’s start with five :)

So we need to repeat our node install (described in Part 1) on 5 different physical (or virtual) machines. Once we are done, we need to make each additional Riak node join our cluster with the following command:

riak-admin cluster join riak@xxx.xxx.xxx.xxx

xxx.xxx.xxx.xxx stands for the IP of your first node. Run this command on each of the other hosts. If everything went well, you should see a message saying “Success: staged join request for…” You can now plan & commit your changes. Once again, you just invoke riak-admin:

riak-admin cluster plan
riak-admin cluster commit
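To make the ring from the diagram a bit more concrete, here is a minimal Python sketch of how a bucket/key pair could be placed on a 2^160 ring and mapped to a partition. It is an illustration, not Riak's actual code: joining bucket and key with "/" is an assumption standing in for Riak's internal encoding, the ring size of 64 is Riak's documented default, and starting the replica walk at the key's own partition is a simplification.

```python
import hashlib

RING_BITS = 160   # SHA-1 produces 160-bit digests, hence the 2^160 key space
RING_SIZE = 64    # assumed partition count (Riak's documented default ring size)


def ring_position(bucket: str, key: str) -> int:
    """Hash a bucket/key pair to an integer position in [0, 2**160)."""
    # Assumption: "bucket/key" stands in for Riak's internal encoding of the pair.
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def partition_index(position: int, ring_size: int = RING_SIZE) -> int:
    """Return which equally sized slice of the ring a position falls into."""
    partition_width = 2 ** RING_BITS // ring_size
    return position // partition_width


def replica_partitions(first: int, n_val: int = 3, ring_size: int = RING_SIZE) -> list:
    """Walk clockwise from the first partition, wrapping around the ring."""
    return [(first + step) % ring_size for step in range(n_val)]


if __name__ == "__main__":
    first = partition_index(ring_position("EURUSD", "20130809"))
    print("first partition:", first, "replicas on:", replica_partitions(first))
```

Because the partitions are spread over the physical nodes, consecutive partitions usually live on different machines, which is what lets the cluster lose a machine without losing data.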

From this point, you can check your ring status with the riak-admin status command or, if you prefer something more graphical, I advise you to try Riak Control. Enabling Riak Control (on your first node, for example) gives us a graphical view of our nodes. I was pleasantly surprised by the Riak Control GUI: it is extremely simple, with a neat user interface.

Figure: Riak Control web graphical user interface

Your Riak Control should be available at https://riak.myfirsthost.com:8069/admin#/cluster

PART 3/ PLAYING WITH RIAK

You can find client libraries for almost any language. The cool thing is that you can test drive your Riak cluster with simple curl commands. In my examples, I decided to go with C#. There’s a great Riak driver for .NET: CorrugatedIron, which is available from NuGet. Just fire the following command:

Install-Package CorrugatedIron

After you install the NuGet package in your project, it should create a new section in your app.config (or web.config) where you can specify all your nodes. This section should look like the following:


<!-- Add all five of your nodes here -->

You are now ready to write some code to store (and retrieve :) data. Riak storage is organized by Buckets. Think of Buckets as logical folders where you would store your collections of Key/Value pairs (ex: Users, Orders, Messages, Tweets, Sessions…), you get the idea. Inside a bucket, every key must be unique, and looking up by key is the fastest way to store/read your data. First, let’s connect to Riak and check if it is alive.


var cluster = RiakCluster.FromConfig("riakConfig");
var client = cluster.CreateClient();

RiakResult result = client.Ping();

if (result.IsSuccess)
{
    Console.WriteLine("Riak is alive !");
}
else
{
    Console.WriteLine("Something seems wrong.. Really.");
    return;
}

As you can see, the code is pretty straightforward. Now let’s try to write something.


// Write
StockPrice sp = new StockPrice();
sp.dayClose = 1.3338M;
sp.dayMin = 1.3345M;
sp.dayMax = 1.3391M;
sp.time = new DateTime(2013, 8, 9);

// Bucket "EURUSD", key "20130809", value = the StockPrice object
var o = new RiakObject("EURUSD", sp.time.ToString("yyyyMMdd"), sp);

client.Put(o);

You will notice that we don’t need to create a bucket: if we write a key/value pair to a bucket that doesn’t exist, Riak creates it for us. Buckets are case sensitive, so make sure to respect the case in your bucket/key naming. Here I have a bucket where I store all my EUR/USD prices for each day. I decided to use the yyyyMMdd date format as the key.

What about updating? Well, it’s exactly the same code. Updating is simply writing an object to an existing key. To check that your data is written or updated, you can simply check the following URL, which should output your data as JSON: http://riak.myfirsthost.com:8098/buckets/EURUSD/keys/20130809
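The URL above follows a simple pattern: host, port 8098, bucket name, then key. As a small Python sketch (the helper names `daily_key` and `object_url` are mine, purely for illustration), this builds the same yyyyMMdd key and the matching URL for any day:

```python
from datetime import date
from urllib.parse import quote


def daily_key(day: date) -> str:
    """Format a date as yyyyMMdd, the key convention used in this article."""
    return day.strftime("%Y%m%d")


def object_url(host: str, bucket: str, key: str, port: int = 8098) -> str:
    """Build the HTTP URL of one object, percent-encoding bucket and key."""
    # Bucket and key are case sensitive, so they are passed through untouched.
    return f"http://{host}:{port}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"


if __name__ == "__main__":
    print(object_url("riak.myfirsthost.com", "EURUSD", daily_key(date(2013, 8, 9))))
```

Fixed-width date keys like this also sort chronologically, which is handy when you list or generate keys for a date range yourself.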

If you want something more graphical, you can use Rekon which we installed in Part 1. Rekon should be available at the following URL : http://riak.myfirsthost.com:8098/buckets/rekon/keys/go#/buckets/EURUSD/keys/20130809

Note: For a production setup, do not install Rekon, as it should never be used in production mode (check the GitHub page here).

Reading from C# is really simple as well, as you can expect:

DateTime dt = new DateTime(2013, 8, 9);

var response = client.Get("EURUSD", dt.ToString("yyyyMMdd"));

if (response.ResultCode == ResultCode.Success)
{
    // The generic argument tells the client which type to deserialize into
    StockPrice curSp = response.Value.GetObject<StockPrice>();
    Console.WriteLine("Stock price found : " + curSp.dayClose);
}
else if (response.ResultCode == ResultCode.NotFound)
{
    Console.WriteLine("Item not found");
}

The client.Get() function takes 2 parameters: the bucket name and the key. Here I use the “yyyyMMdd” date as my key. Also note that my StockPrice class is a POCO: no need to decorate your class or inherit from anything. Just a plain & simple class.

OK, it’s fun, but what’s so special about it? Well, you have probably noticed that in your Riak response object, you have 3 variables called N, R and W. Remember in Part 2 when we created 5 nodes for our cluster?

PART 4/ THE ROAD TO RESILIENCE

Resilience is not magical; it always comes with some trade-offs. From the CAP theorem, we know that a distributed system cannot guarantee Consistency, Availability and Partition tolerance all at the same time. In order to ensure your data is safe, a Riak cluster stores your data on several nodes. As a result, when reading or writing data, you can also specify how many nodes must have answered before the operation returns (completes).

These 3 parameters are known as N, R and W on Riak.

N is the number of nodes that should store a copy of your data. R is the number of nodes that should reply to a read request before returning your data. W is the number of nodes that should acknowledge your write request before completing. The larger N, the safer your data is, but the more space you need (actual stored data will be datasize * N).
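As a quick back-of-the-envelope check of the space cost, here is a tiny Python sketch. The function names are hypothetical, and the figures ignore any per-object overhead added by the storage backend.

```python
def stored_bytes(data_size_bytes: int, n_val: int) -> int:
    """Raw bytes kept across the cluster when every object is written n_val times."""
    if n_val < 1:
        raise ValueError("n_val must be at least 1")
    return data_size_bytes * n_val


def copies_you_can_lose(n_val: int) -> int:
    """How many replicas of one object can disappear before its data is gone."""
    return n_val - 1


if __name__ == "__main__":
    ten_gb = 10 * 1024 ** 3
    print(stored_bytes(ten_gb, 3) // 1024 ** 3, "GB on disk for 10 GB of data with N=3")
```

With N=3, 10 GB of data takes roughly 30 GB across the cluster, and any single object survives the loss of two of its copies.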

The reason R & W exist is simple to get too. Since a Riak cluster is a masterless ring of nodes and replication can be asynchronous, it is possible for subsequent reads/writes to lack consistency. The higher R, the more nodes have to reply, so theoretically the slower the read is. On the other hand, the surer you are about consistency in your read operation. The same remains true for W with writes.
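One common rule of thumb for picking R and W is that a read is sure to touch at least one replica holding the latest write when R + W > N. The Python sketch below checks this rule by brute force over every possible choice of replicas. It is a simplified model that assumes no node failures and no concurrent writes, and the function names are mine, not part of any Riak client.

```python
from itertools import combinations


def quorum(n_val: int) -> int:
    """Smallest majority of n_val replicas."""
    return n_val // 2 + 1


def read_overlaps_write(n_val: int, r: int, w: int) -> bool:
    """Rule of thumb: read and write sets must share a replica when R + W > N."""
    return r + w > n_val


def always_overlap(n_val: int, r: int, w: int) -> bool:
    """Brute force: does every R-sized read set intersect every W-sized write set?"""
    replicas = range(n_val)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


def tolerated_failures(n_val: int, r: int, w: int) -> dict:
    """Replicas that may be unreachable while reads and writes still complete."""
    return {"read": n_val - r, "write": n_val - w}


if __name__ == "__main__":
    n = 3
    print("quorum for N=3:", quorum(n))
    print("R=2, W=2 overlaps:", always_overlap(n, 2, 2))
    print("R=1, W=1 overlaps:", always_overlap(n, 1, 1))
    print("failures tolerated with R=2, W=2:", tolerated_failures(n, 2, 2))
```

With N=3, choosing R=2 and W=2 keeps reads and writes overlapping while still tolerating one unreachable replica, whereas R=1 and W=1 is faster but can return stale data.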

As you can see, there are no “ideal” parameters; how to set them will largely depend on what you do with your data and on your constraints. As you may have guessed, these settings are not cluster-wide but bucket-wide, which means that depending on the bucket, you may wish to set different N/R/W parameters. Here’s some code to query the N/R/W values from our “EURUSD” bucket.


var bucketProperties = client.GetBucketProperties("EURUSD");

Console.WriteLine("NVal : {0}", bucketProperties.Value.NVal);
Console.WriteLine("RVal : {0}", bucketProperties.Value.RVal);
Console.WriteLine("WVal : {0}", bucketProperties.Value.WVal);
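If you would rather change these values over HTTP than from C#, Riak's HTTP interface accepts bucket properties as a JSON document sent with a PUT to /buckets/EURUSD/props. The Python sketch below only builds and validates that JSON body; the helper name and the validation rule (R and W between 1 and N) are my own assumptions, and sending the request is left to curl or your HTTP client of choice.

```python
import json


def bucket_props_body(n_val: int, r: int, w: int) -> str:
    """Build the JSON body for a bucket properties update, rejecting impossible values."""
    if n_val < 1:
        raise ValueError("n_val must be at least 1")
    for name, value in (("r", r), ("w", w)):
        # Asking more replicas to answer than exist can never succeed.
        if not 1 <= value <= n_val:
            raise ValueError(f"{name} must be between 1 and n_val ({n_val})")
    return json.dumps({"props": {"n_val": n_val, "r": r, "w": w}})


if __name__ == "__main__":
    print(bucket_props_body(3, 2, 2))
```

It is safest to settle on N before loading data into a bucket, since copies that were already written are not redistributed the moment the value changes.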

It is now up to you to decide what your constraints are and to set the right parameters for your usage, so that you end up with a rock solid data store.

CONCLUSION

Riak is not a silver bullet, but if you want to easily create a simple NoSQL datastore that is fast, distributed and resilient, it is certainly a solid candidate. My thoughts are that such a data store should always be used with great caution. Unlike with SQL databases, having no integrity checks on the database side means that it is now up to the developer to take care of them. With Great Power Comes Great Responsibility :)

In this article, we barely scratched the surface of what Riak can do. If you want to learn more about Riak and dig a bit deeper into its fundamentals, I recommend reading the awesome book “A Little Riak Book” by Eric Redmond, which is a great introduction. Mathias Meyer also released the second edition of his Riak Handbook. I will probably post more on Riak later, covering Map/Reduce usage.

Until then, happy cool product building !