How to install Cassandra + Thrift (and why you should care)

Mike Peters, 04-05-2010
Cassandra is a decentralized, highly available, fault-tolerant database with fast reads and writes that lets you scale out well beyond what's practical with a traditional RDBMS like MySQL.

Index optimization, database denormalization, replication and sharding are great techniques to squeeze more juice out of MySQL...

But eventually, as your tables and queries grow, you're going to hit a brick wall.

Storing huge amounts of data with MySQL is easy. But when it comes time to retrieve those records using filters, sorts and joins, you'll be lucky if you can ever scale beyond 1 million records (without aggressive sharding and memcached) while still maintaining high front-end speeds.

With companies like Digg, Facebook and Twitter adopting Cassandra for workloads that outgrew MySQL, this is one technology you should become intimately familiar with.

Key features of Cassandra:

* Fault Tolerant: Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
* Decentralized: Every node in the cluster is identical. There are no network bottlenecks. There are no single points of failure.
* Flexible: Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.
* Highly Available: Writes and reads offer a tunable ConsistencyLevel, all the way from "writes never fail" to "block for all replicas to be readable," with the quorum level in the middle.
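
To make the quorum level concrete: a quorum is a majority of the replicas that hold a key, so its size depends on the replication factor. Here is a minimal shell sketch of that arithmetic. The `quorum` function is a name made up for this illustration and is not part of Cassandra.

```shell
# quorum RF: size of a majority for a replication factor of RF.
# Hypothetical helper for illustration; not a Cassandra command.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

for rf in 1 3 5; do
  echo "replication factor $rf: quorum is $(quorum "$rf")"
done
```

With a replication factor of 3, a quorum read or write needs 2 replicas, so one replica can be down without the operation failing.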

Understanding Cassandra Data Model

Keyspace = Database name. (Usually one per application)

Column = One Table cell (field name, field value, timestamp)

{ // this is a column
name: "emailAddress",
value: "[email protected]",
timestamp: 123456789
}

Important: Columns are always sorted by their name. Sorting supports BytesType, UTF8Type, LexicalUUIDType, TimeUUIDType, AsciiType and LongType. Each of these options treats the column name as a different data type, so the same names can sort in a different order.
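
The choice of comparator matters because the same column names come back in a different order depending on how they are compared. The sketch below is only an analogy built on the standard `sort` utility: byte-wise ordering stands in for BytesType or AsciiType, and numeric ordering stands in for LongType. Cassandra itself compares the stored name bytes, not lines of text, and the function names here are invented for the example.

```shell
# Illustration only: these helpers are not Cassandra tools.
sort_as_bytes() { LC_ALL=C sort; }   # compare names byte by byte
sort_as_long()  { sort -n; }         # compare names as numbers

names='10
9
2
100'

echo "byte-wise order:"
printf '%s\n' "$names" | sort_as_bytes
echo "numeric order:"
printf '%s\n' "$names" | sort_as_long
```

Byte-wise, "10" and "100" sort before "2"; numerically, 2 comes first. If your column names are numbers or timestamps, that difference decides which columns a range query returns.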

SuperColumn = A named group of Columns (roughly, one row in a table).

{ // this is a super column
name: "homeAddress",
// with an infinite list of columns
value: {
street: {name: "street", value: "1234 x street", timestamp: 123456789},
city: {name: "city", value: "san francisco", timestamp: 123456789},
zip: {name: "zip", value: "94107", timestamp: 123456789},
}
}

Important: Supercolumns are always sorted by their name. Internally the columns inside each super column are also sorted by their name.

ColumnFamily = Table holding Columns. (Structure that contains an infinite number of Rows)

UserProfile = { // this is a ColumnFamily
phatduckk: { // this is the key to this Row inside the CF
// now we have an infinite # of columns in this row
username: "phatduckk",
email: "[email protected]",
phone: "(900) 976-6666"
}, // end row
ieure: { // this is the key to another row in the CF
// now we have another infinite # of columns in this row
username: "ieure",
email: "[email protected]",
phone: "(888) 555-1212",
age: "66",
gender: "undecided"
},
}

SuperColumnFamily = Table holding SuperColumns. Similar to ColumnFamily, but in this case every "row" holds SuperColumns.

AddressBook = { // this is a ColumnFamily of type Super
phatduckk: { // this is the key to this row inside the Super CF
// the key here is the name of the owner of the address book

// now we have an infinite # of super columns in this row
// the keys inside the row are the names for the SuperColumns
// each of these SuperColumns is an address book entry
friend1: {street: "8th street", zip: "90210", city: "Beverley Hills", state: "CA"},

// this is the address book entry for John in phatduckk's address book
John: {street: "Howard street", zip: "94404", city: "FC", state: "CA"},
Kim: {street: "X street", zip: "87876", city: "Balls", state: "VA"},
Tod: {street: "Jerry street", zip: "54556", city: "Cartoon", state: "CO"},
Bob: {street: "Q Blvd", zip: "24252", city: "Nowhere", state: "MN"},
...
// we can have an infinite # of SuperColumns (aka address book entries)
}, // end row
ieure: { // this is the key to another row in the Super CF
// all the address book entries for ieure
joey: {street: "A ave", zip: "55485", city: "Hell", state: "NV"},
William: {street: "Armpit Dr", zip: "93301", city: "Bakersfield", state: "CA"},
},
}
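
One way to picture the nesting is as a directory tree: ColumnFamily, then row key, then SuperColumn, then Column. The sketch below is purely an analogy using the file system. `put_column` and `get_column` are hypothetical helpers, and nothing here talks to Cassandra.

```shell
# Analogy only: ColumnFamily/row key/SuperColumn/Column as nested directories.
# put_column and get_column are hypothetical helpers, not Cassandra tools.
store=$(mktemp -d)

put_column() {  # put_column CF ROWKEY SUPERCOLUMN COLUMN VALUE
  mkdir -p "$store/$1/$2/$3" && printf '%s\n' "$5" > "$store/$1/$2/$3/$4"
}

get_column() {  # get_column CF ROWKEY SUPERCOLUMN COLUMN
  cat "$store/$1/$2/$3/$4"
}

put_column AddressBook phatduckk John street "Howard street"
put_column AddressBook phatduckk John zip 94404
put_column AddressBook phatduckk Kim zip 87876

get_column AddressBook phatduckk John zip    # one column value
ls "$store/AddressBook/phatduckk"            # the SuperColumns in this row
ls "$store/AddressBook/phatduckk/John"       # the Columns in one SuperColumn
```

As with columns in Cassandra, `ls` lists the entries sorted by name, and every lookup needs the full path down to the column.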

--

1. Getting started

Download the latest version of Cassandra from this page.

Use the Bin version. No need to recompile.

mkdir -p /usr/tmp
cd /usr/tmp
fetch "http://apache.raffsoftware.com/cassandra/0.5.1/apache-cassandra-0.5.1-bin.tar.gz"
tar xvfz apache-cassandra-0.5.1-bin.tar.gz

2. Download JRE

Download and install the version of JRE matching your system (32bit or 64bit) from the Java SE Download page.

3. Configure Cassandra

cd /usr/tmp/apache-cassandra-0.5.1/conf
vi storage-conf.xml

Update the ClusterName to something meaningful.

Update the folders where the database is to be stored: CommitLogDirectory, DataFileDirectories, CalloutLocation and StagingFileDirectory.
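
If you would rather script the ClusterName change than edit it by hand, something like the following could work. It is a sketch under assumptions: `set_cluster_name` is a hypothetical helper, it expects the `<ClusterName>` element to sit on a single line, and the new name must not contain `|`, `&` or `\`. The example runs against a throwaway file, not your real config.

```shell
# Hypothetical helper: rewrite the <ClusterName> element of a storage-conf.xml.
# Assumes the element is on one line and NAME contains no '|', '&' or '\'.
set_cluster_name() {  # set_cluster_name FILE NAME
  tmp="$1.tmp.$$"
  sed "s|<ClusterName>.*</ClusterName>|<ClusterName>$2</ClusterName>|" "$1" > "$tmp" &&
    mv "$tmp" "$1"
}

# Try it on a throwaway sample file first.
conf=$(mktemp)
printf '%s\n' '<Storage>' '  <ClusterName>Old Name</ClusterName>' '</Storage>' > "$conf"
set_cluster_name "$conf" "MyCluster"
grep '<ClusterName>' "$conf"
```

Writing to a temporary file and moving it into place avoids relying on `sed -i`, which behaves differently on FreeBSD and Linux.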

4. Set JAVA_HOME

Before you can start Cassandra, set the JAVA_HOME environment variable to the root directory of the Java JRE/JDK on your machine (the directory that contains bin/java).

For example:

# csh/tcsh
setenv JAVA_HOME /usr/home/admin/htdocs/services/java/jdk7_64
# sh/bash
export JAVA_HOME=/usr/home/admin/htdocs/services/java/jdk7_64

5. Start Cassandra

cd /usr/tmp/apache-cassandra-0.5.1/
bin/cassandra -f

If you get any errors about ClassNotFound, it means there's something wrong with your JAVA_HOME environment variable. Make sure it is pointing to the proper JRE/JDK directory on your machine.
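
A quick sanity check before starting Cassandra can save some head-scratching. The sketch below only tests that the directory contains an executable `bin/java`. `check_java_home` is a hypothetical helper written for this post, not something shipped with Cassandra or Java.

```shell
# Hypothetical helper: check that a directory looks like a JRE/JDK root.
check_java_home() {  # check_java_home DIR
  if [ -z "$1" ]; then
    echo "JAVA_HOME is not set"
    return 1
  fi
  if [ ! -x "$1/bin/java" ]; then
    echo "no executable found at $1/bin/java"
    return 1
  fi
  echo "JAVA_HOME looks usable: $1"
}

check_java_home "${JAVA_HOME:-}" || echo "fix JAVA_HOME before running bin/cassandra"
```

If the check fails, JAVA_HOME is most likely pointing at the bin directory itself or at a path that does not exist.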

6. Install Thrift

Now that we have one Cassandra Node running, let's move on to interface with our new database.

Thrift is a software framework for scalable cross-language services development. Cassandra supports Thrift, thereby allowing integration across multiple programming languages and platforms.

As part of this guide we'll use Thrift over PHP.

First, we need to install Boost.

cd /usr/ports/devel/boost
make all
make install

We need automake and autoconf:

cd /usr/ports/devel/automake110
make all
make install
cd /usr/ports/devel/autoconf262
make all
make install

Now, assuming you meet Thrift requirements, we can proceed.

You can download the latest version of Thrift from this page.

cd /usr/tmp/
fetch "http://apache.raffsoftware.com/incubator/thrift/0.2.0-incubating/thrift-0.2.0-incubating.tar.gz"
tar xvfz thrift-0.2.0-incubating.tar.gz
cd thrift-0.2.0
./bootstrap.sh
./configure --with-boost=/usr/local
make
make install

If you're on FreeBSD, you can instead install Thrift from the FreeBSD ports, which take care of the bootstrap and configure steps for you:

cd /usr/ports/devel/thrift
make all
make install

7. Interfacing with Cassandra

Creating a record in Cassandra using Thrift:

/* Insert some data into the Standard1 column family from the default config */

// Assumes the Thrift-generated Cassandra classes are already included
// and that $client is a connected CassandraClient.

// Keyspace specified in storage-conf.xml
$keyspace = 'Keyspace1';

// reference to specific User id
$keyUserId = "1";

// Constructing the column path that we are adding information into.
$columnPath = new cassandra_ColumnPath();
$columnPath->column_family = 'Standard1';
$columnPath->super_column = null;
$columnPath->column = 'email';

// Timestamp for update
$timestamp = time();

// Consistency level for this write (ONE = wait for one replica to acknowledge)
$consistency_level = cassandra_ConsistencyLevel::ONE;

// Add the value to be written to the table, User Key, and path.
$value = "[email protected]";
$client->insert($keyspace, $keyUserId, $columnPath, $value, $timestamp, $consistency_level);

Check out our Cassandra PHP Wrapper for an easier way to interface with Cassandra from PHP, or refer to the Thrift Examples.

Mike Peters, 04-07-2010
When using Thrift over PHP, if you're getting weird "TSocket: timed out reading 4 bytes" errors, make sure you patch your TSocket.php as follows:


diff --git a/TSocket.php b/TSocket.php
index ba3a631..ae4c6ab 100644
--- a/TSocket.php
+++ b/TSocket.php
@@ -257,9 +257,11 @@ class TSocket extends TTransport {
       stream_set_timeout($this->handle_, 0, $this->recvTimeout_*1000);
       $this->sendTimeoutSet_ = FALSE;
     }
+    $md = stream_get_meta_data($this->handle_);
+    if ($md['unread_bytes'] > 0 && $md['unread_bytes'] < $len) $len = $md['unread_bytes'];
     $data = @fread($this->handle_, $len);
     if ($data === FALSE || $data === '') {
       $md = stream_get_meta_data($this->handle_);
       if ($md['timed_out']) {
         throw new TException('TSocket: timed out reading '.$len.' bytes from '.
                              $this->host_.':'.$this->port_);

Also, make sure you are using Framed transport for Cassandra.

In your storage-conf.xml, set ThriftFramedTransport to true:

<ThriftFramedTransport>true</ThriftFramedTransport>

And change this line in your code, from:


$this->transport = new TBufferedTransport($this->socket, 1024, 1024);

To:


$this->transport = new TFramedTransport($this->socket, 1024, 1024);
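
For context, Thrift's framed transport sends each message with a 4-byte, big-endian length prefix in front of it, which is why the client and the server have to agree on whether framing is in use. The sketch below just shows how that prefix is derived from a payload length. The function name is invented for this example and the payload is an arbitrary string, not a real Thrift message.

```shell
# frame_header_hex LENGTH: the 4-byte big-endian length prefix, shown as hex.
# Hypothetical helper for illustration only.
frame_header_hex() {
  printf '%08x\n' "$1"
}

payload='hello world'
frame_header_hex "${#payload}"   # prints 0000000b (payload is 11 bytes)
```

A reader that expects this prefix and a writer that does not send it (or the other way round) will misread the first bytes of every message.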

Mike Peters, 04-08-2010
Still getting unexplained "timed out reading 4 bytes" errors?

Please install version 0.6 or higher; it reports the real error messages, so you can tell what's wrong.

Version 0.5 will always spit out the "timed out reading 4 bytes" regardless of what the error is.

Jeremy Hutchings, 04-08-2010
Step 7 is a PHP file (though with no includes) that you run, or some other command interface ?

Not getting that bit right is likely why I'm getting :

-------
2010-04-08 08:36:58 CassandraDB ERROR: Keyspace SPI does not exist in this schema.
2010-04-08 08:36:58 CassandraDB ERROR: Keyspace SPI does not exist in this schema. Array ( )
-------

When running your demo on the later post ?

Mike Peters, 04-08-2010
Jeremy -

When using the SPI Cassandra PHP Wrapper, update your Cassandra conf/storage-conf.xml and set the cluster name to SPI. The default is TestServer.

This is the line to update:


<ClusterName>SPI</ClusterName>

Jeremy Hutchings, 04-10-2010
Thank you for taking the time to answer, this a great couple of posts and making my learning curve of this technology which I *must* figure out a lot easier :)

I checked the conf.xml and I think it might of been :

<Keyspace Name="SPI">

Opposed to cluster name, though now I'm thinking there is other config I have to do to get the hello world working as mytable isn't set up :

2010-04-10 02:05:27 CassandraDB ERROR: unconfigured columnfamily mytable
2010-04-10 02:05:27 CassandraDB ERROR: unconfigured columnfamily mytable Array ( )

Is step 7 a PHP script that you run to prepare the Cassandra data store ?

Adrian Singer, 04-12-2010
Did you setup your Cassandra conf/storage-conf.xml? The CF mytable used in this example should be defined as:


<ColumnFamily CompareWith="BytesType" Name="mytable"/>

Jeremy Hutchings, 04-13-2010
Now I did, thanks :)

Was rushing a bit there opposed to spending the time to learn the config file.

Up and running and testing now, going to add some more nodes and see what happens.

Landscapers Web Design, 05-07-2010
Its good, interesting post.

Pieter Maes, 06-21-2010
Thanks a lot for the "TSocket: timed out reading 4 bytes" fix!
you spared me a headache!

Alberane, 07-25-2010
Excellent article.
I'll do some testing with cassandra and try to carry my experiences here. Thank you.

Mike Peters, 07-29-2010
To build a PHP Thrift interface for Cassandra:

./thrift -gen php /cassandra/interface/cassandra.thrift

(Replace /cassandra/interface with the folder where you have Cassandra installed)

Suku, 08-05-2012
Very nice tutorial... but I searching on php and cassandra I came across to something like PDO will be required for windows installation. Can you please tell m how to install it in windows.