How to install Cassandra + Thrift (and why you should care)
Mike Peters, 04-05-2010
Cassandra is a decentralized, highly available, fault-tolerant database with fast reads and writes that lets you scale out well beyond what's available with a traditional RDBMS like MySQL.
Index optimization, database denormalization, replication and sharding are great techniques to squeeze more juice out of MySQL...
But eventually, as your tables and queries grow, you're going to hit a brick wall.
Storing huge amounts of data with MySQL is easy. But when it comes time to retrieve those records using filters, sorts and joins, you'll be lucky to scale much beyond 1 million records (without aggressive sharding and memcached) while still maintaining high front-end speeds.
With companies like Digg, Facebook and Twitter switching over from MySQL and betting all their cards on Cassandra, this is one technology you should become intimately familiar with.
Key features of Cassandra:
* Fault Tolerant: Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
* Decentralized: Every node in the cluster is identical. There are no network bottlenecks. There are no single points of failure.
* Flexible: Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.
* Highly Available: Writes and reads offer a tunable ConsistencyLevel, all the way from "writes never fail" to "block for all replicas to be readable," with the quorum level in the middle.
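To make the "quorum level in the middle" concrete, here is a minimal sketch of the arithmetic behind it. It assumes a hypothetical replication factor of 3; the function names are made up for illustration and are not part of Cassandra or Thrift. A quorum is a strict majority of replicas, and a read is guaranteed to overlap the latest acknowledged write whenever the number of replicas read plus the number written exceeds the replication factor.

```php
<?php
// Quorum size for a replication factor: a strict majority of the replicas.
function quorum(int $replicationFactor): int
{
    return intdiv($replicationFactor, 2) + 1;
}

// True when every read is guaranteed to touch at least one replica
// that received the latest acknowledged write (R + W > N).
function readsSeeLatestWrite(int $readReplicas, int $writeReplicas, int $replicationFactor): bool
{
    return $readReplicas + $writeReplicas > $replicationFactor;
}

$n = 3; // hypothetical replication factor
$q = quorum($n);

printf("RF=%d quorum=%d\n", $n, $q);
var_dump(readsSeeLatestWrite($q, $q, $n)); // quorum reads + quorum writes
var_dump(readsSeeLatestWrite(1, 1, $n));   // single-replica reads + writes
```

The trade-off is the usual one: lower levels answer faster and survive more failed nodes, higher levels give stronger guarantees about what a read returns.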
Understanding Cassandra Data Model
Keyspace = Database name. (Usually one per application)
Column = One Table cell (field name, field value, timestamp)
{ // this is a column
    name: "emailAddress",
    value: "[email protected]",
    timestamp: 123456789
}
Important: Columns are always sorted by their name. Sorting supports BytesType, UTF8Type, LexicalUUIDType, TimeUUIDType, AsciiType and LongType. Each of these options treats the Columns' name as a different data type.
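Why the comparator matters is easiest to see with a toy example. The sketch below is plain PHP, not Cassandra code: it sorts the same three hypothetical column names once as text (similar in spirit to BytesType or UTF8Type) and once as numbers (similar in spirit to LongType). Real LongType names are stored as 8-byte values, so treat this purely as an illustration of how the ordering changes.

```php
<?php
// Three hypothetical column names.
$columnNames = ['10', '9', '100'];

// Compared as text, character by character.
$asText = $columnNames;
sort($asText, SORT_STRING);

// Compared as numbers.
$asLong = $columnNames;
sort($asLong, SORT_NUMERIC);

echo 'as text:    ', implode(', ', $asText), "\n"; // 10, 100, 9
echo 'as numbers: ', implode(', ', $asLong), "\n"; // 9, 10, 100
```

Since range queries return columns in comparator order, picking the wrong type for your column names changes which columns come back first.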
SuperColumn = A named group of Columns; the closest thing to one row in a table.
{ // this is a super column
    name: "homeAddress",
    // with an infinite list of columns
    value: {
        street: {name: "street", value: "1234 x street", timestamp: 123456789},
        city: {name: "city", value: "san francisco", timestamp: 123456789},
        zip: {name: "zip", value: "94107", timestamp: 123456789},
    }
}
Important: Supercolumns are always sorted by their name. Internally the columns inside each super column are also sorted by their name.
ColumnFamily = Table holding Columns. (Structure that contains an infinite number of Rows)
UserProfile = { // this is a ColumnFamily
    phatduckk: { // this is the key to this Row inside the CF
        // now we have an infinite # of columns in this row
        username: "phatduckk",
        email: "[email protected]",
        phone: "(900) 976-6666"
    }, // end row
    ieure: { // this is the key to another row in the CF
        // now we have another infinite # of columns in this row
        username: "ieure",
        email: "[email protected]",
        phone: "(888) 555-1212",
        age: "66",
        gender: "undecided"
    },
}
SuperColumnFamily = Table holding SuperColumns. Similar to ColumnFamily, but in this case every "row" holds SuperColumns.
AddressBook = { // this is a ColumnFamily of type Super
    phatduckk: { // this is the key to this row inside the Super CF
        // the key here is the name of the owner of the address book
        // now we have an infinite # of super columns in this row
        // the keys inside the row are the names for the SuperColumns
        // each of these SuperColumns is an address book entry
        friend1: {street: "8th street", zip: "90210", city: "Beverley Hills", state: "CA"},
        // this is the address book entry for John in phatduckk's address book
        John: {street: "Howard street", zip: "94404", city: "FC", state: "CA"},
        Kim: {street: "X street", zip: "87876", city: "Balls", state: "VA"},
        Tod: {street: "Jerry street", zip: "54556", city: "Cartoon", state: "CO"},
        Bob: {street: "Q Blvd", zip: "24252", city: "Nowhere", state: "MN"},
        ...
        // we can have an infinite # of SuperColumns (aka address book entries)
    }, // end row
    ieure: { // this is the key to another row in the Super CF
        // all the address book entries for ieure
        joey: {street: "A ave", zip: "55485", city: "Hell", state: "NV"},
        William: {street: "Armpit Dr", zip: "93301", city: "Bakersfield", state: "CA"},
    },
}
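If you think in PHP arrays, a Super ColumnFamily is roughly a three-level map: row key, then SuperColumn name, then Column name. The sketch below models the AddressBook example that way using nothing but plain arrays. The helper names are hypothetical, and `ksort` only imitates the "always sorted by name" behaviour described above; it is not how Cassandra stores data.

```php
<?php
// Return a copy of the map with its keys sorted as strings.
function sortedByName(array $map): array
{
    ksort($map, SORT_STRING);
    return $map;
}

// Sort the super columns of every row, and the columns inside each super column.
function superColumnFamily(array $rows): array
{
    $sorted = [];
    foreach ($rows as $rowKey => $superColumns) {
        $sorted[$rowKey] = sortedByName(array_map('sortedByName', $superColumns));
    }
    return $sorted;
}

$addressBook = superColumnFamily([
    'phatduckk' => [
        'John' => ['street' => 'Howard street', 'zip' => '94404', 'city' => 'FC', 'state' => 'CA'],
        'Kim'  => ['street' => 'X street', 'zip' => '87876', 'city' => 'Balls', 'state' => 'VA'],
        'Bob'  => ['street' => 'Q Blvd', 'zip' => '24252', 'city' => 'Nowhere', 'state' => 'MN'],
    ],
]);

// row key -> super column name -> column name
echo $addressBook['phatduckk']['John']['city'], "\n";               // FC
echo implode(', ', array_keys($addressBook['phatduckk'])), "\n";    // Bob, John, Kim
```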
--
1. Getting started
Download the latest version of Cassandra from this page.
Use the Bin version. No need to recompile.
mkdir -p /usr/tmp
cd /usr/tmp
fetch "http://apache.raffsoftware.com/cassandra/0.5.1/apache-cassandra-0.5.1-bin.tar.gz"
tar xvfz apache-cassandra-0.5.1-bin.tar.gz
2. Download JRE
Download and install the version of JRE matching your system (32bit or 64bit) from the Java SE Download page.
3. Configure Cassandra
cd /usr/tmp/apache-cassandra-0.5.1/conf
vi storage-conf.xml
Update the ClusterName to something meaningful.
Update the folders where the database is to be stored: CommitLogDirectories, DataFileDirectories, CalloutLocation and StagingFileDirectory.
4. Set JAVA_HOME
Before you can start Cassandra, set the JAVA_HOME environment variable to the directory where the Java JRE (or JDK) is installed on your machine, i.e. the directory that contains bin/java.
For example:
# csh/tcsh
setenv JAVA_HOME /usr/home/admin/htdocs/services/java/jdk7_64
# sh/bash
export JAVA_HOME=/usr/home/admin/htdocs/services/java/jdk7_64
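Since a bad JAVA_HOME is the most common stumbling block at this point, a quick sanity check can save some time. The following is only a sketch: `javaHomeProblem` is a made-up helper, and it assumes a Unix-style layout where the Java binary lives at bin/java under JAVA_HOME.

```php
<?php
// Return a description of what is wrong with the given JAVA_HOME value,
// or null when it looks usable. Accepts the raw result of getenv().
function javaHomeProblem($javaHome): ?string
{
    if (!is_string($javaHome) || $javaHome === '') {
        return 'JAVA_HOME is not set';
    }
    if (!is_dir($javaHome)) {
        return "JAVA_HOME is not a directory: $javaHome";
    }
    $java = rtrim($javaHome, '/') . '/bin/java';
    if (!is_file($java)) {
        return "no java binary found at $java";
    }
    return null;
}

$problem = javaHomeProblem(getenv('JAVA_HOME'));
echo $problem === null ? "JAVA_HOME looks usable\n" : "Problem: $problem\n";
```

Run it from the same shell you will start Cassandra from, so it sees the same environment.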
5. Start Cassandra
cd /usr/tmp/apache-cassandra-0.5.1/
bin/cassandra -f
If you get any errors about ClassNotFound, it means there's something wrong with your JAVA_HOME environment variable. Make sure it is pointing to the proper JRE/JDK directory on your machine.
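Once the node starts without errors, it is worth confirming that something is actually listening before moving on to Thrift. The sketch below assumes Cassandra is running on the local machine with the default Thrift port of 9160; `isPortOpen` is a hypothetical helper built on PHP's `fsockopen`, and a successful connection only shows that the port is open, not that the node is healthy.

```php
<?php
// Try to open a TCP connection; true means something accepted it.
function isPortOpen(string $host, int $port, float $timeoutSeconds = 2.0): bool
{
    $errno = 0;
    $errstr = '';
    $handle = @fsockopen($host, $port, $errno, $errstr, $timeoutSeconds);
    if ($handle === false) {
        return false;
    }
    fclose($handle);
    return true;
}

$host = '127.0.0.1'; // assumed: Cassandra runs locally
$port = 9160;        // Cassandra's default Thrift port

echo isPortOpen($host, $port)
    ? "Port $port on $host is accepting connections\n"
    : "Nothing is listening on $host:$port\n";
```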
6. Install Thrift
Now that we have one Cassandra Node running, let's move on to interface with our new database.
Thrift is a software framework for scalable cross-language services development. Cassandra supports Thrift, thereby allowing integration across multiple programming languages and platforms.
As part of this guide we'll use Thrift over PHP.
First, we need to install Boost.
cd /usr/ports/devel/boost
make all
make install
We need automake and autoconf:
cd /usr/ports/devel/automake110
make all
make install
cd /usr/ports/devel/autoconf262
make all
make install
Now, assuming you meet Thrift requirements, we can proceed.
You can download the latest version of Thrift from this page.
cd /usr/tmp/
fetch "http://apache.raffsoftware.com/incubator/thrift/0.2.0-incubating/thrift-0.2.0-incubating.tar.gz"
tar xvfz thrift-0.2.0-incubating.tar.gz
cd thrift-0.2.0
./bootstrap.sh
./configure --with-boost=/usr/local
make
make install
If you're on FreeBSD, you can simply install Thrift from the FreeBSD ports:
cd /usr/ports/devel/thrift
make all
make install
7. Interfacing with Cassandra
Creating a record in Cassandra using Thrift:
/* Insert some data into the Standard1 column family from the default config */
// Assumes the Thrift PHP library and the generated Cassandra classes are already
// included, and that $client is a connected CassandraClient.
// Keyspace specified in storage-conf.xml
$keyspace = 'Keyspace1';
// Reference to a specific user id (this is the row key)
$keyUserId = "1";
// Construct the column path that we are adding information into.
$columnPath = new cassandra_ColumnPath();
$columnPath->column_family = 'Standard1';
$columnPath->super_column = null;
$columnPath->column = 'email';
// Timestamp for update
$timestamp = time();
// Consistency level for this write (ONE is an assumption; pick the level you need)
$consistency_level = cassandra_ConsistencyLevel::ONE;
// Add the value to be written to the table, User Key, and path.
$value = "[email protected]";
$client->insert($keyspace, $keyUserId, $columnPath, $value, $timestamp, $consistency_level);
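The timestamp passed to insert is more than bookkeeping: when two writes hit the same column, the one with the higher timestamp wins. `time()` only has one-second resolution, so a common alternative is a microsecond timestamp. The sketch below illustrates the idea in plain PHP; `microTimestamp` and `lastWriteWins` are hypothetical helpers, the example values are made up, and keeping the stored column on a tie is a simplification of this sketch rather than a statement about Cassandra's tie-breaking.

```php
<?php
// Current time in microseconds since the Unix epoch.
function microTimestamp(): int
{
    return (int) round(microtime(true) * 1000000);
}

// Keep the incoming column only if its timestamp is strictly newer.
function lastWriteWins(array $stored, array $incoming): array
{
    return $incoming['timestamp'] > $stored['timestamp'] ? $incoming : $stored;
}

$stored = ['name' => 'email', 'value' => 'first@example.com',  'timestamp' => 1270000000000000];
$stale  = ['name' => 'email', 'value' => 'stale@example.com',  'timestamp' => 1269999999000000];
$newer  = ['name' => 'email', 'value' => 'second@example.com', 'timestamp' => 1270000000000001];

echo lastWriteWins($stored, $stale)['value'], "\n"; // first@example.com
echo lastWriteWins($stored, $newer)['value'], "\n"; // second@example.com
echo microTimestamp(), "\n";
```

Whichever unit you choose, use it for every client writing to the same data; mixing seconds and microseconds makes the microsecond writes win every time.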
Check out our Cassandra PHP Wrapper for an easier way to interface with Cassandra from PHP, or refer to the Thrift Examples.
Mike Peters, 04-07-2010
When using Thrift over PHP, if you're getting weird "TSocket: timed out reading 4 bytes" errors, make sure you patch your TSocket.php as follows:
diff --git a/TSocket.php b/TSocket.php
index ba3a631..ae4c6ab 100644
--- a/TSocket.php
+++ b/TSocket.php
@@ -257,9 +257,10 @@ class TSocket extends TTransport {
stream_set_timeout($this->handle_, 0, $this->recvTimeout_*1000);
$this->sendTimeoutSet_ = FALSE;
}
+ $md = stream_get_meta_data($this->handle_);
+ if ($md['unread_bytes'] > 0 && $md['unread_bytes'] < $len ) $len = $md['unread_bytes'];
$data = @fread($this->handle_, $len);
if ($data === FALSE || $data === '') {
- $md = stream_get_meta_data($this->handle_);
if ($md['timed_out']) {
throw new TException('TSocket: timed out reading '.$len.' bytes from '.
$this->host_.':'.$this->port_);
Also, make sure you are using Framed transport for Cassandra.
In your storage-conf.xml, set this to true:
<ThriftFramedTransport>true</ThriftFramedTransport>
And change this line in your code, from:
$this->transport = new TBufferedTransport($this->socket, 1024, 1024);
To:
$this->transport = new TFramedTransport($this->socket, 1024, 1024);
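For anyone wondering what "framed" means here: a framed transport sends each message as a 4-byte big-endian length followed by exactly that many bytes of payload, so both ends must agree on whether that prefix is present. The sketch below shows the framing itself in plain PHP. `frame` and `unframe` are illustrative helpers, not functions from the Thrift library.

```php
<?php
// Prefix the payload with its length as a 4-byte big-endian integer.
function frame(string $payload): string
{
    return pack('N', strlen($payload)) . $payload;
}

// Read the 4-byte length header and return exactly that many payload bytes.
function unframe(string $frame): string
{
    if (strlen($frame) < 4) {
        throw new InvalidArgumentException('frame is shorter than its 4-byte header');
    }
    $length = unpack('N', substr($frame, 0, 4))[1];
    $payload = (string) substr($frame, 4, $length);
    if (strlen($payload) !== $length) {
        throw new InvalidArgumentException('frame is truncated');
    }
    return $payload;
}

$wire = frame('hello');
echo bin2hex($wire), "\n"; // 0000000568656c6c6f
echo unframe($wire), "\n"; // hello
```

If one side frames and the other does not, the reader ends up interpreting the wrong bytes as a length, which is one plausible way to end up waiting on data that never arrives.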
Mike Peters, 04-08-2010
Still getting unexplained "timed out reading 4 bytes" errors?
Please install version 0.6 or higher; it reports the real error messages so you can tell what's wrong.
Version 0.5 will always spit out "timed out reading 4 bytes", regardless of what the error is.
Jeremy Hutchings, 04-08-2010
Step 7 is a PHP file (though with no includes) that you run, or some other command interface ?
Not getting that bit right is likely why I'm getting :
-------
2010-04-08 08:36:58 CassandraDB ERROR: Keyspace SPI does not exist in this schema.
2010-04-08 08:36:58 CassandraDB ERROR: Keyspace SPI does not exist in this schema. Array ( )
-------
When running your demo on the later post ?
Mike Peters, 04-08-2010
Jeremy -
When using the SPI Cassandra PHP Wrapper, update your Cassandra conf/storage-conf.xml, setting the cluster name to SPI. The default is TestServer.
This is the line to update:
<ClusterName>SPI</ClusterName>
Jeremy Hutchings, 04-10-2010
Thank you for taking the time to answer. This is a great couple of posts, and it's making my learning curve for this technology, which I *must* figure out, a lot easier :)
I checked the conf.xml and I think it might have been:
<Keyspace Name="SPI">
Opposed to cluster name, though now I'm thinking there is other config I have to do to get the hello world working as mytable isn't set up :
2010-04-10 02:05:27 CassandraDB ERROR: unconfigured columnfamily mytable
2010-04-10 02:05:27 CassandraDB ERROR: unconfigured columnfamily mytable Array ( )
Is step 7 a PHP script that you run to prepare the Cassandra data store ?
Adrian Singer, 04-12-2010
Did you setup your Cassandra conf/storage-conf.xml? The CF mytable used in this example should be defined as:
<ColumnFamily CompareWith="BytesType" Name="mytable"/>
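Both errors in this thread ("Keyspace SPI does not exist" and "unconfigured columnfamily mytable") come down to what is declared in storage-conf.xml, so it can help to list what a config actually defines. The sketch below parses a small, hypothetical config fragment with SimpleXML; the `columnFamilies` helper and the sample XML are assumptions for illustration, and a real storage-conf.xml contains many more settings.

```php
<?php
// Hypothetical, trimmed-down config; a real storage-conf.xml has many more elements.
$xml = <<<XML
<Storage>
  <Keyspaces>
    <Keyspace Name="SPI">
      <ColumnFamily CompareWith="BytesType" Name="mytable"/>
    </Keyspace>
  </Keyspaces>
</Storage>
XML;

// List the column family names declared for one keyspace.
// The keyspace name is interpolated into the XPath, so only pass trusted values.
function columnFamilies(string $storageConfXml, string $keyspace): array
{
    $doc = simplexml_load_string($storageConfXml);
    if ($doc === false) {
        throw new RuntimeException('could not parse the XML');
    }
    $nodes = $doc->xpath('//Keyspace[@Name="' . $keyspace . '"]/ColumnFamily') ?: [];
    $names = [];
    foreach ($nodes as $columnFamily) {
        $names[] = (string) $columnFamily['Name'];
    }
    return $names;
}

echo implode(', ', columnFamilies($xml, 'SPI')), "\n"; // mytable
var_dump(in_array('mytable', columnFamilies($xml, 'SPI'), true));
```

To check a real installation, load the file with `file_get_contents` and pass its contents in place of the sample string.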
Jeremy Hutchings, 04-13-2010
Now I did, thanks :)
Was rushing a bit there opposed to spending the time to learn the config file.
Up and running and testing now, going to add some more nodes and see what happens.
Landscapers Web Design, 05-07-2010
Its good, interesting post.
Pieter Maes, 06-21-2010
Thanks a lot for the "TSocket: timed out reading 4 bytes" fix!
you spared me a headache!
Alberane, 07-25-2010
Excellent article.
I'll do some testing with cassandra and try to carry my experiences here. Thank you.
Mike Peters, 07-29-2010
To build a PHP Thrift interface for Cassandra:
(Replace /cassandra/interface with the folder where you have Cassandra installed)
./thrift -gen php /cassandra/interface/cassandra.thrift
Suku, 08-05-2012
Very nice tutorial... but while searching on PHP and Cassandra I came across something saying PDO will be required for a Windows installation. Can you please tell me how to install it on Windows?