Troubleshooting Cassandra
Mike Peters, 08-31-2010
Key notes from a great presentation, "Cassandra Troubleshooting: out of the shadows", given by Benjamin Black at the Cassandra Summit in San Francisco two weeks ago.
The slides are here
-
Is your ring unbalanced?
That's because when you add one node at a time using RandomPartitioner, each new node takes over half of the range of the most heavily loaded node:
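Share of the ring per node (32 units total) as nodes are added one at a time:
1 node: 32
2 nodes: 16 16
3 nodes: 8 8 16
4 nodes: 8 8 8 8
5 nodes: 4 4 8 8 8
6 nodes: 4 4 4 4 8 8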
Note that as long as you double the size of your cluster each time, everything stays balanced. But when you grow one node at a time, the cluster becomes unbalanced.
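To see where those numbers come from, here is a minimal Python sketch that models growing the ring one node at a time. It assumes (a simplification) that load is proportional to token range and that each new node takes exactly half of the range of the most loaded node; `ring_growth` is a hypothetical helper, not part of Cassandra.

```python
def ring_growth(total: int, max_nodes: int) -> list[list[int]]:
    """Model one-node-at-a-time growth: each new node takes half of the
    largest share on the ring. Returns the per-node shares after each step."""
    shares = [total]
    history = [list(shares)]
    while len(shares) < max_nodes:
        i = shares.index(max(shares))  # most loaded node (first one if tied)
        half = shares[i] // 2
        # Replace that node's share with its remaining half plus the new node's half
        shares[i:i + 1] = [half, shares[i] - half]
        history.append(list(shares))
    return history


for row in ring_growth(32, 6):
    print(" ".join(str(s) for s in row))
```

This prints the six rows listed above. Continuing from 4 nodes to 8 nodes, `ring_growth(32, 8)[-1]` is `[4, 4, 4, 4, 4, 4, 4, 4]`: balanced again, which is the doubling case.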
To fix: Manually assign tokens.
How do you know which tokens to assign? Use this Ruby script, which spaces the tokens evenly across the RandomPartitioner token range (0 to 2^127):
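# Print an evenly spaced initial token for each of `nodes` nodes
def tokens(nodes)
  0.upto(nodes - 1) do |n|
    p(n * (2**127 - 1) / nodes)
  end
end

tokens(4) # change 4 to the number of nodes in your cluster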
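The same calculation as a Python sketch. Python integers are arbitrary precision, so the tokens come out exact. `balanced_tokens` and `ownership` are hypothetical helper names, and `ownership` assumes each node owns the range from the previous token (exclusive) up to its own token, wrapping around the ring.

```python
RING_SIZE = 2**127  # RandomPartitioner tokens fall in 0 .. 2**127


def balanced_tokens(nodes: int) -> list[int]:
    """Evenly spaced initial tokens, using the same formula as the Ruby script."""
    return [n * (RING_SIZE - 1) // nodes for n in range(nodes)]


def ownership(tokens: list[int]) -> list[float]:
    """Fraction of the ring owned by each token, in sorted token order.

    Each node owns (previous token, its token]; the first node's range
    wraps around from the last token."""
    ts = sorted(tokens)
    fractions = []
    for i, t in enumerate(ts):
        prev = ts[i - 1] if i > 0 else ts[-1] - RING_SIZE
        fractions.append((t - prev) / RING_SIZE)
    return fractions


for node, token in enumerate(balanced_tokens(4), start=1):
    print(f"node {node}: {token}")
```

Each printed value is what you would set as that node's InitialToken; `ownership(balanced_tokens(4))` gives 0.25 for every node (up to float rounding).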
Writes are slow
Make sure your commit log is on a separate drive from your data files.
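A quick way to sanity-check this is to compare the device IDs of the directories. This Python sketch uses `os.stat().st_dev`; note that `st_dev` identifies a filesystem, not a physical disk, so two partitions (or LVM volumes) on the same spindle will still look separate. The helper names are hypothetical.

```python
import os


def on_same_device(path_a: str, path_b: str) -> bool:
    """True if both paths live on the same filesystem (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def check_commitlog(commitlog_dir: str, data_dirs: list[str]) -> list[str]:
    """Return the data directories that share a filesystem with the commit log."""
    return [d for d in data_dirs if on_same_device(commitlog_dir, d)]
```

For example, `check_commitlog("/var/lib/cassandra/commitlog", ["/var/lib/cassandra/data"])` (example paths; use the directories from your own config) returning a non-empty list means the commit log is competing with data files for the same filesystem.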
Writes are fast. Reads keep getting slower
Step 1:
Run iostat -x and check whether the %util column for your data drives is close to 100%.
If it is, add more nodes.
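If you want to script this check, here is a minimal Python sketch that pulls the %util column out of `iostat -x` output. It finds the column by its header name, since the column layout differs between sysstat versions; the 90% threshold is an arbitrary example. iostat's first report covers the time since boot, so run it with an interval (for example `iostat -x 5 2`) and feed the sketch a later report.

```python
def saturated_devices(iostat_text: str, threshold: float = 90.0) -> list[str]:
    """Devices whose %util is at or above `threshold` in `iostat -x` output."""
    saturated = []
    util_col = None  # index of the %util column in the current device table
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            util_col = None  # a blank line ends the device table
            continue
        if fields[0].startswith("Device"):  # "Device:" or "Device" header row
            util_col = fields.index("%util")
            continue
        if util_col is not None and len(fields) > util_col:
            if float(fields[util_col]) >= threshold:
                saturated.append(fields[0])
    return saturated
```

A non-empty result for your data drives is the "maxing out utilization" case above.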
Step 2:
Look at the output of nodetool tpstats.
Focus on the Pending column, specifically for:
* ROW-READ-STAGE
* MESSAGE-DESERIALIZER-POOL
If these two are high (4096 is the max), your clients are sending too many reads to this node.
Update your client to spread reads across the cluster, or add more nodes to distribute them.
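A minimal Python sketch for watching those two pools. It assumes the row layout of this era's `nodetool tpstats` (pool name followed by Active, Pending and Completed counts); later versions add more columns, which the parser tolerates because it only reads the first numbers. The alert threshold of 1000 is a made-up example, not a Cassandra default.

```python
WATCHED_POOLS = ("ROW-READ-STAGE", "MESSAGE-DESERIALIZER-POOL")


def pending_counts(tpstats_text: str) -> dict[str, int]:
    """Map pool name -> Pending count from `nodetool tpstats` output."""
    counts = {}
    for line in tpstats_text.splitlines():
        fields = line.split()
        # Data rows: NAME ACTIVE PENDING COMPLETED [...]. The header row is
        # skipped because its second field ("Name") is not a number.
        if len(fields) >= 4 and fields[1].isdigit() and fields[2].isdigit():
            counts[fields[0]] = int(fields[2])
    return counts


def backed_up_pools(tpstats_text: str, threshold: int = 1000) -> list[str]:
    """Watched pools whose Pending count is at or above `threshold`."""
    counts = pending_counts(tpstats_text)
    return [pool for pool in WATCHED_POOLS if counts.get(pool, 0) >= threshold]
```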
Step 3:
Adjust your memtable settings.
When does a memtable get flushed to disk?
Size: when it reaches a configured size (MemtableThroughputInMB in 0.6's storage-conf.xml)
Time: when it hasn't been flushed for a configured number of minutes (MemtableFlushAfterMinutes)
Operations: when it has absorbed a configured number of writes (MemtableOperationsInMillions)
If you're flushing memtables too often, you're triggering follow-up work (compactions, SSTable merges) that consumes a lot of disk bandwidth.
You want less frequent memtable flushes, which lead to less frequent compaction and lower demand on disk bandwidth.
If your memtable thresholds don't fit your data and write pattern, you start spending huge amounts of bandwidth on compactions.
Flushing once a minute = bad.
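To get a feel for which threshold fires first, here is a back-of-the-envelope Python model. It assumes a steady write rate and treats the memtable's size as simply the bytes written (real memtables carry overhead), with MB taken as 2**20 bytes. The parameters mirror the three flush triggers, but `flush_interval` itself is hypothetical.

```python
def flush_interval(writes_per_sec: float, bytes_per_write: float,
                   throughput_mb: float, operations_millions: float,
                   flush_after_minutes: float) -> tuple[float, str]:
    """Seconds until the first memtable flush threshold is reached, and which one."""
    seconds_until = {
        # Size: memtable holds throughput_mb of written data
        "size": throughput_mb * 2**20 / (writes_per_sec * bytes_per_write),
        # Operations: memtable has absorbed operations_millions writes
        "operations": operations_millions * 1_000_000 / writes_per_sec,
        # Time: memtable has gone flush_after_minutes without a flush
        "time": flush_after_minutes * 60.0,
    }
    trigger = min(seconds_until, key=seconds_until.get)
    return seconds_until[trigger], trigger
```

With 1024 writes/s of 1,024 bytes each and a 64 MB size threshold, `flush_interval(1024, 1024, 64, 0.3, 60)` returns `(64.0, 'size')`: a flush roughly once a minute. Raising the size threshold to 128 MB doubles the interval to 128 seconds, at the cost of more heap per memtable.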
Step 4:
Use SSDs for the data drives. They make no difference for the commit log drive, since commit log writes are sequential.
I inserted a bunch of data, now my nodes are flapping
Flapping = nodes are marked down/up
Step 1:
Monitor swap (vmstat on Linux, swapinfo on FreeBSD).
With mmap, Cassandra maps data files in segments of up to 2 GB each.
Swapping can delay gossip long enough to cause a node to be marked down.
Swapping is bad.
To fix: change DiskAccessMode in the Cassandra config file to mmap_index_only.
That way only the index files are mmapped, which reduces the risk of the JVM mapping large chunks of data files into memory and driving the node into swap.
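On Linux, a minimal Python sketch for keeping an eye on swap reads `/proc/meminfo` (values are in kB). It only reports how much swap is in use; the si/so columns of `vmstat` show whether the box is actively swapping right now, which is what delays gossip. The function names are illustrative.

```python
def swap_used_kb(meminfo_text: str) -> int:
    """Swap in use (kB), computed as SwapTotal - SwapFree from /proc/meminfo text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values["SwapTotal"] - values["SwapFree"]


def current_swap_used_kb(path: str = "/proc/meminfo") -> int:
    """Linux only: read /proc/meminfo and return swap usage in kB."""
    with open(path) as f:
        return swap_used_kb(f.read())
```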
Step 2:
Tell the O/S you want to avoid swapping if possible.
On FreeBSD: add this line to /etc/sysctl.conf
vm.swap_enabled=0
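On Linux, write 0 to /proc/sys/vm/swappiness (echo 0 > /proc/sys/vm/swappiness, as root), or add vm.swappiness = 0 to /etc/sysctl.conf so the setting survives a reboot.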
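A small Python sketch of the Linux side, reading and (as root) writing the same `/proc` file. The write has the same effect as `echo 0 > /proc/sys/vm/swappiness` and likewise does not persist across reboots. The helper names are hypothetical; on FreeBSD the file doesn't exist and `read_swappiness` returns None.

```python
import os
from typing import Optional

SWAPPINESS_PATH = "/proc/sys/vm/swappiness"


def read_swappiness(path: str = SWAPPINESS_PATH) -> Optional[int]:
    """Current vm.swappiness, or None where the file doesn't exist (e.g. FreeBSD)."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return int(f.read().strip())


def set_swappiness(value: int = 0, path: str = SWAPPINESS_PATH) -> None:
    """Write a new swappiness value; needs root and lasts until reboot."""
    with open(path, "w") as f:
        f.write(f"{value}\n")
```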
Mike Peters, 12-15-2010
Got a few requests for this one, so here goes:
PHP equivalent code for calculating tokens for n nodes:
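<?php
$nodes = 5; // Change this to the number of nodes in your cluster

// Print one token per node. pow() works in floating point, so the tokens are
// approximate (about 16 significant digits), still close enough to balance
// the ring. The loop starts at 1 rather than 0, so every token is shifted by
// one slot compared to the Ruby script; the spacing is the same.
for ($i = 1; $i <= $nodes; $i++)
{
    echo "node $i token = ";
    echo number_format($i * (pow(2.0, 127.0) - 1) / $nodes, 0, '.', '');
    echo "\r\n";
}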