Troubleshooting Cassandra

Mike Peters, 08-31-2010
Key notes from a great presentation titled "Cassandra Troubleshooting: out of the shadows", presented by Benjamin Black at the Cassandra Summit in San Francisco two weeks ago.

The slides are here

Is your Ring unbalanced?

That's because when you add one node at a time using RandomPartitioner, each new node takes over half of the range of the most heavily loaded node:
32
16 16
8 8 16
8 8 8 8
4 4 8 8 8
4 4 4 4 8 8

Note that as long as you keep doubling the size of your cluster, everything stays balanced. But when you grow one node at a time, the cluster becomes unbalanced.
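The pattern in the table above can be reproduced with a small simulation. This is only a sketch, assuming each new node splits the range of the first most heavily loaded node exactly in half; `grow_ring` and its parameters are illustrative names, not part of Cassandra.

```python
def grow_ring(total=32, steps=6):
    """Simulate adding one node at a time to a ring.

    Assumption: every new node takes half of the load of the
    (first) most heavily loaded existing node.
    """
    loads = [total]
    history = [list(loads)]
    for _ in range(steps - 1):
        i = loads.index(max(loads))      # first most loaded node
        half = loads[i] // 2
        loads[i:i + 1] = [half, loads[i] - half]
        history.append(list(loads))
    return history


for loads in grow_ring():
    print(" ".join(str(x) for x in loads))
```

The ring is balanced only at 1, 2 and 4 nodes here, which is the doubling pattern described above.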

To fix: Manually assign tokens.

How do you know which tokens to assign? Use this Ruby script:
def tokens(nodes)
  0.upto(nodes - 1) do |n|
    p(n * (2**127 - 1) / nodes)
  end
end
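A minimal sketch of the same calculation in Python, for readers who want to run it there. It uses the same upper bound (2**127 - 1) as the script above, and Python integers are arbitrary precision, so the tokens are exact. The function name simply mirrors the original.

```python
def tokens(nodes):
    """Return evenly spaced initial tokens for a ring of `nodes` nodes."""
    ring_max = 2**127 - 1  # same upper bound as the script above
    return [n * ring_max // nodes for n in range(nodes)]


for i, token in enumerate(tokens(4)):
    print(f"node {i} token = {token}")
```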

Writes are slow

Make sure your commitlog is on a separate drive.

Writes are fast. Reads keep getting slower

Step 1:

Look at iostat -x to see if you're maxing out utilization

If you are, get more nodes
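If you want to automate this check, something like the sketch below could scan captured `iostat -x` output for busy devices. It assumes the header row contains a `%util` column and that the device name is the first field on each row; column layouts vary between iostat versions, and the 90% threshold, the function name and the sample text are all made up for illustration.

```python
def busy_devices(iostat_text, threshold=90.0):
    """Return {device: %util} for devices at or above `threshold`."""
    util_col = None
    busy = {}
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            util_col = None              # a blank line ends a device table
            continue
        if "%util" in fields:
            util_col = fields.index("%util")
            continue
        if util_col is None or len(fields) <= util_col:
            continue
        try:
            util = float(fields[util_col])
        except ValueError:
            continue                     # not a data row
        if util >= threshold:
            busy[fields[0]] = util
    return busy


# Hypothetical, heavily trimmed sample output.
SAMPLE_IOSTAT = """\
Device: r/s w/s %util
sda 310.0 12.0 97.5
sdb 4.0 1.0 12.0
"""
print(busy_devices(SAMPLE_IOSTAT))
```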

Step 2:

Look at nodetool tpstats

Focus on the middle column (Pending) and specifically on these two pools:
* ROW-READ-STAGE
* MESSAGE-DESERIALIZER-POOL

If these two are high (4096 is the max), it means your client is sending too many reads to this node.

Update your client or get more nodes to distribute reads.
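The pending column can also be pulled out of captured `nodetool tpstats` output with a few lines of Python. This is a sketch that assumes each data row has the form "pool name, active, pending, completed"; the sample text is invented, and `pending_by_pool` is a hypothetical helper, not part of any Cassandra tooling.

```python
def pending_by_pool(tpstats_text):
    """Map each thread pool name to its pending task count."""
    pending = {}
    for line in tpstats_text.splitlines():
        fields = line.split()
        # Data rows: name, active, pending, completed. The header row is
        # skipped because its second and third fields are not numbers.
        if len(fields) >= 4 and fields[1].isdigit() and fields[2].isdigit():
            pending[fields[0]] = int(fields[2])
    return pending


# Hypothetical sample output.
SAMPLE_TPSTATS = """\
Pool Name                    Active   Pending      Completed
ROW-READ-STAGE                    8      4012        1234567
MESSAGE-DESERIALIZER-POOL         1      3900        7654321
ROW-MUTATION-STAGE                0         0         999999
"""

counts = pending_by_pool(SAMPLE_TPSTATS)
for pool in ("ROW-READ-STAGE", "MESSAGE-DESERIALIZER-POOL"):
    print(pool, counts.get(pool))
```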

Step 3:

Adjust memtable settings

When does a memtable get flushed to disk?

Size: When it gets to a certain size
Time: If it hasn't been flushed in x seconds
Operations: When certain operations occur
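The three triggers can be pictured as a simple check, sketched below. This is not Cassandra's code: it assumes the "operations" trigger means an operation-count threshold, and every name and default value here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemtableState:
    size_mb: float        # data currently held in the memtable
    age_seconds: float    # time since the last flush
    operations: int       # writes applied since the last flush


def flush_reason(state, max_size_mb=64, max_age_seconds=3600,
                 max_operations=300_000):
    """Return which trigger fires ("size", "time", "operations") or None.

    All thresholds are illustrative values, not Cassandra defaults.
    """
    if state.size_mb >= max_size_mb:
        return "size"
    if state.age_seconds >= max_age_seconds:
        return "time"
    if state.operations >= max_operations:
        return "operations"
    return None


print(flush_reason(MemtableState(size_mb=70, age_seconds=20, operations=1000)))
```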

If you're flushing memtables too often, you're triggering follow-up effects (compactions, sstable merges) that consume a lot of disk bandwidth.

You want less frequent memtable flushes, which lead to less frequent compactions and less demand on disk bandwidth.

If your memtable settings don't fit your data volume, you end up spending a huge share of your disk bandwidth on compactions.

Flushing once a minute = bad
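To get a rough feel for how often a size-triggered flush happens, divide the memtable size threshold by the sustained write rate. The numbers below are made up for illustration, and the estimate ignores the time and operation triggers entirely.

```python
def seconds_between_flushes(memtable_threshold_mb, write_rate_mb_per_s):
    """Estimate the interval between size-triggered memtable flushes."""
    if write_rate_mb_per_s <= 0:
        raise ValueError("write rate must be positive")
    return memtable_threshold_mb / write_rate_mb_per_s


# Hypothetical workload: 2 MB/s of sustained writes to one column family.
for threshold_mb in (64, 256):
    interval = seconds_between_flushes(threshold_mb, 2.0)
    print(f"{threshold_mb} MB threshold: flush every {interval:.0f} s")
```

Under these assumed numbers, a 64 MB threshold flushes about every 32 seconds, while 256 MB stretches that to roughly two minutes.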

Step 4:

Use SSDs for the data drives. SSDs make no difference for the commit log drive.

I inserted a bunch of data, now my nodes are flapping

Flapping = nodes are marked down/up

Step 1:

Monitor swap (vmstat on Linux, swapinfo on FreeBSD)

mmap takes 2 GB per segment.

Swapping can delay gossip long enough to cause a node to be marked down.

Swapping is bad.
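On Linux, swap activity shows up in the `si` (swap in) and `so` (swap out) columns of `vmstat`. The sketch below extracts them from captured output; it assumes the column header row contains `si` and `so`, and the sample text is invented. Any sustained non-zero values mean the box is swapping.

```python
def swap_activity(vmstat_text):
    """Return a list of (si, so) pairs, one per vmstat sample row."""
    si_col = so_col = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            si_col, so_col = fields.index("si"), fields.index("so")
            continue
        if si_col is None or len(fields) <= max(si_col, so_col):
            continue
        try:
            samples.append((int(fields[si_col]), int(fields[so_col])))
        except ValueError:
            continue                     # not a data row
    return samples


# Hypothetical sample output from `vmstat 1 2`.
SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 123456   7890 456789    0    0     5    10  100  200  1  1 98  0  0
 2  1  20480   1024   7890 456789  512  768     5    10  100  200  9  4 60 27  0
"""
print(swap_activity(SAMPLE_VMSTAT))
```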

To fix: Change DiskAccessMode in the Cassandra config file to mmap_index_only

This avoids the risk of the JVM driving the machine into swap by mapping large mmap segments.

Step 2:

Tell the OS you want to avoid swapping if possible.
On FreeBSD: add this line to /etc/sysctl.conf
vm.swap_enabled=0

On Linux: echo 0 into /proc/sys/vm/swappiness
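To confirm the Linux setting took effect, the current value can be read back from the same file. A minimal sketch follows; `read_swappiness` is a hypothetical helper, and the path argument exists only so the function can be pointed at a test file.

```python
from pathlib import Path


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the current vm.swappiness value as an integer."""
    return int(Path(path).read_text().strip())
```

On a Linux host, `read_swappiness()` should return 0 after the change above.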

Mike Peters, 12-15-2010
Got a few requests for this one so here goes -

PHP equivalent code for calculating tokens for n nodes:
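
$nodes = 5; // Change this

// Print node tokens
// Note: PHP floats hold only about 15-16 significant digits,
// so the low-order digits of each token are approximate.
for ($i = 1; $i <= $nodes; $i++)
{
    echo "node $i token = ";
    echo number_format(($i * (pow(2.0, 127.0) - 1) / $nodes), 0, '.', '');
    echo "\r\n";
}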