Cassandra for PHP Sessions

Hojda Vasile Dan, January 26, 2011
Building on Dawn's Memcached for PHP sessions post, we've now converted our php-sessions handling from Memcached to Cassandra.

Cassandra supports built-in caching, sharding & replication and scales horizontally as you add nodes, overcoming the shortcomings of the memcached-for-sessions approach.

Click here to download the new dbsession.php and here to download common_cassandra.php

redis: a persistent key-value store

Mike Peters, November 5, 2010
redis is a key-value store, similar to memcached but with data persistence.

redis supports multiple value types, counters with atomic increment/decrement and built-in key expiration.

To achieve persistence without sacrificing speed, redis (like Cassandra) performs updates in memory and also adds them to an append-only file, which is synced to disk periodically.

redis is fast (110,000 writes per second, 81,000 reads per second) and supports sharding and master-slave replication (no master-master yet).

Why redis?

Those of you keeping track know we've always been big fans of MySQL, but at the same time we keep writing about migrating different parts of our application to Memcached, Cassandra, Lucene and ElasticSearch.

Why do we keep jumping from one storage engine to another? Can't we make up our minds already and settle with the "best" storage engine that meets our needs?

In short, no.

A common misconception is the belief that all storage engines are created equal, all designed to simply "store stuff" and provide fast access to your data. Unless your application performs one clearly defined simple task, it is a dire mistake to expect a single storage engine will effectively fulfill all of your data warehousing and processing needs.

* MySQL is great when you need ad-hoc queries and you're dealing with a relatively small data set.

* Memcached comes into play when you have a read-heavy environment and need a quick volatile cache to avoid querying MySQL a dozen times per page.

* Lucene and ElasticSearch are your friends when you need fulltext search, or when your MySQL data set grows to a point where running the filters and joins in MySQL becomes slow like a snail.

* Cassandra is amazing when you have a write-heavy environment, need to be capable of scaling writes linearly and supporting a huge data set.

* redis works particularly well as a state machine, when you need counters with atomic increment/decrement. Typical uses: "how many users are on my website?" à la ChartBeat, "how many jobs are waiting to be processed?" etc.

Architecture

redis is written in ANSI C and runs as a single lightweight daemon on your machine. All updates are done in memory first and saved to disk later, asynchronously.

Supported languages: C, C#, Erlang, Java, JavaScript, Lua, Perl, PHP, Python, Ruby, Scala, Go, and Tcl.

As of 15 March 2010, development of redis is funded by VMware.

Installing redis

Step 1

Download the redis tarball and extract it


wget "http://redis.googlecode.com/files/redis-2.0.3.tar.gz"
tar xvzf redis-2.0.3.tar.gz

Step 2

Compile redis and install it


cd redis-2.0.3
gmake
gmake install

Step 3

Run redis


/usr/local/bin/redis-server

Once running you can use redis-benchmark to run some benchmarks.

Started this way, without a config file, redis uses its default settings. You're going to want to study the config options and set them up in a redis.conf file, then pass its path to redis-server as the first argument.

Sample redis config file here

Configuration options

A few important redis.conf options you're going to want to set:

* First, if you will only be connecting to a local Redis instance, uncomment the bind configuration in the sample file:


bind 127.0.0.1

That tells Redis not to listen for external connections.

* redis supports multiple databases, but for most use cases, you're only going to need one. Change the default 16 to 1:


databases 1

* You can set the maximum number of bytes Redis can allocate, after which it will start purging volatile keys. If it cannot reclaim any more memory it will start refusing write commands. Here's a sample setting for a 100MB limit:


maxmemory 104857600

* The server will periodically fork and asynchronously dump the current contents of the database to disk. The dump is actually made to a temporary file and then moved to replace any older dump, so the operation is atomic and won't leave you with a partially dumped database. If Redis is shut down and restarted, it will restore from this dump file.

How often it dumps the keys is configurable by the amount of time that passes and the number of changes that have been made to the data. For example, the following settings tell Redis to dump the database after 60 seconds if at least 100 changes have been made, or after five minutes if there has been at least 1 change:


save 300 1
save 60 100

* By default redis starts in foreground mode. To run it in the background instead, set the daemonize option in your redis.conf file to "yes":


daemonize yes
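
To make the save rules described above concrete, here is a minimal PHP sketch of how a pair of "save <seconds> <changes>" rules can be read. The snapshot_due() function is a hypothetical illustration, not part of redis or of any client library; it only models the rule "dump when enough time has passed and enough changes have accumulated".

```php
<?php
// Hypothetical helper, not part of redis: decides whether a snapshot is due
// under "save <seconds> <changes>" rules like the ones in the sample config.
function snapshot_due(array $rules, int $secondsSinceLastSave, int $changesSinceLastSave): bool
{
    foreach ($rules as [$seconds, $changes]) {
        // A rule fires once enough time has passed AND enough changes were made
        if ($secondsSinceLastSave >= $seconds && $changesSinceLastSave >= $changes) {
            return true;
        }
    }
    return false;
}

// The two rules from the sample config: "save 300 1" and "save 60 100"
$rules = [[300, 1], [60, 100]];

var_dump(snapshot_due($rules, 60, 100)); // bool(true): 100 changes within a minute
var_dump(snapshot_due($rules, 60, 5));   // bool(false): too few changes so far
var_dump(snapshot_due($rules, 300, 5));  // bool(true): five minutes and at least 1 change
```

Reading the rules this way shows why it helps to list several of them: a busy server gets dumped often, while a quiet one is still dumped eventually.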

To Redis or not to Redis?

If you have a large data set that cannot comfortably fit into RAM, Redis is not the key value store for you to use, but if you have smaller sets, and if you can live with the asynchronous write behavior, then, for me, the answer is definitely "to Redis."

As an alternative, Tokyo Cabinet is very fast for a synchronous key value store, and it does support some features that Redis does not, such as tables. Redis permits a master/slave setup, which can alleviate fears of data loss from failure, but it's not as certain as something like Tokyo Cabinet, which will write the data as soon as it gets it. On the other hand, Redis is blazingly fast, incredibly easy to use, and will support just about anything you can think of doing with your data.


More resources:

* Try redis in your browser
* Download redis
* Retwis - a PHP twitter clone using redis


How to hide .php extension in your urls with Nginx

Adrian Singer, November 4, 2010
Looking for clean urls (/hello instead of /hello.php)?

Here's how to set it up:

Step 1

Create a notfound.php script and place it in your root web server folder


// Set this for easier access
$url = substr($_SERVER['REQUEST_URI'], 1);
$url_parameters = '';

// Strip parameters
if (($pos = strpos($url, "?")) !== false)
{
  $url_parameters = substr($url, $pos+1);
  $url = substr($url, 0, $pos);
}
$url = trim(strtolower($url));

// Strip prefix and suffix '/'
$url = trim($url, '/');

// If url contains .. it's a hack attempt
if (strpos($url, "..") !== false)
{
  $url = str_replace("..", "", $url);
}

// If we have a php script with this name
if ($url != '' && file_exists($_SERVER['DOCUMENT_ROOT']."/$url.php"))
{
  // Set PHP_SELF and REQUEST_URI to point to the real script
  $_SERVER['PHP_SELF'] = $_SERVER['REQUEST_URI'] = "/".$url;
  if (!empty($url_parameters)) $_SERVER['REQUEST_URI'] .= "?".$url_parameters;

  // Load real php script
  require($_SERVER['DOCUMENT_ROOT']."/$url.php");
  return;
}
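
The script above guards against ".." by stripping it from the url. A stricter alternative, sketched below, is to let realpath() resolve the path and then confirm the result still sits under the document root. The resolve_script() function and its parameters are hypothetical names used for illustration; this is a sketch of the idea, not the script we run in production.

```php
<?php
// Hypothetical helper: map a clean url to a .php script inside $docRoot.
// Returns null when no such script exists or the path escapes $docRoot.
function resolve_script(string $docRoot, string $requestUri): ?string
{
    // Drop the query string, lowercase, and trim surrounding slashes
    $path = strtolower(trim((string) parse_url($requestUri, PHP_URL_PATH), '/'));
    if ($path === '') {
        return null;
    }

    // realpath() returns false for missing files and resolves any ".." segments
    $root = realpath($docRoot);
    $script = realpath($docRoot . '/' . $path . '.php');
    if ($root === false || $script === false) {
        return null;
    }

    // Only accept scripts that still live under the document root
    $prefix = rtrim($root, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    if (strncmp($script, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    return $script;
}

// Usage: prints the full path of hello.php next to this file, or NULL
var_dump(resolve_script(__DIR__, '/hello?x=1'));
```

Because the check runs on the resolved path, it does not matter where in the url the ".." appears.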

Step 2

Update your Nginx nginx.conf file, rewriting all urls where the file is not found, to notfound.php


    location /
    {
      if (-d $request_filename)
      {
        break;
      }
      if (!-f $request_filename)
      {
        rewrite ^(.*)$ /notfound.php?$1 last;
        break;
      }
    }

Note: This is different from using an error_page 404 redirect. With a 404 redirect, POST data is not preserved.


Three different ways of handling a problem

Mike Peters, October 25, 2010
Presented with a new problem, I've noticed engineers can be divided into three groups:

1. I don't know, don't understand, not familiar with this part of the code

Unless you're on your first few weeks with the company, this statement is totally inexcusable. It's lazy and pathetic.

Be a Problem solver.

Waving your hands in the air announcing to the world you're not familiar with some code, does nothing more than exhibit your incompetence in picking up something new and running with it.

How comfortable will your manager, peers and clients feel, once you've made such a statement?

Be a Problem solver. Study the code, follow the execution process flow, find engineers who are more familiar than you and ask them specific questions.

Don't stop probing and asking until you've mastered the code.

Unless you fully understand the code, problem at hand and the big picture implications, you're better off not touching it at all.

2. Quick and dirty: Let me patch it up real nice. In & out as quickly as possible.

Time is of the essence, right?

How about doing the absolute minimum, so the problem can be patched up and you can move on with more important pressing items on your todo list?

Wrong.

I'm a big proponent of code clarity and elegance.

Taking shortcuts, especially when working in an agile development environment, will come back to bite you in the butt.

Why I hate patches:

* Patches clutter the code, which means the next guy will have to struggle twice as hard to understand what's going on

* Patches are often specific to dealing with one edge case, one facet of the problem. Slightly different variation of the same problem and you're back to square one.

* Patching up code is often a lazy act done by someone who didn't want to take the time to understand the big picture. Which means, chances are the patch can break other perfectly valid scenarios.

Take the time to do things right, the first time around.

The difference between patching up a problem and doing it right? That brings me to the third type of engineers, those who still have a job.

3. The right way: Complete, Elegant and Short

How do you know when you've fully mastered a problem and come up with the best possible solution?

Think Occam's Razor

* When your solution is clean, simple and easy to understand;
* When you could not make it any better even with unlimited time on your hands;
* When your solution doesn't just handle a single edge case of the problem, but rather eliminates it completely;
* When you feel your code should go up on the code hall of fame;

...that's when you're a real code ninja.

Be careful though. It's very easy to take this the way of over-engineering things.

If you're adding complexity instead of taking things away and simplifying, you're doing it all wrong.

This doesn't mean you should re-engineer an entire architecture from scratch, only to make things cleaner.

It's okay to take baby steps, just make sure your contribution to the code is the single most Elegant, Complete and Short solution.

-

A simple example

MySQL database server used to store timestamps in this format:
YYYYMMDDHHIISS

For example, 7am October 25th, 2010, would be represented as: 20101025070000

Starting with MySQL 4.1, the way timestamps are represented in the database changed to: YYYY-MM-DD HH:II:SS

The same date is now represented as: 2010-10-25 07:00:00

When no date is set, MySQL 5.1 uses 0000-00-00 00:00:00, whereas older versions of the MySQL database used 00000000000000.

A few sections of our code had to test whether or not a timestamp was set.

The original php code looked something like this:


// If no timestamp
if ($timestamp=="00000000000000")
{
// Do something
}

With the transition to MySQL 5.1, a new problem surfaced, where new records used "0000-00-00 00:00:00" to indicate no date was set, while old denormalized data still used "00000000000000".

Three different engineers approached this simple problem of supporting both the old and new timestamp formats.

I've included an excerpt of what each had to say below.

Engineer 1:
Quote:
I'm not too familiar with MySQL 5.1, but I don't think we can support the old timestamps any more.

Engineer 2:
Quote:
Fixed! And it only took a minute to do.

I'm now testing for both cases:

// If no timestamp
if ($timestamp=="00000000000000" || $timestamp=="0000-00-00 00:00:00")

Engineer 3:
Quote:
To keep the code clean, handle other cases (where we have 0000-00-00 with no hour/minute/second) and make it easy to adjust in the future, I created a new function IsEmptyTimestamp() and updated the code accordingly:

// If no timestamp
if (IsEmptyTimestamp($timestamp))

* First guy is out.

* Second guy cluttered the code, didn't properly handle other edge cases and didn't think about the future.

* Third one passed with flying colors.
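
The post does not show the body of IsEmptyTimestamp(), so the version below is only a guess at what such a function could look like. It assumes that "empty" means NULL, an empty string, or any value whose digits are all zeros, which covers both formats and the date-only case the third engineer mentions.

```php
<?php
// A guess at IsEmptyTimestamp(); the original implementation is not shown.
function IsEmptyTimestamp($timestamp)
{
    // NULL counts as "no timestamp"
    if ($timestamp === null) {
        return true;
    }

    // Drop separators so "0000-00-00 00:00:00", "0000-00-00" and
    // "00000000000000" all reduce to a run of zeros. Values with no
    // digits at all (including the empty string) reduce to nothing.
    $digits = preg_replace('/[^0-9]/', '', (string) $timestamp);

    // Empty when nothing but zeros is left
    return trim($digits, '0') === '';
}

var_dump(IsEmptyTimestamp("00000000000000"));      // bool(true)
var_dump(IsEmptyTimestamp("0000-00-00 00:00:00")); // bool(true)
var_dump(IsEmptyTimestamp("2010-10-25 07:00:00")); // bool(false)
```

With the test in one place, a future format change means editing one function instead of hunting down every comparison in the code.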

Closing Thoughts

When I originally wrote this piece, I was thinking software development.

I realize now, after reading this again, that the same traits described here apply to any area in business.

Be a true Problem solver.

It's the single most important skill you'll ever develop.


How to mount /proc on FreeBSD

Michel Nadeau, September 27, 2010
There are a few commands under FreeBSD that depend on procfs (process file system).

FreeBSD doesn't mount it by default.

This tutorial describes how to mount /proc on FreeBSD and how to get FreeBSD to do it automatically when rebooting.

1. Mounting /proc

To mount /proc, run the following command:

mount -t procfs proc /proc

Applications and commands depending on procfs will now work correctly.

2. Mount /proc automatically when rebooting

To get /proc to be mounted automatically when rebooting, simply add this line in /etc/fstab:

proc /proc procfs rw 0 0

There you go, /proc will now be mounted automatically at boot time.

How to protect against DDoS Attacks

Mike Peters, September 22, 2010
DDoS (Distributed Denial of Service) attacks are distributed hits to your server, coming from multiple sources at the same time.

Unlike an attack from a single location, where the source IP address can be blocked on the firewall level, distributed denial of service attacks are very difficult to stop.

DDoS attacks recently silenced the MPAA, Aiplex and took down Malaysian Government critics. DDoS attacks are back and they're bigger than ever.

New tools such as the Low Orbit Ion Cannon (LOIC) make it too easy to launch attacks and bring sites to their knees.

Here are a few simple things you can do to protect your servers against a DDoS attack:

Have a contingency plan

Much like recovering from a failed hard drive, you need to plan ahead.

Avoid single points of failure and make sure you have at least two separate machines, running your web-servers and databases. Multi-homed hosting can really help here.

When the attack comes in, you'll be able to switch the ip address until the storm calms down.

Disable Ping-flood attacks

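On FreeBSD, add this to your /etc/sysctl.conf. Setting bmcastecho to 0 stops the server from answering broadcast pings, and icmplim=1 limits ICMP responses to one per second:
net.inet.icmp.bmcastecho=0
net.inet.icmp.icmplim=1

And run this on the shell to apply the changes immediately:
sysctl net.inet.icmp.icmplim=1
sysctl net.inet.icmp.bmcastecho=0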

Use obscure ports for anything other than HTTP

Change your MySQL (/etc/my.cnf port=1234), FastCGI and all other daemons to run on unique port numbers.

Install all the latest security patches

Duh!

Use private ip addresses for inter-server communications

If you have more than one machine on the same LAN, use the LAN private ip addresses to communicate between the machines.

This is particularly helpful when your data-center decides to null-route the public-facing ip address of your database server (why is it open in the first place?) and you want to allow the web server to continue communicating with the database uninterrupted.

Using private LAN ip addresses is more efficient and ensures no interruptions in case your public-facing ip address gets null-routed.

Use a Firewall

Hardware firewalls rock, but the good ones can get very expensive to acquire and manage.

These two software firewalls are great for brute force detection and advanced policies that can detect anomalies common to DDoS attacks: APF and BFD. Both are from R-FX Networks.