Featured Posts
mysqldump exclude databases | Mike Peters, January 18, 2015
mysqldump doesn't offer an option to exclude a database from the dump, the way --ignore-table does for tables.
Bash to the rescue!
This short shell script lets you exclude a list of databases:
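A minimal sketch of the approach; the credentials, the /backup path and the exclude list are placeholders, not the exact script from the post:

```shell
#!/bin/sh
# Dump every database except the ones listed in EXCLUDE.
# Credentials, the /backup path and the exclude list are placeholders.
EXCLUDE="information_schema|performance_schema|test"
DATABASES=$(mysql -u root -N -e "SHOW DATABASES" 2>/dev/null | grep -Ev "^($EXCLUDE)$")
for DB in $DATABASES; do
    mysqldump -u root --databases "$DB" > "/backup/$DB.sql"
done
```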
Monitoring services with xinetd | Mike Peters, August 31, 2014
About xinetd
xinetd (extended Internet services daemon) performs the same function as inetd: it starts programs that provide Internet services.
Instead of having such servers started at system initialization time and lying dormant until a connection request arrives, xinetd is the only daemon process started; it listens on all service ports for the services listed in its configuration file.
When a request comes in, xinetd starts the appropriate script.
Sample xinetd configuration file
Listen for connections on port 9001. Run cassandra_status shell script per each thread and return the result.
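Reconstructed from that description, the entry might look like this; the script path and user are assumptions:

```
service cassandra_status
{
    type            = UNLISTED
    port            = 9001
    socket_type     = stream
    protocol        = tcp
    wait            = no
    user            = nobody
    server          = /usr/local/bin/cassandra_status
    disable         = no
}
```

type = UNLISTED is needed because the service name is not in /etc/services.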
Installing xinetd
On FreeBSD:
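Via the ports tree, something like (the port's category is an assumption):

```shell
cd /usr/ports/security/xinetd
make install clean
```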
On CentOS:
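Install the package, then enable and start the service:

```shell
yum install xinetd
chkconfig xinetd on
service xinetd start
```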
Configuring Monitoring scripts
xinetd's lightweight server model makes it easy to test several conditions before returning an "All good" response to a third-party monitoring service like Pingdom.
Here's a sample xinetd shell monitoring script that we use to detect if a Cassandra node is running properly.
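The original script isn't shown; here's a hypothetical sketch that checks whether Cassandra's Thrift port is accepting connections. The port number (9160) and the use of nc are assumptions:

```shell
#!/bin/sh
# xinetd runs this per connection and returns stdout to the caller.
# Port 9160 (Thrift) is an assumption; adjust for your node.
if nc -z localhost 9160 2>/dev/null; then
    echo "All good"
else
    echo "Cassandra is DOWN"
fi
```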
Optimizing NGINX and PHP-fpm for high traffic sites | Adrian Singer, April 20, 2014
After 7 years of using NGINX with PHP, we learned a couple of things about how to best optimize NGINX and PHP-fpm for high traffic sites.
1. TCP Sockets vs UNIX domain sockets
UNIX domain sockets offer slightly better performance than TCP sockets over the loopback interface (less copying of data, fewer context switches).
If you need to support more than 1,000 connections per server, use TCP sockets - they scale much better.
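For illustration, the two variants of a PHP-fpm upstream might look like this; the socket paths and upstream names are assumptions:

```nginx
# UNIX domain socket: lower overhead, fine below ~1,000 connections
upstream php_unix {
    server unix:/var/run/php-fpm.sock;
}

# TCP over loopback: scales better under heavy connection counts
upstream php_tcp {
    server 127.0.0.1:9000;
}
```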
2. Adjust Worker Processes
Modern hardware is multiprocessor and NGINX can leverage multiple physical or virtual processors.
In most cases your web server machine will not be configured to handle multiple workloads (such as acting as both a web server and a print server at the same time), so you will want to configure NGINX to use all the available processors, since NGINX worker processes are not multi-threaded.
You can determine how many processors your machine has by running:
On Linux -
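For example, counting processor entries in /proc/cpuinfo:

```shell
grep -c ^processor /proc/cpuinfo
```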
On FreeBSD -
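Query the kernel with sysctl:

```shell
sysctl hw.ncpu
```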
Set the worker_processes in your nginx.conf file to the number of cores your machine has.
While you're at it, increase the number of worker_connections (how many connections each core should handle) and set "multi_accept" to ON, as well as "epoll" if you're on Linux:
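Put together, that part of nginx.conf might look like this; the values are illustrative, not tuned recommendations:

```nginx
worker_processes 8;   # set to your core count

events {
    worker_connections 1024;
    multi_accept on;
    use epoll;   # Linux only; FreeBSD would use kqueue
}
```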
3. Setup upstream load balancing
In our experience, multiple upstream backends on the same machine produce higher throughput than a single one.
For example, if you're looking to support 1,000 max children, divide that number across two backends, letting each handle 500 children:
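In nginx.conf, the upstream split might look like this (socket paths are assumptions):

```nginx
upstream phpbackend {
    server unix:/var/run/php-fpm-pool1.sock;
    server unix:/var/run/php-fpm-pool2.sock;
}
```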
Here are the two pools from php-fpm.conf:
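A sketch of the two pools; the socket paths and user are assumptions:

```ini
[pool1]
listen = /var/run/php-fpm-pool1.sock
user = www
pm = static
pm.max_children = 500

[pool2]
listen = /var/run/php-fpm-pool2.sock
user = www
pm = static
pm.max_children = 500
```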
4. Disable access log files
This can make a big impact, because log files on high traffic sites involve a lot of I/O that has to be synchronized across all threads.
If you can't afford to turn off access log files, at least buffer them:
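Either of these, in the http or server block; the log path and buffer size are illustrative:

```nginx
# Off entirely:
access_log off;

# Or buffered, flushing to disk in 16KB chunks:
access_log /var/log/nginx/access.log combined buffer=16k;
```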
5. Enable GZip
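A typical gzip block might look like this; the MIME-type list and compression level are illustrative:

```nginx
gzip on;
gzip_min_length 1000;
gzip_comp_level 2;
gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml;
```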
6. Cache information about frequently accessed files
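NGINX's open_file_cache directives handle this; the limits below are illustrative starting points:

```nginx
open_file_cache max=10000 inactive=5m;
open_file_cache_valid 2m;
open_file_cache_min_uses 1;
open_file_cache_errors on;
```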
7. Adjust client timeouts
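Illustrative values, in seconds:

```nginx
client_header_timeout 10;
client_body_timeout 10;
keepalive_timeout 15;
send_timeout 10;
```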
8. Adjust output buffers
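Illustrative values:

```nginx
output_buffers 1 32k;
postpone_output 1460;   # roughly one TCP segment
```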
9. /etc/sysctl.conf tuning
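A few commonly tuned keys for a busy web server; treat the values as illustrative starting points, not drop-in settings:

```
net.core.somaxconn = 4096
net.core.netdev_max_backlog = 4096
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
```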
10. Monitor
Continually monitor the number of open connections, free memory and number of waiting threads.
Set alerts to notify you when thresholds are exceeded. You can build these alerts yourself, or use something like ServerDensity.
Be sure to install the NGINX stub_status module.
You'll need to recompile NGINX:
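When building from source, include the module flag (other configure options omitted):

```shell
./configure --with-http_stub_status_module
make && make install
```

Then expose it in nginx.conf with a location block such as `location /nginx_status { stub_status on; allow 127.0.0.1; deny all; }` (the path and the ACL are assumptions).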
How to install htop | Adrian Singer, April 20, 2014
htop is an interactive process viewer for Linux, replacing the traditional top.
Why htop?
htop provides a more interactive process-viewing experience. You can surf through running processes, scrolling horizontally and vertically to reveal information that would otherwise have been clipped, information such as full command lines.
You can see which files a process has open (press "l"); you can even trace a process with strace. There's also a handy tree view for understanding process ancestry.
Installing on Linux
CentOS:
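htop lives in the EPEL repository on CentOS, so you may need that enabled first:

```shell
yum install htop
```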
Debian:
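A single package install:

```shell
apt-get install htop
```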
Installing on FreeBSD
Step 1:
Add the following line to /etc/fstab:
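The mount point follows the /usr/compat layout used in the next step; the exact path is an assumption:

```
linproc   /usr/compat/linux/proc   linprocfs   rw   0   0
```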
Step 2:
Create a symbolic link for /usr/compat
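Something like the following, assuming /usr/compat is where you keep the Linux compat tree:

```shell
mkdir -p /usr/compat/linux/proc
ln -s /usr/compat /compat
mount /usr/compat/linux/proc
```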
Step 3:
Compile and install
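From the ports tree:

```shell
cd /usr/ports/sysutils/htop
make install clean
```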
How to delete files when argument list too long | Mike Peters, January 23, 2014
Ever try to delete a lot of files in a folder, only to have the operation fail with "Argument list too long"?
Here's how to get it done:
FreeBSD
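find's built-in -delete avoids building an argument list entirely; the *.log pattern is an example:

```shell
find . -name "*.log" -delete
```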
Note that this is a recursive search and will find (and delete) files in subdirectories as well.
Linux
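On Linux, pipe find into xargs so the files are removed in batches; -print0 with -0 keeps filenames containing spaces safe (the pattern is an example):

```shell
find . -name "*.log" -print0 | xargs -0 rm -f
```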
6 Months with GlusterFS: a Distributed File System | Mike Peters, August 9, 2012
Gluster is an open-source software-only distributed file system designed to run on commodity hardware, scaling to support petabytes of storage.
Gluster supports file system mirroring & replication, striping, load balancing, volume failover, storage quotas and disk caching.
Though hesitant given the lack of glowing reviews of Gluster, we were attracted by its feature set and simple architecture.
Over the last six months, we battle-tested Gluster in production, relying on the system to deliver high-availability and geo replication, to power large scale Internet Marketing product launches.
Architecture
The Gluster architecture aggregates compute, storage, and I/O resources into a global namespace. Each server plus attached commodity storage is considered to be a node. Capacity is scaled by adding additional nodes or adding additional storage to each node. Performance is increased by deploying storage among more nodes. High availability is achieved by replicating data n-way between nodes.
Unlike other distributed file systems, Gluster runs on top of your existing file-system, with client-code doing all the work. The clients are stateless and introduce no centralized single point of failure.
Gluster integrates with the local file system using FUSE, delivering wide compatibility across any system that supports extended file attributes - the "local database" where Gluster keeps track of all changes to a file.
The system supports several storage volume configurations:
* None (plain distribution): Files are transparently distributed across servers, with each node adding to the total storage capacity.
* Replica: Files are replicated between two LAN drives (synchronous replication)
* Geo replica: Files are replicated between two remote drives (asynchronous replication, using rsync in the background)
* Stripe: Each file is spread across 4 servers to distribute load.
As of October 2011, development of Gluster is funded by Red Hat.
Installing Gluster
This is one of the areas where Gluster really shines. You can be up and running in minutes.
Step 1
Installing the FUSE client, which serves as the "glue" between Gluster and your local file system.
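On a CentOS-style node, that might look like this (package names vary by distribution):

```shell
yum install fuse fuse-libs
modprobe fuse
```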
Step 2
Building Gluster from source
Starting Gluster and setting it to auto-start on next reboot
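A sketch of the build; the version number and download URL are illustrative, not the ones used in the original post:

```shell
wget http://download.gluster.org/pub/gluster/glusterfs/3.2/3.2.6/glusterfs-3.2.6.tar.gz
tar xzf glusterfs-3.2.6.tar.gz
cd glusterfs-3.2.6
./configure && make && make install
/etc/init.d/glusterd start
chkconfig glusterd on
```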
Step 3
Configuring your first two nodes as a Replica setup (mirroring)
On node 1 (backup1east):
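Using the volume name 'backup' and the share path '/gfs' from this post, the node 1 commands might look like:

```shell
gluster peer probe backup2west
gluster volume create backup replica 2 backup1east:/gfs backup2west:/gfs
gluster volume start backup
```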
On node 2 (backup2west):
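Node 2 then mounts the replicated volume; the mount point is an assumption:

```shell
mkdir -p /mnt/backup
mount -t glusterfs backup1east:/backup /mnt/backup
```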
Important: Make sure the name of your Gluster volume ('backup' in the example above) is different than the name of the share ('gfs' in the example above) or things will not work properly.
Our Experience
Going into this experiment, we had very high hopes for Gluster. Once proven, the goal was to replace our entire private cloud storage cluster with Gluster.
Unfortunately, we have been very disappointed with Gluster...
In spite of getting a lot of help from the Gluster community and testing different platforms and configurations, the results were consistent.
Like other users reported, we struggled with poor performance, bugs, race conditions when dealing with lots of small files, difficulties in monitoring node health and, worst of all, two instances of unexplained data loss.
We ended up completely abandoning Gluster and switching back to our home-grown rsync-based solution.
As always, run your own tests to determine if this is a good fit for your needs.
Proceed with caution.
More Resources
* SlideShare Introduction to GlusterFS
* Gluster Documentation
* Gluster IRC Channel
* Gluster Blog
How to: Install PHP w/ FPM + Memcached + GD + MySQL on FreeBSD 8 | Adrian Singer, November 30, 2011
Enjoy our step-by-step guide to configuring PHP 5 with FPM, NGinx Web server, Memcached and MySQL 5.1, on FreeBSD 8:
1. Install FreeBSD 7 compatibility and standard packages
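For example (the package name assumes an amd64 box):

```shell
pkg_add -r compat7x-amd64
```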
2. Install ProFTPD
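From the ports tree:

```shell
cd /usr/ports/ftp/proftpd
make install clean
```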
3. Install NGinx
Make sure you click to enable 'HTTP_GZIP_STATIC_MODULE', 'HTTP_SSL_MODULE' and 'HTTP_ZIP_MODULE'
You can always run make config to redo the configuration options
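The build step itself might look like:

```shell
cd /usr/ports/www/nginx
make config install clean
```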
4. Install CURL+LibXML
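From their respective ports:

```shell
cd /usr/ports/ftp/curl && make install clean
cd /usr/ports/textproc/libxml2 && make install clean
```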
5. Install MySQL client and server
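The server port pulls in the client as a dependency:

```shell
cd /usr/ports/databases/mysql51-server
make install clean
```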
6. Install GD
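From the ports tree:

```shell
cd /usr/ports/graphics/gd
make install clean
```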
7. Install PHP 5
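Enable the FPM option when the config screen appears:

```shell
cd /usr/ports/lang/php5 && make config install clean
cd /usr/ports/lang/php5-extensions && make config install clean
```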
8. Install Memcached
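The daemon plus the PHP extension; the pecl port name is an assumption:

```shell
cd /usr/ports/databases/memcached && make install clean
cd /usr/ports/databases/pecl-memcache && make install clean
```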
9. Install HAProxy
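From the ports tree:

```shell
cd /usr/ports/net/haproxy
make install clean
```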
10. Start MySQL and NGinx
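Enable both in rc.conf so they survive a reboot, then start them (rc.d script names can vary by port version):

```shell
echo 'mysql_enable="YES"' >> /etc/rc.conf
echo 'nginx_enable="YES"' >> /etc/rc.conf
/usr/local/etc/rc.d/mysql-server start
/usr/local/etc/rc.d/nginx start
```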
--
Verify MySQL is working properly:
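For example, check that mysqld is listening on its port:

```shell
sockstat -4 -l | grep 3306
```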
Attempt connecting to MySQL:
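From the shell:

```shell
mysql -u root -p
```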
Verify NGinx is working properly:
Point your browser to http://1.2.3.4/ (replacing 1.2.3.4 with the PUBLIC ip address of the server)
Verify PHP is working properly:
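You may first need to drop a phpinfo file into the web root; the path below is an assumption:

```shell
echo "<?php phpinfo(); ?>" > /usr/local/www/nginx/phpinfo.php
```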
Point your browser to http://1.2.3.4/phpinfo.php (replacing 1.2.3.4 with the PUBLIC ip address of the server).
If you see the PHP info screen, all is well