Full-service Internet Marketing & Web Development
Recent Posts

Sponsors
![]() |
MySQL Replication MonitorDawn Rossi, 02-26-2009 |
What is MySQL Replication
MySQL Replication is a great way to improve database performance and availability, by creating live copies of the database on multiple machines.
Read our article about How to setup MySQL Replication for more information about setting up replication.
There are two primary issues you have to deal with, when using MySQL replication:
#1. Replication can often break
When a query cannot process properly on a slave (due to duplicate key or another MySQL error), replication will stop running until you manually resume it.
To avoid replication breaking due to duplicate keys, make sure your /etc/my.cnf configuration file includes these two lines:
slave_exec_mode = IDEMPOTENT
slave-skip-error =1062
#2. Replication can lag
Since replication is serialized (slave runs all queries on a single thread, while master is multi threaded), the slave machine can often fall behind the master. This leads to inconsistency in data, especially when using a master-master replication setup.
-
The issue with monitoring MySQL Replication
The most popular method to monitor MySQL replication, is to use MySQL built-in SHOW SLAVE STATUS command.
The two important fields are 'Slave_IO_Running' and 'Seconds_Behind_Master', indicating whether the slave thread is running and how many seconds it is lagging behind the master.
Unfortunately, the 'Seconds_Behind_Master' estimate is very unreliable, especially when you have a slow network or when the master/slave servers are not running on the same LAN.
The reason for the inaccuracy lies in the way MySQL calculates 'Seconds_Behind_Master'. All it is doing is subtracting the position of its internal SQL log from the IO log.
In our tests, we measured cases where a slave was 10 minutes behind the master and yet 'Seconds_Behind_Master' returned 0 seconds.
A better approach to Monitoring MySQL Replication
A more reliable approach to measure replication lag,take advantage of MySQL SYSDATE function.
If you insert NOW at 12:00:04 on the master the row will hold exactly 12:00:04 on the slave, regardless of the slave local time. However, the SYSDATE function does not follow this behavior. It always uses the value of the slave's system clock.
If you insert a row with one column holding the value of NOW or CURRENT_TIMESTAMP and the other holding the value of SYSDATE into the master, you can use the difference between the two values on the slave to see how far behind it is. If the slave is in sync the two values will be identical. If the slave is one second behind the column holding SYSDATE will be one second ahead of the column holding NOW. No polling is required to determine the current lag.
Implementation
Step 1: Create Heartbeat table
CREATE TABLE mysql_heartbeat (master_host char(50) default '' not null, master_time timestamp default CURRENT_TIMESTAMP, slave_time timestamp not null, index(master_host));
Step 2: Setup heartbeat cron
Install a script on a cronjob that will run this query on the master server, every 60 seconds
insert into mysql_heartbeat(master_host, slave_time) values('127.0.0.1',SYSDATE());
(Replace 127.0.0.1 with the master host name or ip address)
Step 3: Monitor script
Run this script to display server lag over a given time frame
select master_host,master_time,timediff(slave_time,master_time) from mysql_heartbeat where date(master_time)=date(now()) group by master_host,master_time order by master_host,master_time;
MySQL Replication is a great way to improve database performance and availability, by creating live copies of the database on multiple machines.
Read our article about How to setup MySQL Replication for more information about setting up replication.
There are two primary issues you have to deal with, when using MySQL replication:
#1. Replication can often break
When a query cannot process properly on a slave (due to duplicate key or another MySQL error), replication will stop running until you manually resume it.
To avoid replication breaking due to duplicate keys, make sure your /etc/my.cnf configuration file includes these two lines:
slave_exec_mode = IDEMPOTENT
slave-skip-error =1062
#2. Replication can lag
Since replication is serialized (slave runs all queries on a single thread, while master is multi threaded), the slave machine can often fall behind the master. This leads to inconsistency in data, especially when using a master-master replication setup.
-
The issue with monitoring MySQL Replication
The most popular method to monitor MySQL replication, is to use MySQL built-in SHOW SLAVE STATUS command.
Quote:
|
show slave status ; Slave_IO_State: Waiting for master to send event Master_Log_File: mysql-bin.000017 Read_Master_Log_Pos: 572720266 Relay_Log_File: api4-relay-bin.000034 Relay_Log_Pos: 2424716 Relay_Master_Log_File: mysql-bin.000017 Slave_IO_Running: Yes Slave_SQL_Running: Yes Exec_Master_Log_Pos: 572720266 Relay_Log_Space: 97695432 Seconds_Behind_Master: 0 |
The two important fields are 'Slave_IO_Running' and 'Seconds_Behind_Master', indicating whether the slave thread is running and how many seconds it is lagging behind the master.
Unfortunately, the 'Seconds_Behind_Master' estimate is very unreliable, especially when you have a slow network or when the master/slave servers are not running on the same LAN.
The reason for the inaccuracy lies in the way MySQL calculates 'Seconds_Behind_Master'. All it is doing is subtracting the position of its internal SQL log from the IO log.
In our tests, we measured cases where a slave was 10 minutes behind the master and yet 'Seconds_Behind_Master' returned 0 seconds.
A better approach to Monitoring MySQL Replication
A more reliable approach to measure replication lag,take advantage of MySQL SYSDATE function.
If you insert NOW at 12:00:04 on the master the row will hold exactly 12:00:04 on the slave, regardless of the slave local time. However, the SYSDATE function does not follow this behavior. It always uses the value of the slave's system clock.
If you insert a row with one column holding the value of NOW or CURRENT_TIMESTAMP and the other holding the value of SYSDATE into the master, you can use the difference between the two values on the slave to see how far behind it is. If the slave is in sync the two values will be identical. If the slave is one second behind the column holding SYSDATE will be one second ahead of the column holding NOW. No polling is required to determine the current lag.
Implementation
Step 1: Create Heartbeat table
CREATE TABLE mysql_heartbeat (master_host char(50) default '' not null, master_time timestamp default CURRENT_TIMESTAMP, slave_time timestamp not null, index(master_host));
Step 2: Setup heartbeat cron
Install a script on a cronjob that will run this query on the master server, every 60 seconds
insert into mysql_heartbeat(master_host, slave_time) values('127.0.0.1',SYSDATE());
(Replace 127.0.0.1 with the master host name or ip address)
Step 3: Monitor script
Run this script to display server lag over a given time frame
select master_host,master_time,timediff(slave_time,master_time) from mysql_heartbeat where date(master_time)=date(now()) group by master_host,master_time order by master_host,master_time;
![]() |
Dawn Rossi, 02-26-2009 |
You could even setup nice charts to display the lag over a period of time:
select master_host,master_time,time_format(timediff(slave_time,master_time),'%H'),
time_format(timediff(slave_time,master_time),'%s')
from mysql_heartbeat where
date(master_time)=date(now()) and master_host='127.0.01'
group by master_host,master_time order by master_time desc limit 400

select master_host,master_time,time_format(timediff(slave_time,master_time),'%H'),
time_format(timediff(slave_time,master_time),'%s')
from mysql_heartbeat where
date(master_time)=date(now()) and master_host='127.0.01'
group by master_host,master_time order by master_time desc limit 400

![]() |
XL, 06-02-2009 |
Quote: "The reason for the inaccuracy lies in the way MySQL calculates 'Seconds_Behind_Master'. All it is doing is subtracting the position of its internal SQL log from the IO log"
This is wrong.
Each log event contains a timestamp indicating the time that it was created on the master. The seconds_behind_master are calculated by subtracting this timestamp from the time that the log event is executed on the slave.
Of course this will only work when master and slave have synchronized clocks (but you want to have those anyway). A constant clock offset is accounted for.
This is wrong.
Each log event contains a timestamp indicating the time that it was created on the master. The seconds_behind_master are calculated by subtracting this timestamp from the time that the log event is executed on the slave.
Of course this will only work when master and slave have synchronized clocks (but you want to have those anyway). A constant clock offset is accounted for.
|
|
Subscribe Now to receive new posts via Email as soon as they come out.
Comments
Post your comments



