In theory, MySQL replication should work without problems even if the servers are restarted, databases created and removed, data flowing in and out… Sometimes, however, there are errors that are breaking the replication process. It is good to know once such situation appears.
Understanding the basics of replication status messages
The main method to check the replication status is to execute
SHOW SLAVE STATUS \G on the replication slave server. The “\G” switch changes the result displaying layout – from the horizontal table to the vertical one (there is one row only, so it is much easier to review this way).
The main elements of the output we should check are Slave_IO_Running, Slave_SQL_Running, and Seconds_Behind_Master. The first two should read “Yes” and the third one should be a reasonable value (in seconds). If Slave_IO_Running or Slave_SQL_Running are not set to “Yes”, you can review values in Last_Error, Last_IO_Error, and Last_SQL_Error. This should give you the clue on what is going on.
When talking of Seconds_Behind_Master – this value may vary. For most of the small installations, it should be almost always zero or near zero. For bigger servers or a significant amount of data being inserted or adjusted, it can reach higher values. The number is provided in seconds. In most cases, I’m using 300 as the border value between “OK” state and “something wrong”. We will use this in the script below.
If your MySQL server is allowing you to connect from the command line without a password, you can skip this step (but it is still recommended from the security point of view). For remote servers or the servers that require a password to connect from localhost, you should create a separate user only for this purpose. Such a user will be granted only necessary privileges.
To create such user execute the commands in MySQL:
CREATE USER 'replicationStatus'@'localhost' IDENTIFIED BY 'password'; GRANT SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'replicationStatus'@'localhost';
The new user will be able to check the status of the replication, show database names but will not be able to retrieve or remove the data. Of course, in the example above you should replace the password with something more complicated.
Checking replication status
Here is the script:
#!/bin/bash # # Set the maximum number of seconds behind master that will be ignored. # If the slave is be more than maximumSecondsBehind, an email will be sent. # maximumSecondsBehind=300 # # Checking MySQL replication status... # mysql -u replicationStatusLocal -p'password' -e 'SHOW SLAVE STATUS \G' | grep 'Running:\|Master:\|Error:' > replicationStatus.txt # # displaying results, just in case you want to see them # echo "Results:" cat replicationStatus.txt # # checking parameters # slaveRunning="$(cat replicationStatus.txt | grep "Slave_IO_Running: Yes" | wc -l)" slaveSQLRunning="$(cat replicationStatus.txt | grep "Slave_SQL_Running: Yes" | wc -l)" secondsBehind="$(cat replicationStatus.txt | grep "Seconds_Behind_Master" | tr -dc '0-9')" # # Sending email if needed # if [[ $slaveRunning != 1 || $slaveSQLRunning != 1 || $secondsBehind -gt $maximumSecondsBehind ]]; then echo "" echo "Sending email" mail -s "MyServer.com - replication issue found" email@example.com < replicationStatus.txt else echo "" echo "Replication looks fine." fi
You can adjust the maximumSecondsBehind to the value of your choice. If you will notice that the alert is triggered too frequently, you can change this value to a higher number. You should also set the proper password in the MySQL execution string. As the last step, you should adjust an email subject and address in mail execution part.
The results of the status check are saved to the text file because I’m using it also for other reports. This allows me also to easily manipulate the values and check them using simple grep. If you need a more complex report or other values to be checked, you can simply change the grep filter on the MySQL execution string and the processing of the values.