MySQL doesn’t often crash, but if you run MySQL on a production system you should have high availability.
Maybe you’ve not heard of MHA. Yoshinori Matsunobu only released it in July of this year (2011). I’ve been reviewing it and I think you should too. There are a lot of people chasing after this “holy grail,” and most systems are complex and/or hard to recover when they fail. MHA is simple and easy to use.
MHA is a failover tool. It’s designed to fail a master and promote a slave to the new master. (That’s simple to say, but you should read up on how it’s done.) It can monitor your master, or you can fail over manually. Failing back doesn’t happen: you’re expected to fix the failed master and reintroduce it as a slave. The monitoring program (masterha_manager) will stop running after a failover.
I’m not going to give details on how MHA works. I’ll assume you have a master and some number of slaves already working.
Your slave servers should be ready to become masters. This means they have server_id set and binary logging enabled. You should be able to run ‘show master status’ on your slaves and get an answer. Slaves should also be set with read_only=1 and relay_log_purge=0.
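As a sketch, the relevant my.cnf settings on a slave might look like this (the server_id and log name are illustrative; pick values that fit your setup):

```ini
# /etc/my.cnf on a slave that may be promoted (example values)
[mysqld]
server_id       = 2            # must be unique per server in the cluster
log-bin         = mysql-bin    # binary logging, required to act as a master
read_only       = 1            # keep clients from writing to the slave
relay_log_purge = 0            # MHA needs relay logs kept around for recovery
```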
Preparing the way
Download the latest version of MHA from the Google Code server. http://code.google.com/p/mysql-master-ha/downloads/list
Every MySQL server needs to have a copy of the MHA node installed.
yum install perl-DBD-MySQL.x86_64
rpm -i mha4mysql-node-0.52-0.noarch.rpm
The Manager is a little more difficult. It should be run on a system outside your cluster: should the Manager process fail, your HA process fails with it. (Quis custodiet ipsos custodes?)
There are a number of required Perl modules. You may find you have to use CPAN to install them; CPAN in turn will require you to install perl-YAML, make, and gcc. (I found this the easiest route on CentOS 6.2.) If you use CPAN, you will need to tell RPM to skip dependency checks.
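On a RHEL/CentOS-style system, the prep steps looked roughly like this for me (the CPAN module list reflects what the Manager complained about on my install; yours may differ):

```shell
# Install the build tools CPAN needs (RHEL/CentOS package names)
yum install perl-YAML make gcc
# Pull in Perl modules the Manager requires via CPAN, for example:
cpan Config::Tiny Log::Dispatch Parallel::ForkManager
```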
rpm -i mha4mysql-manager-0.52-0.noarch.rpm --nodeps
This may not be needed. On some systems, because I installed on an x64 version (as you should be too), I found I had to copy the MHA libraries into the perl lib64 tree. You might want to make symlinks instead.
cp -crp /usr/lib/perl5/vendor_perl/MHA /usr/lib64/perl5/vendor_perl
You only need one configuration file unless you are going to monitor multiple master/slave clusters. You can name this configuration file anything you want. Change these settings and make sure the directories exist and are writable.
On the server where the Manager will be running:
vi /etc/MHA.cnf

[server default]
user=hauser
password=P@ssw0rd
manager_workdir=/var/log/masterha
manager_log=/var/log/masterha/MHA.log
remote_workdir=/var/log/masterha

[server1]
hostname=db1

[server2]
hostname=db2
MHA will need access to each server, so you need to add the user on each of them. On all MySQL servers, add the hauser account.
mysql> grant all on *.* to 'hauser'@'192.168.%.%' identified by 'P@ssw0rd';
mysql> flush privileges;
Now is a good time to make sure all the server names resolve on each system and they know their own names.
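For example, /etc/hosts on every node (including the Manager) might carry entries like these; the addresses below are just the ones from my test setup, so use your own:

```
# /etc/hosts on every node, including the Manager (example addresses)
192.168.0.11  db1
192.168.0.12  db2
```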
MHA also needs ssh access to each server. You need to create ssh keys and copy them to the MySQL servers.
Don’t do these steps if you already have ssh keys installed.
ssh-keygen -t dsa -f ~/.ssh/id_dsa -N ""
cp ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
scp -r ~/.ssh db2:
Check that you have ssh working with the masterha_check_ssh tool. Look for the OK after each connect test.
From the Manager server:
masterha_check_ssh --conf=/etc/MHA.cnf
Thu Dec 1 10:10:52 2011 - [info] Starting SSH connection tests..
Thu Dec 1 10:10:53 2011 - [debug]
Thu Dec 1 10:10:52 2011 - [debug]  Connecting via SSH from root@db1(192.168.0.11) to root@db2(192.168.2.12)..
Thu Dec 1 10:10:53 2011 - [debug]   ok.
MHA needs to control replication, so you should test it with masterha_check_repl. This will output a lot of chatter. You’re looking for ‘MySQL Replication Health is OK’ at the end. You should also pay attention to any warnings.
masterha_check_repl --conf=/etc/MHA.cnf
---
Thu Dec 1 10:07:22 2011 - [info] Checking slave configurations..
Thu Dec 1 10:07:22 2011 - [warning]  read_only=1 is not set on slave db2(192.168.0.11:3306).
Thu Dec 1 10:07:22 2011 - [warning]  relay_log_purge=0 is not set on slave db2(192.168.0.12:3306).
---
MySQL Replication Health is OK.
You now have MHA installed. If you configure it wrong, MHA will give some errors that are not helpful and die. Check your settings again.
If you want MHA to monitor the status of your master server, you need to run masterha_manager. It does not normally run as a daemon. If masterha_manager is not running, your server will not automatically fail over. This is a weakness in MHA. You might want to use something like Pacemaker (Heartbeat) to monitor your systems and MHA to do the failover. Here is how I make MHA run in the background.
nohup masterha_manager --conf=/etc/MHA.cnf &
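If you want a bit more resilience than a bare nohup, a minimal watchdog sketch (my own wrapper idea, not part of MHA) could look like this:

```shell
#!/bin/sh
# Hypothetical wrapper: restart masterha_manager whenever it exits.
# Note that masterha_manager exits on purpose after a failover, and by
# default it will refuse to start again right after one, so the sleep
# gives you time to investigate and keeps a misconfiguration from flapping.
while true; do
    masterha_manager --conf=/etc/MHA.cnf
    sleep 60
done
```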
You can check masterha_manager with masterha_check_status.
Now the magic. If you want to fail over manually, you can use masterha_master_switch and tell it the master is alive.
masterha_master_switch --master_state=alive --conf=/etc/MHA.cnf
If you’re using another monitoring system like Pacemaker, you can have it run this command to automatically fail over your master.
masterha_master_switch --master_state=dead --dead_master_host=db1 --conf=/etc/MHA.cnf
If you find I’ve missed something or it works differently on your OS, please drop me a comment.