Making a MySQL slave with ZERO downtime

April 10, 2013

It’s about time for this year’s MySQL (Percona) conference in San Francisco.  I haven’t written anything in some time.  So, I thought I’d write something we can talk about should you meet me.

You need to make a slave of your MySQL server.

What if, you’re bonus was tied to database up time or maybe the systems group will not grant you any down time?  You’re ONE Master DB server is so big, coping tables takes hours. Even an ‘scp’ sync takes too long.  99.999% uptime is hard to achieve.

Percona and their tools are your friend… mostly. Xtrabackup will create a ready to use copy of all your databases, document your my.cnf and your current Master Position…  unless it doesn’t.

Xtrabackup uses it’s own XtraDB engine to copy the innodb tables so it doesn’t block master transactions. (Very cool) But that just copies the innodb tables.  All the other files/tables are just copied (cp-ed).

What if, your local disk space is more than 50% full?  You will need to copy it to a different system.  Maybe that space is on a NFS server and maybe that service is mounted with automount.

Now you learn ‘cp’ is not reliable.  (Timeouts? SAN problems? I’m a DBA not a sysadmin and have you read the bastard operator from hell? )

You learn, Xtrabackup can use ‘scp’.  Now all will go…  better.  Now it only fails about one out of six times.  You learn a week later when the developers tell you they are getting odd results half the time.

Odd thing, if you start a slave with all the Innodb data intact and all database (directories) but not all the tables, the replication process will not complain.  It will just ignore the missing tables like they were “black holes”.

So you add a backup to your backup script.  (See Below, the part in RED). You use your “fixed” backup script to create a slave. It only takes six hours for the slave to catch up with the master.  All goes well…  for about a week.

 

#!/bin/bash
export HOST="db01"
export BKDIR="/mnt/archive/xtrabackup"

if [ ! -d $BKDIR/$HOST ]
then
/bin/mkdir $BKDIR/$HOST
fi

#
# archive Old backup
#
export DT=`/bin/date +20%y-%m-%d_%H-%M-%S`
cd $BKDIR/$HOST/`/bin/date +20%y`*
tar zcf $BKDIR/$HOST/$HOST-$DT.tgz ./*

#
# Remove old backups
#
/bin/find $BKDIR/$HOST/$HOST-* -mtime +1 -exec rm {} -Rf \;

#
# Remove last backup
#
rm -Rf $BKDIR/$HOST/`/bin/date +20%y`*

#
# Create backup
#
export DT=`/bin/date +20%y-%m-%d_%H-%M-%S`
/usr/bin/innobackupex --rsync $BKDIR/$HOST

#
# Check for missing files
#
export d=`ls $BKDIR/$HOST | grep ^20`
cd /data/mysql
for f in `find . -regex ".*\.\(frm\|MYD\|MYI\|MAD|\MAI\|MRG\|TRG\|TRN\|ARM\|ARZ\|CSM\|CSV\|opt\|par\)" -printf "%P\n"` ;
do
if [ ! -f $BKDIR/$HOST/$d/$f ]
then
echo coping $f
rsync $f $BKDIR/$HOST/$d/$f
fi
done
/usr/bin/xtrabackup --prepare --target-dir=$BKDIR/$HOST/$d
echo DONE!

 

You check and… All the innodb tables are all there.  Only a few “less” critical myisam tables are missing.  You take care of the missing tables with mysqldump. The app guys check again for missing data and NO JOY.   You forgot to restart the mysql slave after you duped/copied the tables and the replication engine doesn’t know about them.

 

Again Percona to the rescue. You use PT-Table-Checksum to verify all the tables are in sync with the master. The DIFFS column show all 0 (zeros).  All is good… or not.  Again you find out from the developers that “some data is missing”.  (They don’t tell you the database or table of course.)

Time for some “old school” testing.

$ for x in `mysql -h db01 -e "show databases"`; do mysql -h db01 $x -e "show tables"; done > bar
$ for x in `mysql -e "show databases"`; do mysql $x -e "show tables"; done > foo
$ diff foo bar
99a100,101
>; forecast_20130318_1705
>; forecast_20130318_2002
105a108
>; forecast_gfs_new

 

Sure enough, you turn up missing and added tables because you’re copy wasn’t quite a “Point in time”.  The developers like to do table renames with their “load data from file” processes.  The myisam data was changing between the time you started your backup and the time you created your slave.

So a few mysqldumps, tables drops, restarts and you got it.


Do you have weeks like this?

 

 

Tweet

posted in Commentary by mark

Follow comments via the RSS Feed | Leave a comment | Trackback URL

Leave Your Comment

You must be logged in to post a comment.

 



Powered by Wordpress and MySQL. Theme by Shlomi Noach, openark.org
Creative Commons License
MySQL Fan Boy by Mark Grennan is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.
HOME