MySQL Fanboy » Tips & Tricks

Does Size or Type Matter?

mark — Tue, 27 Jul 2010 18:08:16 +0000

MySQL seems to be happy to convert types for you. Developers are rushed to complete their projects, and if the function works they just move on. But what are the costs of mixing your types? Does it matter if you are running across a million rows or more? Let’s find out.

Here is what the programmers see.

mysql> select 1+1;
+-----+
| 1+1 |
+-----+
|   2 |
+-----+
1 row in set (0.00 sec)

mysql> select "1"+"1";
+---------+
| "1"+"1" |
+---------+
|       2 |
+---------+
1 row in set (0.00 sec)

Benchmark

What if we do a billion simple loops? How long does the looping itself take?

The BENCHMARK() function executes the expression expr repeatedly count times. It may be used to time how quickly MySQL processes the expression. The result value is always 0.

mysql> select benchmark(1000000000, 1);
+--------------------------+
| benchmark(1000000000, 1) |
+--------------------------+
|                        0 |
+--------------------------+
1 row in set (5.42 sec)

mysql> select benchmark(1000000000, "1" );
+-----------------------------+
| benchmark(1000000000, "1" ) |
+-----------------------------+
|                           0 |
+-----------------------------+
1 row in set (5.40 sec)

So maybe type doesn’t matter? About five seconds just to loop, but the type didn’t change it. What if we add 1+1, and then 1+"1"?

mysql> select benchmark(1000000000, 1+1);
+----------------------------+
| benchmark(1000000000, 1+1) |
+----------------------------+
|                          0 |
+----------------------------+
1 row in set (12.65 sec)

mysql> select benchmark(1000000000, 1+"1");
+------------------------------+
| benchmark(1000000000, 1+"1") |
+------------------------------+
|                            0 |
+------------------------------+
1 row in set (35.58 sec)
mysql> select benchmark(1000000000, "1"+"1");
+--------------------------------+
| benchmark(1000000000, "1"+"1") |
+--------------------------------+
|                              0 |
+--------------------------------+
1 row in set (51.59 sec)

It looks like type does matter. But does it always matter?

mysql> select benchmark(1000000000, sum(1+1));
+---------------------------------+
| benchmark(1000000000, sum(1+1)) |
+---------------------------------+
|                               0 |
+---------------------------------+
1 row in set (9.69 sec)

mysql> select benchmark(1000000000, sum("1"+"1"));
+-------------------------------------+
| benchmark(1000000000, sum("1"+"1")) |
+-------------------------------------+
|                                   0 |
+-------------------------------------+
1 row in set (9.94 sec)

mysql> select benchmark(1000000000, sum("1.23456789"+"1.23456789"));
+-------------------------------------------------------+
| benchmark(1000000000, sum("1.23456789"+"1.23456789")) |
+-------------------------------------------------------+
|                                                     0 |
+-------------------------------------------------------+
1 row in set (10.32 sec)

So, not all functions are the same. But it looks like size might matter!

mysql> select benchmark(1000000000, 1.1+1.1);
+--------------------------------+
| benchmark(1000000000, 1.1+1.1) |
+--------------------------------+
|                              0 |
+--------------------------------+
1 row in set (34.90 sec)

mysql> select benchmark(1000000000, "1.1"+"1.1");
+------------------------------------+
| benchmark(1000000000, "1.1"+"1.1") |
+------------------------------------+
|                                  0 |
+------------------------------------+
1 row in set (1 min 15.32 sec)

mysql> select  benchmark(1000000000, "1.123456789"+"1.123456789");
+----------------------------------------------------+
| benchmark(1000000000, "1.123456789"+"1.123456789") |
+----------------------------------------------------+
|                                                  0 |
+----------------------------------------------------+
1 row in set (1 min 53.32 sec)

Sorry. Looks like size does matter.

This doesn't seem logical.
mysql> select benchmark(1000000000, 1=1);
+----------------------------+
| benchmark(1000000000, 1=1) |
+----------------------------+
|                          0 |
+----------------------------+
1 row in set (12.75 sec)

mysql> select benchmark(1000000000, 1="1");
+------------------------------+
| benchmark(1000000000, 1="1") |
+------------------------------+
|                            0 |
+------------------------------+
1 row in set (40.78 sec)
mysql> select benchmark(1000000000, 1=true);
+-------------------------------+
| benchmark(1000000000, 1=true) |
+-------------------------------+
|                             0 |
+-------------------------------+
1 row in set (12.73 sec)

mysql> select benchmark(1000000000, 1="true");
+---------------------------------+
| benchmark(1000000000, 1="true") |
+---------------------------------+
|                               0 |
+---------------------------------+
1 row in set, 65535 warnings (3 min 5.72 sec)
mysql> select benchmark(1000000000, "true"="true");
+--------------------------------------+
| benchmark(1000000000, "true"="true") |
+--------------------------------------+
|                                    0 |
+--------------------------------------+
1 row in set (57.25 sec)

Maybe we should CAST our work?

mysql> select benchmark(1000000000, cast("1" as unsigned));
+----------------------------------------------+
| benchmark(1000000000, cast("1" as unsigned)) |
+----------------------------------------------+
|                                            0 |
+----------------------------------------------+
1 row in set (32.27 sec)

mysql> select benchmark(1000000000, cast("1" as unsigned) + cast("1" as unsigned));
+----------------------------------------------------------------------+
| benchmark(1000000000, cast("1" as unsigned) + cast("1" as unsigned)) |
+----------------------------------------------------------------------+
|                                                                    0 |
+----------------------------------------------------------------------+
1 row in set (1 min 7.24 sec)

Maybe not!

Conclusion: Be careful with your data types. If you are taking user input, do the type conversion ONCE in your program. Don’t let MySQL do the type conversions for you.

query = "SELECT * FROM table WHERE $INPUT = 1"; could be doing you wrong.
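
As a minimal sketch of the "convert once in your program" advice, the Python below turns the user's text into an integer a single time and binds it as a parameter, so MySQL never compares a string with a number. The table name `items`, the column `id` and the function name `build_query` are made up for illustration; `%s` is the placeholder style used by the common Python MySQL drivers.

```python
def build_query(raw_input):
    """Convert user input to an int once, then bind it as a parameter."""
    # int() raises ValueError for input such as "true", so bad data is
    # rejected in the application instead of being silently coerced by MySQL.
    value = int(raw_input.strip())
    # Hypothetical table and column names, for illustration only.
    sql = "SELECT * FROM items WHERE id = %s"
    return sql, (value,)


sql, params = build_query(" 42 ")
# With a real driver this would be run as: cursor.execute(sql, params)
```

Because the value reaches the server as a number, the comparison in the WHERE clause is number to number, which was the fast case in the benchmarks above.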

References:

http://dev.mysql.com/doc/refman/5.0/en/information-functions.html#function_benchmark

http://dev.mysql.com/doc/refman/5.0/en/numeric-type-overview.html

http://dev.mysql.com/doc/refman/5.0/en/cast-functions.html

Federated Tables

mark — Wed, 07 Jul 2010 15:55:47 +0000

You’re searching for how to create a join across two databases on two different servers, and it can’t be done directly. A query such as select d1.a, d2.b from db1@server1 join db2@server2 where db1.c = db2.c; does not work.

You learn about federated databases. The federated storage engine lets you access data in tables of remote databases. Now how do you make it work?

1) Check if the federated storage engine is supported. Federation is OFF by default!

mysql> show engines;
+------------+---------+----------------------------------------------------------------+
| Engine     | Support | Comment                                                        |
+------------+---------+----------------------------------------------------------------+
| InnoDB     | YES     | Supports transactions, row-level locking, and foreign keys     |
| MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great performance         |
| BLACKHOLE  | YES     | /dev/null storage engine (anything you write to it disappears) |
| CSV        | YES     | CSV storage engine                                             |
| MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables      |
| FEDERATED  | YES     | Federated MySQL storage engine                                 |
| ARCHIVE    | YES     | Archive storage engine                                         |
| MRG_MYISAM | YES     | Collection of identical MyISAM tables                          |
+------------+---------+----------------------------------------------------------------+

If it is not supported (on), you need to add ‘federated=ON’ to the [mysqld] section of your /etc/my.cnf file. I found this step to be a bit troublesome. It must be ‘=ON’, not ‘=YES’ or even ‘=on’. Most options allow these, but the federated option is picky. I’m running MySQL Enterprise 5.1.37.sp1.

2) If you don’t already have the database created, create the database on the storage server. By ‘storage server’ I mean the one where the data will be written to disk.

I like to create a user just for the purpose of connecting the federated copy of the database to the true database. This way, if some other account’s password gets changed or that user is deleted, the federated system can continue to connect.

mysql> CREATE DATABASE xfiles;
mysql> USE xfiles;
mysql> CREATE TABLE cases(
 Name VARCHAR(20),
 `case` TINYINT(3)
) ENGINE = INNODB;

3) Now you can create the federated version of your data on the remote system.

mysql> CREATE DATABASE xfiles;
mysql> USE xfiles;
mysql> CREATE TABLE cases(
 Name VARCHAR(20),
 `case` TINYINT(3)
) ENGINE = FEDERATED
CONNECTION = 'mysql://skiner:c0nsper@fbi/xfiles/cases';
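
The CONNECTION string above follows the pattern mysql://user[:password]@host[:port]/database/table. If you generate federated table definitions from a script, a small helper along these lines may help keep the pieces straight. This is only a sketch: the function name `federated_connection` is made up here, and rejecting an '@' in the password is a cautious assumption on my part, since that character would make the string ambiguous.

```python
def federated_connection(user, host, db, table, password=None, port=None):
    """Build a FEDERATED CONNECTION string from its parts."""
    # Assumption: refuse '@' in the password because it would be
    # indistinguishable from the user/host separator.
    if password is not None and "@" in password:
        raise ValueError("password must not contain '@'")
    auth = user if password is None else f"{user}:{password}"
    location = host if port is None else f"{host}:{port}"
    return f"mysql://{auth}@{location}/{db}/{table}"


# Rebuilds the string used in the example above.
conn_str = federated_connection("skiner", "fbi", "xfiles", "cases",
                                password="c0nsper")
```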

4) Check your work. The table status should show Engine: FEDERATED.

mysql> use xfiles;
mysql> show table status\G

Now you can add records to the table, and the data should show up in a SELECT on either server.

Enjoy.

MyTOP Updated

mark — Thu, 24 Jun 2010 22:54:20 +0000

MyTOP is a console-based (non-GUI) tool for monitoring the threads and overall performance of a MySQL server.

UPDATE – I just found that Jeremy did update MyTOP in 2009 and released it on GitHub. He fixed the 64x and 5.x bugs. He also incremented the version number to 1.7. So, I’m bumping my number to 1.8.

Jeremy D. Zawodny <Jeremy@Zawodny.com> wrote the original in 2000 and continued to update it until 2007. The 1.6 version works on MySQL up to version 4.x.

For weeks now I’ve been working on bringing it up to date. When I started using version 1.6 it worked but didn’t return some data fields. After fixing these bugs I began to have ideas for improvements. Here is a quick list of what I have done.

Added updates for MySQL 5.x.
Added ‘S’ (slow) highlighting.
Added ‘C’ to turn on and off Color.
Added ‘l’ command to change color for long running queries.
Fixed a few documentation issues.
Added monitoring for Slave status.
Added color to Queue hit ratio.
Added number of rows sorted per second.
Added a new column to display State of the query. (Sorting, Locked, Updating)
Added ‘t’ to filter based on State.

How can you use it? I was having a problem with a production system. It had locks creating long delays in MyISAM tables. (Not new.) I used MyTOP version 1.7 (my release) with the ‘t’ command to filter the query state for ‘Locked’. I also used the ‘i’ command to reverse the sort order and filter for active connections. I found a “LOCK” run by root on the local host. I then used ‘f’ to display the full query request. I copied it and killed it with the ‘k’ command. Problem solved. You can’t do this with MySQL Administrator, MySQL Workbench or Toad for MySQL.

The color display makes monitoring what is happening easy.

I’m making my updates to MyTOP available at www.MySqlFanBoy.com/mytop.

Enjoy.

New AutoMySQLBackup Script

mark — Mon, 19 Apr 2010 16:31:31 +0000

MySQL Backup Script has been around for a long time. I have used it on and off for years but now I’ve needed to make some improvements. This script is based on VER. 2.6 – http://sourceforge.net/projects/automysqlbackup/ Copyright (c) 2002-2003 wipe_out@lycos.co.uk.
I have added my own Copyright (c) 2010 mark@grennan.com – http://www.mysqlfanboy.com/Files/automysqlbackup.sh. But as the code says: This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

My improvements include:

# VER 2.6 Beta 5 – MTG – (2010-04-18)
#    Added option to archive (rsync) the local backup files to a remote location
#     using the COPYDIR variable.
#    Added option to copy files into a directory based on the host name using the
#     variable HOSTNAME. This allows the script to be run from a shared storage directory
#     (SMB, NFS, NetApp) while the data is kept separate.
#    Added option to additionally back up all database schemas only, using variable FULLSCHEMA.
#    Added option to back up the MySQL configuration file, my.cnf, and remove files older than seven
#     days from the BACKUPDIR directory.
#    Added --master-data=2 and --single-transaction to include a comment with the master server’s
#     binary log coordinates. If used, the CHANGE MASTER TO line must be uncommented.

Download it here: http://www.mysqlfanboy.com/Files/automysqlbackup.sh

Developer Tips using MySQL

mark — Mon, 19 Apr 2010 16:17:02 +0000

I get asked by application developers, “How do you optimize MySQL?” I do lots of things that don’t really relate to a developer. I analyze the percentage of queries being served from the cache, for instance. What a developer can do to optimize the SQL they write is a different question. So here is a quick list of things application developers should know about MySQL.

Explain will analyze your query.

This example shows the possible indexes (keys) that could be used and the index that was selected. An estimated 2,262 rows were examined and then sorted (Using filesort), and one record was returned (limit 1).

mysql> explain SELECT 5/9*(temp_F-32) as t, 5/9*(dewpt_F-32) as td, speed_mps as spd, dir
    -> from metar_nc
    -> where stn='KLDM' and date_time<'2010-02-12 18:15' and date_time>'2010-02-12 17:45'
    -> order by ABS( date_time - CAST('2010-02-12 18:00:00' as datetime) ) limit 1;

+----+-------------+----------+------+-----------------------+------+---------+-------+------+-----------------------------+
| id | select_type | table    | type | possible_keys         | key  | key_len | ref   | rows | Extra                       |
+----+-------------+----------+------+-----------------------+------+---------+-------+------+-----------------------------+
|  1 | SIMPLE      | metar_nc | ref  | PRIMARY,stn,date_time | stn  | 8       | const | 2262 | Using where; Using filesort |
+----+-------------+----------+------+-----------------------+------+---------+-------+------+-----------------------------+

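Profiling can give you even more information. Turn it on, run your query, and then ask for the profile. Don’t forget to turn it off with ‘set profiling=0’ when you are done.

mysql> set profiling=1;
mysql> -- run the query you want to profile here
mysql> show profile;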
+--------------------+----------+
| Status             | Duration |
+--------------------+----------+
| starting           | 0.000110 |
| Opening tables     | 0.000014 |
| query end          | 0.000004 |
| freeing items      | 0.000008 |
| logging slow query | 0.000002 |
| cleaning up        | 0.000003 |
+--------------------+----------+
6 rows in set (0.01 sec)
mysql> set profiling=0;

Indexing Basics

Avoiding disk reads is the name of the game. Indexes are presorted and small. Two or three disk reads of an index can point to a large amount of data.

MySQL can only use prefixes of an index.
mysql> SELECT AVG(age) FROM user GROUP BY city;
This query needs to scan all the rows. You can make it traverse a shorter index instead by adding an index on (city,age).
Index (A,B) can be used for:
WHERE A=5 , WHERE A=6 AND B=5 , WHERE A=7 AND B>5
It can’t be used for: WHERE B=6 or WHERE B<2
Only an equality or IN list on the first key part allows the second key part to be used:
WHERE A=5 AND B>6 – will use 2 key parts
WHERE A IN (1,2) AND B=2 – will use 2 key parts
WHERE A>5 AND B=2 – will use 1 key part only
The B=2 will be checked while reading the row/index only.
WHERE A=5 ORDER BY B – will use the index for sorting
WHERE A>5 ORDER BY B – will NOT use the index for sorting
For simple cross-reference lookups, add the data to the index to skip the data read.
SELECT name FROM user WHERE login='Jack123';

If this is a very common part of your code, make the index (login,name). When the index is read, the data is already in memory. Don’t add every column, though: you just double the disk space and magnify the disk access.
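
The key-part rules above can be written down as a tiny model. The Python below is only a sketch of the rule of thumb described in this post, not of what the MySQL optimizer really does: walk the index columns left to right, stop at the first column with no condition, and stop after the first range condition. The function name and the 'eq' / 'in' / 'range' labels are invented for this illustration.

```python
def usable_key_parts(index_cols, predicates):
    """Count how many leading index columns a WHERE clause can use.

    predicates maps a column name to 'eq', 'in' or 'range'.
    """
    used = 0
    for col in index_cols:
        kind = predicates.get(col)
        if kind is None:
            break            # gap in the prefix: later columns are unusable
        used += 1
        if kind == "range":
            break            # a range condition ends key-part usage
    return used


index = ("A", "B")
a_eq_b_range = usable_key_parts(index, {"A": "eq", "B": "range"})   # A=5 AND B>6
a_range_b_eq = usable_key_parts(index, {"A": "range", "B": "eq"})   # A>5 AND B=2
b_only = usable_key_parts(index, {"B": "eq"})                       # B=6
```

The three calls correspond to the cases listed above: two key parts, one key part, and no use of the index at all.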

More Tips

Check that all tables have PRIMARY KEYs on columns with high cardinality. Primary keys must be unique.
A column like `gender` has low cardinality (selectivity); an id column (INT, auto-increment) is a good candidate to become a primary key.
All joins (inner, outer, ‘,’) should have indexes on the join columns.
Fields you often search on (appear frequently in WHERE, ORDER BY or GROUP BY clauses) need indexes.
But don’t add too many: the worst thing you can do is to add an index on every column of a table.
Don’t use DISTINCT when you have or could use GROUP BY
Open the connection to the server just before you are going to use it. Unless you are using a persistent connection library, don’t open a database connection and then run minutes of calculations before making your query. You may find your connection has “gone away” before you make your query.
When you index many columns, create a hash column. Then your query will look like:
SELECT *
FROM table
WHERE hash_column = MD5( CONCAT('aaa', 'bbb') )
AND col1='aaa' AND col2='bbb';
Use less RAM by declaring columns only as large as they need to be to hold the values stored in them.
Use CHAR type when possible (instead of VARCHAR, BLOB or TEXT) — when values of a column have constant length: MD5-hash (32 symbols) or ICAO or IATA airport code (4 and 3 symbols). This is also true for indexes. If only the last 4 symbols are unique index only that part.
Use SQL_NO_CACHE when you are SELECT-ing frequently updated data or large sets of data. This way you will not kick good data out of the cache.
Avoid wildcards at the start of LIKE patterns (LIKE ‘%find%’), because a leading wildcard prevents the index from being used. Finding ‘1234find’ in 10000 records requires up to 40,000 searches.
Normalizing redundant data is good but don’t split a table because you have too many columns.
Think of storing user session data (or any non-critical / high access data) in a MEMORY table — it’s very fast.
Divide complex queries into several simpler ones — they have more chances to be cached, so will be quicker.
A column must be declared as NOT NULL if it really is. This speeds up table traversing.
If you usually retrieve rows in the same order, like expr1, expr2, …, run ALTER TABLE … ORDER BY expr1, expr2, … to optimize the table.
Don’t use a PHP loop to fetch rows from the database one by one just because you can — use IN instead, e.g.
SELECT *
FROM `table`
WHERE `id` IN (1,7,13,42);
Reuse your database connections. Opening a new connection to the database will add one or more seconds to your query.
In PHP use mysql_pconnect() to open a persistent connection with mod_php. Perl provides persistent connections with Apache::DBI with mod_perl. Python does not have persistent connections in mod_python. But you can maintain them in your application. (http://www.modpython.org/FAQ/faqw.py)
When inserting data, insert only those values that differ from the default. This reduces the query parsing time.
Use INSERT DELAYED or INSERT LOW_PRIORITY (for MyISAM) to write to your change log table.
Also, if it’s MyISAM, you can add DELAY_KEY_WRITE=1 option — this makes index updates faster because they are not flushed to disk until the table is closed.
For your web application, images and other binary assets should normally be stored as files.
That is, store only a reference to the file rather than the file itself in the database.
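
The "use IN instead of a loop" tip from the list above applies in any client language. Here is a rough Python sketch that builds one parameterized IN query for a list of ids instead of issuing one query per id. The function name `build_in_query` is made up, the table and column names are the placeholders from the tip, and `%s` is the placeholder style of the usual Python MySQL drivers.

```python
def build_in_query(ids):
    """Return (sql, params) that fetch all the given ids in one round trip."""
    ids = [int(i) for i in ids]      # convert once, in the application
    if not ids:
        raise ValueError("need at least one id")
    placeholders = ", ".join(["%s"] * len(ids))
    sql = "SELECT * FROM `table` WHERE `id` IN (" + placeholders + ")"
    return sql, tuple(ids)


sql, params = build_in_query([1, 7, 13, 42])
# With a real driver this would be run as: cursor.execute(sql, params)
```

One query with four bound values replaces four separate round trips to the server.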

Compiling Drizzle on CentOS 5.4

mark — Tue, 06 Apr 2010 21:36:00 +0000

I was able to compile Drizzle on CentOS today thanks to Neil Armitage’s post on his website.

Clean install CentOS 5.4 with Development Tools and Development Libraries

yum groupinstall "Development Tools"
yum groupinstall "Development Libraries"

Setup the drizzle user account and allow it to sudo

/usr/sbin/visudo
# uncomment the line: %wheel ALL=(ALL) NOPASSWD: ALL
useradd drizzle
gpasswd -a drizzle wheel

Install Required Dependencies

yum install autoconf autoconf.noarch bison-devel.x86_64 \
bison.x86_64 bzr cpp.x86_64 e2fsprogs-devel.i386 e2fsprogs-devel.x86_64 \
gcc gcc-c++.x86_64 gcc.x86_ glib2-devel glibc-devel.x86_64 glibc.x86_64 \
gperf libevent-devel.x86_64 libstdc++.i386 libtool.x86_64 ncurses-devel.i386 \
ncurses-devel.x86_64 ncurses.x86_64 pcre-devel.x86_64 pcre.i386 pcre.x86_64 \
readline-devel.x86_64 readline.x86_64 zlib-devel.x86_64

Install Protobufs

wget http://protobuf.googlecode.com/files/protobuf-2.3.0.tar.gz
tar -xvf protobuf-2.3.0.tar.gz
cd protobuf-2.3.0
./configure
make
make install

Install bzr

wget http://launchpad.net/bzr/2.1/2.1.0b4/+download/bzr-2.1.0b4.tar.gz
tar -xvf bzr-2.1.0b4.tar.gz
cd bzr-2.1.0b4
python setup.py install

Make the Local bzr Repo

su - drizzle

mkdir ~/bzrwork
bzr init-repo --2a ~/bzrwork
cd ~/bzrwork

Build libdrizzle

bzr branch lp:libdrizzle
cd libdrizzle
./config/autorun.sh
./configure
make
sudo make install

Build Drizzle

cd ~/bzrwork
bzr branch lp:drizzle
cd drizzle
./config/autorun.sh
./configure
make
sudo make install

Run the tests

cd tests
./test-run

Configure and Start Drizzle

sudo mkdir /usr/local/var
sudo chown drizzle.drizzle /usr/local/var
cd /usr/local
/usr/local/sbin/drizzled --no-defaults --drizzle-protocol-port=9306 \
--basedir=$PWD --datadir=$PWD/var >> $PWD/var/drizzle.err 2>&1 &

Connect to drizzle

drizzle

The OpenArk Kit

mark — Thu, 25 Mar 2010 20:07:42 +0000

Shlomi Noach is a DBA, authorized MySQL instructor, software developer and winner of the MySQL Community Member of the Year award, 2009.

He has published his own collection of MySQL scripts and you might find them useful. Shlomi calls them the “openark kit”.

The available tools are:

oak-apply-ri: apply referential integrity on two columns with parent-child relationship.
oak-block-account: block or release MySQL user accounts, disabling them or enabling them to login.
oak-chunk-update: perform long, non-blocking UPDATE/DELETE operation in auto managed small chunks.
oak-kill-slow-queries: terminate long running queries.
oak-modify-charset: change the character set (and collation) of a textual column.
oak-online-alter-table: perform a non-blocking ALTER TABLE operation.
oak-purge-master-logs: purge master logs, depending on the state of replicating slaves.
oak-security-audit: audit accounts, passwords, privileges and other security settings.
oak-show-limits: show AUTO_INCREMENT “free space”.
oak-show-replication-status: show how far behind replicating slaves are on a given master.

All tools are written in Python, and require Python 2.3 or newer, and the python-mysqldb driver. Some tools require MySQL 5.0 or higher; see the docs for each tool.

The openark kit is released under the BSD license.

Thanks Shlomi.

Enjoy!

Loading Bulk CSV Tables

mark — Tue, 23 Mar 2010 18:47:03 +0000

In my job I use many data tables that are transient. New weather data is received all the time and old data is purged. Most of these tables are received as CSV files. The data is then loaded into MySQL tables and indexed to be used with geographic queries.

Most of these tables never see an insert or update. It would be nice if you could make these CSV tables read-only and build byte-pointer indexes for each row. (Maybe some day I’ll code this into MySQL.)

Most people load large data tables at night with the LOCK & LOAD method. It goes like LOCK TABLES…; LOAD DATA INFILE…; UNLOCK TABLES. In other words, nobody can read data or generate reports while this is running.

With the script I developed I have been able to load 33,000,000 records from a CSV file into a MySQL table, with indexes, in 22m 36.282s without creating long LOCK times affecting the users.

Here is what I’m doing. This is a proof of concept script written in BASH.

In the ‘test’ database there are two tables.

forecast = MyISAM table with index
NEWforecast = CSV table

#!/bin/bash
echo "Truncate forecast file"
mysql test -Bse "truncate table forecast;"
count=`mysql test -Bse "select TABLE_ROWS from information_schema.tables where table_name = 'forecast';"`
echo "Count is now $count"

echo "Check Slave is truncated"
count=`mysql -h slave_ip -u dbaops -pP@ssw0rd -Bse "select TABLE_ROWS from information_schema.tables where table_name = 'forecast';"`
echo "Count is now $count"

# The size of the split file determines the time the MyISAM table will be locked.
echo "Splitting NEWforecast.CSV file into chunks of 100,000 records"
split -l 100000 NEWforecast.CSV data_
# Time the for loop
time for x in data_*
do
# copy new data to MySQL CSV file
    echo "cp /home/dbaops/$x /data/mysql/test/NEWforecast.CSV"
    cp /home/dbaops/$x /data/mysql/test/NEWforecast.CSV
# copy same data to the SLAVE server
# (scp takes no password in the target; this assumes key-based SSH login)
    scp /home/dbaops/$x dbaops@slave_ip:/data/mysql/test/NEWforecast.CSV
# Flush tables to load new data
    mysql test -Bse "flush tables;"
    mysql -h slave_ip -u dbaops -pP@ssw0rd -Bse "flush tables;"
# Insert from CSV to MyISAM with index. This command gets replicated.
    mysql test -Bse "insert ignore into forecast select * from NEWforecast;"
    count=`mysql test -Bse "select TABLE_ROWS from information_schema.tables where table_name = 'forecast';"`
    echo "Count for this load is $count"
done

rm data_*

sleep 5
count=`mysql -h slave_ip -u dbaops -pP@ssw0rd test -Bse "select TABLE_ROWS from information_schema.tables where table_name = 'forecast';"`

echo "Count on SLAVE is now $count"

To keep readers from being blocked I rely on MyISAM concurrent inserts, and if needed use “SET GLOBAL concurrent_insert=2”. Note that CONCURRENT is a modifier of LOAD DATA, not of INSERT.
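
The heart of the script is the chunking: each chunk is small enough that the table is only busy for a short time. As a rough illustration of what `split -l 100000` does, here is the same idea as a Python generator. The function name `split_lines` is invented for this sketch, and the chunk size is just a parameter you would tune to your own lock-time budget.

```python
from itertools import islice


def split_lines(lines, chunk_size):
    """Yield lists of at most chunk_size lines, like `split -l chunk_size`."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return           # input exhausted
        yield chunk


# Five rows in chunks of two: the last chunk holds the remainder.
chunks = list(split_lines(["r1", "r2", "r3", "r4", "r5"], 2))
```

Because it works on any iterable, the same function can be fed an open file object, so a large CSV is never read into memory all at once.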

Deleting the old records can be tricky too. In the above example I just empty the table using the ‘TRUNCATE TABLE’ command. Having no data for the application to query may return strange results to the user.

Bulk deletes can also lock the table for a long time. A stored procedure can be used to loop through the data and remove all records in batches until there are none left.

DELIMITER //
DROP PROCEDURE IF EXISTS test.delete_incrementally//
CREATE PROCEDURE test.delete_incrementally()
MODIFIES SQL DATA
BEGIN
  -- Delete in batches of 10,000 rows until nothing older than one day is left.
  REPEAT
    DELETE FROM test.forecast
    WHERE valid_time_utc <= SUBDATE(NOW(), INTERVAL 1 DAY)
    LIMIT 10000;
  UNTIL ROW_COUNT() = 0 END REPEAT;
END//
DELIMITER ;
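
The same delete-until-nothing-is-left loop can also be driven from the application. The sketch below shows the control flow in Python. To keep it runnable without a MySQL server it uses an in-memory SQLite database as a stand-in, so the DELETE is written with a rowid subquery instead of MySQL's DELETE … LIMIT; the function names and the sample dates are made up for the demonstration.

```python
import sqlite3


def make_demo_db(old_rows, new_rows):
    """Create an in-memory stand-in for the forecast table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE forecast (valid_time_utc TEXT)")
    rows = [("2010-03-01 00:00:00",)] * old_rows
    rows += [("2010-03-23 00:00:00",)] * new_rows
    conn.executemany("INSERT INTO forecast VALUES (?)", rows)
    conn.commit()
    return conn


def delete_in_batches(conn, cutoff, batch_size):
    """Delete rows at or before cutoff, batch_size at a time.

    Returns the size of each batch that removed rows.
    """
    batch_sizes = []
    while True:
        cur = conn.execute(
            "DELETE FROM forecast WHERE rowid IN ("
            " SELECT rowid FROM forecast WHERE valid_time_utc <= ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()                # end the transaction between batches
        if cur.rowcount == 0:        # same stop test as ROW_COUNT() = 0
            break
        batch_sizes.append(cur.rowcount)
    return batch_sizes
```

Committing after every batch is the point of the exercise: other sessions get a chance to run between batches instead of waiting for one huge delete.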

For InnoDB:

It is a big advantage if the data in the CSV files is already ordered by primary key. (Because the InnoDB primary key is a clustered index, so it will organize the table physically to be in primary key order anyway.)

For the bulk insert, you should consider turning off foreign key checking and unique index checking.

SET unique_checks=0;
SET foreign_key_checks=0;

Using the InnoDB plugin, you can speed things up by inserting data into a table without indexes (only define the primary key, of course), and then creating the indexes separately with ALTER TABLE. (On an existing table you can also consider dropping existing indexes; the benefit of this depends on the case.)

CSV Files

http://www.shinguz.ch/MySQL/CSV_tables.pdf

http://blogs.sun.com/carriergrademysql/entry/tips_for_bulk_loading

http://www.mysqlperformanceblog.com/2008/07/03/how-to-load-large-files-safely-into-innodb-with-load-data-infile/

InnoDB

http://www.innodb.com/doc/innodb_plugin-1.0/innodb-create-index.html

http://www.mysqlperformanceblog.com/2008/04/23/testing-innodb-barracuda-format-with-compression/