MySQL GIS – Part 2

August 30, 2010

The “DATA”

“The wonderful thing about standards is that there are so many of them to choose from.”Grace Hopper

A Shape file is the most common format for GIS vector data and just about every GIS program can use them.  Unfortunately not all GIS data come in a shape file format. An E00 (E-zero-zero) file is the file format used by Environment Systems Research Institute’s (ESRI). ESRI is the Photoshop of the GIS workplace. E00 files are used by ArcInfo and ArcGIS Explorer.  These are the two most common file formats. Spatial Data Transfer Standard (SDTS) is used by the US government to transfer data between dissimilar computer systems and GPS (.gpx) data used to transfer data to and from a GPS device.

Here is a list of the vector and raster file formats I have come across.

  • .shp/.shx/.dbf Shapefile – vector data (All three files make up a set.)
  • .E00 Proprietary ESRI files.
  • .klm Keyhole markup language used by Google Earth (GE)
  • .gpx/ .gps a common GPS data format for software applications
  • .gml Geographic markup language
  • .pdb PocketAPRS vector map
  • .map APRSdos/WinAPRS/MacAPRS vector map
  • .gnis GNIS labels file (actually points instead of vectors)
  • .tif/.fgd geo TIFF raster image maps

Geospatial Data Abstraction Library (GDAL) is the work horse utility convert between these formats..  It is really a set of utilities and is the guts of  the MapServer. The program ogr2ogr is included in GDAL. This program is key to converting and getting data into MySQL.  The OGR toolkit is a sub kit of the FW Tools Toolkit. It has several command line tools. The two most useful are:

  • ogrinfo – inspects a GIS data source and spits out summary data or detailed information about the layers, kinds of geometries found in the file.
  • ogr2ogr – this is a command line tool that converts data. Ogr supports many formats: Shapefile, MapInfo Tab file, TIGER, s57, DGN, CSV, DBF, GML, KML, Interlis, SQLite, ODBC, ESRI GeoDatabase (MDB format), PostGIS/PostgreSQL, MySQL

Like the unix ‘file’ command, these utilities can recognize the type of data you feed them. One odd thing about ogr2ogr is, its command options are in output file then the input file order.

Here are a couple of useful examples from a cheat sheet on these utilities.

# Again this is based on Redhat 5.5
 yum install gdal

 ogrinfo ../data/Oklahoma/oklahoma_poi.shp
 INFO: Open of `../data/Oklahoma/oklahoma_poi.shp'
 using driver `ESRI Shapefile' successful.
 1: oklahoma_poi (Point)

 wget http://mappinghacks.com/data/TM_WORLD_BORDERS-0.2.zip
 mkdir boundaries_shp
 unzip TM_WORLD_BORDERS-0.2.zip -d boundaries_shp

 cd boundaries_shp
 ogr2ogr -f "MySQL" MySQL:"geo,user=root,host=localhost,password=" -nln world_borders -lco engine=MYISAM TM_WORLD_BORDERS-0.2.shp

 mysql geo -e 'select iso3,astext(SHAPE) from world_borders where iso2="VA";'
 +------+--------------------------------------------------------------------+
 | iso3 | astext(SHAPE)                                                      |
 +------+--------------------------------------------------------------------+
 | VAT  | POLYGON((12.445 41.903,12.451 41.907,12.456 41.901,12.445 41.903)) |
 +------+--------------------------------------------------------------------+

In part one I said GIS / Mapping Systems work with both text data and graphical data. There are even more raster graphical file formats. GDAL has a library for raster GEO data types with a very large list of file formats it supports.  Besides converting format types, GDAL can also warp, stitch and re-project images to fit a give projection.  These blog posts are about MySQL so I will leave the conversion of graphical data to someone else.

In a coming posts I’ll go over:

  • What data is available and where can you find it?
  • More examples on what you can do with GIS data and MySQL.
  • Viewing our GIS data.
  • How to collect your own GIS data.
  • Good and bad examples of searching GIS data.
  • Optimizing MySQL GIS.  Is it really worth using?

Mark Grennan

MySQL GIS – Part 1

August 23, 2010

In my business (weather) we use lots map based (Geo) information.  Almost every table has latitude and longitude. Working with this kind of data can be exciting and frustrating.  This should give you a quick start into GIS with MySQL.

“A geographic information system (GIS), or geographical information system, is any system that captures, stores, analyzes, manages, and presents data that are linked to location. In the simplest terms, GIS is the merging of cartography, statistical analysis, and database technology. GIS systems are used in cartography, remote sensing, land surveying, utility management, natural resource management, photogrammetry, geography, urban planning, emergency management, navigation, and localized search engines.”Wikipedia

GIS / Mapping Systems work with both text data and graphical data.  Applications and utilities often blur the lines between the two types and make understanding difficult.  Map servers blend raster images, with point or polygon data, and bitmap images to make complete images to display in the user’s client application.  For this post I will concentrate on the text type “data”.  The type we can index in a MySQL database.

THE SEARCH

After months of reading, [1]  I’m writing this post to describes what I have learned about how to get started using GEO coding data as quickly as possible.  I found very little piratical information on GIS and MySQL.  The MySQL manual covers the functions but doesn’t supply much practical information on GEO.  Anders Karlsson wrote a nice and short story about GIS that give me a good start.

The best information has be written by Florin Duroiu in his post titled “Political boundaries overlay in Google maps”. A good part of my post is based on his work.

STEP BY STEP

Below are the detailed needed to to produce a MySQL database with the Points of Interest (POI).  This is based on CentOS 5.5 with MySQL 5.1.

yum install gdal
mkdir geo
mkdir data
mkdir data/Oklahoma
cd geo/data/Oklahoma
wget http://downloads.cloudmade.com/north_america/united_states/oklahoma/oklahoma.shapefiles.zip
unzip oklahoma.shapefiles.zip
mysql -e 'create database geo'
ogr2ogr -f "MySQL" MySQL:"geo,user=root,host=localhost,password=" -nln oklahoma_poi -lco engine=MYISAM oklahoma_poi.shp
mysql geo -e 'desc oklahoma_poi'

+----------+--------------+------+-----+---------+----------------+
| Field    | Type         | Null | Key | Default | Extra          |
+----------+--------------+------+-----+---------+----------------+
| OGR_FID  | int(11)      | NO   | PRI | NULL    | auto_increment |
| SHAPE    | geometry     | NO   | MUL | NULL    |                |
| category | varchar(30)  | YES  |     | NULL    |                |
| name     | varchar(113) | YES  |     | NULL    |                |
+----------+--------------+------+-----+---------+----------------+

mysql geo -e "select name, category, Y(SHAPE) as lat, X(SHAPE) as lng  from oklahoma_poi where NAME like 'School:Putnam%'"

+-------------------------------------+--------------------------------+------------+-------------+
| name                                | category                       | lat        | lng         |
+-------------------------------------+--------------------------------+------------+-------------+
| School:Putnam City West High School | Government and Public Services |  35.492557 | -97.6605975 |
| School:Putnam City North School     | Government and Public Services | 35.5892209 | -97.6372648 |
| School:Putnam City School           | Government and Public Services | 35.5122794 | -97.6142079 |
| School:Putnam High School           | Government and Public Services | 35.5214459 | -97.6086523 |
| School:Putnam Heights Academy       | Government and Public Services | 35.5081143 | -97.5397619 |
+-------------------------------------+--------------------------------+------------+-------------+

In a coming set of post I’ll go over:

  • The “Data” types your will find and how to convert between them.
  • What data is available and where can you find it?
  • More examples on what you can do with GIS data.
  • Viewing our GIS data.
  • How to collect your own GIS data.
  • Good and bad examples of searching GIS data.
  • Optimizing MySQL GIS.  Is it really worth using?

[1] Books: GIS for DummiesAuthor: Michael N. DeMers – John Wiley & Sons (2009) – ISBN: 0470236825
Open Source GIS: A GRASS GIS Approach. Third Edition.
Author: Markus Neteler and Helena Mitasova – ISBN: 978-0-38735767-6
Web Mapping Illustrated: Using Open Source GIS Toolkits
Author: Tyler Mitchell – ISBN: 9780596008659



Does Size or Type Matter?

July 27, 2010

MySQL seems to be happy to convert types for you. Developers are rushed to complete their project and if the function works they just move on. But what is the costs of mixing your types? Does it matter if your are running across a million rows or more? Lets find out.

Here is what the programmers see.

mysql> select 1+1;
+-----+
| 1+1 |
+-----+
|   2 |
+-----+
1 row in set (0.00 sec)

mysql> select "1"+"1";
+---------+
| "1"+"1" |
+---------+
|       2 |
+---------+
1 row in set (0.00 sec)

Benchmark

What if we do a thousand simple loops?  How long does the looping itself take?

The BENCHMARK() function executes the expression expr repeatedly count times. It may be used to time how quickly MySQL processes the expression. The result value is always 0.


mysql> select benchmark(1000000000, 1);
+--------------------------+
| benchmark(1000000000, 1) |
+--------------------------+
|                        0 |
+--------------------------+
1 row in set (5.42 sec)

mysql> select benchmark(1000000000, "1" );
+-----------------------------+
| benchmark(1000000000, "1" ) |
+-----------------------------+
|                           0 |
+-----------------------------+
1 row in set (5.40 sec)

So maybe type doesn’t matter? About five seconds just to loop but the type didn’t change it.   What if we add 1+”1″?

Continue Reading »

MyTOP 1.9 released

July 23, 2010

MyTOP is a console-based (non-gui) tool for monitoring the threads and overall performance of a MySQL created by Jeremy Zawodny.

For months now I have been adding updates to mytop. When I started using version 1.6 it worked but didn’t return some data fields. After fixing these bugs I began to ideas for improvements. Here is a quick list of what I have done.

  • New ‘!’ command to force fix replication errors.
  • Display rows sorted per second.
  • Added new ‘Cmd’ column to display the state of the query along with the statement.
  • New ‘M’ command to change the [Mode] to ‘status’.  This was in the code but covered up by another command.

If you find any bugs or would like to see your own updates added please email me at mark @ grennan.com.

Enjoy!

MySQL HA with DRDB and Heartbeat on CentOS 5.5

July 20, 2010

This is one of a few MySQL High Availability strategies.  I have used this for years and found it work great.  If you don’t know about DRBD and MySQL you should read Peter’s comments.

These are step by step instructions for Redhat 5 or CentOS.

If you need more details please refer to:
http://www.drbd.org/users-guide/

Configuring MySQL for DRBD
http://dev.mysql.com/doc/refman/5.1/en/ha-drbd-install-mysql.html

Getting started:

The OS in this example is CentOS 5.5.  I added a new disk (/dev/sde) to the four disk RAID-5 and RAID-1 I was already using.   I’m only creating an 8 gig disk (vmware). You should start with a partition (LVM and or RAID) partition big enough for your data.

# uname -a
Linux db1.grennan.com 2.6.18-194.8.1.el5 #1 SMP Thu Jul 1 19:04:48 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md1              24065660   2826564  19996896  13% /
/dev/md0                101018     20988     74814  22% /boot
tmpfs                   513476         0    513476   0% /dev/shm

# fdisk -l /dev/sde

Disk /dev/sde: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1        1044     8385898+  83  Linux

Continue Reading »

Federated Tables

July 7, 2010

Your searching for how to create a join across two databases on two different servers and it can’t be done directly.   select  d1.a, d2.b from db1@server1 join db2@server2 where db1.c = db2.c; does not work.

You learn about federated databases.  The federated storage engine allows accesses data in tables of remote databases.  Now how do you make it work?

1) Check if the federated storage engine is supported.  Federation is OFF by default!

mysql> show engines;
+------------+---------+----------------------------------------------------------------+
| Engine     | Support | Comment                                                        |
+------------+---------+----------------------------------------------------------------+
| InnoDB     | YES     | Supports transactions, row-level locking, and foreign keys     |
| MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great performance         |
| BLACKHOLE  | YES     | /dev/null storage engine (anything you write to it disappears) |
| CSV        | YES     | CSV storage engine                                             |
| MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables      |
| FEDERATED  | YES     | Federated MySQL storage engine                                 |
| ARCHIVE    | YES     | Archive storage engine                                         |
| MRG_MYISAM | YES     | Collection of identical MyISAM tables                          |
+------------+---------+----------------------------------------------------------------+

If it is not “Support”ed (on) you need to add ‘federated=ON‘ to the [mysqld] section of your /etc/my.cnf file.  I found this section to be a bit troublesome.  It must be ‘=ON’ not ‘=YES” or even ‘=on’.   Most options allow these but the federated options is picky.  I’m running MySQL Enterprise 5.1.37.sp1.

2) If you don’t already have the database created, create the database on the storage server.  By ‘storage server’ I mean the one where the data will be written to disk.

I like to create a user just for the purpose of connection the federated copy of the database to the true database.  This way, if the password gets changed or the user deleted, the federated system can continue to connect.

mysql> CREATE DATABASE xfiles;
mysql> USE xfiles;
mysql> CREATE TABLE cases(
 Name VARCHAR(20),
 case TINYINT(3),
) ENGINE = INNODB;

3) Now you can create the federated version of your data on the remote system.

mysql> CREATE DATABASE xfiles;
mysql> USE xfiles;
mysql> CREATE TABLE cases(
 Name VARCHAR(20),
 case TINYINT(3),
) ENGINE = FEDERATED
CONNECTION = 'mysql://skiner:c0nsper@fbi/xfiles/cases';

4) Check your work. The table status should show Engine: FEDERATED.

mysql> use xfiles;
mysql> show table status\G

Now you can add records to the table and the data should show up in select on either server.

Enjoy.

Slapping MySQL-Proxy

June 28, 2010

I have old applications that need to read (and write) MyISAM tables that themselves receive lots of bulk updates. Time to try MySQL-Proxy.

MySQL Proxy is light on documentation and very few people written anything about working.   Most of what I have read says MySQL-Proxy is not ready for prim time.  I have hope so I had to give it a try.

I started with thee VMware servers. I setup one master and two read only slaves.   I tested the replication with mysqlslap from another independent server and it worked fine.  The slave never ran more then a second behind.

I downloaded mysql-proxy-0.8.0.tar.gz (64x version).  I create a mysql-proxy.cnf file to configure the proxy.

[mysql-proxy]
basedir                         = /home/dbaops/mysql-proxy-0.8.0-linux-rhel5-x86-64bit
keepalive                       = false
deamon                          = false
proxy-backend-addresses         = 192.168.101.91:3306
proxy-read-only-backend-addresses = 192.168.101.92:3306,192.168.101.94:3306
proxy-lua-script                = rw-splitting.lua

I ran this with the command; ‘ bin/mysql-proxy –defaults-file=mysql-proxy.cnf ‘.

I started with few commands run by hand.  They worked.  I could write and read data fine.  Then I setup a heartbeat to create continuing updates.  This also ran file.

Now for the real test.  I ran mysqlslap though the proxy. Note in this command the port is set to 4040.  This FAILED,  in many bad ways.

mysqlslap --concurrency=25 --iterations=2 --auto-generate-sql --engine=myisam \
  --auto-generate-sql-load-type=mixed --number-of-queries=10000 \
  --host='db1' --port=4040 --user=root --password='P@ss0rd'

I turned on DEBUG in the rw-splitting.lua script by changing ‘is_debug = false’ to true.   This worked grate with hand rolled quires.  When I ran mysqlslap, it a segmentation falt-ed.

With DEBUG turned off I received all kinds of data errors. Everything from missing tables, no database selected and database disconnects.

So the stories are true. MySQL Proxy is not ready for prime time, unless you are creating a Three Stooges act.

MyTOP Upated

June 24, 2010

MyTOP is a console-based (non-gui) tool for monitoring the threads and overall performance of a MySQL.

UPDATE – I just fond Jeremy did update MyTOP in 2009 and released it on GitHub.  He fixed the 64x and 5.x bugs. He also incremented the version number to 1.7.  So, I’m bumping my number to 1.8.

Jeremy D. Zawodny <Jeremy@Zawodny.com> wrote the original in 2000 and has continued to update it until 2007. The 1.6 version works on MySQL up to version 4.x.

For weeks now and I’m been working on bringing it up to date.  When I started using version 1.6 it worked but didn’t return some data fields.  After fixing these bugs I began to ideas for  improvements.  Here is a quick list of what I have done.

  • Added updates for MySQL 5.x.
  • Added ‘S’ (slow) highlighting.
  • Added ‘C’ to turn on and off Color.
  • Added ‘l’ command to change color for long running queries.
  • Fixed a few documentation issues.
  • Added monitoring for  Slave status.
  • Added color to Queue hit ratio.
  • Added number of rows sorted per second.
  • Added a new column to display State of the query. (Sorting, Locked, Updating)
  • Added ‘t’ to filter based on State.

How can you use it?  I was having a problem with a production system.  It has locks creating long delays in MyISAM tables. (Not new.) I used MyTOP version 1.7 (my release) with the ‘t’ command to filter the query state for ‘Locked”. I also use the ‘i” command to reverse the sort order and filter for active connections. I found a “LOCK” run by root on the local host.  I then use ‘f’ to display the full query request.  I copied it and killed it with the ‘k’ command. Problem solved. You can’t do this with MySQL Administrator, MySQL Workbench or Toad for MySQL.

The color display makes monitoring what is happening easy.

I’m making my updates to MyTOP availible at  www.MySqlFanBoy.com/mytop.

Enjoy.

MySQL Network Connections

May 7, 2010

If your MySQL server has hundreds of clients (applications) and tens of thousands of queries per second,  MySQL default network settings may NOT be for you.  Network performance is not often a significant factor in the performance of MySQL.  That said, there are things to consider.

If you are building new applications make these changes now.  Developer expectations are hard to change.  My example below will break your application if  developers open a database connections and then spend ten minutes playing with their play doe before making a query.  If you have working applications make changes slowly.

Middleware provides database connections pooling. This allows database connections to be used by multiple applications. Any program that runs as a deamon or service can use database pooling using libdbi. Applications written in PHP or CGI code don’t pool DB connections. They open and closes the connection each time a user hit a page.

The Apache module mod_dbd can manages database connections. On non-threaded platforms, it provides a persistent connection. On threaded platform, it provides an altogether more scalable and efficient connection pool, as described in this article at ApacheTutor. Note that mod_dbd supersedes the modules presented in that article.

PHP provides mysql_pconnect. It first tries to find a (persistent) link that’s already open with the same host, username and password (NOT PORT). If one is found, an identifier for it will be returned instead of opening a new connection.  Second, the connection to the SQL server will not be closed when the execution of the script ends. Instead, the link will remain open for future use (mysql_close() will not close links established by mysql_pconnect()).

Back to MySQL.

Max Connections has a default of 100 (<= 5.1.14) to 151 (>= 5.1.15).  If you exceed this number you will receive a “Too many connections error”.  This is often the first setting people find they have to changed as their usage grows.

Max User Connections controls the maximum number of simultaneous connections allowed per MySQL user account. By default this is set to unlimited (zero).  The default is fine. Your system will not run out of connections or memory if you have set Max Connections correctly.  Except a bad applications can “steel” all the connections from all the others.  Hopefully you have assigned a different user for each application.  Set this to the maximum number of connections any given user (application) will make.

Wait timeout is the number of seconds the server waits for activity on a non-interactive connection before closing it. The default is eight hours. (28800 seconds)  This is a crazy amount of time.  How long would you stay on the phone if no one was speaking?  Fifteen seconds is a good place to start. If it gives you problems increase it by fifteen until your good. Applications that open a connections and then process their data for minutes before making a request will be effected by a short time setting.

Interactive timeout is the number of seconds the server waits for activity on an interactive connection before closing it. This is only used by applications using the  mysql_real_connect() function of the MySQL API. I set this to 600. My idea is to give more sophisticated users more grace.  (Think connection pools have to reconnect.)

Connect timeout is the number of seconds that the MySQL server waits for a connect packet before responding with Bad handshake. The default is ten seconds.  Five would be better.

Max connections errors is a strange setting. IO stops after this number of bad connections is made from a host. You can unblock blocked hosts with the FLUSH HOSTS statement.  The default for this is ten (10). On a “good” network this shouldn’t have to be changed.  Because “the dabase should never stop” most DBAs make this a very large number. If you have a over loaded network you should put your database on a network of its own.  Or, you can try increasing this value.

max_allowed_packet controls the size of your BLOBs.  If you are storing pictures in your database, you may need to set this.  The default is one megabyte (1M). The protocol limit for max_allowed_packet is 1GB.  Set it to the biggest BLOB you need to store. The value should be a multiple of 1024 bytes.

Here are my my.cnf settings for a server with over 200 PHP and Perl applications and peeking at 4,000 connections per second.

# Network management
port                    = 3306
max_connections         = 5000  # number of simultaneous client connections allowed
max_user_connections    = 200   # number of connections a user can make, 0 = Unlimited
wait_timeout            = 15    # seconds to waits for activity on non interactive connection
interactive_timeout     = 600   # seconds to waits for activity on interactive connection
connect_timeout         = 5     # on connect seconds to waits for a connect packet
max_connect_errors      = 1000000 # IO stops after this number of bad connections one good connect reset
max_allowed_packet      = 10M   # How big is your BLOB?

Mark Grennan

Install MariaDB on RedHat 5.4

May 4, 2010

Is MariaDB really a drop in replacement for MySQL?   I’m running CentOS 5.4.  What happens if…

  1. Use “mysqldump –all-databases > FullBackup.sql” to make a full backup. (Better safe then sorry)
  2. Go to http://askmonty.org/wiki/MariaDB:Download and download the CentOS 5 packages.
  3. Stop msql “service mysqld stop”
  4. Pull MySQL out by the roots with “rpm -e mysql-server mysql –nodeps”
  5. Install Maria with “rpm -i Maria-*”

And the install start mysql up again.  Wow.  That’s “Drop in”.

If your reading this, you’re seeing it works.

Next day. - When you remove the older MySQL, RPM renames /etc/my.cnf to my.cnf.rpmsave.  Thus, when MySQL starts it is running with default settings.  Also, MariaDB changes the startup script to ‘mysql’  from the older mysqld.   When I put my.cnf back in place and restarted MySQL it didn’t run. I found the error in /var/log/mysqld.log.  I needed to remove the options ‘skip-bdb’.  I restarted MySQL again and is working.

  1. mv /etc/my.cnf.rpmsave /etc/my.cnf
  2. vi /etc/my.cnf  (removed the option skip-bdb)
  3. service mysql restart

Your options may very.

The log also tells me about deprecated options I was using.  So I cleaned them up.

UPDATE – 5/25/2010, 5/26/2010

I have run into one upset.  Although AskMonty.com says “Percona XtraDB 1.0.6-9 is included in MariaDB 5.1.42 and 5.1.44.”,  the copy I downloaded did not include it gave me some trouble.  This broke my LinuxFanBoy.com website badly.  XtraDB is a drop in replacement of InnoDB (same table formats, no need to convert any data).

I have learned that a bad my.cnf setting (innodb_log_file_size = 256M) make the XtraDB  (InnoDB) drop and not even report in the SHOW ENGINES command.

UPDATE 6/2/2010

I have reported this as a bug. #588849 I did find stopping the server, deleting all the ib_logfile* files and restarting the server fixed this.  I still consider it a bug because I could not find any message about this in the logs.

Mark Grennan

 



Powered by Wordpress and MySQL. Theme by Shlomi Noach, openark.org
Creative Commons License
MySQL Fan Boy by Mark Grennan is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.