Is MySQL’s GIS really worth using?
Is GIS worth using in MySQL? In the past few post, I have explored what GIS is and how it is used. GIS encoded data is wonderful and can help with all kinds of cool queries. I’m late getting this article written so lets get right to it.
The most common geographical query is for all the point within some distance from a given point. I’ll try to focus on ways to answer this type of query. Accuracy of the answer is always important. Think carefully about your query. Do you want every pizza place within a radius of a port or within a square mile? Or, do you really want it within a miles walking distance?
I’m using the common city_lookup table for these tests. Here is the schema.
CREATE TABLE `city_lookup` ( `city_id` INT(7) NOT NULL DEFAULT '0', `feature` VARCHAR(20) NULL DEFAULT NULL, `name` VARCHAR(50) NULL DEFAULT NULL, `pop_2000` INT(10) NULL DEFAULT NULL, `fips_55` VARCHAR(7) NULL DEFAULT NULL, `county` VARCHAR(50) NULL DEFAULT NULL, `fips` VARCHAR(7) NULL DEFAULT NULL, `state` CHAR(3) NULL DEFAULT NULL, `state_fips` CHAR(3) NULL DEFAULT NULL, `display` TINYINT(3) NULL DEFAULT NULL, `lat` DOUBLE NULL DEFAULT NULL, `lon` DOUBLE NULL DEFAULT NULL,PRIMARY KEY (`city_id`), KEY `lat` (`lat`), KEY `lon` (`lon`))ENGINE=MyISAM
This table uses simple numeric columns to store latitude (lat) and longitude (lat). There are 35,432 records in the city_lookup table. I’ll flush the cache before each query.
I added at geomentry point to his table by converting latitude and longitude with this command.
ALTER TABLE city_lookup ADD location GEOMETRY NOT NULL AFTER lon; UPDATE city_lookup set location = point(lon,lat)
Simple Queries
Now to search for data. There are lots of formulas / calculations to find geo distences and navigation. Most of these are highly accurate. I have chosen a "POW()" formula because it requires the least work for MySQL. (Best guess.)
SELECT NAME,lat,lon,ASTEXT(location) FROM city_lookup
WHERE (POW(lat - 35.5,2) + POW(lon - -97.6,2)) < .02
| NAME | lat | lon | ASTEXT(location) |
|---|---|---|---|
| Oklahoma City | 35.4676 | -97.5164 | POINT(-97.5164 35.4676) |
| Woodlawn Park | 35.5114 | -97.65 | POINT(-97.65 35.5114) |
| Bethany | 35.5187 | -97.6323 | POINT(-97.6323 35.5187) |
| Warr Acres | 35.5226 | -97.6189 | POINT(-97.6189 35.5226) |
| Nichols Hills | 35.5509 | -97.5489 | POINT(-97.5489 35.5509) |
| The Village | 35.5609 | -97.5514 | POINT(-97.5514 35.5609) |
Now let's do almost the same query using GIS functions and a bounding box created with two points. Now we are searching a square, not a circle.
SELECT name,lat,lon,AsText(location) FROM city_lookupWHERE MBRContains(GeomFromText('LineString(-98.7 35.6, -97.5 35.4)'),location) ;
| name | lat | lon | AsText(location) |
|---|---|---|---|
| Oklahoma City | 35.4676 | -97.5164 | POINT(-97.5164 35.4676) |
| Woodlawn Park | 35.5114 | -97.65 | POINT(-97.65 35.5114) |
| Bethany | 35.5187 | -97.6323 | POINT(-97.6323 35.5187) |
| Warr Acres | 35.5226 | -97.6189 | POINT(-97.6189 35.5226) |
| Nichols Hills | 35.5509 | -97.5489 | POINT(-97.5489 35.5509) |
| The Village | 35.5609 | -97.5514 | POINT(-97.5514 35.5609) |
Timestamp Duration Message Line Position
10/27/2010 3:38:02 PM 0:00:02.420 Query OK
Well, that’s not any faster. Be we did get the results we expected. I ran a hundred of each query on a quit system. The POW() query takes 0.157 of a seconds and the MBRContains() query takes 0.171 of a second on average.
Is it the Math?
Maybe the math used in the queries is having an effect. I’ll use benchmark to test the basic functions. This will not be completely fair. To make this work, I had to added a POINT() function to the MBRContains() functions so I can run the MBRContains “calculation” in benchmark.
select benchmark (10000000, (POW(35.6 - 35.5,2) + POW(-97.7 - -97.6,2)) < .02 ) ;
This runs in 3.354 seconds.
select benchmark (10000000, MBRContains(GeomFromText('LineString(-97.7 35.6, -97.5 35.4)'),POINT(-97.6,35.5)) ) ;
This runs in 5.460 seconds. Now I’ll try to remove the time was taken by the POINT() function?
select benchmark (10000000, POINT(-97.6,35.5));
This ran in only 0.967. So the MBRContains() function runs in 4.493 after removing the time POINT() takes. Still the POW() functions looks better. It runs in 3/4th the time of the MBRContains() function.
Indexing?
Explain shows neither query is using an index. In a working application, both queries would contain variables that would replace the latitude and longitude numbers (35.5 and -97.6). Because the POW() query uses these as a part of the WHERE clause it is not able to use either the lat or lon index.
So far both POW() and GIS queries are searching through the entire database and taking the same time. (I saw that coming.)
Next I created an index for the location column and tried the query again.
ALTER TABLE city_lookup ADD SPATIAL INDEX `location` (`location`) ;
SELECT name,lat,lon,AsText(location) FROM city_lookup
WHERE MBRContains(GeomFromText('LineString(-98.7 35.6, -97.5 35.4)'),location) ;
Now the average time for the GIS query is .00162. That’s almost ten times faster!
Conclustion
You should be using GIS functions, but be aware of the limitations.
- MySQL only uses bounding box points. Complex shapes will NOT exclude records within the bounding box but outside your polygon.
- MBRContains function is NOT a distance function. If you are starting with a point and distance you will need to calculate the the difference in Lat and Lon to create the bounding box points. (1 deg of latitude ~= 69 miles and 1 deg of longitude ~= cos(latitude)*69)
For my next post:

Reverences:
This is a really great talk on GEO searches with MySQL by Alexander Rubin. http://www.scribd.com/doc/2569355/Geo-Distance-Search-with-MySQL
Wiki descriptions on Latitude http://en.wikipedia.org/wiki/Latitude Longitude http://en.wikipedia.org/wiki/Longitude and geographical distance http://en.wikipedia.org/wiki/Geographical_distance

LinkedIn
Twitter

Mysql GIS functions are great if your data is static. However, it’s impractical to use myisam tables on a database with frequent updates.
Link | November 12th, 2010 at 4:32 pm
What is ASFA / CAPTA Part 1 of 4 | Causes of autism Blog wrote:
[...] MySQL GIS – Part 6 | MySQL Fanboy [...]
Link | November 14th, 2010 at 4:21 am
I total agree. I manage weather radar data and predictions created hourly for points all over the world.
Iphone apps send their GPS locations and request this data. So even though the user iPhones are moving and the radar and predictions are changing the observations points (like airports) don’t change.
I can keep a MyISAM observation list. When a iPhone user requests an update, I find their nearest obs key to the changes.
Now if you wanted to know the flight number of planes that might fall on your head in real time?
Link | November 15th, 2010 at 2:20 pm
Mark,
Thank you so much for taking the time to post these tutorials. There seems to be a lack of good documentation and treatment of this subject and you did a very good job bridging that gap for me.
As a newb to MySQL and GIS in general I have a few quick follow up questions. – I’m particularly interested in radius searches at the moment and it would be nice to understand the accuracy of the two methods you talked about here.
1. Could you explain the POW() method a little more. (I know you posted links to some resources, and I read them, if not briefly, but I had trouble bridging the understanding gap there.)
2. Similarly, could you also explain your parting note about “(1 deg of latitude ~= 69 miles and 1 deg of longitude ~= cos(latitude)*69)”. How accurate does our radius result get if we use a longitude of 69.172 miles per degree, and latitude of 69.2 miles per degree?
2.b. As a bonus question, how would one convert your statement to code that runs fast.
Thank you in advance!
-Sean
Link | March 7th, 2011 at 10:43 am
Sean, I’ll work on your questions this week. It’s time to post more about GIS.
Link | March 7th, 2011 at 5:01 pm
In this example, what does 0.02 mean? If I want to do a search for all the places within a mile radius… how would the SQL change?
SELECT NAME,lat,lon,ASTEXT(location) FROM city_lookup
WHERE (POW(lat – 35.5,2) + POW(lon – -97.6,2)) < .02
Link | March 14th, 2011 at 3:59 pm
The 0.02 is the radius of the search area. This is a very inaccurate way of calculating distance. If you need a measure you will need to use a real formula. There are many with differing accuracy and processing requirements.
https://secure.wikimedia.org/wikipedia/en/wiki/Geographical_distance
Link | March 16th, 2011 at 9:13 am
I’ve been working with MySQL 5.5.x recently and have become of the MBR limitations in the CONTAINS/WITHIN spatial functions. I’ve seen numerous blog posts about the MySQL 5.6 release supposedly fixing these so that CONTAINS will work with complex multipolygon shapes. Do you know if this is true, and when is 5.6 going to be available?
Link | September 8th, 2011 at 9:58 am
Quick question. Does is matter when creating the Geographic data if you call POINT(lat,lon) or POINT(lon,lat)? Different people see to go for either one, I would think that it would matter quite a bit…
Thanks for your reply.
Link | April 2nd, 2012 at 1:37 pm
Not for point data if you are always consistent. However it does matter for polygons. For Polygons Lon must come first and the data points must move around the shape in a clock wise order.
Link | April 6th, 2012 at 9:54 am
Just a little curious, since mysql is managed by java is there a way I can draw d polygon from my database to a java frame?
Link | April 18th, 2012 at 3:49 am