MySQL GIS – Part 6

November 12, 2010

Is MySQL’s GIS really worth using?

Is GIS worth using in MySQL? In the past few post, I have explored what GIS is and how it is used. GIS encoded data is wonderful and can help with all kinds of cool queries.  I’m late getting this article written so lets get right to it.

The most common geographical  query is for all the point within some distance from a given point. I’ll try to focus on ways to answer this type of query. Accuracy of the answer is always important. Think carefully about your query. Do you want every pizza place within a radius of a port or within a square mile? Or, do you really want it within a miles walking distance?

I’m using the common city_lookup table for these tests. Here is the schema.

CREATE TABLE `city_lookup` (
`city_id` INT(7) NOT NULL DEFAULT '0',
`feature` VARCHAR(20) NULL DEFAULT NULL,
`name` VARCHAR(50) NULL DEFAULT NULL,
`pop_2000` INT(10) NULL DEFAULT NULL,
`fips_55` VARCHAR(7) NULL DEFAULT NULL,
`county` VARCHAR(50) NULL DEFAULT NULL,
`fips` VARCHAR(7) NULL DEFAULT NULL,
`state` CHAR(3) NULL DEFAULT NULL,
`state_fips` CHAR(3) NULL DEFAULT NULL,
`display` TINYINT(3) NULL DEFAULT NULL,
`lat` DOUBLE NULL DEFAULT NULL,
`lon` DOUBLE NULL DEFAULT NULL,
 PRIMARY KEY (`city_id`),
  KEY `lat` (`lat`),
  KEY `lon` (`lon`)
)ENGINE=MyISAM

This table uses simple numeric columns to store latitude (lat) and longitude (lat).  There are 35,432 records in the city_lookup table. I’ll flush the cache before each query.

I added at geomentry point to his table by converting latitude and longitude with this command.

ALTER TABLE city_lookup ADD location GEOMETRY NOT NULL AFTER lon;
UPDATE city_lookup set location = point(lon,lat)

Simple Queries

Now to search for data. There are lots of formulas / calculations to find geo distences and navigation. Most of these are highly accurate. I have chosen a "POW()" formula because it requires the least work for MySQL. (Best guess.)

SELECT NAME,lat,lon,ASTEXT(location) FROM city_lookup
  WHERE (POW(lat - 35.5,2) + POW(lon - -97.6,2)) < .02
NAME lat lon ASTEXT(location)
Oklahoma City 35.4676 -97.5164 POINT(-97.5164 35.4676)
Woodlawn Park 35.5114 -97.65 POINT(-97.65 35.5114)
Bethany 35.5187 -97.6323 POINT(-97.6323 35.5187)
Warr Acres 35.5226 -97.6189 POINT(-97.6189 35.5226)
Nichols Hills 35.5509 -97.5489 POINT(-97.5489 35.5509)
The Village 35.5609 -97.5514 POINT(-97.5514 35.5609)
Timestamp Duration Message Line Position 10/27/2010 3:37:10 PM 0:00:00.891 Query OK

Now let's do almost the same query using GIS functions and a bounding box created with two points.  Now we are searching a square, not a circle.

SELECT name,lat,lon,AsText(location) FROM city_lookup
  WHERE MBRContains(GeomFromText('LineString(-98.7 35.6, -97.5 35.4)'),location) ;
name lat lon AsText(location)
Oklahoma City 35.4676 -97.5164 POINT(-97.5164 35.4676)
Woodlawn Park 35.5114 -97.65 POINT(-97.65 35.5114)
Bethany 35.5187 -97.6323 POINT(-97.6323 35.5187)
Warr Acres 35.5226 -97.6189 POINT(-97.6189 35.5226)
Nichols Hills 35.5509 -97.5489 POINT(-97.5489 35.5509)
The Village 35.5609 -97.5514 POINT(-97.5514 35.5609)

Timestamp Duration Message Line Position

10/27/2010 3:38:02 PM 0:00:02.420 Query OK

Well, that’s not any faster.  Be we did get the results we expected.  I ran a hundred of each query on a quit system.  The POW() query takes 0.157 of a seconds and the MBRContains() query takes 0.171 of a second on average.

Is it the Math?

Maybe the math used in the queries is having an effect.  I’ll use benchmark to test the basic functions.  This will not be completely fair. To make this work, I had to added a POINT() function to the MBRContains() functions so I can run the MBRContains “calculation” in benchmark.

select benchmark (10000000, (POW(35.6 - 35.5,2) + POW(-97.7 - -97.6,2)) < .02 ) ;

This runs in 3.354 seconds.

select benchmark (10000000, MBRContains(GeomFromText('LineString(-97.7 35.6, -97.5 35.4)'),POINT(-97.6,35.5)) ) ;

This runs in 5.460 seconds.  Now I’ll try to remove the time was taken by the POINT() function?

select benchmark (10000000, POINT(-97.6,35.5));

This ran in only 0.967. So the MBRContains() function runs in 4.493 after removing the time POINT() takes.  Still the POW() functions looks better.  It runs  in 3/4th the time of the MBRContains() function.

Indexing?

Explain shows neither query is using an index.  In a working application, both queries would contain variables that would replace the latitude and longitude numbers (35.5 and -97.6). Because the POW() query uses these as a part of the WHERE clause it is not able to use either the lat or lon index.

So far both POW() and GIS queries are searching through the entire database and taking the same time.  (I saw that coming.)

Next I created an index for the location column and tried the query again.

ALTER TABLE city_lookup ADD SPATIAL INDEX `location` (`location`) ;
SELECT name,lat,lon,AsText(location) FROM city_lookup
  WHERE MBRContains(GeomFromText('LineString(-98.7 35.6, -97.5 35.4)'),location) ;

Now the average time for the GIS query is .00162.  That’s almost ten times faster!

Conclustion

You should be using GIS functions, but be aware of the limitations.

  1. MySQL only uses bounding box points. Complex shapes will NOT exclude records within the bounding box but outside your polygon.
  2. MBRContains function is NOT a distance function. If you are starting with a point and distance you will need to calculate the the difference in Lat and Lon to create the bounding box points.  (1 deg of latitude ~= 69 miles and 1 deg of longitude ~= cos(latitude)*69)

For my next post:

  • How to collect your own GIS data.
  • Good and bad examples of searching GIS data.
  • Data sources shared by users.
  • Mark Grennan

    Reverences:

    This is a really great talk on GEO searches with MySQL by Alexander Rubin.  http://www.scribd.com/doc/2569355/Geo-Distance-Search-with-MySQL

    Wiki descriptions on Latitude http://en.wikipedia.org/wiki/Latitude Longitude http://en.wikipedia.org/wiki/Longitude and geographical distance http://en.wikipedia.org/wiki/Geographical_distance

    tags: , ,
    posted in GIS by mark

    Follow comments via the RSS Feed | Leave a comment | Trackback URL

    11 Comments to "MySQL GIS – Part 6"

    1. Gavin Towey wrote:

      Mysql GIS functions are great if your data is static. However, it’s impractical to use myisam tables on a database with frequent updates.

    2. What is ASFA / CAPTA Part 1 of 4 | Causes of autism Blog wrote:

      [...] MySQL GIS – Part 6 | MySQL Fanboy [...]

    3. mark wrote:

      I total agree. I manage weather radar data and predictions created hourly for points all over the world.

      Iphone apps send their GPS locations and request this data. So even though the user iPhones are moving and the radar and predictions are changing the observations points (like airports) don’t change.

      I can keep a MyISAM observation list. When a iPhone user requests an update, I find their nearest obs key to the changes.

      Now if you wanted to know the flight number of planes that might fall on your head in real time?

    4. Sean Conner wrote:

      Mark,

      Thank you so much for taking the time to post these tutorials. There seems to be a lack of good documentation and treatment of this subject and you did a very good job bridging that gap for me.

      As a newb to MySQL and GIS in general I have a few quick follow up questions. – I’m particularly interested in radius searches at the moment and it would be nice to understand the accuracy of the two methods you talked about here.

      1. Could you explain the POW() method a little more. (I know you posted links to some resources, and I read them, if not briefly, but I had trouble bridging the understanding gap there.)

      2. Similarly, could you also explain your parting note about “(1 deg of latitude ~= 69 miles and 1 deg of longitude ~= cos(latitude)*69)”. How accurate does our radius result get if we use a longitude of 69.172 miles per degree, and latitude of 69.2 miles per degree?

      2.b. As a bonus question, how would one convert your statement to code that runs fast.

      Thank you in advance!

      -Sean

    5. mark wrote:

      Sean, I’ll work on your questions this week. It’s time to post more about GIS.

    6. Daniel wrote:

      In this example, what does 0.02 mean? If I want to do a search for all the places within a mile radius… how would the SQL change?

      SELECT NAME,lat,lon,ASTEXT(location) FROM city_lookup
      WHERE (POW(lat – 35.5,2) + POW(lon – -97.6,2)) < .02

    7. mark wrote:

      The 0.02 is the radius of the search area. This is a very inaccurate way of calculating distance. If you need a measure you will need to use a real formula. There are many with differing accuracy and processing requirements.

      https://secure.wikimedia.org/wikipedia/en/wiki/Geographical_distance

    8. Jim wrote:

      I’ve been working with MySQL 5.5.x recently and have become of the MBR limitations in the CONTAINS/WITHIN spatial functions. I’ve seen numerous blog posts about the MySQL 5.6 release supposedly fixing these so that CONTAINS will work with complex multipolygon shapes. Do you know if this is true, and when is 5.6 going to be available?

    9. Eugene wrote:

      Quick question. Does is matter when creating the Geographic data if you call POINT(lat,lon) or POINT(lon,lat)? Different people see to go for either one, I would think that it would matter quite a bit…

      Thanks for your reply.

    10. admin wrote:

      Not for point data if you are always consistent. However it does matter for polygons. For Polygons Lon must come first and the data points must move around the shape in a clock wise order.

    11. kay wrote:

      Just a little curious, since mysql is managed by java is there a way I can draw d polygon from my database to a java frame?

    Leave Your Comment

     



    Powered by Wordpress and MySQL. Theme by Shlomi Noach, openark.org
    Creative Commons License
    MySQL Fan Boy by Mark Grennan is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.