04 Apr

points in an area in mysql

3am, daughter wide awake, not allowed to sleep – what’s a guy to do? Let’s do an experiment.

Let’s say we want to efficiently select all points in an area from a database. This has real-world applications – I’ll be using it in a geographical project very soon.

First, create a simple table in MySQL. I’ve created mine in a database called ‘geodb’.

CREATE TABLE points(x INT, y INT, INDEX(x), INDEX(y));

Note that x and y are both indexed.

Then, seed that table using some PHP (I had to up the max_execution_time on my laptop to 300 for this).

<?php
mysql_connect('localhost','username','password');
mysql_select_db('geodb');
for($i=0;$i<10000000;++$i){
  $x=rand(-1000000,1000000);
  $y=rand(-1000000,1000000);
  mysql_query("INSERT INTO points (x,y) VALUES ($x,$y)");
}

Ok. For the rest of the experiment, we’ll be trying various ways to extract the number of points within a radius of 100000 from (0,0). The goal is to have the lowest working time. Each method should return the exact same result.

First, from the console, do a straight select statement.

time echo "SELECT COUNT(x) FROM points WHERE SQRT(x*x+y*y)<100000" | mysql -uusername -ppassword geodb

Returned result 7993 in .414 seconds. Wow – pretty quick already… but, my daughter is still awake, so let’s continue.

We can improve this by avoiding the math on points that we are certain can not be in the area. For example, in a radius of 100000 from (0,0), and points with x<-100000, x>100000, y<-100000, y>100000 can definitely not be in the circle.

time echo "SELECT COUNT(x) FROM (SELECT x,y FROM points WHERE x>-100000 AND x<100000 AND y>-100000 AND y<100000) AS sub1 WHERE SQRT(x*x+y*y)<100000" | mysql -uusername -ppassword geodb

.326 seconds. Better. However, there are two calculations being performed on each value. Let’s reduce that.

time echo "SELECT COUNT(x) FROM (SELECT x,y FROM points WHERE ABS(x)<=100000 AND ABS(y)<=100000) AS sub1 WHERE SQRT(x*x+y*y)<100000" | mysql -uusername -ppassword geodb

.297 – more than 25% fster than the original.

It should be possible to reduce that further. For example, we can be certain that x,y values which are both less than (100000*Cos(Pi/4)) are contained inside the circle, so that’s another < comparison, reducing the number of maths operations. I’d test that one as well, but my daughter is finally asleep in my left arm as type.