Earlier this week I came across a person looking for a local (to Louisiana) car safety inspection location. I think most states require inspections, but they differ on schedules. Louisiana recently started letting you pay more for a two-year sticker, which is nice, but it is still a bit of a hassle if you don't know where to find an inspection location. Turns out - there is a web page for it: http://www.dps.state.la.us/safetydirections.nsf/f3f91999370ccaed862574a20074b158?OpenView.
I looked at this and thought - wouldn't it be cool if we could find the nearest station based on your current location? Turns out it was possible - just not very pretty. I've split this blog entry into two parts - getting the data and using the data. If you don't care how I scraped the site, feel free to scroll down to the next part.
Scraping the Data
I had hoped the site was using fancy Ajax Ninja stuff with cool JSON-based data sources, but I quickly discovered that it was not. It was pure HTML. Lots, and lots, and, oh my god, lots of HTML. I began by figuring out how the site was set up. The home page contains a list of all the parishes:
Clicking a triangle (but oddly, not the parish name) opens a list of places where you can get your car inspected.
This gives you the location name and address. But to get hours of operation you need to click for details.
All in all, this gave me two things to scrape. First was a list of the locations, which can only be found by first getting all the parishes. Then, for each location, I needed the detail page for the hours of operation. Finally, I could take all those addresses and geocode them to get precise locations.
What follows is a set of ColdFusion scripts I wrote to perform this task. These files are ugly. The HTML used on these pages was messy as hell. The phone numbers had multiple span/font tags, etc. It was a mess. I also took the opportunity to try some fancy ColdFusion 11 updates as well. All in all, this code is quite disgusting, but I'll share it so you can use it to scare away monsters.
First, open up all the parishes and save the location data.
<cfscript>
rootUrl = "http://www.dps.state.la.us/safetydirections.nsf/f3f91999370ccaed862574a20074b158?OpenView&Start=1&Count=1200";
//<cfset links = rematch("/safetydirections.nsf/.*?Expand=.*?""",cfhttp.fileContent)>
//<cfdump var="#links#">
///safetydirections.nsf/f3f91999370ccaed862574a20074b158?OpenView&Start=1&Count=1200&Expand=2#2" target="_self">
//number of parishes but I call it pages, because.
totalPages = 62;
//totalPages = 3;
locations = [];
for(i=1; i<= totalPages; i++) {
    theUrl = rootUrl & "&Expand=#i#";
    writeoutput(theUrl & "<br/><hr>");
    cfhttp(url=theUrl);
    //writeoutput("<pre>#htmlEditFormat(cfhttp.filecontent)#</pre>");
    matches = reMatch("<font color=""##0000ff"">.*?</tr>",cfhttp.fileContent);
    matches.each(function(m) {
        var location = {};
        var linkre = reFind("<a href=""(.*?)"">", m, 1, true);
        location["link"] = m.mid(linkre.pos[2], linkre.len[2]);
        var namere = reFind("<a href="".*?"">(.*?)</a>", m, 1, true);
        location["name"] = m.mid(namere.pos[2], namere.len[2]);
        var tds = reMatch("<td>(.*?)</td>", m);
        var address = rereplace(tds[1], "<td><b><font color=""##0000ff"">(.*?)</font></b></td>", "\1");
        address = address.replace("<br>","");
        location["address"] = address;
        location["types"] = [];
        var typeList = rereplace(tds[3], "<td><b><font color=""##0000ff"">(.*?)</font></b></td>","\1");
        typeList = typeList.replace("<br>", ",", "all");
        typeList.each(function(t) {
            t = trim(t);
            location["types"].append(t);
        });
        //writedump(location);
        // writedump(m);
        locations.append(location);
    });
    // writedump(matches);
}
writedump(locations.len());
fileWrite(expandPath("./data1.json"), serializeJSON(locations));
</cfscript>
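If ColdFusion isn't your thing, here is a rough JavaScript sketch of the same row-parsing idea: grab the link and name from the anchor tag, then pull the address and types out of the plain td cells. To be clear, the sample row is made up to match the shape the regexes above expect (it is not copied from the real site), and parseRow and cellText are names I invented for illustration.

```javascript
// Hypothetical row, shaped like the markup the ColdFusion regexes expect.
// The names, address, and types are invented sample values.
const sampleRow =
  '<td width="200"><b><font color="#0000ff"><a href="/example/1">EXAMPLE GARAGE</a></font></b></td>' +
  '<td><b><font color="#0000ff">123 MAIN ST<br>ANYTOWN</font></b></td>' +
  '<td><b><font color="#0000ff">EXAMPLE PARISH</font></b></td>' +
  '<td><b><font color="#0000ff">TYPE A<br>TYPE B</font></b></td></tr>';

// Swap <br> for a separator, drop every other tag, and trim the result.
function cellText(cell, brReplacement) {
  return cell.replace(/<br>/g, brReplacement).replace(/<[^>]*>/g, "").trim();
}

function parseRow(row) {
  // First capture is the href, second is the link text (the location name).
  const link = row.match(/<a href="(.*?)">(.*?)<\/a>/);
  // Only plain <td> cells match, mirroring reMatch("<td>(.*?)</td>") above.
  const cells = [...row.matchAll(/<td>(.*?)<\/td>/g)].map((m) => m[1]);
  return {
    link: link[1],
    name: link[2],
    address: cellText(cells[0], " "),
    // The middle cell is ignored, just like tds[2] in the script above.
    types: cellText(cells[2], ",").split(",").map((t) => t.trim()),
  };
}

console.log(parseRow(sampleRow));
```

Unlike the ColdFusion version, this sketch replaces the br in the address with a space rather than removing it. That is a choice on my part, not something the original script does.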
Next, get the details. This includes the hours of operation I mentioned earlier, as well as the phone number.
<cfscript>
rootUrl = "http://www.dps.state.la.us/";
data = deserializeJSON(fileRead(expandPath("data1.json")));
//filter by items w/o a phone number
writeoutput("There are #data.len()# items.<br/>");
/*
filtered = data.filter(function(x) {
    return !structKeyExists(x, "phoneNumber");
});
writeoutput("There are #filtered.len()# items to process.<br/>");
*/
counter=0;
data.each(function(l) {
    counter++;
    if(counter mod 100 is 0) {
        writeoutput("#counter#<br/>");
        cfflush();
    }
    //Only get if we don't have the data already.
    //We are inside a closure, not a loop, so return (not continue) moves on to the next item.
    if(structKeyExists(l, "phoneNumber")) return;
    cfhttp(url="#rootUrl#/#l.link#");
    var content = cfhttp.fileContent;
    var found = reMatch('Area Code</font></b><b><font color="##0000FF" face="HandelGotDLig"> </font></b><b><font color="##ff0000" face="HandelGotDLig">.*?</font>', content);
    var areaCode = found[1].rereplace(".*>([0-9]{3})</font>", "\1");
    found = reMatch('Phone Number</font></b><b><font color="##FF0000" face="HandelGotDLig"> </font></b><b><font color="##ff0000" face="HandelGotDLig">.*?</td>', content);
    var phoneFirst = found[1].rereplace(".*>([0-9]{3})</font>.*", "\1");
    var phoneSecond = found[1].rereplace(".*>([0-9]{4})</font>.*", "\1");
    var phoneNumber = "(" & areaCode & ") " & phoneFirst & "-" & phoneSecond;
    // writeoutput("<b>#phoneNumber#</b><p>");
    found = content.reMatch('Hours of Operation.*?</tr>');
    var hoo = found[1].rereplace(".*?</td><td width=""536"">(.*?)</td></tr>", "\1");
    //Strip the tags, then collapse every run of whitespace (scope "all", not just the first run)
    hoo = hoo.rereplace("<.*?>", " ", "all");
    hoo = hoo.rereplace("[[:space:]]{2,}", " ", "all");
    // writedump(found);
    // writeOutput("<pre>"&htmlEditFormat(cfhttp.fileContent)&"</pre>");
    // abort;
    l["phoneNumber"] = phoneNumber;
    l["hours"] = hoo;
    fileWrite(expandPath("data1.json"), serializeJSON(data));
});
writeoutput("<p>Done!</p>");
</cfscript>
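The cleanup at the end of that script is the reusable part: strip every tag, collapse the leftover whitespace, and glue the phone pieces together. Here is a minimal JavaScript sketch of just that step. The function names cleanHours and formatPhone are mine, and the sample markup and phone digits are invented, not taken from the site.

```javascript
// Replace each tag with a space, collapse runs of whitespace, then trim.
function cleanHours(html) {
  return html.replace(/<[^>]*>/g, " ").replace(/\s{2,}/g, " ").trim();
}

// Assemble the three scraped pieces into "(xxx) xxx-xxxx".
function formatPhone(areaCode, prefix, lineNumber) {
  return "(" + areaCode + ") " + prefix + "-" + lineNumber;
}

// Invented sample input, only here to show the shape of the result.
console.log(cleanHours("<font>Mon - Fri</font><br><font>8:00 - 5:00</font>"));
console.log(formatPhone("555", "555", "0100"));
```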
Finally, do the geocoding.
<cfscript>
geo = new googlegeocoder3();
data = deserializeJSON(fileRead(expandPath("data1.json")));
writeoutput("There are #data.len()# items.<br/>");
counter=0;
data.each(function(l) {
    counter++;
    if(counter mod 100 is 0) {
        writeoutput("#counter#<br/>");
        cfflush();
    }
    //Only get if we don't have the data already.
    //We are inside a closure, not a loop, so return (not continue) moves on to the next item.
    if(structKeyExists(l, "long")) return;
    var res = geo.googlegeocoder3(address = l.address);
    l["long"] = res.longitude[1];
    l["lat"] = res.latitude[1];
    //Save after every lookup so a failed run can pick up where it left off
    fileWrite(expandPath("data1.json"), serializeJSON(data));
});
writeoutput("<p>Done!</p>");
</cfscript>
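The pattern in that last script (skip anything already geocoded, save after every lookup) is what makes it safe to rerun after a failure. Below is a small JavaScript sketch of that resumable loop. It assumes a geocode function and a save function that you pass in; both are hypothetical stand-ins, not a real geocoding API, and the loop is kept synchronous for brevity. It also appends ", Louisiana" to each address, which the ColdFusion script above does not do; that is an assumption on my part to make lookups less ambiguous.

```javascript
// geocode: hypothetical function taking an address string and returning
//          { longitude, latitude }. save: hypothetical function that persists
//          the whole array. Neither is a real library call.
function geocodeMissing(locations, geocode, save) {
  let lookups = 0;
  for (const loc of locations) {
    if ("long" in loc) continue; // already geocoded on an earlier run
    const res = geocode(loc.address + ", Louisiana"); // assumption: qualify with the state
    loc.long = res.longitude;
    loc.lat = res.latitude;
    save(locations); // checkpoint after every lookup
    lookups++;
  }
  return lookups;
}

// Usage with a fake geocoder that always returns the same point.
const demoLocations = [
  { address: "123 MAIN ST ANYTOWN", long: -91, lat: 30 },
  { address: "456 OAK AVE ANYTOWN" },
];
const done = geocodeMissing(
  demoLocations,
  () => ({ longitude: -90, latitude: 31 }),
  () => {}
);
console.log(done + " lookup(s) performed");
```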
Note - I used one more script to remove the link property from my data file to make it a bit smaller. So at this point, I had a data1.json file containing every location in Louisiana where you can get your car inspected. I also had their phone numbers, hours of operation, and longitude and latitude. Woot! Now for the fun stuff - the front end!
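I didn't share that last cleanup script, but the idea is tiny. As a sketch (in JavaScript rather than ColdFusion, and with a made-up function name and sample record), dropping the link property from every record could look like this:

```javascript
// Return copies of the records without their "link" property.
function stripLinks(locations) {
  return locations.map(({ link, ...rest }) => rest);
}

// Invented sample record, just to show the effect.
const before = [{ name: "EXAMPLE GARAGE", link: "/example/1", types: ["TYPE A"] }];
console.log(JSON.stringify(stripLinks(before)));
```

The originals are left untouched, so you can compare sizes before overwriting the data file.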
Using the Data
For my front end, I decided to go simple. No bootstrap. No UI framework at all. Just a simple div to display dynamic data. I could make this pretty, but why bother?
<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
        <title></title>
        <meta name="description" content="">
        <meta name="viewport" content="width=device-width">
    </head>
    <body>
        <div id="status"></div>
        <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
        <script src="app.js"></script>
    </body>
</html>
The real fun happens in app.js. I'll share the entire file, then describe what each part does.
var $status;
var geoData;
var myLong;
var myLat;

$(document).ready(function() {
    $status = $("#status");
    //Do we have the data locally?
    geoData = localStorage["geocache"];
    if(!geoData) {
        $status.html("<i>Fetching initial data set. Please stand by. This data will be cached for future operations.</i>");
        $.getJSON("data1.json").done(function(res) {
            console.log("Done");
            localStorage["geocache"] = JSON.stringify(res);
            geoData = res;
            $status.html("");
            getLocation();
        });
    } else {
        geoData = JSON.parse(geoData);
        getLocation();
    }
});

function getLocation() {
    $status.html("<i>Getting your location.</i>");
    navigator.geolocation.getCurrentPosition(gotLocation, failedLocation);
}

function failedLocation() {
    $status.html("<b>Sorry, but we were unable to get your location.</b>");
}

function gotLocation(l) {
    myLong = l.coords.longitude;
    myLat = l.coords.latitude;
    appReady();
}

function appReady() {
    $status.html("<i>Now searching for nearby locations.</i>");
    //Tag every location with its distance from the user
    for(var i=0;i<geoData.length;i++) {
        var dist = getDistanceFromLatLonInKm(myLat, myLong, geoData[i].lat, geoData[i].long);
        geoData[i].dist = dist;
    }
    //Closest first
    geoData.sort(function(x,y) {
        if(x.dist > y.dist) return 1;
        if(x.dist < y.dist) return -1;
        return 0;
    });
    var s = "<h2>Nearby Locations</h2>";
    //Show the 10 closest locations
    for(var i=0;i<Math.min(10, geoData.length); i++) {
        s+= "<p><b>"+geoData[i].name+"</b><br/>";
        s+= geoData[i].address+" "+Math.round(geoData[i].dist)+" km away<br/>";
        s+= "<a href='tel:"+geoData[i].phoneNumber+"'>"+geoData[i].phoneNumber+"</a><br/>";
        s+= "Hours: "+geoData[i].hours+"<br/>";
        s+= "Types: "+geoData[i].types.join(", ")+"<br/>";
        s+= "</p>";
    }
    $status.html(s);
}

//Credit: http://stackoverflow.com/a/27943/52160
function getDistanceFromLatLonInKm(lat1,lon1,lat2,lon2) {
    var R = 6371; // Radius of the earth in km
    var dLat = deg2rad(lat2-lat1); // deg2rad below
    var dLon = deg2rad(lon2-lon1);
    var a =
        Math.sin(dLat/2) * Math.sin(dLat/2) +
        Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) *
        Math.sin(dLon/2) * Math.sin(dLon/2)
    ;
    var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a));
    var d = R * c; // Distance in km
    return d;
}

function deg2rad(deg) {
    return deg * (Math.PI/180);
}
So, the first thing I wondered was - how do I handle the data? It was about 700K, which isn't too big, but isn't tiny either. I decided to simply store the data in localStorage. I could also store an "update date" key so I knew when to refresh the data, but for now, what I have is sufficient. Get it - store it - and carry on.
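For the curious, here is one way the "update date" idea could look. This is only a sketch, not what the demo does: the geocacheUpdated key name and the one-week lifetime are assumptions I picked for illustration. The storage object is passed in so the same functions work with window.localStorage in a browser or with the little in-memory stand-in below.

```javascript
const MAX_AGE_MS = 7 * 24 * 60 * 60 * 1000; // assumption: refresh once a week

// Returns the cached array, or null if nothing is cached or the cache is stale.
function readCache(storage, now) {
  const raw = storage.getItem("geocache");
  const savedAt = Number(storage.getItem("geocacheUpdated")); // hypothetical key name
  if (!raw || !savedAt || now - savedAt > MAX_AGE_MS) return null;
  return JSON.parse(raw);
}

// Stores the data along with the time it was saved.
function writeCache(storage, data, now) {
  storage.setItem("geocache", JSON.stringify(data));
  storage.setItem("geocacheUpdated", String(now));
}

// Minimal in-memory stand-in so the sketch runs outside a browser.
function memoryStorage() {
  const items = new Map();
  return {
    getItem: (key) => (items.has(key) ? items.get(key) : null),
    setItem: (key, value) => { items.set(key, String(value)); },
  };
}

const store = memoryStorage();
writeCache(store, [{ name: "EXAMPLE GARAGE" }], Date.now());
console.log(readCache(store, Date.now()) !== null); // fresh cache is returned
```

In the browser you would call readCache(localStorage, Date.now()) and only fetch data1.json again when it returns null.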
Once we have the data file, we then simply detect where you are. This is boilerplate geolocation stuff so it isn't terribly fancy.
Next - we need to determine the distance between you and each location. There were quite a few locations (1,916) so I was concerned about the timing, but this portion ran very quickly as well. Then it was simply a matter of a sort operation. I display the closest 10 locations and that's it. Of course, these numbers are a bit high as I'm in San Francisco. ;)
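If you want to play with the distance-and-sort step outside the browser, here is a self-contained sketch of it as a pure function. It uses the same haversine formula as app.js, but the nearest function, the km and miles property names, and the sample points are mine. Unlike appReady, it returns new objects instead of modifying geoData, and it adds a miles value for readers who don't think in kilometers.

```javascript
const EARTH_RADIUS_KM = 6371;
const KM_PER_MILE = 1.609344;

function toRadians(deg) {
  return deg * (Math.PI / 180);
}

// Great-circle distance in km between two latitude/longitude pairs (haversine).
function haversineKm(lat1, lon1, lat2, lon2) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}

// Returns the `count` closest locations, each copied with km and miles added.
function nearest(locations, lat, lon, count) {
  return locations
    .map((loc) => {
      const km = haversineKm(lat, lon, loc.lat, loc.long);
      return { ...loc, km: km, miles: km / KM_PER_MILE };
    })
    .sort((x, y) => x.km - y.km)
    .slice(0, count);
}

// Invented sample points on the equator.
const sample = [
  { name: "far", lat: 0, long: 10 },
  { name: "near", lat: 0, long: 1 },
];
console.log(nearest(sample, 0, 0, 1)[0].name); // prints "near"
```

One degree of longitude along the equator works out to roughly 111 km with this radius, which is a handy sanity check if your numbers look off.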
If you want to try this yourself, just hit the demo link below. Enjoy!
Archived Comments
uh, you store the whole state's info in each user's w/s rather than just the nearby ones? or on some server?
and you're not restricting user's location queries to someplace within your state?
GLASS DOCTOR
1401 N MARKET SHREVEPORT 11144 km away
and while i admire your adherence to metric standards, maybe miles would work better for folks in LA?
hold out your hands while i get my rubber mallet ;-)
"uh, you store the whole state's info in each user's w/s rather than just the nearby ones? or on some server?"
Yes. I should have explained why. Having it on the client means I can do the distance calculation there. If I keep it on the server I have to set up an application server of some kind. Of course, here I have ColdFusion, but I'm thinking in general, when I can avoid using an application server, I will.
The "database" is around 670K. About 2 large images.
"and you're not restricting user's location queries to someplace within your state?"
I'm not quite sure I get your point here. If this app were released, and someone in CA decided they REALLY wanted to find the closest location in LA, well, then that is their choice, right? I think you are overthinking it a bit. ;)
"and while i admire your adherence to metric standards, maybe miles would work better for folks in LA?"
I considered that. But I figured if the results were sorted by closest, then it wouldn't really matter.
Very useful civic app. But something is wrong in the distance calculation. It shows that in Michigan I am 71 km away from some of these locations or roughly 44 miles. FYI I'm approximately 1865 km from Lafayette.
Which location was it? It is possible the geocoding failed for some. When I did the lookup, I used *just* the address. I think I should have appended "Louisiana" to make it more precise.
storing extraneous/unused data is kind of wasteful--doesn't take much to fill up a mobile device (the cycling playlist on my phone is in constant battle w/apps & updates). maybe it's just my age, but always keeping an eye on storage on those infernal devices.
and still think you should filter based on location--you can make the app more useful by simple bounding box filter (box of coords around LA & only using user locations within that box & reject somebody from say bangkok). edge cases won't matter much as this is just an LA-only service, really can't imagine folks in alaska would want to find some place to get their LA sticker updated.
and since i've already taken the time, probably a map showing those garage locations would be useful as well.
you know if you had this for all 50 states, probably some beer/beard grooming money to be made.
i make spatial apps for a living, so no, not over thinking.
"storing extraneous/unused data is kind of wasteful" - again - less then 700k here. :) But ok - we can agree to disagree here.
And I still don't see why I would bother blocking someone from outside LA. If they want to use it... they use it. Heck, they may be on the border, outside, driving home, and want to check which one is closest.
"you know if you had this for all 50 states, probably some beer/beard grooming money to be made."
Sure - if I had that data.
Forgot to say - yeah - I could add a map easily enough. Even driving directions. Once you have the Long/Lat and your own Long/Lat, Google has services for that.
Thanks Ray! I'm learning a lot by closely examining your code.
That's very clean, putting the database in a json file and then caching it in localStorage.
So far, I've only used localStorage for variables, not objects.
I use webSQL for the heavier lifting, but as you know, webSQL is on the way out and IndexedDB is on the way in.
So using localStorage to store an entire "database" is a neat technique.
You use a bracket syntax, and I wondered about using a dot syntax for localStorage. To my StackOverflow question about whether I can use localStorage.myVariable, someone said "It looks like Mozilla is planning a transition to the standard Storage implementation specified by WHATWG, which relies on the getter/setter methods only:".
http://stackoverflow.com/qu...
http://stackoverflow.com/qu...
@PS: I'd caution you that what I did may be stretching the "appropriate" usage of LS a bit much. :)
I wasn't aware that Moz was going to move away from bracket syntax. I prefer it when getting and setting items and only use the API for when I want to remove everything, as it is quicker. Thanks for warning me!
Ray all the locations mileages were incorrect. I noticed in revisiting today that all the locations are in duplicate which wasn't the case last night. Is the program failing to delete the cookie?
Here are the first three locations:
GORDONS SERVICE CTR
401 W MAIN HOMER 71 km away
MCKENZIE BROS. GARAGE & TOWING
635 WEST MAIN HOMER 71 km away
LONNIE'S SERVICE CENTER
618 WEST MAIN HOMER 71 km away
Sorry - what's wrong with the data you shared? I see 3 different businesses at 3 different addresses.
I gave it my location so I'm assuming that it should calculate the distance from where I am located which is outside of East Lansing, Michigan. I am approximately 1614 km from Homer, Louisiana.
Oh I'm sorry, the #s are wrong. Ok, something I can check. Chrome lets you fake your location. By any wild chance can you tell me your long/lat?