Yelp: Reverse geocoding businesses to extract detailed location information

I’ve been playing around with the Yelp Open Dataset and wanted to extract more detailed location information for each business.

This is an example of the JSON representation of one business:

$ cat dataset/business.json | head -n1 | jq
{
  "business_id": "FYWN1wneV18bWNgQjJ2GNg",
  "name": "Dental by Design",
  "neighborhood": "",
  "address": "4855 E Warner Rd, Ste B9",
  "city": "Ahwatukee",
  "state": "AZ",
  "postal_code": "85044",
  "latitude": 33.3306902,
  "longitude": -111.9785992,
  "stars": 4,
  "review_count": 22,
  "is_open": 1,
  "attributes": {
    "AcceptsInsurance": true,
    "ByAppointmentOnly": true,
    "BusinessAcceptsCreditCards": true
  },
  "categories": [
    "Dentists",
    "General Dentistry",
    "Health & Medical",
    "Oral Surgeons",
    "Cosmetic Dentists",
    "Orthodontists"
  ],
  "hours": {
    "Friday": "7:30-17:00",
    "Tuesday": "7:30-17:00",
    "Thursday": "7:30-17:00",
    "Wednesday": "7:30-17:00",
    "Monday": "7:30-17:00"
  }
}
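
The file holds one JSON object per line, so the same record can also be inspected from Python. This is a minimal sketch, assuming the field names shown above; `summarise_business` is a hypothetical helper name, and `sample_line` is a trimmed copy of the record rather than the full line from the file.

```python
import json

# A trimmed copy of the record shown above, as it would appear on one line of the file.
sample_line = (
    '{"business_id": "FYWN1wneV18bWNgQjJ2GNg", "name": "Dental by Design", '
    '"city": "Ahwatukee", "state": "AZ", '
    '"latitude": 33.3306902, "longitude": -111.9785992}'
)


def summarise_business(line):
    """Parse one line of the file and keep only the location-related fields."""
    item = json.loads(line)
    return {
        "business_id": item["business_id"],
        "city": item.get("city"),
        "state": item.get("state"),
        "lat_long": (item.get("latitude"), item.get("longitude")),
    }


print(summarise_business(sample_line))
```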

The businesses reside in different countries so I wanted to extract the area/county/state and the country for each of them. I found the reverse-geocoder library which is perfect for this problem.

You give the library a lat/long or a list of lat/longs, and it returns a list containing, for each point, the lat/long of the nearest place it knows about, along with the name of that place, its admin regions, and its country code. Passing in a list of lat/longs is much quicker than calling the function once per lat/long, so we’ll do that.

We can write the following code to extract location information for a list of lat/longs:

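import reverse_geocoder as rg

# Map each business ID to its (latitude, longitude) pair
lat_longs = {
    "FYWN1wneV18bWNgQjJ2GNg": (33.3306902, -111.9785992),
    "He-G7vWjzVUysIKrfNbPUQ": (40.2916853, -80.1048999),
    "KQPW8lFf1y5BT2MxiSZ3QA": (33.5249025, -112.1153098)
}

# keys() and values() iterate in the same order, so results line up with the IDs
business_ids = list(lat_longs.keys())
# Look up all the points in a single call rather than one call per point
locations = rg.search(list(lat_longs.values()))

for business_id, location in zip(business_ids, locations):
    print(business_id, lat_longs[business_id], location)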

This is the output we get from running the script:

$ python blog.py 
Loading formatted geocoded file...
FYWN1wneV18bWNgQjJ2GNg (33.3306902, -111.9785992) OrderedDict([('lat', '33.37088'), ('lon', '-111.96292'), ('name', 'Guadalupe'), ('admin1', 'Arizona'), ('admin2', 'Maricopa County'), ('cc', 'US')])
He-G7vWjzVUysIKrfNbPUQ (40.2916853, -80.1048999) OrderedDict([('lat', '40.2909'), ('lon', '-80.10811'), ('name', 'Thompsonville'), ('admin1', 'Pennsylvania'), ('admin2', 'Washington County'), ('cc', 'US')])
KQPW8lFf1y5BT2MxiSZ3QA (33.5249025, -112.1153098) OrderedDict([('lat', '33.53865'), ('lon', '-112.18599'), ('name', 'Glendale'), ('admin1', 'Arizona'), ('admin2', 'Maricopa County'), ('cc', 'US')])
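
Notice that the first business is listed in Ahwatukee but is matched to Guadalupe: the result is the nearest place the library knows about, not necessarily the place the point sits inside. The sketch below illustrates that idea with a brute-force search over a tiny, made-up gazetteer. It is not the library's implementation (which, as far as I know, uses a k-d tree over GeoNames data); `PLACES`, `haversine_km` and `nearest_place` are hypothetical names, and the coordinates are the matched places from the output above.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical mini gazetteer built from the matched places in the output above.
PLACES = [
    {"name": "Guadalupe", "lat": 33.37088, "lon": -111.96292},
    {"name": "Thompsonville", "lat": 40.2909, "lon": -80.10811},
    {"name": "Glendale", "lat": 33.53865, "lon": -112.18599},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))


def nearest_place(lat, lon, places=PLACES):
    """Return the gazetteer entry closest to (lat, lon) by brute force."""
    return min(places, key=lambda p: haversine_km(lat, lon, p["lat"], p["lon"]))


# The Ahwatukee business from the example above
print(nearest_place(33.3306902, -111.9785992)["name"])
```

A brute-force scan like this is fine for three places, but it compares every point against every place, which is why a spatial index matters once the gazetteer is large.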

It seems to work fairly well! Now we just need to tweak our script to read in the values from the Yelp JSON file and generate a new JSON file containing the locations:

import json

import reverse_geocoder as rg

lat_longs = {}

# The dataset contains one JSON object per line
with open("dataset/business.json", encoding="utf-8") as business_json:
    # Iterate over the file directly rather than loading every line into memory
    for line in business_json:
        item = json.loads(line)
        # Skip businesses with missing coordinates; compare against None so
        # that a legitimate value of 0.0 is not dropped
        if item["latitude"] is not None and item["longitude"] is not None:
            lat_longs[item["business_id"]] = {
                "lat_long": (item["latitude"], item["longitude"]),
                "city": item["city"]
            }

result = {}

business_ids = list(lat_longs.keys())
# Look up every business in a single call
locations = rg.search([value["lat_long"] for value in lat_longs.values()])

for business_id, location in zip(business_ids, locations):
    result[business_id] = {
        "country": location["cc"],
        "name": location["name"],
        "admin1": location["admin1"],
        "admin2": location["admin2"],
        "city": lat_longs[business_id]["city"]
    }

with open("dataset/businessLocations.json", "w") as business_locations_json:
    json.dump(result, business_locations_json, indent=4, sort_keys=True)
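
Once the file is written, a quick sanity check is to count how many businesses ended up in each country. This is a minimal sketch, assuming the output structure produced by the script above; `load_locations` and `count_by_country` are hypothetical helper names, and the `sample` entries are made-up stand-ins for the real file contents.

```python
import json
from collections import Counter


def load_locations(path="dataset/businessLocations.json"):
    """Load the file written by the script above."""
    with open(path) as locations_json:
        return json.load(locations_json)


def count_by_country(locations):
    """Count businesses per country code in a businessLocations-style dict."""
    return Counter(entry["country"] for entry in locations.values())


# Made-up sample in the same shape as businessLocations.json
sample = {
    "business-1": {"country": "US", "name": "Guadalupe", "admin1": "Arizona",
                   "admin2": "Maricopa County", "city": "Ahwatukee"},
    "business-2": {"country": "US", "name": "Glendale", "admin1": "Arizona",
                   "admin2": "Maricopa County", "city": "Phoenix"},
    "business-3": {"country": "CA", "name": "Toronto", "admin1": "Ontario",
                   "admin2": "", "city": "Toronto"},
}

for country, count in count_by_country(sample).most_common():
    print(country, count)
```

To run it against the real output, pass `load_locations()` to `count_by_country` instead of `sample`.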

And that’s it!

Published on Web Code Geeks with permission by Mark Needham, partner at our WCG program. See the original article here: Yelp: Reverse geocoding businesses to extract detailed location information

Opinions expressed by Web Code Geeks contributors are their own.
