Tuesday, October 8, 2013

US address correction, address verification, and geocoding using ZP4, Google Geocoding API, and Bing Maps API


From a couple of months, I have been involved in Tax certificate address verification project for educational purpose; I have around 8 million parcel address  collected and maintained by different tax collecting agencies in various format for instance COBOL ,CSV, ASCII, Excel , and some are even in pdf for their tax districts. 


My task was to find the address from those different files and export them in CSV with corrected standard USPS address format (STREET, CITY, STATE, and ZIP) to make Universal Tax File format for data analysis by removing unwanted data from the address rows.  Initially, I used ZP4 software, it has “official United States Postal Service data files on a single DVD-ROM that provides a powerful tool for automatically determining the correct mailing address, ZIP + 4 code, and mail carrier route number for any location in the United States”, works fine for few address (about 15%) which has proper USPS standard address format. The real challenge starts with remaining   85% of the parcel address that has no USPS standard address format (either address without city and zip information or the vacant parcel lots that cannot accept mails and therefore not entered in the USPS mailing address database).

Looking around the Web, Google leaves me with tons of pages entitled on address correction and validation including close source and open source software APIs, tools, and datasets, but hard to figure out which method works for my project in terms of address quality(address was poorly written), quantity(8 million addresses), and resources (cheap and easy to implement) . After spending few days here and there, finally I come up with a solution that works for me, an integrated address correction strategy using ZP4, Google Geocoding API, and Bing Maps API.

Here, I am going to share how ZP4, Google and Bing integration makes my job easy and relatively cheap to validate and correct US physical street addresses.

ZP4: Address and postal data on DVD-ROM
Input format- House #, Street, City, State, and Zip (Red: Mandatory fields, Blue: At least two mandatory fields out of three)
Response format- House #, Street, City, State, Zip, Zip4, Lat/Lon, CITY STATE ZIP, and various others fields
Type- DVD- ROM with address correction ZP4.dll
Cost-$99
License- Commercial
Address source- USPS address
Provider- Semaphore Corporation
Interface: Window interface and the DVD comes with demo program showing how to send and receive address records from C, C#, Delphi, VB, and so forth.
Limitations:
a. Works only address in USPS  postal databases
b. Must provide at least two fields out of City, State ,and Zip
c. Limited addresses look up for correction and verification

Google - The Google Geocoding API
Input format- House #, Street, City, State, and Zip (Red: Mandatory fields, Blue: Not mandatory, recommends to provide for confidence on parsed address)
Response format- JSON, XML /Corrected fields-House #, Street, City, State, Zip
Type- REST API Protocol
Cost- Trial Free (2500 Geocode API request/ IP address within 24 hrs), Commercial- starting with $10000 (100,000 requests per day)
License- Trial, Commercial
Address source- Multiple sources
Provider- Google
Interface: No Window interface, but good documentation on using API for addresses geocoding.

Bing- Locations API
Input format- House #, Street, City, State, and Zip (Red: Mandatory fields, Blue: Not mandatory, recommends to provide for confidence on parsed address)
Response format- JSON, XML/ Corrected fields-House #, Street, City, State, Zip
Type- REST API Protocol
Cost- Trial Free (50,000 Geocode API request/ Bing Maps API key with in 24hrs) for 90 days.  Can generate up to 3 trail keys from a Microsoft live account that allows you to verify 150000 address/24hrs for 90 days.
License- Trial, Commercial
Address source- Multiple sources
Provider- Microsoft
Interface: No Window interface, but good documentation on using API for addresses geocoding.
Both Google and Bing APIs has limitations in terms of API key limit and can't correct address with 100% accuracy. However, helps a lot for address correction.
 
Another possible Geocoding API : Yahoo! Boss Geo PlaceFinder
Input format- House #, Street, City, State, and Zip (Red: Mandatory fields, Blue: Not mandatory, recommends to provide for confidence on parsed address)
Response format- JSON, XML/ Corrected fields-House #, Street, City, State, Zip
Type- REST API Protocol
Among three address correction tools/APIs Bing works for me well than ZP4 and GOOGLE in terms of finding missing city and zip. ZP4 works well for the STANDARD USPS address datasets and prevents from calling Geocoding API even for the standard addresses; the API key is the most important and limited resource for my educational trail project.

Integrated Address Correction Code Snippets in VB
 'Parse Address  
 Function TestRegExp(myPattern As String, myString As String) As Boolean  
      'Create objects.  
      Dim objRegExp As Regex  
      ' Create a regular expression object.  
      objRegExp = New Regex(myPattern)  
      'Test whether the String can be compared.  
      Dim match As Match = objRegExp.Match(myString)  
      If match.Success Then  
           Return True  
      Else  
           Return False  
      End If  
 End Function 
 Private Function getLastIndexOf(ByVal sArray As String(), ByVal checkArray As String()) As Integer  
      Dim sourceIndex As Integer = 0  
      Dim outputIndex As Integer = 0  
      For Each item As String In checkArray  
           For Each source As String In sArray  
                If source.ToLower = item.ToLower Then  
                     outputIndex = sourceIndex  
                     If item.ToLower = "box" Then  
                          outputIndex = outputIndex + 1  
                     End If  
                End If  
                sourceIndex = sourceIndex + 1  
           Next  
           sourceIndex = 0  
      Next  
      Return outputIndex  
 End Function 
 Public Function parseAddress(ByVal input As String) As Collection  
      Dim output As New Collection  
      If (Regex.IsMatch(input, "^[0-9 ]+$")) Then  
           Console.WriteLine("Only numbers in the Address field")  
           output.Add("", "Address1")  
           output.Add("", "Address2")  
           output.Add("", "City")  
           output.Add("", "State")  
           output.Add("", "Zip")  
           Return output  
      Else  
           input = input.Replace(",", "")  
           input = input.Replace(" ", " ")  
           input = input.Replace(" CR ", " County Road ")  
           input = input.Replace(" SR ", " State Road ")  
           Dim splitString() As String = Split(input)  
           Dim streetMarker() As String = New String() {"street", "County Road", "Sate Road", "st", "st.", "avenue", "ave", "ave e", "ave.", "blvd", "blvd.", "highway", "hwy", "hwy.", "box", "road", "plz", "madera", "walk", "bend", "rd", "rd.", "lane", "ln", "ln.", "circle", "circ", "cir", "circ.", "court", "ct", "ter", "ct.", "dr", "trl", "pl", "sr", "cr", "gln", "loop", "way", "xing", "us", "cv", "run", "bluff", "pky", "trce", "palm", "trace", "business", "fairway", "broadway", "island", "manor", "overlook", "oak", "majestic"}  
           Dim address1 As String  
           Dim address2 As String = ""  
           Dim city As String  
           Dim state As String  
           Dim zip As String  
           Dim streetMarkerIndex As Integer  
           Dim flag As Boolean  
           zip = splitString(splitString.Length - 1).ToString()  
           flag = TestRegExp("^\d{5}(?:[-\s]\d{4})?$", zip)  
           If flag = True Then  
                state = splitString(splitString.Length - 2).ToString()  
                streetMarkerIndex = getLastIndexOf(splitString, streetMarker) + 1  
                Dim sb As New StringBuilder  
                For counter As Integer = streetMarkerIndex To splitString.Length - 3  
                     sb.Append(splitString(counter) + " ")  
                Next counter  
                city = RTrim(sb.ToString())  
                Dim addressIndex As Integer = 0  
                For counter As Integer = 0 To streetMarkerIndex  
                     If IsNumeric(splitString(counter)) _  
                     Or splitString(counter).ToString.ToLower = "po" _  
                     Or splitString(counter).ToString().ToLower().Replace(".", "") = "po" Then  
                          addressIndex = counter  
                          Exit For  
                     End If  
                Next counter  
                sb = New StringBuilder  
                For counter As Integer = addressIndex To streetMarkerIndex - 1  
                     sb.Append(splitString(counter) + " ")  
                Next counter  
                address1 = RTrim(sb.ToString())  
                output.Add(address1, "Address1")  
                output.Add(city, "City")  
                output.Add(state, "State")  
                output.Add(zip, "Zip")  
                Return output  
           Else  
                output.Add(input, "Address1")  
                output.Add("", "Address2")  
                output.Add("", "City")  
                output.Add("", "State")  
                output.Add("", "Zip")  
                Return output  
           End If  
      End If  
 End Function
 'Correct address using ZP4 software  
 Public Function parseZP4(ByVal address As String, ByVal city As String, ByVal state As String) As StringBuilder  
      Dim addstring As String  
      Dim session As Integer = 0  
      ZP4.ZP4StartSession(session)  
      If (session = 0) Then  
           Console.WriteLine("Session allocation failed!")  
      End If  
      'ZP4.ZP4TimeLimit(session, 0)  
      If (ZP4.ZP4InputOrder(session, "Address" & vbTab & "City" & vbTab & "State") = 0) Then  
           Console.WriteLine("Invalid input list!")  
      End If  
      If (ZP4.ZP4OutputOrder(session, "Address (final)" & vbTab & "City (final)" & vbTab & "State (final)" & vbTab & "ZIP (final)" & vbTab & "ZIP (five-digit)" & vbTab & "ZIP (four-digit add-on)" & vbTab & "Error message") = 0) Then  
           Console.WriteLine("Invalid output list!")  
      End If  
      Dim sb As StringBuilder = New StringBuilder(5000)  
      ' mutable string buffer for result  
      addstring = address.Trim & vbTab & city.Trim & vbTab & state.Trim  
      If (ZP4.ZP4Correct(session, addstring.Trim, sb) = 0) Then  
           Console.WriteLine("Correction call failed!")  
      End If  
      'Dim s() As String = sb.ToString.Split(vbTab)  
      Return sb  
 End Function 
 'Correct address using Google Geocoding API  
 Public Function GetGoogleAddress(ByVal address1 As String, ByVal city As String, ByVal state As String) As Hashtable  
      Dim addressxml As String = String.Empty  
      Dim zipCode As String = String.Empty  
      Dim m_hashTable As New Hashtable()  
      m_hashTable.Clear()  
      Try  
           address1 = address1.Replace(" CR ", " County Road ")  
           address1 = address1.Replace(" SR ", " State Road ")  
           'Making 0.5 second delay on API call  
           System.Threading.Thread.Sleep(500)  
           'Create an object of web client  
           Dim wsClient As New WebClient()  
           'Construct the URL concating the address values with it  
           Dim zipcodeurl As String = "?address={0},+{1},+{2}&sensor=false"  
           'Here in constructing the URL sensor(is required) indicates whether or not the geocoding request comes from a device with a location sensor.  
           Dim url As String = "http://maps.googleapis.com/maps/api/geocode/xml" & zipcodeurl  
           url = [String].Format(url, address1.Replace(" ", "+"), city.Replace(" ", "+"), state.Replace(" ", "+"))  
           'Download the data in XML format as string by making a web request  
           addressxml = wsClient.DownloadString(url)  
           'Check if status is OK then proceed  
           If addressxml.Contains("OK") Then  
                'Check if postal_code section is there in the string then proceed  
                If addressxml.Contains("postal_code") Then  
                     Dim xmlDoc As New XmlDocument()  
                     xmlDoc.LoadXml(addressxml)  
                     Dim m_nodelist As XmlNodeList  
                     Dim node As XmlNode  
                     'Get the list of all address_companent nodes as this component only contans the address information  
                     'm_nodelist = node.SelectNodes("/GeocodeResponse/result/address_component")  
                     node = xmlDoc.SelectSingleNode("/GeocodeResponse/result")  
                     m_nodelist = node.SelectNodes("address_component")  
                     'From each component check for the type section for getting the particular postal_code section  
                     Dim gcount As Integer = 0 'Just to read first entry of google map  
                     For Each m_node In m_nodelist  
                          'Get the zipLongName Element Value  
                          Dim LongName = m_node.ChildNodes.Item(0).InnerText  
                          'Get the zipShortName Element Value  
                          Dim ShortName = m_node.ChildNodes.Item(1).InnerText  
                          'Get the zipType Element Value  
                          Dim Type = m_node.ChildNodes.Item(2).InnerText  
                          'If the type of the component is postal_code then get the postal code as zipLongName  
                          If Type = "street_number" Then  
                               m_hashTable.Add("street_number", LongName)  
                          End If  
                          If Type = "route" Then  
                               m_hashTable.Add("street", LongName)  
                          End If  
                          If Type = "locality" Then  
                               m_hashTable.Add("city", LongName)  
                          End If  
                          If Type = "administrative_area_level_1" Then  
                               m_hashTable.Add("state", ShortName)  
                          End If  
                          If Type = "postal_code" Then  
                               zipCode = LongName  
                               m_hashTable.Add("zip", LongName)  
                          End If  
                          gcount = gcount + 1  
                     Next  
                End If  
           Else  
                Console.WriteLine("Invalid address or Reached the Google API Quota Limit")  
           End If  
      Catch ex As WebException  
           Console.WriteLine(ex.ToString())  
      End Try  
      Return m_hashTable  
 End Function 
 'Validate address using Bing Geocoding API  
 Public Function GetBingAddress(ByVal address1 As String, ByVal city As String, ByVal state As String) As Hashtable  
      'Reference:http://msdn.microsoft.com/en-us/library/ff701714.aspx  
      Dim addressxml As String = String.Empty  
      Dim zipCode As String = String.Empty  
      Dim m_hashTable As New Hashtable()  
      m_hashTable.Clear()  
      Try  
           address1 = address1.Replace(" CR ", " County Road ")  
           address1 = address1.Replace(" SR ", " State Road ")  
           System.Threading.Thread.Sleep(500)  
           'Create an object of web client  
           Dim wsClient As New WebClient()  
           'Construct the URL concating the address values with it  
           Dim keyurl As String = "&output=xml&key=BINGKEY"         
           'Here in constructing the URL sensor(is required) indicates whether or not the geocoding request comes from a device with a location sensor.  
           Dim url As String = " http://dev.virtualearth.net/REST/v1/Locations?q=" & address1.Trim & " " & city.Trim & " " & state.Trim & keyurl.Trim  
           'Download the data in XML format as string by making a web request  
           addressxml = wsClient.DownloadString(url)  
           'Check if status is OK then proceed  
           If addressxml.Contains("OK") Then  
                'Check if postal_code section is there in the string then proceed  
                If addressxml.Contains("PostalCode") Then  
                     Dim xmlDoc As New XmlDocument()  
                     xmlDoc.LoadXml(addressxml)  
                     Dim nsmgr As New XmlNamespaceManager(xmlDoc.NameTable)  
                     nsmgr.AddNamespace("rest", "http://schemas.microsoft.com/search/local/ws/rest/v1")  
                     'Get formatted addresses: Option 1  
                     'Get all locations in the response and then extract the formatted address for each location  
                     Dim locationElements As XmlNodeList = xmlDoc.SelectNodes("//rest:Location", nsmgr)  
                     Dim countAddress As Integer = 0  
                     For Each location As XmlNode In locationElements  
                          If countAddress < 1 Then  
                               Try  
                                    Dim formatedAddress As String = location.SelectSingleNode(".//rest:FormattedAddress", nsmgr).InnerText  
                                    Dim splitfAdress() As String = formatedAddress.Split(",")  
                                    m_hashTable.Add("street", location.SelectSingleNode(".//rest:AddressLine", nsmgr).InnerText)  
                                    m_hashTable.Add("city", splitfAdress(1))  
                                    m_hashTable.Add("state", location.SelectSingleNode(".//rest:AdminDistrict", nsmgr).InnerText)  
                                    m_hashTable.Add("zip", location.SelectSingleNode(".//rest:PostalCode", nsmgr).InnerText)  
                                    countAddress = countAddress + 1  
                               Catch ex As Exception  
                                    Console.WriteLine(ex.ToString)  
                               End Try  
                          End If  
                     Next  
                End If  
           Else  
                Console.WriteLine("No OK on XML or Over Quota")  
           End If  
      Catch ex As WebException  
           'Console.WriteErrorLog(ex, "GetZipcode", "CompanyDetailViewerForm")  
           Console.WriteLine(ex.ToString())  
      End Try  
      Return m_hashTable  
 End Function 
 'Get Corrected address using mutually exclusive sequence of ZP4.dll,Google Geocoding API & Bing API  
 Public Function GetCorrectedAddress(ByVal address As String) As Hashtable  
      Dim googleAddress, bingAddress As Hashtable  
      Dim returnAddress As New Hashtable  
      Dim pAddress As Collection  
      Dim zp4address As StringBuilder  
      Dim zpaddress As String()  
      Dim pcity, pstate, pzip, pstaddress, zip4 As String  
      Dim zp4ParseStatus As String = ""  
      Dim ApiParseStaus As String = ""  
      Dim tempAddress As String  
      Dim oldParcelId As String = ""  
      Dim City, State, ZipCode, StreetAddress, ZipCode4 As String  
      returnAddress.Clear()  
      tempAddress = address  
      pAddress = parseAddress(tempAddress.Trim)  
      pcity = pAddress.Item("City")  
      pstate = pAddress.Item("State")  
      pzip = pAddress.Item("Zip")  
      pstaddress = pAddress.Item("Address1")  
      zp4address = parseZP4(pstaddress, pcity, pstate)  
      zpaddress = zp4address.ToString.Split(vbTab)  
      City = zpaddress(1)  
      State = zpaddress(2)  
      ZipCode = zpaddress(4)  
      StreetAddress = zpaddress(0)  
      ZipCode4 = zpaddress(5)  
      zip4 = zpaddress(3)  
      zp4ParseStatus = zpaddress(6)  
      If ((zpaddress(0) = "0") Or (zpaddress(0) = "NO") Or (zpaddress(0) = "")) Then  
           City = "Invalid"  
           State = "Invalid"  
           ZipCode = "Invalid"  
           StreetAddress = "Invalid"  
           ZipCode4 = "Invalid"  
      End If  
      'If ZP4 couldn't find the address  
      If ((zip4 = " ") Or (zip4 = "") Or (zpaddress(0) = "0") Or (zpaddress(0) = "NO") Or (zpaddress(0) = " ") Or (zpaddress(0) = "") Or (zpaddress(0) = "no") Or (zpaddress(0) = "No")) Then  
           'parse using Bing Geocoding API  
           bingAddress = GetBingAddress(tempAddress, "", "FL")  
           If (bingAddress.Count < 4) Then  
                'parse using Google Geocodign API  
                googleAddress = GetGoogleAddress(tempAddress.Trim, "", "FL")  
                Try  
                     If (googleAddress.Count < 5) Then  
                          ApiParseStaus = "Invalid Address:Google & Bing "  
                     Else  
                          City = googleAddress("city")  
                          State = googleAddress("state")  
                          ZipCode = googleAddress("zip")  
                          StreetAddress = googleAddress("street_number") & " " & googleAddress("street")  
                          ZipCode4 = ""  
                          ApiParseStaus = "Parse Address:Google "  
                     End If  
                Catch ex As Exception  
                     Console.WriteLine("Invalid Address:Google ")  
                End Try  
           Else  
                City = bingAddress("city")  
                State = bingAddress("state")  
                ZipCode = bingAddress("zip")  
                StreetAddress = bingAddress("street")  
                ZipCode4 = ""  
                ApiParseStaus = "Parse Address:Bing"  
           End If  
      End If  
      returnAddress.Add("city", City)  
      returnAddress.Add("state", State)  
      returnAddress.Add("zip", ZipCode)  
      returnAddress.Add("zip4", ZipCode4)  
      returnAddress.Add("street", StreetAddress)  
      returnAddress.Add("googlebingstatus", ApiParseStaus)  
      returnAddress.Add("zp4status", zp4ParseStatus)  
      Return (returnAddress)  
 End Function  


References

Thanks to all programmers and bloggers for their nice articles, replies, and code samples without them this post would not be possible.


Bing Maps API , Geocoding , Google API , Google Maps , VB.Net , ZP4

6 comments :

  1. "[ZP4] Must provide at least two fields out of City, State ,and Zip" is not true. Only a city or only a ZIP is sufficient as input, as is only a county.

    ReplyDelete
  2. Also, you may want to check out AccuZIP's API and Web Service for real-time address correction.

    http://www.accuzip.com/products/data_quality/index.htm

    ReplyDelete
  3. ZP4 is good software but user support is non-existent. Good luck getting them to even answer and email and the user manual is less than helpful. Many other more user friendly options out there.

    ReplyDelete
    Replies
    1. And that is founded in 1981. I bet it is started by 1 guy who is still making money selling the CDs of ZP4.

      Delete
  4. OP/anyone: can you help me track down old ZP4 CDs? I'm working on a project where I need to match addresses coded as ZIP+4 to their street address ranges to geocode them, for each year 1999-most recent. Would be great to have 1 CD per year (or per month)!

    ReplyDelete

 

© 2011 GIS and Remote Sensing Tools, Tips and more .. ToS | Privacy Policy | Sitemap

About Me