For a couple of months, I have been involved in a tax certificate address verification project for educational purposes. I have around 8 million parcel addresses, collected and maintained by different tax-collecting agencies for their tax districts in various formats, for instance COBOL, CSV, ASCII, Excel, and some even in PDF.
My task was to find the addresses in those files and export them to CSV in the corrected, standard USPS address format (STREET, CITY, STATE, and ZIP), removing unwanted data from the address rows, in order to build a Universal Tax File format for data analysis. Initially, I used ZP4 software, which has “official United States Postal Service data files on a single DVD-ROM that provides a powerful tool for automatically determining the correct mailing address, ZIP + 4 code, and mail carrier route number for any location in the United States”. It works fine for the few addresses (about 15%) that already follow the standard USPS address format. The real challenge starts with the remaining 85% of the parcel addresses, which do not follow the standard USPS format: either addresses without city and ZIP information, or vacant parcel lots that cannot accept mail and are therefore not entered in the USPS mailing address database.
Looking around the Web, Google left me with tons of pages on address correction and validation, including closed-source and open-source software, APIs, tools, and datasets, but it was hard to figure out which method would work for my project in terms of address quality (the addresses were poorly written), quantity (8 million addresses), and resources (cheap and easy to implement). After spending a few days looking here and there, I finally came up with a solution that works for me: an integrated address correction strategy using ZP4, the Google Geocoding API, and the Bing Maps API.
Here, I am going to share how integrating ZP4, Google, and Bing made my job of validating and correcting US physical street addresses easy and relatively cheap.
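To give a rough idea of what "integrated" could mean here, below is a minimal Python sketch of a fallback chain: each raw address is passed to one validator after another until one returns a result, and the corrected rows are written to CSV with the STREET, CITY, STATE, and ZIP columns. The order (ZP4 first, then Google, then Bing) is an assumption for illustration. The names `correct_address`, `write_csv`, `zp4_stub`, `google_stub`, and `bing_stub` are hypothetical, and the stubs are stand-ins that do not call ZP4 or any real API.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Optional, Tuple

Address = Dict[str, str]
# A validator takes a raw address string and returns a corrected
# address, or None when it cannot resolve the input.
Validator = Callable[[str], Optional[Address]]

FIELDS = ["STREET", "CITY", "STATE", "ZIP"]


def correct_address(
    raw: str, validators: Iterable[Tuple[str, Validator]]
) -> Tuple[Optional[str], Optional[Address]]:
    """Try each validator in order; return (source name, address) for the first hit."""
    for name, validate in validators:
        result = validate(raw)
        if result is not None:
            return name, result
    return None, None  # no validator could resolve this address


def write_csv(rows: Iterable[Address], out) -> None:
    """Write corrected addresses with only the four standard columns."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical stand-ins for the three services (assumption: the real
# lookups would be wrapped behind the same one-argument interface).
def zp4_stub(raw: str) -> Optional[Address]:
    return None  # pretend ZP4 finds no match for this input


def google_stub(raw: str) -> Optional[Address]:
    if raw.strip().lower() == "123 main street":
        return {"STREET": "123 MAIN ST", "CITY": "SPRINGFIELD",
                "STATE": "IL", "ZIP": "62701"}
    return None


def bing_stub(raw: str) -> Optional[Address]:
    return None


CHAIN = [("zp4", zp4_stub), ("google", google_stub), ("bing", bing_stub)]
```

Calling `correct_address("123 main street", CHAIN)` walks the chain and stops at the first validator that returns an address. Keeping the source name next to each result makes it easy to count how many addresses each service resolved.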