Geocoding Overview

Objectives

  • In this section, you will learn the basic process behind geocoding and the various software and data components involved in the process.

The process of geocoding is rather straightforward in concept. You begin with an address that you would like to locate on a map. Let's say that the address you are looking for is 123 Main Street in Los Angeles, California. The geocoder's first job is to locate the database in which it will try to find the address. In Atlas GIS, that database is stored on the Address Finder CD that accompanies Atlas. Alternatively, you can use the extended geocoder CD produced by RPM Consulting.

What the Geocoding Database contains:

The geocoding database is essentially a list of every known street segment in the U.S. For each street segment, the following information has been compiled:

1. Street Name and alternate street name for each street.

2. The address range for each side of the street.

3. The street type.

4. The zip code(s) on the left and right side of the street.

5. State, County, Census Tract, and Block Group designation(s) for the left and right side of the street.

6. Coordinate pairs (latitude and longitude) for the beginning and ending points of the street segment.
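
The six items listed above can be pictured as a single record per street segment. The following is only a minimal sketch of such a record in Python; the class and field names are hypothetical and do not reflect the actual layout of the Atlas geocoding database.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SideInfo:
    """Attributes stored separately for one side of the street (hypothetical names)."""
    from_addr: int      # first house number in the address range
    to_addr: int        # last house number in the address range
    zip_code: str
    state_fips: str
    county_fips: str
    tract: str
    block_group: str


@dataclass
class StreetSegment:
    """One street segment record (illustrative layout, not the Atlas format)."""
    name: str
    street_type: str
    left: SideInfo
    right: SideInfo
    start: Tuple[float, float]      # (latitude, longitude) of the beginning point
    end: Tuple[float, float]        # (latitude, longitude) of the ending point
    alt_name: Optional[str] = None  # alternate street name, if any
```

Keeping the left and right sides in separate sub-records mirrors the way the list above gives each side its own address range, zip code, and census designations.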

Interpreting Addresses

In order for a geocoder to determine whether an address matches up to a street segment in its database, it must first break the address down into its components. These components (called "tokens" by programmers) represent the following items that must be extracted from an address:

  • Street number
  • Street name
  • Apartment or Unit Number
  • Directional Prefix
  • Directional Suffix
  • Type of Street

Since street address information is usually contained in a single field, the geocoder must use a combination of logic tests and lookup tables to determine what is contained in a particular address field. Take the following address, for example:

123 N. Main St. #30 Los Angeles, CA 90010

In this example, the geocoder must be able to determine that the "N." represents a directional prefix indicating North. Likewise, the name of the street must be identified as Main, and the type of street must be identified as a Street (as opposed to an Avenue, Blvd., etc.). Finally, the geocoder needs to identify that the #30 relates to a unit number or apartment number.

Now, let's look at the following address:

123 Main Street North, Suite 30

In this example, the address is geographically the same as in the prior example. Differences in how the address is written, however, make it look quite different. Through these examples, you can begin to see the amount of effort involved in breaking down all of the potential combinations of abbreviations and nomenclature that make up an address. Consider, for example, all of the different ways to abbreviate Boulevard:

Boulevard
Blvd.
Bld.
Bl.
Bvd.
Bd.
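
To make the parsing step more concrete, here is a minimal sketch in Python of how a street line might be broken into tokens using a few logic tests and small lookup tables. The function name, the lookup tables, and the choice of "Blvd" as the standard form are all assumptions made for illustration; this is not the logic used by the Atlas Geocoder, and it handles only the street line, not the city, state, or zip code.

```python
# Tiny illustrative lookup tables; a real geocoder would use far larger ones.
DIRECTIONS = {"n": "N", "north": "N", "s": "S", "south": "S",
              "e": "E", "east": "E", "w": "W", "west": "W"}
STREET_TYPES = {"st": "St", "street": "St", "ave": "Ave", "avenue": "Ave",
                "boulevard": "Blvd", "blvd": "Blvd", "bld": "Blvd",
                "bl": "Blvd", "bvd": "Blvd", "bd": "Blvd"}
UNIT_MARKERS = {"#", "apt", "suite", "ste", "unit"}


def parse_street_line(line):
    """Split a street line into tokens using simple tests and lookups."""
    # Separate '#' from the unit number, drop periods and commas, then split.
    cleaned = line.replace("#", " # ").replace(".", " ").replace(",", " ")
    words = cleaned.split()
    result = {"number": None, "prefix": None, "name": None,
              "type": None, "suffix": None, "unit": None}

    # Logic test: a leading run of digits is the street number.
    if words and words[0].isdigit():
        result["number"] = words.pop(0)

    # Lookup: everything after a unit marker is the unit number.
    for i, word in enumerate(words):
        if word.lower() in UNIT_MARKERS:
            result["unit"] = " ".join(words[i + 1:]) or None
            words = words[:i]
            break

    # Lookup: a direction before the name is a prefix, after it a suffix.
    if len(words) > 1 and words[0].lower() in DIRECTIONS:
        result["prefix"] = DIRECTIONS[words.pop(0).lower()]
    if len(words) > 1 and words[-1].lower() in DIRECTIONS:
        result["suffix"] = DIRECTIONS[words.pop().lower()]

    # Lookup: the last remaining word is the street type if it is known.
    if len(words) > 1 and words[-1].lower() in STREET_TYPES:
        result["type"] = STREET_TYPES[words.pop().lower()]

    result["name"] = " ".join(words) or None
    return result
```

Under these assumptions, parse_street_line("123 N. Main St. #30") yields the number 123, the prefix N, the name Main, the type St, and the unit 30, while "123 Main Street North, Suite 30" yields the same number, name, type, and unit with N recorded as a suffix instead of a prefix.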

Translation Tables

The Atlas Geocoder keeps track of all of the possible abbreviations and spellings of address components through the use of translation tables. Translation tables allow you to specify what certain abbreviations and spellings equate to. Take, for example, this section of an Atlas translation file that standardizes street types. The values in the left column represent all of the possible ways in which street types may be abbreviated in an incoming address database. The column on the right indicates what abbreviation they should be standardized to in order to geocode.

In this example, you can see that 7 possible spellings/abbreviations have been specified for the street type of Highway. For each of these possibilities, the right column indicates that the correct abbreviation should be Hwy.

Using this process, if the geocoder encountered the address 5635 Sierra Hway, it would be able to determine that the standardized form of the address is 5635 Sierra Hwy.
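
The lookup described above can be sketched as a simple dictionary from variant spelling to standard abbreviation. This is only an illustration in Python: the table name and function name are hypothetical, the rows other than Hway are assumed rather than taken from the actual Atlas translation file, and the real file format is not shown here.

```python
# Hypothetical excerpt: variant spelling (left) -> standard abbreviation (right).
# Only "Hway" -> "Hwy" comes from the text; the other rows are illustrative.
STREET_TYPE_TABLE = {
    "HIGHWAY": "Hwy",
    "HWAY": "Hwy",
    "HWY": "Hwy",
}


def standardize_street_type(address):
    """Replace a trailing street-type variant with its standard abbreviation."""
    words = address.split()
    if not words:
        return address
    # Ignore a trailing period and letter case when looking up the variant.
    key = words[-1].rstrip(".").upper()
    if key in STREET_TYPE_TABLE:
        words[-1] = STREET_TYPE_TABLE[key]
    return " ".join(words)
```

With this table, standardize_street_type("5635 Sierra Hway") returns "5635 Sierra Hwy", and an address whose last word is not in the table is returned unchanged.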

Overall, there are four translation tables used by the Atlas Geocoder:

Translation Table    Description
Address.Trn          Details the various possible spellings for numeric street names, such as 1st, First, etc.
AdParse.Trn          Provides specifications for address conventions dealing with intersections (1st & Main versus 1st at Main, etc.). In addition, AdParse contains possible spellings for unit numbers, building floors, P.O. boxes, and other site-specific address components.
StrDir.Trn           The Street Direction table contains valid street direction phrases and their standard abbreviations.
StrType.Trn          The Street Type table contains valid street type phrases and their standard abbreviations.

All of the translation tables are stored in the \AtlasGIS\Geocode directory, and each can be modified in any text editor if you wish to add to or change its contents. To learn more about modifying these tables, please review the course module entitled Translation Tables.

Making the Match

Once the geocoder has successfully parsed the address into its individual components, the next step is to find the street segment where the address is located. This entails looking through the Geocoder's street database to find the record in which the street name, type, and direction match and where the address is contained within the address range of the matching street segment.

Let's take our earlier example of:

123 N. Main St. #30 Los Angeles, CA 90010

When the Geocoder looks into the street database, it begins by first narrowing down the streets to only those in the State and Zip Code specified in the address.

Next, it locates all of the streets which meet the following criteria:

Name = Main
Street type = St.
Direction = N.

Finally, it finds the segment whose address range includes the address you are looking for (in this example, '123'). Of the following street segments, the second would be correct, since '123' falls within the range 101 - 199 (the side of the street with odd address numbers, as opposed to even numbers).

Name   Type   Direction   Beg. Range   End Range
Main   St.    N.          100          198
Main   St.    N.          101          199
Main   St.    N.          200          298
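
The matching step can be sketched as a filter over the candidate segments in the table above. This is a minimal Python illustration under stated assumptions: the Segment type and find_segment function are hypothetical names, and the odd/even test is one plausible way to pick the correct side of the street rather than a description of how Atlas does it.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    name: str
    street_type: str
    direction: str
    beg_range: int
    end_range: int


# The three candidate segments from the table above.
SEGMENTS = [
    Segment("Main", "St.", "N.", 100, 198),
    Segment("Main", "St.", "N.", 101, 199),
    Segment("Main", "St.", "N.", 200, 298),
]


def find_segment(segments, name, street_type, direction, number):
    """Return the first segment matching the street and containing the number."""
    for seg in segments:
        if (seg.name, seg.street_type, seg.direction) != (name, street_type, direction):
            continue
        in_range = seg.beg_range <= number <= seg.end_range
        # Odd and even house numbers lie on opposite sides of the street,
        # so the number must share the parity of the range's first address.
        same_side = number % 2 == seg.beg_range % 2
        if in_range and same_side:
            return seg
    return None
```

Calling find_segment(SEGMENTS, "Main", "St.", "N.", 123) returns the second segment (101 - 199): the first segment also spans 123 numerically, but it covers the even side of the street.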

Assigning Geocodes

Once the correct street segment has been found, the next step is to assign the correct geocodes to the address record. Regional geocodes such as state, county, census tract, and block group FIPS codes are assigned directly from the street record in the geocode database.

Coordinate pairs (latitude and longitude), however, must be estimated. This is because each street segment in the geocoding database contains only the latitude and longitude for the beginning and ending points of the street segment (not the points in between). In order to determine the coordinate pair for the address being geocoded, the latitude and longitude are interpolated based on where on the street segment the address is likely to fall.

123 Main St., for example, will fall approximately 23% of the way along the segment from the beginning to the end.
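
The interpolation can be sketched as follows. This Python example assumes simple linear interpolation between the two endpoints, which is one common approach and not necessarily the exact formula Atlas uses; the function name is hypothetical, and any coordinates passed to it here are placeholders rather than real locations.

```python
def interpolate(number, beg_range, end_range, start, end):
    """Estimate (lat, lon) for a house number by linear interpolation.

    start and end are (latitude, longitude) pairs for the segment endpoints.
    Returns the fraction along the segment and the estimated coordinate pair.
    """
    if end_range == beg_range:
        fraction = 0.0  # single-address range: use the starting point
    else:
        fraction = (number - beg_range) / (end_range - beg_range)
    lat = start[0] + fraction * (end[0] - start[0])
    lon = start[1] + fraction * (end[1] - start[1])
    return fraction, (lat, lon)
```

For house number 123 in the range 101 - 199, the fraction works out to (123 - 101) / (199 - 101), or about 0.22, which is in line with the approximate figure quoted above.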

