Geocoding Overview

Overview

Geocoding is the cross-referencing between geographic spatial data (features with coordinates relative to a standard reference system) and non-geographic data (such as addresses and postal codes).

Geocoding brings geographic data and tabular data together based on a common geographic unit of analysis. A geographic unit of analysis is the spatial characteristic within the data that is needed to locate the data on a map. Such units include addresses (street level) and postal codes (postal district level). This spatial characteristic is common to the geographic data and the non-geographic tabular data. The tabular data can also contain extra, non-geographic information, such as telephone numbers, fax numbers, e-mail addresses and website information.

The following diagram shows an overview of the geocoding process.

Figure: Geocoding Overview

How to Geocode

There are five basic steps in the geocoding process:
  1. Preparing files for geocoding;
  2. Standardizing geocoding parameters for the geocoder;
  3. Geocoding;
  4. Reviewing the results;
  5. Resetting geocoding parameter specifications and re-geocoding if results are not satisfactory.

Figure: Geocoding Process

1. Preparing Files for Geocoding

The first step in the geocoding process is the preprocessing, sometimes extensive, of both the tabular address data and the geographic reference data in order to maximize match rates.

The tabular data should be parsed into individual fields that have a clear function in the address data (e.g., postal codes, street names and house numbers). The parsing can include adding abbreviations of address elements to match the reference data.
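The parsing described above can be sketched in Python as follows. This is a minimal illustration, not the method of any particular geocoder: the function name parse_address, the abbreviation table and the assumed input layout ("house number street name, optional 5-digit postal code") are all hypothetical.

```python
import re

# Hypothetical abbreviation table; a real one would mirror the reference data.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD"}

# Assumed layout: "<house number> <street name>[, <5-digit postal code>]".
ADDRESS_RE = re.compile(
    r"^\s*(?P<house_number>\d+)\s+(?P<street>.+?)\s*(?:,\s*(?P<postal_code>\d{5}))?\s*$"
)


def parse_address(raw):
    """Split a raw address string into individual fields, or return None."""
    match = ADDRESS_RE.match(raw)
    if match is None:
        return None  # does not fit the assumed layout; needs manual review
    # Normalize case and punctuation, then abbreviate known address elements.
    words = match.group("street").upper().replace(".", "").split()
    street = " ".join(ABBREVIATIONS.get(word, word) for word in words)
    return {
        "house_number": int(match.group("house_number")),
        "street_name": street,
        "postal_code": match.group("postal_code"),
    }


record = parse_address("123 Main Street, 90210")
```

Records that return None do not fit the assumed layout and would need to be corrected by hand or handled by a more tolerant parser before geocoding.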

For the geographical data from the reference database, an address style is defined and the fields are assigned according to that style.

Example: If the address style is "U.S. Streets" of the form "house number street name", then assign the column "STREET" from the reference database to the address style component "street name" and the column "HNR" to the component "house number". Then build geocoding indexes, which speed up the search for each tabular address in the reference data.
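The column assignment and index from this example can be sketched in Python. The dictionary-based index below only stands in for whatever index structure a real geocoder builds; FIELD_MAP, build_index and the sample rows (including the X and Y columns) are hypothetical.

```python
from collections import defaultdict

# Address style component -> reference database column, as in the example.
FIELD_MAP = {"street name": "STREET", "house number": "HNR"}


def build_index(reference_rows, field_map=FIELD_MAP):
    """Group reference rows by normalized street name for fast lookup."""
    street_column = field_map["street name"]
    index = defaultdict(list)
    for row in reference_rows:
        key = row[street_column].strip().upper()
        index[key].append(row)
    return index


# Made-up reference rows for illustration only.
reference_rows = [
    {"STREET": "Main St", "HNR": 101, "X": 10.0, "Y": 20.0},
    {"STREET": "Main St", "HNR": 103, "X": 12.0, "Y": 20.0},
    {"STREET": "Elm Ave", "HNR": 7, "X": 30.0, "Y": 45.0},
]

index = build_index(reference_rows)
candidates = index.get("MAIN ST", [])  # all reference rows on Main St
```

With the index in place, looking up a street is a single dictionary access instead of a scan over every reference row.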

The key to successful geocoding is to make sure that the reference data includes all the information that is to be assigned to the geocoded addresses. How the preparation is done depends on the software: the geocoding tool provided with the ESRI® ArcView® package, for example, expects the user to define the address style of the reference data and to specify the address field of the tabular data.

2. Standardizing Geocoding Parameters

The matching of the geographic units from the two sources (reference data and tabular data) can be fine-tuned by specifying several parameters for both the tabular data and the reference data. Whether a match counts as successful depends on the needs of the end user; accuracy, product costs, maintenance costs, utility and compatibility are some of the deciding factors. Often, information is geocoded against a street reference layer.

The configuration of the parameters (e.g., address style, offset distance) depends on the GIS or geocoding software. The objective is to obtain the optimal geocode match rate.

3. Geocoding

The geocoding process, which is performed by GIS or geocoding software, involves the placement of the tabular data in relation to the reference geographic layer. When the reference layer is a Street layer, the geocoded point is "interpolated."

Address interpolation assigns a calculated position to an address by distributing the address range evenly along a street element. The address values at the nodes mark the two ends of the range, and every address in between is placed at the corresponding percentage of the length along the chain.

Range-based geocoding locates addresses by interpolation within the address ranges.
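The interpolation can be illustrated with a short Python sketch. It assumes a straight street segment with one address range, and it ignores refinements that real geocoders usually add, such as odd/even street sides, offset distance and multi-vertex chains. The function name and its parameters are hypothetical.

```python
def interpolate_address(house_number, from_number, to_number, start, end):
    """Place a house number along a straight segment by linear interpolation.

    start and end are (x, y) coordinates of the segment's nodes;
    from_number and to_number are the address values at those nodes.
    """
    if from_number == to_number:
        fraction = 0.5  # single-address range: use the midpoint
    else:
        fraction = (house_number - from_number) / (to_number - from_number)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("house number lies outside the address range")
    x = start[0] + fraction * (end[0] - start[0])
    y = start[1] + fraction * (end[1] - start[1])
    return (x, y)


# House number 150 in the range 100-200 falls halfway along the segment.
point = interpolate_address(150, 100, 200, (0.0, 0.0), (100.0, 0.0))
```

Because the position is calculated rather than surveyed, the interpolated point is only an estimate of where the address actually lies.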

Figure: Range-Based Interpolated Addresses

4. Reviewing the Results

Most GIS and geocoding software packages generate a geocoding outcome that indicates how successful the geocoding process was. It is then up to the user to interpret these results and to decide whether they are acceptable. Geocoding problems generally stem from missing or incorrect address ranges, or from missing or incorrect street information, in the geodatabase or in the address lists.
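As an illustration of such a review, the sketch below computes a match rate from a list of per-record outcomes. The status labels "matched" and "unmatched" and the record layout are assumptions; actual software reports its outcome in its own format.

```python
from collections import Counter


def summarize(results):
    """Return record counts and the match rate for a geocoding run."""
    counts = Counter(result["status"] for result in results)
    total = len(results)
    matched = counts["matched"]
    return {
        "total": total,
        "matched": matched,
        "unmatched": total - matched,
        "match_rate": matched / total if total else 0.0,
    }


# Made-up outcome records for illustration only.
results = [
    {"address": "123 MAIN ST", "status": "matched"},
    {"address": "7 ELM AVE", "status": "matched"},
    {"address": "99 MIAN ST", "status": "unmatched"},
    {"address": "12 OAK RD", "status": "matched"},
]

summary = summarize(results)
unmatched = [r["address"] for r in results if r["status"] == "unmatched"]
```

Listing the unmatched addresses alongside the rate helps show whether failures trace back to the address list (for example, misspellings) or to gaps in the reference data.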

5. Resetting Geocoding Parameters

If the geocoding results are unsatisfactory, then the user may want to try other parameter settings; for example, lowering the spelling sensitivity. With these new settings the geocoding process is repeated.
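The effect of lowering the spelling sensitivity can be sketched with Python's standard difflib module. The 0-100 sensitivity scale, the function name and the similarity measure are assumptions made for illustration; each geocoder scores spelling differences in its own way.

```python
from difflib import SequenceMatcher


def best_street_match(name, candidates, spelling_sensitivity):
    """Return the most similar candidate, or None if it scores too low.

    spelling_sensitivity is a hypothetical 0-100 threshold: higher values
    demand a closer spelling match before a candidate is accepted.
    """
    best, best_score = None, 0.0
    for candidate in candidates:
        score = 100 * SequenceMatcher(None, name.upper(), candidate.upper()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= spelling_sensitivity else None


streets = ["MAIN ST", "MAPLE AVE"]

# First pass: a strict setting rejects the misspelled street name.
strict = best_street_match("MIAN ST", streets, spelling_sensitivity=95)

# Second pass: a lower setting accepts the closest candidate.
relaxed = best_street_match("MIAN ST", streets, spelling_sensitivity=70)
```

A lower setting raises the match rate but also the risk of false matches, so results from a relaxed pass deserve a second review.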