How to Use MultiNet Shapefile and OSL
Geocoding is the cross-referencing between geographic spatial data (features with coordinates relative to a standard reference system) and non-geographic data (such as addresses and postal codes).
Geocoding brings geographic data and tabular data together based on a common geographic unit of analysis. A geographic unit of analysis is the spatial characteristic within the data that is needed to locate the data on a map. Such units include addresses (street level) and postal codes (postal district level). This spatial characteristic is common to the geographic data and the non-geographic tabular data. The tabular data can also contain additional non-geographic information, e.g., telephone numbers, fax numbers, e-mail addresses and website information.
Figure: Geocoding Overview
Figure: Geocoding Process
The first step in the geocoding process involves the sometimes extensive preprocessing of both tabular address data and geographical data in order to maximize the match rates.
The tabular data should be parsed into individual fields that have a clear function in the address data (e.g., postal codes, street names and house numbers). The parsing can include adding abbreviations of address elements to match the reference data.
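The parsing step described above can be sketched in Python as follows. This is only an illustrative sketch: the input layout ("house number street name, postal code"), the abbreviation table and the function name parse_address are assumptions made for the example, not part of MultiNet or of any particular geocoding tool.

```python
import re

# Hypothetical abbreviation table; a real one would follow the
# conventions used in the reference data.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD"}

# Assumed input layout: "<house number> <street name>, <postal code>"
ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<hnr>\d+)\s+(?P<street>[^,]+?)\s*,\s*(?P<postal>[\w\- ]+?)\s*$"
)

def parse_address(raw):
    """Split a raw address string into house number, street name and postal code."""
    match = ADDRESS_PATTERN.match(raw)
    if match is None:
        return None  # leave unparsable records for manual review
    words = match.group("street").upper().split()
    # Replace full words with the abbreviations used in the reference data.
    street = " ".join(ABBREVIATIONS.get(word, word) for word in words)
    return {
        "house_number": int(match.group("hnr")),
        "street_name": street,
        "postal_code": match.group("postal").upper(),
    }
```

Records that do not fit the assumed layout return None, so they can be counted and corrected before geocoding rather than silently lowering the match rate.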
For the geographical data from the reference database, an address style is defined and the fields are assigned according to that style.
Example: If the address style is "U.S. Streets" of the form "house number street name," then assign the column "STREET" from the reference database to the address style component "street name" and the column "HNR" to the component "house number."
To speed up the lookup of tabular addresses in the reference data, build geocoding indexes.
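As a rough illustration of the field assignment and index building, the sketch below maps the address style components to the "STREET" and "HNR" columns and groups the reference records by street name. The dictionary-based index, the function name build_street_index and the sample records are assumptions for the example; GIS packages build their own index structures.

```python
from collections import defaultdict

# Address style components mapped to reference columns, as in the example above.
ADDRESS_STYLE = {"street name": "STREET", "house number": "HNR"}

def build_street_index(reference_rows, style=ADDRESS_STYLE):
    """Group reference records by normalized street name for fast lookup."""
    street_column = style["street name"]
    index = defaultdict(list)
    for row in reference_rows:
        # Normalize case and whitespace so lookups are consistent.
        key = row[street_column].strip().upper()
        index[key].append(row)
    return dict(index)

# Hypothetical reference records; the values are illustrative only.
reference = [
    {"STREET": "Main St", "HNR": 1},
    {"STREET": "MAIN ST ", "HNR": 99},
    {"STREET": "Oak Ave", "HNR": 10},
]
index = build_street_index(reference)
```

With such an index, each tabular address is compared only against the records that share its street name instead of against the whole reference table.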
The key to successful preparation of the reference data is to include all reference data that is to be assigned to the geocoded addresses. For example, the geocoding tool provided with the ESRI® ArcView® package expects the user to define the address style of the reference data and to specify the address field of the tabular data.
The matching of the geographic units from each source (reference data and tabular data) can be fine-tuned by specifying several parameters for both the tabular data and the reference data. Whether a match is successful depends on the needs of the end user. Accuracy, product costs, maintenance costs, utility and compatibility are some of the factors that determine whether a match is considered successful. Often, information is geocoded against a street reference layer.
The configuration of the parameters (e.g., address style, offset distance) depends on the GIS or geocoding software. The objective is to obtain the optimal geocode match rate.
The geocoding process, which is performed by GIS or geocoding software, involves the placement of the tabular data in relation to the reference geographic layer. When the reference layer is a Street layer, the geocoded point is "interpolated."
Address interpolation refers to a calculated address value assigned by evenly distributing an address range along a street element. Address values at nodes are thus based on the percentage of length along the chain.
Figure: Range-Based Interpolated Addresses
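Range-based interpolation can be sketched as follows, assuming the street element is given as an ordered list of (x, y) vertices with the first house number of the range at the first vertex and the last house number at the last vertex. The function name and argument layout are hypothetical. The sketch ignores the side of the street (odd or even ranges) and any offset distance, both of which real geocoding software usually handles.

```python
import math

def interpolate_address(house_number, from_hnr, to_hnr, vertices):
    """Return the (x, y) point for house_number along a street element.

    vertices is the ordered list of (x, y) coordinates of the element;
    from_hnr lies at the first vertex and to_hnr at the last one.
    """
    low, high = sorted((from_hnr, to_hnr))
    if not low <= house_number <= high:
        raise ValueError("house number outside the address range")
    # Position in the address range equals the fraction of the chain length.
    if to_hnr == from_hnr:
        fraction = 0.0
    else:
        fraction = (house_number - from_hnr) / (to_hnr - from_hnr)
    segments = list(zip(vertices, vertices[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    target = fraction * sum(lengths)
    # Walk along the chain until the target distance falls inside a segment.
    for (a, b), length in zip(segments, lengths):
        if length > 0 and target <= length:
            t = target / length
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        target -= length
    return vertices[-1]  # guards against floating-point round-off
```

For a straight element from (0, 0) to (10, 0) with the range 0 to 100, house number 50 is placed at the midpoint of the element.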
Most GIS or geocoding software generates a geocoding outcome that indicates the success of the geocoding process. It is then up to the user to interpret these results and to decide whether they are acceptable. Geocoding problems generally occur because of missing or incorrect address ranges, or missing or incorrect street information, in the geodatabase or in the address lists.
If the geocoding results are unsatisfactory, the user may want to try other parameter settings, such as lowering the spelling sensitivity. The geocoding process is then repeated with the new settings.
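The effect of lowering the spelling sensitivity can be illustrated with a small sketch based on Python's standard difflib module. The 0 to 100 scale, the function names best_match and match_rate, and the similarity measure itself are assumptions for the example; actual geocoding software uses its own scoring rules.

```python
from difflib import SequenceMatcher

def best_match(street, reference_streets, spelling_sensitivity):
    """Return the closest reference street, or None if it scores too low.

    spelling_sensitivity is an assumed score from 0 to 100; a higher
    value demands a closer spelling match.
    """
    best, best_score = None, 0.0
    for candidate in reference_streets:
        ratio = SequenceMatcher(None, street.upper(), candidate.upper()).ratio()
        score = 100 * ratio
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= spelling_sensitivity else None

def match_rate(streets, reference_streets, spelling_sensitivity):
    """Fraction of input streets that find a match at the given sensitivity."""
    matched = sum(
        best_match(street, reference_streets, spelling_sensitivity) is not None
        for street in streets
    )
    return matched / len(streets)
```

Running match_rate first with a high sensitivity and then with a lower one shows how many additional records match once small misspellings such as "MIAN ST" are tolerated. A lower setting also raises the risk of false matches, so the newly matched records are worth reviewing.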