3.3: Map Preprocessing
Map preprocessing functions are housecleaning tasks that make the data you input into the GIS usable for analysis. The objective is to get all of your GIS datasets into the same projection and then to align the layers spatially with one another. Many map preprocessing tasks do just that, including reprojection, georeferencing, resampling, reclassification, and edge matching. Verifying, editing, and manipulating your map features are also part of this chapter.
Reprojection: Changing Projections, Coordinate Systems, and Datums
All of your project’s feature layers must be in the same projection and coordinate system if you intend to use them for analysis or map production. Both raster and vector GIS programs allow you to convert layers of features from one projection, coordinate system, and datum to another. In vector systems, this involves translating the x and y coordinates of every feature to new coordinates. In raster systems, it involves coordinate translation and resampling the pixels of one image into a new image (in some raster-based systems, changing projections is called resampling). For both vector and raster systems, the process is not error free, and repeatedly translating a dataset back and forth compounds the errors.
All GIS programs have projection utilities that allow you to change a layer’s projection, coordinate system, and datum. When reprojecting data, you need to know both the existing and the output projection parameters (projection, coordinate system, and datum). Information about the existing projection is found in the layer’s metadata, if metadata exists. If it does not, you need to speak with someone who created, or at least uses, the dataset. You should already know the output projection parameters, since they are determined in the planning phase. Many programs also let you import projection parameters by selecting an existing GIS layer that already uses them; the program then copies that layer’s parameters to the reprojected layer. This saves time, especially if you have multiple layers to reproject.
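For a vector layer, reprojection comes down to running every vertex through a projection formula. The sketch below uses spherical Web Mercator only because its formulas are short; it assumes input longitude/latitude in degrees on WGS84 and performs no datum transformation. The function names are hypothetical, and production work would rely on a projection library such as PROJ rather than hand-written formulas.

```python
import math

# Radius used by spherical Web Mercator, in meters. This sketch treats the
# Earth as a sphere and performs no datum transformation.
R = 6378137.0


def to_web_mercator(lon_deg, lat_deg):
    """Project a longitude/latitude pair (degrees) to Web Mercator x/y (meters)."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    return R * lam, R * math.log(math.tan(math.pi / 4 + phi / 2))


def from_web_mercator(x, y):
    """Inverse projection: Web Mercator x/y (meters) back to longitude/latitude (degrees)."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


def reproject_feature(vertices):
    """Translate every (lon, lat) vertex of a vector feature into the new coordinate system."""
    return [to_web_mercator(lon, lat) for lon, lat in vertices]
```

A round trip through `to_web_mercator` and `from_web_mercator` returns the starting coordinates up to floating-point rounding; with real datum shifts and resampled rasters, each conversion can add error.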
Georeferencing
Any scanned image can be entered into a GIS, but to be useful, the image needs to be placed in its proper geographic location. Georeferencing aligns images to their spatial location. This process is common due to the popularity of “heads up” digitizing (described in Chapter 2).
Georeferencing is typically done by aligning the image to existing projected feature layers that are in their correct position. Since any scanned image is fundamentally a matrix of pixels, georeferencing the raster layer involves moving and stretching this matrix so that it rests at its true location (see Figure 3.7). To do this, you load the unprojected image and the projected feature layers, then select pairs of corresponding control points, which are locations you can identify on both the image and the feature layers (left map in Figure 3.7). For greater accuracy, select as many control points as possible and make sure they are spread throughout the image. If they are clustered in one corner of the unprojected image, only that part of the image will be georeferenced properly.

Figure 3.7: Georeferencing an image to its real-world location. The left map displays both the image we want to move and the area's parcels. The red arrows link points in the image to the corresponding locations they should be moved to. When georeferencing is complete, the image rests at its real-world location.
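Behind the scenes, GIS programs commonly fit a mathematical transformation to the control point pairs. The sketch below assumes a first-order (affine) transformation, one common choice, fitted by least squares; the function names are hypothetical. Because the fit is estimated from the control points, points bunched in one corner constrain only that corner, which is why spreading them out matters. The residuals show how far each control point lands from its target after the fit.

```python
import math


def fit_affine(control_points):
    """Least-squares affine fit from image (col, row) to map (x, y) coordinates.

    control_points: list of ((col, row), (x, y)) pairs, at least three and not
    all on one line. Returns (a, b, c, d, e, f) such that
        x = a * col + b * row + c
        y = d * col + e * row + f
    """
    n = len(control_points)
    if n < 3:
        raise ValueError("need at least three control points")
    # Center all coordinates on their means; this separates the offsets (c, f)
    # from the linear terms and keeps the arithmetic well conditioned.
    mc = sum(p[0][0] for p in control_points) / n
    mr = sum(p[0][1] for p in control_points) / n
    mx = sum(p[1][0] for p in control_points) / n
    my = sum(p[1][1] for p in control_points) / n
    scc = scr = srr = scx = srx = scy = sry = 0.0
    for (col, row), (x, y) in control_points:
        dc, dr, dx, dy = col - mc, row - mr, x - mx, y - my
        scc += dc * dc
        scr += dc * dr
        srr += dr * dr
        scx += dc * dx
        srx += dr * dx
        scy += dc * dy
        sry += dr * dy
    det = scc * srr - scr * scr
    if det <= 1e-12 * scc * srr:
        raise ValueError("control points are collinear")
    # Solve the 2x2 normal equations for the x and y outputs (Cramer's rule).
    a = (scx * srr - scr * srx) / det
    b = (scc * srx - scr * scx) / det
    d = (scy * srr - scr * sry) / det
    e = (scc * sry - scr * scy) / det
    c = mx - a * mc - b * mr
    f = my - d * mc - e * mr
    return a, b, c, d, e, f


def control_point_residuals(control_points, params):
    """Distance between each control point's target and where the fit puts it."""
    a, b, c, d, e, f = params
    return [math.hypot(a * col + b * row + c - x, d * col + e * row + f - y)
            for (col, row), (x, y) in control_points]
```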
Georeferencing records where the image rests in relation to the Earth’s surface. When you save your georeferenced image, a “world” file is created. This is an ASCII text file that has the same name as your image file but a different, related file extension. For example, if you have a TIFF image called Mexelev.tif, its world file (a TIFF world file) will be Mexelev.tfw. The “w” at the end of the extension marks it as a world file. Most GIS software packages can interpret these files and display the images in their proper location, as long as the two files share the same name and are stored in the same directory.
The first column of Figure 3.8 is an example of a world file, which has six lines of locational values. The second column describes each of the six values; it is not part of the world file itself.

Figure 3.8: Typical world file format.
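The six values define an affine transformation from pixel position to map coordinates. In the standard world file order they are: A (pixel width in map units), D and B (rotation terms, usually 0), E (pixel height, normally negative because rows run downward), and C and F (the map x and y of the center of the upper-left pixel). The sketch below parses a world file and converts a pixel index to map coordinates; the function names and the example values (30-unit pixels) are hypothetical, not taken from Figure 3.8.

```python
def read_world_file(text):
    """Parse a world file's six values, in file order: A, D, B, E, C, F."""
    values = [float(line) for line in text.splitlines() if line.strip()]
    if len(values) != 6:
        raise ValueError(f"expected 6 values, found {len(values)}")
    return tuple(values)


def pixel_to_map(world, col, row):
    """Map coordinates of the center of pixel (col, row); (0, 0) is the upper-left pixel."""
    a, d, b, e, c, f = world
    return a * col + b * row + c, d * col + e * row + f


# Hypothetical contents of a file such as Mexelev.tfw (30-unit pixels, no rotation).
EXAMPLE_TFW = "30.0\n0.0\n0.0\n-30.0\n500015.0\n4100015.0\n"
```

With these values, `pixel_to_map(read_world_file(EXAMPLE_TFW), 10, 20)` gives (500315.0, 4099415.0): ten pixels east and twenty pixels south of the upper-left pixel center.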
Resampling
As mentioned above, resampling converts raster layers from one projection to another, but it can also change the resolution of raster images. For example, resampling can convert each 2 by 2 block of pixels (4 pixels total) into a single, geographically larger pixel. To do this, it uses mathematical formulas to compute the attribute values that best approximate the original values in the new layer. For instance, it might average the four numeric values and place the mean in the single resampled pixel that replaces them. In Figure 3.9, the image on the right is a generalized resampling of the image on the left. Resampling is important if you are working with multiple raster images of varying resolutions: you need to convert your images to a common resolution (much like a common projection) before analyzing them together.

Figure 3.9: Resampling. The original image on the left had a pixel resolution of 250 meters. The resampled image on the right has a resolution of 2500 meters.
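The block-averaging example in the text can be sketched as follows, assuming the raster is a list of rows whose dimensions are exact multiples of the aggregation factor (a factor of 10 would correspond to the 250 m to 2500 m change in Figure 3.9). Averaging suits continuous values such as elevation; averaging category codes would produce meaningless values, so the sketch also includes a majority rule, a common alternative for categorical data. The function names are hypothetical.

```python
from collections import Counter


def _coarsen(grid, factor, combine):
    """Replace each factor x factor block of cells with combine(block_values)."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of the factor")
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            block = [grid[r][c] for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            out_row.append(combine(block))
        out.append(out_row)
    return out


def resample_mean(grid, factor):
    """Coarser raster whose cells hold the mean of the cells they replace."""
    return _coarsen(grid, factor, lambda block: sum(block) / len(block))


def resample_majority(grid, factor):
    """Coarser raster whose cells hold the most frequent value in each block."""
    return _coarsen(grid, factor, lambda block: Counter(block).most_common(1)[0][0])
```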
Reclassification
Reclassification generalizes the values in a raster layer to highlight broader classes. This popular preprocessing technique reassigns the values in an input raster layer, according to a criterion you specify, to create a new, more generalized raster layer. In Figure 3.10, a raster image of land covers is reclassified into two values. Reclassifying a layer may reveal broader patterns by merging its unique classes. Reclassification is also commonly used to convert interval and ratio pixel data into the ordinal data used in the overlay process.

Figure 3.10: Reclassification. The image on the left depicts 4 different land covers. The image on the right aggregates land covers (D and R become D; P and W become U) into two classes. Image by Mike Tuck.
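Both uses of reclassification described above can be sketched in a few lines, assuming a raster stored as a list of rows. The first function applies a lookup table, here the one in Figure 3.10; the second converts interval or ratio values into ordinal classes using ascending class break values. The function names and the break values in the usage note are hypothetical.

```python
from bisect import bisect_left

# Lookup table matching Figure 3.10: D and R become D; P and W become U.
FIGURE_3_10_CLASSES = {"D": "D", "R": "D", "P": "U", "W": "U"}


def reclassify(grid, lookup):
    """Replace every cell value with lookup[value]; a missing key raises KeyError."""
    return [[lookup[value] for value in row] for row in grid]


def reclassify_by_ranges(grid, upper_bounds):
    """Convert interval/ratio cell values to ordinal classes 1, 2, ..., n + 1.

    upper_bounds must be sorted ascending; a value <= upper_bounds[0] is class 1,
    and a value above the last bound is class len(upper_bounds) + 1.
    """
    return [[bisect_left(upper_bounds, value) + 1 for value in row] for row in grid]
```

For example, `reclassify_by_ranges([[120, 480, 950]], [250, 750])` yields `[[1, 2, 3]]`, turning raw values into low, medium, and high classes ready for overlay.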
Edge Matching
When side-by-side map layers are retrieved and displayed, they might not line up well with each other (see Figure 3.11 below). Edge matching adjusts the location of features that extend across one map’s boundaries into another.
Edge matching requires your input to join the common edge of the two maps. Features that you believe are positioned correctly are usually “anchored” down, and the rest of the map is moved, stretched, or contracted like a sheet of rubber so that the features on the two maps line up. All map features except the anchored ones are spatially adjusted.
Which map features should you anchor down and which should be stretched? It is not an easy question. The answer might be found in the layer’s metadata. Perhaps one layer was entered at a coarser (less accurate) scale or with less precision. If, however, the properties and author of the two layers are identical, you should use a third layer (perhaps a georeferenced aerial photograph) that you have some confidence in to check the positional accuracy of the features within these two layers. When all else fails, you might have the features split the difference.

Figure 3.11: Edge matching. Sometimes two maps that should fit up against each other do not. Edge matching manipulates one or both of them until they do.
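The decision at the seam, anchoring one map or splitting the difference, can be sketched for the line endpoints that meet along the common edge. This is a simplified illustration under stated assumptions: endpoints are paired by nearest distance within a tolerance you choose, and only the endpoints themselves move. The stretching of the rest of each map, which real edge-matching tools perform, is omitted. The function name and parameters are hypothetical.

```python
import math


def edge_match(anchored, movable, tolerance, split_difference=False):
    """Pair endpoints across a shared map edge and reconcile their positions.

    anchored, movable: lists of (x, y) endpoints from the two maps that touch
    the common edge. Each movable endpoint is paired with the nearest unused
    anchored endpoint within `tolerance`. By default the movable endpoint moves
    onto its anchored partner; with split_difference=True both move to their
    midpoint. Unpaired endpoints are left where they are.
    Returns new (anchored, movable) lists.
    """
    anchored = list(anchored)
    movable = list(movable)
    used = set()
    for i, p in enumerate(movable):
        best, best_dist = None, tolerance
        for j, q in enumerate(anchored):
            if j in used:
                continue
            dist = math.dist(p, q)
            if dist <= best_dist:
                best, best_dist = j, dist
        if best is None:
            continue  # nothing on the other map close enough to match
        used.add(best)
        q = anchored[best]
        if split_difference:
            mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
            anchored[best] = mid
            movable[i] = mid
        else:
            movable[i] = q
    return anchored, movable
```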
Conflation
Conflation is similar to edge matching, with one difference: rather than rectifying the placement of features across maps, it rectifies feature locations within a single raster image. Because of this, it is also referred to as rubber sheeting. It is an interactive process in which you tack down features that are positioned correctly and move the remainder to more accurate locations.
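One simple way to carry out the tack-down-and-move step is to treat every correction as a displacement link and to interpolate those displacements for all other points. The sketch below uses inverse distance weighting (IDW) for the interpolation; this is a swapped-in illustration, and GIS packages may use other rubber-sheeting methods. Tacked-down features are links with zero displacement. The function name and the `power` parameter are hypothetical.

```python
import math


def rubber_sheet(points, links, power=2.0):
    """Adjust points with inverse-distance-weighted displacement links.

    links: list of ((from_x, from_y), (to_x, to_y)). A tacked-down feature is a
    link whose from and to positions are identical (zero displacement). Each
    point moves by a weighted average of the link displacements, with weights
    of 1 / distance**power, so nearby links dominate. A point lying exactly on a
    link's from position moves to that link's to position.
    """
    adjusted = []
    for x, y in points:
        sum_w = sum_dx = sum_dy = 0.0
        exact = None
        for (fx, fy), (tx, ty) in links:
            d = math.hypot(x - fx, y - fy)
            if d == 0.0:
                exact = (tx, ty)
                break
            w = 1.0 / d ** power
            sum_w += w
            sum_dx += w * (tx - fx)
            sum_dy += w * (ty - fy)
        if exact is not None:
            adjusted.append(exact)
        elif sum_w == 0.0:
            adjusted.append((x, y))  # no links at all: leave the point unchanged
        else:
            adjusted.append((x + sum_dx / sum_w, y + sum_dy / sum_w))
    return adjusted
```

With two tacks at (0, 0) and (10, 0) and one link moving (5, 5) to (5, 7), a point at (5, 0), equidistant from all three links, moves up by one third of the 2-unit correction.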
Tiling
Sometimes workspaces get large: geographically vast and thematically numerous. Tiling breaks up your workspace into more manageable and logical geographic subunits by subdividing existing layers (both the geography and the attributes) by geographic unit. Figure 3.12 displays a portion of the U.S.G.S. topographic map grid across California. Tiling can be done by splitting existing layers along the boundaries of a larger geographic framework, such as this quadrangle grid, or it can be planned from the outset of the GIS project. The GIS then maintains a library of all the tiles that represent the project area.

Figure 3.12: U.S.G.S. quadrangle maps for a portion of California.
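A tile library can be sketched as a dictionary keyed by tile index. The sketch below assumes a regular grid of 7.5-arc-minute cells (the extent of a USGS 7.5-minute quadrangle) and point features in longitude/latitude; the integer index is hypothetical and is not the USGS quadrangle naming scheme. Splitting lines and polygons would additionally require clipping them at tile boundaries, which is omitted here.

```python
import math
from collections import defaultdict

# Tile size in degrees: 7.5 arc-minutes, the extent of a USGS 7.5-minute quadrangle.
TILE_DEG = 7.5 / 60


def tile_index(lon, lat, tile_deg=TILE_DEG):
    """(column, row) index of the grid tile that contains a longitude/latitude point.

    Points exactly on a boundary fall into the tile to their east/north.
    """
    return math.floor(lon / tile_deg), math.floor(lat / tile_deg)


def build_tile_library(features, tile_deg=TILE_DEG):
    """Group point features {feature_id: (lon, lat)} into a library keyed by tile index."""
    library = defaultdict(list)
    for feature_id, (lon, lat) in features.items():
        library[tile_index(lon, lat, tile_deg)].append(feature_id)
    return dict(library)
```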
Vectorization and Rasterization
These two common processes switch feature layers between vector and raster. For example, you might digitize data into a vector format but want to use it in a raster form. Vector layers are converted to raster by a process known as rasterization (see A in Figure 3.13). Alternatively, raster data can be converted to a vector layer through vectorization (see B in Figure 3.13).
Like any translation, the conversion is not error free. Think, for example, of converting points from vector to raster. Each precise point location in the vector layer now falls somewhere within a pixel of the new raster layer. The precise locations are lost because each point now occupies a much larger area, determined by the pixel resolution. Now convert the new raster point layer back to vector. The points in the resulting vector layer are located at the centers of the pixels that contained them. Comparing the new and original vector layers, you would see that they resemble each other but do not line up exactly.

Figure 3.13: From vector to raster to vector again. Converting a vector file to raster is called rasterization (process A). Converting from raster to vector is vectorization (process B). Notice the differences between the first and third maps, both vector layers.
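The point round trip described above can be sketched directly. The sketch assumes a grid defined by its upper-left corner and a square cell size, with rows counted downward; the function names are hypothetical. Each returned point sits at a cell center, so it can be off by up to half a cell diagonal, and two original points in the same cell collapse into one.

```python
import math


def rasterize_points(points, origin_x, origin_y, cell):
    """Rasterization: mark the (col, row) cell containing each point.

    (origin_x, origin_y) is the grid's upper-left corner; rows increase downward.
    """
    cells = set()
    for x, y in points:
        col = math.floor((x - origin_x) / cell)
        row = math.floor((origin_y - y) / cell)
        cells.add((col, row))
    return cells


def vectorize_cells(cells, origin_x, origin_y, cell):
    """Vectorization: turn each occupied cell back into a point at the cell's center."""
    return sorted((origin_x + (col + 0.5) * cell, origin_y - (row + 0.5) * cell)
                  for col, row in cells)
```

For example, with 10-unit cells and an origin of (0, 100), the points (12, 88) and (37, 61) come back as (15, 85) and (35, 65): close to, but not on, their original positions.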
Coordinate Thinning
Coordinate thinning (also known as map generalization) generalizes or “smooths” feature shapes by removing nodes (vertices) from line and polygon features. It reduces layer storage size, and it can be used to remove unwanted detail from map features. The detail held in a layer is not always appropriate for a small-scale map. For example, in the top map of Figure 3.14, notice how portions of the coastline resemble ink blobs because of the amount of detail. That detail may detract from the map’s purpose, although it might be welcome if the image were enlarged. By thinning the vertices along the coast, the map becomes simpler and clearer (bottom map in Figure 3.14). Perhaps this map is too generalized at this scale: notice that a couple of islands have disappeared.

Figure 3.14: Coordinate thinning. Notice the differences in detail.
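One widely used thinning algorithm is Ramer-Douglas-Peucker, sketched below. It is offered as an illustration; the section does not say which method produced Figure 3.14. The algorithm keeps a line's endpoints, finds the vertex farthest from the straight segment joining them, and keeps that vertex (recursing on both halves) only if its distance exceeds the tolerance. A larger tolerance removes more vertices, which is how small islands can shrink away entirely.

```python
import math


def _perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    if a == b:
        return math.dist(p, a)
    (x, y), (x1, y1), (x2, y2) = p, a, b
    # |cross product of (b - a) and (a - p)| divided by |b - a|
    return abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / math.dist(a, b)


def douglas_peucker(line, tolerance):
    """Thin a polyline, keeping only vertices that deviate more than `tolerance`."""
    if len(line) < 3:
        return list(line)
    first, last = line[0], line[-1]
    index, max_d = 0, -1.0
    for i in range(1, len(line) - 1):
        d = _perpendicular_distance(line[i], first, last)
        if d > max_d:
            index, max_d = i, d
    if max_d <= tolerance:
        return [first, last]  # every interior vertex is within tolerance: drop them
    left = douglas_peucker(line[: index + 1], tolerance)
    right = douglas_peucker(line[index:], tolerance)
    return left[:-1] + right  # the split vertex appears in both halves; keep it once
```

On the line (0, 0), (1, 0.05), (2, 0), (3, 1), (4, 0), a tolerance of 0.5 drops only the tiny wiggle at (1, 0.05), while a tolerance of 2 reduces the line to its two endpoints.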
Topological Functions
Topology describes the spatial relationships among features: where features are in relation to one another and how they are connected. Building on these relationships, topological functions (a semi-automated process) help you clean up your layer’s spatial errors and determine which parts of different features are shared, contained, or connected to other features. In other words, these functions build topology. Most vector-based systems provide routines that help you find the following common topological problems (see Figure 3.15):
- Slivers, the most common topological problem, are small polygons that occur either when shared boundaries are entered separately for contiguous polygons or when the features of two layers are overlaid but do not match precisely. Topological functions can remove many of these slivers and reconcile common boundaries.
- Overshoots and undershoots usually occur when features are entered without the aid of a snapping routine. The feature vertices extend beyond (overshoots) or fall just short of (undershoots) their intended location. Topological functions can clean up these errors when you define a distance tolerance: if a vertex lies within that distance of another feature’s vertex, it is snapped to that vertex.
- Redundancy occurs when two or more features in the same layer share the same node (vertex) or line, but the layer stores duplicate copies of these nodes and lines. A layer should store each shared node or line only once, which prevents duplication that could lead to errors. Most GIS programs have routines that eliminate duplicates automatically.

Figure 3.15: Typical topological errors.
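Two of the cleanup routines listed above, tolerance-based snapping and duplicate removal, can be sketched under simplifying assumptions. Vertices are (x, y) tuples, and each vertex snaps only to an earlier-kept vertex within the tolerance; real tools also snap to nearby line segments and handle slivers, both of which are omitted. The function names are hypothetical.

```python
import math


def snap_vertices(vertices, tolerance):
    """Snap each vertex to the first previously kept vertex within `tolerance`.

    Vertices farther than `tolerance` from every kept vertex are kept as-is and
    become snap targets for later vertices.
    """
    kept = []
    result = []
    for v in vertices:
        target = next((k for k in kept if math.dist(v, k) <= tolerance), None)
        if target is None:
            kept.append(v)
            result.append(v)
        else:
            result.append(target)  # overshoot/undershoot closed onto the existing vertex
    return result


def remove_duplicate_segments(segments):
    """Keep one copy of each segment, treating (a, b) and (b, a) as the same segment."""
    seen = set()
    unique = []
    for a, b in segments:
        key = (a, b) if a <= b else (b, a)
        if key not in seen:
            seen.add(key)
            unique.append((a, b))
    return unique
```

Choosing the tolerance is the key design decision: too small and real overshoots survive; too large and distinct, nearby vertices are merged.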


