5.2: Selection and Measurement
The selection part of this category barely justifies its placement in this chapter. Selection is not an analysis function, but it is an important first step for many analysis functions. Due to its heavy use in the analytical phase, however, it is included. The following two selection processes, attribute query and spatial selection, have been discussed to some degree earlier in this text. Measurement, the second part of this category, is easier to justify as an analytical process because numbers that describe features are generated by these functions.
Attribute Query (Boolean Selection)
As described in Chapter 4, attribute query selects features based on their attribute values. It involves picking features based on query expressions, which use Boolean algebra (and, or, not), set algebra (>, <, =, >=, <=), arithmetic operators (+, -, *, /), and user-defined values. Simply put, the GIS compares the values in an attribute field with a query expression that you define. For example, in Figure 5.2, if you want to select every restaurant whose price is considered inexpensive, you would use a query expression like “PRICE = $” (where “PRICE” is the attribute field under investigation, “=” is the set algebra operator, and “$” is the value). Your software then looks for a value equal to $ in the price field of each record and selects only those records that satisfy the expression. In Figure 5.2, fifty out of 112 restaurants fit the query expression and are selected within the attribute file. They are simultaneously highlighted on the map.

Figure 5.2: Selecting by attribute. In this example, restaurants are selected on their price being inexpensive (PRICE = ‘$’). The results are displayed both on the map and in the attribute table (highlighted in orange).
Attribute queries can be complex. Query expressions, like the one above, can be strung together to form long equations that could include any of the operators listed above and any number of existing attribute fields. Once the desired features are selected, you can perform a number of analytical processes on just the selected features, or, alternatively, you could save the highlighted features to a new layer.
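As a rough illustration of how query expressions can be strung together, the following Python sketch treats an attribute table as a list of records and a query expression as a function that returns true or false for each record. The records, the SEATS field, and the helper name select_by_attribute are hypothetical; they are not taken from the restaurant layer in Figure 5.2 or from any particular GIS package.

```python
# Hypothetical attribute table: one dictionary per restaurant record.
restaurants = [
    {"NAME": "A", "PRICE": "$", "SEATS": 40},
    {"NAME": "B", "PRICE": "$$", "SEATS": 85},
    {"NAME": "C", "PRICE": "$", "SEATS": 120},
    {"NAME": "D", "PRICE": "$$$", "SEATS": 60},
]


def select_by_attribute(records, predicate):
    """Return the records for which the query expression (predicate) is true."""
    return [rec for rec in records if predicate(rec)]


# Simple query: PRICE = '$'
inexpensive = select_by_attribute(restaurants, lambda r: r["PRICE"] == "$")

# Compound query: PRICE = '$' AND SEATS >= 100
large_inexpensive = select_by_attribute(
    restaurants, lambda r: r["PRICE"] == "$" and r["SEATS"] >= 100
)

print([r["NAME"] for r in inexpensive])        # ['A', 'C']
print([r["NAME"] for r in large_inexpensive])  # ['C']
```

The selected records could then be passed to further processing or written out as a new layer, mirroring the two options described above.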
Attribute query is a vector process, but reclassification (discussed as a preprocessing function in Chapter 3) is a similar raster-based process.
Spatial Selection (Spatial Searches)
While attribute queries select features by sorting through records in a data file, spatial selection chooses features from the map interface. In most cases, it selects features from one layer that fall within or touch an edge of polygon features in a second layer (or an interactively drawn graphic polygon). Figure 5.3 is an example that uses the same restaurant layer as the previous figure. Again, the first layer consists of restaurants, some of which one wants to select. The second layer is composed of polygons radiating out from points of interest. After the selection process, the restaurants falling within the overlaying polygons are highlighted (selected). Ninety out of 112 restaurants fall within the polygons and are selected within the attribute file and simultaneously on the map.

Figure 5.3: Spatial selection. Only those restaurants that fall within the blue polygons are selected.
There are many types of spatial selection. Point in polygon, perhaps the most used, selects the points of one layer if they are contained within a selected polygon (or polygons) of a different layer (or graphic). Line in polygon, a similar operation, selects line features that are wholly or partially contained within a different layer’s polygon. Polygon in polygon is another variation that selects polygon features within (or overlapping) selected polygons from a second layer. Another type of spatial selection is point distance (which has line and polygon versions too), which identifies all the points in one layer that are within a specified distance of one or more selected points in a different layer. As with any type of selection, you can perform analytical processes on the highlighted features or save them to a new layer.
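The point in polygon and point distance selections described above can be sketched in a few lines of Python. This is a minimal illustration that assumes planar (projected) coordinates and a simple polygon without holes; it uses the common ray-casting test, which is one of several ways to decide containment and is not necessarily what any given GIS uses. The function names are hypothetical.

```python
import math


def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside polygon.

    polygon is a list of (x, y) vertices; points exactly on an edge
    are not handled specially in this sketch.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def points_within_distance(points, target, max_dist):
    """Point distance selection: points no farther than max_dist from target."""
    tx, ty = target
    return [p for p in points if math.hypot(p[0] - tx, p[1] - ty) <= max_dist]


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
candidates = [(2, 2), (5, 2), (1, 3)]
selected = [p for p in candidates if point_in_polygon(p, square)]
print(selected)  # [(2, 2), (1, 3)]
```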
You can mix spatial selections and attribute queries. Here you might spatially select features first, and then, from the features’ attribute file, reduce (or alternatively increase) the selected records through attribute query expressions. Figure 5.4 is an example using the same restaurant data as above. Thirty-seven out of 112 restaurants fit the query expression and fall within the overlaying polygons. They are highlighted both within the attribute file and on the map.

Figure 5.4: Spatial and attribute selection combined. In this example, restaurants that fall within the blue polygons and are inexpensive (PRICE = $) are highlighted in orange.
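The two-step logic of a combined selection might look like the sketch below. As a simplifying assumption, the polygons are approximated by circular buffers of a fixed radius around points of interest, so the spatial test reduces to a distance check; the records, coordinates, and radius are invented for illustration and do not reproduce the data behind Figure 5.4.

```python
import math

# Hypothetical restaurant records with planar coordinates.
restaurants = [
    {"NAME": "A", "PRICE": "$", "XY": (1.0, 1.0)},
    {"NAME": "B", "PRICE": "$$", "XY": (2.0, 1.5)},
    {"NAME": "C", "PRICE": "$", "XY": (9.0, 9.0)},
    {"NAME": "D", "PRICE": "$", "XY": (6.0, 5.5)},
]
points_of_interest = [(1.5, 1.0), (6.0, 6.0)]
BUFFER_RADIUS = 1.0  # assumed buffer size, in map units


def within_any_buffer(xy, centers, radius):
    """True if xy falls inside at least one circular buffer."""
    return any(math.hypot(xy[0] - cx, xy[1] - cy) <= radius for cx, cy in centers)


# Step 1: spatial selection.
spatially_selected = [
    r for r in restaurants
    if within_any_buffer(r["XY"], points_of_interest, BUFFER_RADIUS)
]

# Step 2: attribute query applied only to the spatially selected records.
combined = [r for r in spatially_selected if r["PRICE"] == "$"]

print([r["NAME"] for r in spatially_selected])  # ['A', 'B', 'D']
print([r["NAME"] for r in combined])            # ['A', 'D']
```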
Spatial selection is a vector process, but when you combine reclassification (discussed in Chapter 3) and overlay (discussed later in this chapter), the combination produces a raster version of spatial selection.
Measuring Distance
There are many ways to measure distance. Most GIS programs, both raster and vector, have a ruler button that allows you to measure distances across a map. After clicking the button, you click the point on the map where you want to begin your distance measurement and then click the ending point (or any intervening points that define the path you want to measure).
Many vector-based systems measure distances along existing vector line networks, like streets, sewers, and railroads. This type of distance measurement relies on topological network relationships, which are discussed later (see Connectivity Analysis). In addition, some vector systems automatically generate length measurements for line features as you enter them. They store the length result in an attribute field within the layer’s data file. Those systems that do not have this automatic function usually provide a way for you to calculate line feature length and store the result in an attribute field that you define. Once calculated and stored, you can sum the length of multiple line features by selecting them and calculating their sum (see Calculating Descriptive Statistics below).
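Calculating line feature length and storing it in an attribute field can be sketched as follows. The example assumes planar coordinates in consistent units (for instance, a projected coordinate system in meters); the street records, the GEOM and LENGTH field names, and the helper polyline_length are hypothetical.

```python
import math


def polyline_length(vertices):
    """Sum the straight-line lengths of the segments between consecutive vertices."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(vertices, vertices[1:])
    )


# Hypothetical line layer: each feature has a name and a list of vertices.
streets = [
    {"NAME": "Main St", "GEOM": [(0, 0), (3, 4)]},
    {"NAME": "Oak Ave", "GEOM": [(0, 0), (0, 2), (2, 2)]},
]

# Store each feature's length in a user-defined attribute field.
for street in streets:
    street["LENGTH"] = polyline_length(street["GEOM"])

# Sum the length of the selected features (here, all of them).
total_length = sum(street["LENGTH"] for street in streets)
print(total_length)  # 9.0
```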
Raster-based systems allow you to generate distance measurements in all directions away from a selected pixel or group of pixels. These distances are placed in a new layer where each cell’s value represents the distance from that cell to the nearest selected pixel. These “distance” layers are often used for spread functions (see Spread Functions below).
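A distance layer of this kind can be illustrated with a brute-force sketch: for every cell, compute the straight-line distance to each selected cell and keep the smallest. This assumes square cells and measures between cell centers; production systems generally use faster algorithms, so the code below is only meant to show what the output layer contains. The function name and cell size are assumptions.

```python
import math


def distance_layer(selected, cell_size=1.0):
    """Return a grid of distances from each cell to the nearest selected cell.

    selected is a 2D list of booleans (True marks a selected pixel).
    Distances are measured between cell centers, in map units.
    """
    sources = [
        (r, c)
        for r, row in enumerate(selected)
        for c, is_selected in enumerate(row)
        if is_selected
    ]
    if not sources:
        raise ValueError("at least one cell must be selected")
    return [
        [
            cell_size * min(math.hypot(r - sr, c - sc) for sr, sc in sources)
            for c in range(len(row))
        ]
        for r, row in enumerate(selected)
    ]


selection = [
    [True, False, False],
    [False, False, False],
]
distances = distance_layer(selection, cell_size=10.0)
print(distances[0])  # [0.0, 10.0, 20.0]
```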
Measuring Area/Perimeter
Many vector systems automatically generate area and perimeter measurements for polygon features and store these values in prescribed attribute fields. Those systems that do not have this automatic function do provide a way for you to generate area and perimeter and store the results in user-defined fields. See Figure 5.5 for an example. Once calculated and stored, you can select multiple polygon features and sum their area and perimeter (see Calculating Descriptive Statistics below).

Figure 5.5: Area and perimeter contained as attributes in the layer’s data file.
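For vector polygons, area and perimeter can be derived directly from the vertex coordinates. The sketch below uses the shoelace formula for area, assuming a simple (non-self-intersecting) polygon without holes in planar coordinates; the parcel records and the AREA and PERIMETER field names are hypothetical stand-ins for the fields shown in Figure 5.5.

```python
import math


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def polygon_perimeter(vertices):
    """Sum of edge lengths, including the closing edge back to the first vertex."""
    n = len(vertices)
    return sum(
        math.hypot(
            vertices[(i + 1) % n][0] - vertices[i][0],
            vertices[(i + 1) % n][1] - vertices[i][1],
        )
        for i in range(n)
    )


# Hypothetical polygon layer.
parcels = [
    {"ID": 1, "GEOM": [(0, 0), (4, 0), (4, 3), (0, 3)]},
    {"ID": 2, "GEOM": [(0, 0), (2, 0), (2, 2), (0, 2)]},
]
for parcel in parcels:
    parcel["AREA"] = polygon_area(parcel["GEOM"])
    parcel["PERIMETER"] = polygon_perimeter(parcel["GEOM"])

total_area = sum(p["AREA"] for p in parcels)
print(total_area)  # 16.0
```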
Calculating area and perimeter is done differently in raster systems. Instead of measuring and storing each polygon’s area and perimeter, raster systems already know the size (the area covered) of a single pixel. To calculate area, the system simply adds up the number of pixels with a specified attribute and multiplies the count by the area contained in a single pixel. It is easy math. For example, your layer might have 100 polygons that each possess one of twelve land cover categories. The routine finds each occurrence of the twelve categories (even if they are not contiguous) and sums each category’s area and perimeter. Perimeter is usually equally easy if the pixels are square, and in the vast majority of cases they are. These measurements are provided either in standard tables or in new layers where each pixel holds the summed area or perimeter of the category to which it originally belonged.
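The pixel-counting arithmetic can be sketched as follows. Area is the pixel count per category times the area of one square cell. For perimeter, this sketch counts every cell edge that borders a different category or the edge of the grid; that convention, like the grid values, cell size, and function names, is an assumption made for illustration.

```python
from collections import Counter


def area_by_category(grid, cell_size):
    """Area per category: pixel count multiplied by the area of one square cell."""
    counts = Counter(value for row in grid for value in row)
    cell_area = cell_size * cell_size
    return {category: n * cell_area for category, n in counts.items()}


def perimeter_by_category(grid, cell_size):
    """Perimeter per category: total length of cell edges that border
    a different category or the outside of the grid."""
    rows, cols = len(grid), len(grid[0])
    perimeter = Counter()
    for r in range(rows):
        for c in range(cols):
            category = grid[r][c]
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or grid[nr][nc] != category:
                    perimeter[category] += cell_size
    return dict(perimeter)


# Hypothetical land cover raster with 30-unit square cells.
land_cover = [
    ["forest", "forest", "water"],
    ["urban", "forest", "water"],
]
print(area_by_category(land_cover, 30))
print(perimeter_by_category(land_cover, 30))
```

With 30-unit cells, each pixel covers 900 square units, so the three forest pixels sum to 2,700 square units.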
Calculating Descriptive Statistics
Descriptive statistics summarize attribute data. They reduce the complexities of numerous individual values into a few meaningful numbers that describe the individual features collectively. Descriptive statistics are organized into two groups: measures of central tendency and measures of dispersion.
Central tendency describes the center of the attribute data’s distribution. The mean, median, and mode are its three common measures, but which measure you should use depends largely on the attribute’s level of measurement (described in Chapter 2). Figure 5.6 depicts the three central tendency measures for the attribute values of a single field.
- The most used measure is the mean (commonly referred to as average), which is calculated by adding together each feature’s attribute value and dividing the sum by the number of features. For example, if you wanted to characterize the age of the people reading this e-text, you would sum the age of each reader and divide it by the number of readers. The result is the mean. It—like all measures of central tendency—is a surrogate used to describe all the values within a single attribute field. This measure requires interval or ratio data.
- If we placed the attribute values in ascending or descending order, the median is the middle score in the distribution (this works for an odd number of cases). In other words, half of the attribute values are above and half are below this value. In an even numbered distribution, the median is the average of the two middle scores. Median is used for ordinal and derived (aggregated) data.
- Mode is the most frequent score in a distribution. Of course, some distributions do not have a mode if there are no repeated values. At times, the only repeated value might be at the low or high end of the distribution, making this measure a bit unreliable and certainly un-central. The measure, however, is helpful in describing leading categories (for instance the different political parties). It is the only measure for describing the central tendency of nominal data.

Figure 5.6: Central tendency measures. These attribute values are obtained from a vector layer’s attribute field or from selected pixels in a raster layer.
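The three central tendency measures can be computed with Python's standard statistics module, as in the sketch below. The ages and land cover labels are made-up values chosen to keep the arithmetic easy to follow; they are not the values plotted in Figure 5.6.

```python
import statistics

# Hypothetical ratio-level attribute values (reader ages).
ages = [21, 25, 25, 30, 34, 41, 62]

mean_age = statistics.mean(ages)      # sum of values divided by their count
median_age = statistics.median(ages)  # middle value of the sorted list
mode_age = statistics.mode(ages)      # most frequent value

print(mean_age, median_age, mode_age)  # 34 30 25

# With an even number of values, the median is the average of the two middle ones.
even_median = statistics.median([10, 20, 30, 40])  # 25.0

# Mode is the only one of the three that applies to nominal data.
dominant_cover = statistics.mode(["forest", "urban", "forest", "water"])
print(dominant_cover)  # forest
```

Note how the single high value (62) pulls the mean above the median, which is one reason the choice of measure matters.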
Dispersion, the second group of descriptive statistics, looks at the attribute data’s spread. Its measures (including range, variance, and standard deviation) describe how much the attribute values vary around the distribution’s center (its central tendency measures). Are the values clustered tightly or are they spread out? These measures help you judge how well the central tendency characterizes all of the values in the attribute field. If the measure of dispersion is small, the values are clustered and the central tendency measure describes the distribution well. There are several types of measures of dispersion (also see Figure 5.7):
- Counts and frequencies are not measures of dispersion, but they are basic ways to summarize data. Counting simply denotes quantity. Frequency is the number of times an attribute field has a particular value. A frequency distribution, usually in the form of a histogram, describes the shape (or structure) of the attribute data by tabulating the frequencies of each value (or range of values).
- Range is the distance between the minimum and maximum attribute values. To derive it, simply subtract the minimum value from the maximum value. It is the simplest measure of dispersion, but it is vulnerable to outliers (rogue values that are significantly different from the rest of the attribute values). If you think outliers affect the range, use the interquartile range instead. It divides the distribution, arranged from low to high, into four parts, each containing 25 percent of the attribute values, and it is the difference between the 25th and 75th percentiles.
- The variance looks at the difference between the distribution’s values and its central tendency measure (in this case the mean). It is more complex than computing the average difference that each attribute value falls from the mean. Such a score does not provide enough numeric emphasis to the attribute values on the low and high end of the distribution. The variance adjusts for this by squaring the difference, summing the squares, and dividing by the count.
- Standard deviation is the square root of the variance. Like the variance, it describes the dispersion around the mean and allows you to evaluate how closely the numbers in the dataset are packed around the mean (in other words, how well the mean describes or summarizes the set of numbers). Similarly, the smaller the number, the more tightly the values are clustered around the central tendency measure. Unlike the variance, which is expressed in squared units, the standard deviation is expressed in the same units as the original data. Still, the two measures convey the same information about spread.

Figure 5.7: Measures of Dispersion.
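The dispersion measures listed above can be sketched in Python as follows. The variance function follows the description in the text (square the differences from the mean, sum them, and divide by the count), which corresponds to the population variance; the sample values are invented. The quartiles come from statistics.quantiles, whose default interpolation method is only one of several conventions, so interquartile ranges may differ slightly between software packages.

```python
import math
import statistics
from collections import Counter

# Hypothetical attribute values.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Frequency: how many times each value occurs.
frequencies = Counter(values)

# Range: maximum minus minimum.
value_range = max(values) - min(values)

# Interquartile range: 75th percentile minus 25th percentile.
q1, _median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1


def population_variance(data):
    """Square each difference from the mean, sum the squares, divide by the count."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)


variance = population_variance(values)
std_dev = math.sqrt(variance)  # standard deviation: square root of the variance

print(value_range, variance, std_dev)  # 7 4.0 2.0
```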
In vector systems, descriptive statistics are usually generated within the attribute file interface. In raster layers, menu commands process the descriptive statistics. Each attribute field can be summarized in its entirety or confined to selected records or pixels. Which descriptive statistics are calculated depends on the attribute data’s level of measurement (Figure 5.8).

Figure 5.8: Descriptive statistics that can be used with each level of measurement.


