4.5: Attribute Preprocessing And Editing
- Page ID
- 44915
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\( \newcommand{\dsum}{\displaystyle\sum\limits} \)
\( \newcommand{\dint}{\displaystyle\int\limits} \)
\( \newcommand{\dlim}{\displaystyle\lim\limits} \)
\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)
( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\id}{\mathrm{id}}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\kernel}{\mathrm{null}\,}\)
\( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\)
\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\)
\( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)
\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)
\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)
\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vectorC}[1]{\textbf{#1}} \)
\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)
\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)
\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\(\newcommand{\longvect}{\overrightarrow}\)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)When you add feature layers, containing both spatial and attribute data, to an active workspace, the attribute data file might not be immediately visible. Opening and editing the attribute files are easy processes, but they are specific to individual programs. Once the attribute table is open, you can enter data by typing attribute values directly into the data file or loading and joining external data files to it. Other processes like editing attributes, adding or deleting fields, deleting records, querying attributes (record selection), calculating fields, and geocoding are completed through the data file interface.
Adding and deleting fields
As described above, fields define feature attributes. Most GIS programs provide a way for you to add or delete fields from within your open data file. The GIS program will instruct you to define a new field. You will give it a name and select from options that determine the data format of the values that will be placed into the field. Deleting a field usually involves selecting the field and deleting it.
Deleting records
You can delete a single record or a group of records in a data file by first selecting them and then deleting them. Since records are the database representation of features, when you delete records in the attribute file, you are also permanently discarding their spatial representation. The entire feature, graphic and record, is deleted.
Generally, you can not add a record through the data file interface because it must also be represented spatially. See Chapter 3 for how to add a feature. Its record is automatically created when the graphic feature is added to the workspace.
Joining Data Files
Once a GIS layer is created, its attribute file can be linked (“joined”) to external data files. Joining is one of the most frequently performed data file processes because it brings together feature attributes that are contained in multiple digital data files. To perform a join, a unique matching field, the key identifier, must be observed in both data files. As stated in Chapter 3, the key identifier could be something like a social security number or an assessor parcel number. It is a field that gives the feature a unique identification. Once linked, the join can be temporary or made permanent.
The external files that you load into the GIS to perform a join are typically in file formats such as dBase, ASCII, Microsoft Excel, or Microsoft Access. The precise steps involved in joining together two files are software specific, but it usually involves:
- loading the external file that you wish to join to the GIS attribute file,
- selecting the external file and the GIS attribute file that you wish to join,
- selecting the field (containing the key identifier) in each file, and when joined,
- making sure that the join was successful.
In the example in Figure 4.6, the parcel layer exists, but it does not include assessed value. It does contain a field named APN (Assessor’s Parcel Number) whose values are unique to each record and which could be used to join other data files. A spreadsheet file, with assessed value, also exists, and it must be loaded into the GIS either in its native format (if accepted) or exported from the spreadsheet program to a format that the GIS can read. The spreadsheet has a field named APN_NUM, which, after a visual check, has the same values as those under APN in the parcel layer, and it can be used to perform the join.

Figure 4.6: Joining two attribute files together requires that the two files each have a common key identifier.
Once the spreadsheet file is loaded, you begin the joining process by specifying the two files (the layer’s table and the spreadsheet file) and the two field names that the join will be made on. APN and APN_NUM are the key identifiers of these two files (see Figure 4.7), and even though the field names are not identical, the GIS will be able to join these two files together provided that the values under the two field names match.

Figure 4.7: Matching key identifiers.
If the match is successful, your two files will be joined together into a single file (see Figure 4.8).

Figure 4.8: A joined file with accessed values a one of the attributes.
Perhaps the most time consuming tasks are the first and fourth steps. Loading an external data file should be easy —and frequently it is—but sometimes the imported data file may be misformatted or unreadable. If it is, return to the host program (your spreadsheet or DBMS programs) and save it in a different format. The probability of your GIS program being able to read the external data file usually improves as you go from more sophisticated file formats (like Excel and Access) to dBase to ASCII (basic formats). Many data files are coded in ASCII because of its almost universal compatibility with computers and software programs, but it does have its complications—it comes in several forms. Below are four of the most used variants of ASCII based on what delimits the file’s fields.
Whitespace delimited ASCII files differentiates fields by the use of one or more spaces. Since spaces separate fields, fields that have no value must be represented by a non-blank code and character attributes cannot contain spaces between words (underscores can be used to separate words). You can open ASCII files in any word processer or text editor. A whitespace-delimited ASCII data file with five records might look something like the following:
M1 Betsy_Burns Yes 38.5 0.85
P1 Dan_Arreola No 45.7 0.99
M2 Frank_Aldrich Yes 32.8 0.55
P2 Fritz_Steiner No – –
P3 Ruth_Yabes No 37.72 –
Spacequote delimited ASCII is a variant of whitespace delimitation, but the attributes containing multiple words are enclosed in double quotes, and consequently, they can contain embedded spaces between words. The spacequote delimited ASCII file may look like the following in a text editor:
M1 “Betsy Burns” Yes 38.5 0.85
P1″Dan Arreola” No 45.7 0.99
M2 “Frank Aldrich” Yes 32.8 0.55
P2 “Fritz Steiner” No – –
P3 “Ruth Yabes” No 37.72 –
Tab delimited files separate fields by the use of a single tab. Two tabs in a row signify a blank field. Values within an attribute field cannot contain embedded tabs. A tab delimited ASCII file would look like the following in a text editor.
M1 Betsy Burns Yes 38.5 0.85
P1 Dan Arreola No 45.7 0.99
M2 Frank Aldrich Yes 32.8 0.55
P2 Fritz Steiner No
P3 Ruth Yabes No 37.72
Comma delimited, also known as comma-quote delimited and CSV, separate fields by commas. Character fields may be enclosed in double quotes, and need to be if they contain an embedded comma. Two commas in a row signify that the field is blank. Usually whitespace is not allowed before or after fields (although this may be tolerated in the CSV form). The comma-delimited ASCII file might look like the following in a text editor:
M1,”Betsy Burns”,Yes,38.5,0.85
P1,”Dan Arreola”,No,45.7,0.99
M2,”Frank Aldrich”,Yes,32.8,0.55
P2,”Fritz Steiner”,No,,
P3,”Ruth Yabes”,No,37.72,
Sorting records
Sorting temporarily rearranges your data file records, so you can view, select, update, or print them in the new sorted sequence. Although the specifics vary by program, you generally choose the field (or fields) you want to sort by. The first sort field arranges, usually in ascending or descending order, the records based on the field’s contents. For example, a class roster might be sorted alphabetically by last name. Some systems allow you to choose a second sort field (or more), which arranges records (in ascending or descending order) when two or more records have the same first field value. In the example above, if your alphabetical list has four students with the last name Smith, those four records could be rearranged in alphabetical order based on their first name.
Record selection/Attribute Query (Boolean Selection)
Selecting specific records is one of the most common database functions. Often called attribute query, it consists of highlighting a subset of the records based on a specific criteria. In other words, you create an expression—a formula—that queries all the records in the data file and the GIS highlights—both in the data file and on the map display—only those features that fit the criteria.
Most GIS programs use a Standard Query Language (SQL) interface to conduct attribute queries. If one is using an external relational DBMS program (like Access or Oracle), SQL makes the call to the external database and isolates only the necessary records that you will use. SQL uses set algebra, Boolean algebra, and arithmetic operators (=, -, *, /) for attribute queries. Set Algebra includes the use of less than (<), greater than (>), equal to (=), and not equal to (<>) operations. You can create an expression like that found below (see Figure 4.9) to isolate only those records that fit your criteria. You can extend or constrain the selected features by using Boolean algebra, which uses the conditions OR (extend), AND (constrain), and NOT to further select or isolate records. Each record is queried and added to the set if it meets the criteria.

Figure 4.9: Select records based on their attributes by using SQL expressions.
Once the records are selected, you can work with just those records. This is helpful for viewing, sorting, editing, calculating fields, generating statistics, using the selected features to select features in another GIS layer, creating a new layer with only the selected features, and isolating specific records to perform analysis functions on (like buffering selected features).
In addition, spatial queries, selecting features based on their geographic location (see Chapter 5), can be combined with attribute queries for more sophisticated queries. There is more on attribute and spatial queries in Chapter 5.
Calculate Attributes
Within an open data file, you can create new attributes by using values in existing fields, mathematical expressions, and text functions (see Figure 4.10). Mathematical operations allow you to add, subtract, multiply, and divide existing fields or values to create new, derived attributes. Text functions allow you to populate fields with data, copy values from one field to another, concatenate fields (and or values), truncate attributes, and convert text to different formats. Before calculating the new field, however, you need to create a new attribute field, which includes defining its field name and its data properties). Calculations can be performed on a single record, several selected records, or on every record in the data file. The calculate function can also be used to copy data from one field to another.

Figure 4.10: Calculating fields. In this example, population density is calculated by dividing population by area. First, the field must be added. Then, you calculate the results directly into the new field.
Geocoding
There is a way to create geographic data directly from attribute data. The process, called geocoding, assigns geographic locations to features directly from attribute fields that contain locational information within a data file. This is a popular way to create GIS feature layers; you create or obtain a spreadsheet or data file with location information, open the attribute table in your GIS, and direct the system toward the appropriate attribute fields. There are two types of geocoding: coordinate locations and address matching.
Spatial features can be created from data files containing fields with x,y coordinate values. The coordinates need to be separated into two separate fields: one for the x coordinate and one for the y coordinate. The process is straightforward; you direct the GIS to the data file’s appropriate x,y fields, and it creates a spatial layer of point features from the coordinates. One possible complication is that the data file’s coordinates are different than the coordinate system you are using. This requires that you open the file in a temporary workspace registered to the data file’s coordinate system and then convert the new spatial layer to the desired coordinate system.
Address matching is another type of geocoding. It matches records in two data files—one containing a list of addresses and the other having street network attributes—to create a new layer (see Figure 4.11). In other words, it creates a layer of point features alongside street segments when addresses in the two data files match. It essentially looks up the address in the first record of the external data file and tries to find a match along the street network layer. If multiple possibilities exist, the routine will present them for user input. After the first record is matched or not, it moves to the second record and tries again. The resultant file is assigned the street network’s coordinate system.

Figure 4.11: Address matching. The addresses in an external data file (A) are compared to a street network's (B) attribute fields, and if a match is made, the record in the external data file gets a point on the map (C).
Both the street network layer and the external data file need address data (street name, street type, and an address range for start and end of each line segment), and perhaps even more information like city, state, and Zip code attributes to make your address information unique (multiple cities will likely contain streets with the same name). The process works well if the addresses in both the external data file and the street network layer are accurate and complete, but address matching is a time consuming process.
Data Export
Exporting your GIS layers, including their geographic and attribute data files, are covered in Chapter 6. Most GIS programs can export your layer’s attribute file in a number of formats including dBase and ASCII. The exported files can then be used in database, statistic, and spreadsheet programs for additional analysis.


