2.2: Phase 1 - GIS Database Design
What is your goal or your research question? How should you proceed? You need to define your objective at the very beginning. Having a well-defined research question, goal, or even multiple goals is the key to a successful GIS project because it guides the project’s input, analysis, and output stages. Spend time and thought on the design of your GIS because good planning results in successful projects.
Start by thinking about the people, land, and the issues in your study. This has a direct bearing on what datasets (features and attributes) are needed. Next, think about how you will analyze the data. This could affect your choice of GIS software and your data model (vector or raster). Even envision how you would like to present your results. In other words, think through the entire project. If you are working for someone else, consider your employer’s purpose and understand how the company, agency, or organization functions.
It seems like everyone wants to use GIS for their projects today. It is both trendy and powerful, but you should ask yourself whether GIS is the appropriate tool. It may not be. To determine if GIS can help achieve your objectives, think again about the relevant features (land, people, and issues) of your study and whether these can be displayed geographically. Almost all features have geographic locations, but you must further decide if the feature’s “geography” is important. One might be interested in population and economic statistics like income, population density, age, and ethnicity, but are you interested in how these variables differ across neighborhoods, across cities and towns, or across space in general? If location does not play a role in your study, then the data are “aspatial,” and you should close this e-text.
If you can conceptualize your variables as features on a map or depicted within geographic boundaries like census tracts, ZIP Codes, or agricultural fields, GIS can aid your project. As an employee or a researcher, you may already have your goal or research topic chosen for you by your superiors. Your goal or research question might be something like “I want to know the best places to plant sorghum in Mauritania when the new dam goes in.” Alternatively, if you are in a large planning department, you might have multiple goals, and then you must try to satisfy all the tasks, which might include updating and producing high-quality maps, generating lists of residents who need to be contacted when a zoning amendment or a liquor license is requested, or determining which roads need resurfacing. It all starts with your goals. If your goals are fuzzy in your own mind, stop and sharpen them. Nothing in the design phase is more important.
Although all components of a GIS project need to be planned, including what software and hardware you will use and what procedures and people will guide your operation, this chapter focuses on data.
In this first phase of the input process, you determine what features and attributes are needed and how they should be coded. This starts with identifying each type of feature and its related attributes, but it goes beyond identification to include several important planning decisions. Figure 2.2 features a list of planning tasks. A discussion of each item follows.

Figure 2.2: Key questions to ask yourself in planning a GIS database.
1. Determine Your Features
What features are necessary? Think back to your project’s goals. If you want to analyze a particular plant species’ distribution, it may be necessary to have a feature devoted to the specific plant type. Equally important, however, are the other features—nearby plant species, soil types, climate conditions, land tenure practices, and landform conditions like slope and aspect. These other features, along with many others, play a role in the distribution of your plant. If you are developing a GIS database for a city’s planning department, you will want layers for many features including streets, parcels, parks, water, sewer, electricity, and buildings. In this case, the features are obvious.
The question of which features to use may seem simple, but it is frequently complicated. For example, you might want to interview hundreds of people in Modesto, California about their family income and quality of life. Is it appropriate to construct a layer with a point location at the home of each respondent, or would it be better to aggregate the responses of the individuals into neighborhood or census tract boundaries? In the first scenario, your feature layer might be called “respondents”, and each point feature would be located at the home of an individual respondent. Attributes would be stored within each individual feature. In the second scenario, you would have a feature called “census tracts” and the aggregated responses of all the individual respondents would be contained within the appropriate census tract’s data table. In other words, in the second scenario you would not have a respondent layer; you would have a census tract layer with aggregated responses. While the appropriateness question is hypothetical, you can see the ethical issue. You may not want to produce a map, or provide data to others, that shows the exact locations of your respondents if it includes sensitive or personal data.
Yet as a general rule, it is better to input data in its most precise and detailed form. In the income and quality of life example, you could create the more precise respondent layer and later aggregate (see Chapter 5) the individual respondents to census tracts for output purposes. The public will not see the detailed data. The benefits of having precise data come in verifying your census tract results and in changing the resolution of your study. For example, what if, later during the analysis portion of your project, you discover that the census tract boundaries are too coarse? It would be difficult, and perhaps impossible, to disaggregate the aggregated data into smaller, more detailed boundaries. If you have the respondent layer, however, you can quickly re-aggregate the data to a different and finer boundary feature like census blocks or a neighborhood unit.
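The aggregation argument can be illustrated with a minimal Python sketch. The respondent records, tract and block identifiers, and the aggregate helper are all hypothetical; in a real GIS the tract or block for each point would normally come from a spatial overlay rather than a stored key.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical respondent records kept at their most detailed level.
# The "tract" and "block" keys stand in for the result of a spatial overlay.
respondents = [
    {"id": 1, "tract": "T1", "block": "T1-B1", "income": 42000},
    {"id": 2, "tract": "T1", "block": "T1-B2", "income": 58000},
    {"id": 3, "tract": "T1", "block": "T1-B2", "income": 62000},
    {"id": 4, "tract": "T2", "block": "T2-B1", "income": 35000},
]


def aggregate(records, key, field):
    """Group records by a boundary key and return the mean of a field per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec[field])
    return {name: mean(values) for name, values in groups.items()}


# Aggregate to census tracts for output ...
by_tract = aggregate(respondents, "tract", "income")
# ... and later re-aggregate the same detailed records to finer blocks.
by_block = aggregate(respondents, "block", "income")

print(by_tract)
print(by_block)
```

Because the detailed records are retained, switching from tracts to blocks is a one-line change. Had only the tract means been stored, the block-level values could not be recovered.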
2. Determine the Project’s Spatial Extent, Scale, and Temporal Extent
You must determine the area and the period on which your project focuses. Sometimes it is obvious. If you are working for a county planning department, chances are your extent is the county boundary. Cities, however, need to go beyond their boundaries to sphere of influence borders that may lie beyond the city limits because these regions have a direct impact on the city and may someday become part of it. Other boundaries, especially those in research projects, are more difficult to establish.
Along with the project’s spatial extent, you should think about an appropriate scale. There is a relationship between scale and detail (see Figure 2.3). Small-scale maps depict large territories, but they usually are less precise and may require that some reference layers be left out. Large-scale maps show smaller areas but comparatively include more detail. Although GIS allows one to zoom in at increasingly larger scales, data captured at a “small scale” become inherently inaccurate when zoomed in on. The desired scale affects both the amount and precision of the data to be collected and the scale at which the geography can be cartographically represented. There is more on this topic later in this chapter under Precision Errors.

Figure 2.3: Map scale. Small-scale maps depict larger territories, but large-scale maps present more detail.
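The link between scale and precision can be made concrete with a little arithmetic. The sketch below assumes scale is expressed as a representative fraction (for example 1:24,000). The two scales and the 0.5 mm drafting error are illustrative values, not figures from this chapter.

```python
def ground_distance_m(map_distance_cm, scale_denominator):
    """Convert a distance measured on the map (cm) to a ground distance (m)."""
    return map_distance_cm * scale_denominator / 100.0


# One centimeter on a larger-scale map versus a smaller-scale map.
large_scale = ground_distance_m(1.0, 24_000)      # 1:24,000
small_scale = ground_distance_m(1.0, 1_000_000)   # 1:1,000,000

# The same 0.5 mm (0.05 cm) positional error on the map, expressed on the ground.
error_large = ground_distance_m(0.05, 24_000)
error_small = ground_distance_m(0.05, 1_000_000)

print(large_scale, small_scale)   # ground meters per map centimeter
print(error_large, error_small)   # ground meters per 0.5 mm of map error
```

Under these assumptions, the same half-millimeter slip corresponds to roughly 12 m on the ground at 1:24,000 but roughly 500 m at 1:1,000,000, which is why zooming in on data captured at a small scale does not make it more accurate.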
Similarly, you may want to define a temporal extent. Is time an important variable in your study? Most GIS projects focus on the contemporary scene and ignore the past. With contemporary databases, the assumption is that the GIS database is up-to-date. That may be a major assumption since some of the layers might be quite dated. If, however, you want to determine how much an area has changed, you need to define a period for your project. Does your project focus on neighborhood change or the shrinking of Central Asia’s Aral Sea? Then you need to establish a temporal extent. Determining the temporal period helps you determine your project’s needed attributes. It also makes you aware that you may have to look for features that are no longer present on the landscape.
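One way to picture a temporal extent is as a filter on when each feature existed. The following sketch is only illustrative: the feature names, the built and removed attributes, and the study period are all hypothetical.

```python
# Hypothetical features with the years they appeared and disappeared.
# removed=None means the feature is still present on the landscape.
features = [
    {"name": "Mill", "built": 1920, "removed": 1975},
    {"name": "School", "built": 1960, "removed": None},
    {"name": "Mall", "built": 2005, "removed": None},
]


def existed_during(feature, start, end):
    """Return True if the feature existed at any time between start and end."""
    removed = feature["removed"]
    last_year = removed if removed is not None else float("inf")
    return feature["built"] <= end and last_year >= start


# Features relevant to a 1950-2000 study period, including one now gone.
in_period = [f["name"] for f in features if existed_during(f, 1950, 2000)]
# Features that a purely contemporary database would contain.
present_day = [f["name"] for f in features if f["removed"] is None]

print(in_period)
print(present_day)
```

The mill appears in the study-period list but not in the present-day list, which is the kind of feature a contemporary database would miss.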
3. Determine the Attributes for Each Feature Type
As described in Chapter 1, attributes are the characteristics of features. You need to identify the required attributes for each feature type. The more you can do this before you collect your data, the less you will retrace your steps and collect additional attributes later. Again, look to the project’s goals for some clues to the necessary attributes. Also consider how you will analyze the features. You cannot use some analytical processes (like many statistical tests) if the attribute values you collect are not in a form that the process can use (see levels of measurement below).
One other thing to consider at this point is that some attributes (like a polygon’s area, a line’s length, and even the number of point features falling within polygon features) can be generated automatically by the software. Additional attributes can be created by multiplying, dividing, adding, subtracting, truncating, or concatenating attributes with other attributes, numbers, or characters.
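As a rough sketch of how such derived attributes arise, the Python below computes a polygon's area with the shoelace formula, a line's length by summing segment lengths, and a label by concatenation. The coordinates and names are hypothetical, the functions assume planar coordinates in meters, and GIS packages may use different internal methods.

```python
import math


def polygon_area(vertices):
    """Planar area of a simple polygon using the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def line_length(points):
    """Total length of a line as the sum of its straight segments."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Hypothetical features in planar coordinates (meters).
parcel = [(0, 0), (40, 0), (40, 30), (0, 30)]
street = [(0, 0), (30, 40), (30, 100)]

area = polygon_area(parcel)      # derived from geometry
length = line_length(street)     # derived from geometry
label = "Elm" + " " + "St"       # derived by concatenating two attributes

print(area, length, label)
```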
4. Determine How the Features and Their Attributes Should Be Coded
Once you have decided on the features and their attributes, determine how they will be coded in the GIS database. Remember from Chapter 1, there is not just one way to code features. Although roads are usually coded as lines, they do not have to be.
Decide whether to code each feature type as a point, line, or polygon. Then define the format and storage requirements for each of the feature’s attributes. For instance, is the attribute going to be stored as characters (string) or as a number? If it is a number, is it a byte, an integer, or a real number? You will have to establish these database parameters before you enter data into the GIS. Look at the example below (Figure 2.4). Listed are some attributes (under Field Name) relating to the feature “streets”. Notice that street “LENGTH” has a data type called double (a type of real number), and in this case, the database will store up to 18 digits, including 5 decimal places, for the length of each individual street.

Figure 2.4: Each feature’s attribute needs to be coded.
It is critical that you think about the values of your attributes before you code. Obviously, if one street segment needs room for 9 digits to report its length, then a field of 8 digits is not enough, and the correct value could not be entered without modifying the field’s length.
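A field-length check of this kind can be sketched in a few lines of Python. The fits_field helper and its precision and scale parameters are hypothetical, and the exact rules for counting digits, signs, and decimal points differ between database and file formats, so treat this only as an illustration of the idea.

```python
from decimal import Decimal


def fits_field(value, precision, scale):
    """Return True if value fits a numeric field holding `precision` total
    digits, of which `scale` are reserved for decimal places."""
    _sign, digits, exponent = Decimal(value).as_tuple()
    decimal_places = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    return decimal_places <= scale and integer_digits <= precision - scale


# A length with 5 decimal places fits a field of 18 digits with 5 decimals.
ok_length = fits_field("1234.56789", 18, 5)
# A 9-digit value does not fit a field that only has room for 8 digits.
too_long = fits_field("123456789", 8, 0)
# An 8-digit value does fit.
just_fits = fits_field("12345678", 8, 0)

print(ok_length, too_long, just_fits)
```

Running a check like this over sample values before creating the field is cheaper than altering the field definition after data entry has begun.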
Also, while thinking about your attribute values, consider where each attribute fits on the “levels of measurement” scale with its four levels: nominal, ordinal, interval, and ratio. Stanley S. Stevens, an American psychologist, developed these categories in 1946. Although Stevens’s classification is widely used, it is not universally accepted. Some researchers have problems with the categories, and others with how categorization affects research. For our purposes, it is a useful way to conceptualize how data values differ, and it is an important reminder that only some types of variables can be used for certain mathematical operations and statistical tests, including many GIS functions. The different “levels” are depicted in Figure 2.5 and demonstrated using an example of a marathon race:

Figure 2.5: Levels of Measurement.
- Nominal data use characters or numbers to establish identity or categories within a series. In a marathon race, the numbers pinned to the runners’ jerseys are nominal numbers (first column in Figure 2.5). They identify runners, but the numbers do not indicate the order or even a predicted race outcome. Besides race numbers, telephone numbers are a good example. Each signifies the unique identity of a telephone. The phone number 961-8224 is not more than 961-8049. Place names (and those of people) are nominal too. You may prefer the sound of one name, but names serve only to distinguish one thing from another. Nominal characters and numbers do not suggest a rank order or relative value; they identify and categorize. Nominal data are usually coded as character (string) data in a GIS database.
- Ordinal datasets establish rank order. In the race, the order in which the runners finished (i.e., 1st, 2nd, and 3rd place) is measured on an ordinal scale (second column in Figure 2.5). While order is known, how much better one runner is than the other is not. The ranks ‘high’, ‘medium’, and ‘low’ are also ordinal. So while we know the rank order, we do not know the interval. Usually both numeric and character ordinal data are coded with characters because ordinal data cannot be added, subtracted, multiplied, or divided in a meaningful way. The middle value, the “median”, in a string of ordinal values, however, is a good substitute for a mean (average) value.
- The interval scale, like the ratio scale discussed next, pertains only to numbers; character data are not used. With interval data the difference (the “interval”) between numbers is meaningful. Interval data, unlike ratio data, however, do not have a starting point at a true zero. Thus, while interval numbers can be added and subtracted, division and multiplication do not make mathematical sense. In the marathon race, the time of the day each runner finished is measured on an interval scale. If the runners finished at 10:10 a.m., 10:20 a.m. and 10:25 a.m., then the first runner finished 10 minutes before the second runner and the difference between the first two runners is twice that of the difference between the second and third place runners (see the third column in Figure 2.5). The runner finishing at 10:10 a.m., however, did not finish twice as fast as the runner finishing at 20:20 (8:20 p.m.) did. A good non-race example is temperature. It makes sense to say that 20° C is 10° warmer than 10° C. Celsius temperatures (like Fahrenheit) are measured as interval data, but 20° C is not twice as warm as 10° C because 0° C is not the lack of temperature; it is an arbitrary point that marks where water freezes. Returning to phone numbers, it does not make sense to say that 968-0244 is 62195 more than 961-8049, so they are not interval values.
- Ratio is similar to interval. The difference is that ratio values have an absolute or natural zero point. In our race, the first place runner finished in a time of 2 hours and 30 minutes, the second place runner in a time of 2 hours and 40 minutes, and the 450th place runner took 12 hours and 40 minutes (see the fourth column in Figure 2.5). The 450th place finisher took over five times as long as the first place runner did (12.667 hrs / 2.5 hrs = 5.0668). With ratio data, it makes sense to say that a 100 lb woman weighs half as much as a 200 lb man, so weight in pounds is ratio. The zero point of weight is absolute. Addition, subtraction, multiplication, and division of ratio values make statistical sense.
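The practical consequence of the four levels is that each permits a different set of operations. The short Python sketch below illustrates this with made-up values in the spirit of the marathon and temperature examples; the variable names and sample data are assumptions chosen for illustration.

```python
from statistics import median

# Nominal: values can only be compared for equality (same or different).
bib_a, bib_b = "961", "1024"
same_runner = bib_a == bib_b

# Ordinal: values can be ranked, so a median is meaningful but a mean is not.
ranks = ["low", "medium", "high"]
observations = ["low", "high", "medium", "medium", "high"]
position = {name: i for i, name in enumerate(ranks)}
median_rank = ranks[int(median(position[o] for o in observations))]

# Interval: differences are meaningful, ratios of the raw values are not.
# Finish times of day stored as minutes after midnight (10:10, 10:20, 10:25).
finish = [610, 620, 625]
gap_first_second = finish[1] - finish[0]
gap_second_third = finish[2] - finish[1]

# Celsius is interval, so 20 / 10 = 2 is misleading. On the Kelvin scale,
# which has a true zero, the ratio of the same two temperatures is near 1.
kelvin_ratio = (20 + 273.15) / (10 + 273.15)

# Ratio: a true zero makes ratios meaningful (elapsed time, weight).
weight_ratio = 100 / 200

print(same_runner, median_rank, gap_first_second, gap_second_third)
print(round(kelvin_ratio, 3), weight_ratio)
```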
5. Determine the Base Map Reference Features
What features are helpful to include? Add reference features that help people orient themselves within your study area even if you are not going to analyze these features. Major roads, rivers, and principal buildings are good examples of features that help orient map readers. These secondary features are often the easiest features to find on the Internet, and sometimes they are bundled with GIS software. In short, having these base-map features may not be important for analysis, but they are important for clarity.
6. Determine your Project’s Projection, Coordinate System, and Datum
Before you collect or look for data, you should decide on which projection, coordinate system, and datum to use. These three terms, collectively termed “projection parameters”, are discussed in Chapter 3, but it is important that these parameters remain consistent throughout your layers. Consistency enables you to properly overlay your feature layers to produce maps and analyze feature relationships. Figure 2.6 shows an example of how a parcel layer is not properly overlaid upon street centerline and building layers due to small differences in the parcel layer’s projection parameters.

Figure 2.6: Due to differences in the projection parameters of the parcel layer, it is not lining up properly with the street centerline and buildings layers.
Deciding on the projection parameters upfront helps you select among nearly identical data files, if available, on the Internet because one file might already be in the desired projection, coordinate system, and datum. It is, however, rare to find everything the way you want it. Usually, you need to convert your layers, and GIS programs come with algorithms that convert between standard projections, coordinate systems, and datums. This topic is covered in Chapter 3.
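A simple bookkeeping step is to record each layer's projection parameters and compare them with the parameters chosen for the project. The sketch below only flags mismatches; it does not perform any conversion. The layer names and parameter values are hypothetical, and a real workflow would read them from each dataset's metadata.

```python
from collections import namedtuple

# The three "projection parameters" tracked for each layer.
Params = namedtuple("Params", ["projection", "coordinate_system", "datum"])

# Hypothetical parameters chosen for the project.
project_params = Params("Albers Equal Area", "Statewide Albers", "NAD83")

# Hypothetical layers as downloaded, before any conversion.
layers = {
    "streets": Params("Albers Equal Area", "Statewide Albers", "NAD83"),
    "buildings": Params("Albers Equal Area", "Statewide Albers", "NAD83"),
    "parcels": Params("Transverse Mercator", "UTM Zone 10N", "NAD27"),
}

# Any layer whose parameters differ must be converted before overlay.
needs_conversion = sorted(
    name for name, params in layers.items() if params != project_params
)

print(needs_conversion)
```

In this made-up inventory only the parcel layer is flagged, which mirrors the kind of misalignment described for Figure 2.6.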
Before choosing your projection parameters, do some investigating. If you are using mostly state GIS databases, it makes sense to use their parameters if they fit your needs. For instance, California has created its own Albers projection, and the vast majority of its GIS layers are in this projection. Local agencies, however, might use State Plane Coordinates or Universal Transverse Mercator. If substantial amounts of data are shared with a local jurisdiction, you may decide to use its projection parameters. You can change your projection parameters at any time in your project’s life, but it takes time and organization.
The decision of which projection parameters to use can also be based on many other considerations. The project’s scale, the desired projection properties (like the preservation of area, which is discussed in Chapter 3), the needed precision of positional accuracy, the attributes to be derived internally, and the project’s location are among the factors that need to be evaluated before selecting a projection. Know something about projection parameters before you choose them for your project. For instance, if you want to calculate area from the shape of your polygons, you need to use an equal area projection. If you do not use an equal area projection, the area measurements will be inaccurate.
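To get a feel for how large the area error can be, consider the Mercator projection, which is not equal area. Mercator is chosen here only as a familiar illustration; the text does not name a specific projection. On a spherical model of the Earth, Mercator inflates local areas by a factor of 1 / cos²(latitude), so the sketch below simply evaluates that factor at a few latitudes.

```python
import math


def mercator_area_inflation(latitude_deg):
    """Local area exaggeration of the spherical Mercator projection
    relative to true area on the globe: 1 / cos^2(latitude)."""
    return 1.0 / math.cos(math.radians(latitude_deg)) ** 2


# Inflation factor at the equator, at mid latitudes, and at 60 degrees.
at_equator = mercator_area_inflation(0)
at_37 = mercator_area_inflation(37)
at_60 = mercator_area_inflation(60)

print(round(at_equator, 2), round(at_37, 2), round(at_60, 2))
```

Under this assumption, a polygon measured on a Mercator map near 60 degrees latitude would appear to have about four times its true area, while an equal area projection keeps the factor at 1 everywhere.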
Before actively collecting and inputting data, consider the six items discussed above. Next, find out what GIS datasets already exist and, for those datasets that do not exist, the different ways to create the needed data. The next two sections of this chapter deal with these topics.
One last suggestion: when planning a GIS project, talk with GIS technicians and analysts at government agencies or researchers who have put together similar GIS projects. Their suggestions will save you time, and they could become great contacts for troubleshooting problems and providing you with needed GIS datasets.


