3.2: Data and Information

To understand how we get from analog to digital maps, let us begin with the building blocks and foundations of the geographic information system (GIS): data and information. Geographic information systems store, edit, process, and present data and information. But what exactly is data, and what exactly is information? Many people use the terms “data” and “information” interchangeably. For our purposes, it is helpful to distinguish between the two. Data refer to facts, measurements, characteristics, or traits of an object of interest. For you grammar sticklers out there, note that “data” is the plural form of “datum.” We can collect data about all kinds of things, such as the length of rainbow trout in a Colorado stream, the number of vegetarians in Alaska, the diameter of mahogany tree trunks in the Brazilian rainforest, student scores on the last GIS midterm, the altitude of mountain peaks in Nepal, the depth of snow in the Austrian Alps, or the number of people who use public transportation to get to work in London.

Once data are put into context, used to answer questions, situated within analytical frameworks, or used to obtain insights, they become information. Information refers to the knowledge of value obtained through collecting, interpreting, and analyzing data. Though a computer is not necessary to collect, record, manipulate, process, or visualize data, or to process them into information, information technology can significantly help. For instance, computers can automate repetitive tasks, store data efficiently in terms of space and cost, and provide tools for analyzing data, from spreadsheets to GIS. In addition, an incredible amount of data is collected daily by satellites, grocery store product scanners, traffic sensors, temperature gauges, smartphone apps, and countless other sources. Collecting data at this scale would not be possible without the aid and innovation of information technology.

Geographic or spatial data refer to geographic facts, measurements, or characteristics of an object that permit us to define its location on the earth’s surface. Such data include, but are not restricted to, the latitude and longitude coordinates of points of interest, street addresses, postal codes, political boundaries, and even the names of places of interest. It is also important to note the difference between geographic and attribute data. Geographic data define the location of an object of interest; attribute data are concerned with its nongeographic traits and characteristics.

“Spatial data is information about the locations and shapes of geographic features and the relationship between them, usually stored as coordinates and topology.” – Esri

To illustrate the distinction between geographic and attribute data, think about your home, where you grew up, or where you currently live. Within the context of this discussion, we can associate both geographic and attribute data with it. We can define the location of your home in many ways, such as with a street address, the street names of the nearest intersection, the zip code or Census block your home is located in, or latitude and longitude coordinates. What is essential is that geographic data permit us to define the location of an object (i.e., your home) on the surface of the earth.

In addition to the geographic data that define the location of your home, there are attribute data that describe the various qualities of your home. Such data could include the number of bedrooms and bathrooms in your home, whether your home has central air, the year your home was built, the number of occupants, or whether there is a swimming pool. These attribute data tell us a lot about your home but little about where it is.
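
One way to picture the split between geographic and attribute data is as two groups of fields in a single record. The Python sketch below is illustrative only: the class name `HomeRecord`, its field names, and every value in it are made up for this example and do not come from any GIS package.

```python
from dataclasses import dataclass


@dataclass
class HomeRecord:
    # Geographic data: where the home is (hypothetical values).
    latitude: float
    longitude: float
    postal_code: str
    # Attribute data: what the home is like (hypothetical values).
    bedrooms: int
    year_built: int
    has_pool: bool

    def location(self):
        """Return only the geographic part of the record."""
        return (self.latitude, self.longitude)

    def attributes(self):
        """Return only the nongeographic part of the record."""
        return {
            "bedrooms": self.bedrooms,
            "year_built": self.year_built,
            "has_pool": self.has_pool,
        }


home = HomeRecord(39.74, -104.99, "80202", 3, 1985, False)
print(home.location())    # tells us where the home is
print(home.attributes())  # tells us about the home, but not where it is
```

Keeping the two groups distinct in this way mirrors how a GIS treats them: the location places the record on a map, and the attributes describe what is found there.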

Recognizing how geographic and attribute data differ and complement each other is beneficial in general and vital when learning about and using GIS. Because a GIS requires and integrates these two distinct types of data, being able to differentiate between geographic and attribute data is the first step in organizing your GIS. Furthermore, determining which kinds of data you need will aid in your implementation and use of a GIS. In the age of information technology, the data and information discussed thus far are often the stuff of computer files, which are the focus of the next section.

Files and Formats

When we collect data about your home, rainforests, or anything else, we usually need to put them somewhere. Though we may scribble numbers and measures on the back of an envelope or write them down on a pad of paper, if we want to update, share, analyze, or map them in the future, it is often helpful to record them in digital form so a computer can read them. So, though we will not bother ourselves with the bits and bytes of computing, it is necessary to discuss some fundamental elements of computing that are both relevant and required when learning about and working with a GIS.

One of the most common elements of working with computers and computing is the file. Files in a computer can contain any number of things, from a complex set of instructions (e.g., a computer program) to a list of numbers and letters (e.g., an address book). Furthermore, computer files come in many sizes and types. One of the clues we can use to distinguish one file from another is the file extension. A file extension refers to the letters that follow the period (“.”) after the file’s name. The table below contains some of the most common file extensions and the types of files with which they are associated.

File name        File type
filename.txt     Simple text file
filename.doc     Microsoft Word document
filename.pdf     Adobe portable document format
filename.jpg     Compressed image file
filename.tif     Tagged image format
filename.html    Hypertext markup language (used to create websites)
filename.xml     Extensible markup language
filename.zip     Zipped/compressed archive
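
As a rough illustration of using the extension as a clue, the following Python sketch looks up a file's type from the letters after the final period. The function name `describe_file` and the sample file names are hypothetical, and the lookup covers only a few of the extensions listed in the table above.

```python
from pathlib import Path

# A few of the extensions from the table above, mapped to their descriptions.
FILE_TYPES = {
    ".txt": "Simple text file",
    ".pdf": "Adobe portable document format",
    ".jpg": "Compressed image file",
    ".zip": "Zipped/compressed archive",
}


def describe_file(filename):
    """Look up a file's type from its extension, ignoring upper/lower case."""
    extension = Path(filename).suffix.lower()
    return FILE_TYPES.get(extension, "Unknown file type")


print(describe_file("report.PDF"))
print(describe_file("notes.xyz"))
```

An extension is only a naming convention, so a lookup like this suggests what a file probably contains rather than guaranteeing it.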

Some computer programs may be able to read or work with only specific file types, while others are more adept at reading multiple file formats. As you begin to work more with information technology and GIS, you will realize that familiarity with different file types is essential. Learning how to convert or export one file type to another is also a valuable skill. In this regard, knowing how to identify different and unfamiliar file types will increase your proficiency with computers and GIS.

Of the numerous file types, one of the most common and widely accessed is the simple text file, also called a plain text file or just a text file. Simple text files can be read widely by word processing programs, spreadsheet and database programs, and web browsers. Often ending with the extension “.txt” (i.e., filename.txt), text files contain no special formatting (e.g., bold, italic, underlining) and contain only plain characters such as letters, numbers, and punctuation. In other words, text files are not well suited for images or sophisticated graphics. Text files, however, are ideal for recording, sharing, and exchanging data because most computers and operating systems can recognize and read simple text files with programs called text editors.

When a text file contains data that are organized or structured in some fashion, it is sometimes called a flat file (but the file extension remains the same, i.e., .txt). Flat files are organized in a tabular format or line by line. In other words, each line or row of the file contains one and only one record. So, if we collected height measurements on three people, Tim, Sarah, and Maria, the file might look something like this:

Name	Height
Tim	6’1″
Sarah	5’7″
Maria	5’5″

Each row corresponds to one and only one record, observation, or case. There are two other essential elements to know about this file. First, note that the first row does not contain any data; instead, it describes the data contained in each column. When the first row of a file contains such descriptors, it is referred to as a header row or just a header. Columns in a flat file are also called fields, variables, or attributes. For example, “Height” is the attribute, field, or variable that we are interested in, and the observations or cases in our data set are “Tim,” “Sarah,” and “Maria.” In short, rows are for records; columns are for fields.
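
The roles of the header row, the records, and the fields can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended way to read files: the variable names are made up, the heights are typed with a straight apostrophe and quotation mark in place of the typographic marks, and each line is simply split on whitespace.

```python
# The flat file from the text: a header row followed by three records.
flat_file = """Name Height
Tim 6'1"
Sarah 5'7"
Maria 5'5"
"""

lines = flat_file.splitlines()
header = lines[0].split()  # field names taken from the header row

# Pair each value with its field name, giving one dictionary per record.
records = [dict(zip(header, line.split())) for line in lines[1:]]

print(header)
for record in records:
    print(record["Name"], "is", record["Height"])
```

The header row is read once to name the fields, and every remaining row becomes exactly one record.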

The second unseen but critical element of the file is the spaces between each column or field. For example, a space separates the “name” column from the “height” column in the example. Upon closer inspection, however, note how the initial values of the “height” column are aligned. If a single space were used to separate each column, the height column would not be aligned. In this case, a tab is being used to separate the columns of each row. The delimiter or separator is the character used to separate columns within a flat file. Though any character can be used as a delimiter, the most common delimiters are the tab, the comma, and a single space. The following are examples of each.

Tab-Delimited
Name	Height
Tim	6’1″
Sarah	5’7″
Maria	5’5″

Single-Space-Delimited
Name Height
Tim 6’1″
Sarah 5’7″
Maria 5’5″

Comma-Delimited
Name, Height
Tim, 6’1″
Sarah, 5’7″
Maria, 5’5″

Knowing the delimiter of a flat file is essential because it enables us to distinguish and separate the columns efficiently and without error. Sometimes such files are referred to by their delimiters, such as a “comma-separated values” file or a “tab-delimited” file.
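
To see why the delimiter matters, the sketch below reads the same two records written with each of the three delimiters, using Python's standard `csv` module. The helper name `read_flat_file` and the sample strings are assumptions made for this illustration, and the heights again use straight quote characters.

```python
import csv
import io


def read_flat_file(text, delimiter):
    """Split each line of a flat file into fields using the given delimiter."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        quoting=csv.QUOTE_NONE,  # treat the inch mark (") as ordinary text
    )
    # Strip stray spaces, such as the one after each comma in "Tim, 6'1".
    return [[field.strip() for field in row] for row in reader]


tab_text = "Name\tHeight\nTim\t6'1\"\nSarah\t5'7\"\n"
space_text = "Name Height\nTim 6'1\"\nSarah 5'7\"\n"
comma_text = "Name, Height\nTim, 6'1\"\nSarah, 5'7\"\n"

rows_tab = read_flat_file(tab_text, "\t")
rows_space = read_flat_file(space_text, " ")
rows_comma = read_flat_file(comma_text, ",")

# All three files hold the same records once the right delimiter is used.
print(rows_tab == rows_space == rows_comma)
```

Reading a file with the wrong delimiter does not raise an error in this sketch; it simply returns each line as a single unsplit field, which is why the delimiter needs to be known in advance.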

The same general format is applied when recording and working with geographic data. Rows are reserved for records (in the case of geographic data, locations), and columns or fields are used for the attributes or variables associated with each location. For example, the following tab-delimited flat file contains data for three places (i.e., countries) and three attributes or characteristics of each country (i.e., population, language, continent), as noted by the header.

Country	Population	Language	Continent
France	65,000,000	French	Europe
Brazil	192,000,000	Portuguese	South America
Jordan	9,531,712	Arabic	Southwest Asia
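
A file like this one can be read into records and then queried. The Python sketch below is adapted from the example above and makes a few assumptions of its own: the variable names are hypothetical, the file contents are held in a string rather than read from disk, and the population figures are converted from text to whole numbers so that they can be compared.

```python
import csv
import io

# Tab-delimited contents adapted from the country example in the text.
country_file = (
    "Country\tPopulation\tLanguage\tContinent\n"
    "France\t65,000,000\tFrench\tEurope\n"
    "Brazil\t192,000,000\tPortuguese\tSouth America\n"
    "Jordan\t9,531,712\tArabic\tSouthwest Asia\n"
)

reader = csv.DictReader(io.StringIO(country_file), delimiter="\t")

places = []
for row in reader:
    # Population is stored as text with thousands separators; make it a number.
    row["Population"] = int(row["Population"].replace(",", ""))
    places.append(row)

# Find the record (location) with the largest population value.
largest = max(places, key=lambda place: place["Population"])
print(largest["Country"], largest["Population"])
```

The commas inside the population figures would be mistaken for column breaks if a comma were the delimiter, which is one practical reason a tab works well for this file.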

Files like those presented here are the building blocks of the various tables, charts, reports, graphs, and other visualizations that we see online, in print, and on television every day. They are also vital components of GIS maps and geographic representations. Rarely, if ever, will you work with one and only one file or file type. Often, especially when working with GIS, you will work with multiple files. Such a grouping of multiple files is called a database. Since the files within a database may be of varied sizes, shapes, and even formats, we need a system that allows us to work with, update, edit, integrate, share, and display the various data within the database. Such a system is referred to as a database management system (DBMS). Databases and DBMSs are crucial to GIS, and a later chapter is dedicated to them. A geodatabase is a collection of geographic data contained within a standard file system.
