Geosciences LibreTexts

3.4: Researching Data

  • Page ID
    20568

    Now that we have a basic understanding of data and information, where can we find such data and information? Though an Internet search will undoubtedly produce myriad sources and types of data, the hunt for relevant and valuable data is often a challenging and iterative process. Therefore, before hopping online and downloading the first thing that appears from a web search, it is helpful to frame our search for data with the following questions and considerations:

    What exactly is the purpose of the data?

    Given that the world is drowning in data, articulating why we need, or do not need, a given data set will streamline the search for valuable and relevant data. To this end, the more specific we can be about the purpose of the needed data, the more efficient our search will be. For example, if we are interested in understanding and studying economic growth, it is helpful to determine both the temporal and the geographic scales. In other words, in what periods (e.g., 1850–1900) and at what intervals (e.g., quarterly, annually) are we interested, and at what level of analysis (e.g., national, regional, state)? Frequently, data availability, or the lack of relevant data, will force us to change the purpose or scope of our original question. A clear purpose yields a more efficient search for data and enables us to quickly accept or discard the various data sets we come across.
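
    One way to make the purpose concrete is to write the requirements down and test each candidate data set against them. The following is a minimal sketch, assuming each data set's coverage can be summarized by a start year, an end year, an interval, and a level of analysis. The class names, fields, and data set names are hypothetical and do not come from any real catalog or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRequirement:
    """Hypothetical record of what a project needs from a data set."""
    start_year: int
    end_year: int
    interval: str  # e.g. "quarterly" or "annual"
    level: str     # e.g. "national", "regional", "state"


@dataclass(frozen=True)
class CandidateDataSet:
    """Hypothetical metadata describing a data set found during a search."""
    name: str
    start_year: int
    end_year: int
    interval: str
    level: str


def meets_requirement(req: DataRequirement, ds: CandidateDataSet) -> bool:
    """Accept a data set only if it covers the whole period at the needed interval and level."""
    covers_period = ds.start_year <= req.start_year and ds.end_year >= req.end_year
    return covers_period and ds.interval == req.interval and ds.level == req.level


# Hypothetical example: annual, state-level data for 1850-1900.
need = DataRequirement(1850, 1900, "annual", "state")
candidates = [
    CandidateDataSet("dataset_a", 1800, 1950, "annual", "state"),
    CandidateDataSet("dataset_b", 1870, 1950, "annual", "state"),    # starts too late
    CandidateDataSet("dataset_c", 1850, 1900, "annual", "national"),  # wrong level
]
accepted = [ds.name for ds in candidates if meets_requirement(need, ds)]
print(accepted)  # ['dataset_a']
```

    The sketch requires an exact match on interval and level for simplicity. In practice, finer-grained data (e.g., quarterly figures) can often be aggregated to a coarser interval, so a real check might be more lenient.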

    What data already exists and is available?

    Before searching for new data, it is always a good idea to take inventory of the data we already have. Such data may come from previous projects or analyses, or from colleagues and classmates, but the key point is that using data we already possess can save a lot of time and effort. Furthermore, identifying what we have helps us better understand what we still need. For instance, though we may already have census data (i.e., attribute data), we may need updated geographic data that contains the boundaries of US states or counties.
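
    Taking inventory can be as simple as listing what is already on disk. The sketch below assumes the existing data live under a single folder and groups files by extension using only the Python standard library; the function name `inventory` is hypothetical. Note that a shapefile is stored as several files (e.g., `.shp`, `.shx`, `.dbf`), so a single shapefile appears under several extensions.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import Path


def inventory(root: str) -> dict[str, list[str]]:
    """Hypothetical helper: group every file under `root` by lowercase extension.

    Returns a mapping such as {'.csv': [...], '.shp': [...]} whose values are
    paths relative to `root`. Files without an extension go under '(none)'.
    """
    by_ext: dict[str, list[str]] = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():  # skip directories
            by_ext[path.suffix.lower() or "(none)"].append(str(path.relative_to(root)))
    return dict(by_ext)
```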

    What are the costs associated with data acquisition?

    Data acquisition costs go beyond financial costs. Just as important as the monetary cost of data is the cost in time; time is money. The time and energy you spend finding, collecting, cleaning, and formatting data are time and energy taken away from data analysis. Therefore, depending on your deadlines, time constraints, and deliverables, it is critical to manage your time carefully when looking for data.

    What format does the data need to be in?

    Though many programs can read many data formats, some formats can be read only by certain programs, and some programs require specific formats. Understanding which data formats you can and cannot use will aid your search for data. For instance, one of the most common forms of geographic information system (GIS) data is the shapefile. Not all GIS programs can read or use shapefiles, so it may be necessary to convert data to or from a shapefile or another format. The more data formats we are familiar with, the better off we will be in our search for data, because we will understand what we can use and which format conversions, if any, will be needed.
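
    Format conversion is usually handled by GIS software or libraries; for example, GDAL's `ogr2ogr` command-line tool converts between many vector formats, including shapefiles. As a minimal, dependency-free sketch of what a conversion involves, the following turns a CSV table of point coordinates into GeoJSON. The column names `lon` and `lat`, the function name, and the sample table are hypothetical.

```python
import csv
import io
import json


def csv_points_to_geojson(csv_text: str, lon_field: str = "lon", lat_field: str = "lat") -> dict:
    """Convert CSV rows with longitude/latitude columns into a GeoJSON FeatureCollection.

    Every column other than the two coordinate columns is kept as a
    (string-valued) feature property.
    """
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lon = float(row.pop(lon_field))
        lat = float(row.pop(lat_field))
        features.append({
            "type": "Feature",
            # GeoJSON orders each position as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": row,
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample table.
sample = "name,lon,lat\nsite_1,-111.9,40.7\nsite_2,-112.0,40.8\n"
print(json.dumps(csv_points_to_geojson(sample), indent=2))
```

    GeoJSON lists each position as longitude first, then latitude, which is a frequent source of errors when converting from tables that store latitude first.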

    All these questions are of equal importance, and being able to answer them will make the search for data more efficient and effective. There are several other considerations in the search for data, particularly GIS data, but those listed here provide an initial pathway to a successful search.

    As information technology evolves and more data are collected and distributed, the forms of data that can be used with GIS continue to multiply. GIS uses and integrates two types of data: geographic data and attribute data. Sometimes the source of both is the same. For instance, the United States Census Bureau distributes geographic boundary files (e.g., at the census tract, county, and state levels) and the associated attribute data (e.g., population, race/ethnicity, income). What is more, such data are available at no charge. US census data are exceptional in that they are both free and comprehensive.
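
    Boundary files and attribute tables are typically combined with an attribute join on a shared identifier; Census Bureau products, for example, use GEOID codes built from FIPS codes. The sketch below shows a left join in plain Python. The records, IDs, and population values are hypothetical placeholders, and each geometry is stood in for by a string.

```python
# Hypothetical boundary records: a unique ID plus a placeholder for the geometry.
boundaries = [
    {"geoid": "001", "name": "County A", "geometry": "<polygon A>"},
    {"geoid": "002", "name": "County B", "geometry": "<polygon B>"},
    {"geoid": "003", "name": "County C", "geometry": "<polygon C>"},
]

# Hypothetical attribute table keyed by the same ID (values are placeholders).
attributes = {
    "001": {"population": 1000},
    "002": {"population": 2500},
}


def attribute_join(features, table, key="geoid"):
    """Left join: attach attribute columns to each feature whose key matches.

    Features with no matching row are kept unchanged, and their IDs are
    returned so that gaps in the attribute data are visible, not silently dropped.
    """
    joined, unmatched = [], []
    for feature in features:
        row = table.get(feature[key])
        if row is None:
            unmatched.append(feature[key])
            joined.append(dict(feature))
        else:
            joined.append({**feature, **row})
    return joined, unmatched


joined, unmatched = attribute_join(boundaries, attributes)
print(unmatched)  # ['003']
```

    Keeping identifiers as text rather than numbers preserves leading zeros (e.g., a code such as "06"), which would otherwise break the match between the two tables.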

    Every search for data will vary according to the purpose. However, government data tend to have good coverage and provide a point of reference from which other data can be added, compared, and evaluated. Whether you need satellite imagery data from the National Aeronautics and Space Administration (NASA) or land use data from the United States Geological Survey (USGS), such government sources tend to be dependable, reputable, and consistent. Another critical element of most government data is that they are freely accessible to the public. In other words, there is no charge to use or to acquire the data. Data that are free to use are called public data.

    In contrast to public data, there are numerous private or proprietary data sources. The main difference between the two is that public data tend to be free, whereas proprietary data must be acquired at a cost. Furthermore, there are often restrictions on distributing and disseminating proprietary data sets (i.e., sharing the purchased data is often not allowed). Even so, depending on the subject matter, proprietary data may be the only option. Another reason for using proprietary data is that the data may already be formatted and cleaned to suit your needs. When working with deadlines, the trade-off between financial cost and time saved must be carefully considered and evaluated.
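
    The trade-off can be framed as simple arithmetic: compare the purchase price with the value of the time the purchase would save. This is a minimal sketch, assuming the time saved can be estimated in hours and valued at a single hourly rate (both hypothetical inputs). It ignores licensing restrictions and data quality, which also matter.

```python
def cheaper_option(purchase_price: float, hours_saved: float, hourly_rate: float) -> str:
    """Compare buying prepared data with preparing free data yourself.

    The value of the time saved is hours_saved * hourly_rate; on cost alone,
    buying makes sense when the price is below that value.
    """
    value_of_time = hours_saved * hourly_rate
    return "buy" if purchase_price < value_of_time else "prepare free data"


# Hypothetical figures: a $500 data set that saves 40 hours of cleaning at $25/hour.
print(cheaper_option(500, 40, 25))  # buy  (40 * 25 = 1000, which exceeds 500)
```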

    The search for the right data is often the most time-consuming aspect of any GIS-related project. Therefore, it is essential to define your data requirements, from the temporal and geographic scales of the data to the formats required, as clearly and as early as possible. Such clarity will pay dividends in the search for the correct data, in better analyses, and in well-informed decisions.


    This page titled 3.4: Researching Data is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Adam Dastrup.
