Geosciences LibreTexts

17.4: Datasets


    In pandas, a DataFrame can be thought of as a collection of Series objects that share a common index, allowing for the alignment of data across rows. Each Series is a one-dimensional sequence of labeled values, so a DataFrame is well suited to two-dimensional, tabular data such as time series or other sequentially indexed records, which are common in domains including finance, economics, and simple observational studies.
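As a minimal sketch of the pandas idea described above, the following builds a DataFrame from two Series whose index labels only partly overlap. The variable names and values are made up for illustration.

```python
import pandas as pd

# Two Series with partly different index labels (illustrative values)
temperature = pd.Series([15.2, 16.1, 14.8], index=["2005-01", "2005-02", "2005-03"])
rainfall = pd.Series([80.0, 65.5], index=["2005-02", "2005-03"])

# Combining them into a DataFrame aligns the rows on the index labels;
# labels missing from one Series are filled with NaN
df = pd.DataFrame({"temperature": temperature, "rainfall": rainfall})

# Each column of the DataFrame is itself a one-dimensional Series
temperature_column = df["temperature"]
missing_rainfall = df["rainfall"].isna().sum()

print(df)
```

Here the rainfall column has no entry for the first month, so pandas inserts a missing value there instead of shifting the data.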

    Similarly, an Xarray Dataset can be thought of as a collection of DataArrays with common coordinates, much as a pandas DataFrame is a collection of Series. A Dataset enables alignment and joint operations on multiple DataArrays across their shared coordinates, which makes it well suited to multidimensional data such as output from climate models or satellite observations.

    Here is one example of how you would create a Dataset:

    import numpy as np
    import pandas as pd
    import xarray as xr

    # Define the coordinates
    latitudes = np.linspace(-90, 90, 181)   # 181 points from South to North Pole
    longitudes = np.linspace(-180, 180, 361) # 361 points for full longitude range
    pressure_levels = np.array([1000, 850, 700, 500, 300, 200, 100])  # in hPa
    times = pd.date_range('2005-01-01', '2005-12-01', freq='MS')  # Monthly intervals for 2005

    # Generate random data
    temperature_data = np.random.rand(len(times), len(pressure_levels), len(latitudes), len(longitudes))
    precipitation_data = np.random.rand(len(times), len(pressure_levels), len(latitudes), len(longitudes))

    # Create the xarray Dataset
    ds = xr.Dataset({
        'temperature': (['time', 'pressure', 'latitude', 'longitude'], temperature_data),
        'precipitation': (['time', 'pressure', 'latitude', 'longitude'], precipitation_data)
    }, coords={
        'time': times,
        'pressure': pressure_levels,
        'latitude': latitudes,
        'longitude': longitudes
    })

    In this example, the xr.Dataset constructor is used to create a new Dataset. The first argument is a dictionary that maps each variable name to a tuple of dimension names and a data array. The coords argument is a dictionary that assigns coordinate labels to the dimensions. The times array generated with pd.date_range provides twelve month-start timestamps spanning 2005, latitudes and longitudes are evenly spaced at one-degree intervals, and pressure_levels holds typical atmospheric pressure levels in hPa.
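Once a Dataset has been built this way, it can help to inspect its contents. The sketch below repeats the same construction on a much smaller grid (the name `small_ds` and the grid sizes are illustrative assumptions, not part of the example above) and then lists the variables and dimension sizes.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A deliberately small grid so the example runs quickly (sizes are illustrative)
times = pd.date_range("2005-01-01", periods=3, freq="MS")
pressure_levels = np.array([1000, 500])
latitudes = np.linspace(-90, 90, 5)
longitudes = np.linspace(-180, 180, 9)

dims = ["time", "pressure", "latitude", "longitude"]
shape = (len(times), len(pressure_levels), len(latitudes), len(longitudes))

small_ds = xr.Dataset(
    {
        "temperature": (dims, np.random.rand(*shape)),
        "precipitation": (dims, np.random.rand(*shape)),
    },
    coords={
        "time": times,
        "pressure": pressure_levels,
        "latitude": latitudes,
        "longitude": longitudes,
    },
)

# Printing a Dataset gives a summary of dimensions, coordinates and variables
print(small_ds)

# The same information is available programmatically
variable_names = list(small_ds.data_vars)
dimension_sizes = dict(small_ds.sizes)
print(variable_names)
print(dimension_sizes)
```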

    Accessing the variables within an Xarray Dataset is straightforward and works much like indexing a Python dictionary. Each variable in the dataset is an Xarray DataArray, which can be retrieved using the variable name as the key. For example, to access the 'temperature' variable from the dataset `ds` created in the previous example, use `ds['temperature']` or `ds.temperature`. Either form returns the temperature DataArray, complete with its dimensions, coordinates, and attributes. The 'precipitation' DataArray is available in the same way as `ds['precipitation']` or `ds.precipitation`. Attribute-style access only works when the variable name does not clash with an existing Dataset attribute or method, so the bracket form is the safer choice in general. Variables can be handled independently or together with other variables in the dataset, depending on the analysis required, which makes Xarray convenient for the high-dimensional data typical of fields like atmospheric science.
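The two access styles described above can be compared directly. This is a small sketch on a made-up two-dimensional dataset; `demo_ds` and its values are assumptions chosen only to keep the example short.

```python
import numpy as np
import xarray as xr

# A tiny stand-in dataset; variable names mirror the text, values are made up
demo_ds = xr.Dataset(
    {
        "temperature": (["latitude", "longitude"], np.zeros((3, 4))),
        "precipitation": (["latitude", "longitude"], np.ones((3, 4))),
    },
    coords={
        "latitude": [-10.0, 0.0, 10.0],
        "longitude": [0.0, 90.0, 180.0, 270.0],
    },
)

# Dictionary-style and attribute-style access return the same DataArray
temp_by_key = demo_ds["temperature"]
temp_by_attr = demo_ds.temperature
both_match = temp_by_key.identical(temp_by_attr)

# The DataArray carries its dimensions and coordinates with it
print(temp_by_key.dims)
print(temp_by_key.coords)

# Indexing with a list of names returns a smaller Dataset, not a DataArray
temperature_only = demo_ds[["temperature"]]
```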

    When you perform arithmetic on an Xarray Dataset, the operation is applied to every data variable it contains, while the coordinates are left unchanged. This is convenient for dataset-wide adjustments or analyses. For example, to add one to every data point in the dataset, use `ds = ds + 1`, which adds one to each variable in `ds`.
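The dataset-wide behaviour described above can be checked on a small example. The sketch below uses a made-up one-dimensional dataset (`arith_ds` is a hypothetical name) so the effect on each variable is easy to see.

```python
import numpy as np
import xarray as xr

# Small illustrative dataset with two variables on one shared dimension
arith_ds = xr.Dataset(
    {
        "temperature": (["latitude"], np.array([1.0, 2.0, 3.0])),
        "precipitation": (["latitude"], np.array([0.0, 0.5, 1.0])),
    },
    coords={"latitude": [10.0, 20.0, 30.0]},
)

# Arithmetic on the Dataset is applied to every data variable
shifted = arith_ds + 1

# The coordinate values are not affected by the arithmetic
print(shifted["temperature"].values)
print(shifted["precipitation"].values)
print(shifted["latitude"].values)

# Reductions behave the same way: one result per data variable
latitude_means = arith_ds.mean(dim="latitude")
```

To change only one variable, apply the operation to that DataArray instead, for example `arith_ds["temperature"] + 1`.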

    Coordinate-based selection is likewise applied to every variable. Selecting a range of latitudes with `ds.sel(latitude=slice(10, 20))` extracts that slice from all variables in `ds` that have a latitude dimension and returns a new Dataset with the same variables restricted to the requested range. Label-based slices include both endpoints, so this selection keeps latitudes 10 through 20. Keep in mind that the arrays in the result may be views of the original data rather than copies, so modifying them in place can also change `ds`; call `.copy(deep=True)` first if you need an independent dataset.
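A short sketch of the selection described above, using a made-up one-dimensional dataset (`sel_ds` is a hypothetical name) on the same one-degree latitude grid. It also shows one cautious way to get an independent result before editing values: taking a deep copy.

```python
import numpy as np
import xarray as xr

# One-degree latitude grid, as in the main example; data values are illustrative
latitudes = np.linspace(-90, 90, 181)
sel_ds = xr.Dataset(
    {
        "temperature": (["latitude"], np.arange(181, dtype=float)),
        "precipitation": (["latitude"], np.zeros(181)),
    },
    coords={"latitude": latitudes},
)

# Label-based slicing is applied to every variable and includes both endpoints
subset = sel_ds.sel(latitude=slice(10, 20))
print(subset.sizes["latitude"])  # 11 latitudes: 10, 11, ..., 20

# A deep copy owns its data, so in-place edits cannot reach the original
independent = subset.copy(deep=True)
independent["temperature"][:] = -1.0

# The original dataset still holds its original value at latitude 10
original_value = sel_ds["temperature"].sel(latitude=10).item()
```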


    17.4: Datasets is shared under a not declared license and was authored, remixed, and/or curated by LibreTexts.