17.6: Groupby and Resample

    Xarray's groupby functionality is similar to that of pandas: it allows one to aggregate over one or more dimensions based on a coordinate. In atmospheric sciences, this feature is invaluable because it lets us segment complex, high-dimensional datasets into meaningful groups based on specified criteria, such as time periods or spatial regions, and then apply operations to each group independently.

    For instance, suppose we have a dataset of monthly global temperature readings spanning several years. We may wish to compute the average for each calendar month (i.e., the average of all Januaries, the average of all Februaries, ...). In Xarray, we can accomplish this by calling the `.groupby()` method with the coordinate we want to group by, which in this case is 'time.month' (the month number of each entry in the `time` coordinate). Once grouped, we then apply an aggregation method such as `.mean()`.

    Here's a short example demonstrating this:

    import xarray as xr
    import numpy as np
    import pandas as pd

    # Set the seed for reproducibility
    np.random.seed(0)

    # Create monthly (month-end) timestamps spanning 2005-2007
    time = pd.date_range('2005-01-01', '2007-12-31', freq='M')

    # Create a DataArray with random temperature data for each month
    temperature = xr.DataArray(np.random.rand(len(time), 2, 2),
                               dims=('time', 'lat', 'lon'),
                               coords={'time': time,
                                       'lat': [10, 20],
                                       'lon': [50, 60]})

    # Create a DataArray with random precipitation data for each month
    precipitation = xr.DataArray(np.random.rand(len(time), 2, 2),
                                 dims=('time', 'lat', 'lon'),
                                 coords={'time': time,
                                         'lat': [10, 20],
                                         'lon': [50, 60]})

    # Combine into a Dataset
    ds = xr.Dataset({'temperature': temperature, 'precipitation': precipitation})

    # Group by 'time.month' and calculate the mean for each month across all years
    monthly_mean = ds.groupby('time.month').mean('time')

    print(monthly_mean)

    Here is the output:

    <xarray.Dataset>
    Dimensions:        (lat: 2, lon: 2, month: 12)
    Coordinates:
      * lat            (lat) int64 10 20
      * lon            (lon) int64 50 60
      * month          (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
    Data variables:
        temperature    (month, lat, lon) float64 0.4836 0.3663 ... 0.3119 0.4013
        precipitation  (month, lat, lon) float64 0.6187 0.5886 ... 0.5526 0.425

    Note that `.groupby()` has collected each calendar month across the years and averaged them, so that the coordinate `month` = 1, 2, ... holds the average of all Januaries, the average of all Februaries, ..., over the three years of the dataset. It has also retained the `lat` and `lon` coordinates.
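
To make the grouping concrete, here is a minimal sketch that checks one group by hand. It assumes its own small synthetic DataArray (random values, month-start timestamps via `freq="MS"`, and a `default_rng` generator, none of which come from the example above), so the numbers will differ from the printed output.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly data for 2005-2007 (random values, for illustration only)
rng = np.random.default_rng(0)
time = pd.date_range("2005-01-01", periods=36, freq="MS")  # month-start stamps
temperature = xr.DataArray(
    rng.random((len(time), 2, 2)),
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": [10, 20], "lon": [50, 60]},
    name="temperature",
)

# Group by calendar month and average each group over time
monthly_mean = temperature.groupby("time.month").mean("time")

# The same result for January, computed by hand:
# select the January time steps with a boolean mask, then average them
is_january = time.month == 1
january_by_hand = temperature.isel(time=is_january).mean("time")

print(int(is_january.sum()))  # number of Januaries in the record
print(np.allclose(monthly_mean.sel(month=1).values, january_by_hand.values))
```

The `month=1` slice of the grouped result should match the hand-computed January average, which is all that `.groupby('time.month').mean('time')` does for each of the twelve groups.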

    By changing the method called after `.groupby()`, we can get other statistics of each group. For example, we could get the maximum of each month with the command `monthly_max = ds.groupby('time.month').max('time')`.
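
As a rough sketch of swapping in other aggregations, the snippet below computes the maximum, minimum, and standard deviation for each calendar month and combines two of them. The synthetic Dataset and the variable names `by_month` and `monthly_range` are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly Dataset for 2005-2007 (random values, for illustration only)
rng = np.random.default_rng(0)
time = pd.date_range("2005-01-01", periods=36, freq="MS")
dims = ("time", "lat", "lon")
coords = {"time": time, "lat": [10, 20], "lon": [50, 60]}
ds = xr.Dataset(
    {
        "temperature": xr.DataArray(rng.random((36, 2, 2)), dims=dims, coords=coords),
        "precipitation": xr.DataArray(rng.random((36, 2, 2)), dims=dims, coords=coords),
    }
)

# Build the groups once, then apply several aggregation methods to them
by_month = ds.groupby("time.month")
monthly_max = by_month.max("time")
monthly_min = by_month.min("time")
monthly_std = by_month.std("time")

# Results share the same (month, lat, lon) layout, so they combine directly
monthly_range = monthly_max - monthly_min
print(monthly_range)
```

Because every aggregation returns a Dataset indexed by `month`, the results can be used together in ordinary arithmetic, as with the max-minus-min range here.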

    A related process is resampling, a powerful tool that enables one to restructure a dataset from one time resolution to another, aggregating the data according to a given rule. It can be particularly useful when looking for long-term trends or smoothing out short-term fluctuations. This process involves defining a new time frequency and then specifying an aggregation method, such as taking the mean, sum, or maximum over each period in the resampled dataset.

    If we use the same example Dataset from above, here's how we can take the dataset with monthly data and resample it to get annual averages:

    ds.resample(time='A').mean()

    It returns:

    <xarray.Dataset>
    Dimensions:        (lat: 2, lon: 2, time: 3)
    Coordinates:
      * lat            (lat) int64 10 20
      * lon            (lon) int64 50 60
      * time           (time) datetime64[ns] 2005-12-31 2006-12-31 2007-12-31
    Data variables:
        temperature    (time, lat, lon) float64 0.5198 0.6375 ... 0.6292 0.5636
        precipitation  (time, lat, lon) float64 0.6056 0.5719 ... 0.5868 0.5283

    In the above code, `ds.resample(time='A')` changes the dataset's temporal resolution from monthly ('M') to annual ('A'). Following resampling, `.mean()` computes the average of the data points for each year, which could reveal year-over-year changes and trends that are less discernible in monthly variations.
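
Since resampling to annual means is closely related to grouping, the following sketch compares the two on a synthetic Dataset. It assumes the year-start alias `'YS'` rather than `'A'`, so each annual value is labeled with 1 January instead of 31 December; the data and variable names are illustrative only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly Dataset for 2005-2007 (random values, for illustration only)
rng = np.random.default_rng(0)
time = pd.date_range("2005-01-01", periods=36, freq="MS")
dims = ("time", "lat", "lon")
coords = {"time": time, "lat": [10, 20], "lon": [50, 60]}
ds = xr.Dataset(
    {"temperature": xr.DataArray(rng.random((36, 2, 2)), dims=dims, coords=coords)}
)

# Annual means via resampling: the result keeps a datetime 'time' coordinate
annual_resampled = ds.resample(time="YS").mean()

# Annual means via grouping: the result has an integer 'year' coordinate
annual_grouped = ds.groupby("time.year").mean("time")

print(annual_resampled["time"].values)
print(annual_grouped["year"].values)

# The averaged values agree; only the labeling of the new axis differs
a = annual_resampled["temperature"].transpose("time", "lat", "lon").values
b = annual_grouped["temperature"].transpose("year", "lat", "lon").values
print(np.allclose(a, b))
```

For a record with one entry per month, both routes give the same yearly averages. Resampling is the more natural choice when the result should remain a time series, while grouping by `'time.year'` yields plain year numbers.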

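    Note that the Xarray resample command is `.resample(time='A')`. This differs from pandas, which would just require `.resample('A')`, because Xarray needs to know which coordinate to resample over. Also note that recent releases of pandas (2.2 and later) deprecate the aliases 'M' and 'A' in favor of 'ME' (month end) and 'YE' (year end), so newer installations may issue a warning for the code shown here.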
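
The difference in call signatures can be seen side by side in the sketch below, which resamples the same single-grid-point time series with pandas and with Xarray. The synthetic data, the choice of grid point, and the year-start alias `'YS'` are assumptions for this illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly temperature for 2005-2007 (random values, for illustration only)
rng = np.random.default_rng(0)
time = pd.date_range("2005-01-01", periods=36, freq="MS")
temperature = xr.DataArray(
    rng.random((36, 2, 2)),
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": [10, 20], "lon": [50, 60]},
    name="temperature",
)

# Pick one grid point so the data is a plain 1-D time series
point = temperature.sel(lat=10, lon=50)

# pandas: the Series has a single (time) index, so only the rule is needed
annual_pd = point.to_series().resample("YS").mean()

# Xarray: the coordinate to resample over is named explicitly
annual_xr = point.resample(time="YS").mean()

print(annual_pd)
print(annual_xr)
print(np.allclose(annual_pd.to_numpy(), annual_xr.values))
```

Both calls produce the same three annual means; the keyword form in Xarray exists because a DataArray or Dataset can carry several coordinates.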


    17.6: Groupby and Resample is shared under a not declared license and was authored, remixed, and/or curated by LibreTexts.
