14.2: The Pandas Series
The standard way to import Pandas is to use this line:
import pandas as pd
This code should go at the top of your first notebook cell, right under your import numpy as np line. The two go hand in hand.
Let's begin talking about the Pandas Series. A Series is conceptually a set of key-value pairs. The keys are normally all of the same type (e.g., all integers or all strings), and so are the values, although the keys might be of a different type than the values (e.g., the keys could be strings and the values could be floats).
The Pandas package calls the keys “the index,” which overlaps with the term we used for the positions in ordinary arrays.
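To make the key-value picture concrete, here is a minimal sketch. The variable name and the data are invented for illustration; the point is only that the keys (the index) are strings while the values are floats.

```python
import pandas as pd

# Hypothetical data: string keys (the index) paired with float values.
prices = pd.Series([1.25, 0.5, 3.0], index=['apple', 'banana', 'cherry'])

print(prices.index.tolist())    # the keys
print(prices.values.tolist())   # the values
print(prices['banana'])         # look up a value by its key
```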
Creating a Series
Here are a few common ways of creating a Pandas Series object in memory.
Way 1: create an empty Series
Perhaps this first one sounds dumb, but we will indeed have occasion to start off with an empty Series and then add key/value pairs to it from there. The code is simple:
Code \(\PageIndex{1}\) (Python):
my_new_series = pd.Series()
Voilà.
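As a rough sketch of what "adding key/value pairs from there" can look like, the example below starts empty and then assigns to keys that don't exist yet. The dtype=object argument is an addition not in the original line: the default type of an empty Series has differed between Pandas versions, so stating it explicitly keeps the behavior predictable.

```python
import pandas as pd

# Start with no entries; dtype=object is stated explicitly (an assumption of
# this sketch) so we don't rely on the version-dependent default.
my_new_series = pd.Series(dtype=object)

# Assigning to a key that isn't in the index yet adds a new key/value pair.
my_new_series['Bruce'] = 'Hulk'
my_new_series['Peter'] = 'Spidey'

print(len(my_new_series))
print(my_new_series['Bruce'])
```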
Way 2: pd.Series([], index=[])
As with NumPy ndarrays, we can explicitly list the values we want in a new Series. We also have to list the index values (the keys). The syntax for doing so is:
Code \(\PageIndex{2}\) (Python):
alter_egos = pd.Series(['Hulk','Spidey','Iron Man','Thor'], index=['Bruce','Peter','Tony','Thor'])
This creates the Series shown in Figure 11.1.2.
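Once the Series exists, the keys can be used to get at the values. The following is a small sketch of a few basic queries on the same alter_egos Series; nothing here goes beyond standard Series behavior.

```python
import pandas as pd

alter_egos = pd.Series(['Hulk', 'Spidey', 'Iron Man', 'Thor'],
                       index=['Bruce', 'Peter', 'Tony', 'Thor'])

print(alter_egos['Tony'])         # value stored under the key 'Tony'
print('Peter' in alter_egos)      # "in" checks the keys, not the values
print(alter_egos.index.tolist())  # all the keys, in order
```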
Way 3: “wrapping” an array
Associative arrays, and the Pandas Series we’ve been using to implement them, are inherently one-dimensional data structures. This is just like the NumPy arrays we used before. Pandas Series also provide a bunch of features for manipulating, querying, computing, and even graphing aspects of their content. It’s a lot of rich stuff on top of plain-old NumPy.
For this reason, it’s common to want to create a Series that just “wraps” (or encloses) an underlying NumPy ndarray, and provides all that rich stuff.
The way to do this is simple:
Code \(\PageIndex{5}\) (Python):
my_numpy_array = np.array(['Ghost','Pumpkin','Vampire','Witch'])
my_pandas_enhanced_thang = pd.Series(my_numpy_array)
You can then treat my_pandas_enhanced_thang as an ordinary aggregate variable which has the more sophisticated operations of the next chapter automatically glommed on to it. The keys (index values) of this thang will simply be the integers 0 through 3.
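A quick way to confirm the automatically assigned keys is to print the index of the wrapped array. This sketch repeats the two lines above and then inspects the result.

```python
import numpy as np
import pandas as pd

my_numpy_array = np.array(['Ghost', 'Pumpkin', 'Vampire', 'Witch'])
my_pandas_enhanced_thang = pd.Series(my_numpy_array)

# With no index given, the keys default to the integers 0, 1, 2, ...
print(my_pandas_enhanced_thang.index.tolist())
print(my_pandas_enhanced_thang[2])   # the value stored under key 2
```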
Way 4: pd.read_csv()
Finally, there’s reading data from a text file. Data typically resides in sources and files external to our programming environment, and we want to do everything we can to play ball with this open universe.
One common data format is called CSV, which stands for comma-separated values. Files in this format are normally named with a “.csv” extension. As the name suggests, the lines in such a file consist of values separated by commas. For example, suppose there’s a file called disney_rides.csv whose contents look like this:
Pirates of the Caribbean,25
Small World,20
Peter Pan,29
These are the current expected wait times (in minutes) for each of these Disney World rides at some point of the day.
To read this into Python, we use the pd.read_csv() function. It’s a bit awkward since it needs a few extra pieces if you want to end up with a Series. Here’s how it works:
Code \(\PageIndex{6}\) (Python):
wait_times = pd.read_csv('disney_rides.csv', index_col=0, header=None).squeeze('columns')
Most of that junk is just to memorize for now, not to fully understand. If you’re curious, index_col=0 tells Pandas that the first (0th) column, namely the ride names, should be treated as the index for the Series. The header=None means “there is no separate header row at the top of the file, Pandas, so don’t try to treat it like one.” If our .csv file did have a header row at the top, containing labels for the two columns, then we’d skip the header=None part. Finally, .squeeze('columns') tells Pandas, “since this is so skinny anyway, with just one column of values besides the index, give us a Series rather than the more complex DataFrame object that pd.read_csv() returns by default (which is the subject of a later chapter).” Older versions of Pandas accepted a squeeze=True argument to pd.read_csv() for the same purpose, but that argument was removed in Pandas 2.0.
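To try this without hunting for a real file, the sketch below first writes the three example lines to a temporary disney_rides.csv and then reads them back. The temporary-file handling is only scaffolding for the demonstration. It assumes a recent Pandas version and therefore calls .squeeze('columns') on the result of pd.read_csv() rather than passing a squeeze keyword.

```python
import os
import tempfile
import pandas as pd

# Recreate the example file in a temporary directory so the sketch runs anywhere.
csv_text = "Pirates of the Caribbean,25\nSmall World,20\nPeter Pan,29\n"
path = os.path.join(tempfile.mkdtemp(), 'disney_rides.csv')
with open(path, 'w') as f:
    f.write(csv_text)

# read_csv() returns a DataFrame with one data column;
# squeeze('columns') collapses that single column into a Series.
wait_times = pd.read_csv(path, index_col=0, header=None).squeeze('columns')

print(type(wait_times).__name__)
print(wait_times['Peter Pan'])   # wait time stored under the key 'Peter Pan'
```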