Core pandas Data Structures

There are three main pandas data structures: Series, DataFrame, and Index. Each of them is a subclass of numpy.ndarray. This means that the functionality you have learned about numpy arrays still applies to these data structures, but these pandas objects have additional, data-specific features. We will provide some high-level principles that apply to each of them and then dig into their specific features. (Note that many of the materials in this section come from the Intro to Data Structures page in the pandas documentation.)

One very important thing to understand about pandas data structures is that the they are like numpy arrays, but they have axis labels. Think of them more like a working version of an Excel spreadsheet where you have labels in the first row and column and then the data corresponding to those labels on the inside. An important characteristics about pandas objects is that the link between the data and the labels is intrinsic, meaning that the labels are actually a part of the data and that link will not be broken unless you explicitly break it.

Throughout the rest of the lab, the code examples assume that you have made the following imports.

In [183]: import numpy as np

In [184]: from numpy.random import randn

In [185]: import pandas as pd

In [186]: from pandas import Series, DataFrame, Index

Series

The Series can be thought of as a labeled one dimensional array. It consists if a single dimension of data and an Index containing the labels. The data in a Series can be of any data type: float, int, string, matplotlib figures, and any other python object.

There are many different ways you can create a Series. Two of the most common are to create a Series using a numpy array or a python list.

# Using a numpy array
In [187]: s1 = Series(randn(5))

In [188]: s1
Out[188]: 
0    0.496714
1   -0.138264
2    0.647689
3    1.523030
4   -0.234153
dtype: float64

In [189]: s1.index
Out[189]: Int64Index([0, 1, 2, 3, 4], dtype=int64)

In [190]: s1.values
Out[190]: array([ 0.49671415, -0.1382643 ,  0.64768854,  1.52302986, -0.23415337])

# Using a python list
In [191]: s2 = Series([i ** 2 for i in xrange(1, 6)])

In [192]: s2
Out[192]: 
0     1
1     4
2     9
3    16
4    25
dtype: int64

In [193]: s2.index
Out[193]: Int64Index([0, 1, 2, 3, 4], dtype=int64)

In [194]: s2.values
Out[194]: array([ 1,  4,  9, 16, 25])

Notice that each of these Series objects has both index and values attributes. The index is an Index type and values is the underlying numpy array holding the data. Below we will show a few more ways we could have created a Series.

# We can explicitly specify the index
In [195]: s3 = Series(randn(5), index=['a', 'b', 'c', 'd', 'e'])

In [196]: s3
Out[196]: 
a   -0.234137
b    1.579213
c    0.767435
d   -0.469474
e    0.542560
dtype: float64

# Create using a dict (note the data are actual python lists)
# Notice the dict keys become the index
In [197]: s4 = Series({'first': [1, 2], 'second':[2, 4]})

In [198]: s4
Out[198]: 
first     [1, 2]
second    [2, 4]
dtype: object

In [199]: s4.values
Out[199]: array([[1, 2], [2, 4]], dtype=object)

In [200]: s4.index
Out[200]: Index([first, second], dtype=object)

# Passing a dict and index uses the index we pass, but fills values from the dict.
# notice NaN that appears because we 'd' in the index, but not in the dict
In [201]: s5 = Series({'a' : 0., 'b' : 1., 'c' : 2.}, index=['b', 'c', 'd', 'a'])

In [202]: s5
Out[202]: 
b     1
c     2
d   NaN
a     0
dtype: float64

Once we have created a Series object, there is a lot that we can do with it. Below we will demonstrate that they act a lot like their parent class numpy.ndarray.

In [203]: s3
Out[203]: 
a   -0.234137
b    1.579213
c    0.767435
d   -0.469474
e    0.542560
dtype: float64

In [204]: s3[0]
Out[204]: -0.23413695694918055

In [205]: s3[:3]
Out[205]: 
a   -0.234137
b    1.579213
c    0.767435
dtype: float64

In [206]: s3[s3 > s3.mean()]
Out[206]: 
b    1.579213
c    0.767435
e    0.542560
dtype: float64

In [207]: s3[[2, 3, 1, -1, 1]]
Out[207]: 
c    0.767435
d   -0.469474
b    1.579213
e    0.542560
b    1.579213
dtype: float64

In [208]: np.exp(s3)
Out[208]: 
a    0.791253
b    4.851136
c    2.154233
d    0.625331
e    1.720406
dtype: float64

Because Series objects have a labeled index, they also behave a lot like python dictionaries:

In [209]: s3['a']
Out[209]: -0.23413695694918055

In [210]: s3['b':'d']
Out[210]: 
b    1.579213
c    0.767435
d   -0.469474
dtype: float64

In [211]: 'e' in s3
Out[211]: True

The same vectorized operations that you are used to doing on numpy.ndarray s can be done on Series. The BIG difference between arrays and series objects when it comes to arithmetic is that because axis labels are an intrinsic property, data is aligned every operation. The example below will demonstrate.

In [212]: s3[1:] + s3[:-1]
Out[212]: 
a         NaN
b    3.158426
c    1.534869
d   -0.938949
e         NaN
dtype: float64

Notice that the ‘a’ and ‘e’ (or 1, -1 if doing integer indexing) values are NaN. This happened because the slice s3[1:] doesn’t contain the ‘a’ element and the other slice doesn’t contain ‘e’. Therefore, when doing the sum of the two slices we get NaN values where one of the Series doesn’t have data. This illustrates a more general principle. The combination of Series objects, via arithmetic operations or otherwise, creates a new Series where the index is the union of the index of the former Series. The adoption of this new index will lead to NaN values if any of the objects being combined don’t have data for a particular index. This might seem like unwanted behavior, but being able to write code without doing any explicit data alignment provides great freedom and flexibility and is one feature of pandas that really distinguishes it from the other domain specific platforms.

The last main thing to talk about with regards to Series is that they have a name attribute. This will show up below the printout when you ask python to print or otherwise display a series and is used in various other pandas tasks like combining Series objects into a DataFrame (more to come on DataFrame s next!).

In [213]: s3
Out[213]: 
a   -0.234137
b    1.579213
c    0.767435
d   -0.469474
e    0.542560
dtype: float64

In [214]: print(s3.name)
None

In [215]: s3.name = 'letter_ind'

In [216]: s3
Out[216]: 
a   -0.234137
b    1.579213
c    0.767435
d   -0.469474
e    0.542560
Name: letter_ind, dtype: float64

In [217]: print(s3.name)
letter_ind

DataFrame

A DataFrame is a 2-dimensional labeled data structure. It is very much like a Series, but it can have multiple columns with associated labels. For this reason, you can think of a DataFrame as being like a spreadsheet or SQL table. It is by far the most commonly used pandas object, as a Series is simply a DataFrame with only one column.

Like a Series, there are many ways to create a DataFrame. We will illustrate some of them below.

# Create DataFrame from 2d-numpy array
In [218]: df1 = DataFrame(randn(4, 4))

In [219]: df1
Out[219]: 
          0         1         2         3
0 -0.463418 -0.465730  0.241962 -1.913280
1 -1.724918 -0.562288 -1.012831  0.314247
2 -0.908024 -1.412304  1.465649 -0.225776
3  0.067528 -1.424748 -0.544383  0.110923

# Give explicit index for columns using columns= keyword argument
In [220]: df2 = DataFrame(np.arange(16).reshape(4, 4), columns=list('abcd'))

In [221]: df2
Out[221]: 
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15

In [222]: s1
Out[222]: 
0    0.496714
1   -0.138264
2    0.647689
3    1.523030
4   -0.234153
dtype: float64

In [223]: s2
Out[223]: 
0     1
1     4
2     9
3    16
4    25
dtype: int64

# Create DataFrame from dict of Series
In [224]: df3 = DataFrame({'one': s1, 'two':s2})

In [225]: df3
Out[225]: 
        one  two
0  0.496714    1
1 -0.138264    4
2  0.647689    9
3  1.523030   16
4 -0.234153   25

# Note: the union of the two indecies becomes the DataFrame Index and
#       missing values at an index result in NaN
In [226]: df4 = DataFrame({'s3': s3, 's5':s5})

In [227]: df4
Out[227]: 
         s3  s5
a -0.234137   0
b  1.579213   1
c  0.767435   2
d -0.469474 NaN
e  0.542560 NaN

# Use list of numpy arrays. Note each array becomes a row
In [228]: ar1 = np.arange(1, 6)

In [229]: ar2 = ar1 * .1

In [230]: df5 = DataFrame([ar1, ar2], index=['ar1', 'ar2'])

There are other ways to create both DataFrame and Series objects, and I encourage you to test out different ways as well as browse docstrings and documentation.

Indexing (getting slices or chunks of data) a DataFrame is very similar to indexing a Series. An important thing to keep in mind is that the [] syntax indexes along the columns.

In [231]: data = {'state': ['Utah', 'Arizona', 'Idaho', 'Nevada', 'Colorado'],
   .....:         'year': [2000, 2001, 2002, 2001, 2002],
   .....:         'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
   .....:

In [232]: df = DataFrame(data, index=['one', 'two', 'three', 'four', 'five'])

In [233]: df
Out[233]: 
       pop     state  year
one    1.5      Utah  2000
two    1.7   Arizona  2001
three  3.6     Idaho  2002
four   2.4    Nevada  2001
five   2.9  Colorado  2002

# Access using dict notation
In [234]: df['state']
Out[234]: 
one          Utah
two       Arizona
three       Idaho
four       Nevada
five     Colorado
Name: state, dtype: object

# Access using "dot" notation
In [235]: df.year
Out[235]: 
one      2000
two      2001
three    2002
four     2001
five     2002
Name: year, dtype: int64

# Get slice using boolean array (actually a boolean Series)
In [236]: year2001 = df.year == 2001

In [237]: year2001
Out[237]: 
one      False
two       True
three    False
four      True
five     False
Name: year, dtype: bool

In [238]: df[year2001]
Out[238]: 
      pop    state  year
two   1.7  Arizona  2001
four  2.4   Nevada  2001

If instead you want to slice along the rows you use the special .ix field

# Access using row label
In [239]: df.ix['one']
Out[239]: 
pop       1.5
state    Utah
year     2000
Name: one, dtype: object

# Slice along Row labels
In [240]: df.ix['two':'five']
Out[240]: 
       pop     state  year
two    1.7   Arizona  2001
three  3.6     Idaho  2002
four   2.4    Nevada  2001
five   2.9  Colorado  2002

# list of row labels
In [241]: df.ix[['one', 'three', 'four']]
Out[241]: 
       pop   state  year
one    1.5    Utah  2000
three  3.6   Idaho  2002
four   2.4  Nevada  2001

# Get slices with integers instead of row labels
In [242]: df.ix[1]
Out[242]: 
pop          1.7
state    Arizona
year        2001
Name: two, dtype: object

# An integer range slice
In [243]: df.ix[1:3]
Out[243]: 
       pop    state  year
two    1.7  Arizona  2001
three  3.6    Idaho  2002

# list of integers
In [244]: df.ix[[1, 3, 4]]
Out[244]: 
      pop     state  year
two   1.7   Arizona  2001
four  2.4    Nevada  2001
five  2.9  Colorado  2002

It should be noted that when you have integer row labels, indexing by label or indexing by position can be ambiguous. To overcome this, it is recommended you learn to use loc for row label and iloc for row position. Note that the default behavior is to use label indexing first and fall back on positional indexing if row lookup fails. Note that the loc and iloc methods are new in pandas 0.11. If you don’t have them you should first check your pandas version using pd.__version__ and if you don’t have \(\ge 0.11\) you should update (ask Spencer if you don’t know how to do this).

In [245]: ambiguous = DataFrame(randn(4, 4), index=[1, 1, 3, 4])

In [246]: ambiguous
Out[246]: 
          0         1         2         3
1 -1.150994  0.375698 -0.600639 -0.291694
1 -0.601707  1.852278 -0.013497 -1.057711
3  0.822545 -1.220844  0.208864 -1.959670
4 -1.328186  0.196861  0.738467  0.171368

In [247]: ambiguous.ix[1]
Out[247]: 
          0         1         2         3
1 -1.150994  0.375698 -0.600639 -0.291694
1 -0.601707  1.852278 -0.013497 -1.057711

In [248]: ambiguous.loc[1]
Out[248]: 
          0         1         2         3
1 -1.150994  0.375698 -0.600639 -0.291694
1 -0.601707  1.852278 -0.013497 -1.057711

In [249]: ambiguous.iloc[1]
Out[249]: 
0   -0.601707
1    1.852278
2   -0.013497
3   -1.057711
Name: 1, dtype: float64

You can create new columns using dictionary syntax

# Notice single value is broadcasted, like in numpy arrays
In [250]: df['budget'] = '?'

# Could also give list where length is equal to index
In [251]: df['GDP'] = [20, 34., 11, 26, 13]

# Can create new columns using existing ones
In [252]: df['perCapita'] = df.GDP / df['pop']

In [253]: df
Out[253]: 
       pop     state  year budget  GDP  perCapita
one    1.5      Utah  2000      ?   20  13.333333
two    1.7   Arizona  2001      ?   34  20.000000
three  3.6     Idaho  2002      ?   11   3.055556
four   2.4    Nevada  2001      ?   26  10.833333
five   2.9  Colorado  2002      ?   13   4.482759

The same note about data alignment and NaN values when combining Series is also true when combining DataFrame objects. In this case, however, the data alignment happens for both the row and column labels.

In [254]: df6 = DataFrame(randn(10, 4), columns=list('ABCD'))

In [255]: df7 = DataFrame(randn(7, 3), columns=list('ABC'))

In [256]: df6 + df7
Out[256]: 
          A         B         C   D
0  1.362246 -0.819374 -2.287016 NaN
1 -0.962396  1.972524  0.672369 NaN
2 -0.205676  0.128185 -0.579844 NaN
3  1.999645  0.229227 -1.166880 NaN
4 -0.060845 -0.487970 -0.183054 NaN
5 -0.845280 -1.191093  0.577939 NaN
6 -1.487381  0.582888  0.018922 NaN
7       NaN       NaN       NaN NaN
8       NaN       NaN       NaN NaN
9       NaN       NaN       NaN NaN

The transpose attribute of a DataFrame simply swaps rows and columns.

In [257]: df
Out[257]: 
       pop     state  year budget  GDP  perCapita
one    1.5      Utah  2000      ?   20  13.333333
two    1.7   Arizona  2001      ?   34  20.000000
three  3.6     Idaho  2002      ?   11   3.055556
four   2.4    Nevada  2001      ?   26  10.833333
five   2.9  Colorado  2002      ?   13   4.482759

In [258]: df.T
Out[258]: 
                one      two     three      four      five
pop             1.5      1.7       3.6       2.4       2.9
state          Utah  Arizona     Idaho    Nevada  Colorado
year           2000     2001      2002      2001      2002
budget            ?        ?         ?         ?         ?
GDP              20       34        11        26        13
perCapita  13.33333       20  3.055556  10.83333  4.482759

Index

You have already seen and dealt with Index objects, but we have barely scratched the surface of what Index objects can be or do. The following list gives a few important properties regarding Index objects

  • They are immutable, meaning once they are created, the user cannot change them. This is very desirable because immutable indexes preserve the intrinsic link between data and labels and allow you to safely share Index objects across data.
  • They behave a lot like the python Set type and respond to logic like x in index.

One very useful thing about Index objects is that they can be hierarchal, or have multiple levels. This is best understood through example. Below we will create a hierarchal index and apply it to a Series.

In [259]: arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
   .....:           ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
   .....:

In [260]: tuples = zip(*arrays)

In [261]: h_ind = pd.MultiIndex.from_tuples(tuples, names=['level1', 'level2'])

In [262]: hi_s = Series(randn(8), index=h_ind)

In [263]: hi_f = DataFrame(randn(8, 4), index=h_ind)

Listing out all the labels and their multiplicities before passing to zip and from_tuples is definitely not the only way to create a MultiIndex. I will show a few more of my favorites below.

# Use cartesian product of a few lists. Note this index has 3 levels
In [264]: from itertools import product

In [265]: words = ['hi', 'hello', 'hola']

In [266]: nums = [10, 20]

In [267]: letters = ['a', 'b']

In [268]: ind = pd.MultiIndex.from_tuples(list(product(words, nums, letters)),
   .....:                                 names=['word', 'num', 'let'])
   .....:

In [269]: hi_f2 = DataFrame(randn(12, 3), columns=['A', 'B', 'C'], index=ind)

# have column names of DataFrame become outer level of series index using unstack()
In [270]: df3
Out[270]: 
        one  two
0  0.496714    1
1 -0.138264    4
2  0.647689    9
3  1.523030   16
4 -0.234153   25

In [271]: hi_f3 = df3.unstack()

In [272]: hi_f3
Out[272]: 
one  0     0.496714
     1    -0.138264
     2     0.647689
     3     1.523030
     4    -0.234153
two  0     1.000000
     1     4.000000
     2     9.000000
     3    16.000000
     4    25.000000
dtype: float64

# Use index from other DataFrame
In [273]: hi_f4 = DataFrame(np.arange(30).reshape(10, 3), index=hi_f3.index)

In [274]: hi_f4
Out[274]: 
        0   1   2
one 0   0   1   2
    1   3   4   5
    2   6   7   8
    3   9  10  11
    4  12  13  14
two 0  15  16  17
    1  18  19  20
    2  21  22  23
    3  24  25  26
    4  27  28  29

To index into a hierarchal indexed object, just use the .ix field with multiple slice categories, one for each level, starting at the outermost level. Note if you don’t give inner slices, the whole level is taken. (If some of the examples don’t work it is because I use the development version of pandas and there may be new features. Don’t worry about it for now).

In [275]: hi_f2
Out[275]: 
                      A         B         C
word  num let                              
hi    10  a    0.184634  0.259883  0.781823
          b   -1.236951 -1.320457  0.521942
      20  a    0.296985  0.250493  0.346448
          b   -0.680025  0.232254  0.293072
hello 10  a   -0.714351  1.865775  0.473833
          b   -1.191303  0.656554 -0.974682
      20  a    0.787085  1.158596 -0.820682
          b    0.963376  0.412781  0.822060
hola  10  a    1.896793 -0.245388 -0.753736
          b   -0.889514 -0.815810 -0.077102
      20  a    0.341152  0.276691  0.827183
          b    0.013002  1.453534 -0.264657

In [276]: hi_f2.ix['hi']
Out[276]: 
                A         B         C
num let                              
10  a    0.184634  0.259883  0.781823
    b   -1.236951 -1.320457  0.521942
20  a    0.296985  0.250493  0.346448
    b   -0.680025  0.232254  0.293072

In [277]: hi_f2.ix['hi', :, :]
Out[277]: 
                A         B         C
num let                              
10  a    0.184634  0.259883  0.781823
    b   -1.236951 -1.320457  0.521942
20  a    0.296985  0.250493  0.346448
    b   -0.680025  0.232254  0.293072

In [278]: hi_f2.ix['hi', 10, 'b']
Out[278]: 
A   -1.236951
B   -1.320457
C    0.521942
Name: (hi, 10, b), dtype: float64

# Slicing works too, but......
In [279]: hi_f2.ix['hi':'hello']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-279-e0483a8ad44c> in <module>()
----> 1 hi_f2.ix['hi':'hello']
/Users/spencerlyon2/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
     46             return self._getitem_tuple(key)
     47         else:
---> 48             return self._getitem_axis(key, axis=0)
     49 
     50     def _get_label(self, label, axis=0):
/Users/spencerlyon2/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
    389         labels = self.obj._get_axis(axis)
    390         if isinstance(key, slice):
--> 391             return self._get_slice_axis(key, axis=axis)
    392         elif _is_list_like(key) and not (isinstance(key, tuple) and
    393                                          isinstance(labels, MultiIndex)):
/Users/spencerlyon2/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _get_slice_axis(self, slice_obj, axis)
    636         else:
    637             try:
--> 638                 indexer = labels.slice_indexer(start, stop, slice_obj.step)
    639             except Exception:
    640                 if _is_index_slice(slice_obj):
/Users/spencerlyon2/anaconda/lib/python2.7/site-packages/pandas/core/index.pyc in slice_indexer(self, start, end, step)
   1178         This function assumes that the data is sorted, so use at your own peril
   1179         """
-> 1180         start_slice, end_slice = self.slice_locs(start, end)
   1181         return slice(start_slice, end_slice, step)
   1182 
/Users/spencerlyon2/anaconda/lib/python2.7/site-packages/pandas/core/index.pyc in slice_locs(self, start, end, strict)
   2224             if not isinstance(start, tuple):
   2225                 start = start,
-> 2226             start_slice = self._partial_tup_index(start, side='left')
   2227 
   2228         if end is None:
/Users/spencerlyon2/anaconda/lib/python2.7/site-packages/pandas/core/index.pyc in _partial_tup_index(self, tup, side)
   2238         if len(tup) > self.lexsort_depth:
   2239             raise KeyError('MultiIndex lexsort depth %d, key was length %d' %
-> 2240                            (self.lexsort_depth, len(tup)))
   2241 
   2242         n = len(tup)
KeyError: 'MultiIndex lexsort depth 0, key was length 1'

What happened in the last example? The problem with taking slices of indexes with string labels is that they labels need to be sorted lexicographically in order for pandas to be able to slice them up. Luckily for us, pandas will sort the index for us.

In [280]: hi_f2 = hi_f2.sort_index()

In [281]: hi_f2
Out[281]: 
                      A         B         C
word  num let                              
hello 10  a   -0.714351  1.865775  0.473833
          b   -1.191303  0.656554 -0.974682
      20  a    0.787085  1.158596 -0.820682
          b    0.963376  0.412781  0.822060
hi    10  a    0.184634  0.259883  0.781823
          b   -1.236951 -1.320457  0.521942
      20  a    0.296985  0.250493  0.346448
          b   -0.680025  0.232254  0.293072
hola  10  a    1.896793 -0.245388 -0.753736
          b   -0.889514 -0.815810 -0.077102
      20  a    0.341152  0.276691  0.827183
          b    0.013002  1.453534 -0.264657

In [282]: hi_f2.ix['hello':'hi']
Out[282]: 
                      A         B         C
word  num let                              
hello 10  a   -0.714351  1.865775  0.473833
          b   -1.191303  0.656554 -0.974682
      20  a    0.787085  1.158596 -0.820682
          b    0.963376  0.412781  0.822060
hi    10  a    0.184634  0.259883  0.781823
          b   -1.236951 -1.320457  0.521942
      20  a    0.296985  0.250493  0.346448
          b   -0.680025  0.232254  0.293072

In [283]: hi_f2.ix[('hello', 10, 'a'):('hi', 20, 'b')]
Out[283]: 
                      A         B         C
word  num let                              
hello 10  a   -0.714351  1.865775  0.473833
          b   -1.191303  0.656554 -0.974682
      20  a    0.787085  1.158596 -0.820682
          b    0.963376  0.412781  0.822060
hi    10  a    0.184634  0.259883  0.781823
          b   -1.236951 -1.320457  0.521942
      20  a    0.296985  0.250493  0.346448
          b   -0.680025  0.232254  0.293072

One very common operation to perform on a hierarchal indexed data set is to take cross sections. This is done using the xs method and is given two arguments: 1.) The name of the row label(s) you would like, 2.) the level at at which they appear in the index.

# Specify level using integer
In [284]: hi_f2.xs(10, level=1)
Out[284]: 
                  A         B         C
word  let                              
hello a   -0.714351  1.865775  0.473833
      b   -1.191303  0.656554 -0.974682
hi    a    0.184634  0.259883  0.781823
      b   -1.236951 -1.320457  0.521942
hola  a    1.896793 -0.245388 -0.753736
      b   -0.889514 -0.815810 -0.077102

# Or specify level using the level name for same result.
In [285]: hi_f2.xs(10, level='num')
Out[285]: 
                  A         B         C
word  let                              
hello a   -0.714351  1.865775  0.473833
      b   -1.191303  0.656554 -0.974682
hi    a    0.184634  0.259883  0.781823
      b   -1.236951 -1.320457  0.521942
hola  a    1.896793 -0.245388 -0.753736
      b   -0.889514 -0.815810 -0.077102

# pass multiple row labels and levels to get all 'hello', 'b' combinations
In [286]: hi_f2.xs(['hello', 'b'], level=[0, 'let'])
Out[286]: 
            A         B         C
num                              
10  -1.191303  0.656554 -0.974682
20   0.963376  0.412781  0.822060

This has been a very brief overview of the main data structures in pandas. We will be using them more later so if you do not feel like you have a good understanding of them, consult the pandas documentation

Return to the main pandas lab page.

This Page