Python

Python/pandas: Column value in list

I’ve been using Python’s pandas library while exploring some CSV files and although for the most part I’ve found it intuitive to use, I had trouble filtering a data frame based on checking whether a column value was in a list.

A subset of one of the CSV files I’ve been working with looks like this:
 
 
 
 
 
 

$ cat foo.csv
'Foo'
1
2
3
4
5
6
7
8
9
10

Loading it into a pandas data frame is reasonably simple:

import pandas as pd
df = pd.read_csv('foo.csv', index_col=False, header=0)
>>> df
   Foo
0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
8    9
9   10

If we want to find the rows which have a value of 1 we’d write the following:

>>> df[df['Foo'] == 1]
   Foo
0    1

Finding the rows with a value less than 7 is as you’d expect too:

>>> df[df['Foo'] < 7]
   Foo
0    1
1    2
2    3
3    4
4    5
5    6

Next I wanted to filter out the rows containing odd numbers which I initially tried to do like this:

odds = [i for i in range(1,10) if i % 2 <> 0]
>>> odds
[1, 3, 5, 7, 9]
 
>>> df[df['Foo'] in odds]
Traceback (most recent call last):
  File '<stdin>', line 1, in <module>
  File '/Users/markneedham/projects/neo4j-himym/himym/lib/python2.7/site-packages/pandas/core/generic.py', line 698, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Unfortunately that doesn’t work and I couldn’t get any of the suggestions from the error message to work either. Luckily pandas has a special isin function for this use case which we can call like this:

>>> df[df['Foo'].isin(odds)]
   Foo
0    1
2    3
4    5
6    7
8    9

Much better!

Reference: Python/pandas: Column value in list from our WCG partner Mark Needham at the Mark Needham Blog blog.
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Inline Feedbacks
View all comments
Back to top button