Pandas: a Quick Reference Guide

Before I start, to placate readers who were expecting a blog post about panda bears, here’s a picture of pandas at play:

[Image: pandas at play]
Panda bears – more mysterious than the Python library? Certainly cuter.

From now on, ‘pandas’ will refer to the Python library, not the bears.

Motivation

Pandas is a Python library designed to help with data wrangling. I’ve been using it for a few months now, and I can’t shake the nagging feeling that I haven’t quite got the hang of it yet. For all its power and obvious usefulness, there’s something about it that I find unintuitive. I worked through a few step-by-step tutorials online, such as the one on Kaggle, but it still hadn’t clicked, so I decided to create an IPython Notebook as a reference guide. Initially, I was going to make a rough one just for myself, but I decided to share it, since other people have complained of similar difficulties.

The Notebook

As it’s meant to be a quick reference guide and not a tutorial, the notebook itself consists mainly of headers and code snippets, often without much explanation. Where there are caveats, gotchas, or general things to remember, I’ve added notes.

I was considering pasting the text from the notebook into this post. However, it will evolve over time as I learn more about pandas, so instead, you can look at the most up-to-date version on NBViewer (an online IPython Notebook renderer) or grab it for yourself on GitHub.

Also, I’ve seen notebooks with dynamic tables of contents at the top, so I’ll try to figure out how to do that at some point, especially if the notebook gets unwieldy.
