Before I start, to placate readers who were expecting a blog post about panda bears, here’s a picture of pandas at play:
From now on, ‘pandas’ will refer to the Python library, not the bears.
Pandas is a Python library designed to help with data wrangling. I’ve been using it for a few months now, and I can’t shake the nagging feeling that I haven’t quite got the hang of it yet. For all its power and obvious usefulness, there’s something about it that I just find unintuitive. I’ve looked at a few step-by-step tutorials online, such as the one on Kaggle, and it still hasn’t clicked, so I decided to create an IPython Notebook as a reference guide. Initially, I was going to make a rough one for myself, but then I thought I might as well share it, since other people have complained of similar difficulties.
As it’s meant as a quick reference guide and not a tutorial, the notebook itself consists mainly of headers and code snippets, often without much explanation. Where there are caveats, gotchas, or general things to remember, I’ve made additional notes.
I was considering pasting the text from the notebook into this post. However, the notebook will evolve over time as I learn more about pandas, so instead you can look at the most up-to-date version on NBViewer (an online IPython Notebook renderer) or grab it for yourself on GitHub.
- NBViewer version
- GitHub version (or you can visit the full repo, if you want to download the notebook for yourself)
Also, I’ve seen notebooks with dynamic tables of contents at the top, so I’ll try to figure out how to do that at some point, especially if the notebook gets unwieldy.