The gardener's guide to data analysis

Learning

How can you continue your journey as a data analyst after foundational training? Think of it as a long-term project, like tending a garden.

Learn data science like you would learn jazz

Learning

Aspiring analysts want to know how data analysis is done. As it turns out, there are similarities between learning how to analyse data and learning how to play music, especially jazz.

How generalists learn

Learning

One of the most common questions I get from students near the end of a data science course is "what next?". How do you keep learning once you have the basics, especially if you're a generalist?

Why I now call myself a "data generalist"

Data science

I've realised that pigeonholing myself as a data scientist, broad as that term is, doesn't work for me. Finding a good job title for myself turns out to be non-trivial. If you feel the same way, you may also be a "data generalist", and this post is for you.

Citizen Data Science

Data science

What does "citizen data science" mean beyond being a buzzword for "everyone should do data science" (which they most certainly shouldn't)?

How to use your impostor syndrome to learn anything

Data science

How can you harness your impostor syndrome? Get good at learning anything.

Machine Learning Haikus

Machine learning

Forget "machine learning in plain English". Instead, I present some of the most popular algorithms in haiku form. Consider it "machine learning for the busy".

Analysis: Is Alan Davies Getting Better at QI?

Projects

I was watching a later series of QI recently and couldn't help but notice that Alan Davies was winning quite a few episodes. That prompted me to ask the question: is Alan Davies getting better at QI?

Visualising the Worldwide Win Percentage of the Hungarian National Football Team

Projects

I've often read the advice that side projects should solve problems or answer questions that you yourself are interested in. To that end, I've always wanted to know how well the Hungarian national team have done against various countries worldwide. To explore this question, I scraped the matches played by the Hungarian national team and made an interactive world map.

The World Map of the 2016 FIFA Awards

Projects

A mini project to visualise the votes for the 2016 FIFA Awards, to see which country voted for which player.

Method Chaining in Pandas

Programming for Data Scientists

A discussion of "method chaining" in pandas. It makes code more readable, or harder to debug, depending on how you look at it.

Visualising Decision Trees in Python

Machine learning

Having an accurate machine learning model may be enough in itself, but in some cases the only way to turn it into a business decision is to understand why it produces the results it does. In this short tutorial I want to show a quick way to visualise a trained decision tree in Python.

SQL For Data Scientists

Programming for Data Scientists

SQL is a useful part of a data scientist's toolkit, and it can feel like an intimidatingly big area to learn alongside all the other data science concepts. I want to present a few key concepts that are enough to get you up and running with SQL!

More on K-means Clustering

Machine learning

In this post I look at a practical example of k-means clustering in action, namely to draw puppies. I also touch on a couple of more general points to consider when using clustering.

Introduction to K-means Clustering

Machine learning

An introduction to the popular k-means clustering algorithm with intuition and Python code.

Turning Jupyter Notebooks into Reusable Scripts

Programming for Data Scientists

As part of my commitment to occasionally talk about "programming for data scientists", I want to share ideas that help data scientists spend less time on plumbing and more on the important stuff. In this post I want to share some thoughts on how to make your Jupyter notebooks easier to "productionise".

Duck Typing

Programming for Data Scientists

My first attempt to bridge the gap between the two disciplines of programming and data science, by talking about programming concepts useful for data scientists, and vice versa. Today: duck typing.

How to Connect to Google Sheets in Python

Data science

A quick tutorial on how to connect to Google Sheets in Python, so you can access a sheet like a regular CSV file.

Markov Chains for Text Generation

Machine learning

Markov chains are a popular way to model sequential data. I want to run through an implementation that generates new song lyrics based on lyrics by Muse.

Why You Should Reinvent the Machine Learning Wheel

Machine learning

As data scientists we spend a lot of our time using other people's implementations of machine learning algorithms. I suggest that as part of the learning process it's worthwhile to try to implement them ourselves from scratch, in order to fully understand them.

Realistic Machine Learning

Machine learning

As most data scientists quickly realise, there's a difference between the kind of data science you do while learning about it, and the kind you do at a real job. This is equally true of data cleaning/wrangling and machine learning.

"Intuition First" Machine Learning

Machine learning

I've often felt machine learning needs to be taught "intuition first, equations later", but this doesn't seem to be the norm in most learning resources.

Self-Organising Maps: In Depth

Machine learning

In Part 1, I introduced the concept of Self-Organising Maps (SOMs). Now in Part 2 I want to step through the process of training and using a SOM, covering both the intuition and the Python code. At the end I'll also present a couple of real-life use cases, not just the toy example we'll use for implementation.

Self-Organising Maps: An Introduction

Machine learning

When you learn about machine learning techniques, you usually get a selection of the usual suspects. In this post I want to introduce an often-overlooked, but (I think) very interesting and useful idea – a Self-Organising Map.

The Junk in Fallout 4 - a Web Scraping Tutorial

Projects

This is a short web scraping tutorial based on a script I wrote to fetch and analyse data about junk in the game Fallout 4.

Analysing London House Prices

Projects

London is expensive. So much so that it's a trope now for those of us who live here. But what does the data show? Are things getting better or worse? How did the 2008 recession affect behaviour, for example? I wanted to find out. With data.

Unsure how to start or continue your learning journey? Start here.

Explore ways to build enduring data skills beyond the basics.

About David

I'm a freelance data scientist, consultant, and educator with an MSc in Data Science and a background in software and web development. I'm a generalist; my previous roles have spanned data science, software development, and software architecture.

Things I also do:

  • I co-host the Half Stack Data Science podcast where we talk about the realities of data science in the business world
  • I've written various articles and tutorials about data science
  • I've given talks at large conferences and universities, all on similar topics of "real world data science"
  • I occasionally stream some data science over on Twitch, where I take a vague project idea, a dataset, and try to come up with an answer in about an hour, explaining the code and thought process as I go.

Contact me

The best way to get in touch with me is to email me at hello@davidasboth.com

I'm also on LinkedIn and Bluesky

Join my newsletter

Subscribe to get my latest articles by email and updates on my book and the podcast.

    I won't send you spam. Unsubscribe at any time.