The gardener's guide to data analysis
Learning: How can you continue your journey as a data analyst after foundational training? Think of it as a long-term project, like tending a garden.
Learn data science like you would learn jazz
Learning: Aspiring analysts want to know how data analysis is done. As it turns out, there are similarities between learning how to analyse data and learning how to play music, especially jazz.
How generalists learn
Learning: One of the most common questions I get from students near the end of a data science course is "what next?". How do you keep learning once you have the basics, especially if you're a generalist?
Why I now call myself a "data generalist"
Data science: I've realised that pigeonholing myself as a data scientist, broad as that term is, doesn't work for me. Finding a good job title for myself is actually non-trivial. If you feel the same way, you may also be a "data generalist", and this post is for you.
Citizen Data Science
Data science: What does "citizen data science" mean beyond being a buzzword for "everyone should do data science" (which they most certainly shouldn't)?
How to use your impostor syndrome to learn anything
Data science: How can you harness your impostor syndrome? Get good at learning anything.
Machine Learning Haikus
Machine learning: Forget "machine learning in plain English". Instead, I present some of the most popular algorithms in haiku form. Consider it "machine learning for the busy".
Analysis: Is Alan Davies Getting Better at QI?
Projects: I was watching a later series of QI recently and couldn't help but notice that Alan Davies was winning quite a few episodes. That prompted me to ask the question: is Alan Davies getting better at QI?
Visualising the Worldwide Win Percentage of the Hungarian National Football Team
Projects: I've often read the advice that side projects should solve problems or answer questions that you yourself are interested in. In that spirit, I've always wanted to know how well the Hungarian national team have done against countries around the world. To explore this question, I scraped the matches played by the Hungarian national team and made an interactive world map.
The World Map of the 2016 FIFA Awards
Projects: A mini project to visualise the votes for the 2016 FIFA Awards, to see which country voted for which player.
Method Chaining in Pandas
Programming for Data Scientists: A discussion of "method chaining" in pandas. Used for better readability, or harder debugging, depending on how you look at it.
Visualising Decision Trees in Python
Machine learning: Having an accurate machine learning model may be enough in itself, but in some cases the only way to turn it into a business decision is to understand why it produces the results it does. In this short tutorial I want to show a quick way to visualise a trained decision tree in Python.
SQL For Data Scientists
Programming for Data Scientists: SQL is a useful part of a data scientist's toolkit, and it can feel like an intimidatingly big area to learn alongside all the other data science concepts. I want to present a few key concepts that are enough to get you up and running with SQL!
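As a rough taste of what "a few key concepts" might look like in practice, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes the core ideas are filtering (`WHERE`), joining (`JOIN`), aggregating (`GROUP BY` with `SUM`) and sorting (`ORDER BY`); the `players` and `votes` tables and their contents are hypothetical toy data, not taken from the post.

```python
import sqlite3

# Hypothetical toy tables in an in-memory database; the data is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE votes (player_id INTEGER, points INTEGER);
    INSERT INTO players VALUES (1, 'Player A', 'X'), (2, 'Player B', 'Y'), (3, 'Player C', 'X');
    INSERT INTO votes VALUES (1, 5), (1, 3), (2, 5), (3, 1);
    """
)

query = """
    SELECT p.country, SUM(v.points) AS total_points   -- aggregate per group
    FROM votes AS v
    JOIN players AS p ON p.id = v.player_id          -- combine the two tables
    WHERE v.points > 1                               -- filter rows before grouping
    GROUP BY p.country                               -- one output row per country
    ORDER BY total_points DESC                       -- sort the result
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('X', 8), ('Y', 5)]
conn.close()
```

Player C's single 1-point vote is removed by the `WHERE` clause before grouping, which is why country X totals 8 rather than 9.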
More on K-means Clustering
Machine learning: In this post I look at a practical example of k-means clustering in action, namely using it to draw puppies. I also touch on a couple of more general points to consider when using clustering.
Introduction to K-means Clustering
Machine learning: An introduction to the popular k-means clustering algorithm with intuition and Python code.
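For a flavour of the algorithm, here is a minimal pure-Python sketch of standard k-means (Lloyd's algorithm): alternately assign each point to its nearest centroid, then move each centroid to the mean of its points, until nothing changes. It is not the code from the post; the function name `kmeans`, the random initialisation from the data points and the toy `points` list are all assumptions for illustration.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Plain Lloyd's-algorithm k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    # Initialise centroids by picking k of the data points at random.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for old, cluster in zip(centroids, clusters):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, clusters


# Two obvious groups of toy 2D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

Real implementations (such as scikit-learn's `KMeans`) add smarter initialisation and multiple restarts, because plain random initialisation can land in a poor local optimum.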
Turning Jupyter Notebooks into Reusable Scripts
Programming for Data Scientists: As part of my commitment to occasionally talk about "programming for data scientists", I want to share ideas that help data scientists focus on the important stuff. In this post I want to share some thoughts on how to make your Jupyter notebooks easier to "productionise".
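One common pattern for this (not necessarily the approach taken in the post) is to move cell logic into functions, expose hard-coded parameters as command-line arguments with `argparse`, and guard the entry point with `if __name__ == "__main__":`. The sketch below assumes a hypothetical script called `summarise.py`; the function names are illustrative only.

```python
from __future__ import annotations

import argparse
import statistics


def summarise(values: list[float]) -> dict[str, float]:
    """Logic that used to live in a notebook cell, now importable and testable."""
    if not values:
        return {"count": 0}
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    # Values that were hard-coded in the notebook become command-line arguments.
    parser = argparse.ArgumentParser(description="Summarise a list of numbers.")
    parser.add_argument("values", nargs="*", type=float, help="numbers to summarise")
    return parser.parse_args(argv)


def main(argv: list[str] | None = None) -> dict[str, float]:
    args = parse_args(argv)
    result = summarise(args.values)
    print(result)
    return result


if __name__ == "__main__":
    # Runs only when executed as a script (e.g. `python summarise.py 1 2 3`),
    # not when a notebook or another module imports these functions.
    main()
```

Because `summarise` no longer depends on notebook state, the same function can be imported back into the notebook for exploration and called from a scheduler in production.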
Duck Typing
Programming for Data Scientists: My first attempt to bridge the gap between the two disciplines of programming and data science, by talking about programming concepts useful for data scientists, and vice versa. Today: duck typing.
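To make the idea concrete: duck typing means a function cares about what an object can *do*, not what type it *is* ("if it quacks like a duck..."). This minimal sketch, with a hypothetical `mean` function and a hypothetical `SensorReadings` class, accepts anything that can be iterated over, with no `isinstance` checks.

```python
from typing import Iterable


def mean(values: Iterable[float]) -> float:
    # Duck typing: no isinstance() checks. Anything we can loop over whose
    # items support + and / is accepted: a list, tuple, range, generator,
    # or a custom class that implements __iter__.
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    if count == 0:
        raise ValueError("mean() of an empty iterable")
    return total / count


class SensorReadings:
    """Hypothetical container: not a list, but it 'quacks' like an iterable."""

    def __init__(self, *readings: float) -> None:
        self._readings = readings

    def __iter__(self):
        return iter(self._readings)


print(mean([1, 2, 3]))                   # 2.0
print(mean(range(5)))                    # 2.0
print(mean(x * x for x in range(3)))     # 1.6666666666666667
print(mean(SensorReadings(10.0, 20.0)))  # 15.0
```

The design trade-off is that errors surface later: passing something non-iterable fails inside the loop rather than at an explicit type check.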
How to Connect to Google Sheets in Python
Data science: A quick tutorial on how to connect to Google Sheets in Python, so you can access a spreadsheet like a regular CSV file.
Markov Chains for Text Generation
Machine learning: Markov chains are a popular way to model sequential data. I want to run through an implementation where I generate new songs based on lyrics by Muse.
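The core idea fits in a few lines. Below is a minimal word-level sketch, assuming each word depends only on the word before it (a first-order chain); it is not the implementation from the post, and the corpus is placeholder text rather than real lyrics. The function names `build_chain` and `generate` are hypothetical.

```python
import random
from collections import defaultdict


def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to every word that follows it in the corpus.

    Repeated followers are kept, so common transitions are picked more often.
    """
    words = text.split()
    chain: dict[str, list[str]] = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain: dict[str, list[str]], start: str, length: int, seed=None) -> str:
    """Random walk through the chain, starting from `start`."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: this word was never followed by anything
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


corpus = "the cat sat on the mat and the cat ran"  # placeholder text, not real lyrics
chain = build_chain(corpus)
print(generate(chain, "the", length=8, seed=42))
```

Here `chain["the"]` is `["cat", "mat", "cat"]`, so after "the" the walk picks "cat" twice as often as "mat". Conditioning on the previous two or three words instead of one gives more coherent output at the cost of copying the source more closely.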
Why You Should Reinvent the Machine Learning Wheel
Machine learning: As data scientists we spend a lot of our time using other people's implementations of machine learning algorithms. I suggest that as part of the learning process it's worthwhile to try to implement them ourselves from scratch, in order to fully understand them.
Realistic Machine Learning
Machine learning: As most data scientists quickly realise, there's a difference between the kind of data science you do while learning about it, and the kind you do at a real job. This is equally true of data cleaning/wrangling and machine learning.
"Intuition First" Machine Learning
Machine learning: I've often felt machine learning needs to be taught "intuition first, equations later", but this doesn't seem to be the norm with most learning sources.
Self-Organising Maps: In Depth
Machine learning: In Part 1, I introduced the concept of Self-Organising Maps (SOMs). Now in Part 2 I want to step through the process of training and using a SOM, covering both the intuition and the Python code. At the end I'll also present a couple of real-life use cases, not just the toy example we'll use for implementation.
Self-Organising Maps: An Introduction
Machine learning: When you learn about machine learning techniques, you usually get a selection of the usual suspects. In this post I want to introduce an often-overlooked but (I think) very interesting and useful idea: the Self-Organising Map.
The Junk in Fallout 4 - a Web Scraping Tutorial
Projects: This is a short web scraping tutorial based on a script I wrote to fetch and analyse data about junk in the game Fallout 4.
Analysing London House Prices
Projects: London is expensive. So much so that it's a trope now for those of us who live here. But what does the data show? Are things getting better or worse? How did the 2008 recession affect behaviour, for example? I wanted to find out. With data.
Unsure how to start or continue your learning journey? Start here.
Explore ways to build enduring data skills beyond the basics.
About David
I'm a freelance data scientist, consultant, and educator with an MSc in Data Science and a background in software and web development. I'm a generalist; my previous roles span data science, software development, and software architecture.
Things I also do:
- I co-host the Half Stack Data Science podcast where we talk about the realities of data science in the business world
- I've written various articles and tutorials about data science
- I've given a selection of talks at large conferences and universities, all on the common theme of "real-world data science"
- I occasionally stream some data science over on Twitch, where I take a vague project idea, a dataset, and try to come up with an answer in about an hour, explaining the code and thought process as I go.
Contact me
The best way to get in touch with me is to email me at hello@davidasboth.com