Citizen Data Science
This was a popular phrase a few years back. At the peak of the data science hype cycle we somehow managed to go beyond "everyone needs data science" and start convincing ourselves that, in fact, everyone needs to be a data scientist. There was also a lot of pushback on this. After all, why should Debbie from Finance learn Python? Why should our salespeople be able to tell the difference between supervised and unsupervised learning? Clearly this was one step too far, or so we thought. At the time something bugged me about this particular debate, but I wasn't sure what. I agreed that data science should largely be restricted to its own function, not "democratised" just because it was becoming popular. My motivation wasn't gatekeeping, but a separation of concerns.
What I've since realised, though, is that some of the skills necessary for data science, namely the ones that make a data scientist a good analyst, are actually useful across the board. Much as computer literacy is a necessary part of life in the 21st century, knowing a few data science fundamentals will only become more important. Specifically, there are two areas where I believe widespread education would be helpful: data literacy and an automation mindset.
Data literacy
What is data literacy exactly? There need not be a societal push to teach everyone confidence intervals and probability theory. You can be data literate without linear algebra. What most people need is an intuition to always be sceptical about data-driven findings, question every result, and put it in the wider context. Does this chart show what it claims to show? Do these reported numbers make sense, and if they seem plausible, is the process that generated them trustworthy and repeatable? Are there truly fewer Covid-19 cases over the weekend, or is it just that numbers aren't updated and reported until Monday?
All we need to do is let people flex this particular muscle. Discuss case studies, ones that contain realistic ambiguity. Most business decisions are made under uncertainty, regardless of the quality and quantity of data analysis. We need to better understand and embrace this. Data will not tell us what to do; it can only give us suggestions, and we need to be equipped to make the most of these. I'd argue this isn't a skill that only the biggest decision makers, the C-level execs, need. It is something we should all get comfortable with.
So that's fine, our "citizens" don't need mathematical training, just an intuition for how to use data in the right way. But what about technical skills? How much programming, for example, do people need? Clearly not everyone needs to know how to train and deploy a machine learning model (although exposure to this process can't hurt, if only to help demystify it), but we also can't ignore the need for some technical training that people can genuinely benefit from. The balance is struck, in my opinion, when we focus on teaching programming for automation.
Automation
Automation is a superpower. If you have any sort of repetitive task that is part of your day-to-day, business-as-usual work, imagine how great it would be if the computer could do it for you. No, it wouldn't result in the computers taking away your job. I know people genuinely fear that, I get it. The right way to think about it is that automating the boring stuff (see the book and online course Automate the Boring Stuff with Python) frees you, the human, up to do the things that computers can't. Build human relationships. Consider nuance. See the bigger picture.
Students of mine have already heard me on my soapbox about this. Learn just enough programming (Python or otherwise) to automate the boring stuff. The more practice you have with this skill, the more you will see it everywhere, and the lower your tolerance will be for doing repetitive tasks manually. You should get annoyed when a stakeholder demands a particular report emailed to them every Monday. Find a way to take yourself out of the loop. Got a bunch of spreadsheets to download from different web pages and collate once a week? Write a script for it.
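To give a flavour of how little code that last one can take, here is a minimal sketch of the collating half of the job. It assumes the spreadsheets have already been saved as CSV files in one folder; the function name collate_csvs and the source_file column are made up for illustration, and a real version would need to deal with whatever quirks your own files have.

```python
import csv
from pathlib import Path


def collate_csvs(input_dir, output_path):
    """Combine every CSV in input_dir into one file and return the row count.

    Hypothetical helper for illustration: assumes each file has a header row.
    """
    input_dir = Path(input_dir)
    output_path = Path(output_path)
    rows = []
    fieldnames = ["source_file"]  # extra column recording where each row came from

    for csv_path in sorted(input_dir.glob("*.csv")):
        # Skip the combined file itself if it lives in the same folder.
        if csv_path.resolve() == output_path.resolve():
            continue
        with csv_path.open(newline="") as handle:
            reader = csv.DictReader(handle)
            # Collect the union of all column names, in first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                row["source_file"] = csv_path.name
                rows.append(row)

    with output_path.open("w", newline="") as handle:
        # Columns missing from a given file are left blank; stray extra
        # cells beyond the header are ignored.
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)

    return len(rows)
```

Calling collate_csvs("downloads", "combined.csv") would then replace the weekly copy-and-paste session. The downloading and the scheduling are separate steps, and how you do those depends on the sites and the machine you are working with.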
So what do we call this thing?
Let's not call it "citizen data science"; that sounds patronising to me. It probably doesn't even need a name (but then how will it ever go viral, David??). After the somewhat misguided idea of citizen data science came and went, we're left with some key takeaways. There is a reason to take elements of data science beyond the data science team. We don't want any gatekeeping ("only data scientists should analyse data"), and everyone benefits from data literacy and some basic coding knowledge.
About David
I'm a freelance data science consultant and educator with an MSc in Data Science and a background in software and web development. My previous roles have spanned data science, software development, team management and software architecture.