Hi,
Have you ever noticed that the models built from "standard" datasets in machine learning textbooks and courses always perform so much better than any of the models you try to build using the same techniques in real life?
I always wondered what it was about these datasets (such as the Iris, Wine, and Cat and Dog Image datasets) that made them work so well, but only recently did I figure it out.
The datasets used in ML textbooks and courses almost always focus exclusively on physical traits.
In the Iris dataset, it's measurements of flower parts; in the Wine dataset, it's the chemical components of wine. These characteristics are objective and the relationships between these features and the target are static. This makes the patterns easy to describe.
It's when human behaviour enters the mix that things get messy. Yet, from a business perspective, it's social processes based entirely on human behaviour (such as buying a product or cancelling a membership) that often matter the most.
And this is where predictive modelling can fall apart.
However, it's also where techniques drawn from the social sciences, such as causal inference, specifically designed for understanding human behaviour, can assist.
I recently had the opportunity to talk to Joanne Rodrigues, author of Product Analytics, about how social science techniques can increase your impact as a data scientist.
You can listen to our entire conversation here: Episode 47: Leveraging Causal Inference to Drive Business Value in Data Science.
Talk again soon,
Dr Genevieve Hayes.