Hi,
Suppose you are a data scientist working in a hospital and have produced two models to test patients for a fatal disease:
- Model A achieves a test accuracy score of 95% across the entire population that the hospital serves but can only achieve 75% accuracy for one particular demographic subgroup that makes up 10% of the community.
- Model B, on the other hand, achieves a test accuracy score of 85% across the entire population and within all demographic subgroups served.
Which model should you recommend the hospital adopt for diagnostic use?
👇
👇
👇
👇
👇
👇
This problem (which I first came across online a while back, though I can no longer find the source) is an ML update of the classic "trolley problem" ethical thought experiment. Data scientists are asked to decide whether it is preferable to use a model that performs better overall at the expense of a small minority, or one that performs less well overall but treats everyone the same.
It's an interesting discussion to have. Yet it misses one key point.
In the original trolley problem, subjects are asked to choose who to save from certain death under a runaway tram in an either/or scenario. But in data science, as in the rest of life, such situations are rare.
We can use multiple models to solve a problem, and that's typically how data scientists achieve the best results. Here, that could involve using Model B for the demographic subgroup and Model A for everyone else.
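To put rough numbers on that, here is a minimal Python sketch using only the figures quoted above. It rests on a few assumptions of mine: that the test accuracies carry over to real patients, that we know at prediction time whether a patient belongs to the subgroup, and that Model B really does score 85% within that subgroup. The variable names and the choose_model helper are purely illustrative.

```python
# Figures quoted in the scenario above.
overall_a = 0.95       # Model A, whole population
subgroup_a = 0.75      # Model A, the affected subgroup
accuracy_b = 0.85      # Model B, every subgroup
subgroup_share = 0.10  # subgroup's share of the community

# Model A's accuracy on the other 90% of patients, implied by the figures above:
# overall_a = (1 - share) * rest_a + share * subgroup_a
rest_a = (overall_a - subgroup_share * subgroup_a) / (1 - subgroup_share)

# Hybrid: Model B for the subgroup, Model A for everyone else.
hybrid_overall = (1 - subgroup_share) * rest_a + subgroup_share * accuracy_b


def choose_model(in_subgroup: bool) -> str:
    """Hypothetical routing rule: pick a model based on subgroup membership."""
    return "B" if in_subgroup else "A"


print(f"Model A on the other 90%: {rest_a:.1%}")
print(f"Hybrid overall accuracy:  {hybrid_overall:.1%}")
```

Under those assumptions, the combined approach comes out at roughly 96% overall, slightly ahead of Model A alone, while the subgroup gets Model B's 85% rather than 75%.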
One of the big promises of AI is greater personalisation than we've ever seen before. It makes little sense for that to be achieved by forcing your model to be one size fits all.
Talk again soon,
Dr Genevieve Hayes.