Hi,
When you're dealing with a large population, averages and probabilities make sense.
It's perfectly valid for the average family to have 2.3 kids or for a person to have a 50% chance of dying before age 85.
But when your population size drops to 1, statistics fall apart.
It doesn't make sense to have 0.3 of a kid, and except for Schrödinger's cat, nobody can be 50% dead.
Life insurance pricing calculations work because insurers don't care about individual outcomes - only the outcome for their entire portfolio of policyholders.
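To make that concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that every policyholder independently has the 50% chance of dying before age 85 mentioned above. The function name, the portfolio size of 100,000 and the seeds are all made up for this example and are not taken from any real insurer's pricing model.

```python
import random


def simulate_death_fraction(n_people, p_death=0.5, seed=0):
    """Return the fraction of n_people who die before age 85.

    Hypothetical toy model: each person dies before 85 independently
    with probability p_death (an assumption for illustration only).
    """
    rng = random.Random(seed)
    deaths = sum(1 for _ in range(n_people) if rng.random() < p_death)
    return deaths / n_people


# A large portfolio: the observed fraction settles close to p_death.
portfolio_fraction = simulate_death_fraction(100_000)

# A "population" of one, repeated with different seeds:
# each individual outcome can only ever be 0.0 or 1.0.
individual_outcomes = {simulate_death_fraction(1, seed=s) for s in range(20)}

print(f"Portfolio of 100,000: {portfolio_fraction:.3f}")
print(f"Individual outcomes seen: {sorted(individual_outcomes)}")
```

For the portfolio, the fraction lands very near 0.5, which is all the insurer needs. For any one person, the answer is only ever 0 or 1, and the 50% figure says nothing about which.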
Yet, if you want to determine the total savings needed to fund your own retirement, problems arise.
As Todd Tresidder points out in How Much Money Do I Need to Retire?:
"The day you die is not a probabilistic outcome. Your individual lifespan has no statistical validity."
The difference comes down to context.
Here's the thing...
A golden rule of data science is that models should only ever be used in their intended context - outside that context, all bets are off.
You wouldn't use a model built for US stock market predictions to predict future movements of the Australian Stock Exchange, for this very reason.
Statistics rely on large populations for validity, so if your model is based on statistics (as most machine learning models are), that is the context in which it must be used.
The use of a statistical model outside that context is fundamentally flawed.
But what if you are interested in making predictions for a population of only one?
Well then, don't use statistics.
Not every problem can be solved using statistics. Being able to differentiate those that can from those that can't is part of your job as a data scientist.
Talk again soon,
Dr Genevieve Hayes.
P.S. I recently had the opportunity to interview Todd Tresidder on Value Driven Data Science. You can listen to our conversation HERE.