Hi,
What if you had no data at all to build a model? What sort of advice would you provide?
Obviously, your answer will depend on the context. But suppose you were trying to predict the optimal amount of physical activity required each day to remain healthy.
Common sense suggests that doing some activity is better than doing no activity at all.
Now suppose you had all the data you could ever possibly hope to collect. Enough to build the best model the world has ever seen.
What level of improvement, in the quality of your advice, could you realistically hope to see?
Here's the thing...
Nobody ever has no data, or all the data, when building a model. What you have will lie somewhere in between.
Together, these two scenarios bound the range of results you could possibly hope to achieve: the first is the baseline, while the second is the best-case scenario.
No data scientist would ever say "no" to the prospect of gaining more data.
Yet, if the advice you're currently providing, given the data you already have, is already close to the best case, you've gotta ask yourself if collecting that extra data is really worth the effort.
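If you like to see that reasoning as arithmetic, here's a rough sketch in Python. It assumes you can put a number on the quality of your advice (higher is better) for the baseline, your current model and the best case. The function name and the scores are made up purely for illustration.

```python
def share_of_headroom_captured(baseline, current, best_case):
    """Fraction of the baseline-to-best-case gap the current model has closed."""
    headroom = best_case - baseline
    if headroom <= 0:
        raise ValueError("best_case must be greater than baseline")
    return (current - baseline) / headroom


# Hypothetical scores, invented for this example.
captured = share_of_headroom_captured(baseline=0.60, current=0.84, best_case=0.90)
remaining = 1 - captured
print(f"Captured {captured:.0%} of the possible improvement; {remaining:.0%} remains.")
```

The smaller the share that remains, the harder it is to justify the cost of collecting more data.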
Talk again soon,
Dr Genevieve Hayes.
p.s. This post is based on a conversation I recently had with Dr Torri Callan on Value Driven Data Science. You can listen to the entire conversation HERE.