Hi,
The trouble with AutoML is that it automates the part of model building people actually enjoy, while leaving the "boring bits" up to you.
What data scientist doesn't love training models? That's why Kaggle competitions are so popular.
Yet, as the data-centric ML movement has shown, greater gains can often be achieved by focussing on improving data quality and context-aware feature engineering, rather than on hyperparameter tuning and algorithm selection.
Unfortunately, data quality improvement and feature engineering are exactly the areas where AutoML struggles most, and exactly the aspects of model building no data scientist actually wants to do.
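To make "data quality improvement and context-aware feature engineering" concrete, here's a minimal sketch of the kind of prep an AutoML tool typically won't do for you. The dataset, column names, and imputation choices are all hypothetical, just for illustration:

```python
import pandas as pd
import numpy as np

# Hypothetical raw data with the quality issues AutoML passes through untouched.
raw = pd.DataFrame({
    "city": ["Sydney", "sydney ", "SYDNEY", "Melbourne"],
    "income": [72_000, np.nan, 68_000, 81_000],
    "signup_date": ["2023-01-05", "2023-02-10", "2023-01-20", "2023-03-01"],
})

# 1. Fix inconsistent category labels (three spellings of one city).
raw["city"] = raw["city"].str.strip().str.title()

# 2. Impute missing income with the median for that city --
#    a context-aware choice, not a blind global fill.
raw["income"] = raw.groupby("city")["income"].transform(
    lambda s: s.fillna(s.median())
)

# 3. Engineer a feature from domain context: customer tenure in days,
#    measured against an (assumed) snapshot date.
raw["signup_date"] = pd.to_datetime(raw["signup_date"])
raw["tenure_days"] = (pd.Timestamp("2023-04-01") - raw["signup_date"]).dt.days

print(raw)
```

None of these steps is clever, but each one needs a human who understands what the columns mean; no amount of hyperparameter tuning will recover that context afterwards.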
Here's the thing...
Like LLMs, AutoML is yet another tech tool designed to augment human capabilities, rather than replace them.
The problem is, when tech augments the parts of a task humans want to do and leaves the parts they don't, it can be tempting to turn a blind eye and allow the tech to do "everything" for you.
Being able to fit 39 models in a matter of minutes does have a certain appeal. But what use are 39 models if none of them produce acceptable results?
Garbage data produces garbage models regardless of who or what does the fitting.
If you want to fit models fast, use AutoML. But if you want to produce models that are good, AutoML isn't a substitute for good ol' fashioned human data prep.
Talk again soon,
Dr Genevieve Hayes.