Hi,
Walk along the aisles of your local supermarket and almost every item you come across will include an ingredient list somewhere on its packaging.
Depending on the item, the ingredient list may even go into detail about where the ingredients come from or how they were produced. For example: "fair trade cocoa", "dairy-free chocolate" or "Australian-grown pineapple".
Even homemade cakes sold at charity bake sales now commonly come with a handwritten ingredient list taped to the bottom of the plate.
Yet, when it comes to AI-based tools, the creators are frequently silent about the composition of the datasets used to train these new technologies.
I assume that GPT-4 (one of the models powering ChatGPT) was trained using text scraped from millions of websites, but as to which websites were used and how exactly they contribute to a given answer? Well, your guess is as good as mine.
Here's the thing...
Whether it's for ethical reasons, health reasons or just to support local industry, people like to know what goes into the items they consume.
And as the flourishing markets for everything from free-range eggs to electric cars show, people are willing to pay extra, depending on what the ingredient list says.
Savvy manufacturers have already realised that an "ends justify the means" attitude is no longer necessarily the best way to run a successful business.
How long will it take for the big tech companies to realise the same?
Talk again soon,
Dr Genevieve Hayes.
P.S. I recently spoke to Consumer Data Advocate Dr Kate Bower about the responsible sourcing of data for building AI models. You can hear the entire conversation HERE.