In the AI science boom, beware: your results are only as good as your data

We are in the middle of a data-driven science boom. Huge, complex data sets, often with large numbers of individually measured and annotated ‘features’, are fodder for voracious artificial intelligence (AI) and machine-learning systems, with details of new applications being published almost daily.

But publication in itself is not synonymous with factuality. Just because a paper, method or data set is published does not mean that it is correct and free from mistakes. Without checking for accuracy and validity before using these resources, scientists will surely encounter errors. In fact, they already have.

In the past few months, members of our bioinformatics and systems-biology laboratory have reviewed state-of-the-art machine-learning methods for predicting the metabolic pathways that metabolites belong to, on the basis of the molecules’ chemical structures¹. We wanted to find, implement and potentially improve the best methods for identifying how metabolic pathways are perturbed under different conditions: for instance, in diseased versus normal tissues.

Lifeboat Foundation

Safeguarding Humanity

Blog

Feb 2, 2024