Critical Assessment of Function Annotation

The Critical Assessment of Function Annotation (CAFA) is an ongoing community-driven experiment designed to evaluate computational methods for protein function prediction. Organized as a recurring challenge since 2010, CAFA aims to improve the accuracy, transparency, and benchmarking of algorithms that predict the biological function of proteins, using ontologies such as the Gene Ontology (GO). By fostering open and rigorous assessments, CAFA has become a central benchmark in computational biology and bioinformatics.

Overview

CAFA assesses methods by comparing predictions made by participating teams against experimentally determined annotations that accumulate over time in public protein databases. Predictions are submitted blindly before a predefined target accumulation period, during which newly curated experimental data becomes available. Because the annotations used for scoring do not yet exist when predictions are submitted, methods can be evaluated objectively, without bias from known annotations. The goal is to assign labels from the Gene Ontology, a structured vocabulary describing protein function, to the target proteins.
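The time-delayed evaluation described above can be illustrated with a minimal Python sketch. It is not the official CAFA evaluation code: the function names `build_benchmark` and `precision_recall`, the data layout, and the toy annotations are all assumptions made for illustration, and the sketch ignores the propagation of terms through the ontology hierarchy. It compares two annotation snapshots, keeps the terms that were gained during the accumulation period, and scores thresholded predictions against them using protein-averaged precision and recall.

```python
from typing import Dict, Set, Tuple

# Hypothetical data layout: protein ID -> set of GO term IDs.
Annotations = Dict[str, Set[str]]
# Hypothetical data layout: protein ID -> {GO term ID: confidence score}.
Predictions = Dict[str, Dict[str, float]]


def build_benchmark(before: Annotations, after: Annotations) -> Annotations:
    """Keep, per protein, only the terms gained between the two snapshots."""
    benchmark: Annotations = {}
    for protein, terms in after.items():
        gained = terms - before.get(protein, set())
        if gained:
            benchmark[protein] = gained
    return benchmark


def precision_recall(
    predictions: Predictions, benchmark: Annotations, threshold: float
) -> Tuple[float, float]:
    """Protein-averaged precision and recall at one score threshold.

    Precision is averaged over benchmark proteins with at least one
    prediction at or above the threshold; recall is averaged over all
    benchmark proteins.
    """
    precisions = []
    recall_sum = 0.0
    for protein, truth in benchmark.items():
        scored = predictions.get(protein, {})
        predicted = {term for term, score in scored.items() if score >= threshold}
        hits = len(predicted & truth)
        if predicted:
            precisions.append(hits / len(predicted))
        recall_sum += hits / len(truth)
    precision = sum(precisions) / len(precisions) if precisions else 0.0
    recall = recall_sum / len(benchmark) if benchmark else 0.0
    return precision, recall


# Toy snapshots (illustrative only): annotations at the submission deadline
# and after the accumulation period.
before = {"P1": {"GO:0003677"}, "P2": set()}
after = {
    "P1": {"GO:0003677"},
    "P2": {"GO:0005524", "GO:0016301"},
    "P3": {"GO:0005515"},
}

# Toy blind predictions, submitted before the accumulation period.
predictions = {
    "P2": {"GO:0005524": 0.9, "GO:0003677": 0.6},
    "P3": {"GO:0005515": 0.4},
}

benchmark = build_benchmark(before, after)
precision, recall = precision_recall(predictions, benchmark, threshold=0.5)
```

In this toy example, P1 gains no new terms and so drops out of the benchmark, while P2 and P3 are scored only on annotations that did not exist when the predictions were made.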

Over the years, CAFA has included additional subchallenges such as phenotype prediction and the prediction of disease-associated genes.

History

CAFA1 (2010–2011)

CAFA1 was the inaugural challenge, launched in 2010 with results published in Nature Methods in 2013. It established the baseline for method performance and popularized the use of time-delayed evaluation in function prediction. CAFA1 demonstrated that state-of-the-art methods outperformed basic sequence-similarity-based methods (such as BLAST) but also highlighted that overall performance still lagged behind curated annotations.

CAFA2 (2013–2014)

Building on CAFA1, CAFA2 increased the scale and diversity of target proteins. It introduced improved metrics, including customized semantic precision-recall-based scores. This round demonstrated that ensemble methods and domain-specific predictors had improved considerably. The results were published in Genome Biology in 2016.

CAFA3 (2016–2017)

CAFA3 marked a major milestone by incorporating large-scale experimental validation into the assessment pipeline. Collaborating with experimental labs, the CAFA3 organizers tested top predictions in Candida albicans, Pseudomonas aeruginosa, and Drosophila melanogaster. This direct validation approach provided biological insights and uncovered novel gene functions. Results were published in Genome Biology in 2019.

CAFA4 (2019–2020)

CAFA4 expanded its experimental reach further and introduced new model organisms. It featured more extensive phenotype prediction tasks and incorporated community-driven annotations from various resources. Methodologies involving deep learning and protein language models began to gain prominence. CAFA4 also laid the groundwork for integrative approaches combining sequence, structure, and network data.

CAFA5 (2023)

CAFA5, the most recent iteration, was held as a challenge on the Kaggle website, which dramatically increased the number of participants. The challenge saw significant performance gains across multiple function prediction categories. It also introduced new benchmarking tasks for pathogens and environmental samples. Preliminary results were presented in 2024, with a comprehensive publication expected in 2025.

External links

Automated Function Prediction Special Interest Group - CAFA Challenge participation information