To understand AI capabilities across these cognitive abilities, we propose a three-stage evaluation protocol that benchmarks system performance in relation to human capabilities:
1. Evaluate AI systems across a broad suite of cognitive tasks covering each ability, using held-out test sets to prevent data contamination
2. Collect human baselines for the same tasks from a demographically representative sample of adults
3. Map each AI system’s performance relative to the distribution of human performance in each ability
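The final mapping step can be sketched with a simple percentile computation. This is a minimal illustration, not the protocol's actual scoring code: the helper name `percentile_of` and the task scores are hypothetical, and it assumes higher scores are better on every task.

```python
def percentile_of(ai_score, human_scores):
    """Fraction of human scores the AI score meets or exceeds.

    Hypothetical helper: places one AI result within the empirical
    distribution of human results on the same task.
    """
    return sum(1 for h in human_scores if h <= ai_score) / len(human_scores)


# Made-up human accuracies on a single task, plus one AI result
human_scores = [0.52, 0.58, 0.60, 0.61, 0.65, 0.68, 0.70, 0.74]
ai_score = 0.66

pct = percentile_of(ai_score, human_scores)
print(f"AI system at the {pct:.1%} percentile of human performance")
```

In practice one would aggregate such per-task percentiles within each ability (and report uncertainty from the human sample size), but the core idea is the same: the unit of comparison is the human distribution, not a single pass/fail threshold.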
Going from theory to practice
Defining these cognitive abilities is a crucial first step, but a framework alone is not enough to measure progress. To put this theory into practice, we are launching a new Kaggle hackathon — “Measuring progress toward AGI: Cognitive abilities”. The hackathon invites the community to design evaluations for the five cognitive abilities where the evaluation gap is largest: learning, metacognition, attention, executive functions and social cognition.
Participants can use Kaggle’s newly launched Community Benchmarks platform to build and test their evaluations against a lineup of frontier models.
We are offering a total prize pool of $200,000: $10,000 awards for the top two submissions in each of the five tracks, and $25,000 grand prizes for the four best overall submissions. Submissions are open March 17 through April 16, and we’ll announce the results on June 1. Head over to the Kaggle website to start building.