RESEARCH COMMUNITY

EvalEval Coalition

We are a research community developing scientifically grounded research outputs and robust deployment infrastructure for broader impact evaluations.

Current Projects

RESEARCH, INFRASTRUCTURE & ORGANIZATION

Benchmark Saturation

This project aims to investigate how to systematically characterize the complexity and behavior of AI benchmarks over time, with the overarching goal of informing more robust benchmark design. The ...

Learn more →

Evaluation Cards

This project addresses the need for a structured and systematic approach to documenting AI model evaluations through the creation of "evaluation cards," focusing specifically on technical base syst...

Learn more →

Evaluation Harness and Tutorials

The Eleuther Harness Tutorials project is designed to lower the barrier to entry for using the LM Evaluation Harness, making it easier for researchers and practitioners to onboard, evaluate, and co...

Learn more →

Latest Research

BLOG & PUBLICATIONS

Resources

TOOLS, DATASETS & DOCUMENTATION

Evaluation Card

A brief summary of the project and how to fill out the card.

Learn more →

LM Evaluation Harness

EleutherAI's open-source framework for evaluating language models across a wide range of benchmarks.

Learn more →

Join Our Community

Researchers, practitioners, and students are welcome to contribute to our mission.

CONTACT INFO

EMAIL
evalevalpc@googlegroups.com
JOIN US
Join our Slack community!
WORKING GROUPS
Research
Infrastructure
Organization
HOSTED BY
Hugging Face
University of Edinburgh
EleutherAI
