Evaluating Evaluations: Examining Best Practices for Measuring Broader Impacts of Generative AI

A workshop co-located with NeurIPS 2024

Date: December 15, 2024

Room: MTG 16


Workshop Overview

Generative AI systems are becoming increasingly prevalent in society, producing text, images, audio, and video content with far-reaching implications. While the NeurIPS Broader Impact statement has notably shifted norms for AI publications toward considering negative societal impacts, no standard approach to these impact assessments yet exists.

This workshop addresses this critical gap by bringing together experts in evaluation science and practitioners who develop and analyze technical systems. We will share existing findings, collectively develop directions for effective community-driven evaluations, and create comprehensive frameworks for documenting and standardizing evaluation practices.

Key Focus: Breadth of Participation

A key focus of this workshop is broadening the expertise involved in shaping evaluations. Involving the full range of participants and stakeholders affected by a system, not just machine learning and AI experts, can substantially strengthen impact assessments. By encouraging collaboration among researchers, practitioners, and the wider community, the workshop aims to produce more comprehensive evaluations and to develop AI community resources and policy recommendations.

Workshop Objectives

  1. Share existing findings and methodologies with the NeurIPS community
  2. Collectively develop future directions for effective community-built evaluations
  3. Address barriers to broader adoption of social impact evaluation of Generative AI systems
  4. Develop policy recommendations for investment in future directions for social impact evaluations
  5. Create a framework for documenting and standardizing evaluation practices

Contents: