Workshop Schedule (Tentative)

09:00 - 09:30 Welcome and Introduction
  • Opening Remarks
  • Overview of Workshop Structure and Objectives
09:30 - 10:30 Opening Panel: Reflections on the Landscape
  • Panel Discussion on AI Evaluation Challenges
  • Panelists: Abeba Birhane, Su Lin Blodgett, Abigail Jacobs, Lee Wan Sie
  • Topics:
    • Underlying frameworks and incentive structures
    • Defining robust evaluations and contextual challenges
    • Multimodal evaluation needs (text, images, audio, video)
10:30 - 11:15 Oral Session 1: Provocations and Ethics in AI Evaluation
  • "Provocation: Who benefits from 'inclusion' in Generative AI?"
  • "(Mis)use of nude images in machine learning research"
  • "Evaluating Refusal"
11:15 - 11:30 Break
11:30 - 12:15 Oral Session 2: Multimodal and Cross-Cultural Evaluation Methods
  • "JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark"
  • "Critical human-AI use scenarios and interaction modes for societal impact evaluations"
  • "Cascaded to End-to-End: New Safety, Security, and Evaluation Questions for Audio Language Models"
12:15 - 14:15 Lunch and Poster Session
  • 12:15 - 12:45: Lunch setup and networking
  • 12:45 - 14:15: Poster presentations
14:15 - 15:00 Oral Session 3: Systematic Approaches to AI Impact Assessment
  • "GenAI Evaluation Maturity Framework (GEMF)"
  • "AIR-Bench 2024: Safety Evaluation Based on Risk Categories"
  • "Evaluating Generative AI Systems is a Social Science Measurement Challenge"
15:00 - 15:45 Group Activity: Building Evaluation Frameworks
  • Breakout Groups on Key Social Impact Categories
  • Activities include:
    • Choosing Evaluations: Selecting relevant evaluations from a large repository
    • Reviewing Tools and Datasets: Assessing current tools and identifying gaps
    • Evaluating Reliability and Validity: Exploring construct validity and ranking methods
15:45 - 16:00 Break
16:00 - 17:45 What's Next? Coalition Development
  • Overview of Ongoing Social Impact Projects
  • Interactive Discussion: Launching the Coalition on Social Impact Evaluation
  • Topics include:
    • Developing criteria for evaluating evaluations
    • Proposing documentation standards
    • Building out resource repositories
    • Conducting reviews and publishing annual scorecards
    • Establishing next steps for collaboration and coalition goals
17:45 - 18:00 Closing Remarks
  • Summary of key insights and next steps