09:00 - 09:30 |
Welcome and Introduction |
- Opening Remarks
- Overview of Workshop Structure and Objectives
|
09:30 - 10:30 |
Opening Panel: Reflections on the Landscape |
- Panel Discussion on AI Evaluation Challenges
- Panelists: Abeba Birhane, Su Lin Blodgett, Abigail Jacobs, Lee Wan Sie
- Topics:
- Underlying frameworks and incentive structures
- Defining robust evaluations and contextual challenges
- Multimodal evaluation needs (text, images, audio, video)
|
10:30 - 11:15 |
Oral Session 1: Provocations and Ethics in AI Evaluation |
- "Provocation: Who benefits from 'inclusion' in Generative AI?"
- "(Mis)use of nude images in machine learning research"
- "Evaluating Refusal"
|
11:15 - 11:30 |
Break |
|
11:30 - 12:15 |
Oral Session 2: Multimodal and Cross-Cultural Evaluation Methods |
- "JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark"
- "Critical human-AI use scenarios and interaction modes for societal impact evaluations"
- "Cascaded to End-to-End: New Safety, Security, and Evaluation Questions for Audio Language Models"
|
12:15 - 14:15 |
Lunch and Poster Session |
- 12:15 - 12:45: Lunch setup and networking
- 12:45 - 14:15: Poster presentations
|
14:15 - 15:00 |
Oral Session 3: Systematic Approaches to AI Impact Assessment |
- "GenAI Evaluation Maturity Framework (GEMF)"
- "AIR-Bench 2024: Safety Evaluation Based on Risk Categories"
- "Evaluating Generative AI Systems is a Social Science Measurement Challenge"
|
15:00 - 15:45 |
Group Activity: Building Evaluation Frameworks |
- Breakout Groups on Key Social Impact Categories
- Activities include:
- Choosing Evaluations: Selecting relevant evaluations from a large repository
- Reviewing Tools and Datasets: Assessment of current tools and gaps
- Evaluating Reliability and Validity: Exploring construct validity and ranking methods
|
15:45 - 16:00 |
Break |
|
16:00 - 17:45 |
What's Next? Coalition Development |
- Overview of Ongoing Social Impact Projects
- Interactive Discussion: Launching the Coalition on Social Impact Evaluation
- Topics include:
- Developing criteria for evaluating evaluations
- Creating proposed documentation standards
- Building out resource repositories
- Conducting reviews and publishing annual scorecards
- Establishing next steps for collaboration and coalition goals
|
17:45 - 18:00 |
Closing Remarks |
- Summary of key insights and next steps
|