09:00 - 09:30 |
Welcome and Introduction |
- Opening remarks
- Overview of workshop structure and objectives
|
09:30 - 11:00 |
Reflections on the Landscape |
- Collaborative reflection on the existing landscape
- Talks, panels, and breakouts by modality (text, images, audio, video, and multimodal data)
- Topics:
  - Underlying frameworks
  - Contextualization challenges
  - Defining robust evaluations
  - Incentive structures
|
11:00 - 11:15 |
Break |
|
11:15 - 12:45 |
Talks + Provocations |
- Invited speakers on current technical evaluations for base models across all modalities
- Key social impact categories covered:
  - Bias and stereotyping
  - Cultural values
  - Performance disparities
  - Privacy
  - Financial and environmental costs
  - Data moderator labor
- Presentations of accepted provocations
|
12:45 - 13:45 |
Lunch Break |
|
13:45 - 15:45 |
Group Activity |
- Participants break into groups focusing on key social impact categories
- Activities include:
  - Choosing Evaluations: Determining how to select evaluations from a large repository
  - Reviewing Tools and Datasets: Assessing existing artifacts and identifying gaps
  - Examining construct reliability, validity, and ranking methodologies
|
15:45 - 16:00 |
Break |
|
16:00 - 17:45 |
What's Next? Documentation + Resources |
- Develop policy guidance highlighting impact categories, subcategories, and modalities requiring further investment
- Discussions on:
  - Documenting Methods: Creating a proposed framework for documenting evaluations
  - Developing Shareable Resources: Improving the evaluation repository and conceptualizing better resources
  - Underlying Frameworks: Examining foundational frameworks influencing evaluations
  - Contextualization Challenges: Identifying challenges in adapting evaluations to different contexts
  - Defining Robust Evaluations: Establishing criteria for robust and meaningful evaluations
|
17:45 - 18:00 |
Closing Remarks |
|