Accepted Papers
The following tiny papers have been accepted to the EvalEval Workshop at NeurIPS 2024:
Oral Presentations
- Provocation: Who benefits from “inclusion” in Generative AI?
  Samantha Dalal, Siobhan Mackenzie Hall, Nari Johnson
- (Mis)use of Nude Images in Machine Learning Research
  Arshia Arya, Princessa Cintaqia, Deepak Kumar, Allison McDonald, Lucy Qin, Elissa M Redmiles
- Evaluating Refusal
  Shira Abramovich, Anna Ma
- JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark
  Shota Onohara, Atsuyuki Miyai, Yuki Imajuku, Kazuki Egashira, Jeonghun Baek, Xiang Yue, Graham Neubig, Kiyoharu Aizawa
- Critical human-AI use scenarios and interaction modes for societal impact evaluations
  Lujain Ibrahim, Saffron Huang, Lama Ahmad, Markus Anderljung
- Cascaded to End-to-End: New Safety, Security, and Evaluation Questions for Audio Language Models
  Luxi He, Xiangyu Qi, Inyoung Cheong, Prateek Mittal, Danqi Chen, Peter Henderson
- GenAI Evaluation Maturity Framework (GEMF)
  Yilin Zhang, Frank J Kanayet
- AIR-Bench 2024: Safety Evaluation Based on Risk Categories
  Kevin Klyman
- Evaluating Generative AI Systems is a Social Science Measurement Challenge
  Hanna Wallach, Meera Desai, Nicholas J Pangakis, A. Feder Cooper, Angelina Wang, Solon Barocas, Alexandra Chouldechova, Chad Atalla, Su Lin Blodgett, Emily Corvi, P. Alex Dow, Jean Garcia-Gathright, Alexandra Olteanu, Stefanie Reed, Emily Sheng, Dan Vann, Jennifer Wortman Vaughan, Matthew Vogel, Hannah Washington, Abigail Z Jacobs
Poster Presentations
- Evaluations Using Wikipedia without Data Contamination: From Trusting Articles to Trusting Edit Processes
  Lucie-Aimée Kaffee, Isaac Johnson
- Can Vision-Language Models Replace Human Annotators: A Case Study with CelebA Dataset
  Haoming Lu, Feifei Zhong
- Using Scenario-Writing for Identifying and Mitigating Impacts of Generative AI
  Kimon Kieslich, Nicholas Diakopoulos, Natali Helberger
- Troubling taxonomies in GenAI evaluation
  Glen Berman, Ned Cooper, Wesley Deng, Ben Hutchinson
- Is ETHICS about ethics? Evaluating the ETHICS benchmark
  Leif Hancox-Li, Borhane Blili-Hamelin
- Provocation on Expertise in Social Impact Evaluations for Generative AI (and Beyond)
  Zoe Kahn, Nitin Kohli
- Rethinking CyberSecEval: An LLM-Aided Approach to Evaluation Critique
  Suhas Hariharan, Zainab Ali Majid, Jaime Raldua Veuthey, Jacob Haimes
- Contamination Report for Multilingual Benchmarks
  Sanchit Ahuja, Varun Gumma, Sunayana Sitaram
- Towards Leveraging News Media to Support Impact Assessment of AI Technologies
  Mowafak Allaham, Kimon Kieslich, Nicholas Diakopoulos
- Motivations for Reframing Large Language Model Benchmarking for Legal Applications
  Riya Ranjan, Megan Ma
- A Framework for Evaluating LLMs Under Task Indeterminacy
  Luke Guerdan, Hanna Wallach, Solon Barocas, Alexandra Chouldechova
- Dimensions of Generative AI Evaluation Design
  P. Alex Dow, Jennifer Wortman Vaughan, Solon Barocas, Chad Atalla, Alexandra Chouldechova, Hanna Wallach
- Statistical Bias in Bias Benchmark Design
  Hannah Powers, Ioana Baldini, Dennis Wei, Kristin Bennett
- Rethinking Artistic Copyright Infringements in the Era of Text-to-Image Generative Models
  Mazda Moayeri, Samyadeep Basu, Sriram Balasubramanian, Priyatham Kattakinda, Atoosa Chegini, Robert Brauneis, Soheil Feizi
- Gaps Between Research and Practice When Measuring Representational Harms Caused by LLM-Based Systems
  Emma Harvey, Emily Sheng, Su Lin Blodgett, Alexandra Chouldechova, Jean Garcia-Gathright, Alexandra Olteanu, Hanna Wallach
- Surveying Surveys: Surveys’ Role in Evaluating AI’s Labor Market Impact
  Cassandra Duchan Solis
- Fairness Dynamics During Training
  Krishna Patel, Nivedha Sivakumar, Barry-John Theobald, Luca Zappella, Nicholas Apostoloff
- Democratic Perspectives and Institutional Capture of Crowdsourced Evaluations
  parth sarin, Michelle Bao
- LLMs and Personalities: Inconsistencies Across Scales
  Tosato Tommaso, Lemay David, Mahmood Hegazy, Irina Rish, Guillaume Dumas
- Assessing Bias in Metric Models for LLM Open-Ended Generation Bias Benchmarks
  Nathaniel Demchak, Xin Guan, Zekun Wu, Ziyi Xu, Adriano Koshiyama, Emre Kazim