As the integration of (generative) AI technologies into news organisations continues apace, the need for robust, transparent, and consistent evaluation frameworks has become increasingly pressing. While the initial excitement and speculation around the transformative potential of these technologies have dominated the discourse, the hard realities of deploying these systems in high-stakes journalistic contexts are now coming into focus.

News teams are grappling with difficult questions around when, where and how to leverage AI-powered tools. What are the appropriate use cases? How can the accuracy, reliability and safety of AI-generated content be assured? What are the ethical considerations and potential unintended consequences that must be weighed? Crucially, many organisations lack clear, field-tested methodologies for systematically assessing the fitness of AI systems for various journalistic applications.

This panel will bring together four leading AI thinkers and practitioners from The New York Times, The Guardian and The Wall Street Journal to share insights, challenges and potential solutions from their organisations around the crucial task of evaluating AI technologies for newsroom deployment. We will explore frameworks for benchmarking AI systems across dimensions such as factual accuracy, editorial quality, safety, and bias. We will also discuss the organisational processes, skill sets and resourcing required to sustain rigorous, ongoing evaluation efforts.

By shining a light on the difficulties and best practices in this space, this panel aims to catalyse a much-needed industry dialogue. By sharing stories of failure as well as success, we hope to equip other news organisations with ideas and inspiration to make thoughtful, responsible decisions about deploying AI technologies in their work.