Talks

5 May 2026 • Centre for AI Safety, Newcastle University

How reliable are current interpretability methods? Recent work on CoT monitoring and SAEs (co-presented with Toby Pullan)