Theo Farrell

Talks

AI Safety North East

5 May 2026, Centre for AI Safety, Newcastle University

How reliable are current interpretability methods? Recent work on CoT monitoring and SAEs (co-presented with Toby Pullan)