My Blog

Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context

To reason, LMs must bind entities together in context. How they do this is more complicated than first thought.

23 min read · September 29, 2025

2025 · NLP Binding Interpretability AI ML
Enhancing Automated Interpretability Pipelines with Output-Centric Feature Descriptions

7 min read · January 18, 2025

2025 · NLP SAEs Interpretability AI ML
Using Sparse Autoencoders for Knowledge Erasure

Can we use SAEs to erase targeted knowledge from LLMs effectively?

7 min read · January 17, 2025

2025 · NLP SAEs Interpretability AI ML