Here, I will follow Neel Nanda’s tutorial on How to Become A Mechanistic Interpretability Researcher.

Keep in mind

Do not just read things. Mech interp is a fundamentally empirical science.

Stage 1: Learning the ropes

I already have a decent understanding of linear algebra and all the mathematical tools needed.


  • Refer to Ferrando et al.
  • Code activation patching yourself
  • Linear probes
  • Using SAEs (sparse autoencoders)
  • Max Activating Dataset Examples
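
As a first pass at the activation patching item, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy two-layer ReLU network, the weights `W1` and `W2`, and the helper names `hidden`, `run`, and `activation_patching` are hypothetical and do not come from the tutorial or from any library. The idea it tries to show: run the model on a clean input and cache the hidden activations, run it on a corrupted input, then overwrite one hidden unit at a time with its clean value and measure how much of the clean output is recovered.

```python
# Hypothetical toy model: 2 inputs -> 3 hidden units (ReLU) -> 1 scalar output.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one row of input weights per hidden unit
W2 = [2.0, -1.0, 0.5]                       # output weight for each hidden unit


def hidden(x):
    """Hidden-layer activations (ReLU of a linear map) for input x."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]


def run(x, patch=None):
    """Forward pass. `patch` maps a hidden-unit index to a value that
    overwrites that unit's activation before the output is computed."""
    h = hidden(x)
    if patch:
        for i, value in patch.items():
            h[i] = value
    return sum(w * hi for w, hi in zip(W2, h))


def activation_patching(clean_x, corrupt_x):
    """For each hidden unit, patch its clean activation into the corrupted
    run and return the fraction of the clean-vs-corrupt output gap recovered.
    Assumes the clean and corrupted outputs differ (non-zero gap)."""
    clean_h = hidden(clean_x)          # cached clean activations
    clean_out = run(clean_x)
    corrupt_out = run(corrupt_x)
    gap = clean_out - corrupt_out
    effects = []
    for i in range(len(clean_h)):
        patched_out = run(corrupt_x, patch={i: clean_h[i]})
        effects.append((patched_out - corrupt_out) / gap)
    return effects


if __name__ == "__main__":
    for i, e in enumerate(activation_patching([1.0, 0.0], [0.0, 1.0])):
        print(f"hidden unit {i}: recovered fraction = {e:.3f}")
```

In this toy setup, a unit whose activation is the same on both inputs recovers nothing when patched, which is the kind of signal patching is meant to surface. With a real transformer the same loop would run over layers and positions, typically using hooks rather than a hand-written forward pass.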