Writing

Long-form articles on AI research, mechanistic interpretability, and making large models smaller and faster.

Scaling Sparse Attention for Long-Context Reasoning

A deep dive into how sparse attention patterns can extend transformer context windows while preserving reasoning quality on long inputs.