A collection of interesting things I’ve read over the past month. Not necessarily published in the last month, mind you.
Cosma Shalizi: “Attention”, “Transformers”, in Neural Network “Large Language Models”
- I suspect this article gained traction for the standard statistician scoffing at ML people re-discovering something invented 60 years ago. There is indeed plenty of that: attention seems to be a reinvention of Nadaraya-Watson kernel smoothing.¹ A minimal sketch of the correspondence follows this list.
- To me, though, the most interesting part is the discussion of non-identifiability in the attention mechanism: many different attention weightings can produce exactly the same output, so reading meaning into any particular weighting is shaky. If so, the cute example interpretations in many transformer papers are effectively nonsense. (A small demonstration also follows this list.)
- So many interesting topics beyond the two in the previous bullets:
- LLMs as finite-order Markov models
- Lempel-Ziv compression as a comparison to modern Transformers
- Musings about attempts to reveal chatbot system prompts
- Gopnikism: Think of LLMs not as minds but as library catalogs
- Shalizi’s “Books to Read While the Algae Grow in Your Fur” posts are the inspiration for this post’s title.
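To make the Nadaraya-Watson connection concrete, here’s a minimal NumPy sketch of my own (not Shalizi’s notation): single-query scaled dot-product attention is exactly kernel smoothing with the kernel K(q, k) = exp(q·k/√d).

```python
import numpy as np

def nadaraya_watson(q, keys, values, kernel):
    # NW estimator: a kernel-weighted average of the values.
    w = np.array([kernel(q, k) for k in keys])
    return (w / w.sum()) @ values

def exp_dot_kernel(q, k):
    # The kernel implicit in scaled dot-product attention.
    return np.exp(q @ k / np.sqrt(len(q)))

rng = np.random.default_rng(0)
d = 4
q = rng.normal(size=d)        # one query
K = rng.normal(size=(5, d))   # five keys
V = rng.normal(size=(5, d))   # five values

# Standard softmax attention for the same query...
scores = K @ q / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum()

# ...is numerically identical to NW smoothing with the exp-dot kernel.
assert np.allclose(weights @ V, nadaraya_watson(q, K, V, exp_dot_kernel))
```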
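And a toy demonstration of the non-identifiability point (the numbers are hypothetical, chosen by me): with more values than output dimensions, distinct attention weights can yield the identical output, so the weights alone don’t pin down which “explanation” the model used.

```python
import numpy as np

# Four value vectors in two dimensions: the map from attention
# weights to outputs throws away information, so it can’t be inverted.
V = np.array([[1.0, 0.0],
              [2.0, 0.0],
              [0.0, 1.0],
              [0.0, 2.0]])

w1 = np.array([0.40, 0.20, 0.30, 0.10])  # one valid attention pattern
w2 = np.array([0.30, 0.25, 0.40, 0.05])  # a genuinely different one

assert np.isclose(w1.sum(), 1.0) and np.isclose(w2.sum(), 1.0)
assert np.allclose(w1 @ V, w2 @ V)  # same output, different “story”
```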
Trung Phan: Why I love Bluey (and hate Cocomelon)
- As an ML practitioner, I’m generally pretty optimistic about both generative AI and test-driven optimization. But as a parent, the gap between Bluey—a show “created by an auteur with a unique and hilarious point of view”—and Cocomelon—“perfectly engineered digital junk food”—is so, so wide. I’m skeptical we’ll see AI-generated content at the Bluey level any time soon.
Hamel Husain: Building an Audience Through Technical Writing: Strategies and Mistakes
- I previously worried too much about growing the audience for this blog, and it was miserable and soul-sucking. So this time I’m going to try a trick that underlies much of Husain’s advice: I’m not going to worry about growing an audience; I’m just going to write things that interest me…for myself.
- Husain says he uses speech-to-text tools (plus an LLM) to draft content. Is this common? Is it really worth the trouble? I feel like typing isn’t my bottleneck for generating more content.
Arvind Narayanan, Benedikt Ströbl, and Sayash Kapoor: Is AI progress slowing down?
- The authors’ main point is not to let the mental pendulum swing too wildly: they were previously skeptical about the power of LLM scaling laws, and now they’re skeptical that scaling laws have died. Seems reasonable, but not particularly interesting.
- Narayanan et al. make another point, though, that I think is worth deeper thought. They claim scaling inference-time compute will have the biggest impact on tasks with verifiable answers, like coding and math, since extra samples only help if you can check which ones are right. For things that require knowledge or creativity, like writing or translation, the impact of inference-time scaling will likely be negligible. (A toy sketch of this generate-and-verify idea follows the quote below.)
- But reasoning models have not yet been optimized to work with agentic systems. Or at least they had not been when this article was written back in December. Has this changed?
- All in all, it’s an exciting time to be an ML practitioner. As the authors write:

  > With inference scaling, capability improvements will likely be uneven and less predictable, driven more by algorithmic advances than investment in hardware infrastructure. Many ideas that were discarded during the reign of LLMs, such as those from the old planning literature, are now back in the mix, and the scene seems intellectually more vibrant than in the last few years.
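To see why verifiability matters so much for inference-time scaling, here’s a toy best-of-n loop (my own sketch, not the authors’ code, with a made-up factoring task standing in for the model): when a cheap verifier exists, extra samples convert directly into accuracy.

```python
import random

TARGET = 391  # = 17 * 23; the toy “task” is to factor it

def verifier(candidate):
    # A cheap, reliable correctness check: the crucial ingredient.
    a, b = candidate
    return a * b == TARGET

def propose(rng):
    # Stand-in for one stochastic model sample.
    return rng.randint(2, 30), rng.randint(2, 30)

def best_of_n(n, seed=0):
    # Inference-time scaling: draw more samples and return the
    # first one the verifier accepts.
    rng = random.Random(seed)
    for _ in range(n):
        candidate = propose(rng)
        if verifier(candidate):
            return candidate
    return None

for n in (10, 100, 1000):
    print(f"n={n}: {best_of_n(n)}")
```

For writing or translation there is no such verifier, so the extra samples just pile up with no principled way to pick a winner.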
Footnotes
1. In his defense, Shalizi is clear that while he scoffs, he is also impressed: the ML people actually got the thing to work, and the results speak for themselves.