Working Notes: a commonplace notebook for recording & exploring ideas.
— Kunal
For reasons, I spent some time digging into GRPO (Group Relative Policy Optimization) with Claude's help. This was pretty fascinating, and I inverted the usual agentic workflow: instead of having Claude write the code, I asked it to draft a text document that I would then code up myself. This let me brute-force my way to understanding GRPO and RL faster than I would have otherwise, and have some fun while doing it.
To learn more about simple NNs, I'd been playing with a trivial model that should learn to count, and exploring from there. Claude recommended fine-tuning Qwen-2.5 to generate a random number, and then simply modifying the policy model to generate an ascending sequence of numbers.
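To make the task a little more concrete, here is a minimal sketch of the two pieces that matter most for it: a reward for "ascending sequence", and the group-relative advantage at the heart of GRPO (sample several completions for the same prompt, then score each one against its group's mean and standard deviation). The reward shape, the assumption that completions are whitespace- or comma-separated integers, and both function names are hypothetical choices for illustration, not the exact setup Claude suggested. A real training run would feed these advantages into a clipped policy-gradient loss with a KL penalty against a reference model.

```python
import statistics


def ascending_reward(completion: str) -> float:
    """Hypothetical reward: fraction of adjacent pairs that strictly increase.

    Expects whitespace- or comma-separated integers; anything that fails to
    parse, or has fewer than two numbers, scores 0.0.
    """
    tokens = completion.replace(",", " ").split()
    try:
        nums = [int(t) for t in tokens]
    except ValueError:
        return 0.0
    if len(nums) < 2:
        return 0.0
    increasing = sum(1 for a, b in zip(nums, nums[1:]) if b > a)
    return increasing / (len(nums) - 1)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: score each completion against its own group.

    `rewards` holds one reward per completion, all sampled for the same prompt.
    Uses the population std; some implementations use the sample std instead.
    If every reward is equal, all advantages come out as 0.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Usage: four sampled completions for one prompt.
group = ["1 2 3 4", "4 3 2 1", "1 3 2 5", "banana"]
rewards = [ascending_reward(c) for c in group]
advantages = group_relative_advantages(rewards)
```

The design choice worth noticing is that no learned value function is involved: the baseline is just the group mean, which is what lets GRPO drop PPO's critic. Completions that beat their siblings get pushed up, and the rest get pushed down.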
Doing this was pretty enlightening and made it easier for me to learn: it's definitely not enough to be confident about policy updates or to have true intuition, but I now have a much more mechanical understanding of the system. I suspect I'll be using this approach to learn faster in the future.
Particularly because I can repeatedly ask Claude without feeling guilty about wasting too much of its time.
I was really hoping to publish the first essay on this site in January, but couldn't quite finish it because I was struggling to write. Today I decided to make the first attempt on paper, and that unlocked a lot of ideas and diagrams, helping me make something much more coherent and also explain what I've been thinking.
Tentatively hoping to publish by tomorrow before I get started on February's essay.
Another vibe-coded developer tool I've been hacking on for the past week is a way to easily take snapshots of code and experiment results and record them in an orphan branch, using git-lfs for heavier files.
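The "heavier files go to git-lfs" part boils down to a routing decision made at snapshot time. Here is a minimal sketch, assuming a simple size cutoff; the 1 MiB `LFS_THRESHOLD_BYTES` and both function names are hypothetical, since the rule the tool actually uses isn't described here. It walks a snapshot directory, records a manifest entry per file (path, size, SHA-256, LFS or not), and emits `.gitattributes` lines in the same `filter=lfs diff=lfs merge=lfs -text` form that `git lfs track` writes.

```python
import hashlib
from pathlib import Path

# Hypothetical cutoff: files at or above this size are routed through git-lfs.
LFS_THRESHOLD_BYTES = 1 * 1024 * 1024


def build_manifest(root: Path, threshold: int = LFS_THRESHOLD_BYTES) -> list[dict]:
    """Walk `root` (skipping any .git directory) and describe every file."""
    files = sorted(p for p in root.rglob("*") if p.is_file() and ".git" not in p.parts)
    entries = []
    for path in files:
        data = path.read_bytes()
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
            "lfs": len(data) >= threshold,
        })
    return entries


def gitattributes_for(entries: list[dict]) -> str:
    """Emit one LFS attribute line per large file.

    Note: paths containing spaces would need escaping in a real .gitattributes.
    """
    lines = [f"{e['path']} filter=lfs diff=lfs merge=lfs -text" for e in entries if e["lfs"]]
    return "\n".join(lines) + ("\n" if lines else "")
```

Writing the snapshot commit itself is left out. One option that avoids switching the working tree is git's plumbing: stage into a temporary index via `GIT_INDEX_FILE`, then `git write-tree`, `git commit-tree` (with `-p` pointing at the previous snapshot, if any), and `git update-ref` on the orphan branch. The `.gitattributes` file needs to be staged alongside the large files so the LFS clean filter turns them into pointers.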
I'm planning to use this extensively as I try to implement papers, run my own ablations, and do fun little experiments (such as building a transformer that can count, or at least trying to).
Finally finished reading The Notebook: I particularly liked the idea of notebooks being part of the cognitive process (I lived that just today).
I also picked up KJ Parker's latest book, Sister Svangerd and the Not Quite Dead. I generally love his books, though his protagonists all tend to be extremely self-aware, smart, and flawed, and ultimately tragic. I'm only a few pages in so far, and I'm really enjoying the conception of EVIL as, basically, entropy, and of good as constantly keeping it at bay, if only for a lifetime or two. I was happy to find out that the next book in the series will be out in May, which makes me much more excited about reading through this one.