Working Notes: a commonplace notebook for recording & exploring ideas.
— Kunal
Kernel Work Log
2026-04-19
- Maarten Grootendorst's post about Gemma 4's architecture is pretty great, and gives me something to anchor on.
- I tried running the model as-is in SGLang and immediately ran out of memory on my tiny GTX 1650 Ti. I suspect I can do a little better once I'm more familiar with the model and with inference.
- Gemma 4 seems like a good project to work on: I'll learn multimodality and inference with MoE, and by choosing the smallest model I have some chance of making it work on CPU before I try GPU builds.
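A rough way to reason about why a 4 GB card fills up immediately, and why the smallest model is the one with a chance on CPU, is to count weight bytes: parameter count times bytes per parameter. The sketch below is only a back-of-the-envelope estimate; the parameter count is a hypothetical placeholder (not a published Gemma 4 size), and it deliberately ignores activations, the KV cache, and framework overhead, all of which add more.

```python
# Back-of-the-envelope weight-memory estimate: parameter count x bytes per parameter.
# Ignores activations, the KV cache, and framework/CUDA-context overhead,
# all of which push real usage higher.

BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gib(num_params: float, dtype: str) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


def weights_fit(num_params: float, dtype: str, budget_gib: float) -> bool:
    """True if the weights alone fit in the budget (necessary, not sufficient)."""
    return weight_gib(num_params, dtype) <= budget_gib


if __name__ == "__main__":
    VRAM_GIB = 4.0  # the GTX 1650 Ti has 4 GB of VRAM
    # HYPOTHETICAL parameter count for illustration; not a published Gemma 4 size.
    num_params = 5e9
    for dtype in ("bf16", "int8", "int4"):
        gib = weight_gib(num_params, dtype)
        fits = weights_fit(num_params, dtype, VRAM_GIB)
        print(f"{dtype}: {gib:.2f} GiB of weights, fits in VRAM: {fits}")
```

For an MoE model, the total parameter count (every expert) is what has to be resident, unless experts are offloaded, even though only a few experts run per token; the "active" parameter count only tells you about compute per token.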
2026-04-18
- Decided to go with Gemma 4 and see how far I can get.
2026-04-05
Picking this up again, this time attempting to implement inference from scratch and seeing how far I can get.
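At its core, "implementing inference" for a decoder-only language model means running an autoregressive loop: compute logits for the next token, pick one, append it, repeat. A minimal greedy version is sketched below; `next_token_logits` is a hypothetical stand-in for the model forward pass, and `toy_model` is a fake model used only to make the loop runnable.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: maps all token ids so far to logits over the vocabulary.
NextTokenLogits = Callable[[Sequence[int]], Sequence[float]]


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def greedy_decode(next_token_logits: NextTokenLogits, prompt: List[int],
                  max_new_tokens: int, eos_id: int) -> List[int]:
    """Append the highest-scoring token until EOS or the token budget runs out.

    Returns only the newly generated tokens (including EOS if it was produced).
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_id = argmax(next_token_logits(tokens))
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens[len(prompt):]


def toy_model(tokens: Sequence[int]) -> List[float]:
    """Fake 4-token 'model' that always prefers (last token + 1) mod 4."""
    logits = [0.0] * 4
    logits[(tokens[-1] + 1) % 4] = 1.0
    return logits


if __name__ == "__main__":
    print(greedy_decode(toy_model, [0], max_new_tokens=10, eos_id=3))  # prints [1, 2, 3]
```

This loop recomputes over the whole sequence at every step; a real implementation would keep a KV cache so each step only processes the newest token.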
General plans
- Pick a tiny model:
  - run inference using an upstream library, just to see it work and to have a baseline to compare against
  - take notes as I go, using this as an opportunity to learn the tools and inference in general
  - try it on different generations of hardware, prioritizing recent hardware
  - apply papers as I go and watch the benchmarks move
- Then continuously refine.
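One way to make the upstream baseline useful is to dump logits for the same prompt from both implementations and compare them numerically. The helpers below are a sketch of that idea; the function names are hypothetical, and the default tolerances (1e-3) are placeholder assumptions, so lower-precision runs such as bf16 will likely need looser ones. `matches_baseline` uses the same closeness rule as `numpy.allclose`, written out in plain Python.

```python
from typing import Sequence


def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")


def max_abs_diff(ours: Sequence[float], ref: Sequence[float]) -> float:
    """Largest elementwise |ours - ref|; a single number to track over time."""
    _check_same_length(ours, ref)
    return max((abs(x - y) for x, y in zip(ours, ref)), default=0.0)


def matches_baseline(ours: Sequence[float], ref: Sequence[float],
                     rtol: float = 1e-3, atol: float = 1e-3) -> bool:
    """numpy.allclose-style check: |ours - ref| <= atol + rtol * |ref| everywhere.

    Comparisons involving NaN are False, so any NaN makes this return False.
    """
    _check_same_length(ours, ref)
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(ours, ref))


def same_greedy_token(ours: Sequence[float], ref: Sequence[float]) -> bool:
    """True if both (non-empty) logit vectors pick the same top-1 token."""
    _check_same_length(ours, ref)
    top_ours = max(range(len(ours)), key=lambda i: ours[i])
    top_ref = max(range(len(ref)), key=lambda i: ref[i])
    return top_ours == top_ref


if __name__ == "__main__":
    ours = [1.0, 2.0005, -3.0]  # logits from the from-scratch implementation
    ref = [1.0, 2.0, -3.0]      # logits from the upstream baseline
    print(max_abs_diff(ours, ref), matches_baseline(ours, ref), same_greedy_token(ours, ref))
```

Checking top-1 agreement alongside the numeric tolerance helps separate harmless rounding drift from differences that would actually change generated text.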
2025-11-16
Recording my attempts at working with CUDA, relying in particular on the GPU Mode tutorials.