Working Notes: a commonplace notebook for recording & exploring ideas.
By Kunal. More at expLog.

Kernel Work Log

Maarten Grootendorst's post about Gemma 4's architecture is pretty great, and gives me something to anchor on.
I tried running the model as-is in SGLang and immediately ran out of memory on my tiny GTX 1650 Ti. I suspect I can do better once I'm more familiar with the model and with inference in general.
As a project, Gemma 4 seems like a fine resource to work on. I'll learn about multimodality and mixture-of-experts (MoE) inference, and by choosing the smallest model I have some chance of making it work on CPU before I try GPU builds.
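A back-of-envelope check on why a small card fills up so quickly: weights alone take roughly (number of parameters) × (bytes per parameter). The parameter counts below are hypothetical placeholders, not Gemma 4's published sizes. The estimate also ignores the KV cache, activations and framework overhead, so the real requirement is higher. For an MoE model, every expert's weights generally have to be resident even though only a few are active per token, so the total parameter count is what matters here.

```python
# Rough estimate of the memory needed just to hold model weights.
# The model sizes used below are HYPOTHETICAL placeholders, not Gemma 4's real sizes.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}

# A GTX 1650 Ti ships with 4 GB of VRAM; treated as 4 GiB here.
VRAM_GIB = 4.0


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return GiB required for the weights alone (no KV cache or activations)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for params in (2e9, 4e9):  # hypothetical total parameter counts
        for dtype in ("bf16", "int8", "int4"):
            gib = weight_memory_gib(params, dtype)
            verdict = "fits" if gib < VRAM_GIB else "does not fit"
            print(
                f"{params / 1e9:.0f}B params @ {dtype}: {gib:.2f} GiB "
                f"({verdict} in {VRAM_GIB:.0f} GiB, before KV cache/activations)"
            )
```

Even where the weights technically "fit", very little headroom may be left for the KV cache, which is one reason quantization or CPU offload tends to come up quickly on 4 GB cards.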

Picking this up again, this time attempting to implement inference from scratch and seeing how far I can get.

General plans

Recording my attempts at working with CUDA, relying mostly on the GPU Mode tutorials.