Working Notes: a commonplace notebook for recording & exploring ideas.
Kunal

2026-02-08

This has been a fairly busy week. I've settled on using this newsletter as my microblog: I accumulate entries over the week, keep refining them till Sunday, and then start all over again. It's slightly more coherent than managing different accounts.

Writing

Used some more travel time to continue iterating on my first essay for the year, djn. Writing the essay cleared my head a lot, particularly around vibe limits, or risk management -- it took several iterations to get the graph to a place that felt intuitive.

I've fallen out of practice with writing coherently; the essays are a good way to rebuild those muscles. This time around I'm trying to use both Claude and Codex as line editors; last time I hired human editors, but that felt a bit too expensive just for the sake of writing practice.

Given how painful writing can be, I'm very annoyed that it clears up my thinking so much.

Collecting feedback from the bots

As I look for ways to learn faster, I've been experimenting with using agents to review and improve my writing without directly rewriting anything. That way I can improve in general and make my writing smoother again.

The Worst Possible Outcome would be internalizing the ChatGPT voice, but I think I should be able to watch out for that and prevent it by prompting the agents accordingly. So far the structural feedback has resonated well.

This week's experiment is asking Claude to help me write in the style of a mix of the writers I look up to.

typ.ing

A recent habit I've picked up is to quickly run through a typ.ing challenge or exercise before I start working, as a quick warmup and brain reset. The accuracy and speed I hit also give me quick feedback on how fresh and comfortable I am at that point in time, which is useful.

The website is fairly wonderful and my favorite of the many typing tutors and challenges I've used online, all the way from the venerable GNU Typist to Typeracer and ZType; it's satisfying on several levels. It's run by ZSA: their keyboards have always been tempting, but I'm currently extremely satisfied with the Nuphy while I'm out and about and the Glove80 while I'm at a desk.

The reset exercise that prompted this post:

πŸ† Today’s typ.ing daily challenge:

🌟 Speed: 125wpm
🎯 Accuracy: 100.00%
πŸ₯‡ Position: 2 out of 87 players today
πŸ”₯ Streak: 9
πŸ“… typ.ing/daily

Learning from Claude, Flow Matching

Generally, I only trust I understand something if I can implement it.

My new favorite way to learn and internalize how a given system works is to have Claude write out an execution plan for me. Then I implement it to the best of my ability, using Claude for debugging when I get stuck.

I used this to play with GRPO with some success in very limited time, and I've been doing the same with flow matching. It does mean I don't build as much debugging muscle up front as I normally would while struggling through a problem, but it significantly increases the amount of exploration I can do without repeatedly getting exhausted.

Asking for idiomatic ways to do things after writing them up also helps me improve much faster.
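For reference, the core idea of flow matching fits in a few lines. Here's a toy 1D sketch (my own illustration, not the plan Claude wrote): the "dataset" is a single point mass, so the marginal velocity field has a closed form and no model is needed; in a real implementation you'd regress a network onto the same target.

```python
import random

DATA_POINT = 2.0  # toy "dataset": a single point mass (assumption for this sketch)

def training_example(rng):
    """One flow-matching training triple: (x_t, t, target velocity)."""
    x0 = rng.gauss(0.0, 1.0)        # noise sample
    x1 = DATA_POINT                 # data sample
    t = rng.uniform(0.0, 1.0)
    xt = (1.0 - t) * x0 + t * x1    # linear interpolation path
    target = x1 - x0                # conditional velocity the model regresses onto
    return xt, t, target

def velocity(x, t):
    """Closed-form velocity for the point-mass target (stands in for the learned model)."""
    return (DATA_POINT - x) / (1.0 - t)

def sample(rng, steps=100):
    """Euler-integrate dx/dt = v(x, t) from noise at t=0 to data at t=1."""
    x = rng.gauss(0.0, 1.0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt                  # t never reaches 1, so no division by zero
        x += dt * velocity(x, t)
    return x

rng = random.Random(0)
# The regression target agrees with the closed-form field along the path:
xt, t, target = training_example(rng)
assert abs(target - velocity(xt, t)) < 1e-9
x = sample(rng)                     # lands on DATA_POINT up to float error
```

Because the paths here are straight lines, Euler integration is exact; with a learned model and a real dataset you'd only land approximately on the data distribution.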

Learning from Claude, design constraints

I asked Claude to list general design constraints -- a more up-to-date list of numbers everyone should know -- and then to rearrange them so that everything was in GB/s, so I could keep things straight in my head.

This is useful enough that I wanted to make sure I captured it somewhere, though I want to play more with the numbers and results; so this week's letter it is. The rest of this section was written by Claude.

All numbers are converted to GB/s of sustained throughput for a single axis of comparison. For latency-only items, effective throughput is derived from a typical access size.
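The derivation for the latency-only rows is just bytes-per-operation over seconds-per-operation; a quick sketch, using the typical access sizes cited in the table (the 4 MB TCP window is an assumed illustrative value):

```python
GB = 1e9

def effective_gbps(bytes_per_op, seconds_per_op):
    """Sustained throughput implied by serving one op of a given size per latency."""
    return bytes_per_op / seconds_per_op / GB

# NVMe random read: ~4 KB served per ~16 us IOP
nvme_random = effective_gbps(4096, 16e-6)   # ~0.26 GB/s
# HDD random read: ~4 KB per ~2 ms seek
hdd_random = effective_gbps(4096, 2e-3)     # ~0.002 GB/s
# Single TCP flow: bandwidth-delay product, window / RTT (assume 4 MB window, 40 ms RTT)
tcp_flow = effective_gbps(4e6, 40e-3)       # ~0.1 GB/s
```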

The Table

| Layer | GB/s | Category | Source |
| --- | --- | --- | --- |
| B200 HBM3e | 8,000 | GPU Memory | NVIDIA B200 Datasheet -- 192GB HBM3e, 8 TB/s bandwidth |
| H100 SXM HBM3 | 3,350 | GPU Memory | NVIDIA H100 Product Page -- 80GB HBM3, 3 TB/s+ bandwidth |
| A100 80GB HBM2e | 2,000 | GPU Memory | NVIDIA A100 Datasheet (PDF) -- 80GB HBM2e, 2,039 GB/s (SXM) |
| NVLink 5 (B200, per GPU) | 1,800 | GPU Interconnect | NVIDIA B200 Datasheet -- 1.8 TB/s bidirectional |
| NVLink 4 (H100, per GPU) | 900 | GPU Interconnect | NVIDIA Hopper Architecture In-Depth -- 900 GB/s bidirectional |
| NVLink 3 (A100, per GPU) | 600 | GPU Interconnect | NVIDIA A100 Architecture Whitepaper (PDF) -- 600 GB/s bidirectional |
| L1 cache (per core, x86) | ~500 | CPU Cache | Jeff Dean / Peter Norvig Latency Numbers -- 0.5ns per ref ≈ ~500 GB/s at cache-line granularity |
| DDR5 server memory (8-ch) | ~300 | System Memory | Typical 8-channel DDR5-4800 server config; ~38.4 GB/s/channel × 8 ≈ 307 GB/s peak |
| L2 cache (per core, x86) | ~100–200 | CPU Cache | Jeff Dean / Peter Norvig Latency Numbers -- ~7ns per ref |
| PCIe Gen5 x16 (duplex) | 128 | Bus | Rambus PCIe 5.0 Overview -- 32 GT/s × 16 lanes, 128 GB/s aggregate duplex |
| PCIe Gen5 x16 (unidirectional) | 64 | Bus | Rambus PCIe 5.0 Overview -- 64 GB/s per direction |
| InfiniBand NDR 400G | 50 | Network (inter-node) | NVIDIA DGX SuperPOD Cabling Guide / NDR Overview -- 400 Gbps = 50 GB/s |
| NVMe SSD Gen5 sequential | ~14 | Storage | WD_BLACK SN8100 Press Release -- up to 14.9 GB/s read |
| 100GbE network | 12.5 | Network (datacenter) | 100 Gbps ÷ 8 = 12.5 GB/s (line rate) |
| NVMe SSD Gen4 sequential | ~7 | Storage | Typical Gen4 x4 NVMe -- 7 GB/s sequential read |
| 25GbE network | 3.1 | Network (NIC) | 25 Gbps ÷ 8 = 3.1 GB/s (line rate) |
| Protobuf parse throughput | ~1 | Serialization | Estimated: 1KB in ~1μs; see Colin Scott's Latency Numbers for methodology |
| NVMe SSD random 4K reads | ~0.25 | Storage (random) | Derived: ~16μs per 4KB IOP × queue depth; see Jeff Dean Latency Numbers (updated SSD random read ~16μs) |
| HDD sequential | ~0.2 | Storage | Typical 7200 RPM HDD sequential throughput |
| JSON parse throughput | ~0.1 | Serialization | Estimated: 1KB in ~10μs; Beyond Latency Numbers Every Programmer Should Know |
| Single TCP flow, cross-region | ~0.03–0.25 | Network (WAN) | Bandwidth-delay product limited: window_size / RTT; 40ms RTT with typical window sizes |
| HDD random 4K reads | ~0.002 | Storage (random) | Derived: ~2–10ms seek per 4KB IOP; Jeff Dean Latency Numbers |

Key Ratios

| Ratio | Value | Architectural Implication |
| --- | --- | --- |
| HBM (H100) vs InfiniBand NDR | 67× | Tensor parallelism stays intra-node; pipeline/data parallelism goes inter-node |
| NVLink (H100) vs InfiniBand NDR | 18× | Same as above -- crossing the node boundary drops ~1 order of magnitude |
| NVMe sequential vs HDD random | 7,000× | SSDs changed everything for serving; random access on spinning disk is catastrophic |
| SSD sequential vs JSON parse | 140× | If your hot path deserializes JSON, your serialization format is slower than your storage |
| L1 cache vs main memory (latency) | ~200× | Cache-friendly data structures (contiguous arrays > linked lists) dominate performance |
| B200 HBM3e vs H100 HBM3 | 2.4× | Generational bandwidth improvement; keeps tensor cores fed at lower precision |
| NVLink 5 vs NVLink 4 | 2× | Blackwell doubles intra-node interconnect |
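The headline ratios follow directly from the GB/s column; a quick sanity check, with values copied from the table above:

```python
# Values from the table above, in GB/s.
hbm_h100 = 3350      # H100 SXM HBM3
nvlink4 = 900        # NVLink 4, per GPU
ib_ndr = 50          # InfiniBand NDR 400G
nvme_seq = 14        # NVMe Gen5 sequential
hdd_random = 0.002   # HDD random 4K reads
json_parse = 0.1     # JSON parse throughput

assert round(hbm_h100 / ib_ndr) == 67
assert round(nvlink4 / ib_ndr) == 18
assert round(nvme_seq / hdd_random) == 7000
assert round(nvme_seq / json_parse) == 140
```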

Notes

Canonical References