All numbers are converted to GB/s of sustained throughput to give a single axis of comparison. For latency-only items, an effective throughput is derived from a typical access size (see the sketch after the table).
| Layer | GB/s | Category | Source |
|---|---|---|---|
| B200 HBM3e | 8,000 | GPU Memory | NVIDIA B200 Datasheet — 192GB HBM3e, 8 TB/s bandwidth |
| H100 SXM HBM3 | 3,350 | GPU Memory | NVIDIA H100 Product Page — 80GB HBM3, 3 TB/s+ bandwidth |
| A100 80GB HBM2e | 2,000 | GPU Memory | NVIDIA A100 Datasheet (PDF) — 80GB HBM2e, 2,039 GB/s (SXM) |
| NVLink 5 (B200, per GPU) | 1,800 | GPU Interconnect | NVIDIA B200 Datasheet — 1.8 TB/s bidirectional |
| NVLink 4 (H100, per GPU) | 900 | GPU Interconnect | NVIDIA Hopper Architecture In-Depth — 900 GB/s bidirectional |
| NVLink 3 (A100, per GPU) | 600 | GPU Interconnect | NVIDIA A100 Architecture Whitepaper (PDF) — 600 GB/s bidirectional |
| L1 cache (per core, x86) | ~500 | CPU Cache | Jeff Dean / Peter Norvig Latency Numbers — 0.5ns per ref ≈ ~500 GB/s at cache-line granularity |
| DDR5 server memory (8-ch) | ~300 | System Memory | Typical 8-channel DDR5-4800 to DDR5-5600 server config; 38.4–44.8 GB/s/channel × 8 ≈ 307–358 GB/s theoretical peak |
| L2 cache (per core, x86) | ~100–200 | CPU Cache | Jeff Dean / Peter Norvig Latency Numbers — ~7ns per ref |
| PCIe Gen5 x16 (duplex) | 128 | Bus | Rambus PCIe 5.0 Overview — 32 GT/s × 16 lanes, 128 GB/s aggregate duplex |
| PCIe Gen5 x16 (unidirectional) | 64 | Bus | Rambus PCIe 5.0 Overview — 64 GB/s per direction |
| InfiniBand NDR 400G | 50 | Network (inter-node) | NVIDIA DGX SuperPOD Cabling Guide — NDR Overview — 400 Gbps = 50 GB/s |
| NVMe SSD Gen5 sequential | ~14 | Storage | WD_BLACK SN8100 Press Release — up to 14.9 GB/s read |
| 100GbE network | 12.5 | Network (datacenter) | 100 Gbps ÷ 8 = 12.5 GB/s (line rate) |
| NVMe SSD Gen4 sequential | ~7 | Storage | Typical Gen4 x4 NVMe — 7 GB/s sequential read |
| 25GbE network | 3.1 | Network (NIC) | 25 Gbps ÷ 8 = 3.1 GB/s (line rate) |
| Protobuf parse throughput | ~1 | Serialization | Estimated: 1KB in ~1μs; see Colin Scott's Latency Numbers for methodology |
| NVMe SSD random 4K reads | ~0.25 | Storage (random) | Derived: ~16μs per 4KB read at queue depth 1 → 4KB / 16μs ≈ 0.25 GB/s; see Jeff Dean Latency Numbers (updated SSD random read ~16μs) |
| HDD sequential | ~0.2 | Storage | Typical 7200 RPM HDD sequential throughput |
| JSON parse throughput | ~0.1 | Serialization | Estimated: 1KB in ~10μs; Beyond Latency Numbers Every Programmer Should Know |
| Single TCP flow, cross-region | ~0.03–0.25 | Network (WAN) | Bandwidth-delay product limited: window_size / RTT. 40ms RTT with typical window sizes |
| HDD random 4K reads | ~0.002 | Storage (random) | Derived: ~2–10ms seek per 4KB IOP; Jeff Dean Latency Numbers |
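
To make the conversions explicit, here is a minimal sketch in Python of how the latency-only rows reduce to effective GB/s. The access sizes, latencies, and TCP window sizes are the assumed figures from the Source column above, not measurements.

```python
# Effective throughput for latency-bound operations at queue depth 1:
# one access of `access_bytes` completes every `latency_s` seconds.
def effective_gbps(access_bytes: float, latency_s: float) -> float:
    return access_bytes / latency_s / 1e9

print(effective_gbps(4 * 1024, 16e-6))   # NVMe random 4K, ~16us/read  -> ~0.26 GB/s
print(effective_gbps(4 * 1024, 2e-3))    # HDD random 4K, ~2ms/read    -> ~0.002 GB/s
print(effective_gbps(1024, 1e-6))        # Protobuf, ~1KB in ~1us      -> ~1 GB/s
print(effective_gbps(1024, 10e-6))       # JSON, ~1KB in ~10us         -> ~0.1 GB/s

# A single TCP flow is limited by the bandwidth-delay product:
# throughput ≈ window_size / RTT.
def tcp_flow_gbps(window_bytes: float, rtt_s: float) -> float:
    return window_bytes / rtt_s / 1e9

print(tcp_flow_gbps(1 * 1024**2, 40e-3))   # 1 MB window, 40ms RTT  -> ~0.03 GB/s
print(tcp_flow_gbps(10 * 1024**2, 40e-3))  # 10 MB window, 40ms RTT -> ~0.26 GB/s
```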

Ratios worth internalizing, computed from the table above:

| Ratio | Value | Architectural Implication |
|---|---|---|
| HBM (H100) vs InfiniBand NDR | 67× | Tensor parallelism stays intra-node; pipeline/data parallelism goes inter-node |
| NVLink (H100) vs InfiniBand NDR | 18× | Same as above — crossing node boundary drops ~1 order of magnitude |
| NVMe sequential vs HDD random | 7,000× | SSDs changed everything for serving; random access on spinning disk is catastrophic |
| SSD sequential vs JSON parse | 140× | If your hot path deserializes JSON, your serialization format is slower than your storage |
| L1 cache vs main memory (latency) | ~200× | Per-reference cost (0.5ns vs ~100ns); cache-friendly data structures (contiguous arrays > linked lists) dominate performance |
| B200 HBM3e vs H100 HBM3 | 2.4× | Generational bandwidth improvement; keeps tensor cores fed at lower precision |
| NVLink 5 vs NVLink 4 | 2× | Blackwell doubles intra-node interconnect |
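
To make the first two ratios concrete, here is a rough sketch of how long it takes to move a hypothetical 10 GB of activations or gradients at each layer's bandwidth. The 10 GB figure is made up for illustration; the bandwidths are the table's.

```python
# Time to move a hypothetical 10 GB tensor at each layer's bandwidth
# (bandwidth numbers from the table above; 10 GB is illustrative only).
SIZE_GB = 10

layers_gbps = {
    "H100 HBM3": 3350,
    "NVLink 4 (per GPU)": 900,
    "InfiniBand NDR": 50,
    "100GbE": 12.5,
}

for name, bw in layers_gbps.items():
    print(f"{name:20s} {SIZE_GB / bw * 1e3:7.1f} ms")

# H100 HBM3                3.0 ms
# NVLink 4 (per GPU)      11.1 ms
# InfiniBand NDR          200.0 ms
# 100GbE                  800.0 ms
```

The jump from ~11 ms on NVLink to ~200 ms over the fabric is the node boundary: bandwidth-hungry collectives (tensor parallelism) stay on NVLink, while pipeline and data parallelism, which communicate less per step, cross InfiniBand.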