Working Notes: a commonplace notebook for recording & exploring ideas.
— Kunal
llm decoding is inherently memory bound
memory bandwidth grows more slowly than flops: 17x vs 80x
normalized prices of capacity & bandwidth are increasing over time
sram is insufficient, tried by groq
ddr is becoming cheaper
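a rough sketch of why decoding ends up memory bound, not taken from the paper. it assumes each weight is read once per decode step and used for one multiply-add per sequence in the batch, and it ignores kv cache and activation traffic. the accelerator numbers (`PEAK_FLOPS`, `MEM_BW`) are made up for illustration, and the function names are hypothetical.

```python
def decode_intensity(batch_size: int, bytes_per_param: float) -> float:
    """FLOPs per byte of weight traffic for one decode step.

    Assumes each weight is read once per step and used for one
    multiply-add (2 FLOPs) per sequence in the batch; ignores KV cache
    and activation traffic.
    """
    return 2.0 * batch_size / bytes_per_param


def machine_balance(peak_flops: float, mem_bandwidth: float) -> float:
    """FLOPs per byte the hardware needs before it becomes compute bound."""
    return peak_flops / mem_bandwidth


# Hypothetical accelerator, not a real part: 1e15 FLOP/s, 4e12 bytes/s.
PEAK_FLOPS = 1e15
MEM_BW = 4e12

balance = machine_balance(PEAK_FLOPS, MEM_BW)
for batch in (1, 8, 64, 512):
    ai = decode_intensity(batch, bytes_per_param=2.0)  # 16-bit weights
    bound = "memory" if ai < balance else "compute"
    print(f"batch={batch:4d} intensity={ai:6.1f} balance={balance:.0f} -> {bound} bound")
```

under these assumptions the intensity only scales with batch size, so small-batch decoding sits far below the balance point of a flops-heavy chip.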
notes from gpt:
For modern systems design
tco, total cost of ownership
average power consumption
carbon dioxide equivalent emissions (co2e)
performance must be meaningful
must be delivered within datacenter capacity, constrained by power, space, co2e
power & co2e are first order targets
directions in the paper
high bandwidth flash
processing near memory
3d memory-logic stacking
low latency interconnect
need a roofline-based simulator to explore options
+ more sharding techniques
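a minimal sketch of what a roofline-based simulator with sharding might start from. this is an assumption about the shape of such a tool, not the paper's method: step time is the larger of compute time and weight-streaming time per device, plus a fixed latency per cross-device sync. `Device`, `decode_step_time`, the design points, and `syncs_per_step=160` are all hypothetical and the numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical accelerator; all numbers are illustrative."""
    name: str
    peak_flops: float      # FLOP/s
    mem_bandwidth: float   # bytes/s
    link_latency: float    # seconds per cross-device sync


def decode_step_time(dev: Device, params: float, bytes_per_param: float,
                     batch: int, shards: int = 1,
                     syncs_per_step: int = 0) -> float:
    """Roofline estimate of one decode step, in seconds.

    Weights are split evenly across `shards` devices. Each device's time
    is the larger of its compute time and its weight-streaming time; each
    sync adds one link latency. KV cache traffic is ignored.
    """
    flops = 2.0 * params * batch / shards
    bytes_moved = params * bytes_per_param / shards
    compute_t = flops / dev.peak_flops
    memory_t = bytes_moved / dev.mem_bandwidth
    comm_t = syncs_per_step * dev.link_latency if shards > 1 else 0.0
    return max(compute_t, memory_t) + comm_t


# Made-up design points to compare, not real products.
options = [
    Device("baseline", 1e15, 4e12, 10e-6),
    Device("more-bandwidth", 1e15, 16e12, 10e-6),
    Device("low-latency-link", 1e15, 4e12, 1e-6),
]

for dev in options:
    for shards in (1, 8):
        t = decode_step_time(dev, params=70e9, bytes_per_param=2.0,
                             batch=8, shards=shards, syncs_per_step=160)
        print(f"{dev.name:18s} shards={shards} step={t * 1e3:7.3f} ms")
```

even this toy version shows the tradeoff: sharding cuts the memory term, but the sync latency then becomes a visible part of the step, which is where a lower latency interconnect would pay off.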
current hardware design doesn't match what decoding needs
aside: good llm design feels like it needs a very detailed & realistic constraint solver