Amazingly, you can also do it on smaller hardware!
From the readme:
All code will run just fine on even a single GPU by omitting torchrun, and will produce ~identical results (code will automatically switch to gradient accumulation), but you'll have to wait 8 times longer.
If your GPU(s) have less than 80GB, you'll have to tune some of the hyperparameters or you will OOM / run out of VRAM. Look for --device_batch_size in the scripts and reduce it until things fit, e.g. from 32 (default) to 16, 8, 4, 2, or even 1. Below that, you'll have to know a bit more about what you're doing and get more creative.
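For context, the gradient accumulation it mentions just splits one optimizer step into several smaller micro-batches, so the effective batch size stays the same when you lower --device_batch_size. A minimal sketch of the idea, assuming a toy model; the variable names are mine, not nanochat's actual training loop:

    # Minimal sketch: gradient accumulation keeps the effective batch size
    # constant when the per-device batch size is reduced to fit in VRAM.
    # Names here are illustrative, not nanochat's actual training loop.
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    target_batch_size = 32             # what the default hyperparameters assume
    device_batch_size = 8              # reduced to fit a smaller GPU
    accum_steps = target_batch_size // device_batch_size  # 4 micro-batches per step

    model = torch.nn.Linear(128, 128).to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

    def get_micro_batch():
        # stand-in for real data loading
        return torch.randn(device_batch_size, 128, device=device)

    opt.zero_grad(set_to_none=True)
    for _ in range(accum_steps):
        x = get_micro_batch()
        loss = model(x).pow(2).mean()
        (loss / accum_steps).backward()  # scale so summed grads match one full-batch step
    opt.step()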
Earlier discussion (256 comments): https://news.ycombinator.com/item?id=45569350
No seagull?
Pelican.
[flagged]
It’s not; it’s a different blog post on the same thing.
The point is that it's a duplicate discussion. A different article doesn't matter, especially when it hardly adds anything. The discussion is over there.
They always do that, linking to a thread for another article claiming it's a dupe.