Running big models on small hardware — local, efficient, offline-friendly.
1 tool in this category
Run 70B-parameter LLMs on a single 4GB GPU. AirLLM runs inference layer by layer, keeping only one layer's weights on the GPU at a time, so huge models fit on modest hardware with no quantization, distillation, or pruning required.
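To make the layer-by-layer idea concrete, here is a minimal toy sketch in plain Python. It is not AirLLM's code or API: the helper names (`save_layers`, `apply_layer`, `layered_forward`), the JSON file layout, and the tiny 2x2 "layers" are all assumptions made for illustration. It only shows the general pattern of keeping each layer on disk and loading one at a time during the forward pass.

```python
import json
import os
import tempfile


def save_layers(layers, directory):
    """Write each layer's weight matrix to its own file and return the paths."""
    paths = []
    for i, weights in enumerate(layers):
        path = os.path.join(directory, f"layer_{i}.json")
        with open(path, "w") as f:
            json.dump(weights, f)
        paths.append(path)
    return paths


def apply_layer(weights, x):
    """Toy layer (hypothetical): matrix-vector product followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def layered_forward(paths, x):
    """Run the forward pass while holding only one layer's weights at a time."""
    for path in paths:
        with open(path) as f:
            weights = json.load(f)  # load just this layer from disk
        x = apply_layer(weights, x)
        del weights  # drop the reference before loading the next layer
    return x


# Three tiny stand-in layers; a real model would have far larger ones.
layers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, -1.0], [2.0, 1.0]],
    [[1.0, 1.0], [-1.0, 0.5]],
]
x0 = [1.0, 2.0]

with tempfile.TemporaryDirectory() as tmp:
    paths = save_layers(layers, tmp)
    streamed = layered_forward(paths, x0)

# Reference: the same computation with every layer resident in memory.
resident = x0
for w in layers:
    resident = apply_layer(w, resident)

print(streamed == resident)
```

The streamed pass gives the same result as the all-in-memory pass, while peak weight memory is bounded by the largest single layer rather than the whole model. The trade-off is extra disk reads on every forward pass, so this pattern typically favors fitting the model over raw speed.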