Local LLM Tools
Calculators built and verified against real local inference setups — not just theoretical numbers. Figure out what fits your GPU before you download a 15GB model and find out the hard way.
VRAM Calculator
Calculate exact VRAM needed for any model, quantization, and context length.
GGUF Size Estimator
soonEstimate GGUF file size before downloading, by quant level.
Inference Time Estimator
soonEstimate response time from tokens/sec and output length.
Context Window Cost Calculator
soonCompare local vs API cost across context lengths and providers.
Chat Template Formatter
soonWrap raw prompts in ChatML, Llama 3, Alpaca, or Qwen chat templates.
Token Counter
soonCount tokens across multiple tokenizers side by side.
Quant Format Picker
soonGet a quantization recommendation based on your VRAM and priorities.
System Prompt Token Budget
soonSee how much context window your system prompt consumes.