Local LLM Tools

Calculators built and verified against real local inference setups — not just theoretical numbers. Figure out what fits your GPU before you download a 15GB model and find out the hard way.

VRAM Calculator

Calculate exact VRAM needed for any model, quantization, and context length.

GGUF Size Estimator

soon

Estimate GGUF file size before downloading, by quant level.

Inference Time Estimator

soon

Estimate response time from tokens/sec and output length.

Context Window Cost Calculator

soon

Compare local vs API cost across context lengths and providers.

Chat Template Formatter

soon

Wrap raw prompts in ChatML, Llama 3, Alpaca, or Qwen chat templates.

Token Counter

soon

Count tokens across multiple tokenizers side by side.

Quant Format Picker

soon

Get a quantization recommendation based on your VRAM and priorities.

System Prompt Token Budget

soon

See how much context window your system prompt consumes.