Focused tools for engineers and financial analysts evaluating LLM inference infrastructure.
Answer 3 questions. Get a GPU count, a cost comparison, and a shareable summary.
Every input exposed: HF model lookup, optimisation tiers (FP8, prefix cache, llm-d), an editable cost model, and the underlying formulas.
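The optimisation tiers scale the baseline estimate multiplicatively. A minimal sketch of the idea for the FP8 and prefix-cache tiers (llm-d changes scheduling rather than a single factor); the rates below are illustrative, and the calculator shows its own assumptions inline:

```ts
// How the tiers scale KV-cache demand; factors below are illustrative.
const fp16KvBytesPerToken = 327_680; // K+V per token for a 70B-class model

const fp8Factor = 0.5;     // FP8 KV cache halves bytes per token vs FP16
const prefixHitRate = 0.3; // assumed share of tokens served from a shared prefix cache

const effectiveKvBytesPerToken =
  fp16KvBytesPerToken * fp8Factor * (1 - prefixHitRate);
// ≈ 114,688 bytes/token, so the same VRAM pool holds ~2.9x more users
```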
Compare NVIDIA and AMD GPUs by VRAM, throughput, and price. Interactive bubble chart.
Cloud vs on-prem vs dedicated node costs. Drag the split slider to find your crossover point.
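Under the hood this is a break-even calculation. A rough sketch in TypeScript with illustrative rates; every figure here is an editable input in the tool:

```ts
// Sketch of the cloud vs on-prem crossover, with made-up numbers.
const cloudPerGpuHour = 2.5;      // $/GPU-hour, on-demand cloud rate
const onPremCapexPerGpu = 30_000; // $ up-front per GPU, server share included
const onPremOpexPerGpuHour = 0.4; // $/GPU-hour for power, cooling, ops

// Hours of utilisation at which owning beats renting:
// capex + opex * h = cloud * h  =>  h = capex / (cloud - opex)
const crossoverHours =
  onPremCapexPerGpu / (cloudPerGpuHour - onPremOpexPerGpuHour);

console.log(`On-prem pays off after ~${Math.round(crossoverHours)} GPU-hours`);
// ≈ 14,286 hours, i.e. under 2 years at 24/7 utilisation
```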
Model the economics of semantic routing. Compare API spend against self-hosting. Calculate the payback period.
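The payback period falls out of monthly savings. A sketch with made-up numbers:

```ts
// Sketch of the payback-period calculation; figures are illustrative.
const monthlyApiSpend = 12_000;    // $ currently paid to a hosted API
const selfHostCapex = 60_000;      // $ up-front for GPUs and servers
const selfHostMonthlyOpex = 3_000; // $ power, hosting, maintenance

const monthlySavings = monthlyApiSpend - selfHostMonthlyOpex;
const paybackMonths = selfHostCapex / monthlySavings;

console.log(`Payback in ~${paybackMonths.toFixed(1)} months`); // ~6.7 months
```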
Compare workloads side by side. Export to PDF, Excel, or Google Sheets.
Every time someone asked me how many GPUs they needed to serve an LLM, I opened the same spreadsheet and ran the same math. KV cache per user, available pool after weights, replica count, cost per hour — it took 20 minutes to explain.
I built gpu.calc so that conversation takes 3 questions instead of 20 minutes. The hard math runs automatically. The assumptions are visible and editable. The result is shareable.
— Vikas Grover, AI Infrastructure · LLM Inference
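For the curious, here is the spreadsheet math the note describes, sketched in TypeScript. It assumes a 70B-parameter model in FP16 with grouped-query attention, served on 8x 80 GB GPUs per replica; the figures are illustrative defaults, not gpu.calc's literal source, just the shape of the calculation:

```ts
// Sizing sketch: KV cache per user -> pool after weights -> replicas -> cost.
const params = 70e9;          // model parameters
const bytesPerParam = 2;      // FP16 weights
const layers = 80, kvHeads = 8, headDim = 128, kvBytes = 2;
const contextTokens = 8192;   // max tokens kept in KV cache per user
const concurrentUsers = 1000; // target concurrency
const gpusPerReplica = 8, vramPerGpu = 80e9, overhead = 0.10;
const dollarsPerGpuHour = 2.5;

// KV cache per user: K and V, per layer, per KV head, per head dim, per token
const kvPerUser = 2 * layers * kvHeads * headDim * kvBytes * contextTokens;

// VRAM left for KV cache after weights and runtime overhead
const pool =
  gpusPerReplica * vramPerGpu * (1 - overhead) - params * bytesPerParam;

const usersPerReplica = Math.floor(pool / kvPerUser);
const replicas = Math.ceil(concurrentUsers / usersPerReplica);
const gpuCount = replicas * gpusPerReplica;
const costPerHour = gpuCount * dollarsPerGpuHour;

console.log({ usersPerReplica, replicas, gpuCount, costPerHour });
// => { usersPerReplica: 162, replicas: 7, gpuCount: 56, costPerHour: 140 }
```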
Edit cost inputs in the left panel → Cost inputs section
Compare workloads side by side. Save from the Advanced calculator.
All numbers editable · click any blue value to change it
— not run yet —
— not run yet —
LLM inference planning — compare GPU generations by memory, bandwidth, and cost efficiency
Enter your admin token to view analytics
Set the token once in your browser console: localStorage.setItem('admin_token','yourkey')