Ultra-fast LLM inference API powered by custom Language Processing Units (LPUs)
The free tier covers 14,400 requests per day, which handles most prototyping needs. Paid inference runs $0.04 to $0.27 per 1M tokens depending on the model: Llama 3.1 8B is the cheapest, while Mixtral and 70B models cost more. There is no monthly minimum.
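To make the per-token pricing concrete, here is a minimal Python sketch that turns a token volume into a dollar estimate. The two prices are the ends of the range quoted above; the 50M tokens per month workload and the function name `estimate_cost` are assumptions chosen for illustration, not figures from Groq.

```python
# Ends of the price range quoted above, in USD per 1M tokens.
PRICE_LOW_PER_M = 0.04   # cheapest model tier mentioned (Llama 3.1 8B)
PRICE_HIGH_PER_M = 0.27  # top of the quoted range


def estimate_cost(tokens: int, price_per_million: float) -> float:
    """Return the USD cost of processing `tokens` at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical workload: 50M tokens per month (an assumed figure).
monthly_tokens = 50_000_000
low = estimate_cost(monthly_tokens, PRICE_LOW_PER_M)
high = estimate_cost(monthly_tokens, PRICE_HIGH_PER_M)
print(f"${low:.2f} to ${high:.2f} per month")
```

Under these assumptions the same workload costs between $2.00 and $13.50 per month, so the choice of model matters more than the volume at this scale.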
14,400 req/day is enough for dev and low-traffic apps - start here before paying anything.
At $0.04-0.27/1M tokens, run batch workloads on smaller models like Llama 8B to keep costs minimal.
Monitor token volume weekly: Groq offers no option to burst past the free tier limits, so plan capacity accordingly.
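The tips above can be checked with simple arithmetic. The sketch below converts the 14,400 requests per day figure into an average per-minute rate and flags days in a weekly log that would exceed the free tier. The daily counts are made-up sample data, the helper names are hypothetical, and the sketch considers only the daily figure quoted here, not any other limits Groq may apply.

```python
FREE_REQUESTS_PER_DAY = 14_400  # free tier figure quoted above
MINUTES_PER_DAY = 24 * 60


def average_requests_per_minute(daily_limit: int = FREE_REQUESTS_PER_DAY) -> float:
    """Spread the daily allowance evenly over the minutes in a day."""
    return daily_limit / MINUTES_PER_DAY


def days_over_limit(daily_counts: list[int],
                    daily_limit: int = FREE_REQUESTS_PER_DAY) -> list[int]:
    """Return the indexes of days whose request count exceeds the limit."""
    return [i for i, count in enumerate(daily_counts) if count > daily_limit]


# Hypothetical request counts for one week (assumed sample data).
week = [3_200, 9_800, 14_400, 15_100, 7_450, 1_200, 16_000]

print(average_requests_per_minute())  # 10.0
print(days_over_limit(week))          # [3, 6]
```

A sustained average above 10 requests per minute means the free tier will run out before the day ends, which is a reasonable signal to move that workload to paid inference.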
Free tier vs. $15/mo average
Groq's token prices are among the lowest for hosted inference, but the tradeoff is a smaller model catalog than OpenAI or Anthropic.
Latency-sensitive LLM API applications
Which plan fits you
ComparEdge is tracking Groq pricing. No changes recorded since monitoring began.
How does Groq pricing compare?
See how Groq's 2 pricing plans stack up against similar large language model tools.