The lighter Sarvam-30B is designed for efficient, real-time applications. It supports a context window of up to 32,000 tokens and has been trained on 16 trillion tokens. The company said the model is optimised for “efficient thinking”, delivering stronger responses while using fewer tokens — a key factor in reducing inference costs in production environments.
In benchmarks shared at the launch, Sarvam-30B was evaluated against models including Gemma 27B, Mistral 3.2 24B, OLMo 3.1 32B, Nemotron-30B, Qwen-30B and GPT-OSS-20B on tasks such as MATH-500, HumanEval, MBPP, LiveCodeBench v6 and MMLU, which together cover mathematical reasoning, code generation and general knowledge. The company indicated competitive performance across these general reasoning and coding benchmarks.
On AIME, a competition-mathematics benchmark, Sarvam-30B was evaluated under varying compute “thinking budgets” and showed improved performance as the compute allocation increased, positioning it alongside other 30B-class reasoning models.
Sarvam also introduced Sarvam-105B, a larger model aimed at more complex reasoning tasks. It supports a context length of 128,000 tokens and, according to the company, performs on par with several frontier open- and closed-source models in its category.
The launch marks Sarvam’s move into larger-parameter models at a time when Indian AI startups are seeking to build foundational capabilities domestically rather than rely solely on global APIs. As enterprises prioritise cost efficiency, controllability and data residency, mid- to large-parameter open models are emerging as a viable deployment alternative.
The startup, backed by Lightspeed and Peak XV Partners, did not disclose pricing but said both models are built for enterprise use cases including coding assistance, research, analytics and real-time AI agents.
