Sarvam has rolled out two multilingual large language models (LLMs), with 30 billion and 105 billion parameters, that were first introduced at the AI Impact Summit 2026 in New Delhi.
The models are now available under the Apache 2.0 open-source license, with weights downloadable via AIKosh and Hugging Face. Developers can also access them through Sarvam’s Indus AI chatbot app and its API developer dashboard.
The release comes as India attempts to reduce reliance on global AI providers by supporting domestic models designed for Indian languages and local enterprise use cases. Sarvam’s models were trained using GPU infrastructure provided under the Rs 10,372-crore IndiaAI Mission, with data center support from Yotta and technical support from Nvidia.
According to the company, both models were built from scratch using large-scale datasets curated internally and designed to support reasoning tasks and conversational workflows.
Architecture Designed For Efficiency
Sarvam’s models use a mixture-of-experts (MoE) transformer architecture, which activates only a fraction of the total parameters during inference. This design helps reduce computational requirements while maintaining model performance.
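The efficiency gain comes from the router activating only a few experts per input. A minimal sketch of top-k expert routing (illustrative only; this is not Sarvam's implementation, and real MoE layers sit inside transformer blocks and are trained jointly with the router):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route input x through only the top-k experts, as in a
    mixture-of-experts layer. Compute scales with k, not with
    the total number of experts."""
    logits = x @ gate_w                       # one routing score per expert
    top = np.argsort(logits)[-k:]             # indices of the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over the selected experts only
    # Only k expert networks actually run for this input.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
# Toy "experts": each is just a random linear map.
experts = [lambda x, W=rng.normal(size=(d, d)): x @ W for _ in range(n_experts)]
gate_w = rng.normal(size=(d, n_experts))
x = rng.normal(size=d)
y = moe_forward(x, gate_w, experts, k=2)
print(y.shape)  # (8,)
```

With 16 experts and k=2, roughly one-eighth of the expert parameters participate in each forward pass, which is the effect the article describes at model scale.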
The 30B model is designed primarily for conversational use cases and supports a 32,000-token context window. It powers Sarvam’s conversational agent platform called Samvaad.
The 105B model, which underpins the Indus AI assistant, supports a 128,000-token context window, allowing it to handle longer prompts and multi-step reasoning tasks often needed in enterprise workflows.
To improve efficiency, the 30B model uses Grouped Query Attention (GQA) to reduce memory usage during inference. The larger model adopts Multi-head Latent Attention (MLA), a technique similar to the one used in DeepSeek's architecture, to enable efficient long-context processing.
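GQA saves memory by letting several query heads share one key/value head, so the KV cache that grows with context length shrinks proportionally. A minimal sketch (an illustration of the general technique, not Sarvam's code):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def grouped_query_attention(q, k, v, n_kv_heads):
    """q: (n_heads, seq, d); k, v: (n_kv_heads, seq, d).
    Each group of n_heads // n_kv_heads query heads shares one
    key/value head, shrinking the KV cache by that factor."""
    n_heads, _, d = q.shape
    group = n_heads // n_kv_heads
    out = np.empty_like(q)
    for h in range(n_heads):
        kv = h // group                       # shared KV head for this query head
        scores = q[h] @ k[kv].T / np.sqrt(d)  # scaled dot-product attention
        out[h] = softmax(scores) @ v[kv]
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4, 16))               # 8 query heads
k = rng.normal(size=(2, 4, 16))               # only 2 KV heads to cache
v = rng.normal(size=(2, 4, 16))
out = grouped_query_attention(q, k, v, n_kv_heads=2)
print(out.shape)  # (8, 4, 16)
```

Here 8 query heads share 2 KV heads, so the cached keys and values are a quarter of what full multi-head attention would store.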
Sarvam said the training dataset includes code, general web data, mathematics, specialised knowledge corpora, and multilingual content. A significant portion of the training effort was dedicated to building a multilingual corpus covering the 10 most widely spoken Indian languages.
“Building these models required developing end-to-end capability across data, training, inference, and product deployment. With that foundation in place, we are ready to scale to significantly larger and more capable models, including models specialised for coding, agentic, and multimodal conversational tasks,” Sarvam wrote in a blog post.
Benchmark Performance Against Global Models
Early benchmark tests suggest the larger 105B model scales efficiently, outperforming the smaller model on several general capability tests during training. When compared with models of similar size, Sarvam said the 105B model delivered results comparable to GPT-OSS 120B and Qwen3-Next (80B) on general capabilities.
The model also showed strong results in agentic reasoning and task completion, outperforming DeepSeek R1, Gemini 2.5 Flash, and o4-mini on Tau 2 Bench, a benchmark designed to evaluate multi-step task completion. However, Sarvam acknowledged that the model may not lead in every category. Its performance on SWE-Bench Verified, a benchmark used to measure code-generation capabilities, trails some competing models.
The smaller 30B model showed mixed results. Compared with Nemotron 3 Nano 30B, Sarvam’s model performed slightly better in coding and agentic reasoning benchmarks but lagged behind in tests such as Live Code Bench v6 and BrowseComp. In terms of runtime efficiency, Sarvam said the 30B model delivers 20–40% higher token throughput per second compared with Qwen3, driven by kernel and code optimisations.
Optimising AI For Indian Languages
One of the distinguishing features of Sarvam’s models lies in their language design. The company developed its tokeniser from scratch to support all 22 scheduled Indian languages across 12 scripts. Tokenisers break text into smaller units that models use during training and inference, and their design significantly affects efficiency and performance.
Sarvam said its tokeniser achieved better (lower) fertility scores, a metric measuring how many tokens are needed to represent a word, allowing the system to encode Indic languages more efficiently than several existing open-source tokenisers.
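Fertility is straightforward to compute: tokenise each word and average the token counts, with lower values meaning a more compact encoding. A sketch with a toy stand-in tokeniser (the fixed-size chunking below is hypothetical, purely to make the metric concrete; real subword tokenisers like BPE behave very differently):

```python
def fertility(tokenize, words):
    """Average number of tokens per word; lower means the
    tokenizer encodes the language more compactly."""
    total = sum(len(tokenize(w)) for w in words)
    return total / len(words)

# Toy "tokenizer": splits a word into chunks of 3 code points,
# standing in for a real subword tokenizer (illustration only).
chunk3 = lambda w: [w[i:i + 3] for i in range(0, len(w), 3)]

score = fertility(chunk3, ["नमस्ते", "भारत"])
print(score)  # 2.0  (6 code points -> 2 chunks; 4 code points -> 2 chunks)
```

A tokeniser whose vocabulary covers Indic scripts well yields fewer tokens per word, which translates directly into shorter sequences and cheaper inference for those languages.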
The company also said both models were fine-tuned using datasets covering standard and India-specific safety scenarios, including adversarial prompts discovered through automated red-teaming. These prompts were paired with policy-aligned responses during supervised training.
The Debate Around “Sovereign AI”
The release of Sarvam’s models comes amid growing discussion around sovereign AI, an idea gaining traction among governments that want greater control over AI infrastructure and data. India’s approach, supported by the IndiaAI Mission, focuses on building domestic models while making them accessible to developers and enterprises.
However, some observers have questioned whether a model can be truly sovereign if its weights are released publicly. Open-weight models allow developers globally to modify and deploy them freely, which raises questions about how sovereignty should be defined in the context of AI.
For Sarvam, the strategy appears to focus on local language capability, infrastructure independence, and enterprise deployment, rather than restricting model access. As Indian AI startups push into increasingly competitive territory dominated by global players such as OpenAI, Google, and Anthropic, the next phase may depend less on model size and more on real-world adoption across enterprise and government use cases.