DeepSeek, a China-based AI startup, has released two new reasoning-first models, V3.2 and V3.2-Speciale, that the company says rival the latest offerings from OpenAI and Google. The announcement highlights improved agentic reasoning, a new “thinking in tool-use” approach, and benchmark results that include gold-level performance on international math and informatics Olympiads.
DeepSeek positions V3.2 as the successor to its V3.2-Exp build and describes both models as “reasoning-first” systems intended for agentic workloads. The standard V3.2 is available on the company’s app, web, and API; the higher-compute V3.2-Speciale is available only via the API and, according to DeepSeek, focuses on maximal reasoning capacity. The company published a technical brief and release notes alongside the rollout.
Benchmarks and claims
DeepSeek claims that V3.2 delivers GPT-5-level performance on general tasks, while V3.2-Speciale matches or exceeds Gemini-3 Pro on complex reasoning benchmarks. Notably, DeepSeek reports gold-medal-level results in 2025 competitions such as the International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI), top scores on contest platforms such as CodeForces, and strong results on other math-contest benchmarks. These claims come from DeepSeek’s published test summaries and model pages; independent third-party verification will be important to validate them at scale.
Thinking in tool-use
A headline feature in the release is what DeepSeek calls “thinking in tool-use”. The company describes a training pipeline that synthesises agent training data at scale (more than 1,800 environments and 85,000 complex instructions) to help models reason about tool use within multi-step tasks. V3.2 is said to support tool thinking in both “thinking” and “non-thinking” modes; V3.2-Speciale is currently API-only and restricted from tool calls while community evaluation proceeds. These engineering choices signal an emphasis on agentic capability rather than conversational fluency alone.
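The interleaving of reasoning steps and tool calls that DeepSeek describes can be sketched as a simple agent loop. Everything below is a hypothetical illustration: the step format, the `fake_model` stand-in, and the `calculator` tool are assumptions for clarity, not DeepSeek's actual interface.

```python
# Illustrative sketch of a "thinking in tool-use" style loop, where the model
# emits thoughts and tool calls inside one multi-step task. All names here
# (run_agent, fake_model, calculator) are hypothetical stand-ins.

def calculator(expression: str) -> str:
    """A toy tool the agent can invoke mid-reasoning."""
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

def fake_model(transcript):
    """Scripted stand-in for the model: thought -> tool call -> answer."""
    kinds = [kind for kind, _ in transcript]
    if "thought" not in kinds:
        return ("thought", "I should compute 17 * 23 with a tool.")
    if "tool_result" not in kinds:
        return ("tool_call", ("calculator", "17 * 23"))
    result = next(payload for kind, payload in transcript if kind == "tool_result")
    return ("answer", f"17 * 23 = {result}")

def run_agent(task, max_steps=5):
    """Alternate model steps and tool executions until an answer is produced."""
    transcript = [("task", task)]
    for _ in range(max_steps):
        step = fake_model(transcript)
        transcript.append(step)
        kind, payload = step
        if kind == "tool_call":
            name, argument = payload
            transcript.append(("tool_result", TOOLS[name](argument)))
        elif kind == "answer":
            return payload, transcript
    return None, transcript

answer, log = run_agent("What is 17 * 23?")
print(answer)  # 17 * 23 = 391
```

The point of the sketch is the transcript structure: tool results land back in the model's context, so later reasoning steps can condition on them.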
Availability and practical limits
V3.2 replaced the earlier V3.2-Exp build as the default model on DeepSeek’s app and web clients; V3.2-Speciale is available for API access under a temporary endpoint (DeepSeek has noted an expiry for the special endpoint). The company warns that Speciale requires higher token usage and is currently API-only while community evaluation and research continue. Users should expect different cost and compute profiles between the two models.
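To make the cost-profile difference concrete, here is a hedged sketch of budgeting output tokens when calling the two variants through an OpenAI-compatible chat endpoint (the request style DeepSeek's API documents). The `deepseek-v3.2-speciale` model identifier and the token budgets are placeholders, not confirmed values, and no request is actually sent.

```python
# Hypothetical request builder for an OpenAI-compatible chat endpoint.
# The "deepseek-v3.2-speciale" model id is a placeholder; DeepSeek has only
# said the Speciale variant sits behind a temporary API endpoint.
import json

API_URL = "https://api.deepseek.com/chat/completions"  # documented base style

def build_request(model: str, prompt: str, max_tokens: int) -> dict:
    """Assemble a chat-completions JSON body without sending it."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Speciale reportedly consumes more tokens per answer, so budget accordingly.
standard = build_request("deepseek-chat", "Outline a 3-step proof.", 1024)
speciale = build_request("deepseek-v3.2-speciale", "Outline a 3-step proof.", 8192)
print(json.dumps(speciale, indent=2))
```

The practical takeaway is simply that teams should plan larger output budgets (and therefore higher per-call cost) for Speciale-style runs.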
China’s model landscape and hardware strategy
DeepSeek’s releases arrive amid broader shifts in China’s AI ecosystem. Chinese firms are pursuing both model and hardware strategies that reduce reliance on U.S. chip pipelines, for example, experimenting with domestic accelerators and sparse-attention mechanisms to cut compute costs. DeepSeek’s earlier technical report and community code releases have highlighted MoE architectures and sparse attention optimisations that aim to balance capability and cost. These design decisions help explain how the company positions lower-cost models that still aim to be competitive on reasoning benchmarks.
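The sparse-attention idea behind those cost savings can be illustrated with a tiny top-k sketch: each query attends only to the k highest-scoring keys rather than all of them, shrinking the attention computation. This is a deliberately simplified stand-in (DeepSeek's published design uses a learned, cheaper indexer to select tokens); the function names and numbers are illustrative.

```python
# Toy top-k sparse attention for a single query vector, in pure Python.
# Real sparse-attention systems select tokens with a cheap learned scorer;
# here we reuse the dot-product score itself for simplicity.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_attention(query, keys, values, k):
    """Attend only to the top-k keys by dot-product score."""
    scores = [sum(q * kk for q, kk in zip(query, key)) for key in keys]
    topk = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in topk])
    dim = len(values[0])
    out = [0.0] * dim
    for weight, i in zip(weights, topk):
        for d in range(dim):
            out[d] += weight * values[i][d]
    return out, sorted(topk)

out, selected = sparse_attention(
    [1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [0.9, 0.0]],
    values=[[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]],
    k=2,
)
print(selected)  # [0, 2]: the weakest-scoring key is skipped entirely
```

Skipping low-scoring tokens is where the compute saving comes from: with k fixed, per-query cost stops growing linearly in sequence length for the value aggregation step.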
Model benchmark claims from vendors are useful signals, but they also require independent verification. Experts typically look for reproducible evaluation details, third-party leaderboard entries, and community audits before accepting head-to-head superiority claims. Some of DeepSeek’s benchmark wins (IMO, IOI, CodeForces) are eye-catching; CIOL reached out to independent researchers and engineers who advised cautious optimism until broader community replication is available. Public model cards, reproducible eval scripts, and open leaderboards will be the next crucial steps.
Future considerations
Third-party validation: Will independent benchmarkers reproduce the IMO/IOI/CodeForces results?
Tool-use rollout: When and how DeepSeek enables controlled tool access for Speciale will affect agent development workflows.
Hardware portability: Continued integration with China-native chips and frameworks could change deployment economics for local users.
Regulatory and geopolitical context: As Chinese firms push more powerful models, international scrutiny and export controls will stay relevant to how these systems are trained and deployed.
DeepSeek’s V3.2 and V3.2-Speciale are notable public steps in China’s open-source and research-forward AI trajectory. The company’s claims about reasoning and competition wins are significant if upheld, but the broader AI community will expect reproducible evaluations and transparent audit trails before revising leaderboards. For product teams and researchers, the new models are worth tracking both for their technical ideas (sparse attention, MoE, agent data synthesis) and for what they reveal about shifting cost-performance tradeoffs in model development.