How Chance AI Is Shaping the Next Interface for AI

Chance AI's Curiosity Lens turns visual sight into context-aware action. Dr Xi Zeng prioritises user autonomy, cultural fairness, and low-latency hybrid AI. NY startup targets India's diverse visual ecosystem as key learning ground.

Manisha Sharma

For years, artificial intelligence has been trained to read, listen, and respond. The next frontier is sight. As visual data becomes central to how people navigate commerce, culture, and creativity, a new class of systems, visual agents, is emerging to interpret the world not just as images but as context.


Chance AI, headquartered in New York, sits at this intersection. Founded by Dr Xi Zeng, a former product director at OnePlus and AI leader at TikTok, the company is building what it calls a “visual agent”: technology designed to translate what users see into understanding and, eventually, informed action. With its Curiosity Lens gaining global traction, Chance AI is now exploring India, a market where visual language, cultural nuance, and digital behaviour collide at scale.

Speaking with CiOL, Dr Xi Zeng, founder of Chance AI, emphasised that visual AI should not be treated as a productivity shortcut but as a cognitive layer that supports human curiosity without overriding it.

He highlighted the importance of separating perception from decision-making, noting that visual understanding must remain probabilistic and transparent, especially as AI systems begin influencing creative and commercial outcomes. Zeng also pointed to cultural bias as one of the most underestimated risks in vision models, arguing that fairness in visual AI requires continuous regional validation rather than one-time dataset fixes.

In India, Zeng described the country not as an expansion checkbox, but as a learning ground, one that forces visual systems to contend with linguistic diversity, regional aesthetics, and dense real-world complexity. He stressed that user autonomy, latency control, and regulatory alignment will define which visual agents earn long-term trust.

Interview Excerpts

Chance AI positions the Curiosity Lens as a “visual agent” that converts sight into action. Technically, how do you define the boundary between perception and action, and what safeguards exist before the system suggests an action?

At Chance AI, we treat perception and action as two clearly separated layers. The perception layer focuses purely on understanding what the camera sees – objects, environments, styles, or context – using on-device vision models that respond in under 100 milliseconds.


Only after this visual understanding is established does the action layer activate in the cloud. This layer evaluates relevance, user intent, and historical interaction patterns before generating suggestions. Importantly, the system will not surface recommendations unless confidence exceeds 85% and is validated across multiple models. In real-world testing across more than 40 countries, our recognition accuracy has consistently exceeded 92%.

All actions—whether it’s exploring similar items, checking prices, or creating content—require explicit user input. Nothing happens automatically. Our total end-to-end latency stays under 800 milliseconds, ensuring responsiveness without compromising user control or privacy.
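
To make the layering concrete, here is a minimal sketch of the gate Zeng describes: perception produces a label and a confidence, and the action layer fires only on explicit user input, only above the 85% floor, and only with multi-model agreement. The names and the majority-vote rule are illustrative assumptions, not Chance AI's internals.

```python
from dataclasses import dataclass

CONFIDENCE_FLOOR = 0.85  # the 85% threshold cited in the interview

@dataclass
class PerceptionResult:
    label: str          # what the on-device vision model thinks it sees
    confidence: float   # model probability for that label

def cross_model_agree(result: PerceptionResult,
                      peers: list[PerceptionResult]) -> bool:
    """Validation across multiple models, assumed here to be a majority vote."""
    agreeing = sum(1 for p in peers if p.label == result.label)
    return agreeing > len(peers) // 2

def maybe_suggest(result: PerceptionResult,
                  peers: list[PerceptionResult],
                  user_requested: bool) -> str | None:
    """Action layer: runs only after perception, never without user input."""
    if not user_requested:
        return None  # nothing happens automatically
    if result.confidence < CONFIDENCE_FLOOR:
        return None  # below the 85% confidence floor: stay silent
    if not cross_model_agree(result, peers):
        return None  # no multi-model consensus
    return f"Explore items similar to '{result.label}'"
```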

Vision models often struggle with cultural bias and fragile performance across regions. How does Chance AI ensure robustness and fairness in its visual interpretations?

Cultural robustness was a design priority for us from day one. Many vision systems are disproportionately trained on Western datasets; we intentionally built Chance AI to understand the visual diversity of the real world.

Our training data includes millions of authentic images contributed by global creators—Indian textiles, regional jewellery, African prints, local art forms—supplemented by carefully generated synthetic data where gaps exist. Every major category is reviewed by regional evaluators, and we require over 90% consensus before deploying updates.
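
As a rough illustration of that deployment rule, the sketch below gates a model update on per-category evaluator consensus. The evaluator IDs, category names, and exact aggregation are hypothetical; only the 90% figure comes from the interview.

```python
CONSENSUS_FLOOR = 0.90  # the >90% consensus requirement described above

def category_approved(votes: dict[str, bool]) -> bool:
    """votes maps a regional evaluator's id to approve/reject for one category."""
    return bool(votes) and sum(votes.values()) / len(votes) > CONSENSUS_FLOOR

def can_deploy(votes_by_category: dict[str, dict[str, bool]]) -> bool:
    """An update ships only when every major category clears the floor."""
    return all(category_approved(v) for v in votes_by_category.values())
```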

We publish internal quarterly bias audits for partners and maintain transparent confidence scores within the app so users can see how certain the system is about an interpretation. Annual public reporting keeps us accountable. Fairness, for us, is not a checkbox—it’s a continuous measurement process grounded in transparency.


When visual discovery moves into recommendations or creative suggestions, how do you protect user autonomy and avoid manipulation?

Visual intelligence becomes powerful—and potentially problematic—when it shifts from observation to influence. That’s why we built explicit guardrails.

Chance AI never pushes suggestions by default. Users must actively opt in by tapping prompts like “Show ideas” or “Explore more.” All generated content is clearly labelled as AI-assisted, and commercial intent—if any—is disclosed. We do not allow hidden sales links or covert persuasion mechanisms.
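
A minimal sketch of those guardrails, assuming a simple rendering function; the prompt labels and flags are illustrative, not Chance AI's actual UI code:

```python
def render_suggestion(text: str, opted_in: bool, commercial: bool) -> str | None:
    """Suggestions are never pushed by default and are always labelled."""
    if not opted_in:
        return None  # user has not tapped "Show ideas" / "Explore more"
    label = "[AI-assisted]"
    if commercial:
        label += " [Sponsored]"  # commercial intent, if any, is disclosed
    return f"{label} {text}"
```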


Creative features such as poetry generation or cultural interpretations are clearly framed with context and disclaimers. Users can access, export, or delete their interaction history at any time. We also conduct independent audits to ensure these principles are consistently upheld. Curiosity should empower users, not steer them.

India is uniquely diverse, visually and linguistically. How is Chance AI localising the Curiosity Lens for Indian users?

India is not a market we are “adapting to”—it is a place we are learning from. Our localisation strategy goes far beyond language translation.


We are training models on millions of India-specific visuals, from regional scripts and street signage to fabrics, jewellery styles, festivals, and cinematic aesthetics. Accuracy benchmarks for Indian categories are already exceeding 95% in several domains.

We’re also working closely with design institutions, wedding industry partners, educators, and regional creators to ensure cultural nuance. A locally moderated review system involving over 75 Indian experts supports culturally sensitive outputs. By early 2026, users will see expanded Hindi voice support, regional language text outputs, and creator-focused sharing tools tailored for Indian platforms.

What are the biggest engineering constraints today, and how are you addressing them through architecture or algorithms?

Building a system that sees like a human and responds instantly comes with trade-offs. Our biggest challenges are maintaining ultra-low latency, preserving privacy, and managing compute costs at scale.

Currently, lightweight vision tasks run on-device, while more complex reasoning happens in the cloud. Over the last year, we’ve reduced latency by over 40% by compressing models and improving edge-cloud coordination. We’re steadily shifting more processing onto devices, and by mid-2026, we expect roughly 70% of workflows to run locally.

This hybrid architecture allows us to deliver fast, private, and scalable experiences without compromising accuracy or user trust.
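
One way to picture that split is a simple router that keeps lightweight vision tasks on the device and escalates heavier reasoning to the cloud. The task list and cost budget below are assumptions for illustration; Chance AI has not published its architecture.

```python
# Illustrative edge-vs-cloud router under the hybrid split described above.
LIGHTWEIGHT_TASKS = {"object_detection", "ocr", "style_tagging"}

def route(task: str, estimated_cost: float, on_device_budget: float) -> str:
    """Keep cheap vision work local; escalate heavy reasoning to the cloud."""
    if task in LIGHTWEIGHT_TASKS and estimated_cost <= on_device_budget:
        return "on_device"  # low latency; the image never leaves the phone
    return "cloud"          # complex, multi-step reasoning
```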

As visual agents intersect with commerce, culture, and creativity, what ecosystem standards are needed for safe adoption?

Visual AI sits at the intersection of technology, culture, and regulation. For it to scale responsibly, three standards are essential.

First, transparent attribution—users and creators should always know how content was derived and from what visual source. Second, culturally aware benchmarks that account for regional diversity, not just global averages. Third, interoperability with regulatory frameworks such as India’s DPDP Act and GDPR.

We support visible provenance markers, creator credit mechanisms, and cross-platform compatibility. Shared standards ensure that visual AI enhances creativity and commerce without eroding trust. When curiosity travels safely, ecosystems grow sustainably.
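
As a closing illustration, a provenance record along these lines could travel with each AI-assisted output. The field names follow common content-credential practice and are an assumption, not a published Chance AI schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceMarker:
    source_image_id: str             # which visual the output was derived from
    creator_credit: str              # attribution for the original creator
    ai_assisted: bool = True         # disclosed on all generated content
    commercial_intent: bool = False  # disclosed when present
    generated_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
```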