How a Pune-Based Startup Is Tackling India’s Deepfake Security Challenge

A Pune-based deeptech startup is using AI-powered forensics to detect deepfakes, secure digital evidence, and help law enforcement and enterprises respond to synthetic media threats.

Manisha Sharma

Deepfakes are no longer a future risk. They are already influencing fraud investigations, court proceedings, and digital trust across banking, media, and government. As synthetic media becomes cheaper and more accessible, the challenge is no longer just detection but explainability, scale, and legal defensibility.


Founded in 2023, pi-labs operates at this intersection. Its AI-powered cyber forensics platform is already deployed across Indian law enforcement, defence, and BFSI institutions, focusing on deepfake detection, AI model security, and forensic intelligence.

At the centre of this effort is Ankush Tiwari, founder and CEO of pi-labs, a serial entrepreneur previously known for co-founding Mobiliya. Together with Abhijeet Zilpelwar, CTO, and Raghu Sesha Iyengar, Chief Scientist, he is building what the company calls an AI-native security stack for a world flooded with synthetic content.

In a conversation with CiOL, Tiwari said deepfakes have moved from being a media concern to a systemic security challenge. He noted that without explainable, forensically sound AI systems, enterprises and government agencies risk losing trust in digital evidence itself.

How do you evaluate Authentify’s resilience against deepfakes, and how often do you benchmark?

Evaluation Strategy: We evaluate resilience not just by testing against known academic datasets but by testing against the generators themselves.

Diverse Attack Vectors: Our models are trained and tested on a heterogeneous mix of data sources. This includes "Curated Data" (academic benchmarks), "Open Source Models", and "Commercial APIs".


Adversarial Hardening: We actively subject our models to adversarial attacks. In our facial forensics research, we use gradient-based attack algorithms to generate adversarial samples and test against them. For audio, we test robustness against noise addition and encoding artefacts such as compression to ensure resilience in real-world conditions.
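For readers unfamiliar with gradient-based attacks, the short PyTorch sketch below illustrates the general idea with a minimal FGSM-style perturbation; the model, input format and epsilon value are illustrative assumptions, not pi-labs' actual hardening pipeline.

```python
import torch
import torch.nn.functional as F

def fgsm_adversarial_sample(model, image, label, epsilon=0.01):
    """Generate an adversarial image with the Fast Gradient Sign Method.

    `model` is any differentiable classifier returning logits, `image` is a
    (1, C, H, W) float tensor in [0, 1], and `label` holds the true class index.
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss, then clamp to a valid range.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# A hardened detector should keep its verdict stable on such perturbed inputs.
```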

Benchmarking Frequency: Benchmarking is continuous and integrated into the research lifecycle. We test performance on held-out test sets from a variety of sources to validate generalisation, and we benchmark against state-of-the-art (SOTA) methods every 2-3 weeks. This includes previous and newly published open-source datasets for video, image and audio deepfake detection.

What defences against model poisoning and backdoors are most practical for BFSI today?

Based on our architecture, the most practical defences for high-stakes environments like BFSI (Banking, Financial Services, and Insurance) are robustness via augmentation and source attribution:

Data Augmentation as Defence: To prevent the model from learning brittle features (which are susceptible to poisoning), we employ aggressive data augmentation during training. This includes random perturbations, JPEG/MP3 compression, and variable frame rates. This forces the model to learn fundamental "fingerprints" of forgery rather than easily poisoned surface-level noise.
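As a rough illustration of augmentation-as-defence, the sketch below applies random noise and JPEG re-compression to a frame before training; the parameter ranges and the use of PIL are assumptions for the example, not the production recipe.

```python
import io
import random
import numpy as np
from PIL import Image

def augment_frame(frame: Image.Image) -> Image.Image:
    """Apply robustness-oriented augmentations: random noise plus JPEG re-compression."""
    frame = frame.convert("RGB")

    # Small random additive noise so the model cannot rely on pixel-exact cues.
    arr = np.asarray(frame).astype(np.float32)
    arr += np.random.normal(0.0, random.uniform(0.5, 5.0), arr.shape)
    frame = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

    # Random-quality JPEG re-compression to mimic real-world transmission artefacts.
    buf = io.BytesIO()
    frame.save(buf, format="JPEG", quality=random.randint(40, 95))
    buf.seek(0)
    return Image.open(buf).convert("RGB")
```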

Source Attribution Layer: We don't just ask, "Is this fake?"; we ask, "Who made this?" By identifying the specific generative tool (e.g., "This is a FaceFusion swap"), we add a layer of verification that is harder to bypass via simple poisoning attacks. If an attacker tries to poison the "Real/Fake" decision boundary, the "Source Attribution" head often still flags the anomaly.


Dual-Head Consistency: We utilise a "constraint loss" during training to enforce consistency between the "Real/Fake" verdict and the "Source" verdict. If a sample is classified as "real" but the source head detects "ElevenLabs" artefacts, the system can flag this inconsistency, acting as a failsafe against manipulation.
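One possible formulation of such a dual-head architecture and constraint loss is sketched below in PyTorch; the head shapes, the "real" source index, and the MSE-based consistency term are illustrative assumptions rather than the exact training objective.

```python
import torch.nn as nn
import torch.nn.functional as F

class DualHeadDetector(nn.Module):
    """Shared backbone feeding a real/fake head and a source-attribution head."""
    def __init__(self, backbone: nn.Module, feat_dim: int, num_sources: int):
        super().__init__()
        self.backbone = backbone
        self.fake_head = nn.Linear(feat_dim, 2)              # real vs fake
        self.source_head = nn.Linear(feat_dim, num_sources)  # "real", "FaceFusion", "ElevenLabs", ...

    def forward(self, x):
        features = self.backbone(x)
        return self.fake_head(features), self.source_head(features)

def consistency_loss(fake_logits, source_logits, real_source_idx=0):
    """Penalise contradictions between the two heads: a 'real' verdict should
    coincide with the 'real' class on the source head, and vice versa."""
    p_fake = F.softmax(fake_logits, dim=1)[:, 1]
    p_synthetic = 1.0 - F.softmax(source_logits, dim=1)[:, real_source_idx]
    return F.mse_loss(p_fake, p_synthetic)
```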

How do you ensure explainability of black box AI outputs so that they are legally admissible?

We move away from opaque "black box" scores by providing granular, evidence-based reporting:


Temporal Localisation (Chunk-wise Analysis): Instead of a single score for a whole video, Authentify breaks media into small chunks (0.5s for video, 6s for audio). We provide a report showing exactly which seconds of the clip are manipulated. This is crucial for legal admissibility, as it allows investigators to isolate the exact moment of fraud.
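A minimal sketch of chunk-wise scoring is shown below; the frame-list input and the generic `score_fn` detector are assumptions used to keep the example self-contained.

```python
def chunk_scores(frames, fps, score_fn, chunk_seconds=0.5):
    """Score a decoded video chunk by chunk and return per-interval verdicts.

    `frames` is a list of decoded frames and `score_fn` is any detector mapping
    a list of frames to a fake-probability between 0 and 1.
    """
    step = max(1, int(fps * chunk_seconds))
    report = []
    for start in range(0, len(frames), step):
        chunk = frames[start:start + step]
        report.append({
            "start_sec": round(start / fps, 2),
            "end_sec": round(min(start + step, len(frames)) / fps, 2),
            "fake_probability": float(score_fn(chunk)),
        })
    return report  # investigators can point to the exact seconds flagged as manipulated
```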

Source Fingerprinting: Our models map input media to specific "fingerprints" in a latent space. Being able to say, "This audio contains spectral artefacts consistent with the Vocos vocoder" provides a forensic scientific basis for the decision, rather than just a probability score.

Visual Interpretability: For mobile applications, we visualise these chunk-wise predictions on a timeline, allowing non-experts (like judges or journalists) to intuitively see the distribution of "fake" segments.


How do you balance accuracy, latency, and cost? What should be kept in mind while scaling the detection engine nationally?

Balancing the Triad:

Balancing accuracy, latency, and cost comes down to using the right depth of AI at the right place. For high-risk or evidentiary cases, we run deep video and audio analysis on powerful GPU systems to maximise accuracy. For real-time needs, we use lighter, optimised versions of the same models closer to the source (edge or local servers). This hybrid approach ensures fast response where needed, without compromising forensic reliability when it matters most.
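The routing logic behind such a hybrid setup can be as simple as the hypothetical sketch below, where the risk-tier names, the model objects and their `analyse` method are placeholders for illustration.

```python
def route_analysis(case, deep_model, light_model):
    """Send evidentiary or high-risk cases to the full GPU pipeline and
    real-time screening traffic to a lighter edge-optimised model."""
    if case["risk_tier"] in ("evidentiary", "high"):
        return deep_model.analyse(case["media"])   # slower, maximum accuracy
    return light_model.analyse(case["media"])      # faster, runs at the edge
```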

Cost efficiency at scale is achieved by using infrastructure intelligently rather than throwing hardware at the problem. Multiple detection models (audio, video, image, case-specific) run concurrently on the same GPU through dynamic scheduling and optimised inference engines. Techniques like quantisation and pruning allow strong performance even on smaller GPUs, dramatically lowering hardware and operating costs while keeping accuracy high.
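As one concrete example of the kind of optimisation mentioned above, post-training dynamic quantisation in PyTorch converts a model's linear layers to int8, a standard way to cut memory and latency; this is a generic illustration, not pi-labs' deployment recipe.

```python
import torch

def shrink_for_edge(model: torch.nn.Module) -> torch.nn.Module:
    """Quantise the model's Linear layers to int8 via dynamic quantisation."""
    return torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
```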

When scaling nationally, three things matter most: bandwidth, privacy, and throughput. Instead of sending raw media to the cloud, only alerts, metadata, or flagged segments are transmitted. Sensitive data is processed on-premise to meet data sovereignty and privacy requirements. Duplicate media is automatically identified and skipped unless policies or models change. Together, these measures enable the system to handle national-scale volumes reliably, securely, and at a sustainable cost.
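Duplicate skipping of the kind described above can be keyed on a content hash plus the current model or policy version, as in the hypothetical sketch below.

```python
import hashlib

def is_duplicate(media_bytes: bytes, seen: set, policy_version: str) -> bool:
    """Return True if this exact media was already analysed under the current
    policy/model version; otherwise record it so later copies are skipped."""
    key = (hashlib.sha256(media_bytes).hexdigest(), policy_version)
    if key in seen:
        return True
    seen.add(key)
    return False
```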

Which standards or APIs should India prioritise to enable consistent cross-agency sharing?

To enable interoperability (e.g., between a bank's VideoKYC provider and a law enforcement agency), we prioritise:

Modular REST APIs: Authentify is exposed via standard REST interfaces that accept standard media types (MP4, WAV) and return standardised JSON reports.

Structured Forensic Reports: The API response includes not just a Boolean "Fake" flag but detailed metadata: timestamped confidence scores per chunk, source attribution labels, and overall aggregate scores. Prioritising a standard schema for this "forensic metadata" allows different agencies to ingest and trust the analysis without needing access to the raw proprietary models.
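A hypothetical shape for such a standardised forensic report, expressed here as Python dataclasses purely for illustration (the field names are assumptions, not Authentify's actual schema):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ChunkVerdict:
    start_sec: float
    end_sec: float
    fake_probability: float          # timestamped confidence per chunk

@dataclass
class ForensicReport:
    media_id: str
    is_fake: bool                    # the top-level Boolean flag
    aggregate_score: float           # overall confidence across the clip
    source_attribution: str          # e.g. "FaceFusion", "ElevenLabs", "unknown"
    chunks: List[ChunkVerdict] = field(default_factory=list)
```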

What proactive controls, like watermarking and provenance, can realistically be adopted at scale?

While watermarking (like C2PA) is the ideal "proactive" control, it currently lacks universal adoption. Authentify focuses on "Passive Provenance" (Source Attribution) as the realistic alternative for today:

Forensic Fingerprinting: Since malicious actors strip watermarks, we rely on the intrinsic artefacts left by the generative model itself. Our research shows that every generator (GAN, diffusion, TTS engine) leaves a unique "fingerprint" in the latent space.

Model Identification as Provenance: By identifying that a video was generated by "Runway Gen-2" or audio by "ElevenLabs", we effectively reconstruct the provenance of the file post-hoc, without relying on the creator to embed a watermark. This "detect-and-attribute" approach is the only scalable control available for the vast ocean of unmarked synthetic content currently in circulation.