Perplexity Introduces BrowseSafe to Secure AI Browsing from Hidden Threats

Perplexity launches BrowseSafe, a real-time HTML scanner detecting malicious prompt injections targeting AI browser agents, plus the open-source BrowseSafe-Bench benchmark.

03 Dec 2025 17:00 IST

New Update

As AI moves deeper into the browser to execute actions, not just answer questions, the web is evolving into an agent-driven environment. Perplexity says that shift calls for stronger guardrails to protect users from malicious prompts hidden across web pages. To address this, the company has introduced BrowseSafe, a real-time HTML scanning model designed to detect unsafe instructions targeting AI assistants inside the browser.

Advertisment

ss1

ss2

Real-Time Protection Inside the Browser

BrowseSafe focuses on one core task: identifying whether a webpage contains instructions intended to manipulate an AI agent’s behaviour. Large models can detect such threats, but often with high compute cost and latency. BrowseSafe, fine-tuned specifically for detection, scans full pages without slowing user browsing. Perplexity is also releasing BrowseSafe-Bench, an evaluation suite intended to improve and validate these protections.

Prompt injection embeds hidden instructions in web content to override the AI’s original intent. Because agents read entire pages, including comments, templates, and long footers, attackers can quietly redirect an assistant’s actions. These threats may appear in well-written or multilingual text or be buried inside HTML elements that aren’t visible to users but are still parsed by agents.

Testing Against Real-World Web Scenarios

BrowseSafe-Bench includes 14,719 examples designed to reflect production browsing. Sample variations include:

11 different attack types
9 injection placements — from hidden fields to visible paragraphs
Multiple linguistic styles, from direct commands to indirect suggestions

This variety helps benchmark how effectively models detect harmful signals across different web structures.

Perplexity considers all web content untrusted. The assistant operates in a secure environment, while features like browsing, emails, or file handling require scanning before any action is taken. BrowseSafe adds to a layered approach: scanning raw content, restricting tool permissions, and prompting for user confirmation when sensitive actions are requested — building safety in without limiting usability.

Advertisment

What Makes Attacks Harder to Detect

Evaluation on BrowseSafe-Bench reveals:

Direct attacks are easiest to identify
Multilingual and indirect attacks are significantly harder
Attacks in visible page areas not hidden elements, pose more challenges

These findings highlight where additional training and model refinement can improve defensive reliability.

BrowseSafe and the accompanying benchmark are open-source, allowing developers to strengthen their own autonomous agents without building new protection systems from scratch. The model runs locally and evaluates every page in real time, while chunking and parallel scanning let agents handle large, untrusted sites efficiently.