/ciol/media/media_files/2025/12/03/image-2025-12-03-17-19-19.png)
As AI moves deeper into the browser to execute actions, not just answer questions, the web is evolving into an agent-driven environment. Perplexity says that shift calls for stronger guardrails to protect users from malicious prompts hidden across web pages. To address this, the company has introduced BrowseSafe, a real-time HTML scanning model designed to detect unsafe instructions targeting AI assistants inside the browser.
/filters:format(webp)/ciol/media/media_files/2025/12/03/ss1-2025-12-03-17-24-44.jpeg)
/filters:format(webp)/ciol/media/media_files/2025/12/03/ss2-2025-12-03-17-24-44.jpeg)
Real-Time Protection Inside the Browser
BrowseSafe focuses on one core task: identifying whether a webpage contains instructions intended to manipulate an AI agent’s behaviour. Large models can detect such threats, but often with high compute cost and latency. BrowseSafe, fine-tuned specifically for detection, scans full pages without slowing user browsing. Perplexity is also releasing BrowseSafe-Bench, an evaluation suite intended to improve and validate these protections.
Prompt injection embeds hidden instructions in web content to override the AI’s original intent. Because agents read entire pages, including comments, templates, and long footers, attackers can quietly redirect an assistant’s actions. These threats may appear in well-written or multilingual text or be buried inside HTML elements that aren’t visible to users but are still parsed by agents.
Testing Against Real-World Web Scenarios
BrowseSafe-Bench includes 14,719 examples designed to reflect production browsing. Sample variations include:
11 different attack types
9 injection placements — from hidden fields to visible paragraphs
Multiple linguistic styles, from direct commands to indirect suggestions
This variety helps benchmark how effectively models detect harmful signals across different web structures.
Perplexity considers all web content untrusted. The assistant operates in a secure environment, while features like browsing, emails, or file handling require scanning before any action is taken. BrowseSafe adds to a layered approach: scanning raw content, restricting tool permissions, and prompting for user confirmation when sensitive actions are requested — building safety in without limiting usability.
What Makes Attacks Harder to Detect
Evaluation on BrowseSafe-Bench reveals:
Direct attacks are easiest to identify
Multilingual and indirect attacks are significantly harder
Attacks in visible page areas not hidden elements, pose more challenges
These findings highlight where additional training and model refinement can improve defensive reliability.
BrowseSafe and the accompanying benchmark are open-source, allowing developers to strengthen their own autonomous agents without building new protection systems from scratch. The model runs locally and evaluates every page in real time, while chunking and parallel scanning let agents handle large, untrusted sites efficiently.
/ciol/media/agency_attachments/c0E28gS06GM3VmrXNw5G.png)
Follow Us