Google Adds Layered Defenses to Chrome to Block Indirect Prompt Injection Threats

Google on Monday announced a set of new security features in Chrome, following the company’s addition of agentic artificial intelligence (AI) capabilities to the web browser.

To that end, the tech giant said it has implemented layered defenses to make it harder for bad actors to exploit indirect prompt injections that arise as a result of exposure to untrusted web content and inflict harm.

Chief among the features is a User Alignment Critic, which uses a second model to independently evaluate the agent’s actions in a manner that’s isolated from malicious prompts. This approach complements Google’s existing techniques, like spotlighting, which instruct the model to stick to user and system instructions rather than abide by content embedded in a web page.

“The User Alignment Critic runs after the planning is complete to double-check each proposed action,” Google said. “Its primary focus is task alignment: determining whether the proposed action serves the user’s stated goal. If the action is misaligned, the Alignment Critic will veto it.”

The component is designed to view only metadata about the proposed action and is prevented from accessing any untrusted web content, ensuring it cannot be poisoned by malicious prompts embedded in a website. The idea is to safeguard against attempts to exfiltrate data or hijack the agent’s intended goals to carry out an attacker’s bidding.

“When an action is rejected, the Critic provides feedback to the planning model to re-formulate its plan, and the planner can return control to the user if there are repeated failures,” Nathan Parker from the Chrome security team said.
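
In code terms, the loop Google describes might resemble the following TypeScript sketch. Every interface and name here is an illustrative assumption, not Chrome’s actual implementation:

```typescript
// A minimal sketch of a metadata-only alignment critic, based on the behavior
// described in the article. All names and interfaces are hypothetical.

interface ProposedAction {
  kind: "click" | "type" | "navigate"; // what the planner wants to do
  targetOrigin: string;                // where the action takes effect
  summary: string;                     // planner's description of its intent
}

interface CriticVerdict {
  approved: boolean;
  feedback?: string; // fed back to the planner when the action is vetoed
}

// The critic sees only the user's stated goal and action metadata; raw page
// content never reaches it, so page-embedded prompts cannot poison it.
async function critiqueAction(
  userGoal: string,
  action: ProposedAction,
  criticModel: (prompt: string) => Promise<CriticVerdict>
): Promise<CriticVerdict> {
  const prompt =
    `User goal: ${userGoal}\n` +
    `Proposed action: ${action.kind} on ${action.targetOrigin}\n` +
    `Planner's stated intent: ${action.summary}\n` +
    `Does this action serve the user's goal?`;
  return criticModel(prompt);
}

// Re-plan on veto; hand control back to the user after repeated failures.
async function nextApprovedAction(
  userGoal: string,
  plan: (feedback?: string) => Promise<ProposedAction>,
  criticModel: (prompt: string) => Promise<CriticVerdict>,
  maxRetries = 3
): Promise<ProposedAction | "return-control-to-user"> {
  let feedback: string | undefined;
  for (let i = 0; i < maxRetries; i++) {
    const action = await plan(feedback);
    const verdict = await critiqueAction(userGoal, action, criticModel);
    if (verdict.approved) return action;
    feedback = verdict.feedback;
  }
  return "return-control-to-user";
}
```

The key design point is that the critic receives only the goal and action metadata, never raw page content, which is what keeps it isolated from injected instructions.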

Google is also enforcing what’s called Agent Origin Sets to ensure that the agent only has access to data from origins that are relevant to the task at hand or data sources the user has opted to share with the agent. This aims to address site isolation bypasses, in which a compromised agent interacts with arbitrary sites and exfiltrates data from logged-in sites.

This is implemented by means of a gating function that determines which origins are related to the task and categorizes them into two sets –

  • Read-only origins, from which Google’s Gemini AI model is permitted to consume content
  • Read-writable origins, which the agent can type into and click on in addition to reading from

“This delineation enforces that only data from a limited set of origins is available to the agent, and this data can only be passed on to the writable origins,” Google explained. “This bounds the threat vector of cross-origin data leaks.”

Similar to the User Alignment Critic, the gating function is not exposed to untrusted web content. The planner is also required to obtain the gating function’s approval before adding new origins, although it can use context from the web pages a user has explicitly shared in a session.
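
A minimal sketch of how such a gating function could partition origins follows; only the two set names come from Google’s description, while the types and helpers are assumptions for illustration:

```typescript
// An illustrative sketch of per-task origin gating, not Chrome's actual code.

interface OriginSets {
  readOnly: Set<string>;     // origins the model may only consume content from
  readWritable: Set<string>; // origins the agent may also type into and click on
}

function canRead(sets: OriginSets, origin: string): boolean {
  return sets.readOnly.has(origin) || sets.readWritable.has(origin);
}

function canAct(sets: OriginSets, origin: string): boolean {
  // Typing and clicking are confined to the read-writable set, which bounds
  // where data gathered from the read-only origins can end up.
  return sets.readWritable.has(origin);
}

// Example: a hypothetical flight-booking task might gate origins like this.
const sets: OriginSets = {
  readOnly: new Set(["https://www.example-reviews.com"]),
  readWritable: new Set(["https://www.example-airline.com"]),
};
console.log(canAct(sets, "https://attacker.example")); // false: not in either set
```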

Another key pillar of the new security architecture is transparency and user control: the agent creates a work log for user observability and requests the user’s explicit approval before navigating to sensitive sites, such as banking and healthcare portals, before permitting sign-ins via Google Password Manager, and before completing web actions like purchases, payments, or sending messages.

Lastly, the agent also checks each page for indirect prompt injections and operates alongside Safe Browsing and on-device scam detection to block potentially suspicious content.

“This prompt-injection classifier runs in parallel to the planning model’s inference, and will prevent actions from being taken based on content that the classifier determined has intentionally targeted the model to do something unaligned with the user’s goal,” Google said.
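
Conceptually, running the classifier alongside the planner could look like this sketch, with all function names and types assumed for illustration:

```typescript
// A sketch of running an injection classifier in parallel with planning,
// as the article describes. Names and signatures are hypothetical.

type Action = { kind: string; targetOrigin: string };

async function planWithInjectionCheck(
  pageContent: string,
  plan: (content: string) => Promise<Action>,
  classifyInjection: (content: string) => Promise<boolean>
): Promise<Action | null> {
  // Launch both at once so the safety check adds little extra latency.
  const [action, isInjected] = await Promise.all([
    plan(pageContent),
    classifyInjection(pageContent),
  ]);
  // If the content deliberately targeted the model, discard the planned action.
  return isInjected ? null : action;
}
```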

To incentivize researchers to poke holes in the system, the company said it will pay up to $20,000 for demonstrations that breach its security boundaries. These include indirect prompt injections that allow an attacker to –

  • Carry out rogue actions without confirmation
  • Exfiltrate sensitive data without an effective opportunity for user approval
  • Bypass a mitigation that ideally should have prevented the attack from succeeding in the first place

“By extending some core principles like origin-isolation and layered defenses, and introducing a trusted-model architecture, we’re building a secure foundation for Gemini’s agentic experiences in Chrome,” Google said. “We remain committed to continuous innovation and collaboration with the security community to ensure Chrome users can explore this new era of the web safely.”

The announcement follows research from Gartner that called on enterprises to block the use of agentic AI browsers until the associated risks, such as indirect prompt injections, erroneous agent actions, and data loss, can be appropriately managed.

The research also warns of a possible scenario where employees “might be tempted to use AI browsers and automate certain tasks that are mandatory, repetitive, and less interesting.” This could cover cases where an individual dodges mandatory cybersecurity training by instructing the AI browser to complete it on their behalf.

“Agentic browsers, or what many call AI browsers, have the potential to transform how users interact with websites and automate transactions while introducing critical cybersecurity risks,” the advisory firm said. “CISOs must block all AI browsers in the foreseeable future to minimize risk exposure.”

The development comes as the U.K. National Cyber Security Centre (NCSC) said that large language models (LLMs) suffer from a persistent class of vulnerability known as prompt injection and that the problem may never be resolved in its entirety.

“Current large language models (LLMs) simply do not enforce a security boundary between instructions and data inside a prompt,” said David C, NCSC technical director for Platforms Research. “Design protections need to therefore focus more on deterministic (non-LLM) safeguards that constrain the actions of the system, rather than just attempting to prevent malicious content reaching the LLM.”
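
As a concrete illustration of that guidance, a deterministic safeguard can be ordinary code that constrains the agent’s actions regardless of what the model outputs. This minimal sketch uses assumed action names and origin patterns:

```typescript
// A sketch in the spirit of the NCSC guidance: a non-LLM safeguard the model
// cannot talk its way around. Lists and names are assumptions for illustration.

const ALLOWED_ACTIONS = new Set(["read", "summarize", "fill_form"]);
const SENSITIVE_ORIGINS = [/bank/i, /health/i]; // require explicit user approval

function permit(action: string, origin: string): boolean {
  if (!ALLOWED_ACTIONS.has(action)) return false; // hard allowlist
  if (SENSITIVE_ORIGINS.some((re) => re.test(origin))) {
    return false; // escalate to the user instead of acting autonomously
  }
  return true; // enforced in ordinary code, outside the model's reach
}
```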
