Google to implement multi-layered defense in its generative AI systems.

June 23, 2025Cybersecurity News

TL;DR

Google has implemented a multi-layered defense strategy to secure its generative AI systems (like Gemini) from evolving threats, particularly indirect prompt injection attacks. These attacks involve embedding malicious instructions within external data sources—such as emails, documents, or calendar invites—to manipulate AI into exfiltrating sensitive data or performing unauthorized actions. Unlike direct prompt injections, where attackers input malicious commands explicitly, indirect injections exploit trusted content to bypass defenses.

Key Security Measures

Google’s approach combines model hardening, purpose-built machine learning defenses, and system-level safeguards to create overlapping layers of protection:

Model Resilience Enhancements

Gemini 2.5 models are trained with adversarial data to inherently resist indirect prompt injections. Security thought reinforcement (spotlighting) inserts markers into untrusted data to steer models away from hidden adversarial instructions.

Real-Time Threat Detection

Prompt injection content classifiers: Proprietary ML models scan inputs (e.g., emails, files) to filter out malicious instructions before processing. Markdown sanitization & URL redaction: Removes suspicious URLs using Google Safe Browsing and blocks external image rendering to prevent exploits like EchoLeak.

User-Centric Safeguards

User confirmation framework: Requires explicit approval for high-risk actions (e.g., data sharing). End-user notifications: Alerts users about detected prompt injection attempts.

AI-Specific Security Ecosystem

Google Cloud’s AI Protection suite extends these defenses. It automatically catalogs AI assets (models, data, applications) for visibility. “Model Armor” inspects and sanitizes prompts/responses, enforces RBAC, and filters harmful content. And it integrates with Security Command Center and Mandiant intelligence to detect attack paths and recommend remediations.

Defense Strategy Philosophy

This layered architecture—spanning model training, input/output sanitization, and user controls—deliberately increases the cost and complexity of attacks. Adversaries must overcome multiple independent barriers, forcing them toward more detectable or resource-intensive methods. The approach prioritizes proactive threat mitigation while maintaining usability, reflecting Google’s broader investment in AI red-teaming, vulnerability research, and adversarial training.

These measures address critical vulnerabilities in agentic AI systems, where indirect prompt injections pose unique risks due to their subtlety and exploitation of trusted channels

Google

Last updated on June 23, 2025

United States	~59% of ransomware attacks globally Thousands per year
Poland	1,000+ per week
Russia	Highest cybercrime threat level
China	Thousands per year
India	115% surge in attacks Q2 2024
Ukraine	Significant surge since 2022
Brazil	Among top countries for blocked attacks
Mexico	65% of businesses hit in 2024
Germany	High targeted rate (EU)
France	High targeted rate (EU)

AS Name	ASN
Bharat Sanchar Nigam Ltd	9829
No.31,Jin-rong Street	4134
CHINA UNICOM China169 Backbone	4837
DigitalOcean, LLC	14061
HUAWEI INTERNATIONAL PTE. LTD.	136907
Amazon.com, Inc.	14618
Alibaba (US) Technology Co., Ltd.	45102
Google LLC	396982
Amazon.com, Inc.	16509
3xK Tech GmbH	200373

IP Address	Notable Exploits/Context
104.238.159.149	SharePoint zero-day, broad exploitation
107.191.58.76	SharePoint zero-day, government targets
96.9.125.147	SharePoint, previously Ivanti exploits
139.162.47.194	Exploits on CitrixBleed 2
38.180.148.215	CitrixBleed 2 campaigns
185.224.128.17	High activity, Netherlands
89.248.163.200	High activity, Netherlands
15.235.218.150	Associated with APT, active C2
45.9.148.114	Associated with C2, malicious netflow
91.107.150.184	C2 infrastructure, recent IoC