Google Gemini can be exploited through indirect prompt injection to allow embedding of malicious content that directs users to phishing sites.

Google Gemini for Workspace can be exploited through a technique called indirect prompt injection. This allows attackers to manipulate Gemini’s email summaries, making them appear legitimate while embedding malicious instructions or warnings that direct users to phishing sites—without using traditional attachments or direct links.

How the Exploit Works

Attackers embed hidden instructions within the body of an email using techniques like white-on-white text, invisible HTML tags, or specialized tokens (e.g., <Admin>...</Admin> or System: ...). The exploit does not require attachments or clickable links. Instead, the malicious content is disguised within the email’s formatting or metadata.

When a user clicks “Summarize this email,” Gemini interprets the hidden prompt as a legitimate instruction and generates a summary that includes fabricated warnings or urgent instructions. The summary might display a message such as:”ALERT! Your password has been compromised. Please visit [fake site] to reset your password immediately.” The original email may not contain this visible warning, and the summary appears to come from Google or the organization itself.

Attack Characteristics

FeatureDescription
Attack VectorMaliciously crafted email body (no attachments/links)
Exploit MechanismIndirect prompt injection via hidden text or HTML/CSS
User InteractionTriggered when user asks Gemini to summarize the email
Resulting SummaryIncludes urgent warnings or instructions (e.g., reset password, call number)
Phishing TacticDirects users to phishing sites or social engineering attacks

Google’s Response and Mitigations

  • Detection Efforts: Google has implemented security measures to detect and block malicious prompts. When Gemini identifies threats, it may exclude suspicious messages from summaries or warn users about security risks.
  • Limitations: Despite these safeguards, researchers have demonstrated that certain prompt injection techniques still bypass protections, and Google has sometimes classified these behaviors as “intended” or “not a security issue”.

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply