LegalPwn exploits AI models by using legitimate legal language to trick them into misclassifying malicious software as safe code.

A recent cybersecurity breakthrough has revealed a significant vulnerability in several of today’s most popular generative AI tools. The novel “LegalPwn” attack, developed by researchers at Pangea Labs, demonstrates how attackers can trick artificial intelligence models like ChatGPT, Google Gemini, GitHub Copilot, Meta’s Llama, and xAI’s Grok into misclassifying malicious software as safe code by cleverly disguising it within seemingly legitimate legal language.

How LegalPwn Works

The LegalPwn technique leverages an advanced form of prompt injection—an adversarial tactic wherein malicious actors embed harmful code within text crafted to appear as legal disclaimers, confidentiality notices, compliance mandates, or terms of service agreements. Because generative AI systems are programmed to treat such legitimate-sounding language with a high degree of trust, these attacks are particularly effective at bypassing built-in safety and security filters. Even when explicit prompts are employed to enhance vigilance, the models frequently fail to identify the underlying threat if it is cloaked within convincing legal jargon.

Research Findings and AI Vulnerabilities

During their assessments, researchers tested twelve leading AI platforms and found the majority were susceptible to LegalPwn. Notably, only a few—including Anthropic’s Claude 3.5 Sonnet and Microsoft’s Phi 4—were able to withstand this type of attack. In practical demonstrations, attackers successfully used LegalPwn to have AI-powered developer tools and code assistants misclassify malicious payloads, such as backdoors or reverse shells, as benign utilities like calculators. While human analysts spotted the threats correctly, the AI solutions were consistently deceived by the legal wrapper.

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply