New TokenBreak attack bypasses LLM protective guardrails.

New TokenBreak attack bypasses LLM protective guardrails.

A newly discovered cyber attack technique, called TokenBreak, targets the tokenization process of text classification models, particularly those used as protective guardrails for large language models (LLMs). The attack exploits how certain tokenizers break down and interpret text, allowing adversaries to bypass content moderation, safety, toxicity, and spam detection systems with minimal changes to input text.