Researchers Show Prompt Injection Can Break Apple Intelligence

Security researchers have shown how a prompt injection attack bypassed the built-in safety protections of Apple Intelligence and tricked the on-device system into running unauthorized commands. Apple has since fixed the vulnerability, but the findings give a clear look at how attackers can manipulate local AI models even when strict safety filters are in place.

How the attack tricked the system

The attackers used a two-step process, as reported by AppleInsider, to get past the input and output filters that Apple built into its system. The first step was a Unicode trick: they wrote the harmful text backward and prefixed it with the right-to-left override character (U+202E), which tells the renderer to display the characters that follow in reverse order.

The override made the text read normally on screen, while the underlying character sequence stayed reversed. Because the system filters looked only at the raw text, they did not recognize the harmful words and let the request pass through.
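The mismatch between what is displayed and what is stored can be illustrated with a short Python sketch. It is not Apple's filter or the researchers' code: the blocklist, the placeholder phrase, and the function names `disguise`, `naive_filter`, and `bidi_aware_filter` are all hypothetical, and the "bidi-aware" check is only a crude approximation of how text is actually reordered for display.

```python
# Illustrative only: a toy keyword filter, not Apple's actual safety filter.
RLO = "\u202e"  # U+202E RIGHT-TO-LEFT OVERRIDE

# Unicode bidirectional control characters (marks, embeddings, overrides, isolates).
BIDI_CONTROLS = {
    "\u200e", "\u200f",
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}

# Hypothetical blocklist with a harmless placeholder phrase.
BLOCKLIST = ["forbidden phrase"]


def disguise(text: str) -> str:
    """Store the text reversed, prefixed with RLO so it renders in normal order."""
    return RLO + text[::-1]


def naive_filter(raw: str) -> bool:
    """Return True if the raw character sequence contains a blocked phrase."""
    lowered = raw.lower()
    return any(term in lowered for term in BLOCKLIST)


def bidi_aware_filter(raw: str) -> bool:
    """Also check the text with bidi controls removed and reversed.

    Assumption: reversing the whole string is a rough stand-in for the
    visual order a renderer would produce, not a full bidi implementation.
    """
    if naive_filter(raw):
        return True
    if not any(ch in BIDI_CONTROLS for ch in raw):
        return False
    stripped = "".join(ch for ch in raw if ch not in BIDI_CONTROLS)
    return naive_filter(stripped) or naive_filter(stripped[::-1])


if __name__ == "__main__":
    payload = disguise("forbidden phrase")
    print(naive_filter(payload))       # raw text is reversed, so no match
    print(bidi_aware_filter(payload))  # match after stripping and reversing
```

A production filter would more plausibly normalize or reject bidirectional control characters outright before any content check, rather than guess at the rendered order.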

Once past the filters, the researchers used a technique called Neural Exec, which effectively overrode the model’s core instructions. Combining the two steps forced the system to ignore its basic safety rules and follow the attackers’ instructions instead. In the researchers’ tests, the approach worked 76 percent of the time.

Apple relies on a chain of checks to keep on-device Apple Intelligence safe. An input filter screens the request for disallowed content. If the request passes, the model generates an answer, which an output filter then screens as well. The researchers made the harmful content invisible to those outer layers while still delivering instructions to the model in the middle.
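The layered design described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Apple’s implementation: `guarded_generate`, `keyword_filter`, and `echo_model` are hypothetical names, the model is a stub that echoes its input, and the blocked phrase is a harmless placeholder.

```python
from typing import Callable

# Hypothetical types: a filter returns True when the text should be blocked.
Filter = Callable[[str], bool]
Model = Callable[[str], str]

REFUSAL = "Request blocked."


def guarded_generate(prompt: str, model: Model,
                     input_filter: Filter, output_filter: Filter) -> str:
    """Run the input filter, then the model, then the output filter."""
    if input_filter(prompt):
        return REFUSAL
    answer = model(prompt)
    if output_filter(answer):
        return REFUSAL
    return answer


def keyword_filter(text: str) -> bool:
    """Toy filter that inspects only the raw characters of the text."""
    return "forbidden phrase" in text.lower()


def echo_model(prompt: str) -> str:
    """Stub standing in for the on-device model."""
    return f"Echo: {prompt}"


if __name__ == "__main__":
    # A plain request passes both filters.
    print(guarded_generate("hello", echo_model, keyword_filter, keyword_filter))
    # A request containing the blocked phrase is stopped at the input filter.
    print(guarded_generate("forbidden phrase", echo_model,
                           keyword_filter, keyword_filter))
    # The same phrase stored backward behind U+202E passes both raw-text
    # filters, even though it would read normally on screen.
    hidden = "\u202e" + "forbidden phrase"[::-1]
    print(guarded_generate(hidden, echo_model,
                           keyword_filter, keyword_filter) != REFUSAL)
```

The sketch shows why the outer layers alone are not enough: both filters see only the raw text, so anything that changes how the text is stored without changing how it reads reaches the model unexamined.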

The researchers reported the issue to Apple in October 2025. The company updated its software to block the attack, releasing fixes in iOS 26.4 and macOS 26.4. Although the fix is live, the research shows how difficult it is to secure AI models that run locally on phones. Attackers will keep finding ways to hide their instructions in plain sight.
