"Can you jailbreak this AI?"

May 06, 2025

"Can you jailbreak this AI?"

That seemingly innocent prompt could be the start of something dangerous.

As generative AI continues to evolve, malicious users are finding new ways to exploit it — using adversarial prompts and jailbreak techniques to bypass content safeguards and generate harmful or policy-violating outputs.

At Microsoft, we’re committed to building AI that’s not only powerful, but also secure and responsible. That’s why we’ve introduced features like Prompt Shields in Azure AI Content Safety — designed to detect and block input attacks before they can influence model behaviour.

What actions can you detect and stop with Prompt Shields?

✅Prompt Injection – Attempts to override system rules or deceive the model

✅Encoded Prompts – Obfuscated language to bypass safety filters

✅Persona Spoofing – Role-play attacks to elicit unauthorised responses

✅Document Exploits – Embedded malicious instructions in uploaded content

✅Information Misuse – Prompts designed to extract sensitive data or trigger disallowed outputs

Want to learn more about our Prompt Shields?

📘https://learn.microsoft.com/en-us/azure/ai-services/content-safety/concepts/jailbreak-detection

Find me on LinkedIn: José Lázaro | LinkedIn

Would love to hear your thoughts:

🧠 How are you thinking about AI safety in your organisation?

Leave your comment! 💬

#MicrosoftAI #ContentSafety #ResponsibleAI #CyberSecurity #AzureAI #TrustworthyAI #GenerativeAI #AIethics #SIEM #XDR #SOC #Microsoft #MicrosoftSecurity #MSPartnerUK

Microsoft Security Sentinel

Discussion about this post