Update resources in AI Red Teaming Roadmap (#8570)

* Update why-red-team-ai-systems@fNTb9y3zs1HPYclAmu_Wv.md

* Update prompt-engineering@gx4KaFqKgJX9n9_ZGMqlZ.md

* Update generative-models@3XJ-g0KvHP75U18mxCqgw.md

* Update prompt-hacking@1Xr7mxVekeAHzTL7G4eAZ.md

* Update jailbreak-techniques@Ds8pqn4y9Npo7z6ubunvc.md

* Update countermeasures@G1u_Kq4NeUsGX2qnUTuJU.md

* Update forums@Smncq-n1OlnLAY27AFQOO.md

* Update lab-environments@MmwwRK4I9aRH_ha7duPqf.md

* Update ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md

* Update ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md

* Update industry-credentials@HHjsFR6wRDqUd66PMDE_7.md

* Update agentic-ai-security@FVsKivsJrIb82B0lpPmgw.md

* Update responsible-disclosure@KAcCZ3zcv25R6HwzAsfUG.md

* Update benchmark-datasets@et1Xrr8ez-fmB0mAq8W_a.md

* Update adversarial-examples@xjlttOti-_laPRn8a2fVy.md

* Update large-language-models@8K-wCn2cLc7Vs_V4sC3sE.md

* Update introduction@HFJIYcI16OMyM77fAw9af.md

* Update ethical-considerations@1gyuEV519LjN-KpROoVwv.md

* Update role-of-red-teams@Irkc9DgBfqSn72WaJqXEt.md

* Update threat-modeling@RDOaTBWP3aIJPUp_kcafm.md

* Update direct@5zHow4KZVpfhch5Aabeft.md

* Update indirect@3_gJRtJSdm2iAfkwmcv0e.md

* Update model-vulnerabilities@uBXrri2bXVsNiM8fIHHOv.md

* Update model-weight-stealing@QFzLx5nc4rCCD8WVc20mo.md

* Update unauthorized-access@DQeOavZCoXpF3k_qRDABs.md

* Update data-poisoning@nD0_64ELEeJSN-0aZiR7i.md

* Update model-inversion@iE5PcswBHnu_EBFIacib0.md

* Update code-injection@vhBu5x8INTtqvx6vcYAhE.md

* Update remote-code-execution@kgDsDlBk8W2aM6LyWpFY8.md

* Update api-protection@Tszl26iNBnQBdBEWOueDA.md

* Update authentication@J7gjlt2MBx7lOkOnfGvPF.md

* Update white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md

* Update white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md

* Update white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md

* Update automated-vs-manual@LVdYN9hyCyNPYn2Lz1y9b.md

* Update specialized-courses@s1xKK8HL5-QGZpcutiuvj.md
David Willis-Owen authored 2 days ago · committed by GitHub
parent 2937923fb1
commit 80a0caba2f
33 changed files (changed lines in parentheses):
1. src/data/roadmaps/ai-red-teaming/content/adversarial-examples@xjlttOti-_laPRn8a2fVy.md (1)
2. src/data/roadmaps/ai-red-teaming/content/agentic-ai-security@FVsKivsJrIb82B0lpPmgw.md (3)
3. src/data/roadmaps/ai-red-teaming/content/api-protection@Tszl26iNBnQBdBEWOueDA.md (1)
4. src/data/roadmaps/ai-red-teaming/content/authentication@J7gjlt2MBx7lOkOnfGvPF.md (2)
5. src/data/roadmaps/ai-red-teaming/content/automated-vs-manual@LVdYN9hyCyNPYn2Lz1y9b.md (2)
6. src/data/roadmaps/ai-red-teaming/content/benchmark-datasets@et1Xrr8ez-fmB0mAq8W_a.md (3)
7. src/data/roadmaps/ai-red-teaming/content/code-injection@vhBu5x8INTtqvx6vcYAhE.md (2)
8. src/data/roadmaps/ai-red-teaming/content/countermeasures@G1u_Kq4NeUsGX2qnUTuJU.md (1)
9. src/data/roadmaps/ai-red-teaming/content/ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md (3)
10. src/data/roadmaps/ai-red-teaming/content/data-poisoning@nD0_64ELEeJSN-0aZiR7i.md (1)
11. src/data/roadmaps/ai-red-teaming/content/direct@5zHow4KZVpfhch5Aabeft.md (2)
12. src/data/roadmaps/ai-red-teaming/content/ethical-considerations@1gyuEV519LjN-KpROoVwv.md (1)
13. src/data/roadmaps/ai-red-teaming/content/forums@Smncq-n1OlnLAY27AFQOO.md (4)
14. src/data/roadmaps/ai-red-teaming/content/generative-models@3XJ-g0KvHP75U18mxCqgw.md (4)
15. src/data/roadmaps/ai-red-teaming/content/indirect@3_gJRtJSdm2iAfkwmcv0e.md (2)
16. src/data/roadmaps/ai-red-teaming/content/industry-credentials@HHjsFR6wRDqUd66PMDE_7.md (4)
17. src/data/roadmaps/ai-red-teaming/content/introduction@HFJIYcI16OMyM77fAw9af.md (1)
18. src/data/roadmaps/ai-red-teaming/content/jailbreak-techniques@Ds8pqn4y9Npo7z6ubunvc.md (2)
19. src/data/roadmaps/ai-red-teaming/content/lab-environments@MmwwRK4I9aRH_ha7duPqf.md (3)
20. src/data/roadmaps/ai-red-teaming/content/large-language-models@8K-wCn2cLc7Vs_V4sC3sE.md (2)
21. src/data/roadmaps/ai-red-teaming/content/model-inversion@iE5PcswBHnu_EBFIacib0.md (1)
22. src/data/roadmaps/ai-red-teaming/content/model-vulnerabilities@uBXrri2bXVsNiM8fIHHOv.md (2)
23. src/data/roadmaps/ai-red-teaming/content/model-weight-stealing@QFzLx5nc4rCCD8WVc20mo.md (1)
24. src/data/roadmaps/ai-red-teaming/content/prompt-engineering@gx4KaFqKgJX9n9_ZGMqlZ.md (2)
25. src/data/roadmaps/ai-red-teaming/content/prompt-hacking@1Xr7mxVekeAHzTL7G4eAZ.md (2)
26. src/data/roadmaps/ai-red-teaming/content/remote-code-execution@kgDsDlBk8W2aM6LyWpFY8.md (2)
27. src/data/roadmaps/ai-red-teaming/content/responsible-disclosure@KAcCZ3zcv25R6HwzAsfUG.md (4)
28. src/data/roadmaps/ai-red-teaming/content/role-of-red-teams@Irkc9DgBfqSn72WaJqXEt.md (2)
29. src/data/roadmaps/ai-red-teaming/content/specialized-courses@s1xKK8HL5-QGZpcutiuvj.md (1)
30. src/data/roadmaps/ai-red-teaming/content/threat-modeling@RDOaTBWP3aIJPUp_kcafm.md (1)
31. src/data/roadmaps/ai-red-teaming/content/unauthorized-access@DQeOavZCoXpF3k_qRDABs.md (4)
32. src/data/roadmaps/ai-red-teaming/content/white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md (4)
33. src/data/roadmaps/ai-red-teaming/content/why-red-team-ai-systems@fNTb9y3zs1HPYclAmu_Wv.md (5)

src/data/roadmaps/ai-red-teaming/content/adversarial-examples@xjlttOti-_laPRn8a2fVy.md
@@ -4,7 +4,6 @@ A core AI Red Teaming activity involves generating adversarial examples – inpu
 Learn more from the following resources:
-- [@article@Adversarial Examples Explained (OpenAI Blog)](https://openai.com/research/adversarial-examples)
 - [@guide@Adversarial Examples – Interpretable Machine Learning Book](https://christophm.github.io/interpretable-ml-book/adversarial.html)
 - [@guide@Adversarial Testing for Generative AI](https://developers.google.com/machine-learning/guides/adv-testing)
 - [@video@How AI Can Be Tricked With Adversarial Attacks](https://www.youtube.com/watch?v=J3X_JWQkvo8?v=MPcfoQBDY0w)

src/data/roadmaps/ai-red-teaming/content/agentic-ai-security@FVsKivsJrIb82B0lpPmgw.md
@@ -5,5 +5,4 @@ As AI agents capable of autonomous action become more common, AI Red Teamers mus
 Learn more from the following resources:
 - [@article@AI Agents - Learn Prompting](https://learnprompting.org/docs/intermediate/ai_agents)
-- [@article@Reasoning models don't always say what they think](https://www.anthropic.com/research/reasoning-models-dont-always-say-what-they-think)
+- [@article@EmbraceTheRed](https://embracethered.com/)
-- [@course@Certified AI Red Team Operator – Autonomous Systems (CAIRTO-AS) from Tonex, Inc.](https://niccs.cisa.gov/education-training/catalog/tonex-inc/certified-ai-red-team-operator-autonomous-systems-cairto)

src/data/roadmaps/ai-red-teaming/content/api-protection@Tszl26iNBnQBdBEWOueDA.md
@@ -4,7 +4,6 @@ AI Red Teamers rigorously test the security of APIs providing access to AI model
 Learn more from the following resources:
-- [@article@API Protection for AI Factories: The First Step to AI Security](https://www.f5.com/company/blog/api-security-for-ai-factories)
 - [@article@Securing APIs with AI for Advanced Threat Protection](https://adevait.com/artificial-intelligence/securing-apis-with-ai)
 - [@article@Securing Machine Learning APIs (IBM)](https://developer.ibm.com/articles/se-securing-machine-learning-apis/)
 - [@guide@OWASP API Security Project (Top 10 2023)](https://owasp.org/www-project-api-security/)

src/data/roadmaps/ai-red-teaming/content/authentication@J7gjlt2MBx7lOkOnfGvPF.md
@@ -6,4 +6,4 @@ Learn more from the following resources:
 - [@article@Red-Teaming in AI Testing: Stress Testing](https://www.labelvisor.com/red-teaming-abstract-competitive-testing-data-selection/)
 - [@article@What is Authentication vs Authorization?](https://auth0.com/intro-to-iam/authentication-vs-authorization)
-- [@video@How JWTs are used for Authentication (and how to bypass it)](https://www.google.com/search?q=https://www.youtube.com/watch%3Fv%3Dexample_video_panel_url?v=3OpQi65s_ME)
+- [@article@JWT Attacks](https://portswigger.net/web-security/jwt)

src/data/roadmaps/ai-red-teaming/content/automated-vs-manual@LVdYN9hyCyNPYn2Lz1y9b.md
@@ -6,4 +6,4 @@ Learn more from the following resources:
 - [@article@Automation Testing vs. Manual Testing: Which is the better approach?](https://www.opkey.com/blog/automation-testing-vs-manual-testing-which-is-better)
 - [@article@Manual Testing vs Automated Testing: What's the Difference?](https://www.leapwork.com/blog/manual-vs-automated-testing)
-- [@guide@LLM red teaming guide (open source)](https://www.promptfoo.dev/docs/red-team/)
+- [@tool@Spikee](https://spikee.ai)

src/data/roadmaps/ai-red-teaming/content/benchmark-datasets@et1Xrr8ez-fmB0mAq8W_a.md
@@ -1,9 +1,10 @@
 # Benchmark Datasets
-AI Red Teamers may use or contribute to benchmark datasets specifically designed to evaluate AI security. These datasets (like SecBench, NYU CTF Bench, CySecBench) contain prompts or scenarios targeting vulnerabilities, safety issues, or specific cybersecurity capabilities, allowing for standardized testing of models.
+AI Red Teamers may use or contribute to benchmark datasets specifically designed to evaluate AI security. These datasets (like HackAprompt, SecBench, NYU CTF Bench, CySecBench) contain prompts or scenarios targeting vulnerabilities, safety issues, or specific cybersecurity capabilities, allowing for standardized testing of models.
 Learn more from the following resources:
+- [@dataset@HackAPrompt Dataset](https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset)
 - [@dataset@CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset](https://github.com/cysecbench/dataset)
 - [@dataset@NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security](https://proceedings.neurips.cc/paper_files/paper/2024/hash/69d97a6493fbf016fff0a751f253ad18-Abstract-Datasets_and_Benchmarks_Track.html)
 - [@dataset@SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity](https://arxiv.org/abs/2412.20787)

src/data/roadmaps/ai-red-teaming/content/code-injection@vhBu5x8INTtqvx6vcYAhE.md
@@ -5,5 +5,5 @@ AI Red Teamers test for code injection vulnerabilities specifically in the conte
 Learn more from the following resources:
 - [@article@Code Injection in LLM Applications](https://neuraltrust.ai/blog/code-injection-in-llms)
-- [@docs@Secure Plugin Sandboxing (OpenAI Plugins)](https://platform.openai.com/docs/plugins/production/security-requirements)
+- [@article@Code Injection](https://learnprompting.org/docs/prompt_hacking/offensive_measures/code_injection)
 - [@guide@Code Injection](https://owasp.org/www-community/attacks/Code_Injection)

src/data/roadmaps/ai-red-teaming/content/countermeasures@G1u_Kq4NeUsGX2qnUTuJU.md
@@ -4,6 +4,7 @@ AI Red Teamers must also understand and test defenses against prompt hacking. Th
 Learn more from the following resources:
+- [@article@Prompt Hacking Defensive Measures](https://learnprompting.org/docs/prompt_hacking/defensive_measures/introduction)
 - [@article@Mitigating Prompt Injection Attacks (NCC Group Research)](https://research.nccgroup.com/2023/12/01/mitigating-prompt-injection-attacks/)
 - [@article@Prompt Injection & the Rise of Prompt Attacks](https://www.lakera.ai/blog/guide-to-prompt-injection)
 - [@article@Prompt Injection: Impact, How It Works & 4 Defense Measures](https://www.tigera.io/learn/guides/llm-security/prompt-injection/)

src/data/roadmaps/ai-red-teaming/content/ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md
@@ -4,7 +4,6 @@ Capture The Flag competitions increasingly include AI/ML security challenges. Pa
 Learn more from the following resources:
-- [@article@Capture the flag (cybersecurity)](https://en.wikipedia.org/wiki/Capture_the_flag_(cybersecurity)
+- [@platform@HackAPrompt](https://www.hackaprompt.com/)
 - [@article@Progress from our Frontier Red Team](https://www.anthropic.com/news/strategic-warning-for-ai-risk-progress-and-insights-from-our-frontier-red-team)
 - [@platform@CTFtime.org](https://ctftime.org/)
-- [@platform@picoCTF](https://picoctf.org/)

src/data/roadmaps/ai-red-teaming/content/data-poisoning@nD0_64ELEeJSN-0aZiR7i.md
@@ -5,6 +5,5 @@ AI Red Teamers simulate data poisoning attacks by evaluating how introducing man
 Learn more from the following resources:
 - [@article@AI Poisoning](https://www.aiblade.net/p/ai-poisoning-is-it-really-a-threat)
-- [@article@Data Poisoning Attacks in ML (Towards Data Science)](https://towardsdatascience.com/data-poisoning-attacks-in-machine-learning-542169587b7f)
 - [@paper@Detecting and Preventing Data Poisoning Attacks on AI Models](https://arxiv.org/abs/2503.09302)
 - [@paper@Poisoning Web-Scale Training Data (arXiv)](https://arxiv.org/abs/2310.12818)

src/data/roadmaps/ai-red-teaming/content/direct@5zHow4KZVpfhch5Aabeft.md
@@ -4,6 +4,6 @@ Direct injection attacks occur when malicious instructions are inserted directly
 Learn more from the following resources:
+- [@article@Prompt Injection](https://learnprompting.org/docs/prompt_hacking/injection?srsltid=AfmBOooOKRzLT0Hn2PNdAa69Fietniztfds6Fo1PO8WuIyyXjbLb6XgI)
 - [@article@Prompt Injection & the Rise of Prompt Attacks](https://www.lakera.ai/blog/guide-to-prompt-injection)
 - [@article@Prompt Injection Cheat Sheet (FlowGPT)](https://flowgpt.com/p/prompt-injection-cheat-sheet)
-- [@report@OpenAI GPT-4 System Card](https://openai.com/research/gpt-4-system-card)

src/data/roadmaps/ai-red-teaming/content/ethical-considerations@1gyuEV519LjN-KpROoVwv.md
@@ -7,4 +7,3 @@ Learn more from the following resources:
 - [@article@Red-Teaming in AI Testing: Stress Testing](https://www.labelvisor.com/red-teaming-abstract-competitive-testing-data-selection/)
 - [@article@Responsible AI assessment - Responsible AI | Coursera](https://www.coursera.org/learn/ai-security)
 - [@guide@Responsible AI Principles (Microsoft)](https://www.microsoft.com/en-us/ai/responsible-ai)
-- [@video@Questions to Guide AI Red-Teaming (CMU SEI)](https://resources.sei.cmu.edu/library/asset-view.cfm?assetid=928382)

src/data/roadmaps/ai-red-teaming/content/forums@Smncq-n1OlnLAY27AFQOO.md
@@ -4,7 +4,7 @@ Engaging in online forums, mailing lists, Discord servers, or subreddits dedicat
 Learn more from the following resources:
-- [@community@List of Cybersecurity Discord Servers](https://www.dfir.training/dfir-groups/discord?category[0]=17&category_children=1)
+- [@community@LearnPrompting Prompt Hacking Discord](https://discord.com/channels/1046228027434086460/1349689482651369492)
-- [@community@Reddit - r/MachineLearning](https://www.reddit.com/r/MachineLearning/)
+- [@community@Reddit - r/ChatGPTJailbreak](https://www.reddit.com/r/ChatGPTJailbreak/)
 - [@community@Reddit - r/artificial](https://www.reddit.com/r/artificial/)
 - [@community@Reddit - r/cybersecurity](https://www.reddit.com/r/cybersecurity/)

src/data/roadmaps/ai-red-teaming/content/generative-models@3XJ-g0KvHP75U18mxCqgw.md
@@ -4,6 +4,6 @@ AI Red Teamers focus heavily on generative models (like GANs and LLMs) due to th
 Learn more from the following resources:
-- [@article@An Introduction to Generative Models](https://www.mongodb.com/resources/basics/artificial-intelligence/generative-models)
+- [@article@What is Generative AI?](https://learnprompting.org/docs/basics/generative_ai)
-- [@course@Generative AI for Beginners](https://microsoft.github.io/generative-ai-for-beginners/)
+- [@course@Introduction to Generative AI](https://learnprompting.org/courses/intro-to-gen-ai)
 - [@guide@Generative AI beginner's guide](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/overview)

src/data/roadmaps/ai-red-teaming/content/indirect@3_gJRtJSdm2iAfkwmcv0e.md
@@ -6,4 +6,4 @@ Learn more from the following resources:
 - [@paper@The Practical Application of Indirect Prompt Injection Attacks](https://www.researchgate.net/publication/382692833_The_Practical_Application_of_Indirect_Prompt_Injection_Attacks_From_Academia_to_Industry)
 - [@article@How to Prevent Indirect Prompt Injection Attacks](https://www.cobalt.io/blog/how-to-prevent-indirect-prompt-injection-attacks)
-- [@article@Jailbreaks via Indirect Injection (Practical AI Safety Newsletter)](https://newsletter.practicalai.safety/p/jailbreaks-via-indirect-injection)
+- [@article@Indirect Prompt Injection Data Exfiltration](https://embracethered.com/blog/posts/2024/chatgpt-macos-app-persistent-data-exfiltration/)

src/data/roadmaps/ai-red-teaming/content/industry-credentials@HHjsFR6wRDqUd66PMDE_7.md
@@ -4,5 +4,5 @@ Beyond formal certifications, recognition in the AI Red Teaming field comes from
 Learn more from the following resources:
-- [@community@DEF CON - Wikipedia (Mentions Black Badge)](https://en.wikipedia.org/wiki/DEF_CON#Black_Badge)
+- [@platform@HackAPrompt](https://hackaprompt.com)
-- [@community@HackAPrompt (Learn Prompting)](https://learnprompting.org/hackaprompt)
+- [@platform@RedTeam Arena](https://redarena.ai)

src/data/roadmaps/ai-red-teaming/content/introduction@HFJIYcI16OMyM77fAw9af.md
@@ -7,4 +7,3 @@ Learn more from the following resources:
 - [@article@A Guide to AI Red Teaming](https://hiddenlayer.com/innovation-hub/a-guide-to-ai-red-teaming/)
 - [@article@What is AI Red Teaming? (Learn Prompting)](https://learnprompting.org/blog/what-is-ai-red-teaming)
 - [@article@What is AI Red Teaming? The Complete Guide](https://mindgard.ai/blog/what-is-ai-red-teaming)
-- [@podcast@Red Team Podcast - AI Red Teaming Insights & Defense Strategies](https://mindgard.ai/podcast/red-team)

src/data/roadmaps/ai-red-teaming/content/jailbreak-techniques@Ds8pqn4y9Npo7z6ubunvc.md
@@ -5,5 +5,5 @@ Jailbreaking is a specific category of prompt hacking where the AI Red Teamer ai
 Learn more from the following resources:
 - [@article@InjectPrompt (David Willis-Owen)](https://injectprompt.com)
-- [@guide@Prompt Hacking Guide - Learn Prompting](https://learnprompting.org/docs/category/prompt-hacking)
+- [@guide@Jailbreaking Guide - Learn Prompting](https://learnprompting.org/docs/prompt_hacking/jailbreaking)
 - [@paper@Jailbroken: How Does LLM Safety Training Fail? (arXiv)](https://arxiv.org/abs/2307.02483)

src/data/roadmaps/ai-red-teaming/content/lab-environments@MmwwRK4I9aRH_ha7duPqf.md
@@ -4,7 +4,8 @@ AI Red Teamers need environments to practice attacking vulnerable systems safely
 Learn more from the following resources:
+- [@platform@HackAPrompt Playground](https://learnprompting.org/hackaprompt-playground)
+- [@platform@InjectPrompt Playground](https://playground.injectprompt.com/)
 - [@platform@Gandalf AI Prompt Injection Lab](https://gandalf.lakera.ai/)
 - [@platform@Hack The Box: Hacking Labs](https://www.hackthebox.com/hacker/hacking-labs)
 - [@platform@TryHackMe: Learn Cyber Security](https://tryhackme.com/)
-- [@platform@VulnHub](https://www.vulnhub.com/)

src/data/roadmaps/ai-red-teaming/content/large-language-models@8K-wCn2cLc7Vs_V4sC3sE.md
@@ -5,5 +5,5 @@ LLMs are a primary target for AI Red Teaming. Understanding their architecture (
 Learn more from the following resources:
 - [@article@What is an LLM (large language model)?](https://www.cloudflare.com/learning/ai/what-is-large-language-model/)
-- [@guide@Introduction to LLMs - Learn Prompting](https://learnprompting.org/docs/intro_to_llms)
+- [@guide@ChatGPT For Everyone](https://learnprompting.org/courses/chatgpt-for-everyone)
 - [@guide@What Are Large Language Models? A Beginner's Guide for 2025](https://www.kdnuggets.com/large-language-models-beginners-guide-2025)

src/data/roadmaps/ai-red-teaming/content/model-inversion@iE5PcswBHnu_EBFIacib0.md
@@ -4,7 +4,6 @@ AI Red Teamers perform model inversion tests to assess if an attacker can recons
 Learn more from the following resources:
-- [@article@Model Inversion Attacks for ML (Medium)](https://medium.com/@ODSC/model-inversion-attacks-for-machine-learning-ff407a1b10d1)
 - [@article@Model inversion and membership inference: Understanding new AI security risks](https://www.hoganlovells.com/en/publications/model-inversion-and-membership-inference-understanding-new-ai-security-risks-and-mitigating-vulnerabilities)
 - [@paper@Extracting Training Data from LLMs (arXiv)](https://arxiv.org/abs/2012.07805)
 - [@paper@Model Inversion Attacks: A Survey of Approaches and Countermeasures](https://arxiv.org/html/2411.10023v1)

src/data/roadmaps/ai-red-teaming/content/model-vulnerabilities@uBXrri2bXVsNiM8fIHHOv.md
@@ -5,5 +5,5 @@ This category covers attacks and tests targeting the AI model itself, beyond the
 Learn more from the following resources:
 - [@article@AI Security Risks Uncovered: What You Must Know in 2025](https://ttms.com/uk/ai-security-risks-explained-what-you-need-to-know-in-2025/)
-- [@article@Attacking AI Models (Trail of Bits Blog Series)](https://blog.trailofbits.com/category/ai-security/)
+- [@article@Weaknesses in Modern AI](https://insights.sei.cmu.edu/blog/weaknesses-and-vulnerabilities-in-modern-ai-why-security-and-safety-are-so-challenging/)
 - [@report@AI and ML Vulnerabilities (CNAS Report)](https://www.cnas.org/publications/reports/understanding-and-mitigating-ai-vulnerabilities)

src/data/roadmaps/ai-red-teaming/content/model-weight-stealing@QFzLx5nc4rCCD8WVc20mo.md
@@ -6,5 +6,4 @@ Learn more from the following resources:
 - [@article@A Playbook for Securing AI Model Weights](https://www.rand.org/pubs/research_briefs/RBA2849-1.html)
 - [@article@How to Steal a Machine Learning Model (SkyCryptor)](https://skycryptor.com/blog/how-to-steal-a-machine-learning-model)
-- [@paper@Defense Against Model Stealing (Microsoft Research)](https://www.microsoft.com/en-us/research/publication/defense-against-model-stealing-attacks/)
 - [@paper@On the Limitations of Model Stealing with Uncertainty Quantification Models](https://openreview.net/pdf?id=ONRFHoUzNk)

src/data/roadmaps/ai-red-teaming/content/prompt-engineering@gx4KaFqKgJX9n9_ZGMqlZ.md
@@ -4,8 +4,6 @@ For AI Red Teamers, prompt engineering is both a tool and a target. It's a tool
 Learn more from the following resources:
-- [@article@Introduction to Prompt Engineering](https://www.datacamp.com/tutorial/introduction-prompt-engineering)
 - [@article@System Prompts - InjectPrompt](https://www.injectprompt.com/t/system-prompts)
 - [@course@Introduction to Prompt Engineering](https://learnprompting.org/courses/intro-to-prompt-engineering)
-- [@guide@Prompt Engineering Guide](https://learnprompting.org/docs/prompt-engineering)
 - [@guide@The Ultimate Guide to Red Teaming LLMs and Adversarial Prompts (Kili Technology)](https://kili-technology.com/large-language-models-llms/red-teaming-llms-and-adversarial-prompts)

src/data/roadmaps/ai-red-teaming/content/prompt-hacking@1Xr7mxVekeAHzTL7G4eAZ.md
@@ -5,5 +5,5 @@ Prompt hacking is a core technique for AI Red Teamers targeting LLMs. It involve
 Learn more from the following resources:
 - [@course@Introduction to Prompt Hacking](https://learnprompting.org/courses/intro-to-prompt-hacking)
-- [@guide@Prompt Hacking Guide](https://learnprompting.org/docs/category/prompt-hacking)
+- [@guide@Prompt Hacking Guide](https://learnprompting.org/docs/prompt_hacking/introduction)
 - [@paper@SoK: Prompt Hacking of LLMs (arXiv 2023)](https://arxiv.org/abs/2311.05544)

src/data/roadmaps/ai-red-teaming/content/remote-code-execution@kgDsDlBk8W2aM6LyWpFY8.md
@@ -6,4 +6,4 @@ Learn more from the following resources:
 - [@article@Exploiting LLMs with Code Execution (GitHub Gist)](https://gist.github.com/coolaj86/6f4f7b30129b0251f61fa7baaa881516)
 - [@article@What is remote code execution?](https://www.cloudflare.com/learning/security/what-is-remote-code-execution/)
-- [@video@DEFCON 31 - AI Village - Hacking an LLM embedded system (agent) - Johann Rehberger](https://www.google.com/search?q=https://www.youtube.com/watch%3Fv%3D6u04C1N69ks?v=1FfYnF2GXVU)

src/data/roadmaps/ai-red-teaming/content/responsible-disclosure@KAcCZ3zcv25R6HwzAsfUG.md
@@ -4,6 +4,6 @@ A critical practice for AI Red Teamers is responsible disclosure: privately repo
 Learn more from the following resources:
-- [@guide@Responsible Disclosure of AI Vulnerabilities](https://www.preamble.com/blog/responsible-disclosure-of-ai-vulnerabilities)
+- [@guide@0din.ai Policy](https://0din.ai/policy)
-- [@guide@Vulnerability Disclosure Program](https://www.cisa.gov/resources-tools/programs/vulnerability-disclosure-program-vdp)
+- [@guide@Huntr Guidelines](https://huntr.com/guidelines)
 - [@policy@Google Vulnerability Reward Program (VRP)](https://bughunters.google.com/)

src/data/roadmaps/ai-red-teaming/content/role-of-red-teams@Irkc9DgBfqSn72WaJqXEt.md
@@ -6,4 +6,4 @@ Learn more from the following resources:
 - [@article@The Complete Guide to Red Teaming: Process, Benefits & More](https://mindgard.ai/blog/red-teaming)
 - [@article@The Complete Red Teaming Checklist [PDF]: 5 Key Steps - Mindgard AI](https://mindgard.ai/blog/red-teaming-checklist)
-- [@article@What is AI Red Teaming? - Learn Prompting](https://learnprompting.org/docs/category/ai-red-teaming)
+- [@article@Red Teaming in Defending AI Systems](https://protectai.com/blog/expanding-role-red-teaming-defending-ai-systems)

src/data/roadmaps/ai-red-teaming/content/specialized-courses@s1xKK8HL5-QGZpcutiuvj.md
@@ -6,5 +6,4 @@ Learn more from the following resources:
 - [@course@AI Red Teaming Courses - Learn Prompting](https://learnprompting.org/blog/ai-red-teaming-courses)
 - [@course@AI Security | Coursera](https://www.coursera.org/learn/ai-security)
-- [@course@Exploring Adversarial Machine Learning](https://www.nvidia.com/en-us/training/instructor-led-workshops/exploring-adversarial-machine-learning/)
 - [@course@Free Online Cyber Security Courses with Certificates in 2025](https://www.eccouncil.org/cybersecurity-exchange/cyber-novice/free-cybersecurity-courses-beginners/)

src/data/roadmaps/ai-red-teaming/content/threat-modeling@RDOaTBWP3aIJPUp_kcafm.md
@@ -7,4 +7,3 @@ Learn more from the following resources:
 - [@article@Core Components of AI Red Team Exercises (Learn Prompting)](https://learnprompting.org/blog/what-is-ai-red-teaming)
 - [@guide@Threat Modeling Process](https://owasp.org/www-community/Threat_Modeling_Process)
 - [@guide@Threat Modeling](https://owasp.org/www-community/Threat_Modeling)
-- [@video@How Microsoft Approaches AI Red Teaming (MS Build)](https://learn.microsoft.com/en-us/events/build-may-2023/breakout-responsible-ai-red-teaming/)

src/data/roadmaps/ai-red-teaming/content/unauthorized-access@DQeOavZCoXpF3k_qRDABs.md
@@ -4,6 +4,6 @@ AI Red Teamers test if vulnerabilities in the AI system or its interfaces allow
 Learn more from the following resources:
-- [@article@Unauthorized Data Access via LLMs (Security Boulevard)](https://securityboulevard.com/2023/11/unauthorized-data-access-via-llms/)
+- [@article@Defending Model Files from Unauthorized Access](https://developer.nvidia.com/blog/defending-ai-model-files-from-unauthorized-access-with-canaries/)
 - [@guide@OWASP API Security Project](https://owasp.org/www-project-api-security/)
-- [@paper@AI System Abuse Cases (Harvard Belfer Center)](https://www.belfercenter.org/publication/ai-system-abuse-cases)
+- [@article@Detecting Unauthorized Usage](https://www.unr.edu/digital-learning/instructional-strategies/understanding-and-integrating-generative-ai-in-teaching/how-can-i-detect-unauthorized-ai-usage)

src/data/roadmaps/ai-red-teaming/content/white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md
@@ -5,5 +5,5 @@ White-box testing in AI Red Teaming grants the tester full access to the model's
 Learn more from the following resources:
 - [@article@Black-Box, Gray Box, and White-Box Penetration Testing](https://www.eccouncil.org/cybersecurity-exchange/penetration-testing/black-box-gray-box-and-white-box-penetration-testing-importance-and-uses/)
-- [@article@White-Box Adversarial Examples (OpenAI Blog)](https://openai.com/research/adversarial-robustness-toolbox)
+- [@article@What is White Box Penetration Testing](https://www.getastra.com/blog/security-audit/white-box-penetration-testing/)
-- [@guide@LLM red teaming guide (open source)](https://www.promptfoo.dev/docs/red-team/)
+- [@article@The Art of White Box Pentesting](https://infosecwriteups.com/cracking-the-code-the-art-of-white-box-pentesting-de296bc22c67)

src/data/roadmaps/ai-red-teaming/content/why-red-team-ai-systems@fNTb9y3zs1HPYclAmu_Wv.md
@@ -1,3 +1,8 @@
 # Why Red Team AI Systems?
 AI systems introduce novel risks beyond traditional software, such as emergent unintended capabilities, complex failure modes, susceptibility to subtle data manipulations, and potential for large-scale misuse (e.g., generating disinformation). AI Red Teaming is necessary because standard testing methods often fail to uncover these unique AI vulnerabilities. It provides critical, adversary-focused insights needed to build genuinely safe, reliable, and secure AI before deployment.
+Learn more from the following resources:
+- [@course@Introduction to Prompt Hacking](https://learnprompting.org/courses/intro-to-prompt-hacking)
+- [@article@Prompt Hacking Offensive Measures](https://learnprompting.org/docs/prompt_hacking/offensive_measures/introduction)
