Update resources in AI Red Teaming Roadmap (#8570)

* Update why-red-team-ai-systems@fNTb9y3zs1HPYclAmu_Wv.md

* Update prompt-engineering@gx4KaFqKgJX9n9_ZGMqlZ.md

* Update generative-models@3XJ-g0KvHP75U18mxCqgw.md

* Update prompt-hacking@1Xr7mxVekeAHzTL7G4eAZ.md

* Update jailbreak-techniques@Ds8pqn4y9Npo7z6ubunvc.md

* Update countermeasures@G1u_Kq4NeUsGX2qnUTuJU.md

* Update forums@Smncq-n1OlnLAY27AFQOO.md

* Update lab-environments@MmwwRK4I9aRH_ha7duPqf.md

* Update ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md

* Update ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md

* Update industry-credentials@HHjsFR6wRDqUd66PMDE_7.md

* Update agentic-ai-security@FVsKivsJrIb82B0lpPmgw.md

* Update responsible-disclosure@KAcCZ3zcv25R6HwzAsfUG.md

* Update benchmark-datasets@et1Xrr8ez-fmB0mAq8W_a.md

* Update adversarial-examples@xjlttOti-_laPRn8a2fVy.md

* Update large-language-models@8K-wCn2cLc7Vs_V4sC3sE.md

* Update introduction@HFJIYcI16OMyM77fAw9af.md

* Update ethical-considerations@1gyuEV519LjN-KpROoVwv.md

* Update role-of-red-teams@Irkc9DgBfqSn72WaJqXEt.md

* Update threat-modeling@RDOaTBWP3aIJPUp_kcafm.md

* Update direct@5zHow4KZVpfhch5Aabeft.md

* Update indirect@3_gJRtJSdm2iAfkwmcv0e.md

* Update model-vulnerabilities@uBXrri2bXVsNiM8fIHHOv.md

* Update model-weight-stealing@QFzLx5nc4rCCD8WVc20mo.md

* Update unauthorized-access@DQeOavZCoXpF3k_qRDABs.md

* Update data-poisoning@nD0_64ELEeJSN-0aZiR7i.md

* Update model-inversion@iE5PcswBHnu_EBFIacib0.md

* Update code-injection@vhBu5x8INTtqvx6vcYAhE.md

* Update remote-code-execution@kgDsDlBk8W2aM6LyWpFY8.md

* Update api-protection@Tszl26iNBnQBdBEWOueDA.md

* Update authentication@J7gjlt2MBx7lOkOnfGvPF.md

* Update white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md

* Update white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md

* Update white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md

* Update automated-vs-manual@LVdYN9hyCyNPYn2Lz1y9b.md

* Update specialized-courses@s1xKK8HL5-QGZpcutiuvj.md
master
David Willis-Owen 1 day ago committed by GitHub
parent 2937923fb1
commit 80a0caba2f
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
  1. 1
      src/data/roadmaps/ai-red-teaming/content/adversarial-examples@xjlttOti-_laPRn8a2fVy.md
  2. 3
      src/data/roadmaps/ai-red-teaming/content/agentic-ai-security@FVsKivsJrIb82B0lpPmgw.md
  3. 1
      src/data/roadmaps/ai-red-teaming/content/api-protection@Tszl26iNBnQBdBEWOueDA.md
  4. 2
      src/data/roadmaps/ai-red-teaming/content/authentication@J7gjlt2MBx7lOkOnfGvPF.md
  5. 2
      src/data/roadmaps/ai-red-teaming/content/automated-vs-manual@LVdYN9hyCyNPYn2Lz1y9b.md
  6. 3
      src/data/roadmaps/ai-red-teaming/content/benchmark-datasets@et1Xrr8ez-fmB0mAq8W_a.md
  7. 2
      src/data/roadmaps/ai-red-teaming/content/code-injection@vhBu5x8INTtqvx6vcYAhE.md
  8. 1
      src/data/roadmaps/ai-red-teaming/content/countermeasures@G1u_Kq4NeUsGX2qnUTuJU.md
  9. 3
      src/data/roadmaps/ai-red-teaming/content/ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md
  10. 1
      src/data/roadmaps/ai-red-teaming/content/data-poisoning@nD0_64ELEeJSN-0aZiR7i.md
  11. 2
      src/data/roadmaps/ai-red-teaming/content/direct@5zHow4KZVpfhch5Aabeft.md
  12. 1
      src/data/roadmaps/ai-red-teaming/content/ethical-considerations@1gyuEV519LjN-KpROoVwv.md
  13. 4
      src/data/roadmaps/ai-red-teaming/content/forums@Smncq-n1OlnLAY27AFQOO.md
  14. 4
      src/data/roadmaps/ai-red-teaming/content/generative-models@3XJ-g0KvHP75U18mxCqgw.md
  15. 2
      src/data/roadmaps/ai-red-teaming/content/indirect@3_gJRtJSdm2iAfkwmcv0e.md
  16. 4
      src/data/roadmaps/ai-red-teaming/content/industry-credentials@HHjsFR6wRDqUd66PMDE_7.md
  17. 1
      src/data/roadmaps/ai-red-teaming/content/introduction@HFJIYcI16OMyM77fAw9af.md
  18. 2
      src/data/roadmaps/ai-red-teaming/content/jailbreak-techniques@Ds8pqn4y9Npo7z6ubunvc.md
  19. 3
      src/data/roadmaps/ai-red-teaming/content/lab-environments@MmwwRK4I9aRH_ha7duPqf.md
  20. 2
      src/data/roadmaps/ai-red-teaming/content/large-language-models@8K-wCn2cLc7Vs_V4sC3sE.md
  21. 1
      src/data/roadmaps/ai-red-teaming/content/model-inversion@iE5PcswBHnu_EBFIacib0.md
  22. 2
      src/data/roadmaps/ai-red-teaming/content/model-vulnerabilities@uBXrri2bXVsNiM8fIHHOv.md
  23. 1
      src/data/roadmaps/ai-red-teaming/content/model-weight-stealing@QFzLx5nc4rCCD8WVc20mo.md
  24. 2
      src/data/roadmaps/ai-red-teaming/content/prompt-engineering@gx4KaFqKgJX9n9_ZGMqlZ.md
  25. 2
      src/data/roadmaps/ai-red-teaming/content/prompt-hacking@1Xr7mxVekeAHzTL7G4eAZ.md
  26. 2
      src/data/roadmaps/ai-red-teaming/content/remote-code-execution@kgDsDlBk8W2aM6LyWpFY8.md
  27. 4
      src/data/roadmaps/ai-red-teaming/content/responsible-disclosure@KAcCZ3zcv25R6HwzAsfUG.md
  28. 2
      src/data/roadmaps/ai-red-teaming/content/role-of-red-teams@Irkc9DgBfqSn72WaJqXEt.md
  29. 1
      src/data/roadmaps/ai-red-teaming/content/specialized-courses@s1xKK8HL5-QGZpcutiuvj.md
  30. 1
      src/data/roadmaps/ai-red-teaming/content/threat-modeling@RDOaTBWP3aIJPUp_kcafm.md
  31. 4
      src/data/roadmaps/ai-red-teaming/content/unauthorized-access@DQeOavZCoXpF3k_qRDABs.md
  32. 4
      src/data/roadmaps/ai-red-teaming/content/white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md
  33. 5
      src/data/roadmaps/ai-red-teaming/content/why-red-team-ai-systems@fNTb9y3zs1HPYclAmu_Wv.md

@ -4,7 +4,6 @@ A core AI Red Teaming activity involves generating adversarial examples – inpu
Learn more from the following resources:
- [@article@Adversarial Examples Explained (OpenAI Blog)](https://openai.com/research/adversarial-examples)
- [@guide@Adversarial Examples – Interpretable Machine Learning Book](https://christophm.github.io/interpretable-ml-book/adversarial.html)
- [@guide@Adversarial Testing for Generative AI](https://developers.google.com/machine-learning/guides/adv-testing)
- [@video@How AI Can Be Tricked With Adversarial Attacks](https://www.youtube.com/watch?v=J3X_JWQkvo8?v=MPcfoQBDY0w)

@ -5,5 +5,4 @@ As AI agents capable of autonomous action become more common, AI Red Teamers mus
Learn more from the following resources:
- [@article@AI Agents - Learn Prompting](https://learnprompting.org/docs/intermediate/ai_agents)
- [@article@Reasoning models don't always say what they think](https://www.anthropic.com/research/reasoning-models-dont-always-say-what-they-think)
- [@course@Certified AI Red Team Operator – Autonomous Systems (CAIRTO-AS) from Tonex, Inc.](https://niccs.cisa.gov/education-training/catalog/tonex-inc/certified-ai-red-team-operator-autonomous-systems-cairto)
- [@article@EmbraceTheRed](https://embracethered.com/)

@ -4,7 +4,6 @@ AI Red Teamers rigorously test the security of APIs providing access to AI model
Learn more from the following resources:
- [@article@API Protection for AI Factories: The First Step to AI Security](https://www.f5.com/company/blog/api-security-for-ai-factories)
- [@article@Securing APIs with AI for Advanced Threat Protection](https://adevait.com/artificial-intelligence/securing-apis-with-ai)
- [@article@Securing Machine Learning APIs (IBM)](https://developer.ibm.com/articles/se-securing-machine-learning-apis/)
- [@guide@OWASP API Security Project (Top 10 2023)](https://owasp.org/www-project-api-security/)

@ -6,4 +6,4 @@ Learn more from the following resources:
- [@article@Red-Teaming in AI Testing: Stress Testing](https://www.labelvisor.com/red-teaming-abstract-competitive-testing-data-selection/)
- [@article@What is Authentication vs Authorization?](https://auth0.com/intro-to-iam/authentication-vs-authorization)
- [@video@How JWTs are used for Authentication (and how to bypass it)](https://www.google.com/search?q=https://www.youtube.com/watch%3Fv%3Dexample_video_panel_url?v=3OpQi65s_ME)
- [@article@JWT Attacks](https://portswigger.net/web-security/jwt)

@ -6,4 +6,4 @@ Learn more from the following resources:
- [@article@Automation Testing vs. Manual Testing: Which is the better approach?](https://www.opkey.com/blog/automation-testing-vs-manual-testing-which-is-better)
- [@article@Manual Testing vs Automated Testing: What's the Difference?](https://www.leapwork.com/blog/manual-vs-automated-testing)
- [@guide@LLM red teaming guide (open source)](https://www.promptfoo.dev/docs/red-team/)
- [@tool@Spikee](https://spikee.ai)

@ -1,9 +1,10 @@
# Benchmark Datasets
AI Red Teamers may use or contribute to benchmark datasets specifically designed to evaluate AI security. These datasets (like SecBench, NYU CTF Bench, CySecBench) contain prompts or scenarios targeting vulnerabilities, safety issues, or specific cybersecurity capabilities, allowing for standardized testing of models.
AI Red Teamers may use or contribute to benchmark datasets specifically designed to evaluate AI security. These datasets (like HackAprompt, SecBench, NYU CTF Bench, CySecBench) contain prompts or scenarios targeting vulnerabilities, safety issues, or specific cybersecurity capabilities, allowing for standardized testing of models.
Learn more from the following resources:
- [@dataset@HackAPrompt Dataset](https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset)
- [@dataset@CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset](https://github.com/cysecbench/dataset)
- [@dataset@NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security](https://proceedings.neurips.cc/paper_files/paper/2024/hash/69d97a6493fbf016fff0a751f253ad18-Abstract-Datasets_and_Benchmarks_Track.html)
- [@dataset@SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity](https://arxiv.org/abs/2412.20787)

@ -5,5 +5,5 @@ AI Red Teamers test for code injection vulnerabilities specifically in the conte
Learn more from the following resources:
- [@article@Code Injection in LLM Applications](https://neuraltrust.ai/blog/code-injection-in-llms)
- [@docs@Secure Plugin Sandboxing (OpenAI Plugins)](https://platform.openai.com/docs/plugins/production/security-requirements)
- [@article@Code Injection](https://learnprompting.org/docs/prompt_hacking/offensive_measures/code_injection)
- [@guide@Code Injection](https://owasp.org/www-community/attacks/Code_Injection)

@ -4,6 +4,7 @@ AI Red Teamers must also understand and test defenses against prompt hacking. Th
Learn more from the following resources:
- [@article@Prompt Hacking Defensive Measures](https://learnprompting.org/docs/prompt_hacking/defensive_measures/introduction)
- [@article@Mitigating Prompt Injection Attacks (NCC Group Research)](https://research.nccgroup.com/2023/12/01/mitigating-prompt-injection-attacks/)
- [@article@Prompt Injection & the Rise of Prompt Attacks](https://www.lakera.ai/blog/guide-to-prompt-injection)
- [@article@Prompt Injection: Impact, How It Works & 4 Defense Measures](https://www.tigera.io/learn/guides/llm-security/prompt-injection/)

@ -4,7 +4,6 @@ Capture The Flag competitions increasingly include AI/ML security challenges. Pa
Learn more from the following resources:
- [@article@Capture the flag (cybersecurity)](https://en.wikipedia.org/wiki/Capture_the_flag_(cybersecurity)
- [@platform@HackAPrompt](https://www.hackaprompt.com/)
- [@article@Progress from our Frontier Red Team](https://www.anthropic.com/news/strategic-warning-for-ai-risk-progress-and-insights-from-our-frontier-red-team)
- [@platform@CTFtime.org](https://ctftime.org/)
- [@platform@picoCTF](https://picoctf.org/)

@ -5,6 +5,5 @@ AI Red Teamers simulate data poisoning attacks by evaluating how introducing man
Learn more from the following resources:
- [@article@AI Poisoning](https://www.aiblade.net/p/ai-poisoning-is-it-really-a-threat)
- [@article@Data Poisoning Attacks in ML (Towards Data Science)](https://towardsdatascience.com/data-poisoning-attacks-in-machine-learning-542169587b7f)
- [@paper@Detecting and Preventing Data Poisoning Attacks on AI Models](https://arxiv.org/abs/2503.09302)
- [@paper@Poisoning Web-Scale Training Data (arXiv)](https://arxiv.org/abs/2310.12818)

@ -4,6 +4,6 @@ Direct injection attacks occur when malicious instructions are inserted directly
Learn more from the following resources:
- [@article@Prompt Injection](https://learnprompting.org/docs/prompt_hacking/injection?srsltid=AfmBOooOKRzLT0Hn2PNdAa69Fietniztfds6Fo1PO8WuIyyXjbLb6XgI)
- [@article@Prompt Injection & the Rise of Prompt Attacks](https://www.lakera.ai/blog/guide-to-prompt-injection)
- [@article@Prompt Injection Cheat Sheet (FlowGPT)](https://flowgpt.com/p/prompt-injection-cheat-sheet)
- [@report@OpenAI GPT-4 System Card](https://openai.com/research/gpt-4-system-card)

@ -7,4 +7,3 @@ Learn more from the following resources:
- [@article@Red-Teaming in AI Testing: Stress Testing](https://www.labelvisor.com/red-teaming-abstract-competitive-testing-data-selection/)
- [@article@Responsible AI assessment - Responsible AI | Coursera](https://www.coursera.org/learn/ai-security)
- [@guide@Responsible AI Principles (Microsoft)](https://www.microsoft.com/en-us/ai/responsible-ai)
- [@video@Questions to Guide AI Red-Teaming (CMU SEI)](https://resources.sei.cmu.edu/library/asset-view.cfm?assetid=928382)

@ -4,7 +4,7 @@ Engaging in online forums, mailing lists, Discord servers, or subreddits dedicat
Learn more from the following resources:
- [@community@List of Cybersecurity Discord Servers](https://www.dfir.training/dfir-groups/discord?category[0]=17&category_children=1)
- [@community@Reddit - r/MachineLearning](https://www.reddit.com/r/MachineLearning/)
- [@community@LearnPrompting Prompt Hacking Discord](https://discord.com/channels/1046228027434086460/1349689482651369492)
- [@community@Reddit - r/ChatGPTJailbreak](https://www.reddit.com/r/ChatGPTJailbreak/)
- [@community@Reddit - r/artificial](https://www.reddit.com/r/artificial/)
- [@community@Reddit - r/cybersecurity](https://www.reddit.com/r/cybersecurity/)

@ -4,6 +4,6 @@ AI Red Teamers focus heavily on generative models (like GANs and LLMs) due to th
Learn more from the following resources:
- [@article@An Introduction to Generative Models](https://www.mongodb.com/resources/basics/artificial-intelligence/generative-models)
- [@course@Generative AI for Beginners](https://microsoft.github.io/generative-ai-for-beginners/)
- [@article@What is Generative AI?](https://learnprompting.org/docs/basics/generative_ai)
- [@course@Introduction to Generative AI](https://learnprompting.org/courses/intro-to-gen-ai)
- [@guide@Generative AI beginner's guide](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/overview)

@ -6,4 +6,4 @@ Learn more from the following resources:
- [@paper@The Practical Application of Indirect Prompt Injection Attacks](https://www.researchgate.net/publication/382692833_The_Practical_Application_of_Indirect_Prompt_Injection_Attacks_From_Academia_to_Industry)
- [@article@How to Prevent Indirect Prompt Injection Attacks](https://www.cobalt.io/blog/how-to-prevent-indirect-prompt-injection-attacks)
- [@article@Jailbreaks via Indirect Injection (Practical AI Safety Newsletter)](https://newsletter.practicalai.safety/p/jailbreaks-via-indirect-injection)
- [@article@Indirect Prompt Injection Data Exfiltration](https://embracethered.com/blog/posts/2024/chatgpt-macos-app-persistent-data-exfiltration/)

@ -4,5 +4,5 @@ Beyond formal certifications, recognition in the AI Red Teaming field comes from
Learn more from the following resources:
- [@community@DEF CON - Wikipedia (Mentions Black Badge)](https://en.wikipedia.org/wiki/DEF_CON#Black_Badge)
- [@community@HackAPrompt (Learn Prompting)](https://learnprompting.org/hackaprompt)
- [@platform@HackAPrompt](https://hackaprompt.com)
- [@platform@RedTeam Arena](https://redarena.ai)

@ -7,4 +7,3 @@ Learn more from the following resources:
- [@article@A Guide to AI Red Teaming](https://hiddenlayer.com/innovation-hub/a-guide-to-ai-red-teaming/)
- [@article@What is AI Red Teaming? (Learn Prompting)](https://learnprompting.org/blog/what-is-ai-red-teaming)
- [@article@What is AI Red Teaming? The Complete Guide](https://mindgard.ai/blog/what-is-ai-red-teaming)
- [@podcast@Red Team Podcast - AI Red Teaming Insights & Defense Strategies](https://mindgard.ai/podcast/red-team)

@ -5,5 +5,5 @@ Jailbreaking is a specific category of prompt hacking where the AI Red Teamer ai
Learn more from the following resources:
- [@article@InjectPrompt (David Willis-Owen)](https://injectprompt.com)
- [@guide@Prompt Hacking Guide - Learn Prompting](https://learnprompting.org/docs/category/prompt-hacking)
- [@guide@Jailbreaking Guide - Learn Prompting](https://learnprompting.org/docs/prompt_hacking/jailbreaking)
- [@paper@Jailbroken: How Does LLM Safety Training Fail? (arXiv)](https://arxiv.org/abs/2307.02483)

@ -4,7 +4,8 @@ AI Red Teamers need environments to practice attacking vulnerable systems safely
Learn more from the following resources:
- [@platform@HackAPrompt Playground](https://learnprompting.org/hackaprompt-playground)
- [@platform@InjectPrompt Playground](https://playground.injectprompt.com/)
- [@platform@Gandalf AI Prompt Injection Lab](https://gandalf.lakera.ai/)
- [@platform@Hack The Box: Hacking Labs](https://www.hackthebox.com/hacker/hacking-labs)
- [@platform@TryHackMe: Learn Cyber Security](https://tryhackme.com/)
- [@platform@VulnHub](https://www.vulnhub.com/)

@ -5,5 +5,5 @@ LLMs are a primary target for AI Red Teaming. Understanding their architecture (
Learn more from the following resources:
- [@article@What is an LLM (large language model)?](https://www.cloudflare.com/learning/ai/what-is-large-language-model/)
- [@guide@Introduction to LLMs - Learn Prompting](https://learnprompting.org/docs/intro_to_llms)
- [@guide@ChatGPT For Everyone](https://learnprompting.org/courses/chatgpt-for-everyone)
- [@guide@What Are Large Language Models? A Beginner's Guide for 2025](https://www.kdnuggets.com/large-language-models-beginners-guide-2025)

@ -4,7 +4,6 @@ AI Red Teamers perform model inversion tests to assess if an attacker can recons
Learn more from the following resources:
- [@article@Model Inversion Attacks for ML (Medium)](https://medium.com/@ODSC/model-inversion-attacks-for-machine-learning-ff407a1b10d1)
- [@article@Model inversion and membership inference: Understanding new AI security risks](https://www.hoganlovells.com/en/publications/model-inversion-and-membership-inference-understanding-new-ai-security-risks-and-mitigating-vulnerabilities)
- [@paper@Extracting Training Data from LLMs (arXiv)](https://arxiv.org/abs/2012.07805)
- [@paper@Model Inversion Attacks: A Survey of Approaches and Countermeasures](https://arxiv.org/html/2411.10023v1)

@ -5,5 +5,5 @@ This category covers attacks and tests targeting the AI model itself, beyond the
Learn more from the following resources:
- [@article@AI Security Risks Uncovered: What You Must Know in 2025](https://ttms.com/uk/ai-security-risks-explained-what-you-need-to-know-in-2025/)
- [@article@Attacking AI Models (Trail of Bits Blog Series)](https://blog.trailofbits.com/category/ai-security/)
- [@article@Weaknesses in Modern AI](https://insights.sei.cmu.edu/blog/weaknesses-and-vulnerabilities-in-modern-ai-why-security-and-safety-are-so-challenging/)
- [@report@AI and ML Vulnerabilities (CNAS Report)](https://www.cnas.org/publications/reports/understanding-and-mitigating-ai-vulnerabilities)

@ -6,5 +6,4 @@ Learn more from the following resources:
- [@article@A Playbook for Securing AI Model Weights](https://www.rand.org/pubs/research_briefs/RBA2849-1.html)
- [@article@How to Steal a Machine Learning Model (SkyCryptor)](https://skycryptor.com/blog/how-to-steal-a-machine-learning-model)
- [@paper@Defense Against Model Stealing (Microsoft Research)](https://www.microsoft.com/en-us/research/publication/defense-against-model-stealing-attacks/)
- [@paper@On the Limitations of Model Stealing with Uncertainty Quantification Models](https://openreview.net/pdf?id=ONRFHoUzNk)

@ -4,8 +4,6 @@ For AI Red Teamers, prompt engineering is both a tool and a target. It's a tool
Learn more from the following resources:
- [@article@Introduction to Prompt Engineering](https://www.datacamp.com/tutorial/introduction-prompt-engineering)
- [@article@System Prompts - InjectPrompt](https://www.injectprompt.com/t/system-prompts)
- [@course@Introduction to Prompt Engineering](https://learnprompting.org/courses/intro-to-prompt-engineering)
- [@guide@Prompt Engineering Guide](https://learnprompting.org/docs/prompt-engineering)
- [@guide@The Ultimate Guide to Red Teaming LLMs and Adversarial Prompts (Kili Technology)](https://kili-technology.com/large-language-models-llms/red-teaming-llms-and-adversarial-prompts)

@ -5,5 +5,5 @@ Prompt hacking is a core technique for AI Red Teamers targeting LLMs. It involve
Learn more from the following resources:
- [@course@Introduction to Prompt Hacking](https://learnprompting.org/courses/intro-to-prompt-hacking)
- [@guide@Prompt Hacking Guide](https://learnprompting.org/docs/category/prompt-hacking)
- [@guide@Prompt Hacking Guide](https://learnprompting.org/docs/prompt_hacking/introduction)
- [@paper@SoK: Prompt Hacking of LLMs (arXiv 2023)](https://arxiv.org/abs/2311.05544)

@ -6,4 +6,4 @@ Learn more from the following resources:
- [@article@Exploiting LLMs with Code Execution (GitHub Gist)](https://gist.github.com/coolaj86/6f4f7b30129b0251f61fa7baaa881516)
- [@article@What is remote code execution?](https://www.cloudflare.com/learning/security/what-is-remote-code-execution/)
- [@video@DEFCON 31 - AI Village - Hacking an LLM embedded system (agent) - Johann Rehberger](https://www.google.com/search?q=https://www.youtube.com/watch%3Fv%3D6u04C1N69ks?v=1FfYnF2GXVU)

@ -4,6 +4,6 @@ A critical practice for AI Red Teamers is responsible disclosure: privately repo
Learn more from the following resources:
- [@guide@Responsible Disclosure of AI Vulnerabilities](https://www.preamble.com/blog/responsible-disclosure-of-ai-vulnerabilities)
- [@guide@Vulnerability Disclosure Program](https://www.cisa.gov/resources-tools/programs/vulnerability-disclosure-program-vdp)
- [@guide@0din.ai Policy](https://0din.ai/policy)
- [@guide@Huntr Guidelines](https://huntr.com/guidelines)
- [@policy@Google Vulnerability Reward Program (VRP)](https://bughunters.google.com/)

@ -6,4 +6,4 @@ Learn more from the following resources:
- [@article@The Complete Guide to Red Teaming: Process, Benefits & More](https://mindgard.ai/blog/red-teaming)
- [@article@The Complete Red Teaming Checklist [PDF]: 5 Key Steps - Mindgard AI](https://mindgard.ai/blog/red-teaming-checklist)
- [@article@What is AI Red Teaming? - Learn Prompting](https://learnprompting.org/docs/category/ai-red-teaming)
- [@article@Red Teaming in Defending AI Systems](https://protectai.com/blog/expanding-role-red-teaming-defending-ai-systems)

@ -6,5 +6,4 @@ Learn more from the following resources:
- [@course@AI Red Teaming Courses - Learn Prompting](https://learnprompting.org/blog/ai-red-teaming-courses)
- [@course@AI Security | Coursera](https://www.coursera.org/learn/ai-security)
- [@course@Exploring Adversarial Machine Learning](https://www.nvidia.com/en-us/training/instructor-led-workshops/exploring-adversarial-machine-learning/)
- [@course@Free Online Cyber Security Courses with Certificates in 2025](https://www.eccouncil.org/cybersecurity-exchange/cyber-novice/free-cybersecurity-courses-beginners/)

@ -7,4 +7,3 @@ Learn more from the following resources:
- [@article@Core Components of AI Red Team Exercises (Learn Prompting)](https://learnprompting.org/blog/what-is-ai-red-teaming)
- [@guide@Threat Modeling Process](https://owasp.org/www-community/Threat_Modeling_Process)
- [@guide@Threat Modeling](https://owasp.org/www-community/Threat_Modeling)
- [@video@How Microsoft Approaches AI Red Teaming (MS Build)](https://learn.microsoft.com/en-us/events/build-may-2023/breakout-responsible-ai-red-teaming/)

@ -4,6 +4,6 @@ AI Red Teamers test if vulnerabilities in the AI system or its interfaces allow
Learn more from the following resources:
- [@article@Unauthorized Data Access via LLMs (Security Boulevard)](https://securityboulevard.com/2023/11/unauthorized-data-access-via-llms/)
- [@article@Defending Model Files from Unauthorized Access](https://developer.nvidia.com/blog/defending-ai-model-files-from-unauthorized-access-with-canaries/)
- [@guide@OWASP API Security Project](https://owasp.org/www-project-api-security/)
- [@paper@AI System Abuse Cases (Harvard Belfer Center)](https://www.belfercenter.org/publication/ai-system-abuse-cases)
- [@article@Detecting Unauthorized Usage](https://www.unr.edu/digital-learning/instructional-strategies/understanding-and-integrating-generative-ai-in-teaching/how-can-i-detect-unauthorized-ai-usage)

@ -5,5 +5,5 @@ White-box testing in AI Red Teaming grants the tester full access to the model's
Learn more from the following resources:
- [@article@Black-Box, Gray Box, and White-Box Penetration Testing](https://www.eccouncil.org/cybersecurity-exchange/penetration-testing/black-box-gray-box-and-white-box-penetration-testing-importance-and-uses/)
- [@article@White-Box Adversarial Examples (OpenAI Blog)](https://openai.com/research/adversarial-robustness-toolbox)
- [@guide@LLM red teaming guide (open source)](https://www.promptfoo.dev/docs/red-team/)
- [@article@What is White Box Penetration Testing](https://www.getastra.com/blog/security-audit/white-box-penetration-testing/)
- [@article@The Art of White Box Pentesting](https://infosecwriteups.com/cracking-the-code-the-art-of-white-box-pentesting-de296bc22c67)

@ -1,3 +1,8 @@
# Why Red Team AI Systems?
AI systems introduce novel risks beyond traditional software, such as emergent unintended capabilities, complex failure modes, susceptibility to subtle data manipulations, and potential for large-scale misuse (e.g., generating disinformation). AI Red Teaming is necessary because standard testing methods often fail to uncover these unique AI vulnerabilities. It provides critical, adversary-focused insights needed to build genuinely safe, reliable, and secure AI before deployment.
Learn more from the following resources:
- [@course@Introduction to Prompt Hacking](https://learnprompting.org/courses/intro-to-prompt-hacking)
- [@article@Prompt Hacking Offensive Measures](https://learnprompting.org/docs/prompt_hacking/offensive_measures/introduction)

Loading…
Cancel
Save