From 80a0caba2f010dad16735b9b65df1fdb78c74e77 Mon Sep 17 00:00:00 2001 From: David Willis-Owen <100765093+davidwillisowen@users.noreply.github.com> Date: Mon, 28 Apr 2025 13:12:11 +0100 Subject: [PATCH] Update resources in AI Red Teaming Roadmap (#8570) * Update why-red-team-ai-systems@fNTb9y3zs1HPYclAmu_Wv.md * Update prompt-engineering@gx4KaFqKgJX9n9_ZGMqlZ.md * Update generative-models@3XJ-g0KvHP75U18mxCqgw.md * Update prompt-hacking@1Xr7mxVekeAHzTL7G4eAZ.md * Update jailbreak-techniques@Ds8pqn4y9Npo7z6ubunvc.md * Update countermeasures@G1u_Kq4NeUsGX2qnUTuJU.md * Update forums@Smncq-n1OlnLAY27AFQOO.md * Update lab-environments@MmwwRK4I9aRH_ha7duPqf.md * Update ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md * Update ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md * Update industry-credentials@HHjsFR6wRDqUd66PMDE_7.md * Update agentic-ai-security@FVsKivsJrIb82B0lpPmgw.md * Update responsible-disclosure@KAcCZ3zcv25R6HwzAsfUG.md * Update benchmark-datasets@et1Xrr8ez-fmB0mAq8W_a.md * Update adversarial-examples@xjlttOti-_laPRn8a2fVy.md * Update large-language-models@8K-wCn2cLc7Vs_V4sC3sE.md * Update introduction@HFJIYcI16OMyM77fAw9af.md * Update ethical-considerations@1gyuEV519LjN-KpROoVwv.md * Update role-of-red-teams@Irkc9DgBfqSn72WaJqXEt.md * Update threat-modeling@RDOaTBWP3aIJPUp_kcafm.md * Update direct@5zHow4KZVpfhch5Aabeft.md * Update indirect@3_gJRtJSdm2iAfkwmcv0e.md * Update model-vulnerabilities@uBXrri2bXVsNiM8fIHHOv.md * Update model-weight-stealing@QFzLx5nc4rCCD8WVc20mo.md * Update unauthorized-access@DQeOavZCoXpF3k_qRDABs.md * Update data-poisoning@nD0_64ELEeJSN-0aZiR7i.md * Update model-inversion@iE5PcswBHnu_EBFIacib0.md * Update code-injection@vhBu5x8INTtqvx6vcYAhE.md * Update remote-code-execution@kgDsDlBk8W2aM6LyWpFY8.md * Update api-protection@Tszl26iNBnQBdBEWOueDA.md * Update authentication@J7gjlt2MBx7lOkOnfGvPF.md * Update white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md * Update white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md * Update white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md * Update automated-vs-manual@LVdYN9hyCyNPYn2Lz1y9b.md * Update specialized-courses@s1xKK8HL5-QGZpcutiuvj.md --- .../content/adversarial-examples@xjlttOti-_laPRn8a2fVy.md | 1 - .../content/agentic-ai-security@FVsKivsJrIb82B0lpPmgw.md | 3 +-- .../content/api-protection@Tszl26iNBnQBdBEWOueDA.md | 1 - .../content/authentication@J7gjlt2MBx7lOkOnfGvPF.md | 2 +- .../content/automated-vs-manual@LVdYN9hyCyNPYn2Lz1y9b.md | 2 +- .../content/benchmark-datasets@et1Xrr8ez-fmB0mAq8W_a.md | 3 ++- .../content/code-injection@vhBu5x8INTtqvx6vcYAhE.md | 2 +- .../content/countermeasures@G1u_Kq4NeUsGX2qnUTuJU.md | 1 + .../content/ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md | 3 +-- .../content/data-poisoning@nD0_64ELEeJSN-0aZiR7i.md | 1 - .../ai-red-teaming/content/direct@5zHow4KZVpfhch5Aabeft.md | 2 +- .../ethical-considerations@1gyuEV519LjN-KpROoVwv.md | 1 - .../ai-red-teaming/content/forums@Smncq-n1OlnLAY27AFQOO.md | 4 ++-- .../content/generative-models@3XJ-g0KvHP75U18mxCqgw.md | 4 ++-- .../content/indirect@3_gJRtJSdm2iAfkwmcv0e.md | 2 +- .../content/industry-credentials@HHjsFR6wRDqUd66PMDE_7.md | 4 ++-- .../content/introduction@HFJIYcI16OMyM77fAw9af.md | 1 - .../content/jailbreak-techniques@Ds8pqn4y9Npo7z6ubunvc.md | 2 +- .../content/lab-environments@MmwwRK4I9aRH_ha7duPqf.md | 3 ++- .../content/large-language-models@8K-wCn2cLc7Vs_V4sC3sE.md | 2 +- .../content/model-inversion@iE5PcswBHnu_EBFIacib0.md | 1 - .../content/model-vulnerabilities@uBXrri2bXVsNiM8fIHHOv.md | 2 +- .../content/model-weight-stealing@QFzLx5nc4rCCD8WVc20mo.md | 1 - 
.../content/prompt-engineering@gx4KaFqKgJX9n9_ZGMqlZ.md | 2 -- .../content/prompt-hacking@1Xr7mxVekeAHzTL7G4eAZ.md | 2 +- .../content/remote-code-execution@kgDsDlBk8W2aM6LyWpFY8.md | 2 +- .../responsible-disclosure@KAcCZ3zcv25R6HwzAsfUG.md | 4 ++-- .../content/role-of-red-teams@Irkc9DgBfqSn72WaJqXEt.md | 2 +- .../content/specialized-courses@s1xKK8HL5-QGZpcutiuvj.md | 1 - .../content/threat-modeling@RDOaTBWP3aIJPUp_kcafm.md | 1 - .../content/unauthorized-access@DQeOavZCoXpF3k_qRDABs.md | 4 ++-- .../content/white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md | 4 ++-- .../why-red-team-ai-systems@fNTb9y3zs1HPYclAmu_Wv.md | 7 ++++++- 33 files changed, 36 insertions(+), 41 deletions(-) diff --git a/src/data/roadmaps/ai-red-teaming/content/adversarial-examples@xjlttOti-_laPRn8a2fVy.md b/src/data/roadmaps/ai-red-teaming/content/adversarial-examples@xjlttOti-_laPRn8a2fVy.md index 4eda151d6..4cc1c5f78 100644 --- a/src/data/roadmaps/ai-red-teaming/content/adversarial-examples@xjlttOti-_laPRn8a2fVy.md +++ b/src/data/roadmaps/ai-red-teaming/content/adversarial-examples@xjlttOti-_laPRn8a2fVy.md @@ -4,7 +4,6 @@ A core AI Red Teaming activity involves generating adversarial examples – inpu Learn more from the following resources: -- [@article@Adversarial Examples Explained (OpenAI Blog)](https://openai.com/research/adversarial-examples) - [@guide@Adversarial Examples – Interpretable Machine Learning Book](https://christophm.github.io/interpretable-ml-book/adversarial.html) - [@guide@Adversarial Testing for Generative AI](https://developers.google.com/machine-learning/guides/adv-testing) - [@video@How AI Can Be Tricked With Adversarial Attacks](https://www.youtube.com/watch?v=J3X_JWQkvo8?v=MPcfoQBDY0w) diff --git a/src/data/roadmaps/ai-red-teaming/content/agentic-ai-security@FVsKivsJrIb82B0lpPmgw.md b/src/data/roadmaps/ai-red-teaming/content/agentic-ai-security@FVsKivsJrIb82B0lpPmgw.md index 9f41962db..a6a86529d 100644 --- a/src/data/roadmaps/ai-red-teaming/content/agentic-ai-security@FVsKivsJrIb82B0lpPmgw.md +++ b/src/data/roadmaps/ai-red-teaming/content/agentic-ai-security@FVsKivsJrIb82B0lpPmgw.md @@ -5,5 +5,4 @@ As AI agents capable of autonomous action become more common, AI Red Teamers mus Learn more from the following resources: - [@article@AI Agents - Learn Prompting](https://learnprompting.org/docs/intermediate/ai_agents) -- [@article@Reasoning models don't always say what they think](https://www.anthropic.com/research/reasoning-models-dont-always-say-what-they-think) -- [@course@Certified AI Red Team Operator – Autonomous Systems (CAIRTO-AS) from Tonex, Inc.](https://niccs.cisa.gov/education-training/catalog/tonex-inc/certified-ai-red-team-operator-autonomous-systems-cairto) +- [@article@EmbraceTheRed](https://embracethered.com/) diff --git a/src/data/roadmaps/ai-red-teaming/content/api-protection@Tszl26iNBnQBdBEWOueDA.md b/src/data/roadmaps/ai-red-teaming/content/api-protection@Tszl26iNBnQBdBEWOueDA.md index f9350a365..3dffaa0ee 100644 --- a/src/data/roadmaps/ai-red-teaming/content/api-protection@Tszl26iNBnQBdBEWOueDA.md +++ b/src/data/roadmaps/ai-red-teaming/content/api-protection@Tszl26iNBnQBdBEWOueDA.md @@ -4,7 +4,6 @@ AI Red Teamers rigorously test the security of APIs providing access to AI model Learn more from the following resources: -- [@article@API Protection for AI Factories: The First Step to AI Security](https://www.f5.com/company/blog/api-security-for-ai-factories) - [@article@Securing APIs with AI for Advanced Threat 
Protection](https://adevait.com/artificial-intelligence/securing-apis-with-ai) - [@article@Securing Machine Learning APIs (IBM)](https://developer.ibm.com/articles/se-securing-machine-learning-apis/) - [@guide@OWASP API Security Project (Top 10 2023)](https://owasp.org/www-project-api-security/) diff --git a/src/data/roadmaps/ai-red-teaming/content/authentication@J7gjlt2MBx7lOkOnfGvPF.md b/src/data/roadmaps/ai-red-teaming/content/authentication@J7gjlt2MBx7lOkOnfGvPF.md index 9b93f5b5c..fe9fab550 100644 --- a/src/data/roadmaps/ai-red-teaming/content/authentication@J7gjlt2MBx7lOkOnfGvPF.md +++ b/src/data/roadmaps/ai-red-teaming/content/authentication@J7gjlt2MBx7lOkOnfGvPF.md @@ -6,4 +6,4 @@ Learn more from the following resources: - [@article@Red-Teaming in AI Testing: Stress Testing](https://www.labelvisor.com/red-teaming-abstract-competitive-testing-data-selection/) - [@article@What is Authentication vs Authorization?](https://auth0.com/intro-to-iam/authentication-vs-authorization) -- [@video@How JWTs are used for Authentication (and how to bypass it)](https://www.google.com/search?q=https://www.youtube.com/watch%3Fv%3Dexample_video_panel_url?v=3OpQi65s_ME) +- [@article@JWT Attacks](https://portswigger.net/web-security/jwt) diff --git a/src/data/roadmaps/ai-red-teaming/content/automated-vs-manual@LVdYN9hyCyNPYn2Lz1y9b.md b/src/data/roadmaps/ai-red-teaming/content/automated-vs-manual@LVdYN9hyCyNPYn2Lz1y9b.md index d83fb7e8e..790d8d4d1 100644 --- a/src/data/roadmaps/ai-red-teaming/content/automated-vs-manual@LVdYN9hyCyNPYn2Lz1y9b.md +++ b/src/data/roadmaps/ai-red-teaming/content/automated-vs-manual@LVdYN9hyCyNPYn2Lz1y9b.md @@ -6,4 +6,4 @@ Learn more from the following resources: - [@article@Automation Testing vs. Manual Testing: Which is the better approach?](https://www.opkey.com/blog/automation-testing-vs-manual-testing-which-is-better) - [@article@Manual Testing vs Automated Testing: What's the Difference?](https://www.leapwork.com/blog/manual-vs-automated-testing) -- [@guide@LLM red teaming guide (open source)](https://www.promptfoo.dev/docs/red-team/) +- [@tool@Spikee](https://spikee.ai) diff --git a/src/data/roadmaps/ai-red-teaming/content/benchmark-datasets@et1Xrr8ez-fmB0mAq8W_a.md b/src/data/roadmaps/ai-red-teaming/content/benchmark-datasets@et1Xrr8ez-fmB0mAq8W_a.md index 6525bf1e2..b4ede08a6 100644 --- a/src/data/roadmaps/ai-red-teaming/content/benchmark-datasets@et1Xrr8ez-fmB0mAq8W_a.md +++ b/src/data/roadmaps/ai-red-teaming/content/benchmark-datasets@et1Xrr8ez-fmB0mAq8W_a.md @@ -1,9 +1,10 @@ # Benchmark Datasets -AI Red Teamers may use or contribute to benchmark datasets specifically designed to evaluate AI security. These datasets (like SecBench, NYU CTF Bench, CySecBench) contain prompts or scenarios targeting vulnerabilities, safety issues, or specific cybersecurity capabilities, allowing for standardized testing of models. +AI Red Teamers may use or contribute to benchmark datasets specifically designed to evaluate AI security. These datasets (like HackAPrompt, SecBench, NYU CTF Bench, CySecBench) contain prompts or scenarios targeting vulnerabilities, safety issues, or specific cybersecurity capabilities, allowing for standardized testing of models. 
Learn more from the following resources: +- [@dataset@HackAPrompt Dataset](https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset) - [@dataset@CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset](https://github.com/cysecbench/dataset) - [@dataset@NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security](https://proceedings.neurips.cc/paper_files/paper/2024/hash/69d97a6493fbf016fff0a751f253ad18-Abstract-Datasets_and_Benchmarks_Track.html) - [@dataset@SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity](https://arxiv.org/abs/2412.20787) diff --git a/src/data/roadmaps/ai-red-teaming/content/code-injection@vhBu5x8INTtqvx6vcYAhE.md b/src/data/roadmaps/ai-red-teaming/content/code-injection@vhBu5x8INTtqvx6vcYAhE.md index 3b5431bc4..16f13a0cc 100644 --- a/src/data/roadmaps/ai-red-teaming/content/code-injection@vhBu5x8INTtqvx6vcYAhE.md +++ b/src/data/roadmaps/ai-red-teaming/content/code-injection@vhBu5x8INTtqvx6vcYAhE.md @@ -5,5 +5,5 @@ AI Red Teamers test for code injection vulnerabilities specifically in the conte Learn more from the following resources: - [@article@Code Injection in LLM Applications](https://neuraltrust.ai/blog/code-injection-in-llms) -- [@docs@Secure Plugin Sandboxing (OpenAI Plugins)](https://platform.openai.com/docs/plugins/production/security-requirements) +- [@article@Code Injection](https://learnprompting.org/docs/prompt_hacking/offensive_measures/code_injection) - [@guide@Code Injection](https://owasp.org/www-community/attacks/Code_Injection) diff --git a/src/data/roadmaps/ai-red-teaming/content/countermeasures@G1u_Kq4NeUsGX2qnUTuJU.md b/src/data/roadmaps/ai-red-teaming/content/countermeasures@G1u_Kq4NeUsGX2qnUTuJU.md index 3cf4d616c..907fbbe79 100644 --- a/src/data/roadmaps/ai-red-teaming/content/countermeasures@G1u_Kq4NeUsGX2qnUTuJU.md +++ b/src/data/roadmaps/ai-red-teaming/content/countermeasures@G1u_Kq4NeUsGX2qnUTuJU.md @@ -4,6 +4,7 @@ AI Red Teamers must also understand and test defenses against prompt hacking. Th Learn more from the following resources: +- [@article@Prompt Hacking Defensive Measures](https://learnprompting.org/docs/prompt_hacking/defensive_measures/introduction) - [@article@Mitigating Prompt Injection Attacks (NCC Group Research)](https://research.nccgroup.com/2023/12/01/mitigating-prompt-injection-attacks/) - [@article@Prompt Injection & the Rise of Prompt Attacks](https://www.lakera.ai/blog/guide-to-prompt-injection) - [@article@Prompt Injection: Impact, How It Works & 4 Defense Measures](https://www.tigera.io/learn/guides/llm-security/prompt-injection/) diff --git a/src/data/roadmaps/ai-red-teaming/content/ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md b/src/data/roadmaps/ai-red-teaming/content/ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md index bca8cfb12..4785638b4 100644 --- a/src/data/roadmaps/ai-red-teaming/content/ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md +++ b/src/data/roadmaps/ai-red-teaming/content/ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md @@ -4,7 +4,6 @@ Capture The Flag competitions increasingly include AI/ML security challenges. 
Pa Learn more from the following resources: -- [@article@Capture the flag (cybersecurity)](https://en.wikipedia.org/wiki/Capture_the_flag_(cybersecurity) +- [@platform@HackAPrompt](https://www.hackaprompt.com/) - [@article@Progress from our Frontier Red Team](https://www.anthropic.com/news/strategic-warning-for-ai-risk-progress-and-insights-from-our-frontier-red-team) - [@platform@CTFtime.org](https://ctftime.org/) -- [@platform@picoCTF](https://picoctf.org/) diff --git a/src/data/roadmaps/ai-red-teaming/content/data-poisoning@nD0_64ELEeJSN-0aZiR7i.md b/src/data/roadmaps/ai-red-teaming/content/data-poisoning@nD0_64ELEeJSN-0aZiR7i.md index 057d5549f..8b0389524 100644 --- a/src/data/roadmaps/ai-red-teaming/content/data-poisoning@nD0_64ELEeJSN-0aZiR7i.md +++ b/src/data/roadmaps/ai-red-teaming/content/data-poisoning@nD0_64ELEeJSN-0aZiR7i.md @@ -5,6 +5,5 @@ AI Red Teamers simulate data poisoning attacks by evaluating how introducing man Learn more from the following resources: - [@article@AI Poisoning](https://www.aiblade.net/p/ai-poisoning-is-it-really-a-threat) -- [@article@Data Poisoning Attacks in ML (Towards Data Science)](https://towardsdatascience.com/data-poisoning-attacks-in-machine-learning-542169587b7f) - [@paper@Detecting and Preventing Data Poisoning Attacks on AI Models](https://arxiv.org/abs/2503.09302) - [@paper@Poisoning Web-Scale Training Data (arXiv)](https://arxiv.org/abs/2310.12818) diff --git a/src/data/roadmaps/ai-red-teaming/content/direct@5zHow4KZVpfhch5Aabeft.md b/src/data/roadmaps/ai-red-teaming/content/direct@5zHow4KZVpfhch5Aabeft.md index ea4d9bf05..9dd8801cf 100644 --- a/src/data/roadmaps/ai-red-teaming/content/direct@5zHow4KZVpfhch5Aabeft.md +++ b/src/data/roadmaps/ai-red-teaming/content/direct@5zHow4KZVpfhch5Aabeft.md @@ -4,6 +4,6 @@ Direct injection attacks occur when malicious instructions are inserted directly Learn more from the following resources: +- [@article@Prompt Injection](https://learnprompting.org/docs/prompt_hacking/injection) - [@article@Prompt Injection & the Rise of Prompt Attacks](https://www.lakera.ai/blog/guide-to-prompt-injection) - [@article@Prompt Injection Cheat Sheet (FlowGPT)](https://flowgpt.com/p/prompt-injection-cheat-sheet) -- [@report@OpenAI GPT-4 System Card](https://openai.com/research/gpt-4-system-card) diff --git a/src/data/roadmaps/ai-red-teaming/content/ethical-considerations@1gyuEV519LjN-KpROoVwv.md b/src/data/roadmaps/ai-red-teaming/content/ethical-considerations@1gyuEV519LjN-KpROoVwv.md index 6f1a1abc6..28c06c27e 100644 --- a/src/data/roadmaps/ai-red-teaming/content/ethical-considerations@1gyuEV519LjN-KpROoVwv.md +++ b/src/data/roadmaps/ai-red-teaming/content/ethical-considerations@1gyuEV519LjN-KpROoVwv.md @@ -7,4 +7,3 @@ Learn more from the following resources: - [@article@Red-Teaming in AI Testing: Stress Testing](https://www.labelvisor.com/red-teaming-abstract-competitive-testing-data-selection/) - [@article@Responsible AI assessment - Responsible AI | Coursera](https://www.coursera.org/learn/ai-security) - [@guide@Responsible AI Principles (Microsoft)](https://www.microsoft.com/en-us/ai/responsible-ai) -- [@video@Questions to Guide AI Red-Teaming (CMU SEI)](https://resources.sei.cmu.edu/library/asset-view.cfm?assetid=928382) diff --git a/src/data/roadmaps/ai-red-teaming/content/forums@Smncq-n1OlnLAY27AFQOO.md b/src/data/roadmaps/ai-red-teaming/content/forums@Smncq-n1OlnLAY27AFQOO.md index b14193458..6b117d95c 100644 --- 
a/src/data/roadmaps/ai-red-teaming/content/forums@Smncq-n1OlnLAY27AFQOO.md +++ b/src/data/roadmaps/ai-red-teaming/content/forums@Smncq-n1OlnLAY27AFQOO.md @@ -4,7 +4,7 @@ Engaging in online forums, mailing lists, Discord servers, or subreddits dedicat Learn more from the following resources: -- [@community@List of Cybersecurity Discord Servers](https://www.dfir.training/dfir-groups/discord?category[0]=17&category_children=1) -- [@community@Reddit - r/MachineLearning](https://www.reddit.com/r/MachineLearning/) +- [@community@LearnPrompting Prompt Hacking Discord](https://discord.com/channels/1046228027434086460/1349689482651369492) +- [@community@Reddit - r/ChatGPTJailbreak](https://www.reddit.com/r/ChatGPTJailbreak/) - [@community@Reddit - r/artificial](https://www.reddit.com/r/artificial/) - [@community@Reddit - r/cybersecurity](https://www.reddit.com/r/cybersecurity/) diff --git a/src/data/roadmaps/ai-red-teaming/content/generative-models@3XJ-g0KvHP75U18mxCqgw.md b/src/data/roadmaps/ai-red-teaming/content/generative-models@3XJ-g0KvHP75U18mxCqgw.md index 1a734dc24..57f7be1a6 100644 --- a/src/data/roadmaps/ai-red-teaming/content/generative-models@3XJ-g0KvHP75U18mxCqgw.md +++ b/src/data/roadmaps/ai-red-teaming/content/generative-models@3XJ-g0KvHP75U18mxCqgw.md @@ -4,6 +4,6 @@ AI Red Teamers focus heavily on generative models (like GANs and LLMs) due to th Learn more from the following resources: -- [@article@An Introduction to Generative Models](https://www.mongodb.com/resources/basics/artificial-intelligence/generative-models) -- [@course@Generative AI for Beginners](https://microsoft.github.io/generative-ai-for-beginners/) +- [@article@What is Generative AI?](https://learnprompting.org/docs/basics/generative_ai) +- [@course@Introduction to Generative AI](https://learnprompting.org/courses/intro-to-gen-ai) - [@guide@Generative AI beginner's guide](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/overview) diff --git a/src/data/roadmaps/ai-red-teaming/content/indirect@3_gJRtJSdm2iAfkwmcv0e.md b/src/data/roadmaps/ai-red-teaming/content/indirect@3_gJRtJSdm2iAfkwmcv0e.md index 6b467d481..7d6d375ab 100644 --- a/src/data/roadmaps/ai-red-teaming/content/indirect@3_gJRtJSdm2iAfkwmcv0e.md +++ b/src/data/roadmaps/ai-red-teaming/content/indirect@3_gJRtJSdm2iAfkwmcv0e.md @@ -6,4 +6,4 @@ Learn more from the following resources: - [@paper@The Practical Application of Indirect Prompt Injection Attacks](https://www.researchgate.net/publication/382692833_The_Practical_Application_of_Indirect_Prompt_Injection_Attacks_From_Academia_to_Industry) - [@article@How to Prevent Indirect Prompt Injection Attacks](https://www.cobalt.io/blog/how-to-prevent-indirect-prompt-injection-attacks) -- [@article@Jailbreaks via Indirect Injection (Practical AI Safety Newsletter)](https://newsletter.practicalai.safety/p/jailbreaks-via-indirect-injection) +- [@article@Indirect Prompt Injection Data Exfiltration](https://embracethered.com/blog/posts/2024/chatgpt-macos-app-persistent-data-exfiltration/) diff --git a/src/data/roadmaps/ai-red-teaming/content/industry-credentials@HHjsFR6wRDqUd66PMDE_7.md b/src/data/roadmaps/ai-red-teaming/content/industry-credentials@HHjsFR6wRDqUd66PMDE_7.md index 2911e6d42..171e262c0 100644 --- a/src/data/roadmaps/ai-red-teaming/content/industry-credentials@HHjsFR6wRDqUd66PMDE_7.md +++ b/src/data/roadmaps/ai-red-teaming/content/industry-credentials@HHjsFR6wRDqUd66PMDE_7.md @@ -4,5 +4,5 @@ Beyond formal certifications, recognition in the AI Red Teaming field comes from Learn more from the 
following resources: -- [@community@DEF CON - Wikipedia (Mentions Black Badge)](https://en.wikipedia.org/wiki/DEF_CON#Black_Badge) -- [@community@HackAPrompt (Learn Prompting)](https://learnprompting.org/hackaprompt) +- [@platform@HackAPrompt](https://hackaprompt.com) +- [@platform@RedTeam Arena](https://redarena.ai) diff --git a/src/data/roadmaps/ai-red-teaming/content/introduction@HFJIYcI16OMyM77fAw9af.md b/src/data/roadmaps/ai-red-teaming/content/introduction@HFJIYcI16OMyM77fAw9af.md index 279bb5648..6e6b0d49a 100644 --- a/src/data/roadmaps/ai-red-teaming/content/introduction@HFJIYcI16OMyM77fAw9af.md +++ b/src/data/roadmaps/ai-red-teaming/content/introduction@HFJIYcI16OMyM77fAw9af.md @@ -7,4 +7,3 @@ Learn more from the following resources: - [@article@A Guide to AI Red Teaming](https://hiddenlayer.com/innovation-hub/a-guide-to-ai-red-teaming/) - [@article@What is AI Red Teaming? (Learn Prompting)](https://learnprompting.org/blog/what-is-ai-red-teaming) - [@article@What is AI Red Teaming? The Complete Guide](https://mindgard.ai/blog/what-is-ai-red-teaming) -- [@podcast@Red Team Podcast - AI Red Teaming Insights & Defense Strategies](https://mindgard.ai/podcast/red-team) diff --git a/src/data/roadmaps/ai-red-teaming/content/jailbreak-techniques@Ds8pqn4y9Npo7z6ubunvc.md b/src/data/roadmaps/ai-red-teaming/content/jailbreak-techniques@Ds8pqn4y9Npo7z6ubunvc.md index 3aad0a565..b1e5ed972 100644 --- a/src/data/roadmaps/ai-red-teaming/content/jailbreak-techniques@Ds8pqn4y9Npo7z6ubunvc.md +++ b/src/data/roadmaps/ai-red-teaming/content/jailbreak-techniques@Ds8pqn4y9Npo7z6ubunvc.md @@ -5,5 +5,5 @@ Jailbreaking is a specific category of prompt hacking where the AI Red Teamer ai Learn more from the following resources: - [@article@InjectPrompt (David Willis-Owen)](https://injectprompt.com) -- [@guide@Prompt Hacking Guide - Learn Prompting](https://learnprompting.org/docs/category/prompt-hacking) +- [@guide@Jailbreaking Guide - Learn Prompting](https://learnprompting.org/docs/prompt_hacking/jailbreaking) - [@paper@Jailbroken: How Does LLM Safety Training Fail? 
(arXiv)](https://arxiv.org/abs/2307.02483) diff --git a/src/data/roadmaps/ai-red-teaming/content/lab-environments@MmwwRK4I9aRH_ha7duPqf.md b/src/data/roadmaps/ai-red-teaming/content/lab-environments@MmwwRK4I9aRH_ha7duPqf.md index 1d96fbd04..d4db85a36 100644 --- a/src/data/roadmaps/ai-red-teaming/content/lab-environments@MmwwRK4I9aRH_ha7duPqf.md +++ b/src/data/roadmaps/ai-red-teaming/content/lab-environments@MmwwRK4I9aRH_ha7duPqf.md @@ -4,7 +4,8 @@ AI Red Teamers need environments to practice attacking vulnerable systems safely Learn more from the following resources: +- [@platform@HackAPrompt Playground](https://learnprompting.org/hackaprompt-playground) +- [@platform@InjectPrompt Playground](https://playground.injectprompt.com/) - [@platform@Gandalf AI Prompt Injection Lab](https://gandalf.lakera.ai/) - [@platform@Hack The Box: Hacking Labs](https://www.hackthebox.com/hacker/hacking-labs) - [@platform@TryHackMe: Learn Cyber Security](https://tryhackme.com/) -- [@platform@VulnHub](https://www.vulnhub.com/) diff --git a/src/data/roadmaps/ai-red-teaming/content/large-language-models@8K-wCn2cLc7Vs_V4sC3sE.md b/src/data/roadmaps/ai-red-teaming/content/large-language-models@8K-wCn2cLc7Vs_V4sC3sE.md index ed210695e..2498e557c 100644 --- a/src/data/roadmaps/ai-red-teaming/content/large-language-models@8K-wCn2cLc7Vs_V4sC3sE.md +++ b/src/data/roadmaps/ai-red-teaming/content/large-language-models@8K-wCn2cLc7Vs_V4sC3sE.md @@ -5,5 +5,5 @@ LLMs are a primary target for AI Red Teaming. Understanding their architecture ( Learn more from the following resources: - [@article@What is an LLM (large language model)?](https://www.cloudflare.com/learning/ai/what-is-large-language-model/) -- [@guide@Introduction to LLMs - Learn Prompting](https://learnprompting.org/docs/intro_to_llms) +- [@course@ChatGPT For Everyone](https://learnprompting.org/courses/chatgpt-for-everyone) - [@guide@What Are Large Language Models? 
A Beginner's Guide for 2025](https://www.kdnuggets.com/large-language-models-beginners-guide-2025) diff --git a/src/data/roadmaps/ai-red-teaming/content/model-inversion@iE5PcswBHnu_EBFIacib0.md b/src/data/roadmaps/ai-red-teaming/content/model-inversion@iE5PcswBHnu_EBFIacib0.md index b3d3fb613..b7102eade 100644 --- a/src/data/roadmaps/ai-red-teaming/content/model-inversion@iE5PcswBHnu_EBFIacib0.md +++ b/src/data/roadmaps/ai-red-teaming/content/model-inversion@iE5PcswBHnu_EBFIacib0.md @@ -4,7 +4,6 @@ AI Red Teamers perform model inversion tests to assess if an attacker can recons Learn more from the following resources: -- [@article@Model Inversion Attacks for ML (Medium)](https://medium.com/@ODSC/model-inversion-attacks-for-machine-learning-ff407a1b10d1) - [@article@Model inversion and membership inference: Understanding new AI security risks](https://www.hoganlovells.com/en/publications/model-inversion-and-membership-inference-understanding-new-ai-security-risks-and-mitigating-vulnerabilities) - [@paper@Extracting Training Data from LLMs (arXiv)](https://arxiv.org/abs/2012.07805) - [@paper@Model Inversion Attacks: A Survey of Approaches and Countermeasures](https://arxiv.org/html/2411.10023v1) diff --git a/src/data/roadmaps/ai-red-teaming/content/model-vulnerabilities@uBXrri2bXVsNiM8fIHHOv.md b/src/data/roadmaps/ai-red-teaming/content/model-vulnerabilities@uBXrri2bXVsNiM8fIHHOv.md index 4e9be1fff..9836d240d 100644 --- a/src/data/roadmaps/ai-red-teaming/content/model-vulnerabilities@uBXrri2bXVsNiM8fIHHOv.md +++ b/src/data/roadmaps/ai-red-teaming/content/model-vulnerabilities@uBXrri2bXVsNiM8fIHHOv.md @@ -5,5 +5,5 @@ This category covers attacks and tests targeting the AI model itself, beyond the Learn more from the following resources: - [@article@AI Security Risks Uncovered: What You Must Know in 2025](https://ttms.com/uk/ai-security-risks-explained-what-you-need-to-know-in-2025/) -- [@article@Attacking AI Models (Trail of Bits Blog Series)](https://blog.trailofbits.com/category/ai-security/) +- [@article@Weaknesses in Modern AI](https://insights.sei.cmu.edu/blog/weaknesses-and-vulnerabilities-in-modern-ai-why-security-and-safety-are-so-challenging/) - [@report@AI and ML Vulnerabilities (CNAS Report)](https://www.cnas.org/publications/reports/understanding-and-mitigating-ai-vulnerabilities) diff --git a/src/data/roadmaps/ai-red-teaming/content/model-weight-stealing@QFzLx5nc4rCCD8WVc20mo.md b/src/data/roadmaps/ai-red-teaming/content/model-weight-stealing@QFzLx5nc4rCCD8WVc20mo.md index 0edbaedaf..1317cd00d 100644 --- a/src/data/roadmaps/ai-red-teaming/content/model-weight-stealing@QFzLx5nc4rCCD8WVc20mo.md +++ b/src/data/roadmaps/ai-red-teaming/content/model-weight-stealing@QFzLx5nc4rCCD8WVc20mo.md @@ -6,5 +6,4 @@ Learn more from the following resources: - [@article@A Playbook for Securing AI Model Weights](https://www.rand.org/pubs/research_briefs/RBA2849-1.html) - [@article@How to Steal a Machine Learning Model (SkyCryptor)](https://skycryptor.com/blog/how-to-steal-a-machine-learning-model) -- [@paper@Defense Against Model Stealing (Microsoft Research)](https://www.microsoft.com/en-us/research/publication/defense-against-model-stealing-attacks/) - [@paper@On the Limitations of Model Stealing with Uncertainty Quantification Models](https://openreview.net/pdf?id=ONRFHoUzNk) diff --git a/src/data/roadmaps/ai-red-teaming/content/prompt-engineering@gx4KaFqKgJX9n9_ZGMqlZ.md b/src/data/roadmaps/ai-red-teaming/content/prompt-engineering@gx4KaFqKgJX9n9_ZGMqlZ.md index a733cc512..3a19fa087 100644 --- 
a/src/data/roadmaps/ai-red-teaming/content/prompt-engineering@gx4KaFqKgJX9n9_ZGMqlZ.md +++ b/src/data/roadmaps/ai-red-teaming/content/prompt-engineering@gx4KaFqKgJX9n9_ZGMqlZ.md @@ -4,8 +4,6 @@ For AI Red Teamers, prompt engineering is both a tool and a target. It's a tool Learn more from the following resources: -- [@article@Introduction to Prompt Engineering](https://www.datacamp.com/tutorial/introduction-prompt-engineering) - [@article@System Prompts - InjectPrompt](https://www.injectprompt.com/t/system-prompts) - [@course@Introduction to Prompt Engineering](https://learnprompting.org/courses/intro-to-prompt-engineering) -- [@guide@Prompt Engineering Guide](https://learnprompting.org/docs/prompt-engineering) - [@guide@The Ultimate Guide to Red Teaming LLMs and Adversarial Prompts (Kili Technology)](https://kili-technology.com/large-language-models-llms/red-teaming-llms-and-adversarial-prompts) diff --git a/src/data/roadmaps/ai-red-teaming/content/prompt-hacking@1Xr7mxVekeAHzTL7G4eAZ.md b/src/data/roadmaps/ai-red-teaming/content/prompt-hacking@1Xr7mxVekeAHzTL7G4eAZ.md index 0ea470864..457fb580b 100644 --- a/src/data/roadmaps/ai-red-teaming/content/prompt-hacking@1Xr7mxVekeAHzTL7G4eAZ.md +++ b/src/data/roadmaps/ai-red-teaming/content/prompt-hacking@1Xr7mxVekeAHzTL7G4eAZ.md @@ -5,5 +5,5 @@ Prompt hacking is a core technique for AI Red Teamers targeting LLMs. It involve Learn more from the following resources: - [@course@Introduction to Prompt Hacking](https://learnprompting.org/courses/intro-to-prompt-hacking) -- [@guide@Prompt Hacking Guide](https://learnprompting.org/docs/category/prompt-hacking) +- [@guide@Prompt Hacking Guide](https://learnprompting.org/docs/prompt_hacking/introduction) - [@paper@SoK: Prompt Hacking of LLMs (arXiv 2023)](https://arxiv.org/abs/2311.05544) diff --git a/src/data/roadmaps/ai-red-teaming/content/remote-code-execution@kgDsDlBk8W2aM6LyWpFY8.md b/src/data/roadmaps/ai-red-teaming/content/remote-code-execution@kgDsDlBk8W2aM6LyWpFY8.md index 5d64329d8..74ab32839 100644 --- a/src/data/roadmaps/ai-red-teaming/content/remote-code-execution@kgDsDlBk8W2aM6LyWpFY8.md +++ b/src/data/roadmaps/ai-red-teaming/content/remote-code-execution@kgDsDlBk8W2aM6LyWpFY8.md @@ -6,4 +6,4 @@ Learn more from the following resources: - [@article@Exploiting LLMs with Code Execution (GitHub Gist)](https://gist.github.com/coolaj86/6f4f7b30129b0251f61fa7baaa881516) - [@article@What is remote code execution?](https://www.cloudflare.com/learning/security/what-is-remote-code-execution/) -- [@video@DEFCON 31 - AI Village - Hacking an LLM embedded system (agent) - Johann Rehberger](https://www.google.com/search?q=https://www.youtube.com/watch%3Fv%3D6u04C1N69ks?v=1FfYnF2GXVU) + diff --git a/src/data/roadmaps/ai-red-teaming/content/responsible-disclosure@KAcCZ3zcv25R6HwzAsfUG.md b/src/data/roadmaps/ai-red-teaming/content/responsible-disclosure@KAcCZ3zcv25R6HwzAsfUG.md index 3863f9c4d..f0600d498 100644 --- a/src/data/roadmaps/ai-red-teaming/content/responsible-disclosure@KAcCZ3zcv25R6HwzAsfUG.md +++ b/src/data/roadmaps/ai-red-teaming/content/responsible-disclosure@KAcCZ3zcv25R6HwzAsfUG.md @@ -4,6 +4,6 @@ A critical practice for AI Red Teamers is responsible disclosure: privately repo Learn more from the following resources: -- [@guide@Responsible Disclosure of AI Vulnerabilities](https://www.preamble.com/blog/responsible-disclosure-of-ai-vulnerabilities) -- [@guide@Vulnerability Disclosure Program](https://www.cisa.gov/resources-tools/programs/vulnerability-disclosure-program-vdp) +- [@guide@0din.ai 
Policy](https://0din.ai/policy) +- [@guide@Huntr Guidelines](https://huntr.com/guidelines) - [@policy@Google Vulnerability Reward Program (VRP)](https://bughunters.google.com/) diff --git a/src/data/roadmaps/ai-red-teaming/content/role-of-red-teams@Irkc9DgBfqSn72WaJqXEt.md b/src/data/roadmaps/ai-red-teaming/content/role-of-red-teams@Irkc9DgBfqSn72WaJqXEt.md index 4626b20ea..0452708a9 100644 --- a/src/data/roadmaps/ai-red-teaming/content/role-of-red-teams@Irkc9DgBfqSn72WaJqXEt.md +++ b/src/data/roadmaps/ai-red-teaming/content/role-of-red-teams@Irkc9DgBfqSn72WaJqXEt.md @@ -6,4 +6,4 @@ Learn more from the following resources: - [@article@The Complete Guide to Red Teaming: Process, Benefits & More](https://mindgard.ai/blog/red-teaming) - [@article@The Complete Red Teaming Checklist [PDF]: 5 Key Steps - Mindgard AI](https://mindgard.ai/blog/red-teaming-checklist) -- [@article@What is AI Red Teaming? - Learn Prompting](https://learnprompting.org/docs/category/ai-red-teaming) +- [@article@Red Teaming in Defending AI Systems](https://protectai.com/blog/expanding-role-red-teaming-defending-ai-systems) diff --git a/src/data/roadmaps/ai-red-teaming/content/specialized-courses@s1xKK8HL5-QGZpcutiuvj.md b/src/data/roadmaps/ai-red-teaming/content/specialized-courses@s1xKK8HL5-QGZpcutiuvj.md index 0da1b7cdf..710bfd5bd 100644 --- a/src/data/roadmaps/ai-red-teaming/content/specialized-courses@s1xKK8HL5-QGZpcutiuvj.md +++ b/src/data/roadmaps/ai-red-teaming/content/specialized-courses@s1xKK8HL5-QGZpcutiuvj.md @@ -6,5 +6,4 @@ Learn more from the following resources: - [@course@AI Red Teaming Courses - Learn Prompting](https://learnprompting.org/blog/ai-red-teaming-courses) - [@course@AI Security | Coursera](https://www.coursera.org/learn/ai-security) -- [@course@Exploring Adversarial Machine Learning](https://www.nvidia.com/en-us/training/instructor-led-workshops/exploring-adversarial-machine-learning/) - [@course@Free Online Cyber Security Courses with Certificates in 2025](https://www.eccouncil.org/cybersecurity-exchange/cyber-novice/free-cybersecurity-courses-beginners/) diff --git a/src/data/roadmaps/ai-red-teaming/content/threat-modeling@RDOaTBWP3aIJPUp_kcafm.md b/src/data/roadmaps/ai-red-teaming/content/threat-modeling@RDOaTBWP3aIJPUp_kcafm.md index 3e77f9ac5..e34af4c83 100644 --- a/src/data/roadmaps/ai-red-teaming/content/threat-modeling@RDOaTBWP3aIJPUp_kcafm.md +++ b/src/data/roadmaps/ai-red-teaming/content/threat-modeling@RDOaTBWP3aIJPUp_kcafm.md @@ -7,4 +7,3 @@ Learn more from the following resources: - [@article@Core Components of AI Red Team Exercises (Learn Prompting)](https://learnprompting.org/blog/what-is-ai-red-teaming) - [@guide@Threat Modeling Process](https://owasp.org/www-community/Threat_Modeling_Process) - [@guide@Threat Modeling](https://owasp.org/www-community/Threat_Modeling) -- [@video@How Microsoft Approaches AI Red Teaming (MS Build)](https://learn.microsoft.com/en-us/events/build-may-2023/breakout-responsible-ai-red-teaming/) diff --git a/src/data/roadmaps/ai-red-teaming/content/unauthorized-access@DQeOavZCoXpF3k_qRDABs.md b/src/data/roadmaps/ai-red-teaming/content/unauthorized-access@DQeOavZCoXpF3k_qRDABs.md index b3bfb1d56..95ed651b1 100644 --- a/src/data/roadmaps/ai-red-teaming/content/unauthorized-access@DQeOavZCoXpF3k_qRDABs.md +++ b/src/data/roadmaps/ai-red-teaming/content/unauthorized-access@DQeOavZCoXpF3k_qRDABs.md @@ -4,6 +4,6 @@ AI Red Teamers test if vulnerabilities in the AI system or its interfaces allow Learn more from the following resources: -- 
[@article@Unauthorized Data Access via LLMs (Security Boulevard)](https://securityboulevard.com/2023/11/unauthorized-data-access-via-llms/) +- [@article@Defending Model Files from Unauthorized Access](https://developer.nvidia.com/blog/defending-ai-model-files-from-unauthorized-access-with-canaries/) - [@guide@OWASP API Security Project](https://owasp.org/www-project-api-security/) -- [@paper@AI System Abuse Cases (Harvard Belfer Center)](https://www.belfercenter.org/publication/ai-system-abuse-cases) +- [@article@Detecting Unauthorized Usage](https://www.unr.edu/digital-learning/instructional-strategies/understanding-and-integrating-generative-ai-in-teaching/how-can-i-detect-unauthorized-ai-usage) diff --git a/src/data/roadmaps/ai-red-teaming/content/white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md b/src/data/roadmaps/ai-red-teaming/content/white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md index 0b6a689ff..0f3190703 100644 --- a/src/data/roadmaps/ai-red-teaming/content/white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md +++ b/src/data/roadmaps/ai-red-teaming/content/white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md @@ -5,5 +5,5 @@ White-box testing in AI Red Teaming grants the tester full access to the model's Learn more from the following resources: - [@article@Black-Box, Gray Box, and White-Box Penetration Testing](https://www.eccouncil.org/cybersecurity-exchange/penetration-testing/black-box-gray-box-and-white-box-penetration-testing-importance-and-uses/) -- [@article@White-Box Adversarial Examples (OpenAI Blog)](https://openai.com/research/adversarial-robustness-toolbox) -- [@guide@LLM red teaming guide (open source)](https://www.promptfoo.dev/docs/red-team/) +- [@article@What is White Box Penetration Testing](https://www.getastra.com/blog/security-audit/white-box-penetration-testing/) +- [@article@The Art of White Box Pentesting](https://infosecwriteups.com/cracking-the-code-the-art-of-white-box-pentesting-de296bc22c67) diff --git a/src/data/roadmaps/ai-red-teaming/content/why-red-team-ai-systems@fNTb9y3zs1HPYclAmu_Wv.md b/src/data/roadmaps/ai-red-teaming/content/why-red-team-ai-systems@fNTb9y3zs1HPYclAmu_Wv.md index b72392378..3f124aedf 100644 --- a/src/data/roadmaps/ai-red-teaming/content/why-red-team-ai-systems@fNTb9y3zs1HPYclAmu_Wv.md +++ b/src/data/roadmaps/ai-red-teaming/content/why-red-team-ai-systems@fNTb9y3zs1HPYclAmu_Wv.md @@ -1,3 +1,8 @@ # Why Red Team AI Systems? -AI systems introduce novel risks beyond traditional software, such as emergent unintended capabilities, complex failure modes, susceptibility to subtle data manipulations, and potential for large-scale misuse (e.g., generating disinformation). AI Red Teaming is necessary because standard testing methods often fail to uncover these unique AI vulnerabilities. It provides critical, adversary-focused insights needed to build genuinely safe, reliable, and secure AI before deployment. \ No newline at end of file +AI systems introduce novel risks beyond traditional software, such as emergent unintended capabilities, complex failure modes, susceptibility to subtle data manipulations, and potential for large-scale misuse (e.g., generating disinformation). AI Red Teaming is necessary because standard testing methods often fail to uncover these unique AI vulnerabilities. It provides critical, adversary-focused insights needed to build genuinely safe, reliable, and secure AI before deployment. 
+ +Learn more from the following resources: + +- [@course@Introduction to Prompt Hacking](https://learnprompting.org/courses/intro-to-prompt-hacking) +- [@article@Prompt Hacking Offensive Measures](https://learnprompting.org/docs/prompt_hacking/offensive_measures/introduction)