From c83309b7db38f766bdc36d5e35996c5fc3b4beec Mon Sep 17 00:00:00 2001 From: David Willis-Owen <100765093+davidwillisowen@users.noreply.github.com> Date: Fri, 25 Apr 2025 14:57:24 +0100 Subject: [PATCH] AI Red Teaming Roadmap - Initial Commit (#8553) * Initial commit * Editing formatting --- .../advanced-techniques@soC-kcem1ISbnCQMa6BIB.md | 10 +++++++++- .../adversarial-examples@xjlttOti-_laPRn8a2fVy.md | 11 ++++++++++- .../adversarial-training@2Y0ZO-etpv3XIvunDLu-W.md | 10 +++++++++- .../agentic-ai-security@FVsKivsJrIb82B0lpPmgw.md | 10 +++++++++- ...ai-security-fundamentals@R9DQNc0AyAQ2HLpP4HOk6.md | 10 +++++++++- .../content/api-protection@Tszl26iNBnQBdBEWOueDA.md | 11 ++++++++++- .../content/authentication@J7gjlt2MBx7lOkOnfGvPF.md | 10 +++++++++- .../content/authentication@JQ3bR8odXJfd-1RCEf3-Q.md | 10 +++++++++- .../automated-vs-manual@LVdYN9hyCyNPYn2Lz1y9b.md | 10 +++++++++- .../benchmark-datasets@et1Xrr8ez-fmB0mAq8W_a.md | 10 +++++++++- .../black-box-testing@0bApnJTt-Z2IUf0X3OCYf.md | 10 +++++++++- .../content/code-injection@vhBu5x8INTtqvx6vcYAhE.md | 10 +++++++++- .../content/conferences@LuKnmd9nSz9yLbTU_5Yp2.md | 11 ++++++++++- ...y-integrity-availability@WZkIHZkV2qDYbYF9KBBRi.md | 10 +++++++++- .../continuous-monitoring@7Km0mFpHguHYPs5UhHTsM.md | 10 +++++++++- .../continuous-testing@65Lo60JQS5YlvvQ6KevXt.md | 10 +++++++++- .../content/countermeasures@G1u_Kq4NeUsGX2qnUTuJU.md | 11 ++++++++++- .../content/ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md | 11 ++++++++++- .../custom-testing-scripts@C1zO2xC0AqyV53p2YEPWg.md | 10 +++++++++- .../content/data-poisoning@nD0_64ELEeJSN-0aZiR7i.md | 11 ++++++++++- .../content/direct@5zHow4KZVpfhch5Aabeft.md | 10 +++++++++- .../emerging-threats@-G8v_CNa8wO_g-46_RFQo.md | 10 +++++++++- .../ethical-considerations@1gyuEV519LjN-KpROoVwv.md | 11 ++++++++++- .../content/forums@Smncq-n1OlnLAY27AFQOO.md | 11 ++++++++++- .../generative-models@3XJ-g0KvHP75U18mxCqgw.md | 10 +++++++++- .../grey-box-testing@ZVNAMCP68XKRXVxF2-hBc.md | 10 +++++++++- .../content/indirect@3_gJRtJSdm2iAfkwmcv0e.md | 10 +++++++++- .../industry-credentials@HHjsFR6wRDqUd66PMDE_7.md | 9 ++++++++- .../industry-standards@WePO66_4-gNcSdE00WKmw.md | 11 ++++++++++- .../infrastructure-security@nhUKKWyBH80nyKfGT8ErC.md | 10 +++++++++- ...insecure-deserialization@aKzai0A8J55-OBXTnQih1.md | 11 ++++++++++- .../content/introduction@HFJIYcI16OMyM77fAw9af.md | 11 ++++++++++- .../jailbreak-techniques@Ds8pqn4y9Npo7z6ubunvc.md | 10 +++++++++- .../lab-environments@MmwwRK4I9aRH_ha7duPqf.md | 11 ++++++++++- .../large-language-models@8K-wCn2cLc7Vs_V4sC3sE.md | 10 +++++++++- .../llm-security-testing@xJYTRbPxMn0Xs5ea0Ygn6.md | 10 +++++++++- .../content/model-inversion@iE5PcswBHnu_EBFIacib0.md | 11 ++++++++++- .../model-vulnerabilities@uBXrri2bXVsNiM8fIHHOv.md | 10 +++++++++- .../model-weight-stealing@QFzLx5nc4rCCD8WVc20mo.md | 11 ++++++++++- .../monitoring-solutions@59lkLcoqV4gq7f8Zm0X2p.md | 12 +++++++++++- .../content/neural-networks@RuKzVhd1nZphCrlW1wZGL.md | 10 +++++++++- .../prompt-engineering@gx4KaFqKgJX9n9_ZGMqlZ.md | 12 +++++++++++- .../content/prompt-hacking@1Xr7mxVekeAHzTL7G4eAZ.md | 10 +++++++++- .../prompt-injection@XOrAPDRhBvde9R-znEipH.md | 12 +++++++++++- .../red-team-simulations@DpYsL0du37n40toH33fIr.md | 10 +++++++++- .../reinforcement-learning@Xqzc4mOKsVzwaUxLGjHya.md | 11 ++++++++++- .../remote-code-execution@kgDsDlBk8W2aM6LyWpFY8.md | 10 +++++++++- .../content/reporting-tools@BLnfNlA0C4yzy1dvifjwx.md | 10 +++++++++- .../content/research-groups@ZlR03pM-sqVFZNhD1gMSJ.md | 11 ++++++++++- .../research-opportunities@VmaIHVsCpq2um_0cA33V3.md | 10 +++++++++- .../responsible-disclosure@KAcCZ3zcv25R6HwzAsfUG.md | 10 +++++++++- .../content/risk-management@MupRvk_8Io2Hn7yEvU663.md | 10 +++++++++- .../robust-model-design@6gEHMhh6BGJI-ZYN27YPW.md | 10 +++++++++- .../role-of-red-teams@Irkc9DgBfqSn72WaJqXEt.md | 10 +++++++++- .../safety-filter-bypasses@j7uLLpt8MkZ1rqM7UBPW4.md | 10 +++++++++- .../specialized-courses@s1xKK8HL5-QGZpcutiuvj.md | 11 ++++++++++- .../supervised-learning@NvOJIv36Utpm7_kOZyr79.md | 10 +++++++++- .../testing-platforms@c8n8FcYKDOgPLQvV9xF5J.md | 12 +++++++++++- .../content/threat-modeling@RDOaTBWP3aIJPUp_kcafm.md | 11 ++++++++++- .../unauthorized-access@DQeOavZCoXpF3k_qRDABs.md | 10 +++++++++- .../unsupervised-learning@ZC0yKsu-CJC-LZKKo2pLD.md | 9 ++++++++- ...vulnerability-assessment@887lc3tWCRH-sOHSxWgWJ.md | 10 +++++++++- .../white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md | 10 +++++++++- .../why-red-team-ai-systems@fNTb9y3zs1HPYclAmu_Wv.md | 11 ++++++++++- 64 files changed, 601 insertions(+), 64 deletions(-) diff --git a/src/data/roadmaps/ai-red-teaming/content/advanced-techniques@soC-kcem1ISbnCQMa6BIB.md b/src/data/roadmaps/ai-red-teaming/content/advanced-techniques@soC-kcem1ISbnCQMa6BIB.md index d9fa0be71..252a8dd47 100644 --- a/src/data/roadmaps/ai-red-teaming/content/advanced-techniques@soC-kcem1ISbnCQMa6BIB.md +++ b/src/data/roadmaps/ai-red-teaming/content/advanced-techniques@soC-kcem1ISbnCQMa6BIB.md @@ -1 +1,9 @@ -# Advanced Techniques \ No newline at end of file +# Advanced Techniques + +The practice of AI Red Teaming itself will evolve. Future techniques may involve using AI adversaries to automatically discover complex vulnerabilities, developing more sophisticated methods for testing AI alignment and safety properties, simulating multi-agent system failures, and creating novel metrics for evaluating AI robustness against unknown future attacks. + +Learn more from the following resources: + +- [@article@AI red-teaming in critical infrastructure: Boosting security and trust in AI systems - DNV](https://www.dnv.com/article/ai-red-teaming-for-critical-infrastructure-industries/) - Discusses applying red teaming to complex systems. +- [@article@Advanced Techniques in AI Red Teaming for LLMs | NeuralTrust](https://neuraltrust.ai/blog/advanced-techniques-in-ai-red-teaming) - Discusses techniques like adversarial ML and automated threat intelligence for red teaming. +- [@paper@Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning - arXiv](https://arxiv.org/html/2412.18693v1) - Research on using RL for more advanced automated red teaming. diff --git a/src/data/roadmaps/ai-red-teaming/content/adversarial-examples@xjlttOti-_laPRn8a2fVy.md b/src/data/roadmaps/ai-red-teaming/content/adversarial-examples@xjlttOti-_laPRn8a2fVy.md index c05804125..3bc2d9f03 100644 --- a/src/data/roadmaps/ai-red-teaming/content/adversarial-examples@xjlttOti-_laPRn8a2fVy.md +++ b/src/data/roadmaps/ai-red-teaming/content/adversarial-examples@xjlttOti-_laPRn8a2fVy.md @@ -1 +1,10 @@ -# Adversarial Examples \ No newline at end of file +# Adversarial Examples + +A core AI Red Teaming activity involves generating adversarial examples – inputs slightly perturbed to cause misclassification or bypass safety filters – to test model robustness. Red teamers use various techniques (gradient-based, optimization-based, or black-box methods) to find inputs that exploit model weaknesses, informing developers on how to harden the model. + +Learn more from the following resources: + +- [@article@Adversarial Examples Explained (OpenAI Blog)](https://openai.com/research/adversarial-examples) - Introduction by OpenAI. +- [@guide@Adversarial Examples – Interpretable Machine Learning Book](https://christophm.github.io/interpretable-ml-book/adversarial.html) - In-depth explanation and examples. +- [@guide@Adversarial Testing for Generative AI | Machine Learning - Google for Developers](https://developers.google.com/machine-learning/guides/adv-testing) - Google's guide on adversarial testing workflows. +- [@video@How AI Can Be Tricked With Adversarial Attacks - Two Minute Papers](https://www.youtube.com/watch?v=J3X_JWQkvo8?v=MPcfoQBDY0w) - Short video demonstrating adversarial examples. diff --git a/src/data/roadmaps/ai-red-teaming/content/adversarial-training@2Y0ZO-etpv3XIvunDLu-W.md b/src/data/roadmaps/ai-red-teaming/content/adversarial-training@2Y0ZO-etpv3XIvunDLu-W.md index d90b3d4fa..0da097c3e 100644 --- a/src/data/roadmaps/ai-red-teaming/content/adversarial-training@2Y0ZO-etpv3XIvunDLu-W.md +++ b/src/data/roadmaps/ai-red-teaming/content/adversarial-training@2Y0ZO-etpv3XIvunDLu-W.md @@ -1 +1,9 @@ -# Adversarial Training \ No newline at end of file +# Adversarial Training + +AI Red Teamers evaluate the effectiveness of adversarial training as a defense. They test if models trained on adversarial examples are truly robust or if new, unseen adversarial attacks can still bypass the hardened defenses. This helps refine the adversarial training process itself. + +Learn more from the following resources: + +- [@article@Model Robustness: Building Reliable AI Models - Encord](https://encord.com/blog/model-robustness-machine-learning-strategies/) (Discusses adversarial robustness) +- [@guide@Adversarial Testing for Generative AI | Google for Developers](https://developers.google.com/machine-learning/guides/adv-testing) - Covers the concept as part of testing. +- [@paper@Detecting and Preventing Data Poisoning Attacks on AI Models - arXiv](https://arxiv.org/abs/2503.09302) (Mentions adversarial training as defense) diff --git a/src/data/roadmaps/ai-red-teaming/content/agentic-ai-security@FVsKivsJrIb82B0lpPmgw.md b/src/data/roadmaps/ai-red-teaming/content/agentic-ai-security@FVsKivsJrIb82B0lpPmgw.md index 9b7c40449..218e6e19b 100644 --- a/src/data/roadmaps/ai-red-teaming/content/agentic-ai-security@FVsKivsJrIb82B0lpPmgw.md +++ b/src/data/roadmaps/ai-red-teaming/content/agentic-ai-security@FVsKivsJrIb82B0lpPmgw.md @@ -1 +1,9 @@ -# Agentic AI Security \ No newline at end of file +# Agentic AI Security + +As AI agents capable of autonomous action become more common, AI Red Teamers must test their unique security implications. This involves assessing risks related to goal hijacking, unintended actions through tool use, exploitation of planning mechanisms, and ensuring agents operate safely within their designated boundaries. + +Learn more from the following resources: + +- [@article@AI Agents - Learn Prompting](https://learnprompting.org/docs/intermediate/ai_agents) (Background on agents) +- [@article@Reasoning models don't always say what they think - Anthropic](https://www.anthropic.com/research/reasoning-models-dont-always-say-what-they-think) (Discusses agent alignment challenges) +- [@course@Certified AI Red Team Operator – Autonomous Systems (CAIRTO-AS) from Tonex, Inc.](https://niccs.cisa.gov/education-training/catalog/tonex-inc/certified-ai-red-team-operator-autonomous-systems-cairto) - Certification focusing on autonomous AI security. diff --git a/src/data/roadmaps/ai-red-teaming/content/ai-security-fundamentals@R9DQNc0AyAQ2HLpP4HOk6.md b/src/data/roadmaps/ai-red-teaming/content/ai-security-fundamentals@R9DQNc0AyAQ2HLpP4HOk6.md index 364370c2f..e9f85b418 100644 --- a/src/data/roadmaps/ai-red-teaming/content/ai-security-fundamentals@R9DQNc0AyAQ2HLpP4HOk6.md +++ b/src/data/roadmaps/ai-red-teaming/content/ai-security-fundamentals@R9DQNc0AyAQ2HLpP4HOk6.md @@ -1 +1,9 @@ -# AI Security Fundamentals \ No newline at end of file +# AI Security Fundamentals + +This covers the foundational concepts essential for AI Red Teaming, bridging traditional cybersecurity with AI-specific threats. An AI Red Teamer must understand common vulnerabilities in ML models (like evasion or poisoning), security risks in the AI lifecycle (from data collection to deployment), and how AI capabilities can be misused. This knowledge forms the basis for designing effective tests against AI systems. + +Learn more from the following resources: + +- [@article@Building Trustworthy AI: Contending with Data Poisoning - Nisos](https://nisos.com/research/building-trustworthy-ai/) - Explores data poisoning threats in AI/ML. +- [@article@What Is Adversarial AI in Machine Learning? - Palo Alto Networks](https://www.paloaltonetworks.co.uk/cyberpedia/what-are-adversarial-attacks-on-AI-Machine-Learning) - Overview of adversarial attacks targeting AI/ML systems. +- [@course@AI Security | Coursera](https://www.coursera.org/learn/ai-security) - Foundational course covering AI risks, governance, security, and privacy. diff --git a/src/data/roadmaps/ai-red-teaming/content/api-protection@Tszl26iNBnQBdBEWOueDA.md b/src/data/roadmaps/ai-red-teaming/content/api-protection@Tszl26iNBnQBdBEWOueDA.md index 97999a018..3dbaf0c47 100644 --- a/src/data/roadmaps/ai-red-teaming/content/api-protection@Tszl26iNBnQBdBEWOueDA.md +++ b/src/data/roadmaps/ai-red-teaming/content/api-protection@Tszl26iNBnQBdBEWOueDA.md @@ -1 +1,10 @@ -# API Protection \ No newline at end of file +# API Protection + +AI Red Teamers rigorously test the security of APIs providing access to AI models. They probe for OWASP API Top 10 vulnerabilities like broken authentication/authorization, injection flaws, security misconfigurations, and lack of rate limiting, specifically evaluating how these could lead to misuse or compromise of the AI model itself. + +Learn more from the following resources: + +- [@article@API Protection for AI Factories: The First Step to AI Security - F5](https://www.f5.com/company/blog/api-security-for-ai-factories) - Discusses the criticality of API security for AI applications. +- [@article@Securing APIs with AI for Advanced Threat Protection | Adeva](https://adevait.com/artificial-intelligence/securing-apis-with-ai) - Discusses using AI for API security, implies testing these is needed. +- [@article@Securing Machine Learning APIs (IBM)](https://developer.ibm.com/articles/se-securing-machine-learning-apis/) - Best practices for protecting ML APIs. +- [@guide@OWASP API Security Project (Top 10 2023)](https://owasp.org/www-project-api-security/) - Essential checklist for API vulnerabilities. diff --git a/src/data/roadmaps/ai-red-teaming/content/authentication@J7gjlt2MBx7lOkOnfGvPF.md b/src/data/roadmaps/ai-red-teaming/content/authentication@J7gjlt2MBx7lOkOnfGvPF.md index 66a783b30..a30269a09 100644 --- a/src/data/roadmaps/ai-red-teaming/content/authentication@J7gjlt2MBx7lOkOnfGvPF.md +++ b/src/data/roadmaps/ai-red-teaming/content/authentication@J7gjlt2MBx7lOkOnfGvPF.md @@ -1 +1,9 @@ -# Authentication \ No newline at end of file +# Authentication + +AI Red Teamers test the authentication mechanisms controlling access to AI systems and APIs. They attempt to bypass logins, steal or replay API keys/tokens, exploit weak password policies, or find flaws in MFA implementations to gain unauthorized access to the AI model or its management interfaces. + +Learn more from the following resources: + +- [@article@Red-Teaming in AI Testing: Stress Testing - Labelvisor](https://www.labelvisor.com/red-teaming-abstract-competitive-testing-data-selection/) - Mentions testing authentication mechanisms in AI red teaming. +- [@article@What is Authentication vs Authorization? - Auth0](https://auth0.com/intro-to-iam/authentication-vs-authorization) - Foundational explanation. +- [@video@How JWTs are used for Authentication (and how to bypass it) - LiveOverflow](https://www.google.com/search?q=https://www.youtube.com/watch%3Fv%3Dexample_video_panel_url?v=3OpQi65s_ME) - Covers common web authentication bypass techniques relevant to APIs. diff --git a/src/data/roadmaps/ai-red-teaming/content/authentication@JQ3bR8odXJfd-1RCEf3-Q.md b/src/data/roadmaps/ai-red-teaming/content/authentication@JQ3bR8odXJfd-1RCEf3-Q.md index 66a783b30..6252bc255 100644 --- a/src/data/roadmaps/ai-red-teaming/content/authentication@JQ3bR8odXJfd-1RCEf3-Q.md +++ b/src/data/roadmaps/ai-red-teaming/content/authentication@JQ3bR8odXJfd-1RCEf3-Q.md @@ -1 +1,9 @@ -# Authentication \ No newline at end of file +# Authorization + +AI Red Teamers test authorization controls to ensure that authenticated users can only access the AI features and data permitted by their roles/permissions. They attempt privilege escalation, try to access other users' data via the AI, or manipulate the AI to perform actions beyond its authorized scope. + +Learn more from the following resources: + +- [@article@What is Authentication vs Authorization? - Auth0](https://auth0.com/intro-to-iam/authentication-vs-authorization) - Foundational explanation. +- [@guide@Identity and access management (IAM) fundamental concepts - Learn Microsoft](https://learn.microsoft.com/en-us/entra/fundamentals/identity-fundamental-concepts) - Explains roles and permissions. +- [@guide@OWASP API Security Project](https://owasp.org/www-project-api-security/) (Covers Broken Object Level/Function Level Authorization) diff --git a/src/data/roadmaps/ai-red-teaming/content/automated-vs-manual@LVdYN9hyCyNPYn2Lz1y9b.md b/src/data/roadmaps/ai-red-teaming/content/automated-vs-manual@LVdYN9hyCyNPYn2Lz1y9b.md index 5dd1cbdc4..f13eaa86c 100644 --- a/src/data/roadmaps/ai-red-teaming/content/automated-vs-manual@LVdYN9hyCyNPYn2Lz1y9b.md +++ b/src/data/roadmaps/ai-red-teaming/content/automated-vs-manual@LVdYN9hyCyNPYn2Lz1y9b.md @@ -1 +1,9 @@ -# Automated vs Manual \ No newline at end of file +# Automated vs Manual Testing + +AI Red Teaming typically employs a blend of automated tools (for large-scale scanning, fuzzing prompts, generating basic adversarial examples) and manual human testing (for creative jailbreaking, complex multi-stage attacks, evaluating nuanced safety issues like bias). Automation provides scale, while manual testing provides depth and creativity needed to find novel vulnerabilities. + +Learn more from the following resources: + +- [@article@Automation Testing vs. Manual Testing: Which is the better approach? - Opkey](https://www.opkey.com/blog/automation-testing-vs-manual-testing-which-is-better) - General comparison. +- [@article@Manual Testing vs Automated Testing: What's the Difference? - Leapwork](https://www.leapwork.com/blog/manual-vs-automated-testing) - General comparison. +- [@guide@LLM red teaming guide (open source) - Promptfoo](https://www.promptfoo.dev/docs/red-team/) - Discusses using both automated generation and human ingenuity for red teaming. diff --git a/src/data/roadmaps/ai-red-teaming/content/benchmark-datasets@et1Xrr8ez-fmB0mAq8W_a.md b/src/data/roadmaps/ai-red-teaming/content/benchmark-datasets@et1Xrr8ez-fmB0mAq8W_a.md index f71b33c88..a5907327f 100644 --- a/src/data/roadmaps/ai-red-teaming/content/benchmark-datasets@et1Xrr8ez-fmB0mAq8W_a.md +++ b/src/data/roadmaps/ai-red-teaming/content/benchmark-datasets@et1Xrr8ez-fmB0mAq8W_a.md @@ -1 +1,9 @@ -# Benchmark Datasets \ No newline at end of file +# Benchmark Datasets + +AI Red Teamers may use or contribute to benchmark datasets specifically designed to evaluate AI security. These datasets (like SecBench, NYU CTF Bench, CySecBench) contain prompts or scenarios targeting vulnerabilities, safety issues, or specific cybersecurity capabilities, allowing for standardized testing of models. + +Learn more from the following resources: + +- [@dataset@CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset - GitHub](https://github.com/cysecbench/dataset) - Dataset of cybersecurity prompts for benchmarking LLMs. +- [@dataset@NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security](https://proceedings.neurips.cc/paper_files/paper/2024/hash/69d97a6493fbf016fff0a751f253ad18-Abstract-Datasets_and_Benchmarks_Track.html) - Using CTF challenges to evaluate LLMs. +- [@dataset@SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity - arXiv](https://arxiv.org/abs/2412.20787) - Benchmarking LLMs on cybersecurity tasks. diff --git a/src/data/roadmaps/ai-red-teaming/content/black-box-testing@0bApnJTt-Z2IUf0X3OCYf.md b/src/data/roadmaps/ai-red-teaming/content/black-box-testing@0bApnJTt-Z2IUf0X3OCYf.md index 46dc68bec..d86adb311 100644 --- a/src/data/roadmaps/ai-red-teaming/content/black-box-testing@0bApnJTt-Z2IUf0X3OCYf.md +++ b/src/data/roadmaps/ai-red-teaming/content/black-box-testing@0bApnJTt-Z2IUf0X3OCYf.md @@ -1 +1,9 @@ -# Black Box Testing \ No newline at end of file +# Black Box Testing + +In AI Red Teaming, black-box testing involves probing the AI system with inputs and observing outputs without any knowledge of the model's architecture, training data, or internal logic. This simulates an external attacker and is crucial for finding vulnerabilities exploitable through publicly accessible interfaces, such as prompt injection or safety bypasses discoverable via API interaction. + +Learn more from the following resources: + +- [@article@Black-Box, Gray Box, and White-Box Penetration Testing - EC-Council](https://www.eccouncil.org/cybersecurity-exchange/penetration-testing/black-box-gray-box-and-white-box-penetration-testing-importance-and-uses/) - Comparison of testing types. +- [@article@What is Black Box Testing | Techniques & Examples - Imperva](https://www.imperva.com/learn/application-security/black-box-testing/) - General explanation. +- [@guide@LLM red teaming guide (open source) - Promptfoo](https://www.promptfoo.dev/docs/red-team/) - Contrasts black-box and white-box approaches for LLM red teaming. diff --git a/src/data/roadmaps/ai-red-teaming/content/code-injection@vhBu5x8INTtqvx6vcYAhE.md b/src/data/roadmaps/ai-red-teaming/content/code-injection@vhBu5x8INTtqvx6vcYAhE.md index 8127d97ea..6ac985369 100644 --- a/src/data/roadmaps/ai-red-teaming/content/code-injection@vhBu5x8INTtqvx6vcYAhE.md +++ b/src/data/roadmaps/ai-red-teaming/content/code-injection@vhBu5x8INTtqvx6vcYAhE.md @@ -1 +1,9 @@ -# Code Injection \ No newline at end of file +# Code Injection + +AI Red Teamers test for code injection vulnerabilities specifically in the context of AI applications. This involves probing whether user input, potentially manipulated via prompts, can lead to the execution of unintended code (e.g., SQL, OS commands, or script execution via generated code) within the application layer or connected systems, using the AI as a potential vector. + +Learn more from the following resources: + +- [@article@Code Injection in LLM Applications - NeuralTrust](https://neuraltrust.ai/blog/code-injection-in-llms) - Specifically discusses code injection risks involving LLMs. +- [@docs@Secure Plugin Sandboxing (OpenAI Plugins)](https://platform.openai.com/docs/plugins/production/security-requirements) - Context on preventing code injection via AI plugins. +- [@guide@Code Injection - OWASP Foundation](https://owasp.org/www-community/attacks/Code_Injection) - Foundational knowledge on code injection attacks. diff --git a/src/data/roadmaps/ai-red-teaming/content/conferences@LuKnmd9nSz9yLbTU_5Yp2.md b/src/data/roadmaps/ai-red-teaming/content/conferences@LuKnmd9nSz9yLbTU_5Yp2.md index 142b54db9..78914c0e6 100644 --- a/src/data/roadmaps/ai-red-teaming/content/conferences@LuKnmd9nSz9yLbTU_5Yp2.md +++ b/src/data/roadmaps/ai-red-teaming/content/conferences@LuKnmd9nSz9yLbTU_5Yp2.md @@ -1 +1,10 @@ -# Conferences \ No newline at end of file +# Conferences + +Attending major cybersecurity conferences (DEF CON, Black Hat, RSA) and increasingly specialized AI Safety/Security conferences allows AI Red Teamers to learn about cutting-edge research, network with peers, and discover new tools and attack/defense techniques. + +Learn more from the following resources: + +- [@conference@Black Hat Events](https://www.blackhat.com/) - Professional security conference with AI tracks. +- [@conference@DEF CON Hacking Conference](https://defcon.org/) - Major hacking conference with relevant villages/talks. +- [@conference@Global Conference on AI, Security and Ethics 2025 - UNIDIR](https://unidir.org/event/global-conference-on-ai-security-and-ethics-2025/) - Example of a specialized AI security/ethics conference. +- [@conference@RSA Conference](https://www.rsaconference.com/) - Large industry conference covering AI security. diff --git a/src/data/roadmaps/ai-red-teaming/content/confidentiality-integrity-availability@WZkIHZkV2qDYbYF9KBBRi.md b/src/data/roadmaps/ai-red-teaming/content/confidentiality-integrity-availability@WZkIHZkV2qDYbYF9KBBRi.md index 7f87208ca..3ba4971e3 100644 --- a/src/data/roadmaps/ai-red-teaming/content/confidentiality-integrity-availability@WZkIHZkV2qDYbYF9KBBRi.md +++ b/src/data/roadmaps/ai-red-teaming/content/confidentiality-integrity-availability@WZkIHZkV2qDYbYF9KBBRi.md @@ -1 +1,9 @@ -# Confidentiality, Integrity, Availability \ No newline at end of file +# Confidentiality, Integrity, Availability + +The CIA Triad is directly applicable in AI Red Teaming. Confidentiality tests focus on preventing leakage of training data or proprietary model details. Integrity tests probe for susceptibility to data poisoning or model manipulation. Availability tests assess resilience against denial-of-service attacks targeting the AI model or its supporting infrastructure. + +Learn more from the following resources: + +- [@article@Confidentiality, Integrity, Availability: Key Examples - DataSunrise](https://www.datasunrise.com/knowledge-center/confidentiality-integrity-availability-examples/) - Explains CIA triad with examples, mentioning AI/ML relevance. +- [@article@The CIA Triad: Confidentiality, Integrity, Availability - Veeam](https://www.veeam.com/blog/cybersecurity-cia-triad-explained.html) - Breakdown of the three principles and how they apply. +- [@article@What's The CIA Triad? Confidentiality, Integrity, & Availability, Explained | Splunk](https://www.splunk.com/en_us/blog/learn/cia-triad-confidentiality-integrity-availability.html) - Detailed explanation of the triad, mentioning modern updates and AI context. diff --git a/src/data/roadmaps/ai-red-teaming/content/continuous-monitoring@7Km0mFpHguHYPs5UhHTsM.md b/src/data/roadmaps/ai-red-teaming/content/continuous-monitoring@7Km0mFpHguHYPs5UhHTsM.md index c8ee07c2f..9af55fdaa 100644 --- a/src/data/roadmaps/ai-red-teaming/content/continuous-monitoring@7Km0mFpHguHYPs5UhHTsM.md +++ b/src/data/roadmaps/ai-red-teaming/content/continuous-monitoring@7Km0mFpHguHYPs5UhHTsM.md @@ -1 +1,9 @@ -# Continuous Monitoring \ No newline at end of file +# Continuous Monitoring + +AI Red Teamers assess the effectiveness of continuous monitoring systems by attempting attacks and observing if detection mechanisms trigger appropriate alerts and responses. They test if monitoring covers AI-specific anomalies (like sudden shifts in output toxicity or unexpected resource consumption by the model) in addition to standard infrastructure monitoring. + +Learn more from the following resources: + +- [@article@Cyber Security Monitoring: 5 Key Components - BitSight Technologies](https://www.bitsight.com/blog/5-things-to-consider-building-continuous-security-monitoring-strategy) - Discusses key components of a monitoring strategy. +- [@article@Cyber Security Monitoring: Definition and Best Practices - SentinelOne](https://www.sentinelone.com/cybersecurity-101/cybersecurity/cyber-security-monitoring/) - Overview of monitoring types and techniques. +- [@article@Cybersecurity Monitoring: Definition, Tools & Best Practices - NordLayer](https://nordlayer.com/blog/cybersecurity-monitoring/) - General best practices adaptable to AI context. diff --git a/src/data/roadmaps/ai-red-teaming/content/continuous-testing@65Lo60JQS5YlvvQ6KevXt.md b/src/data/roadmaps/ai-red-teaming/content/continuous-testing@65Lo60JQS5YlvvQ6KevXt.md index 5a6bdbd23..117921143 100644 --- a/src/data/roadmaps/ai-red-teaming/content/continuous-testing@65Lo60JQS5YlvvQ6KevXt.md +++ b/src/data/roadmaps/ai-red-teaming/content/continuous-testing@65Lo60JQS5YlvvQ6KevXt.md @@ -1 +1,9 @@ -# Continuous Testing \ No newline at end of file +# Continuous Testing + +Applying continuous testing principles to AI security involves integrating automated red teaming checks into the development pipeline (CI/CD). This allows for regular, automated assessment of model safety, robustness, and alignment as the model or application code evolves, catching regressions or new vulnerabilities early. Tools facilitating Continuous Automated Red Teaming (CART) are emerging. + +Learn more from the following resources: + +- [@article@Continuous Automated Red Teaming (CART) - FireCompass](https://www.firecompass.com/continuous-automated-red-teaming/) - Explains the concept of CART. +- [@article@What is Continuous Penetration Testing? Process and Benefits - Qualysec Technologies](https://qualysec.com/continuous-penetration-testing/) - Related concept applied to pen testing. +- [@guide@What is Continuous Testing and How Does it Work? - Black Duck](https://www.blackduck.com/glossary/what-is-continuous-testing.html) - General definition and benefits. diff --git a/src/data/roadmaps/ai-red-teaming/content/countermeasures@G1u_Kq4NeUsGX2qnUTuJU.md b/src/data/roadmaps/ai-red-teaming/content/countermeasures@G1u_Kq4NeUsGX2qnUTuJU.md index 78a297b0c..d99d46307 100644 --- a/src/data/roadmaps/ai-red-teaming/content/countermeasures@G1u_Kq4NeUsGX2qnUTuJU.md +++ b/src/data/roadmaps/ai-red-teaming/content/countermeasures@G1u_Kq4NeUsGX2qnUTuJU.md @@ -1 +1,10 @@ -# Countermeasures \ No newline at end of file +# Countermeasures + +AI Red Teamers must also understand and test defenses against prompt hacking. This includes evaluating the effectiveness of input sanitization, output filtering, instruction demarcation (e.g., XML tagging), contextual awareness checks, model fine-tuning for resistance, and applying the principle of least privilege to LLM capabilities and tool access. + +Learn more from the following resources: + +- [@article@Mitigating Prompt Injection Attacks (NCC Group Research)](https://research.nccgroup.com/2023/12/01/mitigating-prompt-injection-attacks/) - Discusses various mitigation strategies and their effectiveness. +- [@article@Prompt Injection & the Rise of Prompt Attacks: All You Need to Know | Lakera](https://www.lakera.ai/blog/guide-to-prompt-injection) - Includes discussion on best practices for prevention. +- [@article@Prompt Injection: Impact, How It Works & 4 Defense Measures - Tigera](https://www.tigera.io/learn/guides/llm-security/prompt-injection/) - Covers defensive measures. +- [@guide@OpenAI Best Practices for Prompt Security](https://platform.openai.com/docs/guides/prompt-engineering/strategy-write-clear-instructions) - OpenAI’s recommendations to prevent prompt manipulation. diff --git a/src/data/roadmaps/ai-red-teaming/content/ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md b/src/data/roadmaps/ai-red-teaming/content/ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md index f46349f5e..9b4ff972a 100644 --- a/src/data/roadmaps/ai-red-teaming/content/ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md +++ b/src/data/roadmaps/ai-red-teaming/content/ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md @@ -1 +1,10 @@ -# CTF Challenges \ No newline at end of file +# CTF Challenges + +Capture The Flag competitions increasingly include AI/ML security challenges. Participating in CTFs (tracked on CTFtime) or platforms like picoCTF helps AI Red Teamers hone skills in reverse engineering, web exploitation, and cryptography applied to AI systems, including specialized AI safety CTFs. + +Learn more from the following resources: + +- [@article@Capture the flag (cybersecurity) - Wikipedia](https://en.wikipedia.org/wiki/Capture_the_flag_(cybersecurity)) - Overview of CTFs. +- [@article@Progress from our Frontier Red Team - Anthropic](https://www.anthropic.com/news/strategic-warning-for-ai-risk-progress-and-insights-from-our-frontier-red-team) - Mentions using CTFs (Cybench) for evaluating AI model security. +- [@platform@CTFtime.org](https://ctftime.org/) - Global CTF event tracker. +- [@platform@picoCTF](https://picoctf.org/) - Beginner-friendly CTF platform. diff --git a/src/data/roadmaps/ai-red-teaming/content/custom-testing-scripts@C1zO2xC0AqyV53p2YEPWg.md b/src/data/roadmaps/ai-red-teaming/content/custom-testing-scripts@C1zO2xC0AqyV53p2YEPWg.md index 8bab7b373..cde375b58 100644 --- a/src/data/roadmaps/ai-red-teaming/content/custom-testing-scripts@C1zO2xC0AqyV53p2YEPWg.md +++ b/src/data/roadmaps/ai-red-teaming/content/custom-testing-scripts@C1zO2xC0AqyV53p2YEPWg.md @@ -1 +1,9 @@ -# Custom Testing Scripts \ No newline at end of file +# Custom Testing Scripts + +AI Red Teamers frequently write custom scripts (often in Python) to automate bespoke attacks, interact with specific AI APIs, generate complex prompt sequences, parse model outputs at scale, or implement novel exploit techniques not found in standard tools. Proficiency in scripting is essential for advanced AI red teaming. + +Learn more from the following resources: + +- [@guide@Python for Cybersecurity: Key Use Cases and Tools - Panther](https://panther.com/blog/python-for-cybersecurity-key-use-cases-and-tools) - Discusses Python's role in automation, pen testing, etc. +- [@guide@Python for cybersecurity: use cases, tools and best practices - SoftTeco](https://softteco.com/blog/python-for-cybersecurity) - Covers using Python for various security tasks. +- [@tool@Scapy](https://scapy.net/) - Powerful Python library for packet manipulation. diff --git a/src/data/roadmaps/ai-red-teaming/content/data-poisoning@nD0_64ELEeJSN-0aZiR7i.md b/src/data/roadmaps/ai-red-teaming/content/data-poisoning@nD0_64ELEeJSN-0aZiR7i.md index 1121c2850..5a5b70078 100644 --- a/src/data/roadmaps/ai-red-teaming/content/data-poisoning@nD0_64ELEeJSN-0aZiR7i.md +++ b/src/data/roadmaps/ai-red-teaming/content/data-poisoning@nD0_64ELEeJSN-0aZiR7i.md @@ -1 +1,10 @@ -# Data Poisoning \ No newline at end of file +# Data Poisoning + +AI Red Teamers simulate data poisoning attacks by evaluating how introducing manipulated or mislabeled data into potential training or fine-tuning datasets could compromise the model. They assess the impact on model accuracy, fairness, or the potential creation of exploitable backdoors, informing defenses around data validation and provenance. + +Learn more from the following resources: + +- [@article@AI Poisoning - Is It Really A Threat? - AIBlade](https://www.aiblade.net/p/ai-poisoning-is-it-really-a-threat) - Detailed exploration of data poisoning attacks and impacts. +- [@article@Data Poisoning Attacks in ML (Towards Data Science)](https://towardsdatascience.com/data-poisoning-attacks-in-machine-learning-542169587b7f) - Overview of techniques. +- [@paper@Detecting and Preventing Data Poisoning Attacks on AI Models - arXiv](https://arxiv.org/abs/2503.09302) - Research on detection and prevention techniques. +- [@paper@Poisoning Web-Scale Training Data (arXiv)](https://arxiv.org/abs/2310.12818) - Analysis of poisoning risks in large datasets used for LLMs. diff --git a/src/data/roadmaps/ai-red-teaming/content/direct@5zHow4KZVpfhch5Aabeft.md b/src/data/roadmaps/ai-red-teaming/content/direct@5zHow4KZVpfhch5Aabeft.md index ed0419d8a..ebf1911eb 100644 --- a/src/data/roadmaps/ai-red-teaming/content/direct@5zHow4KZVpfhch5Aabeft.md +++ b/src/data/roadmaps/ai-red-teaming/content/direct@5zHow4KZVpfhch5Aabeft.md @@ -1 +1,9 @@ -# Direct \ No newline at end of file +# Direct Injection + +Direct injection attacks occur when malicious instructions are inserted directly into the prompt input field by the user interacting with the LLM. AI Red Teamers use this technique to assess if basic instructions like "Ignore previous prompt" can immediately compromise the model's safety or intended function, testing the robustness of the system prompt's influence. + +Learn more from the following resources: + +- [@article@Prompt Injection & the Rise of Prompt Attacks: All You Need to Know | Lakera](https://www.lakera.ai/blog/guide-to-prompt-injection) - Differentiates attack types. +- [@article@Prompt Injection Cheat Sheet (FlowGPT)](https://flowgpt.com/p/prompt-injection-cheat-sheet) - Collection of prompt injection examples often used in direct attacks. +- [@report@OpenAI GPT-4 System Card](https://openai.com/research/gpt-4-system-card) - Sections discuss how direct prompt attacks were tested during GPT-4 development. diff --git a/src/data/roadmaps/ai-red-teaming/content/emerging-threats@-G8v_CNa8wO_g-46_RFQo.md b/src/data/roadmaps/ai-red-teaming/content/emerging-threats@-G8v_CNa8wO_g-46_RFQo.md index 9502b7803..79b986295 100644 --- a/src/data/roadmaps/ai-red-teaming/content/emerging-threats@-G8v_CNa8wO_g-46_RFQo.md +++ b/src/data/roadmaps/ai-red-teaming/content/emerging-threats@-G8v_CNa8wO_g-46_RFQo.md @@ -1 +1,9 @@ -# Emerging Threats \ No newline at end of file +# Emerging Threats + +AI Red Teamers must stay informed about potential future threats enabled by more advanced AI, such as highly autonomous attack agents, AI-generated malware that evades detection, sophisticated deepfakes for social engineering, or large-scale exploitation of interconnected AI systems. Anticipating these helps shape current testing priorities. + +Learn more from the following resources: + +- [@article@AI Security Risks Uncovered: What You Must Know in 2025 - TTMS](https://ttms.com/uk/ai-security-risks-explained-what-you-need-to-know-in-2025/) - Discusses future AI-driven cyberattacks. +- [@article@Why Artificial Intelligence is the Future of Cybersecurity - Darktrace](https://www.darktrace.com/blog/why-artificial-intelligence-is-the-future-of-cybersecurity) - Covers AI misuse and the future threat landscape. +- [@report@AI Index 2024 - Stanford University](https://aiindex.stanford.edu/report/) - Annual report tracking AI capabilities and societal implications, including risks. diff --git a/src/data/roadmaps/ai-red-teaming/content/ethical-considerations@1gyuEV519LjN-KpROoVwv.md b/src/data/roadmaps/ai-red-teaming/content/ethical-considerations@1gyuEV519LjN-KpROoVwv.md index 6cd4aeeb8..1f239c486 100644 --- a/src/data/roadmaps/ai-red-teaming/content/ethical-considerations@1gyuEV519LjN-KpROoVwv.md +++ b/src/data/roadmaps/ai-red-teaming/content/ethical-considerations@1gyuEV519LjN-KpROoVwv.md @@ -1 +1,10 @@ -# Ethical Considerations \ No newline at end of file +# Ethical Considerations + +Ethical conduct is crucial for AI Red Teamers. While simulating attacks, they must operate within strict legal and ethical boundaries defined by rules of engagement, focusing on improving safety without causing real harm or enabling misuse. This includes respecting data privacy, obtaining consent where necessary, responsibly disclosing vulnerabilities, and carefully considering the potential negative impacts of both the testing process and the AI capabilities being tested. The goal is discovery for defense, not exploitation. + +Learn more from the following resources: + +- [@article@Red-Teaming in AI Testing: Stress Testing - Labelvisor](https://www.labelvisor.com/red-teaming-abstract-competitive-testing-data-selection/) - Mentions balancing attack simulation with ethical constraints. +- [@article@Responsible AI assessment - Responsible AI | Coursera](https://www.coursera.org/learn/ai-security) (Module within AI Security course) +- [@guide@Responsible AI Principles (Microsoft)](https://www.microsoft.com/en-us/ai/responsible-ai) - Example of corporate responsible AI guidelines influencing ethical testing. +- [@video@Questions to Guide AI Red-Teaming (CMU SEI)](https://resources.sei.cmu.edu/library/asset-view.cfm?assetid=928382) - Key questions and ethical guidelines for AI red teaming activities (video talk). diff --git a/src/data/roadmaps/ai-red-teaming/content/forums@Smncq-n1OlnLAY27AFQOO.md b/src/data/roadmaps/ai-red-teaming/content/forums@Smncq-n1OlnLAY27AFQOO.md index 34b48047f..d15e412af 100644 --- a/src/data/roadmaps/ai-red-teaming/content/forums@Smncq-n1OlnLAY27AFQOO.md +++ b/src/data/roadmaps/ai-red-teaming/content/forums@Smncq-n1OlnLAY27AFQOO.md @@ -1 +1,10 @@ -# Forums \ No newline at end of file +# Forums + +Engaging in online forums, mailing lists, Discord servers, or subreddits dedicated to AI security, adversarial ML, prompt engineering, or general cybersecurity helps AI Red Teamers exchange knowledge, ask questions, learn about new tools/techniques, and find collaboration opportunities. + +Learn more from the following resources: + +- [@community@List of Cybersecurity Discord Servers - DFIR Training](https://www.dfir.training/dfir-groups/discord?category[0]=17&category_children=1) - List including relevant servers. +- [@community@Reddit - r/MachineLearning](https://www.reddit.com/r/MachineLearning/) - ML specific discussion. +- [@community@Reddit - r/artificial](https://www.reddit.com/r/artificial/) - General AI discussion. +- [@community@Reddit - r/cybersecurity](https://www.reddit.com/r/cybersecurity/) - General cybersecurity forum. diff --git a/src/data/roadmaps/ai-red-teaming/content/generative-models@3XJ-g0KvHP75U18mxCqgw.md b/src/data/roadmaps/ai-red-teaming/content/generative-models@3XJ-g0KvHP75U18mxCqgw.md index 71b3bf5c6..4a3225c4c 100644 --- a/src/data/roadmaps/ai-red-teaming/content/generative-models@3XJ-g0KvHP75U18mxCqgw.md +++ b/src/data/roadmaps/ai-red-teaming/content/generative-models@3XJ-g0KvHP75U18mxCqgw.md @@ -1 +1,9 @@ -# Generative Models \ No newline at end of file +# Generative Models + +AI Red Teamers focus heavily on generative models (like GANs and LLMs) due to their widespread use and unique risks. Understanding how they generate content is key to testing for issues like generating harmful/biased outputs, deepfakes, prompt injection vulnerabilities, or leaking sensitive information from their vast training data. + +Learn more from the following resources: + +- [@article@An Introduction to Generative Models | MongoDB](https://www.mongodb.com/resources/basics/artificial-intelligence/generative-models) - Explains basics and contrasts with discriminative models. +- [@course@Generative AI for Beginners - Microsoft Open Source](https://microsoft.github.io/generative-ai-for-beginners/) - Free course covering fundamentals. +- [@guide@Generative AI beginner's guide | Generative AI on Vertex AI - Google Cloud](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/overview) - Overview covering generative AI concepts and Google's platform context. diff --git a/src/data/roadmaps/ai-red-teaming/content/grey-box-testing@ZVNAMCP68XKRXVxF2-hBc.md b/src/data/roadmaps/ai-red-teaming/content/grey-box-testing@ZVNAMCP68XKRXVxF2-hBc.md index 665f0f3ea..08e5696c9 100644 --- a/src/data/roadmaps/ai-red-teaming/content/grey-box-testing@ZVNAMCP68XKRXVxF2-hBc.md +++ b/src/data/roadmaps/ai-red-teaming/content/grey-box-testing@ZVNAMCP68XKRXVxF2-hBc.md @@ -1 +1,9 @@ -# Grey Box Testing \ No newline at end of file +# Grey Box Testing + +Grey-box AI Red Teaming involves testing with partial knowledge of the system, such as knowing the model type (e.g., GPT-4), having access to some documentation, or understanding the general system architecture but not having full model weights or source code. This allows for more targeted testing than black-box while still simulating realistic external attacker scenarios where some information might be gleaned. + +Learn more from the following resources: + +- [@article@AI Transparency: Connecting AI Red Teaming and Compliance | SplxAI Blog](https://splx.ai/blog/ai-transparency-connecting-ai-red-teaming-and-compliance) - Discusses the value of moving towards gray-box testing in AI. +- [@article@Black-Box, Gray Box, and White-Box Penetration Testing - EC-Council](https://www.eccouncil.org/cybersecurity-exchange/penetration-testing/black-box-gray-box-and-white-box-penetration-testing-importance-and-uses/) - Comparison of testing types. +- [@article@Understanding Black Box, White Box, and Grey Box Testing - Frugal Testing](https://www.frugaltesting.com/blog/understanding-black-box-white-box-and-grey-box-testing-in-software-testing) - General definitions. diff --git a/src/data/roadmaps/ai-red-teaming/content/indirect@3_gJRtJSdm2iAfkwmcv0e.md b/src/data/roadmaps/ai-red-teaming/content/indirect@3_gJRtJSdm2iAfkwmcv0e.md index 6e9982d2e..58aaddcbb 100644 --- a/src/data/roadmaps/ai-red-teaming/content/indirect@3_gJRtJSdm2iAfkwmcv0e.md +++ b/src/data/roadmaps/ai-red-teaming/content/indirect@3_gJRtJSdm2iAfkwmcv0e.md @@ -1 +1,9 @@ -# Indirect \ No newline at end of file +# Indirect Injection + +Indirect injection involves embedding malicious prompts within external data sources that the LLM processes, such as websites, documents, or emails. AI Red Teamers test this by poisoning data sources the AI might interact with (e.g., adding hidden instructions to a webpage summarized by the AI) to see if the AI executes unintended commands or leaks data when processing that source. + +Learn more from the following resources: + +- [@paper@The Practical Application of Indirect Prompt Injection Attacks - David Willis-Owen](https://www.researchgate.net/publication/382692833_The_Practical_Application_of_Indirect_Prompt_Injection_Attacks_From_Academia_to_Industry) - Discusses a standard methodology to test for indirect injection attacks. +- [@article@How to Prevent Indirect Prompt Injection Attacks - Cobalt](https://www.cobalt.io/blog/how-to-prevent-indirect-prompt-injection-attacks) - Explains indirect injection via external sources and mitigation. +- [@article@Jailbreaks via Indirect Injection (Practical AI Safety Newsletter)](https://newsletter.practicalai.safety/p/jailbreaks-via-indirect-injection) - Examples of indirect prompt injection impacting LLM agents. diff --git a/src/data/roadmaps/ai-red-teaming/content/industry-credentials@HHjsFR6wRDqUd66PMDE_7.md b/src/data/roadmaps/ai-red-teaming/content/industry-credentials@HHjsFR6wRDqUd66PMDE_7.md index 94e989f67..41027ca26 100644 --- a/src/data/roadmaps/ai-red-teaming/content/industry-credentials@HHjsFR6wRDqUd66PMDE_7.md +++ b/src/data/roadmaps/ai-red-teaming/content/industry-credentials@HHjsFR6wRDqUd66PMDE_7.md @@ -1 +1,8 @@ -# Industry Credentials \ No newline at end of file +# Industry Credentials + +Beyond formal certifications, recognition in the AI Red Teaming field comes from practical achievements like finding significant vulnerabilities (responsible disclosure), winning AI-focused CTFs or hackathons (like HackAPrompt), contributing to AI security research, or building open-source testing tools. + +Learn more from the following resources: + +- [@community@DEF CON - Wikipedia (Mentions Black Badge)](https://en.wikipedia.org/wiki/DEF_CON#Black_Badge) - Example of a high-prestige credential from CTFs. +- [@community@HackAPrompt (Learn Prompting)](https://learnprompting.org/hackaprompt) - Example of a major AI Red Teaming competition. diff --git a/src/data/roadmaps/ai-red-teaming/content/industry-standards@WePO66_4-gNcSdE00WKmw.md b/src/data/roadmaps/ai-red-teaming/content/industry-standards@WePO66_4-gNcSdE00WKmw.md index 318ae5c74..caa53a823 100644 --- a/src/data/roadmaps/ai-red-teaming/content/industry-standards@WePO66_4-gNcSdE00WKmw.md +++ b/src/data/roadmaps/ai-red-teaming/content/industry-standards@WePO66_4-gNcSdE00WKmw.md @@ -1 +1,10 @@ -# Industry Standards \ No newline at end of file +# Industry Standards + +As AI matures, AI Red Teamers will increasingly need to understand and test against emerging industry standards and regulations for AI safety, security, and risk management, such as the NIST AI RMF, ISO/IEC 42001, and sector-specific guidelines, ensuring AI systems meet compliance requirements. + +Learn more from the following resources: + +- [@article@ISO 42001: The New Compliance Standard for AI Management Systems - Bright Defense](https://www.brightdefense.com/resources/iso-42001-compliance/) - Overview of ISO 42001 requirements. +- [@article@ISO 42001: What it is & why it matters for AI management - IT Governance](https://www.itgovernance.co.uk/iso-42001) - Explanation of the standard. +- [@framework@NIST AI Risk Management Framework (AI RMF)](https://www.nist.gov/itl/ai-risk-management-framework) - Voluntary framework gaining wide adoption. +- [@standard@ISO/IEC 42001: Information technology — Artificial intelligence — Management system](https://www.iso.org/standard/81230.html) - International standard for AI management systems. diff --git a/src/data/roadmaps/ai-red-teaming/content/infrastructure-security@nhUKKWyBH80nyKfGT8ErC.md b/src/data/roadmaps/ai-red-teaming/content/infrastructure-security@nhUKKWyBH80nyKfGT8ErC.md index 3fcdd05d1..454ffd126 100644 --- a/src/data/roadmaps/ai-red-teaming/content/infrastructure-security@nhUKKWyBH80nyKfGT8ErC.md +++ b/src/data/roadmaps/ai-red-teaming/content/infrastructure-security@nhUKKWyBH80nyKfGT8ErC.md @@ -1 +1,9 @@ -# Infrastructure Security \ No newline at end of file +# Infrastructure Security + +AI Red Teamers assess the security posture of the infrastructure hosting AI models (cloud environments, servers, containers). They look for misconfigurations, unpatched systems, insecure network setups, or inadequate access controls that could allow compromise of the AI system or leakage of sensitive data/models. + +Learn more from the following resources: + +- [@article@AI Infrastructure Attacks (VentureBeat)](https://venturebeat.com/ai/understanding-ai-infrastructure-attacks/) - Discussion of attacks targeting AI infrastructure. +- [@guide@Network Infrastructure Security - Best Practices and Strategies - DataGuard](https://www.dataguard.com/blog/network-infrastructure-security-best-practices-and-strategies/) - General infra security practices applicable here. +- [@guide@Secure Deployment of ML Systems (NIST)](https://csrc.nist.gov/publications/detail/sp/800-218/final) - Guidelines including infrastructure security for ML. diff --git a/src/data/roadmaps/ai-red-teaming/content/insecure-deserialization@aKzai0A8J55-OBXTnQih1.md b/src/data/roadmaps/ai-red-teaming/content/insecure-deserialization@aKzai0A8J55-OBXTnQih1.md index ccabbd585..b6dc1981c 100644 --- a/src/data/roadmaps/ai-red-teaming/content/insecure-deserialization@aKzai0A8J55-OBXTnQih1.md +++ b/src/data/roadmaps/ai-red-teaming/content/insecure-deserialization@aKzai0A8J55-OBXTnQih1.md @@ -1 +1,10 @@ -# Insecure Deserialization \ No newline at end of file +# Insecure Deserialization + +AI Red Teamers investigate if serialized objects used by the AI system (e.g., for saving model states, configurations, or transmitting data) can be manipulated by an attacker. They test if crafting malicious serialized objects could lead to remote code execution or other exploits when the application deserializes the untrusted data. + +Learn more from the following resources: + +- [@article@Lightboard Lessons: OWASP Top 10 - Insecure Deserialization - DevCentral](https://community.f5.com/kb/technicalarticles/lightboard-lessons-owasp-top-10---insecure-deserialization/281509) - Video explanation. +- [@article@How Hugging Face Was Ethically Hacked](https://www.aiblade.net/p/how-hugging-face-was-ethically-hacked) - Hugging Face deserialization case study. +- [@article@OWASP TOP 10: Insecure Deserialization - Detectify Blog](https://blog.detectify.com/best-practices/owasp-top-10-insecure-deserialization/) - Overview within OWASP Top 10 context. +- [@guide@Insecure Deserialization - OWASP Foundation](https://owasp.org/www-community/vulnerabilities/Insecure_Deserialization) - Core explanation of the vulnerability. diff --git a/src/data/roadmaps/ai-red-teaming/content/introduction@HFJIYcI16OMyM77fAw9af.md b/src/data/roadmaps/ai-red-teaming/content/introduction@HFJIYcI16OMyM77fAw9af.md index f6ecaa676..7f3367c78 100644 --- a/src/data/roadmaps/ai-red-teaming/content/introduction@HFJIYcI16OMyM77fAw9af.md +++ b/src/data/roadmaps/ai-red-teaming/content/introduction@HFJIYcI16OMyM77fAw9af.md @@ -1 +1,10 @@ -# Introduction \ No newline at end of file +# Introduction + +AI Red Teaming is the practice of simulating adversarial attacks against AI systems to proactively identify vulnerabilities, potential misuse scenarios, and failure modes before malicious actors do. Distinct from traditional cybersecurity red teaming, it focuses on the unique attack surfaces of AI models, such as prompt manipulation, data poisoning, model extraction, and evasion techniques. The primary goal for an AI Red Teamer is to test the robustness, safety, alignment, and fairness of AI systems, particularly complex ones like LLMs, by adopting an attacker's mindset to uncover hidden flaws and provide actionable feedback for improvement. + +Learn more from the following resources: + +- [@article@A Guide to AI Red Teaming - HiddenLayer](https://hiddenlayer.com/innovation-hub/a-guide-to-ai-red-teaming/) - Discusses AI red teaming concepts and contrasts with traditional methods. +- [@article@What is AI Red Teaming? (Learn Prompting)](https://learnprompting.org/blog/what-is-ai-red-teaming) - Overview of AI red teaming, its history, and key challenges. +- [@article@What is AI Red Teaming? The Complete Guide - Mindgard](https://mindgard.ai/blog/what-is-ai-red-teaming) - Guide covering AI red teaming processes, use cases, and benefits. +- [@podcast@Red Team Podcast | AI Red Teaming Insights & Defense Strategies - Mindgard](https://mindgard.ai/podcast/red-team) - Podcast series covering AI red teaming trends and strategies. diff --git a/src/data/roadmaps/ai-red-teaming/content/jailbreak-techniques@Ds8pqn4y9Npo7z6ubunvc.md b/src/data/roadmaps/ai-red-teaming/content/jailbreak-techniques@Ds8pqn4y9Npo7z6ubunvc.md index 6cd93ca54..a99eec9b1 100644 --- a/src/data/roadmaps/ai-red-teaming/content/jailbreak-techniques@Ds8pqn4y9Npo7z6ubunvc.md +++ b/src/data/roadmaps/ai-red-teaming/content/jailbreak-techniques@Ds8pqn4y9Npo7z6ubunvc.md @@ -1 +1,9 @@ -# Jailbreak Techniques \ No newline at end of file +# Jailbreak Techniques + +Jailbreaking is a specific category of prompt hacking where the AI Red Teamer aims to bypass the LLM's safety and alignment training. They use techniques like creating fictional scenarios, asking the model to simulate an unrestricted AI, or using complex instructions to trick the model into generating content that violates its own policies (e.g., generating harmful code, hate speech, or illegal instructions). + +Learn more from the following resources: + +- [@article@InjectPrompt (David Willis-Owen)](https://injectprompt.com) - Discusses jailbreaks for several LLMs +- [@guide@Prompt Hacking Guide - Learn Prompting](https://learnprompting.org/docs/category/prompt-hacking) - Covers jailbreaking strategies. +- [@paper@Jailbroken: How Does LLM Safety Training Fail? (arXiv)](https://arxiv.org/abs/2307.02483) - Research analyzing jailbreak failures. diff --git a/src/data/roadmaps/ai-red-teaming/content/lab-environments@MmwwRK4I9aRH_ha7duPqf.md b/src/data/roadmaps/ai-red-teaming/content/lab-environments@MmwwRK4I9aRH_ha7duPqf.md index dfa68396a..ebac531c3 100644 --- a/src/data/roadmaps/ai-red-teaming/content/lab-environments@MmwwRK4I9aRH_ha7duPqf.md +++ b/src/data/roadmaps/ai-red-teaming/content/lab-environments@MmwwRK4I9aRH_ha7duPqf.md @@ -1 +1,10 @@ -# Lab Environments \ No newline at end of file +# Lab Environments + +AI Red Teamers need environments to practice attacking vulnerable systems safely. While traditional labs (HTB, THM, VulnHub) build general pentesting skills, platforms are emerging with labs specifically focused on AI/LLM vulnerabilities, prompt injection, or adversarial ML challenges. + +Learn more from the following resources: + +- [@platform@Gandalf AI Prompt Injection Lab](https://gandalf.lakera.ai/) - A popular web-based lab for prompt injection practice. +- [@platform@Hack The Box: Hacking Labs](https://www.hackthebox.com/hacker/hacking-labs) - General pentesting labs. +- [@platform@TryHackMe: Learn Cyber Security](https://tryhackme.com/) - Gamified cybersecurity training labs. +- [@platform@VulnHub](https://www.vulnhub.com/) - Provides vulnerable VM images for practice. diff --git a/src/data/roadmaps/ai-red-teaming/content/large-language-models@8K-wCn2cLc7Vs_V4sC3sE.md b/src/data/roadmaps/ai-red-teaming/content/large-language-models@8K-wCn2cLc7Vs_V4sC3sE.md index da3ae161d..b3b680485 100644 --- a/src/data/roadmaps/ai-red-teaming/content/large-language-models@8K-wCn2cLc7Vs_V4sC3sE.md +++ b/src/data/roadmaps/ai-red-teaming/content/large-language-models@8K-wCn2cLc7Vs_V4sC3sE.md @@ -1 +1,9 @@ -# Large Language Models \ No newline at end of file +# Large Language Models + +LLMs are a primary target for AI Red Teaming. Understanding their architecture (often Transformer-based), training processes (pre-training, fine-tuning), and capabilities (text generation, summarization, Q&A) is essential for identifying vulnerabilities like prompt injection, jailbreaking, data regurgitation, and emergent harmful behaviors specific to these large-scale models. + +Learn more from the following resources: + +- [@article@What is an LLM (large language model)? - Cloudflare](https://www.cloudflare.com/learning/ai/what-is-large-language-model/) - Concise explanation from Cloudflare. +- [@guide@Introduction to Large Language Models - Learn Prompting](https://learnprompting.org/docs/intro_to_llms) - Learn Prompting's introduction. +- [@guide@What Are Large Language Models? A Beginner's Guide for 2025 - KDnuggets](https://www.kdnuggets.com/large-language-models-beginners-guide-2025) - Overview of LLMs, how they work, strengths, and limitations. diff --git a/src/data/roadmaps/ai-red-teaming/content/llm-security-testing@xJYTRbPxMn0Xs5ea0Ygn6.md b/src/data/roadmaps/ai-red-teaming/content/llm-security-testing@xJYTRbPxMn0Xs5ea0Ygn6.md index ccd9745dd..d4edd7b4e 100644 --- a/src/data/roadmaps/ai-red-teaming/content/llm-security-testing@xJYTRbPxMn0Xs5ea0Ygn6.md +++ b/src/data/roadmaps/ai-red-teaming/content/llm-security-testing@xJYTRbPxMn0Xs5ea0Ygn6.md @@ -1 +1,9 @@ -# LLM Security Testing \ No newline at end of file +# LLM Security Testing + +The core application area for many AI Red Teamers today involves specifically testing Large Language Models for vulnerabilities like prompt injection, jailbreaking, harmful content generation, bias, and data privacy issues using specialized prompts and evaluation frameworks. + +Learn more from the following resources: + +- [@course@AI Red Teaming Courses - Learn Prompting](https://learnprompting.org/blog/ai-red-teaming-courses) - Courses focused on testing LLMs. +- [@dataset@SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity - arXiv](https://arxiv.org/abs/2412.20787) - Dataset for evaluating LLMs on security tasks. +- [@guide@The Ultimate Guide to Red Teaming LLMs and Adversarial Prompts (Kili Technology)](https://kili-technology.com/large-language-models-llms/red-teaming-llms-and-adversarial-prompts) - Guide specifically on red teaming LLMs. diff --git a/src/data/roadmaps/ai-red-teaming/content/model-inversion@iE5PcswBHnu_EBFIacib0.md b/src/data/roadmaps/ai-red-teaming/content/model-inversion@iE5PcswBHnu_EBFIacib0.md index db2599d89..0cdbed37b 100644 --- a/src/data/roadmaps/ai-red-teaming/content/model-inversion@iE5PcswBHnu_EBFIacib0.md +++ b/src/data/roadmaps/ai-red-teaming/content/model-inversion@iE5PcswBHnu_EBFIacib0.md @@ -1 +1,10 @@ -# Model Inversion \ No newline at end of file +# Model Inversion + +AI Red Teamers perform model inversion tests to assess if an attacker can reconstruct sensitive training data (like images, text snippets, or personal attributes) by repeatedly querying the model and analyzing its outputs. Success indicates privacy risks due to data memorization, requiring mitigation techniques like differential privacy or output filtering. + +Learn more from the following resources: + +- [@article@Model Inversion Attacks for ML (Medium)](https://medium.com/@ODSC/model-inversion-attacks-for-machine-learning-ff407a1b10d1) - Explanation with examples (e.g., face reconstruction). +- [@article@Model inversion and membership inference: Understanding new AI security risks - Hogan Lovells](https://www.hoganlovells.com/en/publications/model-inversion-and-membership-inference-understanding-new-ai-security-risks-and-mitigating-vulnerabilities) - Discusses risks and mitigation. +- [@paper@Extracting Training Data from LLMs (arXiv)](https://arxiv.org/abs/2012.07805) - Research demonstrating feasibility on LLMs. +- [@paper@Model Inversion Attacks: A Survey of Approaches and Countermeasures - arXiv](https://arxiv.org/html/2411.10023v1) - Comprehensive survey of model inversion attacks and defenses. diff --git a/src/data/roadmaps/ai-red-teaming/content/model-vulnerabilities@uBXrri2bXVsNiM8fIHHOv.md b/src/data/roadmaps/ai-red-teaming/content/model-vulnerabilities@uBXrri2bXVsNiM8fIHHOv.md index c9532891c..228d9246f 100644 --- a/src/data/roadmaps/ai-red-teaming/content/model-vulnerabilities@uBXrri2bXVsNiM8fIHHOv.md +++ b/src/data/roadmaps/ai-red-teaming/content/model-vulnerabilities@uBXrri2bXVsNiM8fIHHOv.md @@ -1 +1,9 @@ -# Model Vulnerabilities \ No newline at end of file +# Model Vulnerabilities + +This category covers attacks and tests targeting the AI model itself, beyond the prompt interface. AI Red Teamers investigate inherent weaknesses in the model's architecture, training data artifacts, or prediction mechanisms, such as susceptibility to data extraction, poisoning, or adversarial manipulation. + +Learn more from the following resources: + +- [@article@AI Security Risks Uncovered: What You Must Know in 2025 - TTMS](https://ttms.com/uk/ai-security-risks-explained-what-you-need-to-know-in-2025/) - Discusses adversarial attacks, data poisoning, and prototype theft. +- [@article@Attacking AI Models (Trail of Bits Blog Series)](https://blog.trailofbits.com/category/ai-security/) - Series discussing model-focused attacks. +- [@report@AI and ML Vulnerabilities (CNAS Report)](https://www.cnas.org/publications/reports/understanding-and-mitigating-ai-vulnerabilities) - Overview of known machine learning vulnerabilities. diff --git a/src/data/roadmaps/ai-red-teaming/content/model-weight-stealing@QFzLx5nc4rCCD8WVc20mo.md b/src/data/roadmaps/ai-red-teaming/content/model-weight-stealing@QFzLx5nc4rCCD8WVc20mo.md index 69009be05..9103c4cab 100644 --- a/src/data/roadmaps/ai-red-teaming/content/model-weight-stealing@QFzLx5nc4rCCD8WVc20mo.md +++ b/src/data/roadmaps/ai-red-teaming/content/model-weight-stealing@QFzLx5nc4rCCD8WVc20mo.md @@ -1 +1,10 @@ -# Model Weight Stealing \ No newline at end of file +# Model Weight Stealing + +AI Red Teamers assess the risk of attackers reconstructing or stealing the proprietary weights of a trained model, often through API query-based attacks. Testing involves simulating such attacks to understand how easily the model's functionality can be replicated, which informs defenses like query rate limiting, watermarking, or differential privacy. + +Learn more from the following resources: + +- [@article@A Playbook for Securing AI Model Weights - RAND](https://www.rand.org/pubs/research_briefs/RBA2849-1.html) - Discusses attack vectors and security levels for protecting model weights. +- [@article@How to Steal a Machine Learning Model (SkyCryptor)](https://skycryptor.com/blog/how-to-steal-a-machine-learning-model) - Explains model weight extraction via query attacks. +- [@paper@Defense Against Model Stealing (Microsoft Research)](https://www.microsoft.com/en-us/research/publication/defense-against-model-stealing-attacks/) - Research on detecting and defending against model stealing. +- [@paper@On the Limitations of Model Stealing with Uncertainty Quantification Models - OpenReview](https://openreview.net/pdf?id=ONRFHoUzNk) - Research exploring model stealing techniques. diff --git a/src/data/roadmaps/ai-red-teaming/content/monitoring-solutions@59lkLcoqV4gq7f8Zm0X2p.md b/src/data/roadmaps/ai-red-teaming/content/monitoring-solutions@59lkLcoqV4gq7f8Zm0X2p.md index 9c7be42ca..ae4bd0cfc 100644 --- a/src/data/roadmaps/ai-red-teaming/content/monitoring-solutions@59lkLcoqV4gq7f8Zm0X2p.md +++ b/src/data/roadmaps/ai-red-teaming/content/monitoring-solutions@59lkLcoqV4gq7f8Zm0X2p.md @@ -1 +1,11 @@ -# Monitoring Solutions \ No newline at end of file +# Monitoring Solutions + +AI Red Teamers interact with monitoring tools primarily to test their effectiveness (evasion) or potentially exploit vulnerabilities within them. Understanding tools like IDS (Snort, Suricata), network analyzers (Wireshark), and SIEMs helps red teamers simulate attacks that might bypass or target these defensive systems. + +Learn more from the following resources: + +- [@article@Open Source IDS Tools: Comparing Suricata, Snort, Bro (Zeek), Linux - LevelBlue](https://levelblue.com/blogs/security-essentials/open-source-intrusion-detection-tools-a-quick-overview) - Comparison of common open source monitoring tools. +- [@tool@Snort](https://www.snort.org/) - Open source IDS/IPS. +- [@tool@Suricata](https://suricata.io/) - Open source IDS/IPS/NSM. +- [@tool@Wireshark](https://www.wireshark.org/) - Network protocol analyzer. +- [@tool@Zeek (formerly Bro)](https://zeek.org/) - Network security monitoring framework. diff --git a/src/data/roadmaps/ai-red-teaming/content/neural-networks@RuKzVhd1nZphCrlW1wZGL.md b/src/data/roadmaps/ai-red-teaming/content/neural-networks@RuKzVhd1nZphCrlW1wZGL.md index e9fcdbc74..1dc11b008 100644 --- a/src/data/roadmaps/ai-red-teaming/content/neural-networks@RuKzVhd1nZphCrlW1wZGL.md +++ b/src/data/roadmaps/ai-red-teaming/content/neural-networks@RuKzVhd1nZphCrlW1wZGL.md @@ -1 +1,9 @@ -# Neural Networks \ No newline at end of file +# Neural Networks + +Understanding neural network architectures (layers, nodes, activation functions) is vital for AI Red Teamers. This knowledge allows for targeted testing, such as crafting adversarial examples that exploit specific activation functions or identifying potential vulnerabilities related to network depth or connectivity. It provides insight into the 'black box' for more effective white/grey-box testing. + +Learn more from the following resources: + +- [@guide@Neural Networks Explained: A Beginner's Guide - SkillCamper](https://www.skillcamper.com/blog/neural-networks-explained-a-beginners-guide) - Foundational guide. +- [@guide@Neural networks | Machine Learning - Google for Developers](https://developers.google.com/machine-learning/crash-course/neural-networks) - Google's explanation within their ML crash course. +- [@paper@Red Teaming with Artificial Intelligence-Driven Cyberattacks: A Scoping Review - arXiv](https://arxiv.org/html/2503.19626) - Review discussing AI methods like neural networks used in red teaming simulations. diff --git a/src/data/roadmaps/ai-red-teaming/content/prompt-engineering@gx4KaFqKgJX9n9_ZGMqlZ.md b/src/data/roadmaps/ai-red-teaming/content/prompt-engineering@gx4KaFqKgJX9n9_ZGMqlZ.md index 2d10f0367..916aa0891 100644 --- a/src/data/roadmaps/ai-red-teaming/content/prompt-engineering@gx4KaFqKgJX9n9_ZGMqlZ.md +++ b/src/data/roadmaps/ai-red-teaming/content/prompt-engineering@gx4KaFqKgJX9n9_ZGMqlZ.md @@ -1 +1,11 @@ -# Prompt Engineering \ No newline at end of file +# Prompt Engineering + +For AI Red Teamers, prompt engineering is both a tool and a target. It's a tool for crafting inputs to test model boundaries and vulnerabilities (e.g., creating jailbreak prompts). It's a target because understanding how prompts influence LLMs is key to identifying prompt injection vulnerabilities and designing defenses. Mastering prompt design is fundamental to effective LLM red teaming. + +Learn more from the following resources: + +- [@article@Introduction to Prompt Engineering - Datacamp](https://www.datacamp.com/tutorial/introduction-prompt-engineering) - Tutorial covering basics. +- [@article@System Prompts - InjectPrompt](https://www.injectprompt.com/t/system-prompts) - Look at the system prompts of flagship LLMs. +- [@course@Introduction to Prompt Engineering - Learn Prompting](https://learnprompting.org/courses/intro-to-prompt-engineering) - Foundational course from Learn Prompting. +- [@guide@Prompt Engineering Guide - Learn Prompting](https://learnprompting.org/docs/prompt-engineering) - Comprehensive guide from Learn Prompting. +- [@guide@The Ultimate Guide to Red Teaming LLMs and Adversarial Prompts (Kili Technology)](https://kili-technology.com/large-language-models-llms/red-teaming-llms-and-adversarial-prompts) - Connects prompt engineering directly to LLM red teaming concepts. diff --git a/src/data/roadmaps/ai-red-teaming/content/prompt-hacking@1Xr7mxVekeAHzTL7G4eAZ.md b/src/data/roadmaps/ai-red-teaming/content/prompt-hacking@1Xr7mxVekeAHzTL7G4eAZ.md index 1d4b5e7fa..444638853 100644 --- a/src/data/roadmaps/ai-red-teaming/content/prompt-hacking@1Xr7mxVekeAHzTL7G4eAZ.md +++ b/src/data/roadmaps/ai-red-teaming/content/prompt-hacking@1Xr7mxVekeAHzTL7G4eAZ.md @@ -1 +1,9 @@ -# Prompt Hacking \ No newline at end of file +# Prompt Hacking + +Prompt hacking is a core technique for AI Red Teamers targeting LLMs. It involves crafting inputs (prompts) to manipulate the model into bypassing safety controls, revealing hidden information, or performing unintended actions. Red teamers systematically test various prompt hacking methods (like jailbreaking, role-playing, or instruction manipulation) to assess the LLM's resilience against adversarial user input. + +Learn more from the following resources: + +- [@course@Introduction to Prompt Hacking - Learn Prompting](https://learnprompting.org/courses/intro-to-prompt-hacking) - Free introductory course. +- [@guide@Prompt Hacking Guide - Learn Prompting](https://learnprompting.org/docs/category/prompt-hacking) - Detailed guide covering techniques. +- [@paper@SoK: Prompt Hacking of LLMs (arXiv 2023)](https://arxiv.org/abs/2311.05544) - Comprehensive research overview of prompt hacking types and techniques. diff --git a/src/data/roadmaps/ai-red-teaming/content/prompt-injection@XOrAPDRhBvde9R-znEipH.md b/src/data/roadmaps/ai-red-teaming/content/prompt-injection@XOrAPDRhBvde9R-znEipH.md index 7227af039..7e2ceceef 100644 --- a/src/data/roadmaps/ai-red-teaming/content/prompt-injection@XOrAPDRhBvde9R-znEipH.md +++ b/src/data/roadmaps/ai-red-teaming/content/prompt-injection@XOrAPDRhBvde9R-znEipH.md @@ -1 +1,11 @@ -# Prompt Injection \ No newline at end of file +# Prompt Injection + +Prompt injection is a critical vulnerability tested by AI Red Teamers. They attempt to insert instructions into the LLM's input that override its intended system prompt or task, causing it to perform unauthorized actions, leak data, or generate malicious output. This tests the model's ability to distinguish trusted instructions from potentially harmful user/external input. + +Learn more from the following resources: + +- [@article@Prompt Injection & the Rise of Prompt Attacks: All You Need to Know | Lakera](https://www.lakera.ai/blog/guide-to-prompt-injection) - Guide covering different types of prompt attacks. +- [@article@Prompt Injection (Learn Prompting)](https://learnprompting.org/docs/prompt_hacking/injection) - Learn Prompting article describing prompt injection with examples and mitigation strategies. +- [@article@Prompt Injection Attack Explanation (IBM)](https://research.ibm.com/blog/prompt-injection-attacks-against-llms) - Explains what prompt injections are and how they work. +- [@article@Prompt Injection: Impact, How It Works & 4 Defense Measures - Tigera](https://www.tigera.io/learn/guides/llm-security/prompt-injection/) - Overview of impact and defenses. +- [@course@Advanced Prompt Hacking - Learn Prompting](https://learnprompting.org/courses/advanced-prompt-hacking) - Covers advanced injection techniques. diff --git a/src/data/roadmaps/ai-red-teaming/content/red-team-simulations@DpYsL0du37n40toH33fIr.md b/src/data/roadmaps/ai-red-teaming/content/red-team-simulations@DpYsL0du37n40toH33fIr.md index fd3d564b7..1d4a28898 100644 --- a/src/data/roadmaps/ai-red-teaming/content/red-team-simulations@DpYsL0du37n40toH33fIr.md +++ b/src/data/roadmaps/ai-red-teaming/content/red-team-simulations@DpYsL0du37n40toH33fIr.md @@ -1 +1,9 @@ -# Red Team Simulations \ No newline at end of file +# Red Team Simulations + +Participating in or conducting structured red team simulations against AI systems (or components) provides the most realistic practice. This involves applying methodologies, TTPs (Tactics, Techniques, and Procedures), reconnaissance, exploitation, and reporting within a defined scope and objective, specifically targeting AI vulnerabilities. + +Learn more from the following resources: + +- [@guide@A Simple Guide to Successful Red Teaming - Cobalt Strike](https://www.cobaltstrike.com/resources/guides/a-simple-guide-to-successful-red-teaming) - General guide adaptable to AI context. +- [@guide@The Complete Guide to Red Teaming: Process, Benefits & More - Mindgard AI](https://mindgard.ai/blog/red-teaming) - Overview of red teaming process. +- [@guide@The Complete Red Teaming Checklist [PDF]: 5 Key Steps - Mindgard AI](https://mindgard.ai/blog/red-teaming-checklist) - Checklist for planning engagements. diff --git a/src/data/roadmaps/ai-red-teaming/content/reinforcement-learning@Xqzc4mOKsVzwaUxLGjHya.md b/src/data/roadmaps/ai-red-teaming/content/reinforcement-learning@Xqzc4mOKsVzwaUxLGjHya.md index 998d9bd63..f7bc2a322 100644 --- a/src/data/roadmaps/ai-red-teaming/content/reinforcement-learning@Xqzc4mOKsVzwaUxLGjHya.md +++ b/src/data/roadmaps/ai-red-teaming/content/reinforcement-learning@Xqzc4mOKsVzwaUxLGjHya.md @@ -1 +1,10 @@ -# Reinforcement Learning \ No newline at end of file +# Reinforcement Learning + +Red teaming RL-based AI systems involves testing for vulnerabilities such as reward hacking (exploiting the reward function to induce unintended behavior), unsafe exploration (agent takes harmful actions during learning), or susceptibility to adversarial perturbations in the environment's state. Understanding the agent's policy and value functions is crucial for designing effective tests against RL agents. + +Learn more from the following resources: + +- [@article@Best Resources to Learn Reinforcement Learning - Towards Data Science](https://towardsdatascience.com/best-free-courses-and-resources-to-learn-reinforcement-learning-ed6633608cb2/) - Curated list of RL learning resources. +- [@article@What is reinforcement learning? - Blog - York Online Masters degrees](https://online.york.ac.uk/resources/what-is-reinforcement-learning/) - Foundational explanation. +- [@course@Deep Reinforcement Learning Course by HuggingFace](https://huggingface.co/learn/deep-rl-course/unit0/introduction) - Comprehensive free course on Deep RL. +- [@paper@Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning - arXiv](https://arxiv.org/html/2412.18693v1) - Research on using RL for red teaming and generating attacks. diff --git a/src/data/roadmaps/ai-red-teaming/content/remote-code-execution@kgDsDlBk8W2aM6LyWpFY8.md b/src/data/roadmaps/ai-red-teaming/content/remote-code-execution@kgDsDlBk8W2aM6LyWpFY8.md index 10a5f3987..98adde85c 100644 --- a/src/data/roadmaps/ai-red-teaming/content/remote-code-execution@kgDsDlBk8W2aM6LyWpFY8.md +++ b/src/data/roadmaps/ai-red-teaming/content/remote-code-execution@kgDsDlBk8W2aM6LyWpFY8.md @@ -1 +1,9 @@ -# Remote Code Execution \ No newline at end of file +# Remote Code Execution + +AI Red Teamers attempt to achieve RCE on systems hosting or interacting with AI models. This could involve exploiting vulnerabilities in the AI framework itself, the web server, connected APIs, or tricking an AI agent with code execution capabilities into running malicious commands provided via prompts. RCE is often the ultimate goal of exploiting other vulnerabilities like code injection or insecure deserialization. + +Learn more from the following resources: + +- [@article@Exploiting LLMs with Code Execution (GitHub Gist)](https://gist.github.com/coolaj86/6f4f7b30129b0251f61fa7baaa881516) - Example of achieving code execution via LLM manipulation. +- [@article@What is remote code execution? - Cloudflare](https://www.cloudflare.com/learning/security/what-is-remote-code-execution/) - Definition and explanation of RCE. +- [@video@DEFCON 31 - AI Village - Hacking an LLM embedded system (agent) - Johann Rehberger](https://www.google.com/search?q=https://www.youtube.com/watch%3Fv%3D6u04C1N69ks?v=1FfYnF2GXVU) - Demonstrates RCE risks with LLM agents. diff --git a/src/data/roadmaps/ai-red-teaming/content/reporting-tools@BLnfNlA0C4yzy1dvifjwx.md b/src/data/roadmaps/ai-red-teaming/content/reporting-tools@BLnfNlA0C4yzy1dvifjwx.md index b4885e59a..7c2ceea41 100644 --- a/src/data/roadmaps/ai-red-teaming/content/reporting-tools@BLnfNlA0C4yzy1dvifjwx.md +++ b/src/data/roadmaps/ai-red-teaming/content/reporting-tools@BLnfNlA0C4yzy1dvifjwx.md @@ -1 +1,9 @@ -# Reporting Tools \ No newline at end of file +# Reporting Tools + +AI Red Teamers use reporting techniques and potentially tools to clearly document their findings, including discovered vulnerabilities, successful exploit steps (e.g., effective prompts), assessed impact, and actionable recommendations tailored to AI systems. Good reporting translates technical findings into understandable risks for stakeholders. + +Learn more from the following resources: + +- [@article@The Complete Red Teaming Checklist [PDF]: 5 Key Steps - Mindgard AI](https://mindgard.ai/blog/red-teaming-checklist) (Mentions reporting and templates) +- [@guide@Penetration Testing Report: 6 Key Sections and 4 Best Practices - Bright Security](https://brightsec.com/blog/penetration-testing-report/) - General best practices for reporting security findings. +- [@guide@Penetration testing best practices: Strategies for all test types - Strike Graph](https://www.strikegraph.com/blog/pen-testing-best-practices) - Includes tips on documentation. diff --git a/src/data/roadmaps/ai-red-teaming/content/research-groups@ZlR03pM-sqVFZNhD1gMSJ.md b/src/data/roadmaps/ai-red-teaming/content/research-groups@ZlR03pM-sqVFZNhD1gMSJ.md index 8a1fbf925..2a326a2c5 100644 --- a/src/data/roadmaps/ai-red-teaming/content/research-groups@ZlR03pM-sqVFZNhD1gMSJ.md +++ b/src/data/roadmaps/ai-red-teaming/content/research-groups@ZlR03pM-sqVFZNhD1gMSJ.md @@ -1 +1,10 @@ -# Research Groups \ No newline at end of file +# Research Groups + +Following and potentially contributing to research groups at universities (like CMU, Stanford, Oxford), non-profits (like OpenAI, Anthropic), or government bodies (like UK's AISI) focused on AI safety, security, and alignment provides deep insights into emerging threats and mitigation strategies relevant to AI Red Teaming. + +Learn more from the following resources: + +- [@group@AI Cybersecurity | Global Cyber Security Capacity Centre (Oxford)](https://gcscc.ox.ac.uk/ai-security) - Academic research center. +- [@group@Anthropic Research](https://www.anthropic.com/research) - AI safety research lab. +- [@group@Center for AI Safety](https://www.safe.ai/) - Non-profit research organization. +- [@group@The AI Security Institute (AISI)](https://www.aisi.gov.uk/) - UK government institute focused on AI safety/security research. diff --git a/src/data/roadmaps/ai-red-teaming/content/research-opportunities@VmaIHVsCpq2um_0cA33V3.md b/src/data/roadmaps/ai-red-teaming/content/research-opportunities@VmaIHVsCpq2um_0cA33V3.md index 5841ae1c3..b4ce5312e 100644 --- a/src/data/roadmaps/ai-red-teaming/content/research-opportunities@VmaIHVsCpq2um_0cA33V3.md +++ b/src/data/roadmaps/ai-red-teaming/content/research-opportunities@VmaIHVsCpq2um_0cA33V3.md @@ -1 +1,9 @@ -# Research Opportunities \ No newline at end of file +# Research Opportunities + +AI Red Teaming relies on ongoing research. Key areas needing further investigation include scalable methods for finding elusive vulnerabilities, understanding emergent behaviors in complex models, developing provable safety guarantees, creating better benchmarks for AI security, and exploring the socio-technical aspects of AI misuse and defense. + +Learn more from the following resources: + +- [@article@Cutting-Edge Research on AI Security bolstered with new Challenge Fund - GOV.UK](https://www.gov.uk/government/news/cutting-edge-research-on-ai-security-bolstered-with-new-challenge-fund-to-ramp-up-public-trust-and-adoption) - Highlights government funding for AI security research priorities. +- [@research@Careers | The AI Security Institute (AISI)](https://www.aisi.gov.uk/careers) - Outlines research focus areas for the UK's AISI. +- [@research@Research - Anthropic](https://www.anthropic.com/research) - Example of research areas at a leading AI safety lab. diff --git a/src/data/roadmaps/ai-red-teaming/content/responsible-disclosure@KAcCZ3zcv25R6HwzAsfUG.md b/src/data/roadmaps/ai-red-teaming/content/responsible-disclosure@KAcCZ3zcv25R6HwzAsfUG.md index f9e2753fc..9aa3ce198 100644 --- a/src/data/roadmaps/ai-red-teaming/content/responsible-disclosure@KAcCZ3zcv25R6HwzAsfUG.md +++ b/src/data/roadmaps/ai-red-teaming/content/responsible-disclosure@KAcCZ3zcv25R6HwzAsfUG.md @@ -1 +1,9 @@ -# Responsible Disclosure \ No newline at end of file +# Responsible Disclosure + +A critical practice for AI Red Teamers is responsible disclosure: privately reporting discovered AI vulnerabilities (e.g., a successful jailbreak, data leak method, or severe bias) to the model developers or system owners, allowing them time to remediate before any public discussion, thus preventing malicious exploitation. + +Learn more from the following resources: + +- [@guide@Responsible Disclosure of AI Vulnerabilities - Preamble AI](https://www.preamble.com/blog/responsible-disclosure-of-ai-vulnerabilities) - Discusses the process specifically for AI vulnerabilities. +- [@guide@Vulnerability Disclosure Program | CISA](https://www.cisa.gov/resources-tools/programs/vulnerability-disclosure-program-vdp) - Government VDP example. +- [@policy@Google Vulnerability Reward Program (VRP)](https://bughunters.google.com/) - Example of a major tech company's VDP/bug bounty program. diff --git a/src/data/roadmaps/ai-red-teaming/content/risk-management@MupRvk_8Io2Hn7yEvU663.md b/src/data/roadmaps/ai-red-teaming/content/risk-management@MupRvk_8Io2Hn7yEvU663.md index bc1aa5e0e..ef1aad6ea 100644 --- a/src/data/roadmaps/ai-red-teaming/content/risk-management@MupRvk_8Io2Hn7yEvU663.md +++ b/src/data/roadmaps/ai-red-teaming/content/risk-management@MupRvk_8Io2Hn7yEvU663.md @@ -1 +1,9 @@ -# Risk Management \ No newline at end of file +# Risk Management + +AI Red Teamers contribute to the AI risk management process by identifying and demonstrating concrete vulnerabilities. Findings from red team exercises inform risk assessments, helping organizations understand the likelihood and potential impact of specific AI threats and prioritize resources for mitigation based on demonstrated exploitability. + +Learn more from the following resources: + +- [@framework@NIST AI Risk Management Framework](https://www.nist.gov/itl/ai-risk-management-framework) - Key framework for managing AI-specific risks. +- [@guide@A Beginner's Guide to Cybersecurity Risks and Vulnerabilities - Champlain College Online](https://online.champlain.edu/blog/beginners-guide-cybersecurity-risk-management) - Foundational understanding of risk. +- [@guide@Cybersecurity Risk Management: Frameworks, Plans, and Best Practices - Hyperproof](https://hyperproof.io/resource/cybersecurity-risk-management-process/) - General guide applicable to AI system context. diff --git a/src/data/roadmaps/ai-red-teaming/content/robust-model-design@6gEHMhh6BGJI-ZYN27YPW.md b/src/data/roadmaps/ai-red-teaming/content/robust-model-design@6gEHMhh6BGJI-ZYN27YPW.md index 793ae2fd1..a0532f58a 100644 --- a/src/data/roadmaps/ai-red-teaming/content/robust-model-design@6gEHMhh6BGJI-ZYN27YPW.md +++ b/src/data/roadmaps/ai-red-teaming/content/robust-model-design@6gEHMhh6BGJI-ZYN27YPW.md @@ -1 +1,9 @@ -# Robust Model Design \ No newline at end of file +# Robust Model Design + +AI Red Teamers assess whether choices made during model design (architecture selection, regularization techniques, ensemble methods) effectively contribute to robustness against anticipated attacks. They test if these design choices actually prevent common failure modes identified during threat modeling. + +Learn more from the following resources: + +- [@article@Model Robustness: Building Reliable AI Models - Encord](https://encord.com/blog/model-robustness-machine-learning-strategies/) - Discusses strategies for building robust models. +- [@article@Understanding Robustness in Machine Learning - Alooba](https://www.alooba.com/skills/concepts/machine-learning/robustness/) - Explains the concept of ML robustness. +- [@paper@Towards Evaluating the Robustness of Neural Networks (arXiv by Goodfellow et al.)](https://arxiv.org/abs/1608.04644) - Foundational paper on evaluating robustness. diff --git a/src/data/roadmaps/ai-red-teaming/content/role-of-red-teams@Irkc9DgBfqSn72WaJqXEt.md b/src/data/roadmaps/ai-red-teaming/content/role-of-red-teams@Irkc9DgBfqSn72WaJqXEt.md index 92742bd3b..81bb09296 100644 --- a/src/data/roadmaps/ai-red-teaming/content/role-of-red-teams@Irkc9DgBfqSn72WaJqXEt.md +++ b/src/data/roadmaps/ai-red-teaming/content/role-of-red-teams@Irkc9DgBfqSn72WaJqXEt.md @@ -1 +1,9 @@ -# Role of Red Teams \ No newline at end of file +# Role of Red Teams + +The role of an AI Red Team is to rigorously challenge AI systems from an adversarial perspective. They design and execute tests to uncover vulnerabilities related to the model's logic, data dependencies, prompt interfaces, safety alignments, and interactions with surrounding infrastructure. They provide detailed reports on findings, potential impacts, and remediation advice, acting as a critical feedback loop for AI developers and stakeholders to improve system security and trustworthiness before and after deployment. + +Learn more from the following resources: + +- [@article@The Complete Guide to Red Teaming: Process, Benefits & More - Mindgard AI](https://mindgard.ai/blog/red-teaming) - Discusses the purpose and process of red teaming. +- [@article@The Complete Red Teaming Checklist [PDF]: 5 Key Steps - Mindgard AI](https://mindgard.ai/blog/red-teaming-checklist) - Outlines typical red team roles and responsibilities. +- [@article@What is AI Red Teaming? - Learn Prompting](https://learnprompting.org/docs/category/ai-red-teaming) - Defines the role and activities. diff --git a/src/data/roadmaps/ai-red-teaming/content/safety-filter-bypasses@j7uLLpt8MkZ1rqM7UBPW4.md b/src/data/roadmaps/ai-red-teaming/content/safety-filter-bypasses@j7uLLpt8MkZ1rqM7UBPW4.md index 3b20ce7e0..1415fc509 100644 --- a/src/data/roadmaps/ai-red-teaming/content/safety-filter-bypasses@j7uLLpt8MkZ1rqM7UBPW4.md +++ b/src/data/roadmaps/ai-red-teaming/content/safety-filter-bypasses@j7uLLpt8MkZ1rqM7UBPW4.md @@ -1 +1,9 @@ -# Safety Filter Bypasses \ No newline at end of file +# Safety Filter Bypasses + +AI Red Teamers specifically target the safety mechanisms (filters, guardrails) implemented within or around an AI model. They test techniques like using synonyms for blocked words, employing different languages, embedding harmful requests within harmless text, or using character-level obfuscation to evade detection and induce the model to generate prohibited content, thereby assessing the robustness of the safety controls. + +Learn more from the following resources: + +- [@article@Bypassing AI Content Filters | Restackio](https://www.restack.io/p/ai-driven-content-moderation-answer-bypass-filters-cat-ai) - Discusses techniques for evasion. +- [@article@How to Bypass Azure AI Content Safety Guardrails - Mindgard](https://mindgard.ai/blog/bypassing-azure-ai-content-safety-guardrails) - Case study on bypassing specific safety mechanisms. +- [@article@The Best Methods to Bypass AI Detection: Tips and Techniques - PopAi](https://www.popai.pro/resources/the-best-methods-to-bypass-ai-detection-tips-and-techniques/) - Focuses on evasion, relevant for filter bypass testing. diff --git a/src/data/roadmaps/ai-red-teaming/content/specialized-courses@s1xKK8HL5-QGZpcutiuvj.md b/src/data/roadmaps/ai-red-teaming/content/specialized-courses@s1xKK8HL5-QGZpcutiuvj.md index aa116eb04..fc0a4d2b3 100644 --- a/src/data/roadmaps/ai-red-teaming/content/specialized-courses@s1xKK8HL5-QGZpcutiuvj.md +++ b/src/data/roadmaps/ai-red-teaming/content/specialized-courses@s1xKK8HL5-QGZpcutiuvj.md @@ -1 +1,10 @@ -# Specialized Courses \ No newline at end of file +# Specialized Courses + +Targeted training is crucial for mastering AI Red Teaming. Look for courses covering adversarial ML, prompt hacking, LLM security, ethical hacking for AI, and specific red teaming methodologies applied to AI systems offered by platforms like Learn Prompting, Coursera, or security training providers. + +Learn more from the following resources: + +- [@course@AI Red Teaming Courses - Learn Prompting](https://learnprompting.org/blog/ai-red-teaming-courses) - Curated list including free and paid options. +- [@course@AI Security | Coursera](https://www.coursera.org/learn/ai-security) - Covers AI security risks and governance. +- [@course@Exploring Adversarial Machine Learning - NVIDIA](https://www.nvidia.com/en-us/training/instructor-led-workshops/exploring-adversarial-machine-learning/) - Focused training on adversarial ML (paid). +- [@course@Free Online Cyber Security Courses with Certificates in 2025 - EC-Council](https://www.eccouncil.org/cybersecurity-exchange/cyber-novice/free-cybersecurity-courses-beginners/) - Offers foundational cybersecurity courses. diff --git a/src/data/roadmaps/ai-red-teaming/content/supervised-learning@NvOJIv36Utpm7_kOZyr79.md b/src/data/roadmaps/ai-red-teaming/content/supervised-learning@NvOJIv36Utpm7_kOZyr79.md index cc80ec827..3a596965e 100644 --- a/src/data/roadmaps/ai-red-teaming/content/supervised-learning@NvOJIv36Utpm7_kOZyr79.md +++ b/src/data/roadmaps/ai-red-teaming/content/supervised-learning@NvOJIv36Utpm7_kOZyr79.md @@ -1 +1,9 @@ -# Supervised Learning \ No newline at end of file +# Supervised Learning + +AI Red Teamers analyze systems built using supervised learning to probe for vulnerabilities like susceptibility to adversarial examples designed to cause misclassification, sensitivity to data distribution shifts, or potential for data leakage related to the labeled training data. Understanding how these models learn input-output mappings is key to devising tests that challenge their learned boundaries. + +Learn more from the following resources: + +- [@article@AI and cybersecurity: a love-hate revolution - Alter Solutions](https://www.alter-solutions.com/en-us/articles/ai-cybersecurity-love-hate-revolution) - Discusses supervised learning use in vulnerability scanning and potential exploits. +- [@article@What Is Supervised Learning? | IBM](https://www.ibm.com/think/topics/supervised-learning) - Foundational explanation. +- [@article@What is Supervised Learning? | Google Cloud](https://cloud.google.com/discover/what-is-supervised-learning) - Foundational explanation. diff --git a/src/data/roadmaps/ai-red-teaming/content/testing-platforms@c8n8FcYKDOgPLQvV9xF5J.md b/src/data/roadmaps/ai-red-teaming/content/testing-platforms@c8n8FcYKDOgPLQvV9xF5J.md index 1138bca53..5646b984d 100644 --- a/src/data/roadmaps/ai-red-teaming/content/testing-platforms@c8n8FcYKDOgPLQvV9xF5J.md +++ b/src/data/roadmaps/ai-red-teaming/content/testing-platforms@c8n8FcYKDOgPLQvV9xF5J.md @@ -1 +1,11 @@ -# Testing Platforms \ No newline at end of file +# Testing Platforms + +Platforms used by AI Red Teamers range from general penetration testing OS distributions like Kali Linux to specific AI red teaming tools/frameworks like Microsoft's PyRIT or Promptfoo, and vulnerability scanners like OWASP ZAP adapted for API testing of AI services. These platforms provide the toolsets needed to conduct assessments. + +Learn more from the following resources: + +- [@tool@AI Red Teaming Agent - Azure AI Foundry | Microsoft Learn](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/ai-red-teaming-agent) - Microsoft's tool leveraging PyRIT. +- [@tool@Kali Linux](https://www.kali.org/) - Standard pentesting distribution. +- [@tool@OWASP Zed Attack Proxy (ZAP)](https://owasp.org/www-project-zap/) - Widely used for web/API security testing. +- [@tool@Promptfoo](https://www.promptfoo.dev/) - Open-source tool for testing and evaluating LLMs, includes red teaming features. +- [@tool@PyRIT (Python Risk Identification Tool for generative AI) - GitHub](https://github.com/Azure/PyRIT) - Open-source framework from Microsoft. diff --git a/src/data/roadmaps/ai-red-teaming/content/threat-modeling@RDOaTBWP3aIJPUp_kcafm.md b/src/data/roadmaps/ai-red-teaming/content/threat-modeling@RDOaTBWP3aIJPUp_kcafm.md index 4d4e54d40..8349d5d2a 100644 --- a/src/data/roadmaps/ai-red-teaming/content/threat-modeling@RDOaTBWP3aIJPUp_kcafm.md +++ b/src/data/roadmaps/ai-red-teaming/content/threat-modeling@RDOaTBWP3aIJPUp_kcafm.md @@ -1 +1,10 @@ -# Threat Modeling \ No newline at end of file +# Threat Modeling + +AI Red Teams apply threat modeling to identify unique attack surfaces in AI systems, such as manipulating training data, exploiting prompt interfaces, attacking the model inference process, or compromising connected tools/APIs. Before attacking an AI system, red teamers perform threat modeling to map out possible adversaries (from curious users to state actors) and attack vectors, prioritizing tests based on likely impact and adversary capability. + +Learn more from the following resources: + +- [@article@Core Components of AI Red Team Exercises (Learn Prompting)](https://learnprompting.org/blog/what-is-ai-red-teaming) - Describes threat modeling as the first phase of an AI red team engagement. +- [@guide@Threat Modeling Process | OWASP Foundation](https://owasp.org/www-community/Threat_Modeling_Process) - More detailed process steps. +- [@guide@Threat Modeling | OWASP Foundation](https://owasp.org/www-community/Threat_Modeling) - General threat modeling process applicable to AI context. +- [@video@How Microsoft Approaches AI Red Teaming (MS Build)](https://learn.microsoft.com/en-us/events/build-may-2023/breakout-responsible-ai-red-teaming/) - Video on Microsoft’s AI red team process, including threat modeling specific to AI. diff --git a/src/data/roadmaps/ai-red-teaming/content/unauthorized-access@DQeOavZCoXpF3k_qRDABs.md b/src/data/roadmaps/ai-red-teaming/content/unauthorized-access@DQeOavZCoXpF3k_qRDABs.md index f589e9da8..abefc0554 100644 --- a/src/data/roadmaps/ai-red-teaming/content/unauthorized-access@DQeOavZCoXpF3k_qRDABs.md +++ b/src/data/roadmaps/ai-red-teaming/content/unauthorized-access@DQeOavZCoXpF3k_qRDABs.md @@ -1 +1,9 @@ -# Unauthorized Access \ No newline at end of file +# Unauthorized Access + +AI Red Teamers test if vulnerabilities in the AI system or its interfaces allow attackers to gain unauthorized access to data, functionalities, or underlying infrastructure. This includes attempting privilege escalation via prompts, exploiting insecure API endpoints connected to the AI, or manipulating the AI to access restricted system resources. + +Learn more from the following resources: + +- [@article@Unauthorized Data Access via LLMs (Security Boulevard)](https://securityboulevard.com/2023/11/unauthorized-data-access-via-llms/) - Discusses risks of LLMs accessing unauthorized data. +- [@guide@OWASP API Security Project](https://owasp.org/www-project-api-security/) - Covers API risks like broken access control relevant to AI systems. +- [@paper@AI System Abuse Cases (Harvard Belfer Center)](https://www.belfercenter.org/publication/ai-system-abuse-cases) - Covers various ways AI systems can be abused, including access violations. diff --git a/src/data/roadmaps/ai-red-teaming/content/unsupervised-learning@ZC0yKsu-CJC-LZKKo2pLD.md b/src/data/roadmaps/ai-red-teaming/content/unsupervised-learning@ZC0yKsu-CJC-LZKKo2pLD.md index a0216c8b4..d9b11f673 100644 --- a/src/data/roadmaps/ai-red-teaming/content/unsupervised-learning@ZC0yKsu-CJC-LZKKo2pLD.md +++ b/src/data/roadmaps/ai-red-teaming/content/unsupervised-learning@ZC0yKsu-CJC-LZKKo2pLD.md @@ -1 +1,8 @@ -# Unsupervised Learning \ No newline at end of file +# Unsupervised Learning + +When red teaming AI systems using unsupervised learning (e.g., clustering algorithms), focus areas include assessing whether the discovered patterns reveal sensitive information, if the model can be manipulated to group data incorrectly, or if dimensionality reduction techniques obscure security-relevant features. Understanding these models helps identify risks associated with pattern discovery on unlabeled data. + +Learn more from the following resources: + +- [@article@How Unsupervised Learning Works with Examples - Coursera](https://www.coursera.org/articles/unsupervised-learning) - Foundational explanation with examples. +- [@article@Supervised vs. Unsupervised Learning: Which Approach is Best? - DigitalOcean](https://www.digitalocean.com/resources/articles/supervised-vs-unsupervised-learning) - Contrasts learning types, relevant for understanding different attack surfaces. diff --git a/src/data/roadmaps/ai-red-teaming/content/vulnerability-assessment@887lc3tWCRH-sOHSxWgWJ.md b/src/data/roadmaps/ai-red-teaming/content/vulnerability-assessment@887lc3tWCRH-sOHSxWgWJ.md index 5666706cd..be2d34566 100644 --- a/src/data/roadmaps/ai-red-teaming/content/vulnerability-assessment@887lc3tWCRH-sOHSxWgWJ.md +++ b/src/data/roadmaps/ai-red-teaming/content/vulnerability-assessment@887lc3tWCRH-sOHSxWgWJ.md @@ -1 +1,9 @@ -# Vulnerability Assessment \ No newline at end of file +# Vulnerability Assessment + +While general vulnerability assessment scans infrastructure, AI Red Teaming extends this to assess vulnerabilities specific to the AI model and its unique interactions. This includes probing for prompt injection flaws, testing for adversarial example robustness, checking for data privacy leaks, and evaluating safety alignment failures – weaknesses not typically found by standard IT vulnerability scanners. + +Learn more from the following resources: + +- [@article@AI red-teaming in critical infrastructure: Boosting security and trust in AI systems - DNV](https://www.dnv.com/article/ai-red-teaming-for-critical-infrastructure-industries/) - Discusses vulnerability assessment within AI red teaming for critical systems. +- [@guide@The Ultimate Guide to Vulnerability Assessment - Strobes Security](https://strobes.co/blog/guide-vulnerability-assessment/) - Comprehensive guide on VA process (apply concepts to AI). +- [@guide@Vulnerability Scanning Tools | OWASP Foundation](https://owasp.org/www-community/Vulnerability_Scanning_Tools) - List of tools useful in broader system assessment around AI. diff --git a/src/data/roadmaps/ai-red-teaming/content/white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md b/src/data/roadmaps/ai-red-teaming/content/white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md index d0cc507ce..ef2567934 100644 --- a/src/data/roadmaps/ai-red-teaming/content/white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md +++ b/src/data/roadmaps/ai-red-teaming/content/white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md @@ -1 +1,9 @@ -# White Box Testing \ No newline at end of file +# White Box Testing + +White-box testing in AI Red Teaming grants the tester full access to the model's internals (architecture, weights, training data, source code). This allows for highly targeted attacks, such as crafting precise adversarial examples using gradients, analyzing code for vulnerabilities, or directly examining training data for biases or PII leakage. It simulates insider threats or deep analysis scenarios. + +Learn more from the following resources: + +- [@article@Black-Box, Gray Box, and White-Box Penetration Testing - EC-Council](https://www.eccouncil.org/cybersecurity-exchange/penetration-testing/black-box-gray-box-and-white-box-penetration-testing-importance-and-uses/) - Comparison of testing types. +- [@article@White-Box Adversarial Examples (OpenAI Blog)](https://openai.com/research/adversarial-robustness-toolbox) - Discusses generating attacks with full model knowledge. +- [@guide@LLM red teaming guide (open source) - Promptfoo](https://www.promptfoo.dev/docs/red-team/) - Mentions white-box testing benefits for LLMs. diff --git a/src/data/roadmaps/ai-red-teaming/content/why-red-team-ai-systems@fNTb9y3zs1HPYclAmu_Wv.md b/src/data/roadmaps/ai-red-teaming/content/why-red-team-ai-systems@fNTb9y3zs1HPYclAmu_Wv.md index a127864b7..90bf8ff26 100644 --- a/src/data/roadmaps/ai-red-teaming/content/why-red-team-ai-systems@fNTb9y3zs1HPYclAmu_Wv.md +++ b/src/data/roadmaps/ai-red-teaming/content/why-red-team-ai-systems@fNTb9y3zs1HPYclAmu_Wv.md @@ -1 +1,10 @@ -# Why Red Team AI Systems? \ No newline at end of file +# Why Red Team AI Systems? + +AI systems introduce novel risks beyond traditional software, such as emergent unintended capabilities, complex failure modes, susceptibility to subtle data manipulations, and potential for large-scale misuse (e.g., generating disinformation). AI Red Teaming is necessary because standard testing methods often fail to uncover these unique AI vulnerabilities. It provides critical, adversary-focused insights needed to build genuinely safe, reliable, and secure AI before deployment. + +Learn more from the following resources: + +@article@What's the Difference Between Traditional Red-Teaming and AI Red-Teaming? - Cranium AI - Compares objectives, techniques, expertise, and attack vectors to highlight why AI needs specialized red teaming. +@article@What is AI Red Teaming? The Complete Guide - Mindgard - Details specific use cases like identifying bias, ensuring resilience against AI-specific attacks, testing data privacy, and aligning with regulations. +@article@The Expanding Role of Red Teaming in Defending AI Systems - Protect AI - Explains why the dynamic, adaptive, and often opaque nature of AI necessitates red teaming beyond traditional approaches. +@article@How red teaming helps safeguard the infrastructure behind AI models - IBM - Focuses on unique AI risks like model IP theft, open-source vulnerabilities, and excessive agency that red teaming addresses. \ No newline at end of file