Update resources in AI Red Teaming Roadmap (#8570)
* Update why-red-team-ai-systems@fNTb9y3zs1HPYclAmu_Wv.md
* Update prompt-engineering@gx4KaFqKgJX9n9_ZGMqlZ.md
* Update generative-models@3XJ-g0KvHP75U18mxCqgw.md
* Update prompt-hacking@1Xr7mxVekeAHzTL7G4eAZ.md
* Update jailbreak-techniques@Ds8pqn4y9Npo7z6ubunvc.md
* Update countermeasures@G1u_Kq4NeUsGX2qnUTuJU.md
* Update forums@Smncq-n1OlnLAY27AFQOO.md
* Update lab-environments@MmwwRK4I9aRH_ha7duPqf.md
* Update ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md
* Update ctf-challenges@2Imb64Px3ZQcBpSQjdc_G.md
* Update industry-credentials@HHjsFR6wRDqUd66PMDE_7.md
* Update agentic-ai-security@FVsKivsJrIb82B0lpPmgw.md
* Update responsible-disclosure@KAcCZ3zcv25R6HwzAsfUG.md
* Update benchmark-datasets@et1Xrr8ez-fmB0mAq8W_a.md
* Update adversarial-examples@xjlttOti-_laPRn8a2fVy.md
* Update large-language-models@8K-wCn2cLc7Vs_V4sC3sE.md
* Update introduction@HFJIYcI16OMyM77fAw9af.md
* Update ethical-considerations@1gyuEV519LjN-KpROoVwv.md
* Update role-of-red-teams@Irkc9DgBfqSn72WaJqXEt.md
* Update threat-modeling@RDOaTBWP3aIJPUp_kcafm.md
* Update direct@5zHow4KZVpfhch5Aabeft.md
* Update indirect@3_gJRtJSdm2iAfkwmcv0e.md
* Update model-vulnerabilities@uBXrri2bXVsNiM8fIHHOv.md
* Update model-weight-stealing@QFzLx5nc4rCCD8WVc20mo.md
* Update unauthorized-access@DQeOavZCoXpF3k_qRDABs.md
* Update data-poisoning@nD0_64ELEeJSN-0aZiR7i.md
* Update model-inversion@iE5PcswBHnu_EBFIacib0.md
* Update code-injection@vhBu5x8INTtqvx6vcYAhE.md
* Update remote-code-execution@kgDsDlBk8W2aM6LyWpFY8.md
* Update api-protection@Tszl26iNBnQBdBEWOueDA.md
* Update authentication@J7gjlt2MBx7lOkOnfGvPF.md
* Update white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md
* Update white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md
* Update white-box-testing@Mrk_js5UVn4dRDw-Yco3Y.md
* Update automated-vs-manual@LVdYN9hyCyNPYn2Lz1y9b.md
* Update specialized-courses@s1xKK8HL5-QGZpcutiuvj.md
parent 2937923fb1
commit 80a0caba2f
33 changed files with 36 additions and 41 deletions
benchmark-datasets@et1Xrr8ez-fmB0mAq8W_a.md
@@ -1,9 +1,10 @@
 # Benchmark Datasets
 
-AI Red Teamers may use or contribute to benchmark datasets specifically designed to evaluate AI security. These datasets (like SecBench, NYU CTF Bench, CySecBench) contain prompts or scenarios targeting vulnerabilities, safety issues, or specific cybersecurity capabilities, allowing for standardized testing of models.
+AI Red Teamers may use or contribute to benchmark datasets specifically designed to evaluate AI security. These datasets (like HackAPrompt, SecBench, NYU CTF Bench, CySecBench) contain prompts or scenarios targeting vulnerabilities, safety issues, or specific cybersecurity capabilities, allowing for standardized testing of models.
 
 Learn more from the following resources:
 
+- [@dataset@HackAPrompt Dataset](https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset)
 - [@dataset@CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset](https://github.com/cysecbench/dataset)
 - [@dataset@NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security](https://proceedings.neurips.cc/paper_files/paper/2024/hash/69d97a6493fbf016fff0a751f253ad18-Abstract-Datasets_and_Benchmarks_Track.html)
 - [@dataset@SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity](https://arxiv.org/abs/2412.20787)
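For context, the hunk above adds the HackAPrompt dataset alongside the existing benchmarks. Below is a minimal sketch (not part of the diff) of how a red teamer might replay prompts from such a dataset against a model under test. It assumes the Hugging Face `datasets` library, a `train` split, a `prompt` text column, and a hypothetical `query_model` helper standing in for whatever model API is being evaluated; the real split and column names may differ.

```python
# Sketch: replay prompts from a public red-teaming benchmark against a model.
# Assumptions (not from the diff): the dataset exposes a "train" split and a
# "prompt" text column; query_model() is a hypothetical stand-in for the
# system under test (e.g., an LLM API call).
from datasets import load_dataset


def query_model(prompt: str) -> str:
    """Hypothetical placeholder for the model/API being red-teamed."""
    return "model response to: " + prompt


def run_benchmark(dataset_id: str = "hackaprompt/hackaprompt-dataset",
                  text_column: str = "prompt",
                  limit: int = 10) -> list[dict]:
    # Load the benchmark and iterate over a small slice of its prompts.
    ds = load_dataset(dataset_id, split="train")
    results = []
    for row in ds.select(range(min(limit, len(ds)))):
        prompt = row.get(text_column, "")  # column name is an assumption
        if not prompt:
            continue
        # Record prompt/response pairs so results can be scored the same way
        # across model versions, which is the point of a standardized benchmark.
        results.append({"prompt": prompt, "response": query_model(prompt)})
    return results


if __name__ == "__main__":
    for record in run_benchmark(limit=3):
        print(record["prompt"][:60], "->", record["response"][:60])
```

Keeping prompt/response pairs in a fixed structure is what makes the evaluation repeatable and comparable across model versions, which is why shared benchmark datasets are preferred over ad-hoc prompting.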
why-red-team-ai-systems@fNTb9y3zs1HPYclAmu_Wv.md
@@ -1,3 +1,8 @@
 # Why Red Team AI Systems?
 
 AI systems introduce novel risks beyond traditional software, such as emergent unintended capabilities, complex failure modes, susceptibility to subtle data manipulations, and potential for large-scale misuse (e.g., generating disinformation). AI Red Teaming is necessary because standard testing methods often fail to uncover these unique AI vulnerabilities. It provides critical, adversary-focused insights needed to build genuinely safe, reliable, and secure AI before deployment.
+
+Learn more from the following resources:
+
+- [@course@Introduction to Prompt Hacking](https://learnprompting.org/courses/intro-to-prompt-hacking)
+- [@article@Prompt Hacking Offensive Measures](https://learnprompting.org/docs/prompt_hacking/offensive_measures/introduction)