# Benchmark Datasets

AI Red Teamers may use or contribute to benchmark datasets specifically designed to evaluate AI security. These datasets (like SecBench, NYU CTF Bench, CySecBench) contain prompts or scenarios targeting vulnerabilities, safety issues, or specific cybersecurity capabilities, allowing for standardized testing of models.

Learn more from the following resources:

- [@dataset@CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset - GitHub](https://github.com/cysecbench/dataset) - Dataset of cybersecurity prompts for benchmarking LLMs.
- [@dataset@NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security](https://proceedings.neurips.cc/paper_files/paper/2024/hash/69d97a6493fbf016fff0a751f253ad18-Abstract-Datasets_and_Benchmarks_Track.html) - Using CTF challenges to evaluate LLMs.
- [@dataset@SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity - arXiv](https://arxiv.org/abs/2412.20787) - Benchmarking LLMs on cybersecurity tasks.