computer-scienceangular-roadmapbackend-roadmapblockchain-roadmapdba-roadmapdeveloper-roadmapdevops-roadmapfrontend-roadmapgo-roadmaphactoberfestjava-roadmapjavascript-roadmapnodejs-roadmappython-roadmapqa-roadmapreact-roadmaproadmapstudy-planvue-roadmapweb3-roadmap
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
1.0 KiB
1.0 KiB
Benchmark Datasets
AI Red Teamers may use or contribute to benchmark datasets specifically designed to evaluate AI security. These datasets (like SecBench, NYU CTF Bench, CySecBench) contain prompts or scenarios targeting vulnerabilities, safety issues, or specific cybersecurity capabilities, allowing for standardized testing of models.
Learn more from the following resources:
- @dataset@CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset - GitHub - Dataset of cybersecurity prompts for benchmarking LLMs.
- @dataset@NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security - Using CTF challenges to evaluate LLMs.
- @dataset@SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity - arXiv - Benchmarking LLMs on cybersecurity tasks.