Large Language Models (LLMs) bring both new opportunities and new vulnerabilities. Prompt injection and jailbreaking techniques have become essential methods for probing how resilient these models are. They show what the models can be pushed to do and uncover the weaknesses that make it possible.
With Microsoft's PyRIT recently in the spotlight, several other LLM pen testing tools have also drawn more attention:
garak, by Leon Derczynski, is an LLM vulnerability scanner. It probes models with adversarial inputs, including prompt injection and jailbreak attempts, to test how well they hold up.
HouYi is a framework for prompt injection attacks against LLM-integrated applications. It exposes vulnerabilities and avenues of exploitation in the applications built on top of these models.
JailbreakingLLMs implements PAIR (Prompt Automatic Iterative Refinement), which uses an attacker LLM to iteratively refine jailbreak prompts against a black-box target model. Its results help show where these systems need hardening against malicious use.
llm-attacks is the reference implementation of universal, transferable adversarial attacks on aligned LLMs. It automatically searches for adversarial suffixes that push models into producing content they are aligned to refuse.
PromptInject is a framework for composing prompt injection attacks, such as goal hijacking and prompt leaking, and for measuring how robust LLMs are against them.
LLM-Canary offers methods for detecting anomalous behavior in LLMs and serves as an early warning against potential breaches.
Microsoft's PyRIT (Python Risk Identification Toolkit) is the most recent addition to this list. Its arrival reflects how much weight AI security now carries in cybersecurity.
Credit to Idan Gelbourt and Simo Jaanus for researching this list.
For a deeper dive, see our previous research on prompt injection detection:
Link to previous research on prompt injection detection