Patrick Levi

2024

pdf bib abs
Linguistic Obfuscation Attacks and Large Language Model Uncertainty
Sebastian Steindl | Ulrich Schäfer | Bernd Ludwig | Patrick Levi
Proceedings of the 1st Workshop on Uncertainty-Aware NLP (UncertaiNLP 2024)

Large Language Models (LLMs) have taken the research field of Natural Language Processing by storm. Researchers are not only investigating their capabilities and possible applications, but also their weaknesses and how they may be exploited.This has resulted in various attacks and “jailbreaking” approaches that have gained large interest within the community.The vulnerability of LLMs to certain types of input may pose major risks regarding the real-world usage of LLMs in productive operations.We therefore investigate the relationship between a LLM’s uncertainty and its vulnerability to jailbreaking attacks.To this end, we focus on a probabilistic point of view of uncertainty and employ a state-of-the art open-source LLM.We investigate an attack that is based on linguistic obfuscation.Our results indicate that the model is subject to a higher level of uncertainty when confronted with manipulated prompts that aim to evade security mechanisms.This study lays the foundation for future research into the link between model uncertainty and its vulnerability to jailbreaks.

Co-authors

Venues

uncertainlp1
ws1