The Susceptibility of LLMs to the ‘Butterfly Effect’: A Closer Examination

Prompting is how we communicate with generative AI and large language models (LLMs). Crafting these prompts is something of an art, since the goal is to coax accurate answers out of the model. But what happens when we vary those prompts? Does changing how we phrase a question affect the model’s decisions and its accuracy?

Research from the University of Southern California Information Sciences Institute confirms that even small tweaks, such as adding a space at the beginning of a prompt or turning a question into a directive, can alter an LLM’s output. More concerning, requesting output in formats such as XML or applying jailbreak techniques can significantly distort the labels a model produces.

This phenomenon is likened to the butterfly effect in chaos theory, where small initial changes can lead to dramatically different outcomes. Each step in prompt design requires decisions, and LLMs are highly sensitive to these variations.

In a study sponsored by DARPA, the researchers tested ChatGPT with four categories of prompt variation. The first requested outputs in specific formats such as Python List, JSON Checkbox, CSV, XML, or YAML. The second made minor prompt perturbations, such as adding leading spaces, using different greetings, or rephrasing questions as commands. The third applied jailbreak techniques, including AIM, Dev Mode v2, Evil Confidant, and Refusal Suppression. The fourth tested the idea of ‘tipping’ the model by mentioning a tip, or the lack of one, in the prompt.
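To make the setup concrete, here is a minimal, hypothetical Python sketch of how one might generate these kinds of prompt variants and collect the model’s answer for each. The `query_model` helper and the example prompts are placeholders of my own, not the study’s code; swap in whatever LLM client you actually use.

```python
# Hypothetical sketch (not the study's code): build a few prompt variants of the
# kind described above and record the model's answer for each one.

def query_model(prompt: str) -> str:
    """Placeholder for an LLM call; replace with your own client/API."""
    return "positive"  # canned reply so the sketch runs end to end

BASE = "Is the sentiment of this review positive or negative? 'The food was great.'"

VARIANTS = {
    "baseline":      BASE,
    "leading_space": " " + BASE,                                 # minor perturbation
    "greeting":      "Hello! " + BASE,                           # minor perturbation
    "as_command":    "Classify the sentiment of this review as "
                     "positive or negative: 'The food was great.'",  # question -> directive
    "json_format":   BASE + " Respond in JSON.",                 # output-format request
    "csv_format":    BASE + " Respond as CSV.",                  # output-format request
    "tip_offered":   BASE + " I'll tip $10 for a good answer.",  # tipping
}

def collect_predictions(variants: dict[str, str]) -> dict[str, str]:
    """Query the model once per variant and keep the raw answers side by side."""
    return {name: query_model(prompt) for name, prompt in variants.items()}

if __name__ == "__main__":
    print(collect_predictions(VARIANTS))
```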

Their experiments, conducted across 11 classification tasks, showed that small changes in prompts can lead to significant changes in predictions and accuracy. For example, specifying an output format caused at least a 10% change in predictions. Rephrasing statements or adding a space at the beginning also led to substantial changes in predictions. Jailbreak techniques like AIM and Dev Mode v2 resulted in invalid responses in about 90% of cases, highlighting the instability they introduce.
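As an illustration of how a figure like a “10% change in predictions” can be tallied, the following sketch (building on the hypothetical variant-collection code above) counts how often a variant’s label disagrees with the baseline prompt’s label. It is an assumption about the bookkeeping, not the authors’ evaluation code.

```python
# Hypothetical bookkeeping: fraction of prompt variants whose label differs
# from the label produced by the unmodified baseline prompt.

def fraction_changed(predictions: dict[str, str], baseline_key: str = "baseline") -> float:
    """Return the share of non-baseline variants that flipped the prediction."""
    baseline = predictions[baseline_key]
    others = {k: v for k, v in predictions.items() if k != baseline_key}
    changed = sum(1 for label in others.values() if label != baseline)
    return changed / len(others) if others else 0.0

# Example: {"baseline": "positive", "json_format": "negative", "greeting": "positive"}
# -> 1 of 2 variants changed the prediction, i.e. 0.5
```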

Interestingly, specifying tips or the lack of tips had minimal impact on the model’s performance.

The research highlights that LLMs remain prone to instability under even minor prompt changes. The next major step is to develop models that are resistant to such variations and give consistent answers, which requires a deeper understanding of why responses shift with slight tweaks and better ways to anticipate those shifts.

As LLMs are increasingly integrated into large-scale systems, ensuring their reliability and consistency becomes even more crucial.