Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More
The ongoing leadership issues at OpenAI highlight the necessity of incorporating security throughout the development process of GPT models. On Friday, the drastic decision by OpenAI’s board to fire CEO Sam Altman sparked concerns due to the potential exit of key architects focused on AI security. This situation amplifies worries for businesses using GPT models, emphasizing the importance of security in their creation.
Security is crucial in building AI models that can scale and endure beyond any individual leader or team. However, this principle has yet to be established entirely. Indeed, the board let go of CEO Sam Altman on Friday partly because of his rapid product and business advancements, which reportedly overlooked the company’s safety and security mandates.
This scenario is a prime example of the new challenges in AI: tension arises when boards with independent directors strive for more control over safety, needing to balance risk management with growth pressures. If co-founder Ilya Sutskever and the independent board members backing him in the recent leadership shift manage to retain their positions amid significant backlash from investors and Altman’s supporters, several security issues highlighted by researchers emphasize the need for early integration of security in GPT software development.
Brian Roemmele, an acclaimed prompt engineering expert, identified a significant security flaw in OpenAI’s GPTs. This vulnerability allows ChatGPT to access prompt information and files from a session. Roemmele suggests a prompt addition to mitigate this risk. A similar issue occurred in March when OpenAI admitted and fixed a bug that let users view titles from another user’s chat history, and possibly the first message of a new conversation, if both users were active simultaneously. This bug also exposed payment-related information of 1.2% of active ChatGPT Plus subscribers during a nine-hour window.
Despite claims of session safeguards, attackers continue to perfect techniques to bypass them through prompt engineering. For instance, Brown University researchers found that using less common languages like Zulu and Gaelic could circumvent restrictions effectively. They achieved a 79% success rate compared to less than 1% in English, simply by translating unsafe inputs to these low-resource languages.
Microsoft researchers found in their study, DecodingTrust, that GPT models can be easily coaxed into generating harmful outputs and revealing private information from both training data and conversation history. Although GPT-4 is generally more reliable than GPT-3.5, it is more vulnerable to jailbreaks designed to bypass security measures.
OpenAI’s GPT-4V model, which includes image uploads, is particularly prone to multimodal injection attacks. By embedding commands or malicious scripts within images, attackers can manipulate the model to execute these commands. The lack of a data sanitization step in processing workflows means every image is trusted implicitly, making GPT-4V a primary target for such attacks.
To improve security, it must be an automated and integral part of the development process from the outset. This means embedding security measures in the initial design phases and throughout the software development lifecycle (SDLC). High-performing development teams deploy code frequently, and integrating security from the beginning can enhance code quality and deployment rates.
Effective collaboration between development and security teams can help break down procedural barriers, fostering shared ownership of deployment metrics, software quality, and security measures. This collaborative approach is essential for iteratively enhancing security as a core component of software products.