Simple jailbreak prompt can bypass safety guardrails on major models
Microsoft on Thursday published details about Skeleton Key – a technique that bypasses the guardrails used by makers of AI models to prevent their generative chatbots from creating harmful content.
Preventing such content is not easy, because large language models are trained on all sorts of data, some of which may need to be nasty. To understand why, consider a chatbot asked how to write secure code: it will give better answers if it was trained on data about spotting malicious code and security vulnerabilities.
Skeleton Key gets around those guardrails (or did, for the developers that have fixed their models in response to Microsoft's responsible disclosure) with a simple text prompt that directs the model to revise, rather than abandon, its safety instructions.
Microsoft tried the Skeleton Key attack on the following models: Meta Llama3-70b-instruct, Google Gemini Pro, OpenAI GPT 3.5 Turbo, OpenAI GPT 4o, Mistral Large, Anthropic Claude 3 Opus, and Cohere Commander R Plus.