Inside the British Lab Exploring A.I. Risks

Along Parliament Square in London, the A.I. Security Institute is taking a lead in examining the threats posed by artificial intelligence. Staffed by a mix of weapons inspectors, epidemiologists, and code breakers, the institute serves as a model for addressing the emerging risks of A.I.

On a recent Tuesday in an Edwardian government building, four A.I. experts challenged the capabilities of an A.I. chatbot by attempting to trick it into revealing instructions for making the deadly bioweapon anthrax. The chatbot initially refused, responding, “I’m sorry I can’t help with that.” The experts then used a custom algorithm to inundate the A.I. with thousands of prompts and questions.

Ultimately, the A.I. succumbed, providing a detailed list of materials and equipment, and a recipe for the dangerous mixture. For safety, the name of the A.I. system remains undisclosed. According to Xander Davies, a 25-year-old American leading the red team at the institute, “There are some questions that you definitely don’t want the model to give the answer to. We try really hard to get the answers out.”

Mr. Davies and his team, who simulate attacks on A.I. systems, have also succeeded in breaching safeguards on OpenAI’s latest ChatGPT, persuading it to share hacking tips in around six hours. After identifying vulnerabilities, they discuss findings with the companies involved.

“They try to fix it, report something back to us,” Mr. Davies explained. The companies work alongside the institute to improve system security. Mr. Davies, a computer scientist and Harvard graduate, chose to work at the institute over a tech position in San Francisco.