Government seeks to replicate attacks designed to use AI in ‘developing chemical weapons or election interference’

Sam Trendall October 25, 2024 in Defence and Security, Highlights, News - 3 Minutes

The AI Safety Institute created last year has signed a £500k deal with a tech supplier to help model ‘new attacks against frontier AI systems and mitigations that defang them’

The UK’s AI Safety Institute aims to replicate possible cyberattacks in which bad actors could use generative AI to help create chemical weapons or disrupt democratic processes.

AISI – which was created last year and sits within the Department for Science, Innovation and Technology – recently awarded a near-£500,000 deal to specialist tech firm Pattern Labs. The supplier has been retained to help the institute better understand “new attacks against frontier AI systems – and mitigations that defang them”, according to the text of the contract.

During the coming months, this will involve the delivery of “code providing novel attacks and corresponding protections, and computational experiments evaluating them”. The attacks developed to conduct these experiments should be “representative of attacks likely to be seen in the real world”.

This is likely to include “direct prompt injection, indirect prompt injection, training data poisoning, adversarial fine-tuning”, while the so-called ‘frontier’ AI programmes in question will include the OpenAI system behind ChatGPT, Google Gemini – formerly Bard – and the Llama 2 model developed by Facebook-owner Meta.

The aim of the engagement is to help replicate the kind of “attacks that enable an attacker to get assistance from an advanced AI system in executing harmful tasks”.

Such tasks include those that come with the gravest of consequences, including “providing guidance or automatically executing activities that break the law, such as developing chemical or biological weapons, cybercrime, theft, fraud, and election interference”.

More common and less severe attacks, such as those designed to interrupt services or steal user data, are not part of this research exercise.

Related content

As well as the code that could mirror potential attacks – and the defences that could be deployed against them – Pattern Labs will be tasked with creating “data sets consisting of misuse tasks that frontier AI systems should refuse to execute, for example requests to generate hate speech, write malicious code, create media that incites violence, or synthesise illegal imagery”.

Over the course of an initial six-month contract that commenced on 17 September, the IT firm will be tasked with “providing an R&D team that works collaboratively with an AISI researcher whom they meet with once a week”, according to the contract.

“The supplier’s delivery approach should scope precise requirements with AISI’s safeguards analysis researchers at the project start and subsequently providing a steady stream of relevant output,” the document adds.

During the first month of the agreement, the contract will require representatives of the tech provider to “undertake scoping in consultation with AISI safeguards researchers… [and], based on this scoping, they will develop a plan outlining to AISI their methodology for attack discovery and their approach to continuously delivering attacks, protections, and experimental evaluations”.

Throughout the remaining five months – plus a possible three-month extension – “the supplier will continuously deliver advice/reports/code/data sets for red-teaming AI systems, meeting with an AISI researcher once per week to update them on progress and receive feedback”.

US-headquartered Pattern Labs – which, according to the company’s LinkedIn profile, is still “operating in stealth mode” – is focused on helping to create security systems for emerging tech tools.

“As AI and other advanced technologies become more powerful, so must our ability to protect them,” the firm’s website says. “We enable labs to push the frontier of technology by giving them the confidence that their work isn’t being misused or stolen.”

AISI, meanwhile, was unveiled a year ago and, in January 2024, opened recruitment for about 30 posts, including a range of technical experts as well as senior management leaders. The DSIT unit, which in April signed a cooperation agreement with its US counterpart, describes its mission as being to “equip governments with an empirical understanding of the safety of advanced AI systems”.

The team’s deal with Pattern Labs is valued at almost £460,000.

Sam Trendall

Learn More →

PublicTechnology

Government seeks to replicate attacks designed to use AI in ‘developing chemical weapons or election interference’

The AI Safety Institute created last year has signed a £500k deal with a tech supplier to help model ‘new attacks against frontier AI systems and mitigations that defang them’

Related content

Sam Trendall

Transparency, trust and an uncertain future for GDS: Five public sector tech trends to look out for in 2025

DWP chief expects to ‘go further and faster in how we use technology to design and deliver services’

Government comms head on AI assistance and using digital to ‘reach citizens where they are’

Surveillance, cyber and standardisation – PublicTechnology’s most read stories of 2024

DSIT boss Munby embraces ‘difficult problems and brilliant teams’ after digital expansion

HMRC head Harra on service shortfalls and plans for a new digital roadmap

The AI Safety Institute created last year has signed a £500k deal with a tech supplier to help model ‘new attacks against frontier AI systems and mitigations that defang them’

Related content

Related Posts

Sam Trendall