
UK privacy watchdog to examine practice of web scraping to get training data for AI

January 15, 2024

Britain’s data protection regulator, the Information Commissioner’s Office (ICO), is scrutinizing the legality of web scraping to collect data to train generative AI models.

On Monday, it announced the first in a series of consultations on generative AI models, the tools that create text or images from a prompt after being trained on enormous datasets of similar media.

Collecting this training data can pose challenges under privacy laws because it risks sweeping up personal data. This is especially true because, given the scale involved, such collection is almost always automated.

Research papers have uncovered ways to extract training data from large language models (LLMs), potentially exposing personal information. The National Cyber Security Centre has also warned that prompt injection attacks could be a fundamental flaw in all such AI tools, because they can allow attackers to access otherwise protected LLM data.

While there are concerns about web scraping infringing on intellectual property or contract law, the ICO's consultations will focus on data protection standards.

“Based on current practices, five of the six lawful bases [for processing data under British laws] are unlikely to be available for training generative AI on web-scraped data,” wrote the ICO.

The only remaining lawful basis under the U.K. GDPR, legitimate interests, requires the entity doing the training to take a variety of steps. These include weighing individuals' rights to have their data handled safely against the necessity of web scraping for most generative AI training.

“We invite all stakeholders with an interest in generative AI to respond and help inform our positions. This includes developers and users of generative AI, legal advisors and consultants working in this area, civil society groups and other public bodies with an interest in generative AI,” the ICO stated.


Alexander Martin is the UK Editor for Recorded Future News. He was previously a technology reporter for Sky News and is also a fellow at the European Cyber Conflict Research Initiative.
