Generative AI will offer up identifiable data if you ask nicely

Concerning new research shows how easy it is to get AI to produce identifiable data, raising privacy risks. Plus all the latest in data privacy, security, and governance.

Anthony Woodward

Founder/CEO

September 15, 2023

Welcome to FILED Newsletter, your round-up of the latest news and views at the intersection of data privacy, data security, and governance.

This month:

Mozilla says your car may have access to your medical history.
Australia had 409 data breach notifications over the first half of 2023.
Is the group that took down Charles de Gaulle Airport’s website made up of hacktivists or Russian cybercriminals?

But first, generative AI platforms have loose lips.

If you only read one thing:

With generative AI, you get out what you put in

Do you struggle to warn your workmates about the risks of inputting confidential information into generative AI tools like ChatGPT? You may have a little more ammunition following recent revelations from researchers.

Researchers from Google, DeepMind, UC Berkeley, ETH Zürich, and Princeton have demonstrated how easy it is to extract identifiable data—real people’s images—from image generation models, potentially violating people’s privacy.

The team repeatedly prompted the image generation models Stable Diffusion and Google’s Imagen with image captions, such as a person’s name. Then, they analyzed whether any of the generated images matched the original images in the models’ training data. The group managed to extract more than 100 replicas of images from the training sets.

The researchers also managed to coax the models into outputting exact copies of medical images and copyrighted work by artists, suggesting these images also formed part of the models’ training data.
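To make the matching step concrete, here is a minimal sketch of how generated images could be compared against known training images. It assumes images are flattened into lists of pixel values and uses a plain Euclidean distance with a hand-picked threshold. The function names, the toy data, and the threshold are all illustrative assumptions, not the researchers' actual pipeline.

```python
from math import sqrt

def l2_distance(a, b):
    """Euclidean distance between two equal-length pixel vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_replicas(generated, training, threshold):
    """Return (generated_index, training_index) pairs closer than threshold."""
    matches = []
    for gi, g in enumerate(generated):
        for ti, t in enumerate(training):
            if l2_distance(g, t) < threshold:
                matches.append((gi, ti))
    return matches

# Toy 4-pixel "images" with values in [0, 1]; made-up data for illustration.
training = [[0.1, 0.9, 0.4, 0.3], [0.8, 0.2, 0.5, 0.7]]
generated = [[0.11, 0.88, 0.41, 0.30], [0.5, 0.5, 0.5, 0.5]]

# The first generated image is nearly identical to the first training image.
print(find_replicas(generated, training, threshold=0.05))  # [(0, 0)]
```

A real check would work on full-size images and would likely need a more robust similarity measure, but the idea is the same: a generated image that sits almost on top of a training image is a likely replica.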

As well as providing valuable data for the many artists currently suing AI companies for copyright violations and the regulators seeking to punish them for privacy infringements, these results are another warning for individuals and organizations not to input confidential or sensitive information into these models.

Govern LLM usage to prevent privacy leaks

We’ve been discussing generative AI in the context of images. Even if a similar result has not yet been demonstrated for large language models (LLMs) like ChatGPT, do we really need to wait? Why take that chance?

A free/consumer ChatGPT account gives OpenAI a license to use your data however they like. Suppose someone in your organization uses such an account to input confidential business data or sensitive customer information, and this data makes its way into the training set. This new research suggests bad actors could extract it later.

If you’re responsible for records, information governance, or data privacy, you need to take control of the usage of such tools within your organization. The message to your team should be that any data you provide in a prompt could be used to train the model and extracted by others, either researchers or bad actors.

People must not input confidential or sensitive customer information into these models. We just don’t know how the data will be retained in the model and how it could be resurfaced later. If people need to input data into the model to complete their tasks, consider how it could be sanitized or de-identified before use.
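As a rough illustration of sanitizing a prompt before it leaves your organization, the sketch below swaps email addresses and phone numbers for placeholders. The patterns and the `sanitize_prompt` name are hypothetical and deliberately simple. Real de-identification needs much broader coverage (names, addresses, account numbers) and usually a dedicated tool.

```python
import re

# Hypothetical patterns for illustration only; they will miss many formats.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def sanitize_prompt(text):
    """Replace each matched identifier with a bracketed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Draft a reply to jane.doe@example.com, phone +61 2 5550 1234, about her refund."
print(sanitize_prompt(prompt))
# Draft a reply to [EMAIL], phone [PHONE], about her refund.
```

Pattern matching like this only catches identifiers with a predictable shape. A person's name or a description of their circumstances would pass straight through, so it is a first filter rather than a guarantee.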

This isn’t the only potential security hole in these models. The Open Worldwide Application Security Project (OWASP) recently released a list of the top 10 vulnerabilities for LLMs. The list ranges from “insecure output handling,” where the output of an LLM is accepted without question, leading to security breaches, to “model theft,” where unauthorized access leads to a model being copied.
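Insecure output handling is easier to picture with an example. The sketch below assumes a web application that displays an LLM's reply in a page, and it treats that reply like any other untrusted input by escaping it first. The `render_reply` function and the sample string are made up for illustration. `html.escape` is part of Python's standard library.

```python
from html import escape

def render_reply(llm_output):
    """Escape model output before embedding it in HTML, as with any untrusted input."""
    return f"<p>{escape(llm_output)}</p>"

# A reply that would run a script in the browser if inserted unescaped.
malicious = 'Here you go <script>alert("x")</script>'
print(render_reply(malicious))
# <p>Here you go &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;</p>
```

The same principle applies wherever model output is passed on to another system, such as a database query or a shell command: validate or escape it for that destination rather than trusting it.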

Another vulnerability is hypnotization. Researchers at IBM have demonstrated they can "hypnotize" an LLM and get it to leak confidential financial information, create malicious code, and offer weak security recommendations. It is quite a bizarre attack to read about, and one feels a little sorry for the hapless LLM.

Privacy & governance

The US Senate is bringing in AI leaders to discuss regulation for the technology, with a series of “listening sessions” scheduled for this month, starting with Elon Musk, Sundar Pichai, Sam Altman, and Satya Nadella. That should fix it.

The-Social-Network-Formerly-Known-as-Twitter wants permission to start collecting users’ biometric information and employment history. The former is for safety, security, and identification purposes, and the latter is for the platform’s nascent job application features, presumably.

According to new rules, almost 100 Australian government agencies will need a chief information security officer (CISO).

Modern cars are terrible for privacy, according to Mozilla. A study by the organization found that 84% of car companies review, share, or sell data collected from car owners, including six companies that can collect intimate information such as genetic data, medical history, and sexual activity.

Security

There were 409 data breach notifications in Australia over the first half of 2023, according to the Office of the Australian Information Commissioner (OAIC). This result is down 16% on the previous six months but slightly up on the same period a year ago, which had 404 notifications. Typically, the second half of the year features more breaches, so buckle up, everyone.

Elsewhere in the Notifiable Data Breaches Report, the OAIC is tired of excuses for delaying data breach disclosure. Delays of more than 30 days were becoming more frequent, and in some cases, the description of the data accessed was deficient.

An apparent hacktivism campaign that has taken the website for Charles de Gaulle Airport in Paris offline may be the work of Russian cybercriminals seeking to destabilize French society.

The latest from RecordPoint

Read:

What does the New York City records commissioner think about AI? Read on, and listen to the podcast if you haven’t already.

If you missed our discussion of ChatGPT’s use for records management, you can watch the recording while you read some of the questions and answers we received during the session.

Listen:

Truescope co-founder and CEO John Croll joined Kris Brown and me on the latest episode of FILED. He says generative AI platforms like ChatGPT may accelerate the spread of misinformation, impacting public opinion and company reputations and making the job of a modern communications professional more complex.
