ChatGPT: Is this popular new technology a threat to data privacy?
How do AI tools like ChatGPT impact data privacy and records management? Plus, 5 questions boards and executives should be asking about data privacy.
Welcome to FILED Newsletter, our monthly round-up of relevant news, opinion, guidance, and other useful links in the world of data, records and information management.
This month:
- How do AI tools like ChatGPT impact data privacy and records management?
- 5 questions boards and executives should be asking about data privacy.
- Looking for a new developer role? Try the dark web.
If you only read one thing
ChatGPT and the implications for data privacy and records management
Since it landed late last year, ChatGPT, OpenAI’s chatbot that can code, answer queries, and create content, has captured the attention of the technology world and raised a lot of concerns.
A state-of-the-art language model developed by OpenAI, ChatGPT can generate human-like text based on a prompt it has been given. The model was fine-tuned on a diverse range of internet text, enabling it to generate text on a variety of topics with high quality and coherence.
People have used the model to generate essays, screenplays, and novels, debug code, create marketing copy, and tackle anything else that involves text. If you dare, search for ‘chatgpt’ on any social network (especially LinkedIn) and you will find yourself inundated with guides to using it to do your work for you and automate away the annoying tasks that are part of your role.
For some more novel use cases:
- One US lawmaker used it to create (admittedly, pretty bland) speeches.
- The model passed law school exams, though it was a mediocre student, earning “only” a C+ average.
- Inevitably, someone also used ChatGPT to build... ChatGPT.
Naturally, educators are concerned the model could be used for wide-scale cheating, leading schools around the world to ban its use.
There are also concerns about accuracy: this is a natural language model that seeks to answer questions coherently—but not necessarily accurately. It's a confident student who hasn’t actually read the book or done the math homework but can talk their way through it. Would you want a student like that as your child's tutor?
But around here we care about data privacy and records management, so I’d like to discuss the implications of models like ChatGPT for each of these.
A "right to be forgotten", a "right to be correct"
Let's start with data privacy. ChatGPT and similar tools are built to effectively absorb the contents of the internet and then make inferences based on the data they find. This raises the possibility that the model will surface data you would rather have kept private. That home address you mistakenly left on a social profile 10 years ago, the phone number on a personal website you forgot to take offline.
As of right now, there is no way to request the removal of any data about you from the corpus that ChatGPT is absorbing. For sensitive data and personally identifiable information, we have no oversight. Where is this being stored, how is it curated and collated, and what control do we as citizens have?
There is also the risk of false information becoming accepted as fact. The model isn’t particularly opinionated or curious; it takes data at face value, without interrogating it to see whether it is plausible or true.
You could therefore imagine a disinformation campaign to seed the web with false information about an individual or group to “poison” the data set. Or historical allegations, since debunked, still finding their way into ChatGPT’s responses and therefore public opinion.
Due to the GDPR and other privacy regulations, we’ve become familiar with the idea of a “right to be forgotten”—do we need to consider a “right to be correct”? How would that even work?
An annoying older brother
Then we move to records management. Will generative models like ChatGPT help records managers with tasks like retrieving and summarizing data? I think it’s highly likely, but in the short term there are some issues. Remember, ChatGPT will always provide a very confident and plausible answer; it just might not be correct.
The answers you receive are often generic. The model is good at connecting rote information, but once you ask about more complex issues, the technology is not quite there.
For tasks like collating information to respond to a legal discovery request, I could imagine records managers needing to be very careful about how they frame their prompts to avoid being overly generic or overly specific.
ChatGPT is like a very literal-minded assistant, or maybe an annoying and unhelpful older brother. The model won’t ask any follow-up questions to narrow down what you actually want, and it won’t see the implications of your request. You have to think ahead about what your request leaves out, or how it could be framed more precisely. You could say the same about earlier tools like search engines, but this model differs in that it projects confidence and expertise. You have to keep its biases in mind.
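To make that concrete, here is a minimal sketch of what a carefully framed discovery-style prompt might look like, assuming OpenAI’s pre-1.0 Python client; the document IDs, dates, and “Project Atlas” are all invented for illustration, not drawn from any real request. The framing does the work: fix the scope, fix the output format, and give the model explicit permission to say “unclear” rather than guess.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: replace with your own key

# Hypothetical records excerpt, invented for illustration.
meeting_notes = (
    "2021-03-04: Kickoff memo for Project Atlas circulated (DOC-0041).\n"
    "2021-05-12: Budget review; Atlas mentioned in passing (DOC-0102).\n"
    "2021-06-01: Unrelated vendor call, no project named (DOC-0110).\n"
)

# A deliberately constrained prompt: fixed scope, fixed output format,
# and explicit permission to answer "unclear" instead of guessing.
prompt = (
    "You are helping collate records for a legal discovery request.\n"
    "From the meeting notes below, list ONLY the document IDs and dates "
    "that mention 'Project Atlas'. If a mention is ambiguous, mark it "
    "'unclear' rather than guessing.\n\n"
    f"Meeting notes:\n{meeting_notes}"
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    temperature=0,  # discourage confident embellishment on factual tasks
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```

Setting the temperature to zero is a small hedge against the confidence problem described above: it won’t make the model accurate, but it does make it less inclined to embellish.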
We’re in the early stages of this technology. While Microsoft has invested in OpenAI and just released a version of Bing that incorporates ChatGPT, Google has called a “code red” and is building its own equivalent, Bard. This AI arms race will lead to more advanced versions which overcome these issues. These technologies are here to stay, so at RecordPoint we’re looking at ways to embrace them to enhance our products and help our customers guarantee data trust.
🤫 Privacy and governance
In the last 12 months, European data protection authorities issued a record-breaking €1.65 billion in fines.
A new report suggests that while company boards care about cybersecurity, they are less worried about data privacy.
How data privacy got its own week-long commemoration.
🔐 Security
The United Kingdom’s Royal Mail service is still struggling to get back up to speed following a ransomware attack last month, tentatively attributed to the Russian-speaking LockBit group. The organization recently restored its international tracked service.
Forthcoming SEC rules will force corporate boards to disclose “material” security breaches to the SEC and investors within four days, putting significant pressure on companies to take cybersecurity seriously.
US telecom provider T-Mobile disclosed a data breach it said affected 37 million accounts.
A whistleblower has told the US Federal Trade Commission that Twitter continues to violate its security and data privacy obligations, saying too many employees have access to “GodMode”, an internal tool allowing them to tweet as any user. This claim first surfaced in 2020, but at the time the company said the issue had been resolved.
Looking for a developer role? Try the dark web. A new study from Kaspersky suggests there is huge demand for talented developers among hacking rings on the dark web. The roles come with a decent salary, a lot of (moral and logistical) flexibility, and no dress code. Great work-life balance too, provided you aren’t arrested.
📣 The latest from RecordPoint
If and when a data breach happens, we want our customers to be prepared with the full picture of exactly what sensitive data was exposed. That's why we've been working on Intelligence Signaling, a new add-on module that helps you automatically detect at-risk private information across your entire data inventory. Learn more at our product webinar on 22 February (register here).
Learn how one of our customers on the West Coast of the United States, a large municipal organization, used RecordPoint to reduce the time and effort required to remain compliant and guarantee data trust.
Finally, here are the five questions boards and executives should ask about data privacy.
That's all from us this month; I hope you've enjoyed the read. See you next month, or if you don't want to wait that long, just ask ChatGPT to write you a newsletter of your own.