Finding it hard to keep up with this fast-paced industry?
ChatGPT and large language models (LLMs) are growing more powerful and widely used, with significant potential to transform fields like records management. But LLMs also come with risks and drawbacks, which are not as widely understood.
We recently held a webinar focused on these technologies, their strengths and limitations, and their potential applications in the field of records management. Our aim was to educate data custodians such as records management professionals, who may be called upon to govern the use of these technologies in their organizations.
Did you miss the webinar? Watch the full recording below or on YouTube.
While we held a question-and-answer session at the end of the event, there were simply too many questions to answer in real time. We decided to publish the full list of questions and their answers here so everyone can benefit.
Q. Can you describe in more detail how an LLM like ChatGPT can be useful for records management tasks?
ChatGPT and other LLMs are not always a reliable source of information (more on this below). Assuming they are provided verified information, they can be useful in manipulating text, allowing for applications like data enrichment, sentiment analysis, and data classification.
One particularly useful application for LLMs is summarization. The user can provide the LLM with (verified) information and have the model return a summary of it. This could allow you to supply the model with thousands of pages of material and receive a few paragraphs summarizing it in response.
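Because models can only read a limited amount of text at once, long documents are typically summarized in stages. The sketch below shows one common pattern, sometimes called map-reduce summarization: split the text into chunks, summarize each, then summarize the summaries. The `call_llm` function is a hypothetical stub standing in for a real LLM API call; everything else is illustrative, not a fixed recipe.

```python
# Minimal "map-reduce" summarization sketch. `call_llm` is a placeholder
# for a real LLM API call (e.g., a chat-completions endpoint); it is
# stubbed here so the chunking logic can run on its own.

def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Split text on paragraph breaks into chunks that fit the model's context."""
    paragraphs = text.split("\n\n")
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

def call_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call; returns placeholder text."""
    return prompt[:200]

def summarize(document: str) -> str:
    # Map: summarize each chunk independently.
    partial = [call_llm(f"Summarize the following text:\n\n{c}")
               for c in chunk_text(document)]
    # Reduce: combine the partial summaries into one final summary.
    return call_llm("Combine these summaries into a few paragraphs:\n\n"
                    + "\n\n".join(partial))
```

In practice the chunking, prompts, and recombination step all need tuning for your documents; this only illustrates the overall shape of the workflow.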
Because the primary interaction with these models is via free-form chat, and because LLMs can understand context, synonyms, and terminology, users can interrogate data more easily, asking specific questions to acquire knowledge that would previously have required reading the full text. Responses based on verified text should still be fact-checked, but they should be more reliable.
Q. How can we trust that the information an LLM provides is correct?
One of the significant risks of using a large language model like ChatGPT is that the model may provide incorrect information. LLMs are built to provide the most likely answer for a given input, but they have no concept of objective truth. If a model does not know the answer to a given query, it will invent a plausible-sounding one it predicts you want. When this happens, the erroneous information is referred to as a hallucination, which increases the risk of using these models for research or as a basis for business decisions.
To combat this risk, you need to ground any queries to the language model in truth. Don't expect anything it tells you based on its training to be true or accurate. Find material from a trusted source (e.g., your records corpus or Google Scholar) containing the information you need and use the LLM to interrogate it. And then fact-check the answer.
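Grounding in this sense usually means putting the trusted source text directly into the prompt and instructing the model to answer only from it. The sketch below shows one way to build such a prompt; the instruction wording and delimiters are illustrative assumptions, not a standard.

```python
# Minimal sketch of a "grounded" prompt: the trusted source text travels
# with the question, and the model is told to answer only from that text.
# `source_text` would come from a verified source such as your records corpus.

def build_grounded_prompt(source_text: str, question: str) -> str:
    return (
        "Answer the question using ONLY the source material below. "
        'If the answer is not in the source, reply "I don\'t know."\n\n'
        f"--- SOURCE ---\n{source_text}\n--- END SOURCE ---\n\n"
        f"Question: {question}"
    )
```

Grounding reduces, but does not eliminate, hallucination risk, which is why the answer should still be fact-checked against the source.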
Remember: LLMs are not search engines; they are language models.
Q. How do we protect the confidentiality of our data?
Many in the audience were rightly concerned about the privacy and confidentiality risks of any data they provided to an LLM. Here, as is often the case, it depends on which model you are using and whether you are a paying customer.
A free/consumer ChatGPT account gives OpenAI a license to use your data however they want, but they cannot use your data if you access the offering through Microsoft Azure. You should also check with any third-party apps that use an LLM under the hood to understand their agreements with the AI vendors and their data policies.
Q. Will ChatGPT own the information relating to our information management systems and records?
They won't own it, but if you access ChatGPT with a consumer plan, they may use it to train their models. If you access ChatGPT through a business subscription (or through Microsoft Azure), your data will remain secure and cannot be used for training.
Q. Is there a data sovereignty risk for using an LLM?
Again, this depends on what kind of service you are using. A free/consumer ChatGPT account gives you no control over data sovereignty or confidentiality. However, if you access the models through a Microsoft Azure account, you can configure where your data is stored and address the other security and confidentiality concerns you would expect.
Q. What protections/procedures do these vendors have in place relating to data breaches and cyber security?
These will be the same as any third-party vendor. If using an offering hosted by Microsoft, you can expect all of Microsoft's usual security controls will be in place.
Q. Can an AI scan our data storage environments and learn what is a record and what is not?
In short, yes, but you may not need an LLM. A platform like RecordPoint can scan an environment and analyze your records to extract signals from your data, which will help to determine what is a record and what is redundant, obsolete, or trivial data (ROT). You can get quality results by looking at metadata, such as the file size and format. But an LLM like ChatGPT can help you achieve more sophisticated results. It depends very much on how your organization defines a record.
Thank you for your questions
Judging by the volume of questions and the turnout for the webinar, the records and information governance industry is very interested in the world of LLMs and ChatGPT. We will continue to cover this area in detail, and we also have more resources on the subject:
- For more coverage on the intersection of ChatGPT and records management, read our blog post.
- For more on the privacy implications of the technology, see our coverage in the FILED Newsletter.