TNA evaluates Classification Intelligence

The AI for Digital Selection project focuses on duplicate detection, entity extraction, and classification.

The National Archives

TNA is a non-ministerial department and the official archive and publisher for the UK Government, and for England and Wales. It is the guardian of over 1,000 years of iconic national documents, from Shakespeare’s will to tweets from Downing Street, and preserves them for generations to come.

Location

Richmond, UK

Industry

Government stories

Helping secure 1,000 years of history

The National Archives (TNA) is the official archive and publisher for the UK government, and for England and Wales, holding official records containing 1,000 years of history. Its role is to collect and secure the future of the government record, both digital and physical, to preserve it for generations to come, and to make it as accessible and available as possible.

TNA holds over 11 million historical and government records, houses approximately 550 staff and currently welcomes approximately 80,000 visitors per year.

A significant part of TNA’s role is accessioning records from across government into its collection. All government departments are required to pass records to TNA for future preservation. Until recently, records were on paper; however, digital and born-digital records make up a growing proportion of the record set and will eventually all but replace paper.

TNA recognizes the challenges ahead, and that managing the classification and preservation of records will require the use of artificial intelligence (AI).

To help TNA better understand how AI tools can be used to appraise and select data for permanent preservation, RecordPoint was invited to take part in the AI for Digital Selection project with its RecordPoint service and Classification Intelligence.

RecordPoint’s layer of intelligence appraises the value and risk of high volumes of information

RecordPoint is a cloud-based SaaS platform that can connect to multiple content sources to enable organizations to apply federated governance across all their information, regardless of where it lives.

To help customers like TNA with the challenges faced as part of a digital transformation journey, RecordPoint is committed to bringing customers continuous innovation by delivering solutions that:

Centralize content from all sources, and make insights visible in easy user-friendly dashboards
Help intelligent, empowered organizations to realize lower costs and expenses through efficiency improvements
Are secure and compliant, enabling full regulatory compliance and data security.

As part of the project, TNA provided RecordPoint with samples of labelled and unlabelled data, which we used to demonstrate RecordPoint’s machine learning (ML) capabilities and to increase TNA’s understanding of how to leverage AI. We took the following approach:

Load Retention Schedule: Using the retention schedules spreadsheet provided, we loaded each disposal class and retention schedule into the RecordPoint global File Plan.

Create Rules for Labelled Dataset: To automatically assign a disposal class and retention schedule to the labelled data in RecordPoint, we created a set of declarative rules in the RecordPoint rules tree that mapped each document to a specific disposal class using its metadata.
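To illustrate the idea of metadata-driven rules, here is a minimal Python sketch. It is not the RecordPoint rules tree: the `Rule` shape, the metadata field names (`folder`, `doc_type`), and the disposal class labels are all hypothetical, chosen only to show how a first-match rule set can map a document's metadata to a disposal class.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a predicate over metadata plus the class it assigns."""
    name: str
    matches: Callable[[dict], bool]
    disposal_class: str


# Illustrative rules only; field names and class labels are invented.
RULES = [
    Rule("policy-by-folder",
         lambda m: m.get("folder", "").startswith("/policy/"),
         "Policy records"),
    Rule("finance-by-type",
         lambda m: m.get("doc_type") == "invoice",
         "Financial records"),
]


def assign_disposal_class(metadata: dict, rules=RULES) -> Optional[str]:
    """Return the disposal class of the first matching rule, or None."""
    for rule in rules:
        if rule.matches(metadata):
            return rule.disposal_class
    return None
```

In this sketch, a document that matches no rule is returned as `None`, so it stays unclassified rather than being forced into a class.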

Import Labelled Dataset: Since the data was provided on a hard drive, for this project we loaded the labelled dataset from a Windows file share using the RecordPoint FileConnect connector. Once the connector was enabled and the documents were added to the file share, FileConnect looked for redundant, obsolete, and trivial (ROT) documents. The FileConnect ROTBot performed deduplication, enriched documents with additional metadata, and automatically submitted them to RecordPoint. Once processed by the Intelligence Engine, each document was classified according to the rules created previously.
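How ROTBot detects duplicates is not described here. As one common approach, exact duplicates can be found by hashing file contents and grouping files that share a digest. The sketch below assumes that technique; the function names are hypothetical and it does not catch near-duplicates.

```python
import hashlib
from collections import defaultdict


def content_digest(data: bytes) -> str:
    """SHA-256 digest of the raw file contents."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(documents: dict[str, bytes]) -> list[list[str]]:
    """Group document names whose contents are byte-for-byte identical.

    Assumption: 'duplicate' means exact content match, not near-duplicate.
    """
    groups = defaultdict(list)
    for name, data in documents.items():
        groups[content_digest(data)].append(name)
    # Keep only digests shared by more than one document.
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Each returned group lists files with identical contents, so all but one file in a group are candidates for disposition.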

Train Model on Labelled Dataset: RecordPoint’s Classification Intelligence capabilities were designed to be used by compliance and records management teams without the involvement of a data scientist. The model was trained simply by selecting the disposal classes in the File Plan that had enough data samples. RecordPoint handles the rest of the processing automatically, without user intervention.

Apply ML to Unlabelled Dataset: Once the model was trained, we submitted the unlabelled dataset to RecordPoint using the same Windows file share and FileConnect connector. Once again, as the content was added to the file share, the FileConnect ROTBot performed deduplication and named entity extraction to enrich the context of each document for use in e-discovery. Once RecordPoint received the documents, the Intelligence Engine applied the machine learning model to each one to suggest a relevant category. The Records Management team remains fully in control of final decisions and can review each suggestion, accepting or correcting it. This feedback loop is then used to improve the model over time.
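The review step can be pictured as turning each model suggestion into a confirmed label. The following Python sketch is an assumption about how such a loop might be represented, not RecordPoint's implementation; `Suggestion`, `review`, and `collect_feedback` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Suggestion:
    """Hypothetical model output for one unlabelled document."""
    document_id: str
    suggested_class: str


def review(suggestion: Suggestion, correction: Optional[str] = None) -> tuple[str, str]:
    """Return (document_id, final_class).

    A correction of None means the reviewer accepted the suggestion.
    """
    final = suggestion.suggested_class if correction is None else correction
    return suggestion.document_id, final


def collect_feedback(decisions) -> list[tuple[str, str]]:
    """Build new labelled examples from (suggestion, correction) pairs."""
    return [review(s, c) for s, c in decisions]
```

Accepted and corrected decisions both become labelled examples, which is what allows later retraining to benefit from the reviewers' work.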

Key observations and findings

The experiments undertaken during this project produced the following key results and findings:

Identified candidate records for permanent preservation
Detected duplicates for disposition
Overall training accuracy of 74.5%; test accuracy of 71.8%
Extracted entities: organizations, geopolitical entities, people
File analysis: content size summary, age summary
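The accuracy figures above are reported without a definition. Assuming they mean the share of documents whose predicted disposal class matches the labelled one, the calculation can be sketched as follows; the labels in the usage note are invented and are not project data.

```python
def accuracy(predicted: list[str], actual: list[str]) -> float:
    """Fraction of documents whose predicted class equals the labelled class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one document is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

For example, `accuracy(["a", "b", "c", "a"], ["a", "b", "a", "a"])` gives 0.75, because three of the four predictions match. Training accuracy applies this to the documents the model learned from, and test accuracy to held-out documents.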

Future research and development

In addition to the intelligent capabilities available in RecordPoint today, we are making significant further investments in AI. We understand that organizations still struggle to control their information and make meaningful business decisions because of the sheer number of content sources they deal with day to day, which contain structured, semi-structured, and unstructured content.

Some of the capabilities that customers can expect to see in RecordPoint in the future are:

Context enrichment
Multi-model appraisal
Unsupervised learning
Searchable knowledge graph
Multi-dimensional appraisal
Language models
AI-driven content analytics
Intelligent connectors
AI-based risk and value scoring

We believe that machine learning capabilities will be at the core of helping organizations to reduce their current risk and make better decisions faster. To do so, those capabilities need to be explainable and easy to use.
