Customer Case Studies

AI for Digital Selection – TNA Evaluates Records365 Classification Intelligence

RecordPoint was invited to join the AI for Digital Selection project, which focuses on duplicate detection, entity extraction, and classification.

Helping secure 1,000 years of history

The National Archives (TNA) is the official archive and publisher for the UK government, and for England and Wales, holding official records containing 1,000 years of history. Their role is to collect and secure the future of the government record, both digital and physical, to preserve it for generations to come, and to make it as accessible and available as possible.

TNA holds over 11 million historical and government records, houses approximately 550 staff and currently welcomes approximately 80,000 visitors per year.

A significant part of TNA’s role is accessioning records from across government into its collection. All government departments are required to pass records to TNA for future preservation. Until recently, records were on paper; however, digital and born-digital records make up a growing proportion of the record set and will eventually all but replace paper.

TNA recognizes these future challenges, and that managing the classification and preservation of records will require the use of artificial intelligence.

To help TNA better understand solutions that can deepen its capability in using artificial intelligence (AI) tools to appraise and select data for permanent preservation, RecordPoint was invited to take part in the AI for Digital Selection project with its Records365 service and Classification Intelligence.

RecordPoint’s layer of Intelligence appraises the value and risk of high volumes of information

Records365 is a cloud-based software-as-a-service platform that can connect to multiple content sources to enable organizations to apply federated governance across all their information, regardless of where it lives.

To help customers like TNA with the challenges they face in their digital transformation journey, RecordPoint is committed to bringing customers continuous innovation by delivering solutions that are:

  • Centralized, bringing together content from all sources and making insights visible in user-friendly dashboards,
  • Intelligent, empowering organizations to lower costs through efficiency improvements,
  • Secure and compliant, enabling full regulatory compliance and data security.

As part of the project, TNA provided RecordPoint with samples of labelled and unlabeled data, which we used to demonstrate the machine learning capabilities of Records365 and to increase TNA’s understanding of how to leverage AI, using the following approach:

Load Retention Schedule: Using the retention schedules spreadsheet provided, we loaded each disposal class and retention schedule into the Records365 global File Plan.

Create Rules for Labelled Dataset: To automatically assign a disposal class and retention schedule in Records365 to the labelled data, a set of declarative rules was created in the Records365 rules tree that mapped each document to a specific disposal class using its metadata.
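To make the idea of declarative, metadata-based rules concrete, here is a minimal Python sketch. It is not the Records365 rules tree or its API; the Rule type, the metadata keys (department, title), and the disposal class codes are all hypothetical and chosen only for illustration. Each rule pairs a condition on a document's metadata with the disposal class to assign, and the first matching rule wins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Rule:
    """One declarative rule: if `predicate` matches, assign `disposal_class`."""
    name: str
    predicate: Callable[[Dict[str, str]], bool]
    disposal_class: str


# Hypothetical rules and class codes, for illustration only.
RULES: List[Rule] = [
    Rule("finance-by-department",
         lambda m: m.get("department") == "Finance", "FIN-01"),
    Rule("minutes-by-title",
         lambda m: "minutes" in m.get("title", "").lower(), "GOV-02"),
]


def classify(metadata: Dict[str, str],
             rules: List[Rule] = RULES) -> Optional[str]:
    """Return the disposal class of the first matching rule, or None."""
    for rule in rules:
        if rule.predicate(metadata):
            return rule.disposal_class
    return None
```

In this sketch a document that matches no rule is returned as None, so it can be left unclassified rather than forced into a class.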

Import Labelled Dataset: Since the data was provided on a hard drive, for the scope of this project we decided to load the labelled dataset from a Windows file share using the Records365 FileConnect connector. Once the connector was enabled and the documents were added to the file share, FileConnect looked for redundant, obsolete, and trivial (ROT) documents. The FileConnect ROTBot performed deduplication, enriched the documents with additional metadata, and automatically submitted them to Records365. Once processed by the Intelligent Processing Engine, each document was classified according to the rules previously created.
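The case study does not describe how the ROTBot detects duplicates. One common approach, shown here purely as an illustrative assumption, is to group files by a cryptographic hash of their content so that byte-identical copies land in the same group. The function names below are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a document's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(documents: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Group document names by content digest and keep groups of 2 or more.

    This only finds byte-identical copies; near-duplicates (for example,
    the same text saved in two formats) would need a different technique.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, data in documents.items():
        groups[content_digest(data)].append(name)
    return {digest: sorted(names)
            for digest, names in groups.items() if len(names) > 1}
```

For example, calling find_duplicates on three files where two share identical bytes returns a single group containing those two file names.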

Train Model on Labelled Dataset: The Records365 Classification Intelligence capabilities have been designed to be used by compliance and records management teams without requiring the involvement of a data scientist. The model was trained by simply selecting, in the file plan, the disposal classes that had enough data samples. The rest of the processing was handled automatically by Records365 without requiring user intervention.

Apply ML to Unlabeled Dataset: Once the model was trained, we submitted the unlabeled dataset to Records365 using the same Windows file share and FileConnect connector mentioned previously. Once again, as the content was added to the file share, the FileConnect ROTBot performed deduplication and named entity extraction to enrich each document with context that can be used for e-Discovery purposes. Once the documents were received by Records365, the Intelligent Policy Engine applied the machine learning model to each unlabeled document to suggest a relevant category. After that, the records management team remains fully in control of the final decision and can review each suggestion, accepting or correcting it. This feedback loop is then used to improve the model over time.
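The review step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Records365 implementation: the Suggestion type, the review function, and the class codes are hypothetical. The point it shows is that a reviewer either accepts the suggested class or supplies a correction, and the final decision is kept as a labelled example for later retraining.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Suggestion:
    """A model suggestion for one document (hypothetical structure)."""
    document_id: str
    suggested_class: str


def review(suggestion: Suggestion,
           corrected_class: Optional[str] = None) -> Tuple[str, str]:
    """Return (document_id, final_class) after human review.

    If the reviewer gives no correction, the suggestion is accepted as-is.
    """
    final_class = (corrected_class if corrected_class is not None
                   else suggestion.suggested_class)
    return (suggestion.document_id, final_class)


# Reviewed decisions accumulate as labelled examples for future retraining.
training_examples: List[Tuple[str, str]] = []
training_examples.append(review(Suggestion("doc-1", "FIN-01")))
training_examples.append(
    review(Suggestion("doc-2", "FIN-01"), corrected_class="GOV-02"))
```

Keeping the human decision, rather than the raw suggestion, as the stored label is what lets corrections influence the model the next time it is trained.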

Key observations and findings

The experiments undertaken during this project produced the following key results and findings:

  • Identified candidate records for permanent preservation
  • Detected duplicates for disposition
  • Overall training accuracy of 74.5%, and test accuracy of 71.8%
  • Extracted entities: organizations, geopolitical entities, people
  • File analysis: content size summary, age summary
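The case study does not state how the accuracy figures were calculated. Assuming the usual definition, accuracy is the fraction of documents whose predicted disposal class matches the known label, measured once on the training data and once on held-out test data. A minimal sketch with made-up labels:

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of predictions that equal the known label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Illustrative labels only; these are not TNA data.
example = accuracy(["A", "B", "A", "C"], ["A", "B", "C", "C"])
```

Under this definition, a test accuracy a few points below the training accuracy, as reported above, means the model did slightly worse on documents it had not seen during training.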

 

Future research and development

In addition to the intelligent capabilities available in Records365 today, RecordPoint is making significant further investments in the AI space. We understand that organizations still struggle to control their information and make meaningful business decisions because of the ever-growing number of content sources they deal with on a day-to-day basis, which contain structured, semi-structured, and unstructured content.

Some of the capabilities that customers can expect to see in Records365 in the future are:

  • Context Enrichment
  • Multi-Model Appraisal
  • Unsupervised Learning
  • Searchable Knowledge Graph
  • Multi-Dimensional Appraisal
  • Language Models
  • AI-driven Content Analytics
  • Intelligent Connectors
  • AI based Risk & Value Scoring

We believe that machine learning capabilities will be at the core of helping organizations reduce their current risk and make better decisions faster. To do so, those capabilities need to be explainable and easy for regular users to use.

Information Intelligence

Does your organization need information intelligence?

Let’s explore how RecordPoint can help your organization increase compliance, reduce risk, strengthen data security, and improve organizational efficiencies. Contact us today to set up your free personalized RecordPoint demo.