As the official archive and publisher for the UK government, and for England and Wales, The National Archives (TNA) holds official records spanning 1,000 years of history. Impressive as that is, digital records are growing at such a rate that the UK government already faces challenges in meeting its Public Records Act requirements, let alone in putting proper solutions in place for the next data-rich millennium.
Realising that artificial intelligence (AI) is key in helping to secure the future of the government record, TNA set out to learn more about how AI and machine learning (ML) could assist with digital transformation in government, and more specifically help overcome limitations of traditional records management practices originally designed for paper records.
In 2020 TNA launched the UK National Archives Research project, inviting RecordPoint as one of five key AI vendors – together with AWS, Microsoft, Adlib and Iron Mountain – to take part in a process that would ultimately give each vendor the opportunity to apply its tools to a dataset provided by TNA.
The ‘AI for Selection’ project begins
The project began with each vendor creating a report providing an overview of their product and capabilities, with RecordPoint’s Records365 being one of two ‘off-the-shelf’ solutions to be evaluated.
Records365, a cloud-based software-as-a-service federated data management platform, uses our AI and ML–powered Intelligence Engine to classify data, helping to organise and analyse content with greater accuracy. It’s also been designed to be used by compliance and records management teams without the need for a data scientist.
For the second phase of the project, TNA worked with each vendor to implement their chosen product and use ML toolkits to classify a sample dataset drawn from TNA’s own corporate data, comprising 110,882 files and 12,462 folders, totalling 44.1 GB, in various formats, including emails, PDFs and Microsoft Office files.
The Records365 model was trained on the labelled dataset. For final testing, the trained model was applied to the unlabelled dataset, with a feedback loop allowing for manual review, which helps to improve the model over time.
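The train, predict and review cycle described above can be sketched in a few lines of Python. This is a minimal illustration only, not the Records365 Intelligence Engine: the `TinyClassifier` class, the `review_and_retrain` helper and the `reviewer` callback are hypothetical names, and a simple naive Bayes text classifier stands in for whatever model the product actually uses.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


class TinyClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def __init__(self):
        self.examples = []  # list of (text, label) pairs used for training

    def fit(self, examples):
        """Train from scratch on (text, label) pairs."""
        self.examples = list(examples)
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        for text, label in self.examples:
            self.label_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        """Return the label with the highest log-probability for the text."""
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def review_and_retrain(model, unlabelled, reviewer):
    """Feedback loop: predict, let a person confirm or correct, then retrain.

    `reviewer(text, predicted_label)` is a hypothetical callback that returns
    the label a human reviewer accepts for that item.
    """
    reviewed = [(text, reviewer(text, model.predict(text))) for text in unlabelled]
    return model.fit(model.examples + reviewed)
```

Each pass through `review_and_retrain` adds the reviewed items to the training set, which is one simple way a manual review step can improve a model over time.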
The results are in, with good news for RecordPoint
We were able to deliver a proof of concept using our standard Records365 product, providing an overall training accuracy of 74.5% and a test accuracy of 71.8%.
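For readers less familiar with these metrics, accuracy is usually the share of items whose predicted category matches the correct one; training accuracy is measured on the data the model learned from, and test accuracy on data it has not seen. The sketch below assumes that standard definition (the exact scoring method used in the project is not restated here), and `accuracy` is an illustrative helper name.

```python
def accuracy(predicted, actual):
    """Fraction of items whose predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty dataset")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Gap between the reported training and test accuracy, in percentage points.
reported_gap = round(74.5 - 71.8, 1)
```

A small gap between the two figures, here 2.7 percentage points, generally suggests the model performs similarly on unseen data as it does on its training data.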
In TNA’s final project report ‘Using AI for Digital Records Selection in Government’, the results show our AI capabilities were on par with, if not better than, some of the industry’s biggest players. RecordPoint and Adlib were the only two vendors to support all the features in TNA’s ‘GUI and deployment’ section.
And while TNA’s aim was to demonstrate the range of functionality available, not just evaluate product performance, RecordPoint’s recognition was certainly encouraging, especially in highlighting Records365’s ease of use for non-technical users.
The report also clearly articulated the difference between a purpose-built ML/AI product like Records365 and more powerful bespoke platforms that can provide a similar outcome but require far more specialist skills and ongoing maintenance, which are certainly important considerations for any records management team.
For organisations that don’t already have data scientists and AI/ML developers, an existing ‘off-the-shelf’ records management tool could potentially provide the same outcomes without having to engage costly consultants or hire additional employees.
Looking at the current state of AI/ML for records management
While AI and ML applications have delivered significant advances for other industries embracing digital transformation, it seems that records management has largely been left behind.
The most pressing issue is that record managers are already struggling to meet their basic compliance needs as the volume of electronic records grows exponentially. This is a problem that Records365 is already trying to solve, by providing the ability to better manage these obligations.
Secondly, if record managers can’t extract real value from their data and analyse it correctly, simply holding that data does little to help users.
At RecordPoint we remain excited by the raft of new applications available, allowing for more intricate reporting and forensic analysis tasks. Technology such as natural language processing can offer huge opportunities, with applications across machine translation and question answering, alongside related fields such as image recognition.
As we continue to invest more resources into AI, our teams are looking to extend our capabilities across context enrichment, multi-model appraisal, AI-based risk and value scoring, and unsupervised learning. We hope this delivers huge benefits for the industry, offering record managers greater control over content and helping them meet operational compliance and governance requirements, while ultimately reducing risk.
Adding greater functionality to Records365
Since TNA’s final report was published, we’ve added new functionality to Records365.
We’ve applied the lessons learned during the research phase of the project to our ongoing research and development, improving the user experience around ML and providing further transparency about what the product is doing.
We’ve also improved file analysis and the ability to mark record categories as trusted, so that they can be applied to incoming records automatically, further reducing human intervention.
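As a rough illustration of the trusted-category idea, the sketch below routes predictions whose category has been marked as trusted straight through, and queues everything else for a person to check. It is an assumption-laden example, not the Records365 API: `Prediction`, `route_predictions` and the sample category names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """A predicted category for one incoming record (hypothetical structure)."""
    record_id: str
    category: str


def route_predictions(predictions, trusted_categories):
    """Split predictions into auto-applied and needs-review lists.

    A prediction is applied automatically only when its category is in the
    set of categories a records manager has marked as trusted.
    """
    auto_applied, needs_review = [], []
    for prediction in predictions:
        if prediction.category in trusted_categories:
            auto_applied.append(prediction)
        else:
            needs_review.append(prediction)
    return auto_applied, needs_review
```

Keeping the trusted set under the control of the records team means automation can expand one category at a time, as confidence in each category grows.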