Why data minimization matters
In a world of cheap, effectively infinite cloud storage, it can feel like there is no need for the delete key. After all, we can keep everything forever, pay next to nothing per item, and never worry about it again, right? Well, no.
Data has costs that go beyond the infrastructure bill, and retaining it forever can affect your organization in unexpected ways. The cost/benefit calculation isn’t as straightforward as you might think.
In this post, we will look at the hidden (or not so hidden) costs of retaining data, and how a focus on data minimization can not only reduce them but enable your team to achieve more with the data you do decide to keep. But first, let’s look at a taxonomy of ROT: the data we don’t wish to retain.
What is ROT?
ROT stands for Redundant, Obsolete or Trivial: a set of classifications for data that organizations don’t need but continue to retain. This data accumulates naturally over time in the course of normal business operations. Employees create ROT when they save multiple copies of the same information (Redundant data), retain out-of-date information (Obsolete data), or save irrelevant or personal information to their work devices or drives (Trivial data).
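To make the "Redundant" category concrete, here is a minimal Python sketch that groups files with identical content by hashing them. It is an illustration only: the function names `file_digest` and `find_redundant_copies` are made up for this example, and treating byte-identical files as redundant is a simplifying assumption (near-duplicates and renamed exports would need a different approach).

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_redundant_copies(root: Path) -> list[list[Path]]:
    """Group files under `root` that have byte-identical content.

    Assumption for this sketch: identical content means redundant.
    Only groups with more than one file are returned.
    """
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group is a candidate for review rather than automatic deletion: one copy in the group may still be the record you need to keep.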
The argument for ‘keeping everything’ often rests on the perception that data should be retained because ‘we might need that one day’. The reality, however, is that keeping information for too long exposes your business to several types of risk:
- Regulatory risks – Most jurisdictions with privacy regulations include a provision that personal information must not be kept any longer than is necessary for the purpose for which it was obtained. Keeping personal information on your systems ‘just in case’ is not a defense here and can lead to significant financial penalties and reputational damage.
- Security risks – Keeping outdated file formats can expose the organization to additional security risks. Because those formats are no longer maintained or patched, they may provide an opening for malicious actors or make the organization more vulnerable to general system failures.
- Process risks – The more content you have, the less efficient your business becomes and the greater the risk that you are basing decisions on outdated or incorrect information. This can cause legal and financial damage, along with the reputational damage that accompanies it.
We often think of cost as a relatively simple $X per TB ratio. However, when calculating the true cost of storage, you also need to look at other dimensions, including:
- Replication costs. Some organizations have clear requirements to replicate data across regions, typically to enable higher availability or disaster recovery. If you are replicating critical datasets across different geographical regions, you are effectively doubling the amount of data stored. You should expect to incur additional storage costs as a result, possibly as much as 100% more.
- Transfer or Egress costs. If you transfer data from your storage provider either as part of a business process (where a dataset is sent to another storage provider) or as part of a future migration of all data, you can incur additional costs.
- Management costs. These may be charged by your provider, or they may be indirect costs that your team incurs as it carries out the tasks. They cover a broad range of activities, such as moving data between storage tiers (for better cost effectiveness), cloud monitoring to ensure data integrity, and security work (encryption, penetration testing, security architecture changes, etc.).
While you may be paying $0.023 per GB of data per month, that number may pale in comparison to the direct and indirect costs above. Worse, this investment may be going toward content that you do not even need to keep and that, by being kept, could cause your organization harm.
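The arithmetic behind this point can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `StorageCostInputs` fields, the `monthly_cost` function, and all of the rates in the example are hypothetical placeholders, not real provider pricing.

```python
from dataclasses import dataclass


@dataclass
class StorageCostInputs:
    """Inputs for a rough monthly estimate. All rates are illustrative."""
    stored_gb: float                  # primary data volume in GB
    storage_rate_per_gb: float        # headline monthly price per GB
    replica_count: int = 0            # extra full copies kept in other regions
    egress_gb: float = 0.0            # GB transferred out per month
    egress_rate_per_gb: float = 0.0   # price per GB transferred out
    management_cost: float = 0.0      # monthly staff, monitoring, security work


def monthly_cost(inputs: StorageCostInputs) -> dict[str, float]:
    """Break a monthly bill into the cost dimensions listed above."""
    storage = inputs.stored_gb * inputs.storage_rate_per_gb
    # Each full replica is assumed to cost as much as the primary copy.
    replication = storage * inputs.replica_count
    egress = inputs.egress_gb * inputs.egress_rate_per_gb
    total = storage + replication + egress + inputs.management_cost
    return {
        "storage": storage,
        "replication": replication,
        "egress": egress,
        "management": inputs.management_cost,
        "total": total,
    }


if __name__ == "__main__":
    example = StorageCostInputs(
        stored_gb=50_000, storage_rate_per_gb=0.02, replica_count=1,
        egress_gb=2_000, egress_rate_per_gb=0.09, management_cost=1_500.0,
    )
    for name, value in monthly_cost(example).items():
        print(f"{name:>12}: ${value:,.2f}")
```

With these made-up inputs, the headline storage charge comes to $1,000 of a $3,680 monthly total. Your own numbers will differ, but the breakdown shows why the per-GB rate alone understates the bill.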
Overcoming the fear of hitting delete
So, what do you do?
At the core of the solution is developing a better understanding of your data, separating the wheat from the chaff. Only once you do that can you appropriately minimize the amount of data you are keeping on an ongoing basis. You need to remove the ROT.
A guide to removing ROT
Now that you understand the issues ROT can cause, you can begin the process of removing it. Below is a high-level guide to removing ROT so your organization can get on with achieving its business objectives.
- Understand your data. What do you have across your data corpus? What formats and date ranges exist? Can you determine the purpose or process it was collected for?
- Understand its value. Not all data that is ‘old’ is without value, but you may have a significant quantity of data that has little or no value. What does value mean to your organization, and how do you measure it?
- Understand its risk. Different data sets have different levels of risk; personal information, for example, may carry a higher level of risk depending on your jurisdiction. It is important to map a risk level to each data set and determine the acceptable level of risk in your context.
- Profile and action. With the information assessed against a risk/value matrix, you will be better able to profile the information and then action it accordingly. Get rid of the information that you no longer need to keep and that could cause undue risk to the organization.
Once you have the profile completed you will have a systematic set of rules against which you can analyze your various data shares and start to clean up your data corpus by disposing of data that is not required. This will save you storage space and budget, while making your team members’ lives easier. They won’t have to wade through pages of irrelevant and outdated search results, saving them valuable time and effort.
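As an illustration of what such a set of rules could look like, here is a minimal Python sketch of a two-by-two risk/value matrix. The `Level` and `DataSet` types, the `ACTIONS` table, and the action labels are all hypothetical; a real matrix would use your organization's own definitions of value and risk and would likely have more than two levels.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class DataSet:
    name: str
    value: Level
    risk: Level


# Hypothetical matrix: (value, risk) -> action. Adjust to your own context.
ACTIONS = {
    (Level.LOW, Level.HIGH): "dispose first",
    (Level.LOW, Level.LOW): "dispose",
    (Level.HIGH, Level.LOW): "retain",
    (Level.HIGH, Level.HIGH): "retain with extra controls",
}


def profile(data_sets: list[DataSet]) -> dict[str, list[str]]:
    """Group data set names by the action the matrix assigns to them."""
    plan: dict[str, list[str]] = defaultdict(list)
    for data_set in data_sets:
        plan[ACTIONS[(data_set.value, data_set.risk)]].append(data_set.name)
    return dict(plan)


if __name__ == "__main__":
    inventory = [
        DataSet("old marketing drafts", Level.LOW, Level.LOW),
        DataSet("expired customer forms", Level.LOW, Level.HIGH),
        DataSet("signed contracts", Level.HIGH, Level.HIGH),
    ]
    for action, names in profile(inventory).items():
        print(f"{action}: {', '.join(names)}")
```

Keeping the matrix in a plain lookup table means the rules can be reviewed and changed by the people who own the policy without touching the profiling logic.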
Remember, as we automate processes, it is vital that those processes use the most accurate, up-to-date information. Too much ROT can seriously hamper this goal and cause harm, as decisions end up being made on incorrect information.
Introducing File Analysis
If you’re worried about the impact of ROT on your organization, but you lack the time and resources to conduct the review manually, RecordPoint could be the solution. As part of our commitment to high-quality information governance outcomes, we have developed File Analysis. File Analysis is a simple and efficient way to discover what's in your file share data, identify high-value (and low-value) data, and give you the intelligence to understand what you can do with it next. Armed with your customized File Analysis report, your organization can make cost-saving, risk-reducing and quality migration decisions about that data.
File Analysis ensures only the right information is retained, with ROT data clearly identified for easy deletion.
Why create a data disposition strategy?
Once a record’s retention period ends, an organization must dispose of it. By following a retention and disposition policy, organizations can reduce the amount of data in their possession. There is no exposure risk for data you don't have in your system.
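A retention check of this kind can be sketched in Python as follows. This is an assumption-laden illustration: the function names are made up, the retention period is counted in whole years from a trigger date, and a record is treated as due on the day its period ends. Real policies often add conditions this sketch ignores.

```python
from datetime import date


def retention_end(trigger: date, years: int) -> date:
    """Return the date on which a retention period of `years` ends."""
    try:
        return trigger.replace(year=trigger.year + years)
    except ValueError:
        # 29 February has no equivalent in a non-leap year; use 1 March.
        return date(trigger.year + years, 3, 1)


def is_due_for_disposal(trigger: date, years: int, today: date) -> bool:
    """True once the retention period has ended (assumed inclusive)."""
    return today >= retention_end(trigger, years)


if __name__ == "__main__":
    # Hypothetical record closed on 30 June 2015 with a 7-year period.
    print(is_due_for_disposal(date(2015, 6, 30), 7, today=date.today()))
```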