Understanding your data: the ultimate guide
Understand your data
According to some forecasts, 147 zettabytes of data will be produced globally this year, rising to 181 zettabytes by the end of 2025. This surge in data creation presents both daunting challenges and transformative opportunities for businesses.
Indeed, the trend shows no sign of slowing down. If anything, the rise of generative AI platforms suggests that we are on the cusp of a further explosion in the amount of data created daily, as technology enables the rapid creation of content for use in business processes.
But as more data is produced, organizations are failing to understand what they hold, which leaves them unable to protect it. Sifting through that data is vital for organizations looking to limit their risk, stay secure against cyber threats, and make the best business decisions. Who will ensure that data is protected, and how will they do so?
It’s clear organizations are struggling with this challenge. As the volume of data they hold grows, it has become impossible to manually examine each file to determine its level of risk: whether it contains sensitive information, whether it can be safely removed, and whether the overall data estate is compliant with regulations and secure against threats.
The rise of data privacy laws continues apace
Since the EU passed the General Data Protection Regulation (GDPR) in 2016, more jurisdictions have passed privacy laws, each requiring companies to know where their data is stored and to understand it. This understanding enables them to follow data minimization rules by removing data once it’s no longer required. It also helps them ensure data accuracy and respond both to Data Subject Access Requests (DSARs) and to requests for data to be removed (also known as “the right to be forgotten”).
While the United States federal government has yet to pass a national privacy law, a new bill appears to have more support and momentum than previous efforts. But until that happens, a growing number of US states are implementing their own, many of which take inspiration from the GDPR. By 2026, 13 states will have enacted comprehensive privacy legislation, with many more considering bills of their own (the International Association of Privacy Professionals has an excellent analysis of the laws).
Meanwhile, US federal agencies like the Federal Trade Commission (FTC) and the Federal Communications Commission (FCC) have stepped up enforcement, which suggests organizations need to improve their data governance practices.
In Australia, long-awaited privacy law reforms have been brought forward to August. These reforms will likely include new maximum and minimum retention periods, a "fair and reasonable" test for the collection, use, and disclosure of personal information, a right to erasure (including the de-indexing of search results), and a right to sue for invasions of privacy.
No matter where your business is based, it’s becoming increasingly likely that you need to comply with relevant privacy laws and enforcement to remain operational. To do this, you need to understand the data you hold.
Data breaches and ransomware are an ever-present threat
2023 was a record-setting year for data breaches, with 3,205 publicly reported data compromises in the United States, according to the annual report from the Identity Theft Resource Center (ITRC). This was up 78% from 2022 and 72% above the previous record, set in 2021. Attackers continue to grow more sophisticated, as evidenced by recent attacks on organizations like Microsoft and Cloudflare.
The 2023 edition of IBM’s annual Cost of a Data Breach Report recommended that companies modernize their data protection across the hybrid cloud. But to properly focus their efforts on protecting what matters, companies again need to understand what they have.
What do we mean when we say, “understanding your data”?
Understanding your data goes beyond identifying where it is, whether it is structured or unstructured, and which systems and teams hold it. It means understanding the implications of each piece of data.
As well as knowing that a given document sits in a project folder, for example, you can see that it contains a Social Security Number (SSN) or passport number, so you can apply the proper controls for access and disposal or destruction.
The benefits of understanding your data
Gaining this understanding means you can make the right decisions. This results in improved risk posture, compliance with relevant privacy and records laws, lower costs, and improved efficiency.
Understanding your data can also lead to better strategies and business decisions. When you know what you have, you can view the bigger picture and detect patterns and trends. Organizations can sort the wheat from the chaff, and understand customer buying patterns, support requests, marketing campaign results, and other data to make informed decisions to grow their business.
Ultimately, organizations that understand the data they have can provide a better experience for their customers. Achieving these results is not easy.
Challenges in understanding your data
Data sprawl
With the adoption of cloud platforms and Software as a Service (SaaS) solutions, teams store data in more locations than ever before. They also move data between sources with limited oversight as they race to complete tasks. According to one study, 43% of respondents said they used four to six platforms on average to manage their data, and another 11% said they used 10 to 12.
How does this impact governance? A separate survey of 650 organizations found that 44% of respondents could not maintain governance and automate policy controls around data, and 42% could not enforce consistent security measures, in what 54% of respondents labeled.
Data silos
Data silos are the other side of the coin, and they also add to the number of data sources in use and to governance complexity. A data silo is a set of information that only a given team or group within the company can access. Think of a Customer Relationship Management (CRM) platform used by a marketing or sales team, a customer feedback system used by a product team, or website analytics used by the IT department.
Unstructured data
When discussing data sprawl and data silos, unstructured data deserves special mention. Unlike structured sources such as databases and CRM platforms, unstructured data sources like documents, emails, and media files lack a consistent structure or data model.
According to estimates, 80-90% of enterprise data is unstructured, and this type of data is growing three times faster than structured data.
These data sources are valuable because they provide more context and insight, and can serve as a record of the strategies formulated and decisions made. But because they are unstructured, records managers or privacy professionals must open and review each item to classify it accurately. Properly gauging the risk in unstructured data means doing this at a scale that is impossible to manage manually.
Reliance on manual effort
Indeed, thanks to the embrace of cloud platforms and SaaS, even a small organization now has too much data to review and inventory manually. Yet many organizations still rely on manual methods: surveying departments and teams about the types of data they keep and where they keep it, and requiring employees to enter this information into a central inventory.
This takes significant time away from other tasks, and the resulting inventory is out of date almost as soon as the survey is complete. Because the task is not part of their core role, employees will not set aside time for it, will be apathetic towards it, or will complete it inconsistently. Fortunately, there are modern, automated solutions to this problem.
Case study: A large municipal organization remains compliant and guarantees data trust
This large municipal organization oversees an area on the West Coast of the United States. A 2018 electronic records management assessment found that records and information management was consuming significant staff time, resources, and effort because of a lack of technical infrastructure. These issues made compliance with retention laws more difficult and were exacerbated by the move to Microsoft 365 and the onset of the COVID-19 pandemic in 2020.
The RecordPoint platform’s federated data management helped centralize governance, with in-place management also reducing overhead. By embracing RecordPoint, the organization made a rapid transition to digital tools and gained confidence that all of its information is managed.
Read more: https://www.recordpoint.com/customers/large-municipal-organization
Data lifecycle management is key in overcoming data understanding challenges
Organizations that manage their data throughout the lifecycle—from creation to disposal/deletion—will be much better placed to follow data privacy regulations and limit the impact of a data breach. Here are the essential steps for gaining such an understanding.
Build a data inventory
To ensure you manage your data appropriately, you need to understand what you have. You need to discover all the data in your organization, and then analyze it to understand what you need to do with each piece of data. Once you have a central data inventory, you can more simply and consistently apply policies for access, privacy, and disposal.
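To make the discovery step concrete, here is a minimal Python sketch that walks a single file share and records basic metadata for each file. It is an illustration under stated assumptions, not how RecordPoint or any particular product builds its inventory: `build_inventory`, `InventoryEntry`, and the `source` label are hypothetical names, and a real inventory would span many systems and also capture owners, permissions, and content signals.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class InventoryEntry:
    path: str
    source: str          # hypothetical label for the system the file came from
    size_bytes: int
    modified: datetime
    sha256: str          # content fingerprint, useful later for spotting duplicates


def build_inventory(root: str, source: str) -> list[InventoryEntry]:
    """Walk a directory tree and record basic metadata for every file (illustrative only)."""
    entries = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # skip directories; only files go into the inventory
        stat = path.stat()
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        entries.append(
            InventoryEntry(
                path=str(path),
                source=source,
                size_bytes=stat.st_size,
                modified=datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
                sha256=digest,
            )
        )
    return entries
```

Hashing each file’s content gives every entry a fingerprint, which later makes duplicates easy to spot. This sketch reads each file whole; for very large files you would hash in chunks instead.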
Classify your data by sensitivity
By classifying your data, you can focus on what matters and manage it appropriately. Data classification refers to grouping data according to sensitivity. Once you know where your sensitive data is, such as personally identifiable information (PII) or payment card industry (PCI) data, you can take the right steps to protect it: moving it to more secure storage, managing access, and removing it when it reaches the end of its retention period.
Data classification allows you to meet your obligations under privacy and compliance regulations, and better protect your organization from a damaging data breach.
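As a sketch of how detected data types might roll up into a sensitivity level, the Python below assigns each file the highest level implied by any signal found in it. The four-tier scale, the `SIGNAL_LEVELS` mapping, and the `classify` function are all assumptions for illustration; your own classification policy defines the real tiers.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    # Hypothetical four-tier scale; ordering lets us compare levels with max().
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical mapping from detected data types to sensitivity levels;
# a real mapping would come from your organization's classification policy.
SIGNAL_LEVELS = {
    "email": Sensitivity.CONFIDENTIAL,
    "phone": Sensitivity.CONFIDENTIAL,
    "ssn": Sensitivity.RESTRICTED,
    "passport": Sensitivity.RESTRICTED,
    "card_number": Sensitivity.RESTRICTED,
}


def classify(signals: set[str], default: Sensitivity = Sensitivity.INTERNAL) -> Sensitivity:
    """Return the highest sensitivity implied by the signals found in a file.

    Unknown signal names are ignored; a file with no known signals gets `default`.
    """
    levels = [SIGNAL_LEVELS[s] for s in signals if s in SIGNAL_LEVELS]
    return max(levels, default=default)
```

Taking the maximum means a single passport number is enough to mark an otherwise routine document as restricted, which errs on the side of protection.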
Retention, disposal, and data minimization
As indicated above, privacy laws, as well as other types of compliance regulations, require you to remove data after a certain time or when other conditions are met. Holding on to data longer than needed can result in regulatory penalties. But worse than this, the more data you possess, the bigger the blast radius of a data breach.
Australians experienced a dramatic example of this in 2023, when financial services company Latitude Financial was hacked and data from 14 million current and former customers was accessed. Some of this data was 18 years old at the time of the attack. As a result, Latitude reported a $158.5 million net loss for 2023 and is facing a class-action lawsuit from affected customers.
A more recent example comes from March this year when AT&T announced it had suffered a data breach exposing data from 7.6 million customers and 65.4 million former account holders. A preliminary analysis suggested the data was from 2019 and earlier and included personal information such as email and mailing addresses, phone numbers, birth dates, social security numbers (SSNs), AT&T account numbers, and passcodes.
While data breaches are damaging for any organization, it is hard to argue against the idea that a more rigorous approach to data minimization would have lessened the impact in these cases.
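Retention rules can be expressed as data and checked automatically. The sketch below, assuming a hypothetical `RETENTION_DAYS` schedule and a hypothetical `Record` type that tracks when each record’s retention trigger occurred (for example, account closure), lists the records that have passed their retention period. The periods shown are placeholders, not legal guidance.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    record_id: str
    category: str
    closed_on: date  # date the retention trigger occurred, e.g. account closure


# Hypothetical retention schedule (category -> retention period in days).
# Real periods come from the laws and policies that apply to you.
RETENTION_DAYS = {
    "customer_account": 7 * 365,
    "marketing_lead": 2 * 365,
}


def due_for_disposal(records: list[Record], today: date) -> list[str]:
    """Return IDs of records whose retention period has expired."""
    due = []
    for r in records:
        days = RETENTION_DAYS.get(r.category)
        if days is None:
            continue  # no rule for this category: skip it, never dispose without a rule
        if today >= r.closed_on + timedelta(days=days):
            due.append(r.record_id)
    return due
```

Records in a category with no rule are skipped rather than disposed of, so a gap in the schedule never results in unintended deletion.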
Remove ROT
As well as removing sensitive data when required, you should focus on removing Redundant, Obsolete, and Trivial data (ROT). This unnecessary data is created during normal business operations, and organizations often hold on to this data in case they need it “one day”.
The most direct impact of ROT is higher storage, egress, and replication costs, but ROT also increases risk. ROT may contain PII, PCI, and other types of sensitive data, which, as we’ve outlined, can increase regulatory and security risks. ROT can also slow down your organization’s processes, as employees struggle to find what they need in systems clogged with unnecessary data.
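Simple heuristics can surface ROT candidates for review. In this Python sketch, "redundant" means byte-for-byte duplicate content, "obsolete" means not modified for five years, and "trivial" means empty files or temporary and backup file extensions. These definitions, the thresholds, and the `find_rot` function are assumptions for illustration; in practice each category would be tuned to your own policies, and flagged files would be reviewed before deletion.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    path: str
    content: bytes
    modified: datetime


# Illustrative heuristics only; the threshold and suffixes are assumptions.
OBSOLETE_AFTER = timedelta(days=5 * 365)
TRIVIAL_SUFFIXES = (".tmp", ".bak", "~")


def find_rot(files: list[FileInfo], now: datetime) -> dict[str, set[str]]:
    """Flag files as redundant, obsolete, or trivial for human review."""
    rot = {"redundant": set(), "obsolete": set(), "trivial": set()}

    # Redundant: identical content. Keep the first path seen, flag the rest.
    by_hash = defaultdict(list)
    for f in files:
        by_hash[hashlib.sha256(f.content).hexdigest()].append(f.path)
    for paths in by_hash.values():
        rot["redundant"].update(paths[1:])

    for f in files:
        # Obsolete: untouched for longer than the threshold.
        if now - f.modified > OBSOLETE_AFTER:
            rot["obsolete"].add(f.path)
        # Trivial: empty files or temporary/backup file names.
        if not f.content or f.path.endswith(TRIVIAL_SUFFIXES):
            rot["trivial"].add(f.path)
    return rot
```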
How does RecordPoint help?
Connect all your data sources to ensure all your data is managed
As we’ve discussed, the more data sources you have, the more difficult it is to manage them. RecordPoint’s Connectors allow you to connect to any system essential to your business. That includes standardized systems with common configurations, such as Microsoft 365 applications like SharePoint, OneDrive, and Outlook, as well as uniquely configured systems like Salesforce or Workday, which are often customized for an organization’s needs.
Our File Analysis feature extends visibility to data stored in on-premises systems, so you can see all your data, whether it is stored in SaaS systems or on-premises storage.
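One common way to design this kind of federated approach is to give every source system a connector with the same small interface, then merge the metadata the connectors return. The Python sketch below is a hypothetical illustration of that pattern, not RecordPoint’s Connectors API: `Connector`, `ItemMetadata`, `InMemoryConnector`, and `federated_inventory` are invented names, and `InMemoryConnector` stands in for a real system.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass
class ItemMetadata:
    source: str
    item_id: str
    title: str


class Connector(Protocol):
    """Hypothetical interface: each connector lists item metadata from one system."""

    name: str

    def list_items(self) -> Iterable[ItemMetadata]: ...


class InMemoryConnector:
    """Hypothetical stand-in for a real system such as a document library or CRM."""

    def __init__(self, name: str, titles: list[str]) -> None:
        self.name = name
        self._titles = titles

    def list_items(self) -> Iterable[ItemMetadata]:
        for i, title in enumerate(self._titles):
            yield ItemMetadata(self.name, f"{self.name}-{i}", title)


def federated_inventory(connectors: list[Connector]) -> list[ItemMetadata]:
    """Collect metadata from every source; the content itself stays in place."""
    return [item for c in connectors for item in c.list_items()]
```

Because only metadata is collected, content can stay where it lives, and adding a new system means writing one new connector rather than changing the inventory logic.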
Scan all your data for PII, PCI, and other custom identifiers
Once you have access to your data, the next critical step is to harness the intelligence that will help you make better decisions around classification, retention, and disposal.
The platform’s Intelligence Signaling feature scans all incoming data and records for Personally Identifiable Information (PII) and Payment Card Industry (PCI) data. That includes highly sensitive PII, like social security numbers, tax file numbers, driver’s license numbers, and passport details, as well as less sensitive PII, like names, email addresses, and phone numbers.
You can also configure the system to check for specific data relevant to your organization’s unique needs and jurisdiction. Such custom signals could be a unique customer number used as an identifier, a Health Insurance Claim (HIC) number that might be essential for claims processing, or a jurisdiction-specific, government-issued ID like a state driver’s license.
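To illustrate what pattern-based scanning involves, here is a minimal Python sketch that looks for US Social Security Numbers, email addresses, and payment card numbers. It is not how Intelligence Signaling works internally. The regular expressions are simplified assumptions, and card candidates are filtered with the Luhn checksum, which valid payment card numbers satisfy, to cut down on false positives.

```python
import re

# Simplified, illustrative patterns; production detectors also use context.
# SSN: excludes area numbers 000, 666, and 900-999, group 00, and serial 0000.
SSN_RE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, d in enumerate(int(c) for c in reversed(number)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_text(text: str) -> dict[str, list[str]]:
    """Return candidate PII/PCI matches found in a block of text."""
    cards = []
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())  # normalize to digits only
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            cards.append(digits)
    return {
        "ssn": SSN_RE.findall(text),
        "email": EMAIL_RE.findall(text),
        "card_number": cards,
    }
```

For example, `scan_text("Contact jane.doe@example.com, SSN 123-45-6789, card 4111 1111 1111 1111.")` finds one match in each category, with the card number normalized to digits only.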
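Custom signals can be modeled as a registry of named patterns that is scanned alongside built-in checks. In the sketch below, `SignalRegistry` is a hypothetical class, and the `CUST-` customer number and `CLM` claim number formats are invented examples; substitute the identifier formats your organization actually uses.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomSignal:
    name: str
    pattern: re.Pattern


class SignalRegistry:
    """Hypothetical registry of organization-specific identifiers to scan for."""

    def __init__(self) -> None:
        self._signals: list[CustomSignal] = []

    def register(self, name: str, regex: str) -> None:
        self._signals.append(CustomSignal(name, re.compile(regex)))

    def scan(self, text: str) -> dict[str, list[str]]:
        """Return full matches per signal; signals with no matches are omitted."""
        found = {}
        for signal in self._signals:
            # m.group() returns the whole match even if a pattern has groups.
            matches = [m.group() for m in signal.pattern.finditer(text)]
            if matches:
                found[signal.name] = matches
        return found


registry = SignalRegistry()
# Hypothetical identifier formats, for illustration only.
registry.register("customer_number", r"\bCUST-\d{8}\b")
registry.register("claim_number", r"\bCLM\d{6}[A-Z]\b")
```

With these registrations, `registry.scan("Customer CUST-00012345 filed claim CLM123456B.")` returns both identifiers under their signal names.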
Machine learning enables data classification at scale
Once you have all this intelligence, you need to ensure your data is classified for privacy and risk. These privacy signals inform your data classification, either through rules-based classification or through RecordPoint’s Classification Intelligence feature, which lets you train a machine learning model to auto-categorize data based on content and context. The models are straightforward to build through a simple interface and include key features like prediction probability scores.
Once you know what is sensitive, you can take the appropriate actions outlined above: manage access, ensure the data is stored securely, and remove it once you are permitted to.
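To show what prediction probability scores mean in practice, the Python below trains a tiny multinomial naive Bayes classifier and returns a probability for each category. Naive Bayes is a simple stand-in chosen for illustration; it is not a description of the model behind Classification Intelligence, and the class name and training examples are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with Laplace (add-one) smoothing; illustrative only."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_proba(self, doc: str) -> dict[str, float]:
        tokens = tokenize(doc)
        n_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(count / n_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][t] + 1) / (total + len(self.vocab)))
            log_scores[label] = score
        # Normalize log scores into probabilities (subtract max for numerical stability).
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(exp.values())
        return {k: v / z for k, v in exp.items()}
```

A probability score lets you act automatically on confident predictions and route low-confidence ones (below an example threshold such as 0.8) to a records manager for review. A real training set would need far more than a handful of labeled documents.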
Automatically identify ROT so you can remove it, and reduce the risk
The RecordPoint platform allows you to identify and remove the ROT, further reducing risk, lowering storage and other costs, and increasing employee efficiency. Automatically identifying ROT and removing it makes managing what matters much easier.
Reporting allows you to understand your data—and your risk—at a glance
Business intelligence (BI) reporting plays a vital role in risk mitigation, allowing organizations to identify and reduce risk, predict market shifts, and spot data anomalies. By layering data governance metrics into your reporting, you can make better decisions to limit risk.
The RecordPoint platform’s deep reporting capabilities allow you to explore data in your BI platform of choice, including Power BI and Tableau, so you can make decisions based on data, not your gut. Data governance metrics show you where your data is held, reveal trends like unsafe data-sharing practices, and surface the data you need to respond to Data Subject Access Requests (DSARs) or requests for data to be deleted.
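As a simple example of a data governance metric, the sketch below counts records by source system and sensitivity level and writes the result as CSV, a format both Power BI and Tableau can import. The record fields and the function names are assumptions for illustration, not the schema RecordPoint exports.

```python
import csv
import io
from collections import Counter


def governance_summary(records: list[dict]) -> list[dict]:
    """Count records per (source, sensitivity) pair, sorted for stable output."""
    counts = Counter((r["source"], r["sensitivity"]) for r in records)
    return [
        {"source": src, "sensitivity": sens, "count": n}
        for (src, sens), n in sorted(counts.items())
    ]


def to_csv(rows: list[dict]) -> str:
    """Serialize summary rows as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["source", "sensitivity", "count"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```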
Case study: AIATSIS transforms its information governance strategy to lower risk
With a small team and limited records experience, the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) faced challenges ensuring compliance across the organization. It needed a data inventory to centralize governance across its data corpus, with minimal disruption to user workflows.
By adopting RecordPoint, AIATSIS gained federated data management to centralize governance across business systems, including SharePoint Online and Microsoft Teams, as well as in-place management to reduce overhead.
RecordPoint’s automated classification allowed them to rapidly catalog and classify their data corpus with a high degree of accuracy.
The result was lower compliance risk and greater confidence that everything was captured, all with less effort from the records team.
Discover a better platform
Understanding your data is a challenge no matter what industry you’re in. If you’d like to investigate how RecordPoint can help, explore the platform now, or book a demo for a full walk-through.