Benchmark: How much of your sensitive data is over-permissioned?
The rise of generative AI has transformed over-permissioning from a latent risk to one deserving urgent attention. But what's the scale of the issue? We attempted to find out.
Key findings at a glance
- ~16% of business-critical files are accessible to users who shouldn't have access — including broad internal groups, external guests, and anyone-with-a-link sharing configurations.
- 25–30% of files across cloud and productivity environments contain sensitive data, yet most organizations have no systematic view of where it lives or who can reach it.
- Over 800,000 files per enterprise are at risk from oversharing, a figure that grew 60% year over year in recent analyses — and shows no sign of slowing.
- Nearly 22% of files employees upload to GenAI tools contain sensitive information, extending over-permissioning risk well beyond traditional collaboration platforms.
- 61% of organizations have experienced insider-driven breaches tied to unauthorized file access in just the past two years, at an average cost of $2.7 million per incident.
Most security teams have a reasonable handle on who can access their critical applications. Far fewer can answer a simpler question: who has access to that spreadsheet full of customer records sitting in a shared drive? And is that the only copy of that file in existence?
Generative AI adoption, collaboration sprawl, and the steady rise of insider risk have made that question urgent. The files your team shared broadly during a product launch last quarter are now being indexed by AI assistants. The "anyone with a link" setting a departing employee left on a sensitive document two years ago is still live. And the groups your IT team created for a cross-functional project in 2021 still grant access to people who moved on long ago.
Over-permissioning has always been a latent risk. What's changed is the speed and scale at which it can now be exploited by threat actors, careless insiders, and increasingly by AI tools that surface the data organizations forgot they'd exposed.
This article draws on findings from multiple independent research efforts to establish a working benchmark: how much sensitive data does a typical enterprise have, how much of it is over-permissioned, and what does that exposure actually cost?
The problem: over-permissioning as a breach amplifier
At its core, over-permissioning is simple: people have access to data they don't need to do their job. It's the gap between what the principle of least privilege demands and what actually exists across your file shares, cloud storage, email, and collaboration platforms.
In theory, every user should only see the files relevant to their role. In practice, the gap is enormous, and it grows daily.
How over-permissioning happens
The root causes are familiar to any IT or security practitioner. Collaboration platforms default to broad sharing: a new Google Drive folder inherits permissive settings, a SharePoint site is shared with "Everyone except external users," and a Teams channel grants access to a group that's grown far beyond its original purpose. Link sharing is the quickest way to unblock a colleague and the quickest way to lose control of a document.
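For Google Drive specifically, you can see the scale of link sharing for yourself through the Drive API. Below is a minimal Python sketch, assuming a service-account `credentials.json` with read-only Drive metadata scope (and, in a Workspace domain, delegation so the account can see users' files); it simply lists files shared with "anyone with the link."

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/drive.metadata.readonly"]

# Assumes a service-account key file; adjust auth to your environment.
creds = service_account.Credentials.from_service_account_file(
    "credentials.json", scopes=SCOPES
)
drive = build("drive", "v3", credentials=creds)

# List every file visible to "anyone with the link", page by page.
page_token = None
while True:
    resp = drive.files().list(
        q="visibility = 'anyoneWithLink'",
        fields="nextPageToken, files(id, name, webViewLink)",
        pageToken=page_token,
    ).execute()
    for f in resp.get("files", []):
        print(f["name"], f.get("webViewLink"))
    page_token = resp.get("nextPageToken")
    if page_token is None:
        break
```

Even this narrow query, run once, tends to surface documents nobody remembers sharing.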
Beyond defaults, there's organizational drift. Mergers bring together permission structures that were never designed to coexist. Role changes leave employees with access to files from a previous team. Group memberships accumulate over time and are rarely pruned. Inherited permissions cascade through folder hierarchies in ways that are difficult to audit and nearly impossible to untangle at scale.
The cumulative result of all of these factors is an environment where broad access is the norm and least privilege is the exception.
The real-world consequences
Over-permissioning wouldn't matter much if breaches were rare or if attackers couldn't exploit it. Unfortunately, neither is the case.
The Verizon 2024 Data Breach Investigations Report found that more than two-thirds (68%) of breaches involve a non-malicious human element — people making mistakes or falling for social engineering. That's the context: most breaches start with ordinary human behavior. Over-permissioned data means that when a credential is compromised or an employee makes an error, the blast radius is far larger than it needs to be.
The insider risk picture is equally stark. According to a 2025 study by the Ponemon Institute, 61% of organizations have suffered file-related breaches caused by negligent or malicious insiders, at an average cost of $2.7 million per incident. These aren't exotic attacks — they're the predictable result of people having access to files they shouldn't, combined with weak visibility into who's accessing what.
When over-permissioning is the norm, every compromised account becomes a potential data breach. Every departing employee is an insider risk. And every AI assistant connected to your file storage is surfacing documents its users were never meant to see.
Why benchmarking permissions is tricky (and why most organizations don't)
If the problem is this pervasive, why don't more organizations measure it? Because benchmarking permission sprawl is genuinely hard.
Sensitive data is dispersed across platforms and formats. A single organization might store sensitive files across Google Drive, SharePoint, Box, email, Slack messages, and a dozen SaaS applications. Each platform has its own permission model, its own sharing defaults, and its own blind spots. Getting a unified view requires tooling and effort that most security teams don't have.
Permissions themselves are complex. Access to a single file can be granted directly, inherited from a parent folder, conferred through group membership, extended via a sharing link, or opened to an external guest. Understanding the effective permissions on a given file — who can actually access it and why — often requires untangling several layers of logic.
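To make that layering concrete, here's an illustrative Python sketch of how effective access on a single file accumulates across those layers. This is a simplified toy model, not any platform's actual permission engine; real systems add roles, deny rules, and link expiry on top.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A file or folder: its own grants plus whatever it inherits."""
    name: str
    parent: "Node | None" = None
    direct_grants: set = field(default_factory=set)   # individual users
    group_grants: set = field(default_factory=set)    # group identifiers
    link_audience: str = ""                           # e.g. "anyone_with_link"

def effective_users(node, group_members, link_resolver):
    """Walk up the folder tree, expanding groups and links at each level."""
    users = set()
    current = node
    while current is not None:
        users |= current.direct_grants
        for group in current.group_grants:
            users |= group_members.get(group, set())
        if current.link_audience:
            users |= link_resolver(current.link_audience)
        current = current.parent
    return users

# A folder shared with a company-wide group, holding a "restricted" file:
root = Node("All Hands", group_grants={"everyone@corp"})
doc = Node("salaries.xlsx", parent=root, direct_grants={"hr-lead"})
groups = {"everyone@corp": {"alice", "bob", "carol"}}
print(effective_users(doc, groups, lambda audience: set()))
# -> {'hr-lead', 'alice', 'bob', 'carol'}: inheritance quietly widens access
```

The point of the toy model is the walk up the tree: every ancestor folder is another chance for access to widen without anyone ever touching the file itself.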
Ownership is unclear. Who's responsible for ensuring the right people have access to the right files? IT controls the infrastructure. Security sets the policies. But the people who create and share files every day — the data owners — are typically business users who never think about permissions unless something breaks. This diffusion of responsibility means nobody has a complete picture.
There's no standard yardstick. Unlike vulnerability management, where Common Vulnerability Scoring System (CVSS) scores offer a common framework, or endpoint security, where detection rates can be compared, there's no widely accepted benchmark for permission hygiene. Most organizations have no baseline, which means they have no way to know whether they're improving or falling behind.
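The absence of a shared yardstick doesn't prevent you from defining a local one and tracking it. One possible, deliberately simple metric is sketched below in Python: the share of sensitive files whose effective audience exceeds the people who actually need them. The `files` schema here is hypothetical, a stand-in for whatever your inventory tooling produces.

```python
def oversharing_rate(files):
    """Fraction of sensitive files reachable by more users than need them.

    `files` is an iterable of dicts (hypothetical schema), e.g.:
      {"sensitive": True, "effective_users": 240, "needed_users": 12}
    """
    sensitive = [f for f in files if f["sensitive"]]
    if not sensitive:
        return 0.0
    overshared = sum(
        1 for f in sensitive if f["effective_users"] > f["needed_users"]
    )
    return overshared / len(sensitive)
```

A number like this is crude, but it's measurable, repeatable, and comparable quarter over quarter.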
The result is that over-permissioning persists as a known-but-unmeasured risk — acknowledged in security reviews, rarely quantified, and almost never tracked over time.
Building a sensitive data benchmark
To understand the scale of over-permissioning, it helps to start with a more basic question: how much of your data is sensitive in the first place? If you don't know how much sensitive data you have, you can't know how much of it is exposed. Drawing from several independent research efforts, a rough but consistent picture emerges.
Sensitive data in cloud infrastructure
A cloud security study by Dig Security (now part of Palo Alto Networks) examined more than 13 billion files stored in public cloud environments and found that more than 30% of cloud data assets contain sensitive information. Personally identifiable information (PII) — employee records, customer data, Social Security numbers, and credit card details — was the most common sensitive data type. The study also found that 95% of cloud principals were granted excessive privileges, reinforcing that the access problem runs deep alongside the data problem.
Sensitive files in productivity platforms
In an analysis of organizations using Google Drive, Material Security found that, on average, over a quarter of files contain sensitive data. Sensitive file volume grew at roughly three times the rate of overall file growth — meaning the problem is compounding. Email remains a dense repository of sensitive information as well, with an average of 75,000 sensitive emails per organization in the sampled set. And the sharing behavior around these files is where risk accumulates: public links, overly broad group access, and external sharing that outlasts its purpose.
Misconfigurations and oversharing
Concentric AI's 2022 Data Risk Report, based on analysis of more than 500 TB of unstructured data in production environments, found that 16% of business-critical files could be seen by internal or external users who should not have access. The same report found that organizations averaged 802,000 files at risk from oversharing, roughly 400 at-risk files per employee. Overshared file volumes grew 60% year over year. More than 160,000 documents per enterprise were shared with "everyone in the company," and over 52,000 documents were shared by employees with their personal email accounts.
These aren't edge cases or theoretical scenarios. They're the baseline state of enterprise file environments.
Sensitive content uploaded to AI tools
The newest vector for data exposure is generative AI. Harmonic Security analyzed 1 million prompts and 20,000 uploaded files sent by employees to more than 300 GenAI and AI-enabled SaaS applications and found that nearly 22% of files and more than 4% of prompts contained sensitive information. The sensitive content ranged from source code and access credentials to M&A documents, customer records, and financial data. A large share of this data went to free-tier or personal accounts with no enterprise controls, meaning it may be used for model training or retained indefinitely.
By Q3 2025, the proportion of sensitive file uploads had risen to over 26%, indicating the trend is accelerating as employees embed AI tools deeper into their daily workflows.
What the numbers tell us
Taken together, these data points form a rough but useful benchmark for security and IT leaders:

| Benchmark | Figure | Source |
| --- | --- | --- |
| Files containing sensitive data | 25–30% | Dig Security, Material Security |
| Business-critical files accessible to users who shouldn't have access | ~16% | Concentric AI |
| Overshared files per enterprise | ~802,000, up 60% year over year | Concentric AI |
| GenAI file uploads containing sensitive data | ~22%, rising past 26% by Q3 2025 | Harmonic Security |
| Organizations with insider-driven file breaches | 61%, at $2.7M per incident | Ponemon Institute |

The takeaway isn't a single number; it's a pattern. Roughly one in four files in a typical enterprise contains sensitive data. Of those sensitive files, a meaningful proportion is accessible to people who have no business need for it. The volume of at-risk files is growing sharply every year, driven by collaboration sprawl, cloud migration, and now AI adoption. And the cost of getting this wrong is measured in millions of dollars and months of incident response.
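To turn the pattern into a concrete number, here's a quick back-of-envelope estimate in Python. The 3 million file count is an assumption for illustration, and applying the 16% figure (which Concentric measured against business-critical files) to all sensitive files is a simplification.

```python
# Rough exposure estimate from the benchmark ratios above.
# All inputs are assumptions or approximations, not measurements.
total_files = 3_000_000      # assumed file count for a mid-sized enterprise
sensitive_share = 0.25       # ~25-30% of files contain sensitive data
overshared_share = 0.16      # ~16% reachable by users who shouldn't have access

sensitive_files = total_files * sensitive_share
exposed_files = sensitive_files * overshared_share
print(f"~{sensitive_files:,.0f} sensitive files, ~{exposed_files:,.0f} exposed")
# -> ~750,000 sensitive files, ~120,000 exposed
```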
Where to go from here
These benchmarks aren't meant to alarm; they're meant to establish a baseline. You can't reduce what you haven't measured.
For security and IT leaders, the practical starting points are straightforward: gain visibility into where sensitive data lives across your productivity and cloud environments, understand the effective permissions on that data, and identify the largest gaps between current access and actual business need.
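Reusing the hypothetical inventory schema from the earlier sketch, that third step, finding the largest gaps, reduces to a simple triage ranking:

```python
def top_access_gaps(files, n=20):
    """Rank sensitive files by how far effective access exceeds business need."""
    def gap(f):
        return f["effective_users"] - f["needed_users"]
    at_risk = [f for f in files if f["sensitive"] and gap(f) > 0]
    return sorted(at_risk, key=gap, reverse=True)[:n]
```

The top of that list is where remediation buys the most risk reduction per permission change.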
Over-permissioning isn't a vulnerability that can be patched with a single tool or policy change. It's a systemic condition that requires continuous measurement, clear ownership, and tooling that scales with the speed at which your organization creates and shares data.
The organizations that treat permission hygiene as a measurable, trackable metric — rather than an abstract best practice — are the ones that will be best prepared for the next breach attempt, the next regulatory inquiry, or the next AI-driven transformation of how their workforce handles data.