The unstructured data playbook: selected insights from FILED Talks E1

When you don’t know what's in your unstructured data, you can’t ensure your customers’ privacy is respected, your data is protected, and your AI model is being trained on good, compliant, trusted data.

Written by Joe Pearce

Published: March 25, 2026


Unstructured data is projected to triple over the next three years — creating both unprecedented opportunity and serious risk for organizations racing to deploy AI.  

In our first FILED Talks webinar, RecordPoint's Head of Product Joe Pearce sat down with Melvin Baskin, principal consultant at ICG Consultants and an information governance veteran, to unpack what it actually takes to get your data estate AI-ready.

The core insight: you can't govern what you can't see, and you can't safely deploy AI on data you haven't prepared.

Below is an overview of the five key aspects of solving this problem. If you missed the session, you can register and watch it on demand.

What does AI-ready data actually look like?

AI-ready data isn't a single standard. It's a state defined by your environment, industry, and risk profile. At its foundation, it means having policies that aren't just written but actively executed across privacy, retention, and security domains.

Reaching that state requires mapping controls from multiple sources and securing agreement across stakeholders on which ones apply to AI deployment. The challenge isn't purely technical — it's organizational. As Baskin notes, the traditional stakeholders of IT, legal, and HR must now be joined by marketing, communications, and production teams who generate and depend on massive volumes of data.

The people problem: building cross-functional alignment

Technology alone won't solve the AI-readiness challenge. Every successful implementation starts with people.

Baskin emphasized a foundational principle: people, process, technology — always in that order. The human element determines whether governance programs succeed or stall. That means establishing strong relationships with stakeholders, particularly IT teams who want to own and control systems and business units who generate the data that drives revenue.

Now, let's go through the five aspects of solving the unstructured data problem.

1. Cleaning up the data estate

Organizations preparing for AI deployment often want to fix everything at once. That approach almost always fails.

Start with the low-hanging fruit: data that hasn't been touched in years and serves no current business purpose. Running simple reports to identify stale content, then bringing that information to data owners for disposition decisions, creates early wins without overwhelming the organization.
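A "simple report" of stale content can be as basic as a script that flags files untouched beyond a cutoff. The sketch below is illustrative only (the function name, three-year threshold, and file-system-based approach are assumptions, not anything prescribed in the webinar):

```python
import os
import time
from datetime import datetime

def stale_files(root: str, years: int = 3):
    """Walk a directory tree and report files not modified in `years` years."""
    cutoff = time.time() - years * 365 * 24 * 3600
    report = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # skip unreadable or vanished entries
            if mtime < cutoff:
                report.append((path, datetime.fromtimestamp(mtime).date()))
    return report
```

A report like this gives data owners a concrete list to make disposition decisions against, rather than an abstract request to "clean up."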

2. Getting visibility into your data

You can't govern what you can't see. Data mapping is the essential first step.

Modern tools can automate much of the discovery process, though traditional spreadsheet-based mapping still works for smaller environments. The goal is transparency: understanding where all information lives, how it flows through its lifecycle, and what systems capture personal or sensitive data.

The challenge extends well beyond primary document management systems. SharePoint, Box, and Google Drive represent only the most visible repositories. The long tail — email attachments, Salesforce cases, Slack messages, Teams conversations — each requires governance attention too.
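A data map, whether kept in a tool or a spreadsheet, ultimately boils down to a structured inventory of repositories. As a minimal sketch (the field names here are illustrative, not a standard schema):

```python
from dataclasses import dataclass

@dataclass
class Repository:
    """One entry in a data map; fields are illustrative assumptions."""
    name: str
    system_type: str     # e.g. "document_mgmt", "chat", "crm"
    contains_pii: bool   # does this system capture personal/sensitive data?
    owner: str

def pii_systems(data_map: list[Repository]) -> list[str]:
    """Surface every system in the map that captures personal or sensitive data."""
    return [r.name for r in data_map if r.contains_pii]
```

Even a structure this small answers the two questions governance teams get asked first: where does personal data live, and who owns that system?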

Baskin recommends separating communication channels from document management systems for retention purposes. Platforms like Slack and Teams generate enormous volumes of duplicate and transient content. Applying shorter retention periods to these channels reduces excess data without affecting the authoritative documents stored in proper repositories.
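That separation can be expressed as a simple retention schedule keyed by source system. The periods below are made-up examples to show the shape of the idea, not recommended values:

```python
# Illustrative retention schedule (days); channel names and periods are assumptions.
RETENTION_DAYS = {
    "slack": 90,            # transient chat: short retention
    "teams": 90,
    "email": 365 * 2,
    "sharepoint": 365 * 7,  # authoritative documents: long retention
}

def is_past_retention(source: str, age_days: int) -> bool:
    """True if content from `source` has exceeded its retention period.
    Unknown sources default to the longest period (conservative)."""
    return age_days > RETENTION_DAYS.get(source, 365 * 7)
```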

3. Classification at scale: making it practical

Classification is where many governance programs stall. Sorting every document into business-appropriate buckets across petabyte-scale data estates feels impossible.

Baskin's answer is to build classification into everyday structure. When stakeholders agree on file structures and naming conventions, they fall into alignment with the retention schedule almost as a side effect. Classification becomes a natural extension of existing work patterns rather than an imposed burden.

Modern AI tools can then be trained on these classifications. Building a model around a thousand similar policies, for example, allows automated identification across terabytes of content with probability scoring. Human review remains necessary, but the scale becomes manageable.
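The webinar doesn't specify a model, but the workflow it describes — score each document against known categories, keep a probability, and route low-confidence results to a human — can be sketched with even a trivial keyword scorer:

```python
def classify(text: str, categories: dict[str, set[str]],
             review_threshold: float = 0.5):
    """Score `text` against keyword sets per category (a stand-in for a
    trained model). Returns (label, score, needs_human_review)."""
    words = set(text.lower().split())
    scores = {
        cat: len(words & keywords) / len(keywords)
        for cat, keywords in categories.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best], scores[best] < review_threshold
```

The point is the routing logic, not the scorer: anything below the confidence threshold goes to a reviewer, which keeps the human workload proportional to uncertainty rather than to estate size.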

4. Fixing the oversharing problem

The scariest part of connecting Copilot to SharePoint or Gemini to Google Drive is oversharing. Public links created carelessly, documents shared with vendors who no longer need access, internal permissions that grant everyone access to everything — all of it creates immediate AI exposure.

Aligning system policies with specific roles and sensitivity classifications filters out inappropriate access before AI amplifies the problem. External data sharing requires particular attention. Organizations without clear policies should develop matrices that map data types to authorized sharing scenarios.
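Such a matrix is, at its core, a lookup from (data type, sharing scenario) to an allow/deny decision, with deny as the default. The entries below are hypothetical examples:

```python
# Hypothetical sharing matrix; data types and scenarios are illustrative.
SHARING_MATRIX = {
    ("public_marketing", "external_vendor"): True,
    ("customer_pii", "external_vendor"): False,
    ("customer_pii", "internal_need_to_know"): True,
}

def sharing_allowed(data_type: str, scenario: str) -> bool:
    """Deny by default: only explicitly approved combinations may be shared."""
    return SHARING_MATRIX.get((data_type, scenario), False)
```

The deny-by-default design matters: any combination nobody has reviewed is blocked, which is exactly the posture you want before an AI assistant starts surfacing content across permission boundaries.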

5. Defensible disposition: beyond the delete button

Defensible disposition doesn't always mean destruction. It means documenting that you're executing policy on a regular cycle.

Engaging data owners quarterly with clear information about applicable policies and supporting statutes creates a defensible position. When owners choose to retain data for legitimate business reasons, that decision gets documented. When they approve deletion, that gets documented too. The key is consistent execution and clear records.
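The documentation itself can be lightweight — what matters is that every decision, retain or delete, leaves an auditable record. A minimal sketch (field names are illustrative assumptions):

```python
from datetime import datetime, timezone

def record_disposition(log: list, item_id: str, owner: str,
                       decision: str, policy: str, reason: str) -> dict:
    """Append an auditable disposition decision to `log`.
    `decision` is "retain" or "delete"; `policy` names the governing
    retention policy or statute. Field names are illustrative."""
    entry = {
        "item_id": item_id,
        "owner": owner,
        "decision": decision,
        "policy": policy,
        "reason": reason,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    log.append(entry)
    return entry
```

Run on a regular cycle, a log like this is the "defensible position": proof that policy was executed consistently, whatever the individual decisions were.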

The evolution from data governance to AI governance

The field is shifting fast. Professionals who spent careers managing physical records and basic retention schedules now face questions about preparing SharePoint estates for Copilot integration. This is the digital transformation of the governance profession itself.

Baskin's advice to colleagues who view these changes with trepidation: embrace it. Technology evolves faster with each cycle, but the fundamentals of good governance remain constant. Understanding data, applying appropriate controls, and enabling defensible decisions will always matter.

AI systems are only as good as the data they consume. Governance professionals who position themselves as gatekeepers of data quality have both job security and strategic influence.
