The return of metadata

Metadata is crucial for AI — just ask Salesforce

Anthony Woodward

Founder/CEO

June 18, 2025
Get your monthly round-up of the latest news and views at the intersection of data privacy, data security, and governance.


Hi there,  

Welcome to FILED Newsletter, your round-up of the latest news and views at the intersection of data privacy, data security, and governance.  

This month:

  • Ads in WhatsApp have privacy advocates concerned
  • OpenAI takes down some ChatGPT accounts linked to state-backed hacking and disinformation
  • And a call for boards to lead AI governance.

But first, as AI models struggle to gain more context on problems, metadata makes a surprising return as a potential solution.

If you only read one thing:  

Agentic AI runs on metadata

Businesses working with AI are running up against the same issue: the models can't hold onto all the information they need to be useful in solving problems. They don’t know enough about your particular business, industry, or problem, so the solutions or content they offer are often generic and naive.

However much context you provide, there is a limit to how much text (measured in “tokens”) an AI model can hold in view at any one time. This limit is known as the model's “context window,” and even in leading models it’s much too small – by an order of magnitude – to be useful in solving big problems.
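To make that gap concrete, here is a rough back-of-the-envelope sketch. It is illustrative only: the four-characters-per-token ratio is a crude rule of thumb for English text, and the 128,000-token limit and the corpus size are hypothetical figures, not the specification of any particular model or business.

```python
import math

# ASSUMPTIONS (not from any vendor documentation):
# - roughly 4 characters per token, a crude rule of thumb for English text
# - a hypothetical context window of 128,000 tokens
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW_TOKENS = 128_000


def estimate_tokens(text: str) -> int:
    """Approximate a string's token count from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(documents: list[str], limit: int = CONTEXT_WINDOW_TOKENS) -> tuple[int, bool]:
    """Return the estimated total tokens and whether they fit within the limit."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total, total <= limit


if __name__ == "__main__":
    # Hypothetical corpus: 2,000 documents of about 5,000 characters each.
    corpus = ["x" * 5_000 for _ in range(2_000)]
    total, fits = fits_in_context(corpus)
    print(f"Estimated tokens: {total:,}; fits in window: {fits}")
```

Under these assumptions, even a modest document library comes out well over ten times larger than the window, which is why the model has to be selective about what it is shown.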

This is the key issue for those working closely with AI to solve problems or build businesses. We spoke to founder, investor, and advisor Dave Slutzkin in the latest episode of the FILED Podcast, and he said the same thing (watch the clip here):  

“The challenge is having enough of that context to understand what actually matters in a given situation otherwise you end up with ... almost a downward spiral of ... information that gets gradually attenuated each time it's used ... to the point where it's not actually super useful anymore and potentially damaging at some point.”

For his (pre-launch) startup Cadence, he’s solving that issue with a canonical data store, but every company leveraging AI in its products needs to find its own solution.

Tech giants’ AI agents need metadata

If you’re Salesforce, that solution involves acquiring data firm Informatica for US$8 billion.

As well as gaining Informatica’s data collection and customer base, Salesforce can now integrate its agentic AI offerings with Informatica’s metadata infrastructure. Salesforce is the world's largest CRM and holds a great deal of information; this acquisition means the company can start to make sense of it and supercharge its AI agents.

For the broader industry, this acquisition – along with ServiceNow’s acquisition of data catalog and data governance platform Data.World – sends a clear message: metadata is now foundational to enterprise AI strategy. For AI to drive real value in production, it needs context, meaning, and governance. This isn't just about combining data; it's about governing it with precision to guarantee trust, achieve compliance, and provide explainability.  

Metadata -> trustworthy data -> trustworthy AI

If all your data is held in Salesforce, and their AI product matures well, the acquisition is great news for you. More likely, your data is held in a collection of data silos – file shares and SaaS, structured and unstructured – and the quality of the metadata for each is highly variable. (All of these data ecosystems are aiming to build their own agents to achieve platform lock-in, by the way.)

Remember: metadata is the thing that allows you to trust your data – giving you vital clues like when it was created, the system in which the data was created, and by whom. When you have trustworthy metadata across your data estate, you can trust your data, which in turn means you can begin to trust the output of the AI models using that data. Those AI models – or indeed, agents – don’t have to work as hard to understand the problem, industry, or business.
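As a minimal sketch of that idea, the snippet below checks each record for the three clues mentioned above (when it was created, in which system, and by whom) and only passes complete records on for use by an AI model. The `Record` class, the field names, and the all-or-nothing rule are hypothetical choices made for illustration; a real data estate would need a richer notion of trust.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Record:
    """A hypothetical document plus the metadata that lets us trust it."""
    content: str
    created_at: Optional[datetime] = None
    source_system: Optional[str] = None
    created_by: Optional[str] = None


# Assumed minimum metadata: when, where, and by whom the data was created.
REQUIRED_FIELDS = ("created_at", "source_system", "created_by")


def missing_metadata(record: Record) -> list[str]:
    """List the required metadata fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if getattr(record, name) in (None, "")]


def trusted_records(records: list[Record]) -> list[Record]:
    """Keep only records whose required metadata is complete."""
    return [record for record in records if not missing_metadata(record)]


if __name__ == "__main__":
    records = [
        Record("Q2 retention policy", datetime(2025, 4, 1), "file-share", "j.smith"),
        Record("Untitled export", source_system="crm"),  # no date, no author
    ]
    for record in records:
        print(record.content, "-> missing:", missing_metadata(record))
    print("Usable as AI context:", [r.content for r in trusted_records(records)])
```

In practice you would more likely flag incomplete records for remediation than silently drop them, but the principle is the same: the metadata check comes before the model ever sees the data.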

Well-governed data is the thing that will bring you into the agentic AI era. Complete, trustworthy metadata is the first step.  

🕵️ Privacy & governance

Meta announced the introduction of ads in WhatsApp, which, if a user has added their details to the Meta Accounts Center, will be based on data collected in other Meta properties like Instagram and Facebook.  

There are concerns that this could violate EU laws like the Digital Markets Act and the GDPR, and privacy non-profit noyb has indicated it could pursue a legal challenge depending on Meta’s implementation.

Makers of air fryers, smart speakers, fertility trackers and smart TVs have been told to respect people’s rights to privacy by the UK Information Commissioner’s Office (ICO).

🔐 Security

🔓 Breaches

Ireland-based eyecare technology company Ocuco informed the US Department of Health and Human Services of a data breach impacting more than 240,000 individuals.

The Washington Post is looking into a cyberattack that compromised the email accounts of some of its journalists.

OpenAI took down ChatGPT accounts linked to state-backed hacking and disinformation.

🧑‍⚖️ Legal cases & breach fallout

An interesting interview with Canada's cybersecurity head, offering rare insight into the Nova Scotia Power breach.

President Donald Trump signed a new cybersecurity executive order last week, which among other things rescinds or leaves in limbo programs tied to software bills of materials, zero-trust implementation, and space contractor cybersecurity requirements. This collection of industry reactions to the new EO offers a mix of criticism and praise.

How data fabric addresses fragmentation, compliance, and Shadow IT to deliver robust, centralized data protection and governance.

🤖 AI governance

Boards must lead AI governance to ensure that the AI transformation doesn’t erode human capital but enhances it.

How to say "no" to the next AI release.

The latest from RecordPoint  

📖 Read

Juliet Hart has had a front seat to the rapid changes in the industry, both from a vendor and in-house perspective. She shares a snapshot of the lessons she's collected along the way, and explains how unlocking next-gen innovation starts with great records management.

An update to our AI governance foundations piece went live, adding details on core principles that typically guide AI model governance, as well as AI model governance practices to follow.

Over the course of her career in records and information management, Karen Stitt has learned that hoarding data isn't always due to a lack of policies or procedures; it's a human thing.

What is data quality?

🎧 Listen

Kris and I discussed the intersection of AI, data governance, and productivity with Cadence founder Dave Slutzkin. Dave took us through his experience building on this shifting foundation, the problem his start-up is focused on solving, and his take on why many companies abandon vibe coding too soon.

The RecordPoint team have spent April and May on the road for events season, so Kris and I sat down with RecordPoint CTO Josh Mason to go through everything we learned, from the rise of agentic AI to the growing reality of quantum. Plus, learn about my proudest achievement: building an AI agent to automate pizza delivery for his children.

Outside of our own podcasts, I have been lucky enough to be invited onto a few other shows:

I joined Alan Shimel at Techstrong.tv last week for a discussion about how organizations can navigate the complex world of data privacy, compliance, and innovation in the digital age.

I also recently appeared on The Founders Blueprint, Innovation Bay’s brand-new podcast, to talk through the RecordPoint journey.


Get hooked on FILED

This can be a fast-paced, complex industry and it can get overwhelming. FILED is here to help you navigate it.