
Why AI Won’t Save Your Sustainability Disclosure

  • Feb 2
  • 6 min read
Sustainability reports are dense, text-heavy documents. Can a general purpose AI tool understand nuance deeply enough to make it credible?

TL;DR: AI-assisted drafting is becoming standard practice in long-form reporting, including sustainability reporting. The risk isn’t that teams are doing something wrong - it’s that the generic GenAI tools they’re using can’t tell the difference between a vague aspiration and a legally meaningful commitment. A system that reads fluently but reasons poorly is destined to produce disclosures and claims that sound rigorous without being rigorous. Buyer beware.


Not all AI is equal for sustainability work. The gap between reading a report and reasoning about one is where credibility is won or lost.

When people talk about greenwashing, they tend to picture a particular kind of organisation: a petrochemical company painting its logo green or slipping the word ‘eco’ into its strapline; a fast fashion brand running ads about recycled fabric while expanding production; a company growing its gas business while sponsoring a community solar farm.


That kind of greenwashing is real, well documented, and is increasingly being caught and penalised.1 But there’s another kind of greenwashing that almost nobody is talking about. It’s quieter, usually unintentional, and it’s spreading through organisations that are genuinely trying to do the right thing. It’s being made possible, in part, by public GenAI tools that are not as smart as they appear.


The problem is language that sounds good


Here’s a sentence from a real corporate sustainability report, edited to protect the source:

“We are working toward reducing our operational carbon footprint in line with our commitment to a sustainable future.”

Now here’s a different kind of sentence:

“We commit to reducing our absolute Scope 1 and Scope 2 emissions by 50% by 2030, against a 2020 baseline, in line with a 1.5C pathway as defined by the Science Based Targets initiative (SBTi).”

To a person reading carefully, these two sentences are completely different. 

The first is an aspiration; the second is a commitment. 

The first is unfalsifiable; the second can be tested, audited, and if unmet, challenged. The first has no timeline; the second has a deadline.


But to a general purpose generative AI tool - the kind used to scan and summarise sustainability reports - these sentences can look very similar. Both mention carbon, both reference sustainability, and both read as positive in sentiment. Both may get tagged as evidence of climate action.


This gap between how language sounds and what it means is one of the central challenges in climate action observability right now. Research into automated ESG analysis has found that standard natural language processing (NLP) tools frequently fail to distinguish between binding commitments and non-binding aspirations.2
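
To make that gap concrete, here is a minimal sketch in plain Python of how a surface-level tagger sees the two sentences above, and what a structural check would see instead. The keyword list, threshold, and regex patterns are illustrative assumptions, not the logic of any real tool:

```python
import re

ASPIRATION = ("We are working toward reducing our operational carbon "
              "footprint in line with our commitment to a sustainable future.")
COMMITMENT = ("We commit to reducing our absolute Scope 1 and Scope 2 "
              "emissions by 50% by 2030, against a 2020 baseline, in line "
              "with a 1.5C pathway as defined by the Science Based Targets "
              "initiative (SBTi).")

CLIMATE_KEYWORDS = {"carbon", "emissions", "sustainable", "footprint",
                    "reducing", "commit", "commitment"}

def tagged_as_climate_action(text: str) -> bool:
    """Surface-level check: count climate keywords against a threshold.
    Both sentences clear it, so both get the same positive tag."""
    words = {w.strip(".,()").lower() for w in text.split()}
    return len(words & CLIMATE_KEYWORDS) >= 3

def looks_like_binding_commitment(text: str) -> bool:
    """Structural check: a quantified target, a deadline year, and a
    baseline year; the markers that make a claim testable."""
    has_quantity = bool(re.search(r"\b\d+(\.\d+)?%", text))
    has_deadline = bool(re.search(r"\bby (19|20)\d{2}\b", text))
    has_baseline = bool(re.search(r"\b(19|20)\d{2} baseline\b", text))
    return has_quantity and has_deadline and has_baseline

print(tagged_as_climate_action(ASPIRATION))        # True
print(tagged_as_climate_action(COMMITMENT))        # True (indistinguishable)
print(looks_like_binding_commitment(ASPIRATION))   # False (unfalsifiable)
print(looks_like_binding_commitment(COMMITMENT))   # True (testable)
```

The point is not that keyword matching is what any particular tool does internally; it is that any system rewarding surface signals over structure will hand both sentences the same green tick.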


That might seem like a technicality, but it has very real consequences, both for observers holding corporate claims to account and for the organisations themselves, which may find they are suddenly exposed to legal scrutiny.


When helpful becomes hazardous


The use of AI tools in sustainability work is growing fast. Teams are using general purpose large language models (LLMs) to help draft reports, summarise policies, scan competitor disclosures, and generate first-pass assessments of regulatory compliance. In many cases, these tools are genuinely useful and are saving people hours of work.3


The risk is not the tools themselves. The risk is mistaking what they’re good at for what they’re good enough at.


A large language model is extraordinarily good at generating fluent, plausible-sounding text. Its early predecessor - autocomplete - worked on the same underlying principle: suggesting the most probable next word, rather than the most accurate one. LLMs are that idea, scaled to an extraordinary degree, and trained to produce outputs that read like a confident human expert.
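
As a toy illustration of that principle (the probabilities below are invented for the example; real models score vocabularies of tens of thousands of tokens, not three words), the selection step simply takes the most probable continuation:

```python
# Hypothetical next-word probabilities for the prompt
# "We are committed to a ..."; invented numbers, purely illustrative.
next_word_probs = {
    "sustainable": 0.46,    # fluent and common in training text
    "greener": 0.31,
    "1.5C-aligned": 0.04,   # precise, but statistically rarer
}

# Greedy selection: the most probable word wins, whether or not it is
# the most accurate or meaningful one for the claim being drafted.
most_probable = max(next_word_probs, key=next_word_probs.get)
print(most_probable)  # sustainable
```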

But confidence in tone is not the same as accuracy in reasoning. These models can - and regularly do - produce outputs that are logically flawed, factually incorrect, or subtly misleading… all while sounding completely authoritative.4


In a low-stakes context, like producing social media content, that’s an inconvenience. In sustainability reporting - where the accuracy of your disclosures is now a legal concern (not just a reputational one) - it’s a meaningful risk.  


A real-world example


We’re all familiar with HSBC UK’s ‘we planted trees’ ad campaign debacle from 2022.5 But this practice of using language to strategically skew the truth - and the consequences that follow - continues:


In November 2025, Tyson Foods (the second-largest meat producer in the world) settled a greenwashing lawsuit brought by the Environmental Working Group. The case centred on Tyson’s public commitment to reach “net-zero” greenhouse gas emissions by 2050 and its marketing of beef products as “climate-smart”. The lawsuit alleged that despite these claims, Tyson had spent less than 0.1% of its annual revenue on actual emissions reduction efforts, and had no credible plan to achieve the targets it was publicly promoting. Under the settlement, Tyson agreed to stop making both claims for five years and cannot make new environmental claims unless they are first verified by an independent expert.6


The claims were not fabricated. The emissions reduction research was real. But the gap between what was being stated and what was actually being done was the problem. That is the greenwashing that courts and regulators are increasingly being asked to rule on.


The worry with AI-assisted reporting is a version of the same thing, but automated. If a tool cannot tell the difference between "we aim to" and "we commit to," or between a target that is backed by a plan and one that is not, it will produce summaries and assessments that create the impression of rigour without the substance. Scaled across thousands of supplier assessments or regulatory compliance checks, those impressions add up.


There’s a big difference between reading and reasoning


Reading a document means processing the words and extracting the surface meaning. A reasonably capable AI can do this pretty well; it can tell you that a document is about emissions reduction, pull out the key themes, and summarise the main claims.


Reasoning about a document is a much harder problem, computationally.

It means asking: is this claim time-bound? Is it quantified? Does the evidence in this document actually support this conclusion? If the company says it is "on track" for its 2030 target, does the data in the preceding paragraphs actually demonstrate that? What would falsify this claim?


That kind of reasoning - the kind a good auditor or a rigorous analyst does - is not something current generative AI tools do reliably. It requires a system that can hold logical rules alongside text, check claims against those rules, and flag contradictions.
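
Here is a sketch of what that shape of checking looks like, with an invented claim structure and a deliberately simple ‘linear pathway’ rule. Neither is DataLoom’s actual implementation; they only illustrate holding a rule alongside a claim and flagging the contradiction:

```python
from dataclasses import dataclass

@dataclass
class EmissionsClaim:
    baseline_year: int
    baseline_tco2e: float
    target_year: int
    target_reduction_pct: float  # e.g. 50.0 means a 50% cut vs baseline
    latest_year: int
    latest_tco2e: float
    stated_status: str           # e.g. "on track"

def allowed_tco2e(claim: EmissionsClaim, year: int) -> float:
    """One simple, assumed definition of 'on track': a straight line from
    the baseline to the target. A real system would encode the pathway
    rules the claim actually references (e.g. an SBTi trajectory)."""
    target = claim.baseline_tco2e * (1 - claim.target_reduction_pct / 100)
    progress = (year - claim.baseline_year) / (claim.target_year - claim.baseline_year)
    return claim.baseline_tco2e + (target - claim.baseline_tco2e) * progress

def contradictions(claim: EmissionsClaim) -> list[str]:
    """Check the stated status against the reported numbers, rather than
    taking the wording at face value."""
    flags = []
    ceiling = allowed_tco2e(claim, claim.latest_year)
    if claim.stated_status.lower() == "on track" and claim.latest_tco2e > ceiling:
        flags.append(
            f"Claim says 'on track', but {claim.latest_year} emissions "
            f"({claim.latest_tco2e:,.0f} tCO2e) exceed the ~{ceiling:,.0f} tCO2e "
            f"a linear path to the {claim.target_year} target implies."
        )
    return flags

# A 50%-by-2030 target against a 2020 baseline, with near-flat emissions:
claim = EmissionsClaim(2020, 100_000, 2030, 50.0, 2025, 98_000, "on track")
for flag in contradictions(claim):
    print(flag)
```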


This is exactly the kind of problem that DataLoom is solving for - creating a knowledge base specifically designed to decode the logic and sentiment of sustainability disclosures. Not by replacing human judgement, but by giving that judgement a better foundation to work from.


What this means for sustainability professionals


We’re not saying GenAI is bad for sustainability work; tools that generate fluent summaries are useful and have their role to play. But a reasoning system that can distinguish a binding commitment from an aspiration, trace a claim back to its evidence, and flag logical inconsistencies is what will actually improve the quality and credibility of sustainability analysis.


The sustainability professionals doing this mahi are highly capable people working under significant pressure, often with inadequate tools. But be wary: a tool that makes it easier to look thorough without being thorough does not help - it increases exposure.


What does help is tooling that is honest about what it can and cannot do, that surfaces its reasoning rather than hiding it, and that is designed specifically for the logical demands of sustainability analysis, not general purpose text processing.


That’s the much harder thing to build. We think it’s also the only thing worth building.


At DataLoom, we're not interested in AI that makes sustainability reporting sound better. We’re interested in AI that makes it more accurate. The distinction matters, and we think about that a lot.

Sources referenced

  1. Green claims: empowering consumers for more sustainable choices, Council of the European Union (2025): https://www.consilium.europa.eu/en/policies/green-claims-empowering-consumers-for-more-sustainable-choices/

  2. ESGReveal: An LLM-based approach for extracting structured data from ESG reports (2023): https://arxiv.org/html/2312.17264v1 

  3. How AI can transform sustainability reporting (2025): https://www.weforum.org/stories/2025/09/harnessing-ai-for-sustainability-reporting-path-forward/ 

  4. Detecting hallucinations in large language models using semantic entropy (2024): https://www.nature.com/articles/s41586-024-07421-0 

  5. HSBC climate change adverts banned by UK watchdog (2022): https://www.bbc.com/news/business-63309878

  6. Tyson Foods agrees to stop making ‘net-zero’ and ‘climate-smart beef’ claims (2025): https://www.ewg.org/news-insights/news-release/2025/11/tyson-foods-agrees-stop-making-net-zero-and-climate-smart-beef 

