Manager insight | Evenlode’s James Knoedler on why markets are mispricing data as AI adoption accelerates

As AI adoption accelerates, software and information services stocks have been hit recently, with investors panicking over the potential disruption of legal and data analytics tools. In the following update, James Knoedler, portfolio manager of the Evenlode Global Equity Fund, argues that markets are mispricing these companies: their true value lies not in software, but in unique, irreplaceable data assets that are becoming ever more critical in a world dominated by large language models.

The trends of 2025 have extended into 2026 – a flight to safety away from software, media, and information services towards the perceived safe havens of banks, semiconductor capital equipment, memory companies, and capital goods companies.

Last week the software complex was hit hard; this week it was the turn of the European media and information services group, triggered by overnight news of an Anthropic plugin that helps legal services firms build their own workflow software. RELX and Wolters Kluwer, which derive 13% and 12% respectively of their 2026 operating earnings from legal services, were at ground zero.

Before we get into the path ahead for these companies and their valuations, it’s worth reminding our readers why we own them. Ultimately these companies are not software providers; they are data providers. Over time their delivery has shifted from physical books to digital services delivered via the cloud, but the core differentiation has remained the very specific data, and the relationships between datasets, that they and only they can provide to their customers.

The arrival of large language models (LLMs) has radically increased the value of differentiated data. For us, the canonical statement of this comes from OpenAI scientist James Betker, who observed in 2023:

‘[T]rained on the same dataset for long enough, pretty much every model with enough weights and training time converges to the same point… This is a surprising observation! It implies that model behavior is not determined by architecture, hyperparameters, or optimizer choices. It’s determined by your dataset, nothing else … when you refer to “Lambda”, “ChatGPT”, “Bard”, or “Claude” then, it’s not the model weights that you are referring to. It’s the dataset.’[1]

Quality, as well as quantity, of data is critical: up to 99.86% of the data in the Common Crawl web repository is thrown away before models are trained[2]. We also know that large language models are far more reliable when trained on, and grounded in, concise and expertly authored data than when trained on Common Crawl data.

The hallucinations and overgeneralisations intrinsic to LLMs are a growing problem in industries that carry strict liability and where client trust is key. This explains the deals struck by Harvey, the bellwether LLM legal startup, to access legal data from RELX and Wolters Kluwer. From this point we focus on RELX, as the sharper fall in its share price shows that it is at the epicentre of fears.

The recent market moves are not related to RELX’s core legal business but to the adjacent opportunity of legal ‘workflow’: the digital creation, analysis, and filing of legal documents and processes such as opinions, NDAs, discovery, and contract review. Based on broker estimates, workflow consumes 80% of law firm technology spend, while citation libraries like LexisNexis and Westlaw make up the remaining 20%. Our companies certainly had ambitions to grow into this budget over time, and those ambitions may now be foreclosed if law firms decide to vibe code their own drafting tools. This is the terror stalking software companies. What the market is ignoring is that prowess in creating software tools has no bearing at all on the data assets. Nothing has changed here, because probabilistic LLMs intrinsically cannot create a data asset that is totally deterministic (i.e., reliable). RELX’s LexisNexis product relies not just on its very long record of legal judgements and filings spanning centuries of case law, but also on its taxonomy of the connections between them. In our minds, the reliability and stability of this product is comparable to the similar asset at the heart of Alphabet: its search index and click-and-query database.

While it may be more difficult to expand the analytics business over time, the citation library at the heart of RELX’s legal business has not lost pricing power. If anything, it has gained power, as it will increasingly be used by tools that lack common sense and the ability to generalise. Vibe coding will affect growth rates at the margin, but Legal revenue growth does not need to accelerate to double-digit rates to justify this valuation.

RELX’s science publishing division (31% of 2026 operating profit) has been similarly hit by fears that its analytics business, which in this case is half of the division’s revenue, will be ‘foreclosed’ by OpenAI and Anthropic tools. This again misses the point. Here, the point is that the Cambrian explosion in science paper submissions driven by LLMs has massively increased the importance of trusted third-party gatekeepers.

A recent paper in Science shows that authors who make the transition to using LLMs publish 36% more papers, but their rejection rate also increases dramatically, showing that the value added by curation is now structurally higher[3]. We expect this to increase the differentiation and pricing power of the leading journals, of which RELX has a disproportionate share.

Scientists trying to build careers and secure funding now have fewer routes to credibility and profile, while the huge flood of paper ‘issuance’ has made the ‘rating agencies’ even more valuable to the ‘buyers’ of papers: the institutions that fund them. We think a 5% growth rate for STM (RELX’s Scientific, Technical & Medical division) is eminently possible even without any progress on the sale of tools. Most of STM’s tools are, moreover, grounded in data unique to RELX, particularly its unmatched view of the review process and of readership and citation patterns, and these data are exceptionally important to the people who oversee funding grants and career progression.

The rest of RELX’s earnings come from Risk and Exhibitions (41% and 14% of 2026 operating profit respectively). The Risk business depends on contributory databases that are essentially impossible to replicate with probabilistic software. We expect any headwind from homegrown solutions to be more than offset by a continuing explosion in fraud rates, driven by the shift to agentic commerce and the inherent, extreme vulnerability of LLMs to prompt injection attacks. We share the market’s view that Exhibitions is immune to AI disruption risk, and that its value has actually grown since Covid, as the differentiation of in-person events has increased in a world increasingly tilted towards Teams meetings and remote working.

During the 2020-2021 lockdowns, markets briefly panicked into thinking that the differentiation of exhibitions was gone, following the explosive growth of Hopin, a remote conference alternative that for a time commanded a valuation of $7.8bn. What markets were missing was just how different the outcomes were from an in-person sales meeting with all the key personnel from the other side, as opposed to a video call with a dozen inattentive people busy replying to emails in their main windows. A similar confusion is at work now: markets are conflating the ability to create new code with the ability to replicate unique and irreplicable content. Even if the cost of creating code collapses towards zero, this has no negative impact on the value of business-critical, industry-specific data.

Current valuations assume rapid erosion of the pricing power of data assets, which flies in the face of everything we know about LLMs. Even if (especially if) software commoditises totally, that simply means the owners of unique and essential assets can raise their prices to compensate for any analytics revenue they lose; to be clear, this is a very hypothetical scenario. Such increases could be funded from declining software budgets. For investors with a differentiated view on the controversy, the opportunity created by a market herding on fear and uncertainty is now extreme.


[1] https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-dataset/

[2] Plaintiffs’ Remedies Proposed Findings of Fact, United States et al. v. Google LLC, II.A.77

[3] ‘Scientific Production in the Era of Large Language Models’, Science, vol. 390, no. 6779
