SJK Labs

The B2B AI
Legibility Report

What Machines Tell Buyers Before You Do

How AI systems describe 50 regulated businesses.
An inaugural study in narrative legibility.

50 Companies audited

4 AI platforms tested

6 Dimensions scored


Inaugural report May 2026


What we already know

AI is already shaping how companies are understood before a human reaches the site.

Gartner reports that 67% of B2B buyers prefer a rep-free experience and that 45% use AI in the process. G2 reports that 51% of buyers now start with AI more often than with Google. Forbes reports that 60% of purchases end with no click through to a website.

This inaugural study exists because narrative architecture is now the work of building a story that holds up under both human and machine reading. The findings are not really about what AI gets wrong, but about what happens when a company's narrative was built only for human readers.

Most common problem

64%

In 32 companies, at least one AI platform placed the business in the wrong or incomplete competitive frame.

Wrong company risk

46%

23 out of 50 companies were coded with an entity recognition, name collision, disambiguation or wrong company issue.

Cross-model volatility

52%

26 out of 50 companies had a gap of at least 10 points between their best and worst platform score.

Crawlability issue

22%

11 out of 50 companies were coded with a crawler, access, JavaScript, Cloudflare, gated site or homepage visibility issue.

What is Narrative Architecture

Narrative architecture is the underlying structure that holds a business story together.

It covers the decisions about who the story is for, what's at stake, and how it sounds. It's not the words on the page; it's the logic underneath them that determines whether those words do any work.

Most businesses have content. Very few have architecture. The difference shows up when a buyer tries to explain what you do to someone else, when AI tries to describe you, when a journalist tries to place you in a story. If the architecture is missing, the story collapses under those conditions, and no amount of copywriting fixes it.

The SJK Labs Legibility Score

The scoring framework SJK Labs developed to run this study consistently across 50 companies.

The Legibility Score measures how accurately AI systems describe a business when asked buyer-style questions. Each platform's responses are scored across six dimensions.

1. Clarity
2. Accuracy
3. Differentiation
4. Customer pain point
5. Proof / credibility
6. Category fit

Each dimension is rated from 1 to 5, giving a maximum of 30 points per platform and 120 points overall across the four platforms. The score matters because the way AI describes your business is your narrative now, whether you wrote it or not.

If a buyer asks ChatGPT what you do and gets a wrong answer, a generic answer, or someone else's answer entirely, that's the story they're carrying into every conversation with you. You didn't write it. You can't correct it in real time. And most businesses have no idea it's happening.

The Legibility Score makes that visible. It turns an invisible problem into a measurable one.

Results

Headline problem patterns, average dimension scores and average platform scores

Source: SJK Labs audit dataset, n=50 companies. Manual problem-mode codes were derived from the audit notes.

Chart 1 • Problem patterns across 50 companies

Category fit was the most common problem pattern.

Weak category fit in at least one model: 64% (32/50)
Weak customer pain point in at least one model: 56% (28/50)
Weak differentiation in at least one model: 54% (27/50)
Platform disagreement of 10+ points: 52% (26/50)
Weak or inaccurate reading in at least one model: 50% (25/50)
Entity or name-recognition issue: 42% (21/50)
First-party access or crawlability issue: 16% (8/50)

The most common score-based problem was category fit. In 32 companies, at least one AI platform placed the business in the wrong or incomplete competitive frame.

Chart 2 • Average score by dimension

Proof was easiest. Buyer logic was harder.

Proof / credibility: 3.55 / 5
Clarity: 3.53 / 5
Accuracy: 3.45 / 5
Differentiation: 3.37 / 5
Category fit: 3.36 / 5
Customer pain point: 3.33 / 5

Across this financial and regulated dataset, proof and credibility scored highest on average. The weaker dimensions were customer pain point, category fit and differentiation.

Chart 3 • Average score by platform

The same company can be legible in one system and collapse in another.

ChatGPT: 19.7 / 30
Claude: 24.1 / 30
Gemini: 18.5 / 30
Perplexity: 20.0 / 30

Claude was the highest-scoring platform in this dataset, largely because it was better at synthesis and at preserving commercial context. But the platform picture is uneven.

Seven things the audit shows

These are the patterns the audit surfaces repeatedly across the dataset. They are the parts of the story most likely to break once AI systems become the first interpreter of a business.

AI can often name the category, but it still loses the commercial difference.

The central problem is not total ignorance. In many cases, AI systems can identify the broad category; what gets compressed is the reason to buy.

30/50 companies were coded with a narrative, category, wedge, freshness or differentiation issue

Name collisions are now a board-level communications risk.

The most damaging problems are wrong-company answers.

23/50 companies were coded with an entity recognition, name collision, disambiguation or wrong company issue

The same company can be legible in one AI system and invisible in another.

AI perception is a landscape. A company can score strongly in Claude and collapse in Gemini. It can be understood by Perplexity on one question and confused with a city, food brand or algorithm on the next.

26/50 companies had a gap of 10 points or more between their best and worst platform score

AI often knows the previous version of the company better than the current one.

The audit repeatedly shows that AI does not update just because a homepage changes. It updates when the new story is repeated, corroborated and indexed across the broader authority layer.

A rebrand without a machine-readable cascade leaves AI telling the old story

Proof exists, but AI often cannot use it.

Many companies had strong proof on their websites and AI still missed it.

17/50 companies were coded with a proof, credibility or evidence surfacing issue

Crawlability is now a communications problem, not just a technical one.

When AI systems cannot read a company’s own website, they do not stop answering. They answer from whatever is available.

11/50 companies were coded with a crawler, access, JavaScript, Cloudflare, gated site or homepage visibility issue

AI is weakest when asked to compare.

The alternatives question exposed one of the clearest buyer-risk patterns. When companies do not define their competitive set, AI fills the gap with scraped database logic, category-adjacent guesses, outdated competitors or entirely wrong industries.

Comparison is where buyers move from curiosity to decision

Named examples

Snapshots

The full audits are the dataset. The public report uses a smaller set of named examples to make each problem mode tangible. These examples are not a ranking; they are evidence of different AI problem patterns.

Zepto

Entity collision

All four models described the Indian quick-commerce unicorn. None identified the Australian account-to-account payments infrastructure business at zepto.com.au.

Why it matters: a clear website is not enough when a stronger namesake dominates the graph.

Shift

Generic name risk

Across 24 responses, no platform identified Shift Australia. The models defaulted to browsers, insurance AI, used cars, keyboard keys and calculator functions.

Why it matters: generic brand names without a disambiguation footprint are not reliably machine-resolvable.

Wonderful

Wrong company

AI described a Californian pistachio conglomerate, an AI agent platform, or the adjective “wonderful”. No platform found the UK Pay by Bank business.

Why it matters: if the name does not pin the company, better-known meanings will win the query.

Oakbrook

Crawlability

The homepage was Cloudflare-blocked, so AI reconstructed the UK fintech from third-party fragments, unrelated namesakes and criticism.

Why it matters: once first-party access fails, the market learns the company in other people’s words.

Ferovinum

Freshness lag

Models understood the funded wine and spirits fintech, but they were one rebrand behind: Fero, ferodrinks.com, FLOW and updated metrics were largely invisible.

Why it matters: rebrands need a machine-readable cascade, not just a new homepage.

Payhawk

Narrative wedge

AI recognised enterprise spend management, but often missed the key wedge: Payhawk’s ability to manage existing corporate card programs rather than forcing migration.

Why it matters: a company can be visible and still have its best commercial difference flattened out.

Hippo Insurance

Outdated story

Most models still described the legacy smart-home carrier story, while the current site presents a 70+ carrier quote-comparison agency.

Why it matters: AI often remembers the old strategic form of the company better than the current one.

Tokenovate

Technical flattening

AI captured post-trade automation and CDM, but often flattened the edge: nine CDM-native workflows, the Unified Trade Record and Novat as settlement-act tokenisation.

Why it matters: deep technical differentiation is often the first thing to disappear under summary pressure.

Coinbase

First-party access

AI could describe the product surface accurately, but it could not read the homepage. One platform even made a false security claim that a first-party incident page could have corrected.

Why it matters: if the canonical narrative is unreadable, secondary sources become the de facto brand voice.

AlphaSense

Proof underused

Models captured the category and scale, but flattened the proprietary content moat: Tegus, Wall Street Insights, no-hallucination architecture and Gartner MQ proof were underused.

Why it matters: even well-covered companies can lose the proof that actually supports premium positioning.

Moula

Disambiguation drift

When the model found the lender, it often described the category correctly. On softer questions, some systems drifted into the Arabic term “Mawla” or older loan ceilings.

Why it matters: even partial recognition can break once the question becomes less anchored and the name has stronger alternate meanings.

R3

Blocked homepage

Because r3.com was effectively unreadable to non-JavaScript retrieval, one platform reconstructed the enterprise blockchain company, another asked for clarification, and one thought “R3” meant a game controller button.

Why it matters: if the first-party site is inaccessible and the name is overloaded, AI fills the gap with whatever surface is easiest to resolve.

Full scoring table

Companies are sorted by average total score across the four platforms. Each platform total is out of 30.

View the full scoring table

The separate scoring-table page contains the full ranking table from the report, with ChatGPT, Claude, Gemini and Perplexity scores plus the average score and spread for each company.
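The average and spread columns can be reproduced from the four platform totals. The Python sketch below is illustrative only: the company names and totals are invented, the function names are hypothetical, and the descending sort order is an assumption, since the report says only that companies are sorted by average total score. The 10-point threshold is the one the report uses for cross-model volatility.

```python
VOLATILITY_THRESHOLD = 10  # gap, in points, that the report counts as platform disagreement


def average_and_spread(platform_totals):
    """Return (average, spread) for one company's platform totals, each out of 30."""
    totals = list(platform_totals.values())
    average = sum(totals) / len(totals)
    spread = max(totals) - min(totals)  # best platform minus worst platform
    return average, spread


def rank_companies(table):
    """Build one row per company and sort by average score, highest first (assumed order)."""
    rows = []
    for company, platform_totals in table.items():
        average, spread = average_and_spread(platform_totals)
        rows.append({
            "company": company,
            "average": average,
            "spread": spread,
            "volatile": spread >= VOLATILITY_THRESHOLD,
        })
    return sorted(rows, key=lambda row: row["average"], reverse=True)


# Invented totals for two hypothetical companies (not from the audit dataset).
example_table = {
    "Company A": {"ChatGPT": 20, "Claude": 26, "Gemini": 14, "Perplexity": 20},
    "Company B": {"ChatGPT": 25, "Claude": 27, "Gemini": 24, "Perplexity": 24},
}

for row in rank_companies(example_table):
    print(row)
```

In this invented example, Company A has the lower average and a 12-point gap between Claude and Gemini, so it would count toward the cross-model volatility figure; Company B would not.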


Methodology

How the audit was run

The study measured AI-generated explanations of companies. It did not measure actual buyer behaviour, conversion, search ranking, website traffic, media coverage, analyst opinion or commercial performance.

Step 01

Company selection

The study audited 50 regulated, financial, fintech, insurance, wealth, payments, crypto and capital-intensive businesses. Companies were selected to represent a range of visibility levels, naming risk, technical complexity and narrative maturity.

Step 02

Platforms tested

Each company was tested across four web-search-enabled AI platforms: ChatGPT, Claude, Gemini and Perplexity. The study does not attempt to create a sterile model benchmark; it measures how a buyer, journalist, analyst or partner would experience each platform in normal use.

Step 03

Prompts used

Each company was tested using the same six buyer-style questions: what does this company do, who is it for, what problem does it solve, what makes it different, is it credible, and who are the main alternatives?

Step 04

Scoring framework

Responses were scored using the SJK Labs Legibility Score across six dimensions: clarity, accuracy, differentiation, customer pain point, proof and credibility, and category fit. Each dimension was scored from 1 to 5, giving each platform a maximum of 30 points per company.
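The arithmetic behind the score can be sketched in a few lines of Python. This is a minimal illustration, not SJK Labs' own tooling: the dimension keys, function names and sample ratings are assumptions made for the example, and the ratings are not taken from the audit dataset.

```python
DIMENSIONS = (
    "clarity",
    "accuracy",
    "differentiation",
    "customer_pain_point",
    "proof_credibility",
    "category_fit",
)
PLATFORMS = ("ChatGPT", "Claude", "Gemini", "Perplexity")


def platform_total(ratings):
    """Sum six 1-5 dimension ratings into one platform total out of 30."""
    missing = [name for name in DIMENSIONS if name not in ratings]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for name in DIMENSIONS:
        if not 1 <= ratings[name] <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    return sum(ratings[name] for name in DIMENSIONS)


def company_total(ratings_by_platform):
    """Sum the four platform totals into an overall score out of 120."""
    return sum(platform_total(ratings_by_platform[p]) for p in PLATFORMS)


# Illustrative ratings only (not from the audit dataset).
sample = {
    "clarity": 4,
    "accuracy": 4,
    "differentiation": 3,
    "customer_pain_point": 3,
    "proof_credibility": 4,
    "category_fit": 3,
}
print(platform_total(sample))  # 21
```

With four platforms scored this way, the ceiling is 4 x 30 = 120 points per company.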

Step 05

What the study measured

In several cases, the AI’s answer was not simply incomplete; it returned the wrong company, the wrong category, stale positioning, irrelevant competitors or third-party criticism without a first-party counterweight.

Step 06

Treatment of first-party access

For each company, first-party site content was treated as the primary ground truth where available. Where the company website was blocked, JavaScript-walled, cookie-walled, Cloudflare-challenged, security-gated, redirect-only or effectively empty, that inaccessibility was recorded as part of the finding.
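One of these conditions, a page that is effectively empty to non-JavaScript retrieval, can be approximated with a simple check. The sketch below uses Python's standard-library HTML parser to measure how much text remains once scripts and styles are ignored. The 200-character threshold, the function names and the sample markup are assumptions for illustration. This is not the method used in the audit, and it does not detect Cloudflare challenges, cookie walls or redirects.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text found outside <script> and <style> elements."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a retriever would see without executing JavaScript."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def looks_effectively_empty(html, min_chars=200):
    """Flag pages whose raw HTML carries very little readable text (assumed threshold)."""
    return len(visible_text(html)) < min_chars


# Hypothetical JavaScript-only shell: all content would be rendered client-side.
js_shell = (
    "<html><head><title>Acme</title>"
    "<script>renderApp(document.getElementById('root'));</script></head>"
    "<body><div id='root'></div></body></html>"
)
print(looks_effectively_empty(js_shell))
```

A page flagged this way is not necessarily inaccessible to every AI system, since some retrievers render JavaScript. The check only indicates that the raw HTML gives a non-rendering crawler little to work with.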

What companies should do now

The practical lesson is not to write more content, but to make the company easier for AI systems to explain accurately.

Companies should treat AI legibility as a distinct layer of communications strategy - part positioning, part PR, part technical visibility, part proof architecture.

About SJK Labs

For businesses that know the work is strong, but suspect the market signal is weaker than it should be.

SJK Labs helps businesses turn expertise into authority infrastructure: clearer positioning, stronger proof, sharper media logic and a digital ecosystem that can hold up in an AI-first world.

If buyers, journalists, search engines and AI systems cannot clearly understand what makes you credible, someone less capable but more legible will fill the gap.

Want to see how legible your own business looks in AI?

The Legibility Audit measures how clearly AI systems can explain your business, where they lose the commercial difference, and what story a buyer is likely to carry into the next conversation.

