The B2B AI Legibility Report
How AI systems describe 50 regulated businesses.
An inaugural study in narrative legibility.
AI is already shaping how companies are understood before a human reaches the site.
Gartner reports that 67% of B2B buyers prefer a rep-free experience and 45% use AI in the buying process. G2 reports that 51% of buyers now start with AI more often than with Google. Forbes reports that 60% of purchases end with no clicks to a website.
This inaugural study exists because narrative architecture is now the work of building a story that holds up under both human and machine reading. The findings are not really about what AI gets wrong, but about what happens when a company's narrative was built only for human readers.
23 out of 50 companies were coded with an entity-recognition, name-collision, disambiguation or wrong-company issue.
26 out of 50 companies had a gap of at least 10 points between their best and worst platform score.
11 out of 50 companies were coded with a crawler, access, JavaScript, Cloudflare, gated site or homepage visibility issue.
Narrative architecture is the underlying structure that holds a business story together.
Narrative architecture is the underlying structure that holds a business story together - the decisions about who the story is for, what's at stake, and how it sounds. It's not the words on the page. It's the logic underneath them that determines whether those words do any work.
Most businesses have content. Very few have architecture. The difference shows up when a buyer tries to explain what you do to someone else, when AI tries to describe you, when a journalist tries to place you in a story. If the architecture is missing, the story collapses under those conditions, and no amount of copywriting fixes it.
The scoring framework SJK Labs developed to run this study consistently across 50 companies.
The Legibility Score is the scoring framework SJK Labs developed to run this study consistently across 50 companies. It measures how accurately AI systems describe a business when asked buyer-style questions, scoring each platform response across six dimensions.
Each dimension is rated 1-5, giving a maximum of 30 points per platform and 120 across the four platforms tested. It matters because the way AI describes your business is your narrative now, whether you wrote it or not.
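The arithmetic behind those maximums can be sketched in a few lines. This is an illustration only, not SJK Labs tooling: the constant and variable names are assumptions, and the platform count of four comes from the methodology described later in this report.

```python
# Hypothetical constants for illustration; the names are not from the study.
DIMENSIONS = 6   # dimensions scored per platform response
MAX_RATING = 5   # each dimension is rated 1-5
PLATFORMS = 4    # ChatGPT, Claude, Gemini and Perplexity

# Best possible score on a single platform.
max_per_platform = DIMENSIONS * MAX_RATING

# Best possible score for one company across all platforms.
max_overall = max_per_platform * PLATFORMS

print(max_per_platform, max_overall)
```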
If a buyer asks ChatGPT what you do and gets a wrong answer, a generic answer, or someone else's answer entirely, that's the story they're carrying into every conversation with you. You didn't write it. You can't correct it in real time. And most businesses have no idea it's happening.
The Legibility Score makes that visible. It turns an invisible problem into a measurable one.
Headline problem patterns, average dimension scores and average platform scores
Source: SJK Labs audit dataset, n=50 companies. Manual problem-mode codes were derived from the audit notes.
Category fit was the most common problem pattern.
The most common score-based problem was category fit. In 32 companies, at least one AI platform placed the business in the wrong or incomplete competitive frame.
Proof was easiest. Buyer logic was harder.
Across this financial and regulated dataset, proof and credibility scored highest on average. The weaker dimensions were customer pain point, category fit and differentiation.
The same company can be legible in one system and collapse in another.
Claude was the highest-scoring platform in this dataset, largely because it was better at synthesis and at preserving commercial context. But the platform picture is uneven.
Seven things the audit shows
These are the patterns the audit surfaces repeatedly across the dataset. They are the parts of the story most likely to break once AI systems become the first interpreter of a business.
AI can often name the category, but it still loses the commercial difference.
The central problem is not total ignorance. In many cases, AI systems can identify the broad category, but the reason to buy gets compressed.
Name collisions are now a board-level communications risk.
The most damaging problems are wrong-company answers.
The same company can be legible in one AI system and invisible in another.
AI perception is a landscape. A company can score strongly in Claude and collapse in Gemini. It can be understood by Perplexity on one question and confused with a city, food brand or algorithm on the next.
AI often knows the previous version of the company better than the current one.
The audit repeatedly shows that AI does not update just because a homepage changes. It updates when the new story is repeated, corroborated and indexed across the broader authority layer.
Proof exists, but AI often cannot use it.
Many companies had strong proof on their websites and AI still missed it.
Crawlability is now a communications problem, not just a technical one.
When AI systems cannot read a company’s own website, they do not stop answering. They answer from whatever is available.
AI is weakest when asked to compare.
The alternatives question exposed one of the clearest buyer-risk patterns. When companies do not define their competitive set, AI fills the gap with scraped database logic, category-adjacent guesses, outdated competitors or entirely wrong industries.
Snapshots
The full audits are the dataset. The public report uses a smaller set of named examples to make each problem mode tangible. These examples are not a ranking; they are evidence of different AI problem patterns.
Zepto
Entity collision: All four models described the Indian quick-commerce unicorn. None identified the Australian account-to-account payments infrastructure business at zepto.com.au.
Shift
Generic name risk: Across 24 responses, no platform identified Shift Australia. The models defaulted to browsers, insurance AI, used cars, keyboard keys and calculator functions.
Wonderful
Wrong company: AI described a Californian pistachio conglomerate, an AI agent platform, or the adjective “wonderful”. No platform found the UK Pay by Bank business.
Oakbrook
Crawlability: The homepage was Cloudflare-blocked, so AI reconstructed the UK fintech from third-party fragments, unrelated namesakes and criticism.
Ferovinum
Freshness lag: Models understood the funded wine and spirits fintech, but they were one rebrand behind: Fero, ferodrinks.com, FLOW and updated metrics were largely invisible.
Payhawk
Narrative wedge: AI recognised enterprise spend management, but often missed the key wedge: Payhawk’s ability to manage existing corporate card programs rather than forcing migration.
Hippo Insurance
Outdated story: Most models still described the legacy smart-home carrier story, while the current site presents a 70+ carrier quote-comparison agency.
Tokenovate
Technical flattening: AI captured post-trade automation and CDM, but often flattened the edge: nine CDM-native workflows, the Unified Trade Record and Novat as settlement-act tokenisation.
Coinbase
First-party access: AI could describe the product surface accurately, but it could not read the homepage. One platform even made a false security claim that a first-party incident page could have corrected.
AlphaSense
Proof underused: Models captured the category and scale, but flattened the proprietary content moat: Tegus, Wall Street Insights, no-hallucination architecture and Gartner MQ proof were underused.
Moula
Disambiguation drift: When the model found the lender, it often described the category correctly. On softer questions, some systems drifted into the Arabic term “Mawla” or older loan ceilings.
R3
Blocked homepage: Because r3.com was effectively unreadable to non-JavaScript retrieval, one platform reconstructed the enterprise blockchain company, another asked for clarification, and one thought “R3” meant a game controller button.
Full scoring table
Companies are sorted by average total score across the four platforms. Each platform total is out of 30.
The separate scoring-table page includes the full ranking table from the report itself, including ChatGPT, Claude, Gemini and Perplexity scores, plus average score and spread.
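As a rough sketch of how the average and spread columns could be derived from four platform totals, consider the following. The company names and scores are invented placeholders, not rows from the SJK Labs dataset; the function name, the descending sort order and the `wide_gap` flag (mirroring the 10-point gap reported in the headline findings) are assumptions made for illustration.

```python
from statistics import mean

# Invented placeholder data: platform totals out of 30. Not from the study.
scores = {
    "Company A": {"ChatGPT": 28, "Claude": 27, "Gemini": 25, "Perplexity": 26},
    "Company B": {"ChatGPT": 24, "Claude": 29, "Gemini": 12, "Perplexity": 20},
}

def summarise(scores: dict[str, dict[str, int]]) -> list[dict]:
    """Return one row per company with its average total and spread."""
    rows = []
    for company, by_platform in scores.items():
        totals = list(by_platform.values())
        spread = max(totals) - min(totals)  # best platform minus worst
        rows.append({
            "company": company,
            "average": mean(totals),
            "spread": spread,
            "wide_gap": spread >= 10,  # assumed threshold, per the headline finding
        })
    # Assumption: highest average first; the report does not state the direction.
    return sorted(rows, key=lambda row: row["average"], reverse=True)

rows = summarise(scores)
for row in rows:
    print(row)
```

Keeping spread next to the average matters because a middling average can hide a company that is well understood on one platform and badly misread on another.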
How the audit was run
The study measured AI-generated explanations of companies. It did not measure actual buyer behaviour, conversion, search ranking, website traffic, media coverage, analyst opinion or commercial performance.
Company selection
The study audited 50 regulated, financial, fintech, insurance, wealth, payments, crypto and capital-intensive businesses. Companies were selected to represent a range of visibility levels, naming risk, technical complexity and narrative maturity.
Platforms tested
Each company was tested across four web-search-enabled AI platforms: ChatGPT, Claude, Gemini and Perplexity. The study does not attempt to create a sterile model benchmark; it measures how a buyer, journalist, analyst or partner would experience each platform in normal use.
Prompts used
Each company was tested using the same six buyer-style questions: what does this company do, who is it for, what problem does it solve, what makes it different, is it credible, and who are the main alternatives?
Scoring framework
Responses were scored using the SJK Labs Legibility Score across six dimensions: clarity, accuracy, differentiation, customer pain point, proof and credibility, and category fit. Each dimension was scored from 1 to 5, giving each platform a maximum of 30 points per company.
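One way to picture a single platform score is as six named ratings that are validated and then summed. The sketch below is a hypothetical illustration of that structure, not the scoring tool used in the study: the identifier names, the validation rules and the example ratings are all assumptions.

```python
# The six dimensions named in the report, written as hypothetical identifiers.
DIMENSIONS = (
    "clarity",
    "accuracy",
    "differentiation",
    "customer_pain_point",
    "proof_and_credibility",
    "category_fit",
)

def platform_total(ratings: dict[str, int]) -> int:
    """Validate one platform's six ratings and return the total out of 30."""
    if set(ratings) != set(DIMENSIONS):
        raise ValueError("ratings must cover exactly the six dimensions")
    for name, value in ratings.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be rated from 1 to 5, got {value}")
    return sum(ratings.values())

# Invented example ratings for one company on one platform.
example = {
    "clarity": 4,
    "accuracy": 5,
    "differentiation": 2,
    "customer_pain_point": 3,
    "proof_and_credibility": 5,
    "category_fit": 3,
}
print(platform_total(example))  # prints 22
```

Rejecting incomplete or out-of-range ratings up front keeps every platform total comparable on the same 6 to 30 scale.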
What the study measured
In several cases, the AI’s answer was not simply incomplete; it returned the wrong company, the wrong category, stale positioning, irrelevant competitors or third-party criticism without a first-party counterweight.
Treatment of first-party access
For each company, first-party site content was treated as the primary ground truth where available. Where the company website was blocked, JavaScript-walled, cookie-walled, Cloudflare-challenged, security-gated, redirect-only or effectively empty, that inaccessibility was recorded as part of the finding.
What companies should do now
The practical lesson is not to write more content, but to make the company easier for AI systems to explain accurately.
Companies should treat AI legibility as a distinct layer of communications strategy - part positioning, part PR, part technical visibility, part proof architecture.
For businesses that know the work is strong, but suspect the market signal is weaker than it should be.
SJK Labs helps businesses turn expertise into authority infrastructure: clearer positioning, stronger proof, sharper media logic and a digital ecosystem that can hold up in an AI-first world.
If buyers, journalists, search engines and AI systems cannot clearly understand what makes you credible, someone less capable but more legible will fill the gap.
Want to see how legible your own business looks in AI?
The Legibility Audit measures how clearly AI systems can explain your business, where they lose the commercial difference, and what story a buyer is likely to carry into the next conversation.