Documentation
LLM Scan audits the public signals AI systems rely on: crawlability, machine-readable files, structured content, and generated fixes your team can ship.
Scan model
Each scan turns a public URL into a report with a score, eight signal checks, recommendations, generated artifacts, and a shareable link.
Checks whether the homepage is publicly reachable, returns a successful status, and avoids blockers that prevent agents from loading the page.
Looks for clear crawler permissions and sitemap discovery in robots.txt. Use robots.txt to guide crawlers, not to publish private instructions.
Checks for a concise Markdown map (llms.txt) at the site root. Treat llms.txt as helpful agent context, not as a guaranteed ranking or citation signal.
Verifies canonical URL discovery so crawlers and agents can find important public pages without guessing.
Rewards clean, text-first content paths that reduce rendering noise for retrieval and agent workflows.
Checks headings, landmarks, links, and page structure so extraction systems can understand the document hierarchy.
Looks for JSON-LD and schema.org-style entity hints that reduce ambiguity around products, organizations, articles, and pages.
Checks whether the page clearly states what the product does, who it is for, and which facts agents should trust.
Generated fixes
Reports are useful immediately, but the full generated fixes are the handoff to your team. Visitors see partially blurred previews on the page, then provide an email address to reveal the full files, download the bundle, or copy the agent-ready prompt.
LLM and agent guidance
There is no single magic file for AI visibility. Strong results come from crawlable pages, consistent entity signals, clean structure, and concise canonical context.
Important pages should return meaningful HTML on the first response: product summary, pricing context, docs, support, security, and policy information.
Why it matters
Crawlers and retrieval systems often extract the initial document. If the page is mostly an app shell, the model sees navigation chrome instead of your actual answer.
Avoid
Do not put key claims only in client-rendered tabs, modals, carousels, screenshots, or gated dashboards.
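As a rough illustration of the point above, the sketch below counts the words that are present in the raw HTML before any JavaScript runs. It uses only Python's standard `html.parser`. The function names and the 50-word threshold are assumptions made for this example, not part of how LLM Scan scores a page.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects words found in the raw HTML, ignoring script and style bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only text outside <script>/<style> counts as first-response content.
        if self._skip_depth == 0:
            self.words.extend(data.split())


def initial_word_count(html: str) -> int:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(parser.words)


def looks_like_app_shell(html: str, min_words: int = 50) -> bool:
    # min_words is an arbitrary illustrative threshold, not an LLM Scan rule.
    return initial_word_count(html) < min_words
```

A page that returns only an empty root element plus a script bundle yields a count of zero, which is the situation the guidance warns about.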
Start pages and sections with clear definitions, outcomes, constraints, and examples before long persuasive copy.
Why it matters
Agents summarize by extracting claims. A concise first paragraph makes the page easier to cite and less likely to be misread.
Avoid
Avoid vague hero copy that never says what the product is, who it is for, what it costs, or what problem it solves.
Keep public URLs for docs, pricing, changelog, security, contact, legal pages, and product pages stable over time.
Why it matters
Stable URLs accumulate crawler memory, references, embeddings, and citations. Frequent URL churn makes agents less confident.
Avoid
Avoid hiding canonical facts behind query parameters, session state, temporary campaign pages, or redirect chains.
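One way to spot the patterns listed above is to inspect a URL's query string. The following is a minimal sketch using Python's standard `urllib.parse`. The parameter names in `SESSION_PARAMS` and the issue labels are illustrative assumptions, and redirect chains are out of scope here because detecting them needs live requests.

```python
from urllib.parse import parse_qsl, urlsplit

# Illustrative names only; real sites use many different session parameters.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def url_stability_issues(url: str) -> list:
    """Return a list of reasons a URL may be a poor home for canonical facts."""
    query = urlsplit(url).query
    names = [name.lower() for name, _ in parse_qsl(query, keep_blank_values=True)]
    issues = []
    if names:
        issues.append("query parameters")
    if any(name in SESSION_PARAMS for name in names):
        issues.append("session identifier")
    if any(name.startswith("utm_") for name in names):
        issues.append("campaign tracking")
    return issues
```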
Use headings like Pricing, API authentication, Supported crawlers, Data retention, and Enterprise security.
Why it matters
Headings are structural labels for extraction. Specific headings help models map a fact to the right topic.
Avoid
Avoid generic headings like More, Details, Resources, or Learn more when the section contains specific facts.
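To make the heading advice concrete, here is a small sketch that extracts a page's heading outline and flags the generic labels named above. It relies only on Python's standard `html.parser`. The class and function names are hypothetical, and the check is far simpler than a real structure audit.

```python
from html.parser import HTMLParser

# The generic labels called out in the guidance above.
GENERIC = {"more", "details", "resources", "learn more"}


class HeadingOutline(HTMLParser):
    """Collects (level, text) pairs for h1 to h6 elements."""

    def __init__(self):
        super().__init__()
        self._level = None
        self._buffer = []
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            # Collapse internal whitespace so "Learn  more" matches "learn more".
            text = " ".join("".join(self._buffer).split())
            self.headings.append((self._level, text))
            self._level = None


def generic_headings(html: str) -> list:
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return [text for _, text in parser.headings if text.lower() in GENERIC]
```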
Maintain sitemap.xml, robots.txt, llms.txt, JSON-LD, canonical tags, metadata, and markdown/text-friendly content where it helps.
Why it matters
These files reduce discovery cost and ambiguity. They should point agents toward the same public facts users see.
Avoid
Do not invent schema, stuff keywords, or publish an llms.txt that disagrees with the real site.
Use the same product name, company name, description, pricing language, audience, and support contacts across homepage, docs, metadata, JSON-LD, and llms.txt.
Why it matters
Contradictory facts force models to choose between sources. Consistency helps answer engines summarize you cleanly.
Avoid
Avoid old taglines, stale pricing, mismatched social links, or different product names across pages.
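One way to keep these surfaces consistent is to render them from a single record of facts. The sketch below is an assumption about how a team might do that, not something LLM Scan generates in this form. `FACTS`, `render_json_ld`, and `render_llms_txt` are hypothetical names, and the llms.txt layout (title, quoted summary, link list) follows the commonly described Markdown convention.

```python
import json

# Hypothetical single source of truth for public facts.
FACTS = {
    "name": "Example Co",
    "url": "https://example.com",
    "description": "Example Co audits public pages for AI crawlers.",
    "pages": [
        ("Docs", "https://example.com/docs"),
        ("Pricing", "https://example.com/pricing"),
    ],
}


def render_json_ld(facts: dict) -> str:
    """Serialize a schema.org Organization from the shared facts."""
    return json.dumps(
        {
            "@context": "https://schema.org",
            "@type": "Organization",
            "name": facts["name"],
            "url": facts["url"],
            "description": facts["description"],
        },
        indent=2,
    )


def render_llms_txt(facts: dict) -> str:
    """Render a minimal llms.txt using the same name and description."""
    lines = [f"# {facts['name']}", "", f"> {facts['description']}", "", "## Pages", ""]
    lines += [f"- [{title}]({url})" for title, url in facts["pages"]]
    return "\n".join(lines) + "\n"
```

Because both outputs read from the same dictionary, a renamed product or a new description only has to be changed in one place.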
Use robots.txt for crawler access, authentication for private data, and content-signal language for AI usage preferences such as search, grounding, and training.
Why it matters
Agents need to know both what they can crawl and how public content may be used. These are related but not the same signal.
Avoid
Do not rely on robots.txt to protect private content. Anything private should require authentication.
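The difference between guidance and protection is easy to see with Python's standard `urllib.robotparser`. The robots.txt content and the `ExampleBot` user agent below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt used only for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: ExampleBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
print(rules.can_fetch("AnyCrawler", "https://example.com/pricing"))      # True
print(rules.can_fetch("AnyCrawler", "https://example.com/admin/users"))  # False
print(rules.can_fetch("ExampleBot", "https://example.com/pricing"))      # False
print(rules.site_maps())  # ['https://example.com/sitemap.xml']
```

The `False` results only describe what a well-behaved crawler will choose to skip. Nothing in robots.txt stops a client from requesting `/admin/users`, which is why private paths still need authentication.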
Write pages so small excerpts still make sense: include nouns, context, units, dates, product names, and links near the facts they explain.
Why it matters
Retrieval systems often pass small chunks to a model. Self-contained chunks reduce hallucination and improve answer quality.
Avoid
Avoid pronoun-heavy copy, unsupported superlatives, unlabeled charts, image-only text, and facts separated from their source link.
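A quick way to test whether excerpts stand alone is to split copy into chunks and flag those that open with a pronoun. This is a deliberately crude sketch: blank-line splitting stands in for whatever chunker a retrieval system actually uses, and the pronoun list and function names are assumptions.

```python
import re

# Openers that usually depend on a previous paragraph for their meaning.
PRONOUN_OPENERS = {"it", "this", "that", "they", "these", "those"}


def split_chunks(text: str) -> list:
    """Split on blank lines and normalize whitespace inside each chunk."""
    parts = re.split(r"\n\s*\n", text)
    return [" ".join(part.split()) for part in parts if part.strip()]


def ambiguous_chunks(text: str) -> list:
    """Return chunks whose first word is a pronoun from PRONOUN_OPENERS."""
    flagged = []
    for chunk in split_chunks(text):
        first_word = re.match(r"[A-Za-z']+", chunk)
        if first_word and first_word.group(0).lower() in PRONOUN_OPENERS:
            flagged.append(chunk)
    return flagged
```

For example, a paragraph that reads "It costs $10 per month." would be flagged, while "The Pro plan costs $10 per month as of 2026." would not, because the second version carries its own subject and date.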
Public API
The versioned public API supports scan creation, polling, report loading, generated fixes, domain history, pricing discovery, and OpenAPI-based tooling.
1. Create a key
Sign in, open Settings / API keys, create a key, and grant scans:write. Store it as LLMREADY_API_KEY.
2. Create a scan
Send POST /api/public/v1/scans with a public HTTP or HTTPS URL. Private network targets are rejected.
3. Poll and read
Poll the returned status URL until completion, then read the report and fixes endpoints for automation.
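The polling step above can be sketched as a small loop. This is a minimal example under stated assumptions: `fetch_status` is a caller-supplied function that returns the parsed JSON body of the status endpoint, the terminal statuses are `completed` and `failed` as documented for that endpoint, and the interval and timeout defaults are arbitrary.

```python
import time

TERMINAL = {"completed", "failed"}


def poll_until_done(fetch_status, interval=2.0, timeout=120.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the scan reaches a terminal status.

    fetch_status must return a dict shaped like the status response,
    e.g. {"data": {"status": "running"}, "error": None}.
    """
    deadline = clock() + timeout
    while True:
        body = fetch_status()
        if body.get("error"):
            raise RuntimeError(body["error"])
        status = body["data"]["status"]
        if status in TERMINAL:
            return body["data"]
        if clock() >= deadline:
            raise TimeoutError(f"scan still {status!r} after {timeout} seconds")
        sleep(interval)
```

In practice `fetch_status` could wrap a `requests.get(...)` call against the `statusUrl` returned by scan creation. Injecting it as a function keeps the loop easy to test without network access.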
POST
/api/public/v1/scans
Requires `scans:write`, `write`, or `admin` scope. Use `visibility: "unlisted"` for shareable reports that should not appear in recent public scan feeds.
Request
{
  "url": "https://example.com",
  "options": {
    "scanType": "full",
    "visibility": "unlisted",
    "includeRecommendations": true,
    "includeGeneratedFixes": true
  }
}
Response
{
  "data": {
    "scanId": "scan_123",
    "publicId": "scan_123",
    "statusUrl": "/api/public/v1/scans/scan_123/status",
    "reportUrl": "/scan/scan_123",
    "estimatedCompletionSeconds": 60
  },
  "error": null
}
GET
/api/public/v1/scans/:scanId/status
Poll until `status` is `completed` or `failed`. The status endpoint is public for public and unlisted scans.
Response
{
  "data": {
    "scanId": "scan_123",
    "status": "running",
    "progress": 45,
    "domain": "example.com",
    "estimatedCompletion": "2026-05-24T10:01:00.000Z"
  },
  "error": null
}
GET
/api/public/v1/scans/:scanId
Returns score, rating, summary, check results, recommendations, and report metadata.
Response
{
  "data": {
    "publicId": "scan_123",
    "domain": "example.com",
    "status": "completed",
    "score": 92,
    "rating": "ai-ready",
    "results": {
      "issueCount": 0,
      "checks": []
    }
  },
  "error": null
}
GET
/api/public/v1/scans/:scanId/fixes
Returns generated `llms.txt`, robots additions, JSON-LD, content signals, and check-level fix snippets when available.
Response
{
  "data": {
    "scanId": "scan_123",
    "domain": "example.com",
    "generated": {
      "llmsTxt": "# Example"
    },
    "fixes": []
  },
  "error": null
}

curl https://llmready.com/api/public/v1/scans \
  -H "Authorization: Bearer $LLMREADY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "options": { "scanType": "full", "visibility": "unlisted" }
  }'

const response = await fetch("https://llmready.com/api/public/v1/scans", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.LLMREADY_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://example.com",
    options: { scanType: "full", visibility: "unlisted" }
  })
});
const { data, error } = await response.json();

import os
import requests

response = requests.post(
    "https://llmready.com/api/public/v1/scans",
    headers={"Authorization": f"Bearer {os.environ['LLMREADY_API_KEY']}"},
    json={
        "url": "https://example.com",
        "options": {"scanType": "full", "visibility": "unlisted"},
    },
    timeout=30,
)
payload = response.json()

Send API keys as Authorization: Bearer <key>. Keys are created in the dashboard and can be scoped with scans:write, write, or admin for scan creation. Public read endpoints do not require a bearer token.
Paid clients can omit the bearer token and send PAYMENT-SIGNATURE. If payment is required, the API returns 402 with an x402 challenge. Successful paid scan creation includes PAYMENT-RESPONSE settlement metadata.
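The two credential paths described above can be sketched as a small header builder. This is an illustration only: the helper names are hypothetical, and it assumes PAYMENT-SIGNATURE is sent as an HTTP request header with that exact name, which the text above implies but does not spell out.

```python
def build_headers(api_key=None, payment_signature=None) -> dict:
    """Build request headers for scan creation.

    Prefers the bearer token when an API key is available. With neither
    credential, the request is sent as-is and the caller should expect
    that the API may answer with 402.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    elif payment_signature:
        # Assumption: the signature travels in a header of this name.
        headers["PAYMENT-SIGNATURE"] = payment_signature
    return headers


def needs_payment(status_code: int) -> bool:
    """True when the API responded with the 402 payment challenge."""
    return status_code == 402
```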
MCP
Agent developers can connect directly to the streamable HTTP MCP endpoint and call tools backed by the same public API.
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "scan_url",
    "arguments": {
      "url": "https://example.com",
      "scanType": "full",
      "visibility": "unlisted"
    }
  }
}
Reliability
All public API JSON responses use the `{ data, error, errorCode }` envelope. Successful and rate-limited responses include API version and rate-limit headers.
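A client can handle the envelope and rate limiting with two small helpers. The sketch below is hedged: the function names are hypothetical, `headers` is treated as a plain dictionary (HTTP libraries such as `requests` offer case-insensitive lookups instead), and it assumes `Retry-After` carries a number of seconds on a 429 response.

```python
def unwrap(body: dict):
    """Return body["data"], or raise with the envelope's error details."""
    if body.get("error"):
        code = body.get("errorCode") or "unknown"
        raise RuntimeError(f"{code}: {body['error']}")
    return body["data"]


def retry_delay(status_code: int, headers: dict, default: float = 1.0):
    """Seconds to wait before retrying, or None when no retry is suggested."""
    if status_code != 429:
        return None
    try:
        return max(0.0, float(headers.get("Retry-After", default)))
    except ValueError:
        # Header present but not numeric: fall back to the default wait.
        return default
```

A caller would typically sleep for the returned delay and then repeat the request, giving up after a bounded number of attempts.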
X-LLMReady-API-Version: public API version.
X-RateLimit-Limit: bucket capacity.
X-RateLimit-Remaining: remaining requests.
X-RateLimit-Reset: Unix reset timestamp.
Retry-After: seconds to wait after a 429.
{
  "data": null,
  "error": "Too many requests.",
  "errorCode": "rate_limited"
}
FAQ
Short answers for teams using LLM Scan to improve agent readability, not chase myths or one-off tricks.
No single file can guarantee citations. Use llms.txt as a useful, forward-compatible context map, then back it up with crawlable pages, strong entity signals, structured data, and clear answer-oriented content.
robots.txt is a crawler access file. llms.txt is a Markdown-oriented context file that points agents toward important public pages. One controls discovery permissions; the other explains what matters.
Structured data is not required for every page, but it helps reduce ambiguity when it describes real entities such as an organization, product, article, documentation page, or pricing page.
Agents and retrieval systems need to understand page hierarchy. Clear headings, landmarks, descriptive links, and server-rendered text make extraction more reliable than visual layout alone.
Generated fixes are ready-to-edit snippets based on the scan result: llms.txt, robots.txt additions, JSON-LD, and content-signal copy. They are intended as a strong starting point, not a blind deploy.
The report stays visible without an email, and the fixes appear as a partial blurred preview. We ask for email before revealing the full generated files, downloading the bundle, or copying the agent-ready prompt so we can send setup notes and follow-up product updates with unsubscribe controls.
No. The scanner is designed for public website visibility. If a page requires login, blocks crawlers, or depends entirely on private state, agents generally cannot rely on it either.
Rescan after publishing new docs, changing navigation, updating robots.txt or sitemap files, redesigning key pages, or shipping new product/pricing content. For active sites, weekly or monthly checks are a practical rhythm.
Aim for AI-Ready, but prioritize the failed high-impact checks first. A site with crawlable pages, sitemap discovery, clear content, and accurate structured data is usually in much better shape than one that only adds llms.txt.
Yes. Start scans with POST /api/public/v1/scans, poll /api/public/v1/scans/:scanId/status while the scan runs, and load the public report payload when it completes.
References
These are the external references behind several checks. LLM Scan treats experimental AI-specific files honestly and still prioritizes durable web fundamentals.