Documentation

Make your website readable by LLMs and agents

LLM Scan audits the public signals AI systems rely on (crawlability, machine-readable files, and structured content) and generates fixes your team can ship.

Scan model

What LLM Scan checks

Each scan turns a public URL into a report with a score, eight signal checks, recommendations, generated artifacts, and a shareable link.

Crawlability

Checks whether the homepage is publicly reachable, returns a successful status, and avoids blockers that prevent agents from loading the page.

robots.txt

Looks for clear crawler permissions and sitemap discovery. Use it to guide crawlers, not to publish private instructions.
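As a rough illustration of what "clear crawler permissions and sitemap discovery" can look like, the sketch below parses a robots.txt body with Python's standard `urllib.robotparser`. The file content, the `ExampleBot` user agent, and the `summarize_robots` helper are hypothetical and are not how LLM Scan itself evaluates the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# The Disallow rule comes first because this parser applies the first matching rule.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def summarize_robots(robots_txt: str, user_agent: str, url: str) -> dict:
    """Parse robots.txt text and report crawl access plus declared sitemaps."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        "allowed": parser.can_fetch(user_agent, url),
        # site_maps() returns None when no Sitemap line is present.
        "sitemaps": parser.site_maps() or [],
    }


summary = summarize_robots(ROBOTS_TXT, "ExampleBot", "https://example.com/pricing")
blocked = summarize_robots(ROBOTS_TXT, "ExampleBot", "https://example.com/admin/users")
```

Different crawlers resolve conflicting Allow and Disallow rules differently, so treat a local check like this as a sanity test rather than a prediction of any specific crawler's behavior.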

llms.txt

Checks for a concise Markdown map at the site root. Treat it as helpful agent context, not as a guaranteed ranking or citation signal.

Sitemap

Verifies canonical URL discovery so crawlers and agents can find important public pages without guessing.
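One way to check discovery yourself is to list the `<loc>` entries in a sitemap and compare them with the pages you consider important. The sketch below uses only the standard library; the sitemap content, the `sitemap_urls` and `missing_pages` helpers, and the list of important pages are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content, for illustration only.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
  <url><loc>https://example.com/docs</loc></url>
</urlset>
"""


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value from a urlset sitemap."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def missing_pages(listed: list[str], important: list[str]) -> list[str]:
    """Return important URLs that the sitemap does not list."""
    listed_set = set(listed)
    return [url for url in important if url not in listed_set]


urls = sitemap_urls(SITEMAP_XML)
missing = missing_pages(urls, ["https://example.com/pricing", "https://example.com/security"])
```

This handles a plain `urlset` only; a sitemap index file would need one more level of fetching.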

Markdown support

Rewards clean, text-first content paths that reduce rendering noise for retrieval and agent workflows.

Semantic HTML

Checks headings, landmarks, links, and page structure so extraction systems can understand the document hierarchy.
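To see the hierarchy an extraction system might recover, you can pull the heading outline and landmark elements out of a page's HTML. This is a minimal sketch with the standard `html.parser` module; the `OutlineParser` class and the sample markup are hypothetical, and the actual scan logic may weigh these signals differently.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect (level, text) pairs for h1-h6 and record landmark elements."""

    LANDMARKS = {"header", "nav", "main", "footer"}

    def __init__(self):
        super().__init__()
        self.headings = []
        self.landmarks = []
        self._level = None   # heading level currently being read, if any
        self._buffer = []    # text fragments inside the current heading

    def handle_starttag(self, tag, attrs):
        if tag in self.LANDMARKS:
            self.landmarks.append(tag)
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._buffer).strip()))
            self._level = None


# Hypothetical page markup, for illustration only.
HTML = """
<header><nav><a href="/docs">Docs</a></nav></header>
<main>
  <h1>Example Product</h1>
  <h2>Pricing</h2>
  <h2>API authentication</h2>
</main>
"""

parser = OutlineParser()
parser.feed(HTML)
h1_count = sum(1 for level, _ in parser.headings if level == 1)
```

A page with exactly one `h1` and headings that name their topic gives a much cleaner outline than one built from styled `div` elements.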

Structured data

Looks for JSON-LD and schema.org-style entity hints that reduce ambiguity around products, organizations, articles, and pages.
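For reference, a JSON-LD entity hint is a small JSON object embedded in a `script` tag. The sketch below builds one for an organization; the `organization_jsonld` helper and the example values are hypothetical, and the properties you publish should match what is visible on the page.

```python
import json


def organization_jsonld(name: str, url: str, description: str) -> str:
    """Render a schema.org Organization as a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "description": description,
    }
    # Escape "</" so a value can never close the script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


snippet = organization_jsonld(
    name="Example Co",
    url="https://example.com",
    description="Example Co makes invoicing software for freelancers.",
)
```

Keeping the name, URL, and description identical to the homepage copy is what makes the hint useful rather than contradictory.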

Content signals

Checks whether the page clearly states what the product does, who it is for, and which facts agents should trust.

Generated fixes

From report to shippable files

Reports are useful on their own, but the generated fixes are what your team ships. Visitors see a partial, blurred preview on the page, then provide an email address to reveal the full files, download the bundle, or copy the agent-ready prompt.

Downloadable fix bundle

`llms.txt` starter file with priority URLs and AI crawler guidance
`robots.txt` additions with sitemap references and crawler permissions
JSON-LD snippets for organization, website, product, or article context
Content-signal copy for clearer positioning, audience, and canonical facts

LLM and agent guidance

Best practices for being understood

There is no single magic file for AI visibility. Strong results come from crawlable pages, consistent entity signals, clean structure, and concise canonical context.

Serve readable content before JavaScript

Important pages should return meaningful HTML on the first response: product summary, pricing context, docs, support, security, and policy information.

Why it matters

Crawlers and retrieval systems often extract the initial document. If the page is mostly an app shell, the model sees navigation chrome instead of your actual answer.

Avoid

Do not put key claims only in client-rendered tabs, modals, carousels, screenshots, or gated dashboards.

Lead with direct answers

Start pages and sections with clear definitions, outcomes, constraints, and examples before long persuasive copy.

Why it matters

Agents summarize by extracting claims. A concise first paragraph makes the page easier to cite and less likely to be misread.

Avoid

Avoid vague hero copy that never says what the product is, who it is for, what it costs, or what problem it solves.

Use stable canonical URLs

Keep public URLs for docs, pricing, changelog, security, contact, legal pages, and product pages stable over time.

Why it matters

Stable URLs accumulate crawler memory, references, embeddings, and citations. Frequent URL churn makes agents less confident.

Avoid

Avoid hiding canonical facts behind query parameters, session state, temporary campaign pages, or redirect chains.

Make headings describe the answer

Use headings like Pricing, API authentication, Supported crawlers, Data retention, and Enterprise security.

Why it matters

Headings are structural labels for extraction. Specific headings help models map a fact to the right topic.

Avoid

Avoid generic headings like More, Details, Resources, or Learn more when the section contains specific facts.

Publish machine-readable context

Maintain sitemap.xml, robots.txt, llms.txt, JSON-LD, canonical tags, metadata, and markdown/text-friendly content where it helps.

Why it matters

These files reduce discovery cost and ambiguity. They should point agents toward the same public facts users see.

Avoid

Do not invent schema, stuff keywords, or publish an llms.txt that disagrees with the real site.

Keep entity facts consistent

Use the same product name, company name, description, pricing language, audience, and support contacts across homepage, docs, metadata, JSON-LD, and llms.txt.

Why it matters

Contradictory facts force models to choose between sources. Consistency helps answer engines summarize you cleanly.

Avoid

Avoid old taglines, stale pricing, mismatched social links, or different product names across pages.

Separate permissions from preferences

Use robots.txt for crawler access, authentication for private data, and content-signal language for AI usage preferences such as search, grounding, and training.

Why it matters

Agents need to know both what they can crawl and how public content may be used. These are related but not the same signal.

Avoid

Do not rely on robots.txt to protect private content. Anything private should require authentication.

Design for retrieval, not just ranking

Write pages so small excerpts still make sense: include nouns, context, units, dates, product names, and links near the facts they explain.

Why it matters

Retrieval systems often pass small chunks to a model. Self-contained chunks reduce hallucination and improve answer quality.

Avoid

Avoid pronoun-heavy copy, unsupported superlatives, unlabeled charts, image-only text, and facts separated from their source link.

A practical publishing checklist

Every important public page has one clear purpose and a descriptive H1.
The first paragraph explains the page without needing surrounding navigation.
Pricing, docs, support, security, and legal pages are linked from stable URLs.
sitemap.xml lists canonical public pages and robots.txt references the sitemap.
JSON-LD describes real entities and matches visible page content.
llms.txt links only to useful canonical resources and stays short enough to maintain.
Content that answers common buyer or developer questions is text, not image-only.
Generated metadata, Open Graph copy, and page headings use the same product facts.

Public API

Integrate scans into your workflow

The versioned public API supports scan creation, polling, report loading, generated fixes, domain history, pricing discovery, and OpenAPI-based tooling.

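1. Create a key

Create an API key in the dashboard and give it the scans:write, write, or admin scope so it can create scans.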

2. Create a scan

Send POST /api/public/v1/scans with a public HTTP or HTTPS URL. Private network targets are rejected.

3. Poll and read

Poll the returned status URL until completion, then read the report and fixes endpoints for automation.
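Because step 2 rejects private network targets, a client can save a round trip by pre-checking URLs before submitting them. The sketch below is a client-side approximation only: the `looks_scannable` helper is hypothetical, the server's actual validation rules are not documented here, and a DNS name that resolves to a private address can only be caught server-side.

```python
import ipaddress
from urllib.parse import urlsplit


def looks_scannable(url: str) -> bool:
    """Return False for URLs that are obviously not public HTTP(S) targets."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname
    if host == "localhost":
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: a DNS name, which this check cannot resolve.
        return True
    # Rejects loopback, private, link-local, and other non-global ranges.
    return address.is_global


candidates = [
    "https://example.com",
    "http://10.0.0.5/admin",
    "http://[::1]/",
    "ftp://example.com",
]
accepted = [url for url in candidates if looks_scannable(url)]
```

Treat a `True` result as "worth submitting", not as a guarantee that the API will accept the URL.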

| Method | Endpoint | Auth | Description |
| --- | --- | --- | --- |
| POST | `/api/public/v1/scans` | API key or x402 | Start a public AI visibility scan for a URL. |
| GET | `/api/public/v1/scans/:scanId` | None | Read a public scan report payload. |
| GET | `/api/public/v1/scans/:scanId/status` | None | Poll scan status while a report is being generated. |
| GET | `/api/public/v1/scans/:scanId/fixes` | None | Fetch generated remediation artifacts for a public scan. |
| GET | `/api/public/v1/scans/recent` | None | Fetch recent public reports for social proof, examples, and leaderboard-style displays. |
| GET | `/api/public/v1/domains/:domain/history` | None | Read historical public scan scores for a domain. |
| GET | `/api/v1/scans/recent` | None | Fetch recent public reports for social proof, examples, and leaderboard-style displays. |
| POST | `/api/fix-downloads` | None | Capture email before full generated-fix access, downloads, or agent prompt copying. |
| GET | `/api/public/v1/pricing` | None | Discover x402 pricing metadata and accepted payment assets. |
| GET | `/api/public/v1/openapi.json` | None | Download the OpenAPI 3.0 source of truth. |
| GET | `/api/public/v1/postman.json` | None | Download a Postman collection generated from the OpenAPI spec. |

Create a scan

POST /api/public/v1/scans

Requires `scans:write`, `write`, or `admin` scope. Use `visibility: "unlisted"` for shareable reports that should not appear in recent public scan feeds.

Request

{
  "url": "https://example.com",
  "options": {
    "scanType": "full",
    "visibility": "unlisted",
    "includeRecommendations": true,
    "includeGeneratedFixes": true
  }
}

Response

{
  "data": {
    "scanId": "scan_123",
    "publicId": "scan_123",
    "statusUrl": "/api/public/v1/scans/scan_123/status",
    "reportUrl": "/scan/scan_123",
    "estimatedCompletionSeconds": 60
  },
  "error": null
}

Poll scan status

GET /api/public/v1/scans/:scanId/status

Poll until `status` is `completed` or `failed`. The status endpoint is public for public and unlisted scans.

Response

{
  "data": {
    "scanId": "scan_123",
    "status": "running",
    "progress": 45,
    "domain": "example.com",
    "estimatedCompletion": "2026-05-24T10:01:00.000Z"
  },
  "error": null
}
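A polling loop built on this response might look like the sketch below. It takes the HTTP call as a parameter so the loop can be tested without a network; `poll_scan`, the interval, and the attempt limit are assumptions, and the sample envelopes are fake responses shaped like the one above.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def poll_scan(
    fetch_status: Callable[[], dict],
    interval_seconds: float = 2.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the scan reaches a terminal status.

    fetch_status should return the parsed JSON envelope, for example
    lambda: requests.get(base_url + status_url, timeout=30).json()
    """
    for attempt in range(max_attempts):
        envelope = fetch_status()
        if envelope.get("error"):
            raise RuntimeError(f"status request failed: {envelope['error']}")
        data = envelope["data"]
        if data["status"] in TERMINAL_STATUSES:
            return data
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"scan did not finish after {max_attempts} attempts")


# Fake responses for illustration; a real client would hit the status URL.
responses = iter([
    {"data": {"scanId": "scan_123", "status": "running", "progress": 45}, "error": None},
    {"data": {"scanId": "scan_123", "status": "completed"}, "error": None},
])
result = poll_scan(lambda: next(responses), sleep=lambda seconds: None)
```

A fixed interval keeps the sketch short; in practice you may prefer to size the wait from `estimatedCompletionSeconds` in the create response.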

Read a report

GET /api/public/v1/scans/:scanId

Returns score, rating, summary, check results, recommendations, and report metadata.

Response

{
  "data": {
    "publicId": "scan_123",
    "domain": "example.com",
    "status": "completed",
    "score": 92,
    "rating": "ai-ready",
    "results": {
      "issueCount": 0,
      "checks": []
    }
  },
  "error": null
}
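Client code usually wants the `data` object or an exception, not the raw envelope. The sketch below unwraps the envelope and prints a one-line summary using only fields shown in the sample response; `ApiError`, `unwrap`, and `summarize_report` are hypothetical helper names.

```python
class ApiError(Exception):
    """Raised when the envelope carries an error message."""

    def __init__(self, message: str, code=None):
        super().__init__(message)
        self.code = code


def unwrap(envelope: dict) -> dict:
    """Return envelope['data'], or raise ApiError if 'error' is set."""
    if envelope.get("error") is not None:
        raise ApiError(envelope["error"], envelope.get("errorCode"))
    return envelope["data"]


def summarize_report(envelope: dict) -> str:
    """Build a one-line summary from a report payload."""
    data = unwrap(envelope)
    issue_count = data.get("results", {}).get("issueCount", 0)
    return f'{data["domain"]}: {data["score"]} ({data["rating"]}), {issue_count} issue(s)'


# Same shape as the sample response above.
report = {
    "data": {
        "publicId": "scan_123",
        "domain": "example.com",
        "status": "completed",
        "score": 92,
        "rating": "ai-ready",
        "results": {"issueCount": 0, "checks": []},
    },
    "error": None,
}
line = summarize_report(report)
```

The shape of individual entries in `checks` is not shown in the sample, so this sketch deliberately leaves them alone.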

Read generated fixes

GET /api/public/v1/scans/:scanId/fixes

Returns generated `llms.txt`, robots additions, JSON-LD, content signals, and check-level fix snippets when available.

Response

{
  "data": {
    "scanId": "scan_123",
    "domain": "example.com",
    "generated": {
      "llmsTxt": "# Example"
    },
    "fixes": []
  },
  "error": null
}
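Since generated fixes are meant to be reviewed before deployment, one option is to write them into a review directory rather than straight into the site. In the sketch below only the `llmsTxt` key comes from the sample response; the `write_generated` helper, the `OUTPUT_FILES` map, and the directory name are assumptions.

```python
import tempfile
from pathlib import Path

# Maps keys under "generated" to output file names. Only "llmsTxt" appears in
# the documented sample; confirm other keys against the OpenAPI spec before
# adding them here.
OUTPUT_FILES = {"llmsTxt": "llms.txt"}


def write_generated(envelope: dict, out_dir: Path) -> list[Path]:
    """Write known generated artifacts to out_dir and return the paths written."""
    generated = (envelope.get("data") or {}).get("generated") or {}
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for key, filename in OUTPUT_FILES.items():
        content = generated.get(key)
        if isinstance(content, str) and content.strip():
            path = out_dir / filename
            # Normalize to exactly one trailing newline.
            path.write_text(content.rstrip("\n") + "\n", encoding="utf-8")
            written.append(path)
    return written


# Same shape as the sample response above.
fixes = {
    "data": {
        "scanId": "scan_123",
        "domain": "example.com",
        "generated": {"llmsTxt": "# Example"},
        "fixes": [],
    },
    "error": None,
}

with tempfile.TemporaryDirectory() as tmp:
    paths = write_generated(fixes, Path(tmp) / "review")
    names = [path.name for path in paths]
    llms_txt = paths[0].read_text(encoding="utf-8")
```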

cURL

curl https://llmready.com/api/public/v1/scans \
  -H "Authorization: Bearer $LLMREADY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "options": { "scanType": "full", "visibility": "unlisted" }
  }'

JavaScript

const response = await fetch("https://llmready.com/api/public/v1/scans", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.LLMREADY_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://example.com",
    options: { scanType: "full", visibility: "unlisted" }
  })
});

const { data, error } = await response.json();

Python

import os
import requests

response = requests.post(
    "https://llmready.com/api/public/v1/scans",
    headers={"Authorization": f"Bearer {os.environ['LLMREADY_API_KEY']}"},
    json={
        "url": "https://example.com",
        "options": {"scanType": "full", "visibility": "unlisted"},
    },
    timeout=30,
)

payload = response.json()

API key authentication

Send API keys as Authorization: Bearer <key>. Keys are created in the dashboard and can be scoped with scans:write, write, or admin for scan creation. Public read endpoints do not require a bearer token.

x402 payment authentication

Paid clients can omit the bearer token and send PAYMENT-SIGNATURE. If payment is required, the API returns 402 with an x402 challenge. Successful paid scan creation includes PAYMENT-RESPONSE settlement metadata.

MCP

Agent integration guide

Agent developers can connect directly to the streamable HTTP MCP endpoint and call tools backed by the same public API.

Endpoint and tools

`GET /mcp` returns discovery metadata, tools, x402 support, pricing URL, and pricing hash.
`POST /mcp` accepts JSON-RPC 2.0 requests for `initialize`, `tools/list`, and `tools/call`.
`scan_url` creates a public scan and returns status/report URLs plus payment receipt data when paid.
`get_scan_status` reads progress for a scan id.
`get_scan_report` returns the score, summary, issue count, and top failing checks.

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "scan_url",
    "arguments": {
      "url": "https://example.com",
      "scanType": "full",
      "visibility": "unlisted"
    }
  }
}
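The request above is plain JSON-RPC 2.0, so a client can build it with a small helper. The sketch below only constructs the payload; `tool_call` is a hypothetical name, and the transport details (host, headers, and payment handling) are left out because they are not specified here.

```python
import itertools
import json

# JSON-RPC ids only need to be unique per connection; a counter is enough here.
_request_ids = itertools.count(1)


def tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' request for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call(
    "scan_url",
    {"url": "https://example.com", "scanType": "full", "visibility": "unlisted"},
)
body = json.dumps(request)  # send this as the POST /mcp request body
```

The same helper works for `get_scan_status` and `get_scan_report`, although their argument names are not listed in this guide and should be read from `tools/list`.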

Reliability

Rate limiting and error handling

All public API JSON responses use the `{ data, error, errorCode }` envelope. Successful and rate-limited responses include API version and rate-limit headers.

Rate-limit headers

`X-LLMReady-API-Version`: public API version.
`X-RateLimit-Limit`: bucket capacity.
`X-RateLimit-Remaining`: remaining requests.
`X-RateLimit-Reset`: Unix reset timestamp.
`Retry-After`: seconds to wait after a 429.

{
  "data": null,
  "error": "Too many requests.",
  "errorCode": "rate_limited"
}

| Status | Error code | Meaning |
| --- | --- | --- |
| 400 | `invalid_request` | Malformed JSON, unsupported scan option, invalid URL, private IP, or unsupported protocol. |
| 401 | `invalid_api_key` | Missing, malformed, revoked, expired, or unknown bearer API key. |
| 402 | `payment_required` | No valid x402 `PAYMENT-SIGNATURE` was provided for a paid access flow. |
| 403 | `insufficient_scope` | The API key is valid but does not grant the required scope. |
| 404 | `not_found` | The scan or domain does not exist or is not publicly accessible. |
| 429 | `rate_limited` | The request exceeded the endpoint bucket. Respect `Retry-After` and `X-RateLimit-Reset`. |
| 500 | `server_error` | Unexpected server-side failure. Retry idempotent reads with backoff. |
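A client that respects these headers needs to turn them into a wait time. The sketch below prefers `Retry-After` and falls back to `X-RateLimit-Reset`; the `retry_delay` helper and its default are hypothetical, and it assumes the reset value is a Unix timestamp in seconds.

```python
import time
from typing import Mapping, Optional


def retry_delay(
    headers: Mapping[str, str],
    now: Optional[float] = None,
    default: float = 1.0,
) -> float:
    """Return how many seconds to wait before retrying a rate-limited request."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Not a number of seconds; fall back to the reset header.
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        try:
            current = time.time() if now is None else now
            # Assumes the reset value is a Unix timestamp in seconds.
            return max(0.0, float(reset) - current)
        except ValueError:
            pass
    return default


wait_from_retry_after = retry_delay({"Retry-After": "12"})
wait_from_reset = retry_delay({"X-RateLimit-Reset": "1700000030"}, now=1700000000.0)
wait_default = retry_delay({})
```

Plain dictionaries are case-sensitive, so pass a case-insensitive mapping such as the `headers` attribute of a `requests` response when using this with real traffic.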

FAQ

Common questions

Short answers for teams using LLM Scan to improve agent readability rather than chase myths or one-off tricks.

Will adding llms.txt make AI tools cite my site?

No single file can guarantee citations. Use llms.txt as a useful, forward-compatible context map, then back it up with crawlable pages, strong entity signals, structured data, and clear answer-oriented content.

What is the difference between robots.txt and llms.txt?

robots.txt is a crawler access file. llms.txt is a Markdown-oriented context file that points agents toward important public pages. One controls crawler access; the other explains what matters.

Do I need structured data for LLMs?

Structured data is not required for every page, but it helps reduce ambiguity when it describes real entities such as an organization, product, article, documentation page, or pricing page.

Why does the scan care about semantic HTML?

Agents and retrieval systems need to understand page hierarchy. Clear headings, landmarks, descriptive links, and server-rendered text make extraction more reliable than visual layout alone.

What are generated fixes?

Generated fixes are ready-to-edit snippets based on the scan result: llms.txt, robots.txt additions, JSON-LD, and content-signal copy. They are intended as a strong starting point, not a blind deploy.

Why ask for email before viewing or copying fixes?

The report stays visible without an email, and the fixes appear as a partial blurred preview. We ask for email before revealing the full generated files, downloading the bundle, or copying the agent-ready prompt so we can send setup notes and follow-up product updates with unsubscribe controls.

Can LLM Scan crawl private or authenticated pages?

No. The scanner is designed for public website visibility. If a page requires login, blocks crawlers, or depends entirely on private state, agents generally cannot rely on it either.

How often should I rescan a site?

Rescan after publishing new docs, changing navigation, updating robots.txt or sitemap files, redesigning key pages, or shipping new product/pricing content. For active sites, weekly or monthly checks are a practical rhythm.

What score should I aim for?

Aim for AI-Ready, but prioritize the failed high-impact checks first. A site with crawlable pages, sitemap discovery, clear content, and accurate structured data is usually in much better shape than one that only adds llms.txt.

Can I use the API in my own workflow?

Yes. Start scans with POST /api/public/v1/scans, poll /api/public/v1/scans/:scanId/status while the scan runs, and load the public report payload when it completes.

References

Useful standards and guidance

These are the external references behind several checks. LLM Scan treats AI-specific files such as llms.txt as experimental and still prioritizes durable web fundamentals.

Google Search Central: robots.txt interpretation
sitemaps.org protocol
Google structured data introduction
llms.txt reference and adoption notes

Scan your site before your customers' agents do.