www.crunchbase.com · scanned May 25, 2026 · 09:19 · 1.64s
Public AI visibility report

www.crunchbase.com · AI visibility: Poor

This site is difficult for AI tools to read right now.

Structured data is the main strength. Crawler access, robots.txt policy, llms.txt, the sitemap, markdown support, and content signals all need attention.

Recommended next step

Remove the AI crawler Disallow: / rules, or replace them with narrower path-level restrictions that cover private content only.

Why monitor after one scan?

AI visibility changes when teams ship new pages, edit pricing or docs, update sitemaps, or change crawler rules. Weekly monitoring catches those silent regressions before answer engines and agents start reading stale or broken signals.


Overall score

17/100
Poor
Overall position: 486 out of 491

// score breakdown

Points by check

8 checks

Crawlability: 0/20
Robots.txt: 0/15
llms.txt: 0/15
Sitemap: 0/10
Markdown support: 0/15
Semantic HTML: 7.1/10
Structured data: 10/10
Content signals: 0/5
1 pass · 1 warn · 6 fail
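
The overall score appears to be the sum of the per-check points listed above. The sketch below reproduces that arithmetic; rounding to a whole number is an assumption, since the report does not state how 17.1 becomes 17.

```python
# Per-check points copied from the score breakdown: (earned, maximum).
CHECKS = {
    "Crawlability": (0.0, 20),
    "Robots.txt": (0.0, 15),
    "llms.txt": (0.0, 15),
    "Sitemap": (0.0, 10),
    "Markdown support": (0.0, 15),
    "Semantic HTML": (7.1, 10),
    "Structured data": (10.0, 10),
    "Content signals": (0.0, 5),
}


def overall_score(checks):
    """Sum earned and maximum points. Rounding is an assumption, not a documented rule."""
    earned = sum(points for points, _ in checks.values())
    maximum = sum(limit for _, limit in checks.values())
    return round(earned), maximum


score, maximum = overall_score(CHECKS)
print(f"{score}/{maximum}")
```

The eight maxima add up to 100, so the two passing or partially passing checks account for the entire score.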

Public link

llmscan.dev/scan/xaWMmvSyDBVcGIN94IOLY

Signals checked

8 AI visibility signals

Fix bundle

4 copy-ready files

Share badge

Poor · 17/100

Add a polished proof badge

A compact badge for footer, press, or trust sections that links visitors to this public report.

Embed code
<a href="https://www.llmscan.dev/scan/xaWMmvSyDBVcGIN94IOLY"
  target="_blank"
  rel="noopener"
>
  <img
    src="https://www.llmscan.dev/scan/xaWMmvSyDBVcGIN94IOLY/badge.png"
    alt="LLM Scan AI visibility score badge"
    width="460"
    height="120"
    style="width: 260px; max-width: 100%; height: auto;"
  />
</a>


// signal breakdown

8 signals AI systems depend on

Crawlability

The homepage is reachable, but robots.txt blocks GPTBot, ChatGPT-User, Claude-Web, PerplexityBot, and Google-Extended from crawling the site.

Signal weight

0/20
Fail

Evidence

url: https://crunchbase.com/
finalUrl: https://www.crunchbase.com/
status: 200

Recommendation

Next step: Remove AI crawler Disallow: / rules or replace them with narrower path-level restrictions for private content only.

Robots.txt

robots.txt explicitly blocks GPTBot, ChatGPT-User, Claude-Web, PerplexityBot, and Google-Extended from the whole site.

Signal weight

0/15
Fail

Evidence

robotsTxtUrl: https://www.crunchbase.com/robots.txt
exists: true
rawRobotsTxt:
    User-agent: *
    # Allow API and JS paths to be requested by crawlers
    Allow: /v4/md/applications/crunchbase
    Allow: /*.js$
    Disallow: /login
    Disallow: /register
    Disallow: /account
    Disallow: /account/invite
    Disallow: /reset-password
    Disallow: /subscriptions
    Disallow: /contribute
    Disallow: /add-new
    Disallow: /edit
    Disallow: /edit/success
    Disallow: /edit/review
    Disallow: /buy
    Allow: /buy/select-product
    Disallow: /account-setup
    Disallow: /verify
    Disallow: /admin
    Disallow: /v4
    Disallow: /home
    Disallow: /search
    Disallow: /discover
    Disallow: /textsearch

    # AI and LLM Crawling
    User-agent: CCBot
    Disallow: /
    User-agent: ChatGPT-User
    Disallow: /
    User-agent: OAI-SearchBot
    Disallow: /
    User-agent: GPTBot
    Disallow: /
    User-agent: Google-Extended
    Disallow: /
    User-agent: anthropic-ai
    Disallow: /
    User-agent: Omgilibot
    Disallow: /
    User-agent: Omgili
    Disallow: /
    User-agent: FacebookBot
    Disallow: /
    User-agent: Diffbot
    Disallow: /
    User-agent: Bytespider
    Disallow: /
    User-agent: ImagesiftBot
    Disallow: /
    User-agent: cohere-ai
    Disallow: /
    User-agent: Claude-Web
    Disallow: /
    User-agent: PerplexityBot
    Disallow: /

    Sitemap: https://www.crunchbase.com/www-sitemaps/sitemap-index.xml

Recommendation

Next step: Remove AI crawler Disallow: / rules or add narrower Allow/Disallow rules if AI crawlers should be able to discover public content.
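
To see what the recommendation changes in practice, the sketch below evaluates two rule sets with Python's standard `urllib.robotparser`. `CURRENT_RULES` is a simplified excerpt of the evidence above (wildcard rules are left out because that parser does not implement `*` or `$` path patterns), and `NARROWED_RULES` is a hypothetical policy invented for illustration, not something the report prescribes.

```python
from urllib.robotparser import RobotFileParser

# Simplified excerpt of the rawRobotsTxt evidence.
CURRENT_RULES = """\
User-agent: *
Disallow: /login
Disallow: /account

User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
"""

# Hypothetical narrowed policy: AI crawlers are kept out of private paths only.
NARROWED_RULES = """\
User-agent: *
Disallow: /login
Disallow: /account

User-agent: GPTBot
Disallow: /login
Disallow: /account

User-agent: PerplexityBot
Disallow: /login
Disallow: /account
"""


def can_crawl(robots_txt, agent, url):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


HOME = "https://www.crunchbase.com/"
for agent in ("GPTBot", "PerplexityBot"):
    print(agent, can_crawl(CURRENT_RULES, agent, HOME), can_crawl(NARROWED_RULES, agent, HOME))
```

Real crawlers may resolve Allow/Disallow precedence differently from this parser, so a check like this is a quick sanity test rather than proof of how a given bot behaves.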

llms.txt

No llms.txt file was found for this site.

Signal weight

0/15
Fail

Evidence

llmsTxtUrl: https://www.crunchbase.com/llms.txt
present: false
accessible: false

Recommendation

Next step: Publish /llms.txt as text or markdown with more than 200 characters, markdown headings, and at least one absolute URL.
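
The three criteria in this recommendation are easy to check before publishing. The following is a minimal sketch: the function name `check_llms_txt` and the exact regular expressions are assumptions, and the scanner's real validation logic is not shown in the report.

```python
import re


def check_llms_txt(text):
    """Apply the recommendation's three criteria to a candidate llms.txt body."""
    return {
        # "more than 200 characters"
        "long_enough": len(text) > 200,
        # "markdown headings": a line starting with 1-6 '#' followed by a space
        "has_heading": re.search(r"^#{1,6} \S", text, flags=re.MULTILINE) is not None,
        # "at least one absolute URL"
        "has_absolute_url": re.search(r"https?://\S+", text) is not None,
    }


# Sample body assembled from the llms.txt preview in the generated fixes.
SAMPLE = """\
# Crunchbase

> Use this file to orient AI systems to the site's public content, canonical URLs, and crawler expectations.

This llms.txt file summarizes the public, canonical resources that AI assistants and crawlers should use to understand this site.

## Site Overview

- Canonical URL: https://www.crunchbase.com/
- Site type: web site
"""

print(check_llms_txt(SAMPLE))
```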

Sitemap

No accessible XML sitemap was found for this site.

Signal weight

0/10
Fail

Evidence

sitemapUrl: https://www.crunchbase.com/sitemap.xml
sitemapUrls: [https://www.crunchbase.com/sitemap.xml, https://www.crunchbase.com/www-sitemaps/sitemap-index.xml]
robotsSitemapUrls: [https://www.crunchbase.com/www-sitemaps/sitemap-index.xml]

Recommendation

Next step: Publish a valid XML sitemap at /sitemap.xml and reference it from robots.txt so crawlers and AI systems can discover important URLs.
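
As an illustration of what a minimal valid sitemap looks like, the sketch below builds one with the standard library and produces the matching robots.txt line. The helper names are hypothetical, and the single homepage URL stands in for whatever list of important URLs the site would actually publish.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Serialize absolute URLs into a minimal <urlset> sitemap document."""
    ET.register_namespace("", SITEMAP_NS)  # emit xmlns="..." instead of a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def robots_sitemap_line(sitemap_url):
    """Return the robots.txt line that points crawlers at the sitemap."""
    return f"Sitemap: {sitemap_url}"


print(build_sitemap(["https://www.crunchbase.com/"]))
print(robots_sitemap_line("https://www.crunchbase.com/sitemap.xml"))
```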

Markdown support

The homepage returned HTML when requested with Accept: text/markdown, so the server appears to ignore markdown content negotiation.

Signal weight

0/15
Fail

Evidence

url: https://www.crunchbase.com/
acceptHeader: text/markdown
status: 200

Recommendation

Next step: Add content negotiation for Accept: text/markdown on the homepage and return a markdown representation with Content-Type: text/markdown. Keep the HTML response for regular browser requests.
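
The negotiation step can be sketched independently of any web framework. The version below is a simplification under stated assumptions: the function names are hypothetical, q-values in the Accept header are ignored, and markdown is served only when the client lists `text/markdown` explicitly, so regular browser requests keep receiving HTML.

```python
def accepted_types(accept_header):
    """Return the media types in an Accept header, ignoring q-values (a simplification)."""
    types = []
    for part in accept_header.split(","):
        media_type = part.split(";", 1)[0].strip().lower()
        if media_type:
            types.append(media_type)
    return types


def negotiate(accept_header, html_body, markdown_body):
    """Pick (Content-Type, body): markdown only when the client asks for it explicitly."""
    if "text/markdown" in accepted_types(accept_header):
        return "text/markdown; charset=utf-8", markdown_body
    return "text/html; charset=utf-8", html_body


# A browser-style header falls through to HTML; an explicit request gets markdown.
print(negotiate("text/html,application/xhtml+xml,*/*;q=0.8", "<h1>Home</h1>", "# Home"))
print(negotiate("text/markdown", "<h1>Home</h1>", "# Home"))
```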

Semantic HTML

The homepage has some semantic HTML signals, but one or more title, metadata, heading, landmark, content, or link text checks need improvement.

Signal weight

7.1/10
Warn

Evidence

url: https://www.crunchbase.com/
quality: partial
score: 71

Recommendation

Next step: Add a meta description between 50 and 160 characters. Add missing semantic elements: main, article, nav, footer.
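
The two items in this recommendation can be verified locally with the standard-library HTML parser. This is a rough sketch covering only the meta description length and the four named elements; the class and function names are hypothetical, and the scanner's other semantic checks (title, headings, link text) are not reproduced.

```python
from html.parser import HTMLParser

# Elements named in the recommendation.
REQUIRED_ELEMENTS = ("main", "article", "nav", "footer")


class SemanticAudit(HTMLParser):
    """Record which tags appear and capture the meta description, if any."""

    def __init__(self):
        super().__init__()
        self.seen_tags = set()
        self.meta_description = None

    def handle_starttag(self, tag, attrs):
        self.seen_tags.add(tag)
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.meta_description = attributes.get("content") or ""


def audit(html):
    """Return whether the description is 50-160 characters and which elements are missing."""
    parser = SemanticAudit()
    parser.feed(html)
    parser.close()
    description = parser.meta_description
    return {
        "description_ok": description is not None and 50 <= len(description) <= 160,
        "missing_elements": [t for t in REQUIRED_ELEMENTS if t not in parser.seen_tags],
    }


page = '<html><head><meta name="description" content="short"></head><body><nav></nav></body></html>'
print(audit(page))
```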

Structured data

Valid JSON-LD structured data was found with core Organization or WebSite schema.org types.

Signal weight

10/10
Pass

Evidence

url: https://www.crunchbase.com/
quality: good
hasStructuredData: true
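
For reference, a check of this kind can be approximated as follows: collect every `application/ld+json` script, parse it, and look for the Organization or WebSite types. The class and function names are hypothetical, and this is only a sketch of the idea, not the scanner's implementation.

```python
import json
from html.parser import HTMLParser

CORE_TYPES = {"Organization", "WebSite"}


class JsonLdCollector(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # invalid JSON-LD is skipped rather than counted


def core_types_found(html):
    """Return the subset of CORE_TYPES declared in the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    found = set()
    for document in collector.documents:
        # A document may be a single node, a list of nodes, or wrap nodes in @graph.
        nodes = document.get("@graph", [document]) if isinstance(document, dict) else document
        for node in nodes:
            declared = node.get("@type") if isinstance(node, dict) else None
            declared = [declared] if isinstance(declared, str) else (declared or [])
            found.update(CORE_TYPES.intersection(declared))
    return found


page = '<script type="application/ld+json">{"@graph": [{"@type": "Organization", "name": "Crunchbase"}]}</script>'
print(core_types_found(page))
```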

Content signals

Content-Signal directive not detected in headers, HTML metadata, or robots.txt.

Signal weight

0/5
Fail

Evidence

url: https://www.crunchbase.com/
hasContentSignals: false
hasContentSignalHeader: false

Recommendation

Next step: Add the standard directive 'Content-Signal: ai-train=no, search=yes, ai-input=yes' to robots.txt, HTML metadata, or HTTP headers so AI systems can discover content usage preferences.
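
To make the directive's structure concrete, the sketch below parses the recommended line into key/value preferences. The function name is hypothetical, and restricting values to `yes` and `no` is an assumption based on the three values shown in this report.

```python
def parse_content_signal(line):
    """Parse 'Content-Signal: key=yes|no, ...' into a dict of booleans."""
    name, _, value = line.partition(":")
    if name.strip().lower() != "content-signal":
        raise ValueError("not a Content-Signal directive")
    signals = {}
    for pair in value.split(","):
        key, _, setting = pair.partition("=")
        key, setting = key.strip().lower(), setting.strip().lower()
        if setting not in ("yes", "no"):
            raise ValueError(f"unexpected value for {key!r}: {setting!r}")
        signals[key] = setting == "yes"
    return signals


print(parse_content_signal("Content-Signal: ai-train=no, search=yes, ai-input=yes"))
```

Read this way, the recommended values opt out of model training while still permitting search indexing and use as answer-time input.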

// generated fixes

Downloadable fix files

Previews of the generated fix files follow.


llms.txt (Markdown)
# Crunchbase

> Use this file to orient AI systems to the site's public content, canonical URLs, and crawler expectations.

This llms.txt file summarizes the public, canonical resources that AI assistants and crawlers should use to understand this site.

## Site Overview

- Canonical URL: https://www.crunchbase.com/
- Site type: web site
robots.txt (TXT)
# robots.txt additions
# Copy these blocks into the existing robots.txt file. Keep current rules unless a note calls out a conflicting Disallow.

# AI crawler access
# Add explicit Allow rules for blocked AI crawlers; remove or narrow conflicting Disallow rules if your crawler target requires precedence.
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /
schema.json (JSON)
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "https://www.crunchbase.com/#organization",
      "name": "Crunchbase",
      "description": "Website for Crunchbase.",
      "url": "https://www.crunchbase.com/",
      "logo": "https://images.crunchbase.com/image/upload/c_pad,h_25,w_25,f_auto,b_white,q_auto:eco,dpr_1/c2905ebc906840f98955e4acd3d6fbb0?ik-sanitizeSvg=true",
head meta (HTML)
# Content-Signal recommendations

Use these directives to make AI-use preferences explicit for compliant crawlers and AI systems. They are advisory signals, so keep them aligned with robots.txt, terms, and access controls.

## Recommended values

- ai-train=no: AI model training, fine-tuning, and dataset creation.
- search=yes: AI search indexing, snippets, and discovery.
- ai-input=yes: AI answer grounding, retrieval, and generated-response context.