# ==============================================================================
# clariBI Robots.txt
# https://claribi.com/robots.txt
# ==============================================================================

# Default rules for all crawlers
User-agent: *
Allow: /
Allow: /features/
Allow: /use-cases/
Allow: /integrations/
Allow: /templates/
Allow: /docs/
Allow: /blog/
Allow: /knowledge-base/
Allow: /pricing
Allow: /contact
Allow: /legal/

# Disallow non-public areas
Disallow: /app/
Disallow: /admin/
Disallow: /api/
Disallow: /o/
Disallow: /static/admin/
Disallow: /media/private/

# /components/ holds the global navigation partial + global CSS + nav JS that
# every marketing page loads. Block the partial HTMLs (so they don't appear as
# orphan content) but allow the CSS and JS so Googlebot can render pages
# correctly (Search Console flags blocked render resources).
Disallow: /components/*.html$
Allow: /components/*.css$
Allow: /components/*.js$

# Disallow query parameters that create duplicate content
Disallow: /*?*utm_
Disallow: /*?*ref=
Disallow: /*?*source=

# Block search results (thin/duplicate) and tag-filtered listings,
# but ALLOW pagination: `?page=N` is how blog/KB index pages expose
# pages 2+, and blocking it left 43 of 61 blog posts and 89 of 120
# KB articles invisible to Google (Search Console showed them as
# "URL is unknown to Google"). Re-allowing /*?*page= so Google can
# crawl deep content via pagination links.
Disallow: /knowledge-base/search/
Disallow: /blog/*?*tag=
Disallow: /knowledge-base/*?*tag=

# Allow important meta files
Allow: /sitemap.xml
Allow: /robots.txt
Allow: /llms.txt
Allow: /llms-full.txt

# No Crawl-delay: we want Google + Bing to crawl as fast as
# possible. Crawl-delay was suppressing Bingbot (combined with the
# low CrawlRate set in Bing Webmaster Tools, the site was getting
# ~2 page-fetches per day, which is why Bing's sitemap-blog.xml
# hadn't been re-crawled in 2 months as of May 2026).
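
# Illustrative only (the URLs below are hypothetical): how the wildcard rules
# in the default ruleset above are expected to apply to crawlers that fall
# under `User-agent: *`. This assumes `*` matches any characters, `$` anchors
# the end of the URL, and the longest matching rule path wins.
#
#   /blog/my-post?utm_source=newsletter   blocked  (Disallow: /*?*utm_)
#   /blog/?tag=analytics                  blocked  (Disallow: /blog/*?*tag=)
#   /blog/?page=2                         allowed  (no Disallow matches)
#   /knowledge-base/search/?q=export      blocked  (Disallow: /knowledge-base/search/)
#   /components/nav.html                  blocked  (Disallow: /components/*.html$)
#   /components/nav.css                   allowed  (Allow: /components/*.css$)
#
# Because precedence goes to the longest rule path, a longer Allow can outrank
# a shorter query-parameter Disallow. For example,
# /knowledge-base/?utm_medium=email would likely still be crawlable, since
# `Allow: /knowledge-base/` is longer than `Disallow: /*?*utm_`.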

# ==============================================================================
# Google-specific rules
# ==============================================================================
User-agent: Googlebot
Allow: /
Disallow: /app/
Disallow: /admin/
Disallow: /api/
Disallow: /o/
# /components/ holds the global navigation partial + global CSS + nav JS that
# every marketing page loads. Block the partial HTMLs (so they don't appear as
# orphan content) but allow the CSS and JS so Googlebot can render pages
# correctly (Search Console flags blocked render resources).
Disallow: /components/*.html$
Allow: /components/*.css$
Allow: /components/*.js$

User-agent: Googlebot-Image
Allow: /
Disallow: /app/
Disallow: /admin/

# ==============================================================================
# Bing-specific rules
# ==============================================================================
User-agent: Bingbot
Allow: /
Disallow: /app/
Disallow: /admin/
Disallow: /api/
Disallow: /o/
# Same /components/ handling as the Googlebot group above: block the partial
# HTMLs (so they don't appear as orphan content) but allow the CSS and JS so
# Bingbot can render pages correctly.
Disallow: /components/*.html$
Allow: /components/*.css$
Allow: /components/*.js$
# No per-Bingbot Crawl-delay either: we explicitly want full crawl
# speed. See note in the default ruleset above.

# ==============================================================================
# AI Training Bots - Allowed with guidelines
# See /llms.txt for detailed AI interaction guidelines
# ==============================================================================
User-agent: GPTBot
Allow: /
Allow: /docs/
Allow: /blog/
Allow: /knowledge-base/
Allow: /features/
Allow: /use-cases/
Disallow: /app/
Disallow: /admin/
Disallow: /api/

User-agent: ChatGPT-User
Allow: /
Disallow: /app/
Disallow: /admin/
Disallow: /api/

User-agent: Claude-Web
Allow: /
Disallow: /app/
Disallow: /admin/
Disallow: /api/

User-agent: Anthropic-AI
Allow: /
Disallow: /app/
Disallow: /admin/
Disallow: /api/

User-agent: Google-Extended
Allow: /
Disallow: /app/
Disallow: /admin/
Disallow: /api/

User-agent: PerplexityBot
Allow: /
Disallow: /app/
Disallow: /admin/
Disallow: /api/

User-agent: Cohere-ai
Allow: /
Disallow: /app/
Disallow: /admin/
Disallow: /api/

# ==============================================================================
# Sitemap References
# ==============================================================================
Sitemap: https://claribi.com/sitemap.xml
Sitemap: https://claribi.com/sitemap-pages.xml
Sitemap: https://claribi.com/sitemap-blog.xml
Sitemap: https://claribi.com/sitemap-features.xml
Sitemap: https://claribi.com/sitemap-use-cases.xml
Sitemap: https://claribi.com/sitemap-integrations.xml
Sitemap: https://claribi.com/sitemap-docs.xml
Sitemap: https://claribi.com/sitemap-knowledge-base.xml
Sitemap: https://claribi.com/sitemap-kb-categories.xml
Sitemap: https://claribi.com/sitemap-blog-categories.xml
Sitemap: https://claribi.com/sitemap-legal.xml
Sitemap: https://claribi.com/sitemap-compare.xml
Sitemap: https://claribi.com/sitemap-calculators.xml

# ==============================================================================
# AI/LLM Guidance
# ==============================================================================
# For AI models and LLM crawlers, see /llms.txt for detailed guidance
# on how to best represent clariBI information to users.