Scrapfly is cloud-browser and scraping infrastructure that many data teams use when sites enforce anti-bot protection. Connecting Scrapfly to clariBI makes structured page data, headless browser sessions, and crawl-extraction tooling available inside AI analyses.
Why connect Scrapfly
When an analysis needs data that sits behind anti-bot protection, search-grade tooling in the style of Tavily or Firecrawl is not enough. Scrapfly's cloud-browser MCP gives the AI engine a way to render and read those pages.
You can ask "Extract the product table from this site that blocks scrapers", "Render this single-page app and pull the JSON-LD", or "Crawl the docs subdomain and surface page titles", and the AI engine routes the request through Scrapfly.
How the connection works
clariBI talks to Scrapfly through its hosted MCP server at https://mcp.scrapfly.io/mcp. Authentication uses an OAuth flow in which clariBI registers itself as the client, so no developer-console setup is needed on your side. Tokens are stored encrypted server-side and never leave clariBI in plaintext.
sequenceDiagram
actor U as You
participant C as clariBI
participant V as Scrapfly
U->>C: Click Authorize with Scrapfly
C->>V: Open OAuth authorization
V-->>U: Grant read access?
U->>V: Approve
V-->>C: Authorization code
C->>V: Exchange code for tokens
V-->>C: Access + refresh tokens
C->>C: Encrypt and store credentials
C-->>U: Connection ready
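The authorization steps in the diagram can be sketched as follows. This is a minimal illustration, assuming a standard OAuth 2.0 authorization-code flow with PKCE; the article does not state which OAuth variant is used, and the endpoint, client ID, and redirect URI below are hypothetical placeholders, not real clariBI or Scrapfly values.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode

# Hypothetical placeholders: clariBI registers and manages the real values.
AUTHORIZE_ENDPOINT = "https://auth.example.com/authorize"
CLIENT_ID = "example-client-id"
REDIRECT_URI = "https://app.example.com/oauth/callback"


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) for the S256 PKCE method."""
    verifier = secrets.token_urlsafe(64)
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # PKCE uses base64url encoding without the trailing '=' padding.
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def build_authorization_url(state: str, challenge: str) -> str:
    """URL the user is sent to for the consent screen (step 'Open OAuth authorization')."""
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "state": state,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}"


def build_token_request(code: str, verifier: str) -> dict[str, str]:
    """Form body for the 'Exchange code for tokens' step."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
        "code_verifier": verifier,
    }
```

In this sketch the `state` value would be a random string checked on the callback, and the token response would be encrypted before storage, matching the "Encrypt and store credentials" step.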
Available tools
clariBI exposes the read-only Scrapfly tools that the vendor's MCP server publishes at connection time. Write operations (create, update, delete, send, refund) are filtered out by a name-pattern blocklist before any tool reaches the analysis engine, so connecting Scrapfly cannot modify data on the vendor side.
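To illustrate what a name-pattern blocklist can look like, here is a minimal sketch. The verb list comes from the paragraph above; the matching rule (verbs delimited by `_`, `-`, or `.`) and the tool names are assumptions made for the example, not clariBI's actual implementation or Scrapfly's real tool inventory.

```python
import re

# Verbs named in the article; the delimiter-based matching rule is an assumption.
WRITE_VERBS = ("create", "update", "delete", "send", "refund")
WRITE_PATTERN = re.compile(
    r"(?:^|[_\-.])(?:" + "|".join(WRITE_VERBS) + r")(?:$|[_\-.])",
    re.IGNORECASE,
)


def filter_read_only(tool_names: list[str]) -> list[str]:
    """Keep only tools whose names contain no blocklisted write verb."""
    return [name for name in tool_names if not WRITE_PATTERN.search(name)]


# Hypothetical tool names, for illustration only.
published = ["scrape_page", "delete_session", "extract_table", "crawl_create", "screenshot"]
allowed = filter_read_only(published)
print(allowed)  # ['scrape_page', 'extract_table', 'screenshot']
```

Matching on delimited verbs rather than raw substrings avoids dropping a read-only tool whose name merely contains a verb's letters, at the cost of missing names that use an unexpected separator.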
The exact tool inventory depends on the Scrapfly features your account has access to. After connecting, try a few natural-language questions to see what Scrapfly data clariBI can pull.
Data flow during analysis
When you ask a question that maps to Scrapfly, the AI engine routes to the right tool, reads the result, and pairs the answer with a chart you can pin to a dashboard.
sequenceDiagram
actor U as You
participant C as clariBI
participant AI as AI engine
participant V as Scrapfly
U->>C: Ask a question that needs scraped or extracted page data
C->>AI: Plan the analysis
AI->>V: Call the right tool
V-->>AI: Tool result
AI->>AI: Summarize and chart
C-->>U: Answer plus visual
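The "Call the right tool" and "Tool result" steps correspond to an MCP `tools/call` exchange, which is carried over JSON-RPC 2.0. The sketch below builds such a request and reads text content from a response. The tool name and arguments are hypothetical, and how clariBI's engine does this internally is not described in this article.

```python
import itertools
import json

# Monotonic request IDs so each response can be matched to its request.
_request_ids = itertools.count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


def extract_text(response_json: str) -> str:
    """Join the text parts of a tools/call result, raising on a JSON-RPC error."""
    response = json.loads(response_json)
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "tool call failed"))
    parts = response["result"].get("content", [])
    return "\n".join(p["text"] for p in parts if p.get("type") == "text")


# Hypothetical tool name and arguments, for illustration only.
payload = build_tool_call("scrape_page", {"url": "https://example.com"})
```

The summarizing and charting steps then operate on the extracted text rather than on the raw JSON-RPC envelope.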
Setting up the connection
- Open Data Sources in the clariBI sidebar.
- Click Add data source.
- Open the MCP Servers tab.
- Click the Scrapfly card.
- Click Authorize with Scrapfly.
- Sign in to Scrapfly in the popup window and grant the requested read scopes.
- Back in clariBI, give your data source a name.
- Click Finish.
Permissions and data access
OAuth scopes are granted at authorization time on the Scrapfly consent screen. clariBI restricts itself to read-only operations on the Scrapfly side; the data Scrapfly fetches comes from the third-party sites you choose.
Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| "Quota exceeded" | Scrapfly enforces per-plan request quotas. | Upgrade your Scrapfly plan or narrow the scrape scope. |
| "Site blocks scraper" | Some sites use advanced anti-bot fingerprinting that Scrapfly can't bypass on your plan tier. | Try a higher Scrapfly plan with browser sessions, or try an alternative scraping path such as Firecrawl, Tavily, or Apify. |