Tutorial

Build a Web Scraper in 5 Minutes

Scrape any website with JavaScript rendering, pagination, and structured data extraction using BrowserFabric.

Most web scraping tools break on JavaScript-heavy sites. With BrowserFabric, you get a full Chromium browser in the cloud that renders everything — SPAs, dynamic content, lazy-loaded images.

Prerequisites

pip install browserfabric
export BROWSERFABRIC_API_KEY=bf_your_key_here
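It can help to fail fast if the key is missing before launching a session. A minimal sketch, assuming only that keys start with the `bf_` prefix shown above (the helper name is ours, not part of the SDK):

```python
import os

def get_api_key() -> str:
    """Return the BrowserFabric API key, failing fast if it is missing."""
    key = os.environ.get("BROWSERFABRIC_API_KEY", "")
    if not key.startswith("bf_"):
        raise RuntimeError(
            "Set BROWSERFABRIC_API_KEY before running (keys start with 'bf_')"
        )
    return key
```

Calling this once at startup gives a clear error message instead of an authentication failure mid-scrape.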

Step 1: Scrape a single page

import browserfabric
import asyncio

async def scrape_page():
    async with browserfabric.browser() as session:
        await session.navigate(
            "https://news.ycombinator.com",
            wait_until="networkidle"
        )

        # Extract data with JavaScript
        stories = await session.evaluate_js("""
            Array.from(document.querySelectorAll('.athing'))
                .slice(0, 10)
                .map(row => ({
                    title: row.querySelector('.titleline a')?.textContent,
                    url: row.querySelector('.titleline a')?.href,
                    rank: row.querySelector('.rank')?.textContent,
                }))
        """)

        for story in stories:
            print(f"{story['rank']} {story['title']}")

asyncio.run(scrape_page())

Step 2: Handle pagination

Use click and wait_for to navigate through pages:

async def scrape_with_pagination():
    async with browserfabric.browser() as session:
        await session.navigate("https://news.ycombinator.com")
        all_titles = []

        for page in range(3):
            titles = await session.evaluate_js("""
                Array.from(document.querySelectorAll('.titleline a'))
                    .map(a => a.textContent)
            """)
            all_titles.extend(titles)
            print(f"Page {page + 1}: {len(titles)} titles")

            # Click "More" and wait for the next page's rows to render
            try:
                await session.click("a.morelink")
                await session.wait_for(".athing")
            except Exception:
                break  # no "More" link left, or the wait timed out

        print(f"Total: {len(all_titles)} titles")

asyncio.run(scrape_with_pagination())
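Once collected, the titles are easy to persist with the standard library. A sketch (the file name and helper are ours, not part of BrowserFabric):

```python
import json

def save_titles(titles, path="titles.jsonl"):
    """Write one JSON object per line so runs can be appended and diffed."""
    with open(path, "w", encoding="utf-8") as f:
        for rank, title in enumerate(titles, start=1):
            f.write(json.dumps({"rank": rank, "title": title}) + "\n")
    return path
```

JSON Lines keeps each record self-contained, which is convenient when a scrape is interrupted partway through.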

Step 3: Use batch operations

To cut per-operation round trips, use the batch endpoint to run multiple operations in a single HTTP call:

curl -X POST https://api.browserfabric.com/api/v1/services/browseruse/batch \
  -H "Authorization: Bearer bf_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "YOUR_SESSION_ID",
    "operations": [
      {"tool_name": "navigate", "arguments": {"url": "https://example.com", "wait_until": "networkidle"}},
      {"tool_name": "evaluate_js", "arguments": {"expression": "document.title"}},
      {"tool_name": "take_screenshot", "arguments": {"full_page": true}}
    ]
  }'
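The same batch call can be issued from Python with only the standard library. A sketch mirroring the curl example above (the payload is identical; error handling and session creation are omitted):

```python
import json
import os
import urllib.request

API_URL = "https://api.browserfabric.com/api/v1/services/browseruse/batch"

def build_batch_payload(session_id: str) -> dict:
    """Assemble the same three operations as the curl example."""
    return {
        "session_id": session_id,
        "operations": [
            {"tool_name": "navigate",
             "arguments": {"url": "https://example.com",
                           "wait_until": "networkidle"}},
            {"tool_name": "evaluate_js",
             "arguments": {"expression": "document.title"}},
            {"tool_name": "take_screenshot",
             "arguments": {"full_page": True}},
        ],
    }

def run_batch(session_id: str) -> dict:
    """POST the batch payload and return the decoded JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_batch_payload(session_id)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['BROWSERFABRIC_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Separating payload construction from the request makes the operation list easy to build programmatically, e.g. generating one navigate/extract pair per URL in a crawl list.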

Tips

  • Use wait_until: "networkidle" for JavaScript-heavy pages
  • Use wait_for before interacting with dynamically loaded elements
  • Use scroll to trigger lazy-loaded content before scraping
  • Save persistent contexts with persist=True to avoid re-authentication on sites that require login

Check out the full API documentation for all 28 available browser tools.