puppetRouter v1.0
Distributed Scraper Network

Route every scrape job to the right node

puppetRouter is a centralized Puppeteer job-routing service. It manages a pool of headless and headful browser nodes — home machines, permanent droplets, and ephemeral workers — and routes each job to the least-loaded available node.

3 node types
30s heartbeat interval
85% saturation threshold
3 MySQL failover servers
Features

Everything a scraper fleet needs

Built for long-running automotive inventory scrapers that can't afford downtime or IP bans.

⚖️ Capacity-aware routing

Each job is routed to the node with the most available capacity. Home machines are always preferred over DigitalOcean (DO) workers.
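The selection rule above can be sketched as a small filter-and-sort (a sketch only; the node shape and field names like `type`, `activeSessions`, and `maxSessions` are assumptions, not the actual schema):

```javascript
// Pick the best node for a job: home machines first, then most free capacity.
function pickNode(nodes) {
  const candidates = nodes.filter((n) => n.activeSessions < n.maxSessions);
  candidates.sort((a, b) => {
    // Home machines always beat DO workers, regardless of load.
    if (a.type !== b.type) return a.type === 'home' ? -1 : 1;
    // Otherwise prefer the node with the most free slots.
    return (b.maxSessions - b.activeSessions) - (a.maxSessions - a.activeSessions);
  });
  return candidates[0] ?? null; // null when every node is saturated
}
```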

🔄 Autoscaler

When the headless pool hits 85% saturation, ephemeral DigitalOcean droplets spin up automatically and destroy themselves when idle.
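The trigger condition can be sketched as follows (a sketch; field names are assumptions, and the real autoscaler also manages droplet lifecycle):

```javascript
const SATURATION_THRESHOLD = 0.85;

// Used slots / total slots across the headless pool.
function poolSaturation(nodes) {
  const headless = nodes.filter((n) => n.mode === 'headless');
  const used = headless.reduce((sum, n) => sum + n.activeSessions, 0);
  const total = headless.reduce((sum, n) => sum + n.maxSessions, 0);
  return total === 0 ? 1 : used / total; // an empty pool counts as fully saturated
}

function shouldScaleUp(nodes) {
  return poolSaturation(nodes) >= SATURATION_THRESHOLD;
}
```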

🖥️ Headless & headful modes

Home Windows nodes always maintain at least one headful slot — scraping continues in the background even while the machine is in use.
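One way to guarantee that reserved slot is to subtract it whenever a home node reports headless capacity (a sketch; the field names are assumptions):

```javascript
// Home nodes hold one slot back so a headful session can always start.
function availableHeadlessSlots(node) {
  const reserved = node.type === 'home' ? 1 : 0;
  return Math.max(0, node.maxSessions - node.activeSessions - reserved);
}
```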

🛡️ VPN integrity checks

Each completed job's exit IP is compared against the expected Spectrum static IP. Any node scraping from a DO IP is flagged immediately.
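The comparison itself is a straight equality check (a sketch; the function and field names are assumptions):

```javascript
// Flag any job whose exit IP differs from the expected Spectrum static IP.
function checkVpnIntegrity(job, expectedIp) {
  if (job.exitIp === expectedIp) {
    return { ok: true };
  }
  return {
    ok: false,
    reason: `node ${job.nodeId} exited via ${job.exitIp}, expected ${expectedIp}`,
  };
}
```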

🗄️ 3-server MySQL failover

Configurable primary, secondary, and tertiary MySQL servers. getPoolWithFallback() tries each in order until one connects.
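The fallback loop might look like this (a sketch: the real getPoolWithFallback() presumably builds mysql2 pools directly; here the connector is injected so the ordering logic stands on its own):

```javascript
// Try each configured server in order; return the first pool that connects.
async function getPoolWithFallback(servers, connect) {
  let lastError;
  for (const server of servers) {
    try {
      return await connect(server); // e.g. create a mysql2 pool and run SELECT 1
    } catch (err) {
      lastError = err; // this server is down; fall through to the next one
    }
  }
  throw new Error(`all ${servers.length} MySQL servers unreachable: ${lastError}`);
}
```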

📊 Live dashboard

Node cards, pool utilization, job throughput, autoscaler log, VPN integrity, and runtime config toggles — all in one view.

From job request to browser launch

Four steps, fully automated.

01 · Job submitted

A consumer calls POST /api/jobs/route with a mode (headless or headful) and an optional URL or domain.
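A minimal consumer call might look like this (the endpoint is from the step above; the base URL, auth header, and body validation are assumptions):

```javascript
// Build the routing request body; mode is required, url is optional.
function buildRouteRequest(mode, url) {
  if (mode !== 'headless' && mode !== 'headful') {
    throw new Error(`invalid mode: ${mode}`);
  }
  const body = { mode };
  if (url) body.url = url;
  return body;
}

// Hypothetical usage against the router:
// const res = await fetch('http://router.internal/api/jobs/route', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json', 'X-Api-Key': API_KEY },
//   body: JSON.stringify(buildRouteRequest('headless', 'https://dealer.example.com/inventory')),
// });
```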

02 · Node selected

The router queries alive nodes, filters by mode and remaining capacity, then picks the least-loaded node, preferring home nodes over DO workers.

03 · Browser launched

The consumer receives the node's IP and port and launches Puppeteer against that target. A job row is created in the database.
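On the consumer side, this step amounts to building an endpoint from the returned IP and port and pointing Puppeteer at it (a sketch; the ws:// endpoint format assumes the node exposes Chrome's remote-debugging port):

```javascript
// Build the WebSocket endpoint for a routed node.
function nodeEndpoint(node) {
  return `ws://${node.ip}:${node.port}`;
}

// Hypothetical usage with puppeteer-core:
// const puppeteer = require('puppeteer-core');
// const browser = await puppeteer.connect({ browserWSEndpoint: nodeEndpoint(node) });
// const page = await browser.newPage();
```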

04 · Job completed

When done, the consumer calls POST /api/jobs/:id/complete. The session slot is freed and duration, exit IP, and any error are logged.
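The completion report can be assembled like this (a sketch; the payload field names and base URL are assumptions, only the endpoint path comes from the step above):

```javascript
// Assemble the completion payload: duration, exit IP, and any error.
function buildCompletionPayload(startedAt, finishedAt, exitIp, error) {
  return {
    durationMs: finishedAt - startedAt,
    exitIp,
    error: error ? String(error) : null,
  };
}

// Hypothetical usage:
// await fetch(`http://router.internal/api/jobs/${jobId}/complete`, {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(buildCompletionPayload(startedAt, Date.now(), exitIp, err)),
// });
```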

🔒 API access is by invitation only

puppetRouter is an internal infrastructure service. There is no public sign-up. API credentials are issued manually to authorized consumers — currently ssoScraper and select internal tooling.

If you're building a new scraper or data pipeline that needs routed Puppeteer sessions, reach out through the internal engineering channel to request access. Include your use case, expected job volume, and preferred mode (headless vs headful).

⚠ Unauthorized API requests are logged and rate-limited at the network level.