API Documentation
Full REST API reference for WebCrawler.buzz. Start crawls, track progress, retrieve results, and export data programmatically.
Base URL: https://webcrawler.buzz
/api/crawl
Starts a new crawl. The server validates the URL, creates a job, pushes it to the crawl queue, and immediately returns a tracking URL.
Request Body (JSON)
{
"url": "https://example.com"
}
Response (201 — New Crawl)
{
"status": "started",
"job_id": "abc-123-def",
"tracking_url": "https://webcrawler.buzz/results?job_id=abc-123-def",
"message": "Crawl started! Use tracking_url to follow progress."
}
Response (200 — Domain Already Crawled)
{
"status": "exists",
"job_id": "xyz-789",
"domain": "example.com",
"total_pages": 42,
"tracking_url": "https://webcrawler.buzz/results?job_id=xyz-789",
"prompt": "Send action 're-crawl' or 'use-existing' to POST /api/crawl/decide"
}
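The request and the two responses above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the page does not state the HTTP method for /api/crawl, so POST is an assumption here (based on the JSON request body), and the helper names `build_start_request`, `needs_decision`, and `start_crawl` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://webcrawler.buzz"


def build_start_request(target_url: str) -> urllib.request.Request:
    """Build (but do not send) the request that starts a crawl."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/crawl",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the method is not stated in the reference
    )


def needs_decision(payload: dict) -> bool:
    """True when the domain was already crawled ("status": "exists")."""
    return payload.get("status") == "exists"


def start_crawl(target_url: str) -> dict:
    """Send the request and return the decoded JSON payload."""
    with urllib.request.urlopen(build_start_request(target_url)) as resp:
        return json.load(resp)
```

A caller would typically check `needs_decision(payload)` first and, if it is false, keep `payload["job_id"]` for later progress polling.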
/api/crawl/decide
When a domain has already been crawled, choose whether to re-crawl it or use the existing results. Set "action" to either "re-crawl" or "use-existing".
Request Body (JSON)
{
"domain": "example.com",
"action": "re-crawl"
}
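Because only two action values are accepted, it can help to validate the body before sending it. The helper below is a hypothetical sketch (`decide_payload` is not part of the API); it only builds the JSON-serialisable body described above.

```python
ALLOWED_ACTIONS = ("re-crawl", "use-existing")


def decide_payload(domain: str, action: str) -> dict:
    """Return the request body for /api/crawl/decide, rejecting unknown actions."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action must be one of {ALLOWED_ACTIONS}, got {action!r}")
    return {"domain": domain, "action": action}
```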
/api/crawl/:job_id/progress
Lightweight progress polling. Returns current status and completion percentage.
Response
{
"job_id": "abc-123-def",
"status": "running",
"total_pages_found": 25,
"total_pages_queued": 100,
"progress_percent": 25
}
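A polling loop over this endpoint might look like the sketch below. The reference shows only the "running" status, so the set of in-flight statuses is an assumption, as are the polling interval and the helper names (`progress_url`, `fetch_progress`, `wait_for_crawl`). The fetch function is passed in so the loop can be exercised without network access.

```python
import json
import time
import urllib.request

BASE_URL = "https://webcrawler.buzz"

# Assumption: only "running" appears in the documented example.
# Treating "queued" as in-flight is a guess; adjust to the real status names.
ACTIVE_STATUSES = {"running", "queued"}


def progress_url(job_id: str) -> str:
    return f"{BASE_URL}/api/crawl/{job_id}/progress"


def fetch_progress(job_id: str) -> dict:
    """GET the progress document for one job."""
    with urllib.request.urlopen(progress_url(job_id)) as resp:
        return json.load(resp)


def wait_for_crawl(fetch, interval_s=2.0, max_polls=300, sleep=time.sleep) -> dict:
    """Call fetch() until the status is no longer in ACTIVE_STATUSES."""
    for attempt in range(max_polls):
        progress = fetch()
        if progress.get("status") not in ACTIVE_STATUSES:
            return progress
        if attempt < max_polls - 1:
            sleep(interval_s)  # pause between polls, not after the last one
    raise TimeoutError(f"crawl still active after {max_polls} polls")
```

Usage would be along the lines of `wait_for_crawl(lambda: fetch_progress("abc-123-def"))`.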
/api/crawl/:job_id?page=1&limit=50
Get paginated crawl results. Supports 50, 100, 500, or 1000 results per page.
Query Parameters
| Param | Type | Default | Description |
|---|---|---|---|
| page | integer | 1 | Page number |
| limit | integer | 50 | Results per page (50, 100, 500, 1000) |
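The query parameters above can be assembled with a small helper that rejects unsupported page sizes before a request is made. This is a sketch under assumptions: `results_url` is a hypothetical name, and treating `page` as 1-based is inferred from its default of 1 rather than stated in the reference.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://webcrawler.buzz"
ALLOWED_LIMITS = (50, 100, 500, 1000)


def results_url(job_id: str, page: int = 1, limit: int = 50) -> str:
    """Build the paginated results URL for a crawl job."""
    if page < 1:  # assumption: pages are 1-based, matching the default
        raise ValueError("page must be >= 1")
    if limit not in ALLOWED_LIMITS:
        raise ValueError(f"limit must be one of {ALLOWED_LIMITS}")
    query = urlencode({"page": page, "limit": limit})
    return f"{BASE_URL}/api/crawl/{quote(job_id, safe='')}?{query}"
```

The shape of the results payload is not documented here, so the sketch stops at URL construction rather than guessing how to detect the last page.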
/api/crawl/:job_id/notify
Register an email address to receive a notification when the crawl completes.
Request Body (JSON)
{
"email": "you@example.com"
}
/api/crawl/:job_id/export
Request a CSV export of the crawl results. A download link will be sent to the provided email address.
Request Body (JSON)
{
"email": "you@example.com"
}
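The notify and export endpoints take the same body, so one helper can serve both. As before, this is a hedged sketch: POST is assumed because the endpoints accept a JSON body, `build_email_request` is a hypothetical name, and the email check is a deliberately crude sanity test rather than real validation.

```python
import json
import urllib.request

BASE_URL = "https://webcrawler.buzz"


def build_email_request(job_id: str, endpoint: str, email: str) -> urllib.request.Request:
    """Build (but do not send) a notify or export request for one job."""
    if endpoint not in ("notify", "export"):
        raise ValueError("endpoint must be 'notify' or 'export'")
    if "@" not in email:
        raise ValueError("email does not look like an address")
    body = json.dumps({"email": email}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/crawl/{job_id}/{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the method is not stated in the reference
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it.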
Operational Endpoints
/api/health
Returns system health status including database and Redis connectivity.
/api/stats
Returns public usage statistics (total crawls, pages crawled, etc.).
⚠️ Rate Limits
POST endpoints are rate-limited to 200 requests per 15 minutes per IP. GET endpoints (progress polling, results) are not rate-limited.
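A client that issues many POST requests can stay under this limit by throttling itself. The sketch below uses a sliding window, which is a conservative choice: the reference does not say whether the server counts requests in a fixed or a sliding window. The class name `SlidingWindowLimiter` and its methods are hypothetical; the defaults mirror the documented 200 requests per 15 minutes.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle: at most max_calls within any window_s seconds."""

    def __init__(self, max_calls=200, window_s=15 * 60, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self._clock = clock
        self._stamps = deque()  # timestamps of recorded calls, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Drop calls that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.window_s - (now - self._stamps[0])

    def record(self) -> None:
        """Note that a call was just made."""
        self._stamps.append(self._clock())
```

Before each POST, a caller would sleep for `limiter.wait_time()` seconds and then call `limiter.record()`.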
