About WebCrawler.buzz — Free Web Crawler Online

Name: WebCrawler.buzz
Author: Hasibul Islam

A free, open-source web crawler online that anyone can use, inspect, and contribute to. Domain crawler built with transparency and trust in mind.

What is WebCrawler.buzz?

WebCrawler.buzz is a production-grade web crawler online and domain crawler online tool that discovers every publicly accessible page on any website. Use it to crawl website online and extract page titles, meta descriptions, HTTP status codes, response times, content types, page sizes, internal/external link counts, redirect chains, and indexability status.

It's designed for SEO professionals, developers, and website owners who need a quick, reliable free website crawler — without installing desktop software, creating accounts, or paying subscription fees.

💡 This is NOT affiliated with the original 1990s WebCrawler search engine. This is an independent, modern, open-source SEO tool built from scratch.

How It Works

Paste any URL into the search bar and the crawler starts a breadth-first scan of the entire domain. It follows every internal link it finds, building a complete map of your site as it goes — no depth limits, no page caps.

For each page discovered, you get actionable data: the page title, meta description, HTTP status code, server response time, content type, word count, and whether the page is indexable. If there are redirect chains, the crawler traces them to the final destination.

When the crawl finishes, you can browse the results in your browser or export everything as a CSV file delivered to your inbox. The whole process runs in the cloud — nothing to install, no account required.

🔓 Fully Open Source

Every line of code is publicly available on GitHub under the MIT License. You can read it, fork it, audit it, or self-host it.

Open to contributions

🧱 Tech Stack

Built with battle-tested, production-grade technologies:

Runtime Node.js v20+ with TypeScript (strict mode)

Framework Express.js v5

Database PostgreSQL via Neon (serverless)

Queue BullMQ + Redis (persistent job queue)

Crawling Axios + Cheerio (HTML parsing)

Email Nodemailer + Gmail SMTP

Security Helmet, CORS, Rate Limiting

API Docs Swagger / OpenAPI 3.0

🔒 Security & Privacy

We take security seriously. Here's what we do to keep your data safe:

✓ Ephemeral data — All crawl results are automatically deleted after 1 hour. CSV exports are deleted after 5 minutes. We don't store your data permanently.
✓ No tracking or profiling — We use Google Analytics only for basic page views. We don't build user profiles or sell data.
✓ Parameterized queries — All database queries use parameterized inputs. SQL injection is impossible.
✓ HTTP security headers — Helmet.js applies HSTS, XSS protection, content-type sniffing prevention, and more.
✓ Rate limiting — API endpoints are rate-limited to prevent abuse (200 POST requests per 15 min per IP).
✓ Responsible disclosure — Security vulnerabilities can be reported via our security.txt contact.

⚙️ Under the Hood — How the Crawl Pipeline Works

A step-by-step look at the entire process — from the moment you paste a URL to the CSV landing in your inbox.

URL Submission & Validation

You paste a URL and hit "Start Crawl." The Express.js server validates the URL format, extracts the root domain, and checks PostgreSQL for an existing crawl on the same domain. If a recent crawl exists, you're asked whether to re-crawl or use existing results — no wasted work.

Job Creation & Queuing

A new crawl job is inserted into PostgreSQL (status: pending) and immediately pushed to a BullMQ queue backed by Redis. You get a unique tracking URL you can bookmark or share — close the tab anytime and come back later.

BFS Crawl Execution

A crawler worker (concurrency: 5 parallel crawls) picks up the job and runs breadth-first search scoped to the root domain. For every page, Axios fetches the HTML (15s timeout, custom User-Agent) and Cheerio parses it to extract:

Page title & meta description
HTTP status code & server response time
Content type, page size, word count
Internal & external link counts
Redirect chains traced to final destination
Indexability status (noindex, canonical tags)

Each discovered internal link is normalized, deduplicated, and added to the BFS queue. Each crawled page is atomically inserted into PostgreSQL with counters updated in real time. A 50ms delay between requests keeps the target server safe.

Real-Time Progress Tracking

Your browser polls the GET /api/crawl/:job_id/progress endpoint every 5 seconds. The progress bar shows pages crawled vs. pages discovered — so you always know how far along the crawl is. Job status transitions: pending → running → completed.

Results & Email Notification

Once the BFS queue is exhausted, the job is finalized (status: completed). If you provided an email, a styled HTML notification is sent via Gmail SMTP with a "View Full Results" button. There's a race-condition fix too: if the crawl finishes before you submit your email, the notification fires immediately after you add it.

Paginated Results View

Browse your crawl results in a paginated, searchable table with 50, 100, 500, or 1,000 results per page. Every column is visible: URL, title, meta description, HTTP status, response time, content type, page size, internal/external links, redirect URL, and indexability.

CSV Export Pipeline

Click "Export CSV" and a second BullMQ queue (csv-export, concurrency: 2) processes the request. The export worker fetches all crawled pages from PostgreSQL, builds a 12-column CSV in memory (URL, title, meta, status, response time, content type, size, internal links, external links, redirect, indexable, timestamp), writes it to disk, and sends you a download link via email. Export deduplication prevents duplicate requests within 5 minutes.

Automatic Cleanup

Two scheduled cron jobs run in the background: one deletes crawl jobs (and all their pages) after 1 hour, and another removes CSV export files after 5 minutes. No data is stored permanently — ephemeral by design.

Architecture Flow

User submits URL
  │
  ├─► Validate URL ─► Extract root domain
  │                        │
  │              Duplicate check (PostgreSQL)
  │                   │              │
  │             EXISTS (200)    NEW JOB (201)
  │             prompt decide   jobService.createJob()
  │                                  │
  │                                  ▼
  │                    ┌──── BullMQ + Redis ────┐
  │                    │    'crawler' queue     │
  │                    └───────────┬────────────┘
  │                                ▼
  │                    Crawler Worker (×5 concurrency)
  │                                │
  │                                ▼
  │                    BFS Loop: axios.get() → cheerio
  │                    Extract: title, meta, links, status
  │                    Insert page → PostgreSQL
  │                    Internal links → back to queue
  │                                │
  │                                ▼
  │                    Job finalized → status: completed
  │                    Email notification (if provided)
  │
  └─► User clicks "Export CSV"
                  │
                  ▼
      ┌──── BullMQ + Redis ────┐
      │   'csv-export' queue   │
      └───────────┬────────────┘
                  ▼
      Export Worker (×2 concurrency)
      Fetch all pages → Build CSV → Write to disk
                  │
                  ▼
      Email download link → Auto-delete in 5 min

Architecture Flow

▸ User submits URL

↓ Validate URL → Extract domain

↓ Duplicate check (PostgreSQL)

→ EXISTS: prompt re-crawl

→ NEW: createJob()

↓ BullMQ + Redis queue

↓ Crawler Worker (×5)

→ BFS: axios → cheerio

→ Extract SEO data

→ Insert → PostgreSQL

↓ Job finalized ✓

↓ Email notification

▸ Export CSV

↓ csv-export queue

↓ Export Worker (×2)

→ Build CSV → disk

↓ Email link → 5min TTL

👤 Who Built This?

Hasibul Islam

Full-Stack Developer

I built WebCrawler.buzz as an open-source tool to make website auditing accessible to everyone — for free. The entire source code is public on GitHub. If you find a bug, have a feature request, or want to contribute, you're welcome to open an issue or pull request.

GitHub Email 🏆 Product Hunt