Open source · Windows desktop · Local Chrome

Inspect, crawl, and batch-load URLs on your Windows PC.

  • Interactive crawl — seed URLs, open pages in grouped batches, discover same-host links, and track success, failure, and discovered URLs in live panels.
  • Batch load from file — export discoveries to urls.txt, then run the console app to open large lists in Chrome (20 URLs per session).
  • Local-first — no cloud service, no database, no web API. Chrome runs on your machine; data stays on disk.
[Screenshot: Site Crawler Advance main window with seed URLs, Group Set, Crawl Pages, and success, failed, and discovered URL panels]

How it works

Two workflows, one toolkit

Use Site Crawler Advance for interactive discovery and validation, then Site Overloader to batch-open everything you saved.

Interactive crawl (Site Crawler Advance)

  1. Enter seed URLs

    Paste one URL per line in the seed list (left column). These are the starting points for your session.

  2. Configure Group Set and Crawl Pages

    Group Set controls how many URLs open together in each Chrome batch. Crawl Pages sets how many upcoming batches will still extract same-host links from loaded pages; the counter decrements as the crawl runs.

  3. Initiate

    Click Initiate (green button in the app) to start the queue. The controller processes URLs in batches until the queue is empty.

  4. Review three panels

    Watch live lists: success (loaded pages), failed (navigation, timeout, or other errors), and discovered (same-host links found). Status totals update for URL found, success, and failed counts.

  5. Export to urls.txt

    Discovered URLs are written to urls.txt in the working directory (numbered lines) as the crawl progresses—ready for batch loading.
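
The five steps above can be sketched as a small queue loop. This is a minimal illustration in Python, not the app's actual controller: run_crawl, load_batch, and fake_load_batch are hypothetical names, the Chrome session is replaced by a stub, and feeding discovered URLs back into the queue is an assumption about how the queue grows.

```python
from collections import deque
from typing import Callable, Iterable

# (success, failed, discovered) lists returned by one batch
BatchResult = tuple[list[str], list[str], list[str]]


def run_crawl(
    seeds: Iterable[str],
    group_set: int,
    crawl_pages: int,
    load_batch: Callable[[list[str], bool], BatchResult],
) -> dict[str, list[str]]:
    """Process the queue in batches of `group_set` until it is empty.

    `load_batch(urls, extract_links)` is a hypothetical stand-in for one
    Chrome session.
    """
    if group_set < 1:
        raise ValueError("group_set must be at least 1")
    queue = deque(dict.fromkeys(seeds))  # drop repeated seeds, keep order
    seen = set(queue)
    success, failed, discovered = [], [], []

    while queue:
        batch = [queue.popleft() for _ in range(min(group_set, len(queue)))]
        extract = crawl_pages > 0  # link extraction only while the counter is positive
        ok, bad, found = load_batch(batch, extract)
        success.extend(ok)
        failed.extend(bad)
        for url in found:
            if url not in seen:  # assumption: each discovered URL is queued once
                seen.add(url)
                discovered.append(url)
                queue.append(url)
        if extract:
            crawl_pages -= 1  # one crawl-enabled batch has been used

    return {"success": success, "failed": failed, "discovered": discovered}


# Stub site map used instead of a real browser session.
SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/c"],
}


def fake_load_batch(urls: list[str], extract_links: bool) -> BatchResult:
    found = [link for u in urls for link in SITE.get(u, [])] if extract_links else []
    return list(urls), [], found


result = run_crawl(
    ["https://example.com/"], group_set=2, crawl_pages=1, load_batch=fake_load_batch
)
```

With Crawl Pages set to 1, only the first batch extracts links, so /a and /b are discovered and loaded while /c is never found.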

Batch load (Site Overloader)

  1. Prepare urls.txt

    Place urls.txt in the console app directory (one URL per line, or numbered lines from Advance). The app reads every line on startup.

  2. Run Site Overloader

    Launch the .NET 8 console app. It splits the list into chunks of 20 URLs, opens one chunk per Chrome session, and works through the chunks in order.

  3. Graceful shutdown

    On process exit, pending browser sessions close cleanly so Chrome does not linger after you stop the tool.
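
The read-and-split step can be illustrated as follows. This is a sketch in Python rather than the console app's own code: parse_url_lines and chunk_urls are hypothetical names, and the serial-number formats the regular expression accepts ("12. ", "12) ", "12: ", or "12 " before the URL) are an assumption, since the exact numbering format is not specified here.

```python
import re
from typing import Iterator

SESSION_SIZE = 20  # URLs per Chrome session, per the description above

# Assumed serial-number prefixes: "12. ", "12) ", "12: " or "12 " before the URL.
_SERIAL_PREFIX = re.compile(r"^\s*\d+[.):]?\s+(?=https?://)", re.IGNORECASE)


def parse_url_lines(text: str) -> list[str]:
    """Return the URLs in `text`, dropping blank lines and serial prefixes."""
    urls = []
    for raw in text.splitlines():
        line = _SERIAL_PREFIX.sub("", raw).strip()
        if line:
            urls.append(line)
    return urls


def chunk_urls(urls: list[str], size: int = SESSION_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` URLs, in file order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# 45 numbered lines split into sessions of 20, 20, and 5 URLs.
sample = "".join(f"{n}. https://example.com/page/{n}\n" for n in range(1, 46))
sessions = list(chunk_urls(parse_url_lines(sample)))
print([len(s) for s in sessions])  # prints [20, 20, 5]
```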

Features

Two apps, shared foundations

Everything below maps to real behavior in the Site Crawler repository—no cloud crawler features.

Windows Forms UI

Interactive desktop app (net7.0-windows7.0) with seed URL entry, Group Set, Crawl Pages, and Initiate.

Group Set batching

Opens URLs in groups sized by Group Set. The crawl queue feeds the next batch automatically when a Chrome session finishes.

Crawl Pages

When Crawl Pages > 0, loaded pages are scanned for same-host anchor links (PDFs skipped). The counter decrements with each batch, and link extraction stops once it reaches 0.
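
A rough illustration of same-host link filtering, written in Python with only the standard library. It is a sketch under stated assumptions, not the app's extraction code: same_host_links is a hypothetical name, hosts are compared by exact match, and stripping #fragments and ignoring non-HTTP schemes are choices made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_host_links(page_url: str, html: str) -> list[str]:
    """Return absolute same-host links found in `html`, skipping PDFs."""
    host = urlparse(page_url).netloc.lower()
    parser = _AnchorCollector()
    parser.feed(html)
    links: list[str] = []
    for href in parser.hrefs:
        # Resolve relative links against the page, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # mailto:, javascript:, and similar
        if parts.netloc.lower() != host:
            continue  # different host
        if parts.path.lower().endswith(".pdf"):
            continue  # PDFs are skipped
        if absolute not in links:
            links.append(absolute)
    return links


page = (
    '<a href="/about">About</a>'
    '<a href="guide.pdf">Guide</a>'
    '<a href="https://other.example.org/x">Elsewhere</a>'
    '<a href="/about#team">Team</a>'
    '<a href="mailto:hello@example.com">Mail</a>'
)
links = same_host_links("https://example.com/docs/", page)
print(links)  # prints ['https://example.com/about']
```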

Live result panels

Success, failed, and discovered URL lists update in real time with sorted, deduplicated entries and status totals (URL found, success, failed).

urls.txt persistence

Discovered URLs export to urls.txt with serial numbers—your bridge to Site Overloader.
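
For illustration, a numbered export could look like the Python sketch below. The "N. URL" line format, the de-duplication, and the name write_numbered_urls are assumptions made for this example; check a real urls.txt produced by Advance for the exact layout.

```python
import tempfile
from pathlib import Path


def write_numbered_urls(urls, path):
    """Write one 'N. URL' line per unique URL and return the lines written."""
    unique = list(dict.fromkeys(urls))  # drop repeats, keep discovery order
    lines = [f"{n}. {url}" for n, url in enumerate(unique, start=1)]
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return lines


# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "urls.txt"
    written = write_numbered_urls(
        ["https://example.com/a", "https://example.com/b", "https://example.com/a"],
        target,
    )
    content = target.read_text(encoding="utf-8")

print(written)  # prints ['1. https://example.com/a', '2. https://example.com/b']
```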

Script blocking

During page loads, script requests are aborted to speed navigation-focused inspection (other resources continue).
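
Puppeteer-style request interception reports a resource type (document, script, stylesheet, image, and so on) for each intercepted request, so the blocking rule reduces to a per-request decision. The Python sketch below shows only that decision with hypothetical names; it does not call PuppeteerSharp and is not the app's handler.

```python
# Resource types to abort; everything else is allowed to continue.
BLOCKED_RESOURCE_TYPES = {"script"}


def should_abort(resource_type: str) -> bool:
    """Return True when a request of this type should be aborted."""
    return resource_type.lower() in BLOCKED_RESOURCE_TYPES


# Hypothetical requests seen while loading one page: (resource type, URL).
requests_seen = [
    ("document", "https://example.com/"),
    ("script", "https://example.com/app.js"),
    ("stylesheet", "https://example.com/site.css"),
    ("image", "https://example.com/logo.png"),
]
allowed = [url for kind, url in requests_seen if not should_abort(kind)]
aborted = [url for kind, url in requests_seen if should_abort(kind)]
print(aborted)  # prints ['https://example.com/app.js']
```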

Architecture

How the pieces connect

Two entry points share Chrome automation; urls.txt links crawl output to batch loading.

urls.txt — discovered URLs written by Advance, read by Overloader

Success panel

Pages that finished navigation without error.

Failed panel

URLs whose loads did not complete, each prefixed with Navigation, Timeout, or Error.

Discovered panel

Same-host links extracted during crawl-enabled batches.
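
The failed-panel labels can be pictured as a mapping from the kind of failure to a prefix. The Python sketch below is illustrative only: NavigationError and label_failure are hypothetical names, the "Prefix: URL" entry layout is an assumption, and the real apps classify errors raised by their own browser automation.

```python
class NavigationError(Exception):
    """Hypothetical error raised when the browser cannot navigate to a URL."""


def label_failure(url: str, exc: Exception) -> str:
    """Build a failed-panel entry with a Navigation, Timeout, or Error prefix."""
    if isinstance(exc, TimeoutError):
        prefix = "Timeout"
    elif isinstance(exc, NavigationError):
        prefix = "Navigation"
    else:
        prefix = "Error"
    return f"{prefix}: {url}"  # assumed entry layout


entries = [
    label_failure("https://example.com/slow", TimeoutError("navigation timed out")),
    label_failure("https://example.com/gone", NavigationError("name not resolved")),
    label_failure("https://example.com/odd", ValueError("unexpected failure")),
]
print(entries[0])  # prints Timeout: https://example.com/slow
```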

Benefits

Why site owners and developers use it

Structure discovery

Walk same-host links from seed URLs to build a picture of internal pages without a hosted crawler service.

Reachability awareness

See which URLs load in Chrome and which fail on navigation or timeout—useful before QA or content reviews.

Large-list handling

Persist discoveries to urls.txt, then batch-open hundreds of URLs in controlled 20-URL Chrome sessions.

Security & compliance

Local-only by design

  • No cloud database or web API — processing runs on your Windows machine; nothing is sent to a Softasium or third-party crawl service.
  • Chrome on your PC — uses your installed Google Chrome via PuppeteerSharp; you control what URLs are opened.
  • No credential storage — the apps do not implement login vaults or cloud accounts; they open URLs you supply.
  • MIT license — inspect, modify, and redistribute per LICENSE.txt.

Use cases

Practical workflows

QA prep

Build a URL list from a staging site crawl, then batch-open pages before a release test pass.

Sitemap exploration

Discover same-host routes from a few seeds when you need a ground-truth link inventory on disk.

Batch smoke-open

Run Site Overloader against urls.txt to visually confirm many URLs load in Chrome.

URL inventories

Maintain numbered URL lists for audits, migrations, or content cleanup projects.

Requirements

What you need

Windows

WinForms Advance and the console Overloader target Windows desktops.

.NET 7 and .NET 8

Advance: net7.0-windows7.0. Overloader: net8.0. Install the matching SDKs to build from source.

Google Chrome

A standard local Chrome install (user or Program Files paths). Required for both apps.

Visual Studio

Open Siteoverloader.sln in Visual Studio, restore NuGet packages (including PuppeteerSharp), and build Release.

Roadmap

Planned enhancements

Listed in the project README as future work—not available in the current release.

  • Export inspection reports to CSV and Excel (planned)
  • Scheduled automated website inspections (planned)
  • Email notifications for broken links (planned)
  • Advanced filtering and reporting capabilities (planned)

FAQ

Common questions

What is the difference between Site Crawler Advance and Site Overloader?

Advance is the WinForms interactive crawler: seeds, Group Set, Crawl Pages, live panels, and urls.txt export. Overloader is a console tool that only reads urls.txt and opens URLs in fixed 20-URL Chrome batches—no UI panels or link discovery.

What is urls.txt?

Advance writes discovered URLs to urls.txt in the working directory (numbered lines). Overloader reads the copy placed in its own directory, line by line. Use it as the handoff file between crawl and batch load.

What do failed URLs mean?

Failures are labeled Navigation, Timeout, or Error when Chrome cannot complete the page load within the navigation timeout. The tool does not provide an HTTP status code dashboard or automated 404 detection.

Does it use Selenium?

No. Both apps automate local Chrome via PuppeteerSharp. The GitHub README still mentions Selenium in places—that documentation is outdated.

What .NET versions do I need?

Build Advance with .NET 7 (Windows) and Overloader with .NET 8. Download pre-built Siteoverloader.zip from GitHub Releases if you prefer not to compile.

Why is the solution file named Siteoverloader.sln?

That is the actual solution name in the repository. Open it in Visual Studio; some README install steps still refer to it by the older name SiteCrawler.sln.

Open source · MIT License · Windows desktop

Get started with Site Crawler

Clone the repo, open Siteoverloader.sln, and build with Visual Studio, or download the latest release zip instead. Report issues on GitHub.

  1. Clone the repository

    git clone https://github.com/XeroDays/SiteCrawler.git

  2. Open the solution

    Open Siteoverloader.sln in Visual Studio and restore NuGet packages.

  3. Run Advance or Overloader

    Start Site Crawler Advance for interactive crawls, or run Site Overloader with urls.txt ready in its folder.

  4. Download a release (optional)

    Published releases attach Siteoverloader.zip from the CI pipeline.

MIT License · Site Inspector (Site Crawler) · github.com/XeroDays/SiteCrawler