Site Inspector — URL Inspection & Crawl Toolkit

Interactive crawl (Site Crawler Advance)

Enter seed URLs

Paste one URL per line in the seed list (left column). These are the starting points for your session.

Configure Group Set and Crawl Pages

Group Set controls how many URLs open together in each Chrome batch. Crawl Pages sets how many batches extract same-host links from the pages they load; the counter decrements as the crawl runs.

Initiate

Click Initiate (green button in the app) to start the queue. The controller processes URLs in batches until the queue is empty.
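The batching behavior described above can be sketched in Python as a simple queue loop. This is an illustration under stated assumptions, not the app's C# code. load_batch is a hypothetical stand-in for one Chrome session that returns the same-host links found on each page (or None when the load fails). Newly discovered links are assumed to join the queue.

```python
from collections import deque
from typing import Callable, Optional

# Hypothetical stand-in for one Chrome session: takes a batch of URLs and a
# flag saying whether to extract links, and returns each URL's same-host
# links (an empty list when extraction is off) or None if the load failed.
LoadBatch = Callable[[list[str], bool], dict[str, Optional[list[str]]]]


def run_queue(seeds: list[str], group_set: int, crawl_pages: int,
              load_batch: LoadBatch) -> tuple[list[str], list[str], list[str]]:
    if group_set < 1:
        raise ValueError("group_set must be at least 1")
    queue = deque(dict.fromkeys(seeds))  # de-duplicate seeds, keep order
    seen = set(queue)
    success: list[str] = []
    failed: list[str] = []
    discovered: list[str] = []

    while queue:
        # Take up to group_set URLs for the next batch (one Chrome session).
        batch = [queue.popleft() for _ in range(min(group_set, len(queue)))]
        extract = crawl_pages > 0
        results = load_batch(batch, extract)

        for url in batch:
            links = results.get(url)
            if links is None:
                failed.append(url)
                continue
            success.append(url)
            if not extract:
                continue
            for link in links:
                if link not in seen:  # assumed: new links join the queue
                    seen.add(link)
                    discovered.append(link)
                    queue.append(link)

        if extract:
            crawl_pages -= 1  # one batch of link extraction used up

    return sorted(success), sorted(failed), sorted(discovered)
```

With crawl_pages=1, only the first batch extracts links. A link that appears only on pages loaded in a later batch is therefore never discovered.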
Review three panels

Watch the live lists: success (pages that loaded), failed (navigation, timeout, or other errors), and discovered (same-host links found). The status totals for URLs found, success, and failed update as the crawl runs.

Export to urls.txt

As the crawl progresses, discovered URLs are written to urls.txt in the working directory as numbered lines, ready for batch loading.
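A minimal Python sketch of writing a numbered urls.txt follows. The exact line format Advance uses is not documented here, so the "<n>. <url>" layout and the function names are assumptions. Rewriting the whole file after each batch keeps the numbering contiguous as new URLs arrive.

```python
from pathlib import Path


def format_numbered(urls: list[str]) -> str:
    # Assumed layout: one "<n>. <url>" entry per line, numbered from 1.
    return "".join(f"{n}. {url}\n" for n, url in enumerate(urls, start=1))


def write_urls_file(urls: list[str], path: Path = Path("urls.txt")) -> None:
    # A relative path resolves against the current working directory.
    # Overwrite the whole file so numbering stays contiguous after each batch.
    path.write_text(format_numbered(urls), encoding="utf-8")
```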

Batch load (Site Overloader)

Prepare urls.txt

Place urls.txt in the console app directory (one URL per line, or numbered lines from Advance). The app reads every line on startup.
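If you prepare or post-process urls.txt with your own scripts, a small Python parser can accept both layouts. This is a sketch, not Overloader's code. It assumes serial numbers look like "12. " or "12) " at the start of a line, and parse_url_lines is a hypothetical name.

```python
import re
from pathlib import Path

# Optional leading serial number such as "12. ", "12) " or "12<tab>" (assumed format).
_SERIAL = re.compile(r"^\s*\d+[.)]?\s+")


def parse_url_lines(text: str) -> list[str]:
    urls = []
    for line in text.splitlines():
        line = _SERIAL.sub("", line, count=1).strip()
        if line:  # skip blank lines
            urls.append(line)
    return urls


def read_urls(path: Path = Path("urls.txt")) -> list[str]:
    return parse_url_lines(path.read_text(encoding="utf-8"))
```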
Run Site Overloader

Launch the .NET 8 console app. It splits the list into chunks of 20 URLs, opens each chunk in its own Chrome session, and starts the next chunk only after the previous one finishes.
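The chunking step can be sketched in Python. CHUNK_SIZE mirrors the documented 20 URLs per session. open_session is a hypothetical callable standing in for one Chrome run. It is assumed to return only when that run finishes, which is what makes the chunks sequential.

```python
from typing import Callable, Iterator

CHUNK_SIZE = 20  # matches the documented 20 URLs per Chrome session


def chunks(urls: list[str], size: int = CHUNK_SIZE) -> Iterator[list[str]]:
    # Yield consecutive slices; the last chunk may be shorter.
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def run_sequentially(urls: list[str],
                     open_session: Callable[[list[str]], None]) -> int:
    sessions = 0
    for chunk in chunks(urls):
        open_session(chunk)  # assumed to block until this session finishes
        sessions += 1
    return sessions
```

For example, a 45-line urls.txt yields chunks of 20, 20, and 5 URLs, opened in three sessions one after another.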
Graceful shutdown

On process exit, pending browser sessions close cleanly so Chrome does not linger after you stop the tool.

Windows Forms UI

Interactive desktop app (net7.0-windows7.0) with seed URL entry, the Group Set and Crawl Pages settings, and the Initiate button.

Group Set batching

Opens URLs in groups sized by Group Set. The crawl queue feeds the next batch automatically when a Chrome session finishes.

Crawl Pages

When Crawl Pages is greater than 0, loaded pages are scanned for same-host anchor links (PDFs are skipped). The counter decrements by one per batch; once it reaches 0, link extraction stops.
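The link filter can be illustrated with Python's standard library. Advance reads links from the page loaded in Chrome, whereas this sketch parses an HTML string instead. It also makes assumptions the source does not state. "Same host" means an exact host-and-port match, so www.example.com and example.com differ. Fragments after # are dropped, non-HTTP links are ignored, and a PDF is any path ending in .pdf.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _AnchorCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_host_links(page_url: str, html: str) -> list[str]:
    parser = _AnchorCollector()
    parser.feed(html)
    host = urlparse(page_url).netloc.lower()
    found = set()
    for href in parser.hrefs:
        # Resolve relative links against the page URL and drop any #fragment.
        absolute, _ = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar links
        if parts.netloc.lower() != host:
            continue  # links to other hosts are ignored
        if parts.path.lower().endswith(".pdf"):
            continue  # PDFs are skipped
        found.add(absolute)
    return sorted(found)  # sorted and de-duplicated
```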

Live result panels

Success, failed, and discovered URL lists update in real time with sorted, deduplicated entries and status totals (URL found, success, failed).

urls.txt persistence

Discovered URLs are exported to urls.txt with serial numbers, ready for Site Overloader to load.

Script blocking

During page loads, script requests are aborted to speed up navigation-focused inspection; other resource types still load.

Console batch loader

.NET 8 console app that reads urls.txt from the current working directory and processes the full list.

20 URLs per session

Splits the file into chunks of 20 URLs. Each chunk opens in one Chrome run, then the next chunk starts.

Sequential processing

Chunks run one after another, which suits smoke-opening large inventories without manual copy-paste.

Exit cleanup

Registers for process exit and closes any pending browser automation instances.
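Overloader is a .NET app, so its cleanup hook is not shown here. The same pattern in Python uses the standard atexit module. SessionRegistry and the sessions' close() method are hypothetical. As with most exit hooks, this runs on normal interpreter shutdown but not when the process is killed outright.

```python
import atexit


class SessionRegistry:
    """Tracks open browser sessions so they can be closed on exit (hypothetical)."""

    def __init__(self) -> None:
        self._open = []

    def add(self, session) -> None:
        self._open.append(session)

    def close_all(self) -> None:
        # Close the most recently opened sessions first.
        while self._open:
            session = self._open.pop()
            try:
                session.close()
            except Exception:
                pass  # keep closing the rest even if one close fails


registry = SessionRegistry()
atexit.register(registry.close_all)  # runs at normal interpreter exit
```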

Local Google Chrome

Both apps launch your installed Chrome from its standard install paths. Windows are headful (visible) by default, so you can watch what loads.

Chrome automation

Navigation timeout

Each URL gets a 30-second navigation timeout. Failures are reported as navigation, timeout, or generic errors; there is no HTTP status dashboard.
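Here is a Python sketch of a per-URL 30-second limit with the three failure labels, using asyncio.wait_for. navigate and NavigationError are hypothetical stand-ins for the browser call and its navigation failure. The "Label: url" text format is also an assumption, and PuppeteerSharp's own timeout and exception types are not shown.

```python
import asyncio
from typing import Awaitable, Callable

NAV_TIMEOUT_S = 30.0  # documented 30-second navigation timeout


class NavigationError(Exception):
    """Hypothetical error raised by navigate() when a load fails."""


async def load_with_timeout(url: str,
                            navigate: Callable[[str], Awaitable[None]],
                            timeout: float = NAV_TIMEOUT_S) -> tuple[bool, str]:
    # Returns (True, url) on success or (False, "<Label>: <url>") on failure.
    try:
        await asyncio.wait_for(navigate(url), timeout)
        return True, url
    except asyncio.TimeoutError:
        return False, f"Timeout: {url}"
    except NavigationError:
        return False, f"Navigation: {url}"
    except Exception:
        return False, f"Error: {url}"
```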

File-only persistence

No database and no cloud API. Runtime state lives in memory; long-term output is urls.txt on disk.

GitHub Release builds

CI packages Release output as Siteoverloader.zip when a GitHub Release is published.

MIT license

Open source under MIT. See LICENSE.txt in the repo.

Screenshot: Site Crawler Advance with seed URLs and the success, failed, and discovered panels (interactive crawl and live panels).

Screenshot: Site Crawler Advance status area with totals for URLs found, success, and failed, plus the crawl controls.

Screenshot: urls.txt with discovered URLs written by Advance and read by Overloader.

Success panel

Pages that finished navigation without error.

Failed panel

Prefixed with Navigation, Timeout, or Error when loads do not complete.

Discovered panel

Same-host links extracted during crawl-enabled batches.

Structure discovery

Walk same-host links from seed URLs to build a picture of internal pages without a hosted crawler service.

Reachability awareness

See which URLs load in Chrome and which fail on navigation or timeout—useful before QA or content reviews.

Large-list handling

Persist discoveries to urls.txt, then batch-open hundreds of URLs in controlled 20-URL Chrome sessions.

No cloud database or web API: processing runs on your Windows machine, and nothing is sent to a Softasium or third-party crawl service.
Chrome on your PC: the apps use your installed Google Chrome via PuppeteerSharp, and you control which URLs are opened.
No credential storage: the apps do not implement login vaults or cloud accounts; they open URLs you supply.
MIT license: inspect, modify, and redistribute per LICENSE.txt.

QA prep

Build a URL list from a staging site crawl, then batch-open pages before a release test pass.

Sitemap exploration

Discover same-host routes from a few seeds when you need a ground-truth link inventory on disk.

Batch smoke-open

Run Site Overloader against urls.txt to visually confirm many URLs load in Chrome.

URL inventories

Maintain numbered URL lists for audits, migrations, or content cleanup projects.

Windows

WinForms Advance and the console Overloader target Windows desktops.

.NET 7 and .NET 8

Advance: net7.0-windows7.0. Overloader: net8.0. Install the matching SDKs to build from source.

Google Chrome

A standard local Chrome install (user or Program Files paths). Required for both apps.

Visual Studio

Open Siteoverloader.sln in Visual Studio, restore NuGet packages (including PuppeteerSharp), and build Release.

Export inspection reports to CSV and Excel (planned)
Scheduled automated website inspections (planned)
Email notifications for broken links (planned)
Advanced filtering and reporting capabilities (planned)

What is the difference between Site Crawler Advance and Site Overloader?

Advance is the WinForms interactive crawler: seeds, Group Set, Crawl Pages, live panels, and urls.txt export. Overloader is a console tool that only reads urls.txt and opens the URLs in fixed 20-URL Chrome batches; it has no UI panels and does no link discovery.

What is urls.txt?

Advance writes discovered URLs to urls.txt in its working directory as numbered lines. Overloader reads its own copy of urls.txt, line by line, from its directory. Use the file as the handoff between crawl and batch load.

What do failed URLs mean?

Failures are labeled Navigation, Timeout, or Error when Chrome cannot complete a page load, for example because navigation fails or the 30-second navigation timeout expires. The tool does not provide an HTTP status code dashboard or automated 404 detection.

Does it use Selenium?

No. Both apps automate local Chrome via PuppeteerSharp. The GitHub README still mentions Selenium in places; those references are outdated.

What .NET versions do I need?

Build Advance with .NET 7 (Windows) and Overloader with .NET 8. If you prefer not to compile, download the pre-built Siteoverloader.zip from GitHub Releases.

Why is the solution file named Siteoverloader.sln?

That is the actual solution name in the repository, so open it in Visual Studio. Some README install steps still refer to an older SiteCrawler.sln name.

Open source · MIT License · Windows desktop

Get started with Site Crawler

Clone the repo, open Siteoverloader.sln, build with Visual Studio, or download the latest release zip. Report issues on GitHub.

Clone the repository

git clone https://github.com/XeroDays/SiteCrawler.git

Open the solution

Open Siteoverloader.sln in Visual Studio and restore NuGet packages.

Run Advance or Overloader

Start Site Crawler Advance for interactive crawls, or run Site Overloader with urls.txt ready in its folder.

Download a release (optional)

Published releases attach Siteoverloader.zip from the CI pipeline.


MIT License · Site Inspector (Site Crawler) · github.com/XeroDays/SiteCrawler

Inspect, crawl, and batch-load URLs on your Windows PC.
