Windows Forms UI
Interactive desktop app (net7.0-windows7.0)
with seed URL entry, Group Set, Crawl Pages, and Initiate.
Open source · Windows desktop · Local Chrome
urls.txt,
then run the console app to open large lists in Chrome (20 URLs per session).
How it works
Use Site Crawler Advance for interactive discovery and validation, then Site Overloader to batch-open everything you saved.
Paste one URL per line in the seed list (left column). These are the starting points for your session.
Group Set controls how many URLs open together in each Chrome batch. Crawl Pages sets how many upcoming batches will have same-host links extracted from their loaded pages; the counter decrements by one per batch, and extraction stops when it reaches zero.
Click Initiate (green button in the app) to start the queue. The controller processes URLs in batches until the queue is empty.
Watch live lists: success (loaded pages), failed (navigation, timeout, or other errors), and discovered (same-host links found). Status totals update for URL found, success, and failed counts.
Discovered URLs are written to urls.txt in the
working directory (numbered lines) as the crawl progresses—ready for batch loading.
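The interplay of Group Set and Crawl Pages in the steps above can be sketched in a few lines. This is a minimal illustration in Python, not the app's .NET code: `run_queue` and `load_batch` are hypothetical names, and feeding newly discovered links back into the queue is an assumption drawn from the description rather than from the source.

```python
from collections import deque


def run_queue(seeds, group_set, crawl_pages, load_batch):
    """Process URLs in batches of `group_set` until the queue is empty.

    `load_batch` is a hypothetical callback that "opens" a batch and returns
    the same-host links found on the loaded pages.
    """
    unique_seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    queue = deque(unique_seeds)
    seen = set(unique_seeds)
    batches = []

    while queue:
        size = min(group_set, len(queue))
        batch = [queue.popleft() for _ in range(size)]
        batches.append(batch)

        # Links are only extracted while the Crawl Pages counter is above zero.
        if crawl_pages > 0:
            for link in load_batch(batch):
                if link not in seen:  # assumption: new links join the queue
                    seen.add(link)
                    queue.append(link)
            crawl_pages -= 1  # one decrement per batch

    return batches, sorted(seen)
```

With a Group Set of 2 and Crawl Pages of 2, only the first two batches contribute new links; later batches are opened but no longer scanned.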
Place urls.txt in the console app directory (one
URL per line, or numbered lines from Advance). The app reads every line on startup.
Launch the .NET 8 console app. It splits the list into chunks of 20 URLs and opens the chunks sequentially, one Chrome session per chunk.
On process exit, pending browser sessions close cleanly so Chrome does not linger after you stop the tool.
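The chunking step is simple enough to show directly. The sketch below is a Python illustration of splitting a list into fixed groups of 20 and handling them one after another; `chunk` and `open_in_chunks` are hypothetical names, and `open_chunk` stands in for whatever launches a Chrome session.

```python
def chunk(urls, size=20):
    """Split `urls` into consecutive lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def open_in_chunks(urls, open_chunk, size=20):
    """Call `open_chunk` once per chunk, strictly one after another."""
    sessions = 0
    for group in chunk(urls, size):
        open_chunk(group)  # hypothetical: opens one browser session per chunk
        sessions += 1
    return sessions
```

A list of 45 URLs would therefore need three sessions: two full chunks of 20 and a final chunk of 5.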
Features
Everything below maps to real behavior in the Site Crawler repository—no cloud crawler features.
Opens URLs in groups sized by Group Set. The crawl queue feeds the next batch automatically when a Chrome session finishes.
When Crawl Pages > 0, loaded pages are scanned for same-host anchor links (PDFs skipped). The counter decrements per batch until link extraction stops.
Success, failed, and discovered URL lists update in real time with sorted, deduplicated entries and status totals (URL found, success, failed).
Discovered URLs export to urls.txt with
serial numbers—your bridge to Site Overloader.
During page loads, script requests are aborted to speed up navigation-focused inspection; all other resource types continue to load.
.NET 8 console app that reads urls.txt from
the current working directory and processes the full list.
Splits the file into chunks of 20 URLs. Each chunk opens in one Chrome run, then the next chunk starts.
Chunks run one after another—ideal for smoke-opening large inventories without manual copy-paste.
Registers for process exit and closes any pending browser automation instances.
Both apps launch your installed Chrome (standard install paths). Headful windows by default so you can see what loads.
Powered by PuppeteerSharp against local Chrome—not Selenium, not a remote browser farm.
30-second navigation timeout per URL. Failures surface as navigation, timeout, or generic errors—not HTTP status dashboards.
No database and no cloud API. Runtime state lives in
memory; long-term output is urls.txt on disk.
CI packages Release output as
Siteoverloader.zip when a GitHub Release is published.
Open source under MIT. See LICENSE.txt in the repo.
Gallery
Screenshots from the project README—three-panel layout with Group Set, Crawl Pages, and live crawl results.
Architecture
Two entry points share Chrome automation; urls.txt
links crawl output to batch loading.
urls.txt — discovered URLs written by Advance,
read by Overloader
Pages that finished navigation without error.
Prefixed with Navigation, Timeout, or Error when loads do not complete.
Same-host links extracted during crawl-enabled batches.
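To make "same-host links" concrete, here is a hedged Python sketch of the filtering idea: resolve each anchor against the page URL, keep only http(s) links on the same host, and skip PDFs. The function name is hypothetical, and detecting PDFs by a `.pdf` path suffix is an assumption; the app's actual rule is not documented here.

```python
from urllib.parse import urljoin, urlsplit


def same_host_links(page_url, hrefs):
    """Return sorted, deduplicated links that share the host of `page_url`."""
    host = urlsplit(page_url).hostname
    found = set()
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if parts.hostname != host:
            continue  # different host: not part of this crawl
        if parts.path.lower().endswith(".pdf"):
            continue  # assumption: PDFs are recognized by file extension
        found.add(absolute)
    return sorted(found)
```

Using a set before sorting gives the sorted, deduplicated behavior described for the live lists.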
Benefits
Walk same-host links from seed URLs to build a picture of internal pages without a hosted crawler service.
See which URLs load in Chrome and which fail on navigation or timeout—useful before QA or content reviews.
Persist discoveries to urls.txt, then batch-open
hundreds of URLs in controlled 20-URL Chrome sessions.
Security & compliance
Use cases
Build a URL list from a staging site crawl, then batch-open pages before a release test pass.
Discover same-host routes from a few seeds when you need a ground-truth link inventory on disk.
Run Site Overloader against urls.txt to visually
confirm many URLs load in Chrome.
Maintain numbered URL lists for audits, migrations, or content cleanup projects.
Requirements
WinForms Advance and the console Overloader target Windows desktops.
Advance: net7.0-windows7.0. Overloader:
net8.0. Install the matching SDKs to build from source.
A standard local Chrome install (user or Program Files paths). Required for both apps.
Open Siteoverloader.sln in Visual Studio, restore
NuGet packages (including PuppeteerSharp), and build Release.
Roadmap
Listed in the project README as future work—not available in the current release.
FAQ
Advance is the WinForms interactive crawler: seeds, Group Set, Crawl Pages,
live panels, and urls.txt export. Overloader is a console tool
that only reads urls.txt and opens URLs in fixed 20-URL Chrome batches—no UI
panels or link discovery.
Advance writes discovered URLs to urls.txt in the working directory (numbered
lines). Overloader reads plain lines from the copy of urls.txt in its own directory. Use urls.txt as the handoff file
between crawl and batch load.
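As an illustration of the handoff, the Python sketch below writes numbered lines and reads them back, accepting plain lines too. The `1. https://...` layout and the prefix pattern are assumptions made for this example; the exact numbering format Advance writes, and whether Overloader strips it, is not specified here.

```python
import re

# Assumed prefix shapes: "12. ", "12) ", "12: " or "12 " before the URL.
_NUMBER_PREFIX = re.compile(r"^\s*\d+[.):]?\s+")


def to_numbered(urls):
    """Format URLs as numbered lines, starting at 1 (assumed layout)."""
    return [f"{i}. {url}" for i, url in enumerate(urls, start=1)]


def read_urls(lines):
    """Strip an optional leading number from each line and drop blank lines."""
    cleaned = (_NUMBER_PREFIX.sub("", line).strip() for line in lines)
    return [url for url in cleaned if url]
```

Reading is deliberately tolerant, so a hand-edited file that mixes numbered and plain lines still yields a clean URL list.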
Failures are labeled Navigation, Timeout, or Error when Chrome cannot complete the page load within the navigation timeout. The tool does not provide an HTTP status code dashboard or automated 404 detection.
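The three labels can be pictured as a simple mapping from the kind of failure to a prefix on the failed entry. The Python sketch below is illustrative only: `NavigationError` is a hypothetical stand-in for a browser navigation failure, and the `Label: url` entry format is an assumption.

```python
class NavigationError(Exception):
    """Hypothetical stand-in for a failed browser navigation."""


def label_failure(url, error):
    """Build a failed-list entry prefixed with Navigation, Timeout, or Error."""
    if isinstance(error, TimeoutError):
        prefix = "Timeout"  # load did not finish within the navigation timeout
    elif isinstance(error, NavigationError):
        prefix = "Navigation"  # the browser could not navigate to the URL
    else:
        prefix = "Error"  # anything else falls into the generic bucket
    return f"{prefix}: {url}"
```

Nothing in this mapping inspects HTTP status codes, which matches the point above: a page that returns a 404 but finishes loading would not be flagged by labels of this kind.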
No. Both apps automate local Chrome via PuppeteerSharp. The GitHub README still mentions Selenium in places—that documentation is outdated.
Build Advance with .NET 7 (Windows) and Overloader with .NET 8. Download pre-built
Siteoverloader.zip from
GitHub Releases if you prefer not to compile.
That is the actual solution name in the repository. Open it in Visual Studio—not the older
SiteCrawler.sln name referenced in some README install steps.
Open source · MIT License · Windows desktop
Clone the repo, open Siteoverloader.sln, and build with Visual
Studio, or download the latest release zip instead. Report issues on GitHub.
git clone https://github.com/XeroDays/SiteCrawler.git
Open Siteoverloader.sln in Visual Studio and restore
NuGet packages.
Start Site Crawler Advance for interactive crawls, or run Site
Overloader with urls.txt ready in its folder.
Published releases attach Siteoverloader.zip from the
CI pipeline.
MIT License · Site Inspector (Site Crawler) · github.com/XeroDays/SiteCrawler