Sitemap Generator

Summary

Given a starting URL, this small utility crawls a website and reports all the static assets and links it finds on that site.

Design Goals

Crawl an entire site and report on its structure
Flexible output formats (JSON, tab, digraph)
Customize performance characteristics

Design Decisions

The utility will stay within the same domain
When the utility finds a duplicate URL, it will not traverse into its links, but it will still report on the links found.

Features

Ability to save results to a file
Set number of worker threads/goroutines to crawl a site
Set rate limiter, if desired
Set inactivity timeout
Read in saved results and redisplay in different formats

How to get it

(1) You have Docker installed

docker run mkboudreau/sitemap ....

(2) You have Go installed

go get github.com/mkboudreau/sitemap 
make install

Example Usage

Crawl site with sensible defaults

sitemap www.microsoft.com

Crawl site with 50 workers

sitemap -w 50 www.microsoft.com

Crawl site with rate limiting turned off

sitemap -r 0s www.microsoft.com

Crawl site and output JSON

sitemap -f json www.microsoft.com

Crawl site and output tabular format (default)

sitemap -f tab www.microsoft.com

Crawl site and output digraph (dot)

sitemap -f digraph www.microsoft.com

Crawl site and save results to file

sitemap -o saved.json www.microsoft.com

Use saved results and output as a digraph

sitemap -i saved.json -f digraph
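
To give a sense of what redisplaying saved results as a digraph involves, the sketch below converts a page-to-links map into Graphviz DOT text. The JSON layout shown (an object mapping each page to the links found on it) is an assumption made for illustration and may not match the file the utility actually writes; `toDigraph` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// toDigraph renders a page -> links map as Graphviz DOT text.
// Hypothetical helper; the real formatter may differ.
func toDigraph(links map[string][]string) string {
	pages := make([]string, 0, len(links))
	for p := range links {
		pages = append(pages, p)
	}
	sort.Strings(pages) // stable output regardless of map iteration order

	var b strings.Builder
	b.WriteString("digraph sitemap {\n")
	for _, p := range pages {
		for _, l := range links[p] {
			// %q quotes each node name so slashes and dots are safe in DOT.
			fmt.Fprintf(&b, "  %q -> %q;\n", p, l)
		}
	}
	b.WriteString("}\n")
	return b.String()
}

func main() {
	// Assumed schema: a JSON object mapping each page to its outgoing links.
	saved := []byte(`{"/": ["/about", "/contact"], "/about": ["/"]}`)
	var links map[string][]string
	if err := json.Unmarshal(saved, &links); err != nil {
		panic(err)
	}
	fmt.Print(toDigraph(links))
}
```

DOT text like this can be rendered to an image with Graphviz, for example by piping it to `dot -Tpng`.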
