# htmlrep
Command htmlrep prints a report of tags and links used in a UTF-8 encoded HTML blob read from stdin.
Download:

    go get github.com/jimmyfrasche/htmlrep

If you do not have the go command on your system, you need to install Go first.
The document is never parsed, only tokenized, so many documents may be concatenated together.
These reports are useful when preparing to migrate legacy data into a new web site.
There are three reports: tags and attributes, links in attributes, and links in text nodes.
The tags and attributes report lists each tag used in the blob on its own line, followed by every attribute used on any instance of that tag, one per line and indented by one tab. The reports are separated by blank lines.
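The layout of the tags and attributes report can be illustrated with a small sketch. This is not htmlrep's own code: the function name `formatTagReport`, the map-based input, and the alphabetical ordering are assumptions made here for illustration, since the description above does not say how the tool stores or orders its results.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatTagReport (hypothetical helper) renders tag -> attribute-set data as
// one tag per line, with each attribute on its own line indented by one tab.
// Sorting is an assumption made here to keep the output deterministic.
func formatTagReport(tags map[string]map[string]bool) string {
	names := make([]string, 0, len(tags))
	for name := range tags {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		b.WriteString(name + "\n")
		attrs := make([]string, 0, len(tags[name]))
		for attr := range tags[name] {
			attrs = append(attrs, attr)
		}
		sort.Strings(attrs)
		for _, attr := range attrs {
			b.WriteString("\t" + attr + "\n")
		}
	}
	return b.String()
}

func main() {
	// Made-up sample data, as if collected from a tokenized blob.
	tags := map[string]map[string]bool{
		"a":   {"href": true, "title": true},
		"img": {"src": true, "alt": true},
		"p":   {},
	}
	fmt.Print(formatTagReport(tags))
}
```

A tag that never carries attributes, such as `p` in the sample data, appears on a line by itself with nothing indented beneath it.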
Both link reports list every unique link in the blob, one per line. The links in attributes report lists links found in all attributes known to contain links. The links in text nodes report scans text nodes for things that may be links, using a number of heuristics to cull false positives; these heuristics are Unicode-aware but largely English-centric.
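To give a feel for what scanning text for probable links and listing each one once might involve, here is a deliberately naive sketch. It is not the heuristic htmlrep uses: the pattern `linkish`, the helper `uniqueTextLinks`, the trailing-punctuation trimming, and the sorted output are all assumptions for illustration, and the real tool's rules are more involved.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// linkish is a simplistic stand-in pattern: an http(s) scheme or a leading
// "www." followed by a run of non-space characters.
var linkish = regexp.MustCompile(`(?:https?://|www\.)\S+`)

// uniqueTextLinks (hypothetical helper) returns the distinct link candidates
// found in text, in sorted order.
func uniqueTextLinks(text string) []string {
	seen := map[string]bool{}
	for _, m := range linkish.FindAllString(text, -1) {
		// Trailing punctuation usually belongs to the sentence, not the link.
		m = strings.TrimRight(m, ".,;:!?)\"'")
		if m != "" {
			seen[m] = true
		}
	}
	out := make([]string, 0, len(seen))
	for link := range seen {
		out = append(out, link)
	}
	sort.Strings(out)
	return out
}

func main() {
	text := "See http://example.com/a, then www.example.org. Again: http://example.com/a!"
	for _, link := range uniqueTextLinks(text) {
		fmt.Println(link)
	}
}
```

The set keyed by the trimmed string is what makes each link appear only once, however often it occurs in the input.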
By default all reports are shown, but some may be hidden using the following flags:
-t only show tags and attributes report
-l only show the links reports
-c of the links reports, only show links from text nodes
-a of the links reports, only show links from attributes
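One way to read these flags is as filters that narrow the set of three reports. The sketch below encodes that reading with the standard `flag` package. The names `selection`, `choose`, and `parse` are hypothetical, and the handling of combinations the text does not describe (for example `-t` together with `-l`, or `-c` without `-l`) is an assumption, not documented htmlrep behavior.

```go
package main

import (
	"flag"
	"fmt"
)

// selection records which of the three reports would be printed.
type selection struct {
	Tags, AttrLinks, TextLinks bool
}

// choose maps the four flags onto the three reports.
// Assumption: each "only" flag simply hides the reports it excludes, so
// contradictory combinations such as -t with -l select nothing.
func choose(t, l, c, a bool) selection {
	return selection{
		Tags:      !l,
		AttrLinks: !t && !c,
		TextLinks: !t && !a,
	}
}

// parse reads the flags from args using a private FlagSet, so the sketch
// does not depend on the real command line.
func parse(args []string) (selection, error) {
	fs := flag.NewFlagSet("htmlrep-sketch", flag.ContinueOnError)
	t := fs.Bool("t", false, "only show tags and attributes report")
	l := fs.Bool("l", false, "only show the links reports")
	c := fs.Bool("c", false, "of the links reports, only show links from text nodes")
	a := fs.Bool("a", false, "of the links reports, only show links from attributes")
	if err := fs.Parse(args); err != nil {
		return selection{}, err
	}
	return choose(*t, *l, *c, *a), nil
}

func main() {
	sel, err := parse([]string{"-l", "-c"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", sel)
}
```

With no flags every field is true, which matches the default of showing all reports.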
## EXAMPLES
Show all reports:

    cat *.html | htmlrep

Show only the tags and attributes report:

    cat *.html | htmlrep -t

Show only the links reports:

    cat *.html | htmlrep -l

Show only probable links from text nodes:

    cat *.html | htmlrep -l -c

Show only links from attributes:

    cat *.html | htmlrep -l -a

Automatically generated by autoreadme on 2016.07.03