jimmyfrasche/htmlrep


# htmlrep

Command htmlrep prints a report of tags and links used in a UTF-8 encoded HTML blob read from stdin.

Download:

go get github.com/jimmyfrasche/htmlrep

If you do not have the go command on your system, you need to install Go first.


The document is never parsed, only tokenized, so many documents may be concatenated together.
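To illustrate why tokenizing alone allows concatenated input, here is a minimal sketch that walks a blob token by token and never builds a tree, so a second `<html>` element is just more tokens. It is not htmlrep's implementation: the name `tagNames` is hypothetical, and the standard library's `encoding/xml` decoder in non-strict mode stands in for a real HTML tokenizer only to keep the sketch dependency-free. It will fail on much real-world HTML.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// tagNames returns the name of every start tag in blob, in document order.
// Assumption: encoding/xml in non-strict mode is a stand-in tokenizer here;
// it is not what htmlrep itself is known to use.
func tagNames(blob string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(blob))
	dec.Strict = false // tolerate unquoted and valueless attributes
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity

	var names []string
	for {
		// RawToken does not check that start and end tags match,
		// so no document structure is ever built or validated.
		tok, err := dec.RawToken()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return names, err
		}
		if start, ok := tok.(xml.StartElement); ok {
			names = append(names, start.Name.Local)
		}
	}
}

func main() {
	// Two complete documents back to back, as `cat *.html` would produce.
	blob := `<html><body><p class="a">one</p></body></html>` +
		`<html><body><a href="x.html">two</a></body></html>`

	names, err := tagNames(blob)
	if err != nil {
		fmt.Println("tokenize error:", err)
		return
	}
	fmt.Println(strings.Join(names, " ")) // prints "html body p html body a"
}
```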

These reports are useful when preparing to migrate legacy data into a new web site.

There are three reports: tags and attributes, links in attributes, and links in text nodes.

The tags and attributes report lists each tag used in the blob on a line followed by all attributes used on all instances of that tag, where each attribute is indented by one tab. The reports are separated by blank lines.
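The layout described above can be sketched as follows. The function name `formatTagReport` and the map-of-sets input are hypothetical, and the alphabetical ordering is an assumption, since the description does not say how tags or attributes are ordered.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sortedKeys returns the keys of a string set in alphabetical order.
func sortedKeys(set map[string]bool) []string {
	keys := make([]string, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// formatTagReport writes each tag on its own line, followed by every
// attribute seen on any instance of that tag, one per line, each
// indented by a single tab.
// Assumption: alphabetical order; the real tool's ordering is not documented here.
func formatTagReport(tags map[string]map[string]bool) string {
	names := make([]string, 0, len(tags))
	for name := range tags {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		b.WriteString(name + "\n")
		for _, attr := range sortedKeys(tags[name]) {
			b.WriteString("\t" + attr + "\n")
		}
	}
	return b.String()
}

func main() {
	tags := map[string]map[string]bool{
		"a": {"href": true, "title": true},
		"p": {"class": true},
	}
	fmt.Print(formatTagReport(tags))
	// Prints:
	// a
	// 	href
	// 	title
	// p
	// 	class
}
```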

Both link reports list all unique links in the blob, one per line. The links in attributes report lists links found in all attributes known to contain links. The links in text nodes report scans text nodes for things that may be links, using a number of heuristics to cull false positives; these heuristics are Unicode-aware but largely English-centric.
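As a rough illustration of scanning text nodes for probable links and reporting each one once, consider the sketch below. The heuristics htmlrep actually uses are not described here, so the regular expression, the trailing-punctuation trimming, the sorted output, and the name `uniqueLinks` are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// urlPattern is a deliberately simple stand-in heuristic: it only
// recognizes explicit http:// and https:// URLs.
var urlPattern = regexp.MustCompile(`https?://[^\s<>"']+`)

// uniqueLinks collects every probable link from the given text nodes,
// drops duplicates, and returns the result sorted.
func uniqueLinks(texts []string) []string {
	seen := make(map[string]bool)
	var links []string
	for _, text := range texts {
		for _, match := range urlPattern.FindAllString(text, -1) {
			// Cull a common false positive: sentence punctuation
			// that directly follows a URL in running prose.
			link := strings.TrimRight(match, ".,;:!?)")
			if seen[link] {
				continue
			}
			seen[link] = true
			links = append(links, link)
		}
	}
	sort.Strings(links)
	return links
}

func main() {
	texts := []string{
		"See http://example.com/a, or https://example.org/b.",
		"Again: http://example.com/a",
	}
	for _, link := range uniqueLinks(texts) {
		fmt.Println(link)
	}
	// Prints:
	// http://example.com/a
	// https://example.org/b
}
```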

By default all reports are shown, but some may be hidden using the following flags:

-t	only show tags and attributes report
-l	only show the links reports
-c	of the links reports, only show links from text nodes
-a	of the links reports, only show links from attributes
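One way the four flags could combine is sketched below with the standard `flag` package. The flag names and descriptions come from the list above, but the combination rules are assumptions: the sketch treats `-l` as hiding the tags report, `-t` as hiding both link reports, and `-c` and `-a` as each hiding the other link report. The names `selection` and `selectReports` are hypothetical, and the real command may resolve combinations differently.

```go
package main

import (
	"flag"
	"fmt"
)

// selection records which of the three reports would be printed.
type selection struct {
	Tags      bool // tags and attributes report
	AttrLinks bool // links in attributes report
	TextLinks bool // links in text nodes report
}

// selectReports parses command-line style arguments and decides which
// reports to show. Assumed semantics, not confirmed by the tool itself.
func selectReports(args []string) (selection, error) {
	fs := flag.NewFlagSet("htmlrep", flag.ContinueOnError)
	t := fs.Bool("t", false, "only show tags and attributes report")
	l := fs.Bool("l", false, "only show the links reports")
	c := fs.Bool("c", false, "of the links reports, only show links from text nodes")
	a := fs.Bool("a", false, "of the links reports, only show links from attributes")
	if err := fs.Parse(args); err != nil {
		return selection{}, err
	}
	return selection{
		Tags:      !*l,
		AttrLinks: !*t && !*c,
		TextLinks: !*t && !*a,
	}, nil
}

func main() {
	sel, err := selectReports([]string{"-l", "-c"})
	if err != nil {
		fmt.Println("flag error:", err)
		return
	}
	fmt.Printf("%+v\n", sel) // prints "{Tags:false AttrLinks:false TextLinks:true}"
}
```

Under these assumed rules, passing no flags selects all three reports, which matches the documented default.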

## EXAMPLES

Show all reports:

cat *.html | htmlrep

Show only the tags and attributes report:

cat *.html | htmlrep -t

Show only the links reports:

cat *.html | htmlrep -l

Show only probable links from text nodes:

cat *.html | htmlrep -l -c

Show only links from attributes:

cat *.html | htmlrep -l -a

Automatically generated by autoreadme on 2016.07.03
