Turk: robots.txt permission verifier

A simple HTTP web service, written in Go, that verifies whether a supplied "Agent" is allowed to access a requested URL. Pass in the URL of the resource you want to fetch and the name of your agent; Turk downloads and parses the host's robots.txt file, then responds with a 200 if you may proceed and a 400 otherwise.

$> goinstall github.com/temoto/robotstxt.go
$> make && ./turk -host="localhost:9090"
$>
$> curl -v "http://127.0.0.1:9090/?agent=Googlebot&url=http://blogspot.com/comment.g"
   < HTTP/1.1 400 Bad Request

$> curl -v "http://127.0.0.1:9090/?agent=Googlebot&url=http://blogspot.com/"
   < HTTP/1.1 200 OK

Note: blogger.com/robots.txt blocks all agents from fetching the comment.g resource.
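
The curl calls above can also be made from Go. The following is a minimal client sketch, not part of Turk itself: the helper names (turkURL, interpret, mayFetch) are hypothetical, and it assumes a Turk instance is listening on 127.0.0.1:9090 as in the example. It relies only on the documented behaviour (200 means allowed, 400 means not allowed) and treats any other status as an error.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// turkURL builds the query URL for a Turk instance reachable at base.
// The "agent" and "url" parameter names come from the curl examples;
// the helper itself is hypothetical.
func turkURL(base, agent, target string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := url.Values{}
	q.Set("agent", agent)
	q.Set("url", target) // percent-encoded by Encode below
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// interpret maps Turk's status code to a decision:
// 200 means allowed, 400 means not allowed, anything else is unexpected.
func interpret(status int) (bool, error) {
	switch status {
	case http.StatusOK:
		return true, nil
	case http.StatusBadRequest:
		return false, nil
	default:
		return false, fmt.Errorf("unexpected status %d from Turk", status)
	}
}

// mayFetch asks Turk whether agent may fetch target.
func mayFetch(base, agent, target string) (bool, error) {
	reqURL, err := turkURL(base, agent, target)
	if err != nil {
		return false, err
	}
	resp, err := http.Get(reqURL)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return interpret(resp.StatusCode)
}

func main() {
	// Assumes Turk is running locally, as in the example above.
	ok, err := mayFetch("http://127.0.0.1:9090/", "Googlebot", "http://blogspot.com/comment.g")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("allowed:", ok)
}
```

Unlike the raw curl invocation, this sketch percent-encodes the target URL before placing it in the query string, which matters when the target itself contains characters such as & or ?.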

Notes

Turk is an experiment with Go. Go's HTTP stack is "async": the net/http server handles each incoming connection in its own goroutine, so many requests can be processed in parallel. Turk also has a naive, unbounded in-memory cache to avoid refetching the same robots.txt data for a given host.
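
To illustrate what a naive, unbounded per-host cache can look like when requests are served concurrently, here is a minimal sketch. It is not taken from turk.go: the robotsCache type, its get method, and the fake fetch function are all hypothetical, and the cached value is simply the raw robots.txt body as a string. A mutex guards the map because handlers may run in parallel.

```go
package main

import (
	"fmt"
	"sync"
)

// robotsCache is a hypothetical unbounded in-memory cache keyed by host.
// Entries are never evicted, so memory grows with the number of hosts seen.
type robotsCache struct {
	mu      sync.RWMutex
	entries map[string]string
}

func newRobotsCache() *robotsCache {
	return &robotsCache{entries: make(map[string]string)}
}

// get returns the cached robots.txt body for host and calls fetch only on a miss.
// Two goroutines that miss at the same time may both call fetch; the last
// writer wins, which is acceptable for a naive cache.
func (c *robotsCache) get(host string, fetch func(host string) (string, error)) (string, error) {
	c.mu.RLock()
	body, ok := c.entries[host]
	c.mu.RUnlock()
	if ok {
		return body, nil
	}

	body, err := fetch(host)
	if err != nil {
		return "", err // failures are not cached
	}

	c.mu.Lock()
	c.entries[host] = body
	c.mu.Unlock()
	return body, nil
}

func main() {
	cache := newRobotsCache()
	fetches := 0
	// fake stands in for an HTTP download of http://<host>/robots.txt.
	fake := func(host string) (string, error) {
		fetches++
		return "User-agent: *\nDisallow: /comment.g\n", nil
	}
	for i := 0; i < 3; i++ {
		if _, err := cache.get("blogspot.com", fake); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println("fetches:", fetches) // prints "fetches: 1"
}
```

Because nothing is ever evicted or refreshed, a cache like this trades memory and freshness for simplicity, which fits an experiment but would need a size bound and expiry in a long-running service.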
