missionMeteora/go-metainspector

 
 

go-metainspector

Simple web metadata scraping in Go.

go-metainspector is a web scraper package that provides access to basic info and meta tags of a given URL. It is inspired by the metainspector gem by Jaime Iniesta and completely written in Go.

Install

  • Step 1: Get the metainspector package
$ go get -u github.com/fern4lvarez/go-metainspector/metainspector
  • Step 2 (Optional): Run tests
$ go test -v -cover ./...

Usage

API

package main

import (
  "fmt"

  "github.com/fern4lvarez/go-metainspector/metainspector"
)

func main() {
  url := "http://www.cloudcontrol.com/pricing"

  // Fetch the page and parse its metadata.
  mi, err := metainspector.New(url)
  if err != nil {
    fmt.Printf("Error: %v\n", err)
    return
  }

  // Basic URL information.
  fmt.Printf("\nURL: %s\n", mi.Url())
  fmt.Printf("Scheme: %s\n", mi.Scheme())
  fmt.Printf("Host: %s\n", mi.Host())
  fmt.Printf("Root: %s\n", mi.RootURL())

  // Page metadata.
  fmt.Printf("Title: %s\n", mi.Title())
  fmt.Printf("Language: %s\n", mi.Language())
  fmt.Printf("Author: %s\n", mi.Author())
  fmt.Printf("Description: %s\n", mi.Description())
  fmt.Printf("Charset: %s\n", mi.Charset())
  fmt.Printf("Feed URL: %s\n", mi.Feed())

  // List-valued results.
  fmt.Printf("Links: %v\n", mi.Links())
  fmt.Printf("Images: %v\n", mi.Images())
  fmt.Printf("Keywords: %v\n", mi.Keywords())
  fmt.Printf("Compatibility: %v\n", mi.Compatibility())
}
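
To make the URL-related accessors above more concrete, here is a minimal sketch of how a scheme, host and root could be derived from a URL with Go's standard net/url package. The rootURL helper is hypothetical and is not part of the metainspector package; the exact format returned by the package's own RootURL() (for example, whether it ends with a slash) is not documented here, so treat this only as an illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// rootURL is a hypothetical helper (not part of metainspector).
// It returns "scheme://host" for an absolute URL.
func rootURL(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	// A URL without a scheme (e.g. "www.example.com") parses as a bare path,
	// so both fields are empty and there is no root to build.
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("not an absolute URL: %q", rawURL)
	}
	return u.Scheme + "://" + u.Host, nil
}

func main() {
	root, err := rootURL("http://www.cloudcontrol.com/pricing")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println("Root:", root)
}
```

Note that this sketch rejects scheme-less input such as www.cloudcontrol.com, whereas the CLI examples in this README accept it.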

CLI

You can use go-metainspector as a command-line tool. Note: once you have installed the package, you might want to set an alias for the CLI, for example by adding alias metainspect=go-metainspector or similar to the appropriate dotfile.

Simple to use!

$ go-metainspector -help
Usage of go-metainspector:
  -u="www.example.com": URL to metainspect.
  -all=false: Show full results.

$ go-metainspector -u www.cloudcontrol.com
www.cloudcontrol.com
----> Title: cloudControl » Cloud App Platform » supercharging development
----> Author: cloudControl GmbH
----> Description: Cloud hosting secure, easy and fair: Highly available and scalable cloud hosting with no administraton hassle and pay as you go billing
----> Charset: utf-8
----> Language: en
----> Feed URL: https://www.cloudcontrol.com/blog/feed
----> Keywords: cloudcontrol cloud control cloud hosting cloud computing cloud hosting web-hosting platform as a service paas 
----> Links: http://www.cloudcontrol.com/ http://www.cloudcontrol.com/pricing http://www.cloudcontrol.com/dev-center http://www.cloudcontrol.com/add-ons https://console.cloudcontrolled.com http://www.cloudcontrol.com/for-isvs http://www.cloudcontrol.com/for-developers http://www.cloudcontrol.com/use-cases/vamos http://www.cloudcontrol.com/use-cases/searchbox http://www.cloudcontrol.com/use-cases/snipclip ...
----> Images: http://www.cloudcontrol.com/assets/spinner-9bde1d21899a52974160da652c0a6622.gif https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/006/thumb/vamos-logo-rgb-vertical.png?1360313785 https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/008/thumb/logo.png?1362736875 https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/002/thumb/snipclip_logo_2011_rgb.png?1359481602 https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/004/thumb/ormigo.png?1359481203 https://s3-eu-west-1.amazonaws.com/cctrl-www-assets/add-ons/blitz.png https://s3-eu-west-1.amazonaws.com/cctrl-www-assets/add-ons/sendgrid.png https://s3-eu-west-1.amazonaws.com/cctrl-www-production/solution_providers/logos/000/000/013/small/emind_klein.png?1364402125 ...

The -u flag is optional; the URL can also be passed as a plain argument:

$ go-metainspector www.cloudcontrol.com
----> Title: cloudControl » Cloud App Platform » supercharging development
----> Author: cloudControl GmbH
...
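
The two invocation styles above (URL via -u, or as a plain argument) could be handled with Go's standard flag package roughly as follows. This is a sketch under assumptions, not the actual CLI source: parseArgs is a hypothetical name, and the -u default is left empty here (the real tool shows "www.example.com" in its help) so that a missing flag can be detected.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"os"
)

// parseArgs is a hypothetical helper. It returns the URL to inspect and
// whether -all was given. The URL comes from -u if set, otherwise from the
// first positional argument.
func parseArgs(args []string) (string, bool, error) {
	fs := flag.NewFlagSet("go-metainspector", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet on parse errors
	u := fs.String("u", "", "URL to metainspect.")
	all := fs.Bool("all", false, "Show full results.")
	if err := fs.Parse(args); err != nil {
		return "", false, err
	}
	switch {
	case *u != "":
		return *u, *all, nil
	case fs.NArg() > 0:
		return fs.Arg(0), *all, nil
	}
	return "", *all, errors.New("no URL given")
}

func main() {
	target, all, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "Error:", err)
		os.Exit(2)
	}
	fmt.Printf("url=%s all=%v\n", target, all)
}
```

With the flag package, parsing stops at the first non-flag argument, so in this sketch any flags must come before a positional URL.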

Some sites contain so many links and images that printing them all clutters the output. By default, the output is cut off at 10 links and images. Use -all to print them all:

$ go-metainspector -u www.cloudcontrol.com -all
----> Title: cloudControl » Cloud App Platform » supercharging development
----> Author: cloudControl GmbH
----> Description: Cloud hosting secure, easy and fair: Highly available and scalable cloud hosting with no administraton hassle and pay as you go billing
----> Charset: utf-8
----> Language: en
----> Feed URL: https://www.cloudcontrol.com/blog/feed
----> Keywords: cloudcontrol cloud control cloud hosting cloud computing cloud hosting web-hosting platform as a service paas 
----> Links: http://www.cloudcontrol.com/ http://www.cloudcontrol.com/pricing http://www.cloudcontrol.com/dev-center http://www.cloudcontrol.com/add-ons https://console.cloudcontrolled.com http://www.cloudcontrol.com/for-isvs http://www.cloudcontrol.com/for-developers http://www.cloudcontrol.com/add-ons/sendgrid http://www.cloudcontrol.com/add-ons/newrelic http://www.cloudcontrol.com/use-cases/ormigo http://www.cloudcontrol.com/use-cases/vamos http://www.cloudcontrol.com/use-cases/afp http://www.cloudcontrol.com/use-cases/adcloud http://www.cloudcontrol.com/use-cases/kinderfee http://www.cloudcontrol.com/use-cases/snipclip http://www.cloudcontrol.com/dev-center/Quickstart http://www.cloudcontrol.com/dev-center/Platform Documentation http://status.cloudcontrol.com http://www.cloudcontrol.com/dev-center/support https://console.cloudcontrolled.com http://www.cloudcontrol.com/team http://www.cloudcontrol.com/jobs http://www.cloudcontrol.com/blog http://www.cloudcontrol.com/contact http://www.cloudcontrol.com/add-on-provider-program http://www.cloudcontrol.com/solution-providers http://www.cloudcontrol.com/tos http://www.cloudcontrol.com/privacy-policy http://www.cloudcontrol.com/imprint 
----> Images: http://www.cloudcontrol.com/assets/spinner-9bde1d21899a52974160da652c0a6622.gif https://s3-eu-west-1.amazonaws.com/cctrl-www-assets/add-ons/sendgrid.png https://s3-eu-west-1.amazonaws.com/cctrl-www-assets/add-ons/newrelic.png https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/004/thumb/ormigo.png?1359481203 https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/006/thumb/vamos-logo-rgb-vertical.png?1360313785 https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/003/thumb/AFP-logo.png?1359481038 https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/001/thumb/adcloud.png?1359563276 https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/005/thumb/Kinderfee_Logo_Claim.png?1359481378 https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/002/thumb/snipclip_logo_2011_rgb.png?1359481602 
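
The cut-off behaviour shown above can be sketched as a small helper that joins a list of links or images and, unless full output is requested, keeps only the first few entries followed by "...". The names summarize and maxItems are hypothetical, and this is an illustration of the idea rather than the tool's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// maxItems mirrors the default cut-off of 10 mentioned in this README.
const maxItems = 10

// summarize is a hypothetical helper. It joins items with spaces; unless all
// is true, it keeps at most limit items and appends "..." to mark the cut.
func summarize(items []string, limit int, all bool) string {
	if all || len(items) <= limit {
		return strings.Join(items, " ")
	}
	return strings.Join(items[:limit], " ") + " ..."
}

func main() {
	links := []string{
		"http://www.cloudcontrol.com/",
		"http://www.cloudcontrol.com/pricing",
		"http://www.cloudcontrol.com/dev-center",
	}
	// A limit of 2 is used here only to make the truncation visible.
	fmt.Println("----> Links:", summarize(links, 2, false))
	fmt.Println("----> Links:", summarize(links, maxItems, true))
}
```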

Web

You can also use go-metainspector on the web. Check out and run go-metainspector-site, or just try it out at http://gometainspector.cloudcontrolled.com

Contribute!

Everyone is welcome to contribute to this repo: reviews, issues, feature suggestions, possible code or functionality enhancements... Everything is appreciated!

TODO (aka Nice To Have)

  • Extend documentation
  • Write a CHANGELOG
  • Mock http requests to speed up unit tests
  • Internal links, External links
  • Map() method wrapping all data
  • Make the timeout configurable (currently fixed at 20 seconds)
  • Your suggestion

License

go-metainspector is MIT licensed; see the license file in the repository.
