pombredanne/dumpparser

 
 


Wikipedia dump parser for semanticizest

This program parses Wikipedia database dumps for consumption by semanticizest.

Installing

Make sure a Go compiler (1.2 or newer) and Git are installed. On Debian/Ubuntu/Mint, install them with:

sudo apt-get install git golang-go

On CentOS:

sudo yum -y install git golang
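
Before moving on, it can help to confirm that both tools are actually on the PATH. The following is a minimal sketch, not part of the project; `require_tool` is a hypothetical helper name, and it only checks that a command exists, not that the Go version is 1.2 or newer.

```shell
# Sketch: report whether a build prerequisite is on the PATH.
# `require_tool` is a hypothetical helper, not something the project ships.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Example use (left commented out so that sourcing this file has no side effects):
# require_tool git && require_tool go && go version
```

`go version` prints the installed release, which you can compare by eye against the 1.2 minimum.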

Set up a Go workspace, if you haven't already. For example:

mkdir /some/where/go
cd /some/where/go
export GOPATH=$(pwd)
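
The same workspace setup can be wrapped in a small function for a shell profile, so that GOPATH is restored in every new session. This is only a sketch under assumptions: `setup_go_workspace` is a hypothetical name, and adding `$GOPATH/bin` to PATH is a convenience that the steps above do not require.

```shell
# Sketch: create a Go workspace and point GOPATH at it.
# The function name and its directory argument are illustrative only.
setup_go_workspace() {
    workspace="$1"
    mkdir -p "$workspace" || return 1
    # Resolve to an absolute path, as `export GOPATH=$(pwd)` does.
    GOPATH="$(cd "$workspace" && pwd)"
    export GOPATH
    # Optional: make installed binaries such as dumpparser callable by name.
    PATH="$GOPATH/bin:$PATH"
    export PATH
}
```

For example, `setup_go_workspace /some/where/go` leaves GOPATH set to that directory. The commands below still work unchanged, because they spell out `${GOPATH}/bin` explicitly.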

Fetch and compile:

go get github.com/semanticize/st
go install github.com/semanticize/st/dumpparser
go install github.com/semanticize/st/semanticizest

You now have a working parser at ${GOPATH}/bin/dumpparser. Issue:

${GOPATH}/bin/dumpparser --help

to see how to generate a semanticizer model. Then serve that model over the REST API and send it some text:

${GOPATH}/bin/semanticizest --http=:5002 your_model
curl http://localhost:5002/all -d 'Does the entity linking work?'
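
For repeated queries, the curl call above can be wrapped in a shell function. This is a hedged sketch rather than a project feature: `semanticize` and `SEMANTICIZEST_URL` are hypothetical names, and the default URL simply mirrors the `--http=:5002` example. The response format is not described here, so the function prints whatever the server returns.

```shell
# Sketch: POST a piece of text to a running semanticizest server.
# `semanticize` and SEMANTICIZEST_URL are hypothetical names; the default
# URL matches the --http=:5002 example above.
semanticize() {
    url="${SEMANTICIZEST_URL:-http://localhost:5002/all}"
    # -d sends the text as the request body, as in the curl call above;
    # --fail makes curl exit non-zero when the server answers with an HTTP error.
    curl --silent --show-error --fail -d "$1" "$url"
}
```

With the server running, `semanticize 'Does the entity linking work?'` sends the same request as the curl line above. Set `SEMANTICIZEST_URL` to target a different host or port.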

About

MediaWiki dump parser in Go
