goldenberg/naive_reverend

Naive Reverend

The Naive Reverend is an HTTP service for Bayesian textual classification using a bag-of-words model. Despite their simplicity, bag-of-words models can perform remarkably well for tasks like spam filtering and sentiment detection. They're fast and easy to implement. And maybe above all, writing one from scratch is a good way to drill Bayes' Theorem into your head and be forced to wrestle with some of the subtleties of floating-point math on a computer.
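One of those floating-point subtleties can be shown in a few lines. This is a minimal sketch, not code from this repository: the per-word likelihood of `1e-4` and the 400-word document length are made-up numbers chosen only to trigger the problem. Multiplying many small probabilities underflows to zero, while summing their logarithms stays well within range.

```python
import math

# Hypothetical per-word likelihoods for a 400-word document (made-up values).
probs = [1e-4] * 400

# Naive approach: multiply the probabilities directly.
product = 1.0
for p in probs:
    product *= p  # the true value, 1e-1600, is far below the smallest double

# Log-space approach: add log-probabilities instead.
log_sum = sum(math.log(p) for p in probs)

print(product)  # 0.0, every class would tie at zero
print(log_sum)  # about -3684.1, still comparable across classes
```

Because the logarithm is monotonic, comparing log-scores picks the same winning class as comparing the raw products would, if the products could be represented.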

About the name

In addition to being a statistician and philosopher, Thomas Bayes was a Presbyterian minister. In a bag-of-words classifier, we make the "naive" assumption that all features (in our case, words) are conditionally independent given the class. In other words, the probability of a word occurring in a class is independent of the words around it. Of course, that isn't true, but we can still build pretty good classifiers if we let ourselves make that assumption. We can then evaluate the accuracy of the classifier using a held-out test set.
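The naive assumption and the held-out evaluation can be sketched as follows. This is an illustrative in-memory classifier, not the service's actual implementation: the `NaiveBayes` class, its method names, the whitespace tokenizer, and the add-one (Laplace) smoothing are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Illustrative bag-of-words classifier (hypothetical, not the repo's code)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        self.doc_counts = Counter()              # class -> number of training docs
        self.vocab = set()                       # every word seen in training

    def train(self, label, text):
        words = text.lower().split()  # assumed tokenizer: lowercase, split on whitespace
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        # Requires at least one prior call to train().
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus a sum of per-word log likelihoods: the sum is
            # where the conditional-independence assumption is used.
            score = math.log(n_docs / total_docs)
            for w in words:
                # Add-one smoothing so unseen words do not zero out a class.
                score += math.log((counts[w] + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


def accuracy(model, held_out):
    """Fraction of (label, text) pairs in the held-out set classified correctly."""
    correct = sum(1 for label, text in held_out if model.classify(text) == label)
    return correct / len(held_out)


model = NaiveBayes()
model.train("spam", "buy cheap pills now")
model.train("ham", "lunch meeting at noon")
held_out = [("spam", "cheap pills"), ("ham", "meeting at noon")]
print(accuracy(model, held_out))
```

The held-out pairs must not appear in the training data; otherwise the accuracy figure overstates how well the classifier generalizes.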

Endpoints

/classify

/train

Store backends

Redis

In-memory

LevelDB

Should I use it?

Probably not. Aside from the fact that it has almost no tests, you can likely get much more accurate classification with a backoff language model using a library like kenlm, berkeleylm, or irstlm. All of these libraries use data structures that have been highly optimized for read performance and space efficiency. However, they're significantly more expensive to update and retrain than a model that just increments counts in a key-value store.
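To make the update-cost point concrete, here is a minimal sketch of training as count increments. The key layout (`words:<class>:<word>`, `total:<class>`, `docs:<class>`) and the `train` helper are hypothetical, and a `Counter` stands in for a real key-value store; none of this reflects how the service actually lays out its data.

```python
from collections import Counter


def train(store, label, text):
    """Record one training document by incrementing flat counter keys."""
    for word in text.lower().split():
        store[f"words:{label}:{word}"] += 1  # per-class word count (hypothetical key)
        store[f"total:{label}"] += 1         # total words seen for this class
    store[f"docs:{label}"] += 1              # documents seen for this class


store = Counter()  # stand-in for a key-value store with an increment operation
train(store, "spam", "buy cheap pills")
train(store, "spam", "cheap watches")

print(store["words:spam:cheap"])  # 2
print(store["total:spam"])        # 5
print(store["docs:spam"])         # 2
```

Each new training document touches only the keys for its own words, so the model can be updated online without rebuilding any index.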
