Mini Text Indexer

Description

Mini Text Indexer is an application that scans text-based files and indexes user-definable text sequences found in them. You configure a set of paths, file patterns, and text patterns; Mini Text Indexer then scans the matching files for those text patterns. When it finds a match, it adds it to an in-memory index that allows quick searching of the matched text.

For example, let's pretend you have pointed Mini Text Indexer at a directory full of JavaScript code and configured it to look at files with a .js extension, and that you have configured it to look for the following text patterns.

\$\("#(.*?)"\) // jQuery ID selectors
new\s+(.*?) // Instantiating objects with new

When configured with these patterns, Mini Text Indexer captures the text they match and creates an in-memory index. It provides HTTP endpoints for searching this index to find which files contain text that matches your search.
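The indexing step described above can be sketched in Go. This is only an illustration, not the project's implementation: the `match` type, the `indexContent` function, and the use of a plain map in place of the index tree are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// match records one occurrence of a pattern in a file (hypothetical type).
type match struct {
	File     string
	Location int    // byte offset of the match within the file content
	Text     string // the full matched text
}

// idSelector is the jQuery ID selector pattern from the example above.
var idSelector = regexp.MustCompile(`\$\("#(.*?)"\)`)

// indexContent adds every match found in content to index, keyed by the
// text of capture group 1. A map stands in for the real index tree.
func indexContent(index map[string][]match, file, content string) {
	for _, loc := range idSelector.FindAllStringSubmatchIndex(content, -1) {
		// loc[0]:loc[1] spans the whole match, loc[2]:loc[3] spans group 1.
		key := content[loc[2]:loc[3]]
		index[key] = append(index[key], match{
			File:     file,
			Location: loc[0],
			Text:     content[loc[0]:loc[1]],
		})
	}
}

func main() {
	index := map[string][]match{}
	indexContent(index, "HomeController.js", `var el = $("#contentDiv"); $("#footer").hide();`)
	for key, matches := range index {
		fmt.Println(key, matches)
	}
}
```

A search against such a map is a simple key lookup; the real index tree presumably supports richer lookups than this stand-in.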

Configuration

Mini Text Indexer is configured via a JSON file. You must tell this application about three things.

Where to look for files
What file patterns to index
What patterns to look for

The basic shell of the configuration file looks like this.

{
	"paths": [],
	"filePatterns": [],
	"textPatterns": []
}
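As a rough sketch of how a file with this shape could be read in Go, the example below decodes it with `encoding/json`. The struct and function names are hypothetical; only the JSON field names come from this document, and the `pattern` and `key` fields are the ones described under Text Patterns.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// textPattern mirrors one entry of the "textPatterns" array.
type textPattern struct {
	Pattern string `json:"pattern"`
	Key     int    `json:"key"`
}

// config mirrors the three top-level sections of the configuration file.
type config struct {
	Paths        []string      `json:"paths"`
	FilePatterns []string      `json:"filePatterns"`
	TextPatterns []textPattern `json:"textPatterns"`
}

// parseConfig decodes the raw JSON text of a configuration file.
func parseConfig(raw string) (config, error) {
	var cfg config
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

func main() {
	raw := `{
		"paths": ["/code/js/project/models"],
		"filePatterns": [".js"],
		"textPatterns": [{"pattern": "new\\s+(.*?)", "key": 1}]
	}`
	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Note that backslashes in a regex must be doubled inside JSON strings, which is why the pattern is written as `new\\s+(.*?)` in the file.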

Paths

Paths tell Mini Text Indexer where to look for files. This is a simple array of directory paths, each given as a string.

{
	"paths": [
		"/code/js/project/models",
		"/code/js/project/services"
	]
}

File Patterns

This section tells Mini Text Indexer which files to consider. File patterns are a simple contains match: a file is considered when its name contains the pattern, so .js also matches a name such as data.json. If more than one file pattern is defined, a file matches when its name contains any of them. The example below matches any file whose name contains .js or .hbs.

{
	"filePatterns": [
		".js",
		".hbs"
	]
}
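The contains match can be illustrated with a few lines of Go. The function name is hypothetical, and the sketch assumes a case-sensitive comparison against the file name, which this document does not specify.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesAnyPattern reports whether name contains at least one of the
// configured file patterns (assumed case-sensitive).
func matchesAnyPattern(name string, patterns []string) bool {
	for _, p := range patterns {
		if strings.Contains(name, p) {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{".js", ".hbs"}
	for _, name := range []string{"app.js", "row.hbs", "data.json", "README.md"} {
		fmt.Printf("%-10s %v\n", name, matchesAnyPattern(name, patterns))
	}
}
```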

Text Patterns

Text patterns tell Mini Text Indexer what text to look for and index. A pattern consists of the following elements.

Regex pattern with zero or more capture groups
An index to the capture group which is to be used as the key for the index tree

When a regex pattern matches, the match is stored in the index tree. The key under which it is stored, and which searches across the tree use, is selected by giving the index of a capture group in the regular expression. A value of zero (0) tells Mini Text Indexer to use the whole match as the key.

{
	"textPatterns": [
		{
			"pattern": "\\$\\(\"#(.*?)\"\\)",
			"key": 1
		}
	]
}

Startup Configuration

Mini Text Indexer is a command-line server application. It has several command-line flags that control and customize its behavior.

ip - Address to bind the HTTP server to
port - Port to bind the HTTP server to
loglevel - Detail level of logging: debug, info
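For illustration, flags like these could be declared with Go's standard `flag` package as shown below. The flag names and descriptions come from the list above; the default values, the integer type for `port`, the `options` struct, and the validation of `loglevel` are assumptions made for this sketch.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the parsed startup flags (hypothetical type).
type options struct {
	IP       string
	Port     int
	LogLevel string
}

// parseFlags parses args (without the program name) into options.
// The defaults below are placeholders, not the application's real defaults.
func parseFlags(args []string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("minitextindexer", flag.ContinueOnError)
	fs.StringVar(&opts.IP, "ip", "0.0.0.0", "Address to bind the HTTP server to")
	fs.IntVar(&opts.Port, "port", 8080, "Port to bind the HTTP server to")
	fs.StringVar(&opts.LogLevel, "loglevel", "info", "Detail level of logging: debug, info")
	if err := fs.Parse(args); err != nil {
		return opts, err
	}
	if opts.LogLevel != "debug" && opts.LogLevel != "info" {
		return opts, fmt.Errorf("unsupported loglevel %q", opts.LogLevel)
	}
	return opts, nil
}

func main() {
	opts, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("would listen on %s:%d with loglevel %s\n", opts.IP, opts.Port, opts.LogLevel)
}
```

Under these assumptions, an invocation would look something like `minitextindexer -ip 127.0.0.1 -port 8080 -loglevel debug`, where the binary name is also assumed.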

HTTP Interface

Mini Text Indexer provides an HTTP interface to perform searches against the index tree. Below are the endpoints available.

Search

GET /search?term=[searchTerm]

Performs a search against the index tree. This returns an array of tree nodes whose terms match the specified search term.

Each matching tree node contains a key, which is the indexed term that matched the provided search term, and an array of documents where the term is found. Each document has a name, followed by an array of match locations. Each location has the matched text, the captured groups from the regular expression, and the starting location of the text in the file.

Parameters

term - Term to search for

Response

[
	{
		"key": "contentDiv",
		"documents": [
			{
				"documentName": "HomeController.js",
				"matches": [
					{
						"location": 100,
						"match": "$(\"#contentDiv\")",
						"captures": [
							"contentDiv"
						]
					}
				]
			}
		]
	},
	{
		"key": "contentDivabc",
		"documents": [
			{
				"documentName": "TestController.js",
				"matches": [
					{
						"location": 10,
						"match": "$(\"#contentDivabc\")",
						"captures": [
							"contentDivabc"
						]
					}
				]
			}
		]
	}
]
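A client could decode this response in Go roughly as follows. The type and function names are hypothetical; the JSON field names are taken from the sample response above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// matchLocation is one occurrence of a term inside a document.
type matchLocation struct {
	Location int      `json:"location"`
	Match    string   `json:"match"`
	Captures []string `json:"captures"`
}

// document lists all occurrences of a term within one file.
type document struct {
	DocumentName string          `json:"documentName"`
	Matches      []matchLocation `json:"matches"`
}

// termNode is one matching node of the index tree.
type termNode struct {
	Key       string     `json:"key"`
	Documents []document `json:"documents"`
}

// decodeSearch parses the body of a /search response.
func decodeSearch(body []byte) ([]termNode, error) {
	var nodes []termNode
	err := json.Unmarshal(body, &nodes)
	return nodes, err
}

func main() {
	body := []byte(`[{"key":"contentDiv","documents":[{"documentName":"HomeController.js","matches":[{"location":100,"match":"$(\"#contentDiv\")","captures":["contentDiv"]}]}]}]`)
	nodes, err := decodeSearch(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, n := range nodes {
		for _, d := range n.Documents {
			for _, m := range d.Matches {
				fmt.Printf("%s: %s at offset %d in %s\n", n.Key, m.Match, m.Location, d.DocumentName)
			}
		}
	}
}
```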

GET /getterm?term=[searchTerm]

Performs a search against the index tree. This returns the specific term that matches the specified search term, as a single tree node rather than an array.

The matching tree node contains a key, which is the indexed term that matched the provided search term, and an array of documents where the term is found. Each document has a name, followed by an array of match locations. Each location has the matched text, the captured groups from the regular expression, and the starting location of the text in the file.

Parameters

term - Term to search for

Response

{
	"key": "contentDiv",
	"documents": [
		{
			"documentName": "HomeController.js",
			"matches": [
				{
					"location": 100,
					"match": "$(\"#contentDiv\")",
					"captures": [
						"contentDiv"
					]
				}
			]
		}
	]
}
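When calling either endpoint from code, the search term should be query-escaped. The sketch below builds a request URL with Go's `net/url` package; the helper name `endpointURL` and the base address `http://localhost:8080` are assumptions for the example, since the real address depends on the ip and port flags.

```go
package main

import (
	"fmt"
	"net/url"
)

// endpointURL builds the request URL for an endpoint such as "search" or
// "getterm", escaping the term so it is safe to place in the query string.
func endpointURL(base, endpoint, term string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/" + endpoint
	q := url.Values{}
	q.Set("term", term)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	for _, term := range []string{"contentDiv", "content Div#1"} {
		u, err := endpointURL("http://localhost:8080", "getterm", term)
		if err != nil {
			fmt.Println("bad base URL:", err)
			return
		}
		fmt.Println(u)
	}
}
```

The resulting string can be passed to `http.Get`, and the body decoded into a single object with the same fields as one element of the /search response.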

License

The MIT License (MIT)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
