Abstract Table Utilities

Unix utilities that process ‘abstracted tables’, or streams of records. In the spirit of of the classic Unix utilities: head, tail, cut, grep, sort, and works with multiple encodings (tab, csv, etc.).

Installation

go get github.com/kyleburton/go-abtab/cmd/abtab
cd $GOPATH/src/github.com/kyleburton/go-abtab
bash install.sh

Overview

Record streams are identified using a URI structure. To convert a tab file in the pwd (foo.tab) to a csv file:

abcat -i tab://foo.tab -o csv://foo.csv

Tab separated values is the core file format. By default, files are expected to have a header line describing the columns in the file though a header can be specified as part of the url.

FieldSeparator=‘\t’
RecordSeparator=‘\n’

‘Memo’ fields, and embedded delimiters:
Embedded Newlines represnted as ‘\n’
Embedded Tabs represnted as ‘\t’
Embedded CR represnted as ‘\r’

Examples:

‘cat’ a tab delimited file to the terminal:

  abcat -i foo.tab

All utilities supoprt using a URL to specify the encoding:

  abcat -i tab://foo.tab

Convert from CSV to TAB:

  abcat -i csv://foo.csv -o tab://foo.tab

View a ‘portrait’ mode of the records in a file:

  abview -i csv://foo.csv

Cut columns out of a file by name:

  # Id Created_At Updated_At Email_Address
  abcut -i csv://foo.csv -f Id,First_Name,Last_Name

Grep for lines

  abgrep -i csv://foo.csv -e 'First_Name == "Bob"'
  abgrep -i csv://foo.csv -e '__LINE__ >= 300 && __LINE__ < 400'

Take a random sample: Flip a coin for each record:

  abgrep -e 'RandFloat()>0.5' -i tab://foo.tab

The expression is evaluated using: https://github.com/kyleburton/go-eval, a restricted sub-set of the go language. There will be a pre-defined variable for each field in the row as well all __LINE__ which contains the current record number.

Take a sample: record 300 to 399:

  abgrep -i csv://foo.csv -e '__LINE__ >= 300 && __LINE__ < 400'

Take a sample: Emit every Nth record:

  abgrep -e '(__LINE__%2)==0' -i tab://foo.tab
  abgrep -e '(__LINE__%5)==0' -i tab://foo.tab
  abgrep -e '(__LINE__%10)==0' -i tab://foo.tab

Head and Tail

Header plus the first 100 records:

  abhead -i csv://foo.csv -n 100

Header plus the last 100 records:

  abtail -i csv://foo.csv -n 100

Header, skips the first 100 records, emits the rest:

  abtail -i csv://foo.csv -n +100

Sorting

  absort -i csv://foo.csv -f Last_Name,First_Name

Joining/Merging

TBD

Expressions have access to the record number in the stream (lnum), the array of field values (rec) and each of the declared fields.

Modify a source: add columns, apply an expression to records.

Common Command Line Options

-tmp /path/to/tmp

Specfiy an alternatvie temp directory to use.

URL Encoding

Schema

Schema correlates to the driver / file encoding.

Path

Path correlates to the file path to read from or write to. You can interaact with stdin/stdout on Unix by specifying file paths of /dev/stdin and /dev/stdout respectively. Drivers may interpret these differently, ex: database drivers would interpret the first parts of the path as the database host, schema and table name.

Query String

These are for common options and driver specific options.

Common Options

header=f1,f2,f3

Specify the header for a file if it does not have one of its own.

-header

Indicates the source has no header and that one should be fabricated (F1, F2, F3, …). This is assumed if a header is supplied, and can be used to strip the header off of a destination file.

skipLines=N

Indicates to skip the first N lines of the source.

Supported Encoding Formats

tab

There are no custom options supported for the tab driver.

csv

There are no custom options supported for the csv driver.

fixed

The fixed width driver can be used to read or write fixed width file formats. It supports the following options:

xls

TBD

pg

NB: The pg driver supports read operations but does not support write operations.

abcat -i “pg://localhost:5432/database_name/table_name?order=somecol&limit=10&offset=0&user=pg_username&password=pg_password” -o csv://table.csv

mysql

TBD

Examples

  abcat -i "tab://fixtures/galbithink.org/92f10-19.tab?skipLines=4&header=Name,Count"
  abcat -i "tab://fixtures/galbithink.org/92f10-19.tab?skipLines=4&header=Name,Count" -o csv:///dev/stdin
  abfillrates -i "tab://fixtures/galbithink.org/92f10-19.tab?skipLines=4&header=Name,Count"

  abgrep -e 'Substr(Name,-1,0) == "y"' -i "tab://fixtures/galbithink.org/92f10-19.tab?skipLines=4&header=Name,Count" 2>&1 | less

TODO:

DONE implement an installer, and cmdline wrappers (abcat, abview, abgrep, …)
documentation: packages, apis, examples, command line utilities
DONE implement driver interface and types
- DONE driver type/struct
- DONE url parsing
- DONE driver options
DONE implement tab driver
DONE implement csv driver
DONE implement abcat utility
implement support for memo fields in tab driver (embedded newlines and tabs, escape on input, unescape on output)
DONE implement abcut utility
DONE implement abhead utility
DONE implement abtail utility
DONE implement abview utility
DONE implement abgrep utility
- DONE fork https://github.com/sbinet/go-eval
  - DONE make all necessary types public, or otherwise determine how to inject variables into the interpreter
  - DONE embed the go-eval interpreter into grep in order to evaluate expressions per record
DONE re-write grep’s expression fn as a stream filter
extend the ‘standard library’ for abgrep’s expression language
- what does this need to be useful?
DONE abgrep: implement Rand() functions
implement pg driver
implement absort utility
implement abmod driver
implement fixed driver
implement mysql driver

Fill Rates:

Implement a cardinality checker.

Support both a strict uniqueness count (requires keeping the entire file in
ram), and estimation of cardinatlity (using a bloom filter or hyper-log-log)

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
cmd/abtab		cmd/abtab
fixtures		fixtures
pkg/abtab		pkg/abtab
scripts		scripts
tests		tests
.gitignore		.gitignore
Bakefile		Bakefile
README.textile		README.textile
build.sh		build.sh
install.sh		install.sh
local-build.sh		local-build.sh

kyleburton/go-abtab

Folders and files

Latest commit

History

Repository files navigation

Abstract Table Utilities

Installation

Overview

Common Command Line Options

URL Encoding

Schema

Path

Query String

Common Options

Supported Encoding Formats

tab

csv

fixed

xls

pg

mysql

Examples

About

Resources

Stars

Watchers

Forks

Languages