
Welcome to

Keyaki-aid Tools

What?

These tools assist with building the Keyaki Treebank by providing methods to access, (re-)process, and visualise parsed data.

The tools are distributed freely.

How?

The tools are all command line scripts, written with sed, bash, gawk, and perl.

Acknowledgements

Development is funded by the project "Development of and Linguistic Research with a Parsed Corpus of Japanese" of the National Institute for Japanese Language and Linguistics.

Feedback

Feedback is extremely welcome. Please email: ajb129 __AT__ hotmail __DOT__ com.



Manual pages

    charactertree_to_tree(1) - transform tree
    csearch_collect(1) - collect parse data
    csearch_fix_numbering(1) - (re)number parse data
    csearch_flatten(1) - format trees
    csearch_unflatten(1) - format trees
    grammar_tags(1) - extract tag list
    inline_to_slice(1) - convert format
    Keyaki(1) - access data
    Keyaki_grep(1) - grep content of Keyaki treebank
    Keyaki_location(1) - print folder name
    obfuscate_to_tree(1) - transform tree
    slice_to_inline(1) - convert format
    slice_to_tree(1) - output tree from vertical slices
    tnt_character(1) - split tnt analysis on characters
    tree_to_charactertree(1) - transform tree
    tree_to_obfuscate(1) - transform tree
    tree_to_slice(1) - slice tree vertically

In the examples, munge-trees by Mark Johnson is used to reformat trees. Anything else exceptional is noted.



charactertree_to_tree(1)

NAME

charactertree_to_tree - transform tree

SYNOPSIS

charactertree_to_tree

DESCRIPTION

Filter to collapse the nodes of a character tree.

OPTIONS

EXAMPLE

$ cat << EOF | charactertree_to_tree | munge-trees
> ( (IP-MAT (PP (NP (N (+ 授)
>                      (+ 業)))
>               (P (+ が)))
>           (NP-SBJ *が*)
>           (VB (+ 終)
>               (+ わ)
>               (+ る))
>           (PU (+ 。)))
>   (ID 7_textbook_kisonihongo;page_13;JP))
> EOF
-| ( (IP-MAT (PP (NP (N 授業))
-|               (P が))
-|           (NP-SBJ *が*)
-|           (VB 終わる)
-|           (PU 。))
-|   (ID 7_textbook_kisonihongo;page_13;JP))
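
The collapsing step can be pictured as two substitutions on a tree that has already been put on one line. The following is a minimal sketch, not the implementation of charactertree_to_tree: the function name collapse_character_nodes is hypothetical, and it assumes that character nodes always have the form (+ c) and only occur next to other character nodes.

```shell
# Hypothetical sketch; not the implementation of charactertree_to_tree.
# Assumes one tree per line and character nodes of the form "(+ c)".
collapse_character_nodes() {
  # First expression: merge adjacent character nodes into one node.
  # Second expression: drop the remaining "(+ ...)" wrapper.
  sed -E \
    -e 's/\) \(\+ //g' \
    -e 's/\(\+ ([^()]*)\)/\1/g'
}

printf '%s\n' '(VB (+ 終) (+ わ) (+ る))' | collapse_character_nodes
# prints: (VB 終わる)
```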

SEE ALSO

tree_to_charactertree(1)



csearch_collect(1)

NAME

csearch_collect - collect parse data

SYNOPSIS

csearch_collect [OPTIONS]

DESCRIPTION

Sends the content of all of the CorpusSearch files (default file extension: .psd) of the current directory to stdout.

OPTIONS

--filter causes program to act as a filter, reading tree input from stdin
[A-Za-z0-9][A-Za-z0-9]* is used as the extension name for gathered CorpusSearch files
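
The default behaviour described above amounts to concatenating every matching file of the current directory. A minimal sketch, assuming the extension is passed as the first argument; the function name collect_parse_files is hypothetical and this is not the implementation of csearch_collect:

```shell
# Hypothetical sketch; not the implementation of csearch_collect.
# Sends every file of the current directory with the given extension
# (psd when none is given) to stdout.
collect_parse_files() {
  ext=${1:-psd}
  for f in ./*."$ext"; do
    # Skip the unexpanded pattern when nothing matches.
    [ -f "$f" ] || continue
    cat "$f"
  done
}
```

Calling collect_parse_files with no argument gathers the .psd files; collect_parse_files txt would gather .txt files instead.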


csearch_fix_numbering(1)

NAME

csearch_fix_numbering - (re)number parse data

SYNOPSIS

csearch_fix_numbering [i]

DESCRIPTION

Filter to (re)number CorpusSearch parse data from stdin.

Numbering starts from i if given.



csearch_flatten(1)

NAME

csearch_flatten - format trees

SYNOPSIS

csearch_flatten

DESCRIPTION

Transforms CorpusSearch files of the current directory so each tree occupies one line.

BE AWARE: there is no backup to return to original file states.
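
To preview what one tree per line looks like without touching any file, a stdin-to-stdout filter can be used. The following is a rough sketch under stated assumptions, not the implementation of csearch_flatten: the function name flatten_trees is hypothetical, and a tree is taken to end on the line where the bracket count returns to zero, so brackets used as words would confuse it.

```shell
# Hypothetical sketch; not the implementation of csearch_flatten.
# Reads trees from stdin and writes each tree on one line to stdout.
flatten_trees() {
  awk '
    {
      line = $0
      sub(/^[ \t]+/, "", line)          # drop indentation
      if (line == "") next              # ignore blank lines
      buf = (buf == "" ? line : buf " " line)
      # gsub returns the number of matches, giving the bracket counts.
      opens  = gsub(/\(/, "(", line)
      closes = gsub(/\)/, ")", line)
      depth += opens - closes
      if (depth == 0) { print buf; buf = "" }
    }
    END { if (buf != "") print buf }
  '
}
```

Because the sketch works as a filter, the original file stays unchanged, for example: flatten_trees < file.psd > file.flat.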



csearch_unflatten(1)

NAME

csearch_unflatten - format trees

SYNOPSIS

csearch_unflatten

DESCRIPTION

Transforms CorpusSearch files of the current directory so each tree is possibly (re)numbered and occupies multiple lines.

BE AWARE: there is no backup to return to original file states.



grammar_tags(1)

NAME

grammar_tags - extract tag list

SYNOPSIS

grammar_tags

DESCRIPTION

Finds tags from a list of rules generated by munge-trees -c.



inline_to_slice(1)

NAME

inline_to_slice - convert format

SYNOPSIS

inline_to_slice

DESCRIPTION

Filter to convert inline tagged constituent information into slice format.

OPTIONS

--example)show an example
*)show this help message

EXAMPLE

$ cat << EOF | inline_to_slice
> 輪_N@0:0_NP@0:1_PP@0:2_IP-MAT@0:3 が_P@1:0_PP@0:2_IP-MAT@0:3 *が*_NP-SBJ@2:0_IP-MAT@0:3 回る_VB@3:0_IP-MAT@0:3 。_PU@4:0_IP-MAT@0:3 ID_1_textbook_kisonihongo;page_13;AT1-1;JP
> EOF
-|  IP-MAT@0:3 PP@0:2 NP@0:1 N@0:0 輪
-|  IP-MAT@0:3 PP@0:2 P@1:0 が
-|  IP-MAT@0:3 NP-SBJ@2:0 *が*
-|  IP-MAT@0:3 VB@3:0 回る
-|  IP-MAT@0:3 PU@4:0 。
-|  ID 1_textbook_kisonihongo;page_13;AT1-1;JP
-| 
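
Judging from the example, each inline token lists the word followed by its labels from the bottom up, joined with underscores, and the slice format reverses that order. A minimal sketch of the conversion, assuming that words and labels never contain an underscore themselves; the function name inline_to_slice_sketch is hypothetical and this is not the implementation of inline_to_slice:

```shell
# Hypothetical sketch; not the implementation of inline_to_slice.
# Each input line holds the tokens of one tree.
inline_to_slice_sketch() {
  awk '{
    for (t = 1; t <= NF; t++) {
      # The ID token keeps everything after "ID_" intact.
      if ($t ~ /^ID_/) { print " ID " substr($t, 4); continue }
      n = split($t, part, "_")
      line = ""
      # part[1] is the word; part[2..n] are labels from the bottom up.
      for (k = n; k >= 2; k--) line = line " " part[k]
      print line " " part[1]
    }
    print ""                            # blank line ends the tree
  }'
}
```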

SEE ALSO

slice_to_inline(1)



Keyaki(1)

NAME

Keyaki - access data

SYNOPSIS

Keyaki
Keyaki number
Keyaki number number
Keyaki word
Keyaki word number
Keyaki number number -v
Keyaki word number -v

DESCRIPTION

Allows easy access to the content of all Keyaki files.

No arguments returns LIST, a list of all files of Keyaki preceded by a number.

With number n given as the only argument, all content of the n-th file of LIST is returned.

With two numbers n and m given as the only arguments, the m-th tree of the n-th file of LIST is returned.

A word given as the only argument returns all content of the file(s) whose name(s) (partially) match word.

A word and number m given as the only arguments return the m-th tree of the file(s) whose name(s) (partially) match word.

Including the -v flag with either two numbers or a word and number opens an editor at the tree that would have been returned without the -v flag.

OPTIONS

--count|-count)with number of trees counts
--overview|-overview)overview mode
[0-9]*)number
-v)edit example with vim
-*)show this help message
*)collect word to grep


Keyaki_grep(1)

NAME

Keyaki_grep - grep content of Keyaki treebank

SYNOPSIS

Keyaki_grep OPTIONS
Keyaki_grep word
Keyaki_grep word word ...

DESCRIPTION

Grep the word content of the Keyaki treebank. Words are returned together with reference information for accessing the corresponding tree.

OPTIONS

--realupdate)update database locally (this takes time ...)
--update)update database from github
--html)create html output
--text)create text output
--genre)specify a genre
--full)show full slice information
--fine)show more slice information
--keep)keep node position information
--part*)show parts output
--count*)count parts
--flame*)flame graph parts
--fragment*)show parse fragments as output
--undecorat*)show parse fragments undecorated
--pattern*)show patterns with parse fragments
--greedy)n-th greedy search (n intervening characters between segmented characters); must be followed by a number
--mine)use supplied space separated segmentation (default is character segmentation)
--comainu)use segmentation supplied by Comainu (default is character segmentation)
--liberal)allow overlaps with the segmentation (default is to be strict, but only at the far left and far right edges)
--strict)use supplied segmentation exactly (defaults to --mine segmentation, which can be overridden with an explicit --comainu)
--id)output ID list for trees found
-*)show this help message
*)collect words to grep


Keyaki_location(1)

NAME

Keyaki_location - print folder name

SYNOPSIS

Keyaki_location

DESCRIPTION

Sends the name of the folder that contains the Keyaki Treebank to stdout.

This command has no options.



obfuscate_to_tree(1)

NAME

obfuscate_to_tree - transform tree

SYNOPSIS

obfuscate_to_tree filename

DESCRIPTION

Filter to add words to an obfuscated tree. Information on the words to add comes from a file given as a required argument that has one character per line.

OPTIONS

--number)number all characters
--example)show an example
-*)show this help message

EXAMPLE

$ cat WORDS
-| 授
-| 業
-| が
-| 終
-| わ
-| る
-| 。
$ cat << EOF | obfuscate_to_tree WORDS | munge-trees -p
> ( (IP-MAT (PP (NP (N ⛔⛔))
>               (P ⛔))
>           (NP-SBJ *が*)
>           (VB ⛔⛔⛔)
>           (PU ⛔))
>   (ID 7_textbook_kisonihongo;page_13;JP))
> EOF
-| ( (IP-MAT (PP (NP (N 授業))
-|               (P が))
-|           (NP-SBJ *が*)
-|           (VB 終わる)
-|           (PU 。))
-|   (ID 7_textbook_kisonihongo;page_13;JP))
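
The restoration can be pictured as replacing each ⛔ in turn with the next line of the words file. A minimal sketch of that idea, not the implementation of obfuscate_to_tree: the function name fill_obfuscated_tree is hypothetical, and a non-empty words file with one character per line is assumed.

```shell
# Hypothetical sketch; not the implementation of obfuscate_to_tree.
# $1 is a file with one character per line; the tree comes from stdin.
fill_obfuscated_tree() {
  awk -v mark='⛔' '
    # First input (the words file): remember every line in order.
    NR == FNR { word[++n] = $0; next }
    # Second input (stdin): swap each mark for the next stored line.
    {
      rest = $0; out = ""
      while ((p = index(rest, mark)) > 0) {
        out = out substr(rest, 1, p - 1) word[++used]
        rest = substr(rest, p + length(mark))
      }
      print out rest
    }
  ' "$1" -
}
```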

SEE ALSO

tree_to_obfuscate(1)



slice_to_inline(1)

NAME

slice_to_inline - convert format

SYNOPSIS

slice_to_inline

DESCRIPTION

Filter to convert slice format constituent information into inline tagged format.

OPTIONS

--example)show an example
*)show this help message

EXAMPLE

$ cat << EOF | slice_to_inline
>  IP-MAT@0:3 PP@0:2 NP@0:1 N@0:0 輪
>  IP-MAT@0:3 PP@0:2 P@1:0 が
>  IP-MAT@0:3 NP-SBJ@2:0 *が*
>  IP-MAT@0:3 VB@3:0 回る
>  IP-MAT@0:3 PU@4:0 。
>  ID 1_textbook_kisonihongo;page_13;AT1-1;JP
> 
> EOF
-| 輪_N@0:0_NP@0:1_PP@0:2_IP-MAT@0:3 が_P@1:0_PP@0:2_IP-MAT@0:3 *が*_NP-SBJ@2:0_IP-MAT@0:3 回る_VB@3:0_IP-MAT@0:3 。_PU@4:0_IP-MAT@0:3 ID_1_textbook_kisonihongo;page_13;AT1-1;JP
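
Going by the example, every slice becomes one token: the word first, then its labels from the bottom up, joined with underscores, with a blank line closing the tree. A minimal sketch under the assumption that words contain no spaces; the function name slice_to_inline_sketch is hypothetical and this is not the implementation of slice_to_inline:

```shell
# Hypothetical sketch; not the implementation of slice_to_inline.
# Slices of one tree are joined into one output line.
slice_to_inline_sketch() {
  awk '
    # A blank line ends the current tree.
    NF == 0 { if (out != "") print out; out = ""; next }
    {
      if ($1 == "ID") {
        tok = "ID_" $2
      } else {
        # Last field is the word; the labels follow in reverse order.
        tok = $NF
        for (k = NF - 1; k >= 1; k--) tok = tok "_" $k
      }
      out = (out == "" ? tok : out " " tok)
    }
    END { if (out != "") print out }
  '
}
```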

SEE ALSO

inline_to_slice(1)



slice_to_tree(1)

NAME

slice_to_tree - output tree from vertical slices

SYNOPSIS

slice_to_tree [OPTIONS]

DESCRIPTION

Filter to return a bracketed tree from tree information given as vertical slices. A blank line of the input signals creation of a new tree.

OPTIONS

--top|-top)output with TOP as root
--example)show an example
*)show this help message

EXAMPLES

$ cat << EOF | slice_to_tree | munge-trees -p
> TOP IP-MAT NP-SBJ D The
> TOP IP-MAT NP-SBJ N gostak
> TOP IP-MAT VP VBP distims
> TOP IP-MAT VP NP-OB1 D the
> TOP IP-MAT VP NP-OB1 N doshes
> TOP IP-MAT . .
> 
> EOF
-| (TOP (IP-MAT (NP-SBJ (D The)
-|                      (N gostak))
-|              (VP (VBP distims)
-|                  (NP-OB1 (D the)
-|                          (N doshes)))
-|              (. .)))
$ cat << EOF | slice_to_tree --top | munge-trees -p
> can have :ARG0 mankind
> can have :ARG1 noblest cause
> can have :AT stake
> fight :ARG0 they
> fight :FOR freedom
> undertake :ARG0 they
> undertake :ARG1 noblest cause
> 
> EOF
-| (can (have (:ARG0 mankind)
-|            (:ARG1 (noblest cause))
-|            (:AT stake)))
-| (fight (:ARG0 they)
-|        (:FOR freedom))
-| (undertake (:ARG0 they)
-|            (:ARG1 (noblest cause)))
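
One way to rebuild brackets from slices is to compare each slice with the previous one: labels shared at the start stay open, the rest of the previous path is closed, and the new labels are opened. The sketch below follows that idea and prints each tree on one line. It is not the implementation of slice_to_tree: the function name slice_to_tree_sketch is hypothetical, every slice is assumed to have at least one label before the word, and a slice that shares no label with the previous one is treated as the start of a new tree.

```shell
# Hypothetical sketch; not the implementation of slice_to_tree.
slice_to_tree_sketch() {
  awk '
    function close_down_to(d) {
      while (depth > d) { out = out ")"; depth-- }
    }
    function flush() {
      close_down_to(0)
      if (out != "") print out
      out = ""
    }
    NF == 0 { flush(); next }           # blank line: finish the tree
    {
      # c = number of leading labels shared with the open path.
      c = 0
      while (c < depth && c < NF - 1 && path[c + 1] == $(c + 1)) c++
      if (c == 0) flush(); else close_down_to(c)
      # Open the labels that are new on this slice.
      for (k = c + 1; k < NF; k++) {
        out = out (out == "" ? "" : " ") "(" $k
        path[k] = $k
        depth = k
      }
      # Attach the word and close its node.
      out = out " " $NF ")"
      depth--
    }
    END { flush() }
  '
}
```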

SEE ALSO

tree_to_slice(1)



tnt_character(1)

NAME

tnt_character - split tnt analysis on characters

SYNOPSIS

tnt_character

DESCRIPTION

Filter to split tnt analysis on characters.

OPTIONS

--example)show an example
*)show this help message

EXAMPLE

$ cat << EOF | tnt_character
> 授業	N
> が	P
> 終わる	VB
> 。	PU
> EOS
> EOF
-| 授	<N
-| 業	N>
-| が	<P>
-| 終	<VB
-| わ	VB
-| る	VB>
-| 。	<PU>
-| EOS

SEE ALSO

tree_to_tnt(1)



tree_to_charactertree(1)

NAME

tree_to_charactertree - transform tree

SYNOPSIS

tree_to_charactertree

DESCRIPTION

Filter to transform a tree so that its terminals are split into characters.

OPTIONS

--number)number all characters
--example)show an example
*)show this help message

EXAMPLE

$ cat << EOF | tree_to_charactertree | munge-trees -p
> ( (IP-MAT (PP (NP (N 授業))
>               (P が))
>           (NP-SBJ *が*)
>           (VB 終わる)
>           (PU 。))
>   (ID 7_textbook_kisonihongo;page_13;AT1-7;JP))
> EOF
-| ( (IP-MAT (PP (NP (N (+ 授)
-|                      (+ 業)))
-|               (P (+ が)))
-|           (NP-SBJ *が*)
-|           (VB (+ 終)
-|               (+ わ)
-|               (+ る))
-|           (PU (+ 。)))
-|   (ID 7_textbook_kisonihongo;page_13;AT1-7;JP))

SEE ALSO

tree_to_tnt(1)



tree_to_obfuscate(1)

NAME

tree_to_obfuscate - transform tree

SYNOPSIS

tree_to_obfuscate

DESCRIPTION

Filter to obfuscate a tree by removing its words.

OPTIONS

--number)number all characters
--example)show an example
*)show this help message

EXAMPLE

$ cat << EOF | tree_to_obfuscate | munge-trees -p
> ( (IP-MAT (PP (NP (N 授業))
>               (P が))
>           (NP-SBJ *が*)
>           (VB 終わる)
>           (PU 。))
>   (ID 7_textbook_kisonihongo;page_13;AT1-7;JP))
> EOF
-| ( (IP-MAT (PP (NP (N ⛔⛔))
-|               (P ⛔))
-|           (NP-SBJ *が*)
-|           (VB ⛔⛔⛔)
-|           (PU ⛔))
-|   (ID 7_textbook_kisonihongo;page_13;AT1-7;JP))

SEE ALSO

obfuscate_to_tree(1)



tree_to_slice(1)

NAME

tree_to_slice - slice tree vertically

SYNOPSIS

tree_to_slice [OPTIONS]

DESCRIPTION

Filter to return all vertical slices of an input tree.

Requires Python with the NLTK library (http://www.nltk.org) to work.

OPTIONS

--top|-top)prepend TOP to each line of output
--example)show an example
*)show this help message

EXAMPLES

$ cat << EOF | tree_to_slice --top
> (IP-MAT (NP-SBJ (D The)
>                 (N gostak))
>         (VP (VBP distims)
>             (NP-OB1 (D the)
>                     (N doshes)))
>         (. .))
> EOF
-| TOP IP-MAT NP-SBJ D The
-| TOP IP-MAT NP-SBJ N gostak
-| TOP IP-MAT VP VBP distims
-| TOP IP-MAT VP NP-OB1 D the
-| TOP IP-MAT VP NP-OB1 N doshes
-| TOP IP-MAT . .
-| 
$ cat << EOF | tree_to_slice
> (TOP (can (have (:ARG0 mankind)
>                 (:ARG1 (noblest cause))
>                 (:AT stake)))
>      (fight (:ARG0 they)
>             (:FOR freedom))
>      (undertake (:ARG0 they)
>                 (:ARG1 (noblest cause))))
> EOF
-| TOP can have :ARG0 mankind
-| TOP can have :ARG1 noblest cause
-| TOP can have :AT stake
-| TOP fight :ARG0 they
-| TOP fight :FOR freedom
-| TOP undertake :ARG0 they
-| TOP undertake :ARG1 noblest cause
-| 
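
The slicing itself can be pictured as a walk over the brackets that keeps a stack of open labels and prints the stack whenever a word is reached. The sketch below does this with awk instead of Python and NLTK, so it is not the implementation of tree_to_slice: the function name tree_to_slice_sketch is hypothetical, no TOP is added, and brackets are assumed never to occur as words.

```shell
# Hypothetical sketch; not the implementation of tree_to_slice.
# Prints one line per word: the labels above it, then the word.
tree_to_slice_sketch() {
  awk '
    {
      # Put spaces around brackets so that they become tokens.
      gsub(/\(/, " ( "); gsub(/\)/, " ) ")
      n = split($0, tok, " ")
      for (k = 1; k <= n; k++) {
        t = tok[k]
        if (t == "(") {
          depth++; label[depth] = ""; want = 1
        } else if (t == ")") {
          depth--; want = 0
          if (depth == 0) print ""      # blank line after each tree
        } else if (want) {
          label[depth] = t; want = 0    # first token after "(" is the label
        } else {
          line = ""
          for (d = 1; d <= depth; d++)
            if (label[d] != "") line = line label[d] " "
          print line t
        }
      }
    }
  '
}
```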

SEE ALSO

slice_to_tree(1)



Last updated: October 26, 2017