
Welcome to

Treebank Utilities

What?

These utilities assist in building treebanks by providing methods to access, (re-)process, or visualise parsed data. This includes ways to process trees for alpinocorpus, dact, TGrep2, Tregex, CorpusSearch, TigerSearch, and Salto.

How?

The tools are all command line scripts, written with sed, bash, gawk (with XML library extension), xsltproc, Python (with NLTK and lxml), munge-trees, and tregex.

Acknowledgements

Development is funded by the project "Development of and Linguistic Research with a Parsed Corpus of Japanese" of the National Institute for Japanese Language and Linguistics and the Japan Society for the Promotion of Science (JSPS). Earlier development was funded by the Japan Science and Technology Agency (JST) and an NTT agreement dated 06/24/2014.

Feedback

Feedback is extremely welcome. Please email: ajb129 __AT__ hotmail __DOT__ com.



Manual pages

    alpino_to_html(1) - render Alpino XML parse
    alpino_to_parse(1) - transform parse
    alpino_to_pattern(1) - create search pattern
    alpino_to_svg(1) - output SVG image
    alpino_to_tiger(1) - transform Alpino XML to TigerXML
    charactertree_to_tree(1) - transform tree
    entr_tree(1) - show up-to-date tree
    extract_data(1) - access parsed data
    inline_to_slice(1) - convert format
    multi-sentence_to_single(1) - change multi-sentence trees
    obfuscate_to_tree(1) - transform tree
    parse_binarize(1) - modify treebank data
    parse_decorate(1) - filter to change tree
    parse_discourse_split(1) - creates filter program for discourse splits
    parse_to_alpino(1) - transform to Alpino XML format
    parse_undecorate(1) - remove node decorations
    select_data(1) - returns i-th (to j-th) data
    slice_to_inline(1) - convert format
    slice_to_tree(1) - output tree from vertical slices
    table_to_tree(1) - convert tabular information to trees
    tgrep_to_xpath(1) - transform TGrep to XPath
    tiger_to_dot(1) - transform TIGER XML parse
    tnt_character(1) - split tnt analysis on characters
    tree_clip_fragment(1) - prune tree structure
    tree_to_charactertree(1) - transform tree
    tree_to_obfuscate(1) - transform tree
    tree_to_slice(1) - slice tree vertically
    tree_to_table(1) - convert trees to tabular format
    tree_to_tnt(1) - convert trees to TnT format
    treetee(1) - tee data
    tregex(1) - wrapper script for stanford-tregex
    tregex_location(1) - print tregex location
    tsurgeon_script(1) - pipeline tsurgeon
    tsurgeon_script_animate(1) - animate tsurgeon changes
    yield(1) - obtain yield

In the examples, munge-trees by Mark Johnson is used to reformat trees. Anything else exceptional is noted.



alpino_to_html(1)

NAME

alpino_to_html - render Alpino XML parse

SYNOPSIS

alpino_to_html [OPTIONS]

DESCRIPTION

Filter to transform Alpino XML parsed data from stdin into SVG-rendered trees that are included in an HTML page sent to stdout. Also produces the auxiliary files tooltip.css and tooltip.js, which are needed to render the HTML output successfully.

OPTIONS

--css|--js) make css and js files
--yield) include yield
--search) include search information
--id) include id information as h3 level heading
*) show this help message

SEE ALSO

parse_to_alpino(1)



alpino_to_parse(1)

NAME

alpino_to_parse - transform parse

SYNOPSIS

alpino_to_parse [OPTIONS]

DESCRIPTION

Filter to transform a parse from Alpino XML format into Penn bracketed format.

OPTIONS

--example) show an example
*) show this help message

EXAMPLE

$ cat << EOF | alpino_to_parse
> <alpino_ds id="23_BUFFALO;EN" version="1.3">
>   <node cat="ip-mat" id="1" begin="0" end="4">
>     <node cat="np-sbj" id="2" begin="0" end="2">
>       <node pt="d" word="A" id="3" begin="0" end="1"/>
>       <node pt="n" word="cat" id="4" begin="1" end="2"/>
>     </node>
>     <node pt="vbp" word="enters" id="5" begin="2" end="3"/>
>     <node pt="." word="." id="6" begin="3" end="4"/>
>   </node>
>   <sentence>A cat enters .</sentence>
> </alpino_ds>
> EOF
-| ( (IP-MAT (NP-SBJ (D A) (N cat)) (VBP enters) (. .)) (ID 23_BUFFALO;EN))
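
The mapping in the example above can be sketched in a few lines of Python. This is only an illustration of the transformation, not the implementation of alpino_to_parse: the function name alpino_to_penn is invented here, and the sketch assumes that phrasal labels come from the cat attribute, terminal tags from pt, and words from word, all as seen in the example.

```python
import xml.etree.ElementTree as ET


def alpino_to_penn(xml_text):
    """Convert one <alpino_ds> document to a Penn-style bracketed string."""
    root = ET.fromstring(xml_text)

    def bracket(node):
        children = node.findall("node")
        if children:
            # Phrasal node: label taken from @cat, upper-cased.
            inner = " ".join(bracket(child) for child in children)
            return "(%s %s)" % (node.get("cat").upper(), inner)
        # Terminal node: tag taken from @pt, word from @word.
        return "(%s %s)" % (node.get("pt").upper(), node.get("word"))

    tree = bracket(root.find("node"))
    # Wrap the tree together with an ID node, as in the example output.
    return "( %s (ID %s))" % (tree, root.get("id"))


SAMPLE = """<alpino_ds id="23_BUFFALO;EN" version="1.3">
  <node cat="ip-mat" id="1" begin="0" end="4">
    <node cat="np-sbj" id="2" begin="0" end="2">
      <node pt="d" word="A" id="3" begin="0" end="1"/>
      <node pt="n" word="cat" id="4" begin="1" end="2"/>
    </node>
    <node pt="vbp" word="enters" id="5" begin="2" end="3"/>
    <node pt="." word="." id="6" begin="3" end="4"/>
  </node>
  <sentence>A cat enters .</sentence>
</alpino_ds>"""

print(alpino_to_penn(SAMPLE))
```

The sketch relies on document order of the node elements; it does not reorder children by their begin and end attributes.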

SEE ALSO

parse_to_alpino(1)



alpino_to_pattern(1)

NAME

alpino_to_pattern - create search pattern

SYNOPSIS

alpino_to_pattern [OPTIONS]

DESCRIPTION

Filter to take an Alpino XML parsed tree from stdin and create a search pattern that is able to find the tree. Options allow changing the language in which patterns are written. The default is Alpino XML xpath.

OPTIONS

--xpath) create xpath patterns
--tregex) create tregex patterns
--tgrep2) create tgrep2 patterns
--tgreplite) create tgrep2 patterns
--csearch) create CorpusSearch patterns
--tiger) create TIGERSearch patterns
--example) show examples
*) show this help message

EXAMPLES

$ cat << EOF | alpino_to_pattern
> <node cat="ip-mat" id="1" begin="0" end="4">
>   <node cat="advp" id="2" begin="0" end="1">
>     <node pt="adv" word="__" id="3" begin="0" end="1"/>
>   </node>
>   <node cat="advp" id="4" begin="1" end="2">
>     <node pt="adv" word="__" id="5" begin="1" end="2"/>
>   </node>
>   <node pt="ax" word="で" id="6" begin="2" end="3"/>
>   <node pt="vb2" word="__" id="7" begin="3" end="4"/>
>   <sentence>__ __ で __</sentence>
> </node>
> EOF
-| //node[matches(@cat,'(^|\W)ip-mat($|\W)') and node[matches(@cat,'(^|\W)advp($|\W)') and node[matches(@pt,'(^|\W)adv($|\W)')] and number(@end) <= ../node[matches(@cat,'(^|\W)advp($|\W)') and node[matches(@pt,'(^|\W)adv($|\W)')] and number(@end) <= ../node[matches(@pt,'(^|\W)ax($|\W)') and @word='で' and number(@end) <= ../node[matches(@pt,'(^|\W)vb2($|\W)')]/number(@begin)]/number(@begin)]/number(@begin)] and node[matches(@cat,'(^|\W)advp($|\W)') and node[matches(@pt,'(^|\W)adv($|\W)')] and number(@end) <= ../node[matches(@pt,'(^|\W)ax($|\W)') and @word='で' and number(@end) <= ../node[matches(@pt,'(^|\W)vb2($|\W)')]/number(@begin)]/number(@begin)] and node[matches(@pt,'(^|\W)ax($|\W)') and @word='で' and number(@end) <= ../node[matches(@pt,'(^|\W)vb2($|\W)')]/number(@begin)] and node[matches(@pt,'(^|\W)vb2($|\W)')]]
$ cat << EOF | alpino_to_pattern --tregex
> <node cat="ip-mat" id="1" begin="0" end="4">
>   <node cat="advp" id="2" begin="0" end="1">
>     <node pt="adv" word="__" id="3" begin="0" end="1"/>
>   </node>
>   <node cat="advp" id="4" begin="1" end="2">
>     <node pt="adv" word="__" id="5" begin="1" end="2"/>
>   </node>
>   <node pt="ax" word="で" id="6" begin="2" end="3"/>
>   <node pt="vb2" word="__" id="7" begin="3" end="4"/>
>   <sentence>__ __ で __</sentence>
> </node>
> EOF
-| /IP-MAT\b/ < (/ADVP\b/ < (/ADV\b/ < __) $.. (/ADVP\b/ < (/ADV\b/ < __) $.. (/AX\b/ < /で\b/ $.. (/VB2\b/ < __))))
$ cat << EOF | alpino_to_pattern --csearch
> <node cat="ip-mat" id="1" begin="0" end="4">
>   <node cat="advp" id="2" begin="0" end="1">
>     <node pt="adv" word="__" id="3" begin="0" end="1"/>
>   </node>
>   <node cat="advp" id="4" begin="1" end="2">
>     <node pt="adv" word="__" id="5" begin="1" end="2"/>
>   </node>
>   <node pt="ax" word="で" id="6" begin="2" end="3"/>
>   <node pt="vb2" word="__" id="7" begin="3" end="4"/>
>   <sentence>__ __ で __</sentence>
> </node>
> EOF
-| query: ([1]IP-MAT* exists)
-|    AND ([1]IP-MAT* iDominates [2]ADVP*)
-|    AND ([2]ADVP* iDominates [3]ADV*)
-|    AND ([3]ADV* iDominates .*)
-|    AND ([2]ADVP* Precedes [4]ADVP*)
-|    AND ([4]ADVP* iDominates [5]ADV*)
-|    AND ([5]ADV* iDominates .*)
-|    AND ([4]ADVP* Precedes [6]AX*)
-|    AND ([6]AX* iDominates で)
-|    AND ([6]AX* Precedes [7]VB2*)
-|    AND ([7]VB2* iDominates .*)


alpino_to_svg(1)

NAME

alpino_to_svg - output SVG image

SYNOPSIS

alpino_to_svg [OPTIONS]

DESCRIPTION

Filter to output an SVG image of an input tree.

To see it working, send a tree into the following pipeline:

parse_indexed --iml --salsa --clean | add_zero --word | parse_to_alpino | alpino_to_svg --lines

OPTIONS

--html) embeddable into html
--line*) add lines to output
*) show this help message


alpino_to_tiger(1)

NAME

alpino_to_tiger - transform Alpino XML to TigerXML

SYNOPSIS

alpino_to_tiger [OPTIONS]

DESCRIPTION

Filter to transform Alpino XML parsed data from stdin into TigerXML parsed data.

OPTIONS

--name) use given name
--addhead) add head to headless output
--headless) go headless
--example) show an example
*) show this help message

EXAMPLE

$ cat << EOF | alpino_to_tiger
> <alpino_ds id="23_BUFFALO;EN" version="1.3">
>   <node cat="ip-mat" id="1" begin="0" end="4">
>     <node cat="np-sbj" id="2" begin="0" end="2">
>       <node pt="d" word="A" id="3" begin="0" end="1"/>
>       <node pt="n" word="cat" id="4" begin="1" end="2"/>
>     </node>
>     <node pt="vbp" word="enters" id="5" begin="2" end="3"/>
>     <node pt="." word="." id="6" begin="3" end="4"/>
>   </node>
>   <sentence>A cat enters .</sentence>
> </alpino_ds>
> EOF
-| <?xml version="1.0" encoding="UTF-8" standalone="no"?>
-| <corpus xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="./schema/TigerXML.xsd" id="BUFFALO">
-|   <head>
-|     <meta>
-|       <name>BUFFALO</name>
-|       <format>TigerXML</format>
-|     </meta>
-|     <annotation>
-|       <feature name="word" domain="T">
-|         <value name="."/>
-|         <value name="A"/>
-|         <value name="cat"/>
-|         <value name="enters"/>
-|       </feature>
-|       <feature name="pos" domain="T">
-|         <value name=".">.</value>
-|         <value name="D">D</value>
-|         <value name="N">noun</value>
-|         <value name="VBP">VBP</value>
-|       </feature>
-|       <feature name="cat" domain="NT">
-|         <value name="IP" />
-|         <value name="NP" />
-|         <value name="TOP" />
-|       </feature>
-|       <edgelabel>
-|         <value name="--" />
-|         <value name="MAT" />
-|         <value name="SBJ" />
-|       </edgelabel>
-|       <secedgelabel>
-|         <value name="--" />
-|       </secedgelabel>
-|     </annotation>
-|   </head>
-|   <body>
-|     <s id="23_BUFFALO">
-|       <graph root="23_BUFFALO_0">
-|         <terminals>
-|           <t id="23_BUFFALO_3" word="A" pos="D" />
-|           <t id="23_BUFFALO_4" word="cat" pos="N" />
-|           <t id="23_BUFFALO_5" word="enters" pos="VBP" />
-|           <t id="23_BUFFALO_6" word="." pos="." />
-|         </terminals>
-|         <nonterminals>
-|           <nt id="23_BUFFALO_0" cat="TOP">
-|             <edge idref="23_BUFFALO_1" label="MAT" />
-|           </nt>
-|           <nt id="23_BUFFALO_2" cat="NP">
-|             <edge idref="23_BUFFALO_3" label="--" />
-|             <edge idref="23_BUFFALO_4" label="--" />
-|           </nt>
-|           <nt id="23_BUFFALO_1" cat="IP">
-|             <edge idref="23_BUFFALO_2" label="SBJ" />
-|             <edge idref="23_BUFFALO_5" label="--" />
-|             <edge idref="23_BUFFALO_6" label="--" />
-|           </nt>
-|         </nonterminals>
-|       </graph>
-|     </s>
-|   </body>
-| </corpus>

SEE ALSO

parse_to_alpino(1)



charactertree_to_tree(1)

NAME

charactertree_to_tree - transform tree

SYNOPSIS

charactertree_to_tree

DESCRIPTION

Filter to collapse the nodes of a character tree.

OPTIONS

EXAMPLE

$ cat << EOF | charactertree_to_tree | munge-trees
> ( (IP-MAT (PP (NP (N (+ 授)
>                      (+ 業)))
>               (P (+ が)))
>           (NP-SBJ *が*)
>           (VB (+ 終)
>               (+ わ)
>               (+ る))
>           (PU (+ 。)))
>   (ID 7_textbook_kisonihongo;page_13;JP))
> EOF
-| ( (IP-MAT (PP (NP (N 授業))
-|               (P が))
-|           (NP-SBJ *が*)
-|           (VB 終わる)
-|           (PU 。))
-|   (ID 7_textbook_kisonihongo;page_13;JP))
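
As a rough illustration of the collapse shown above, the following Python sketch joins the characters of sibling (+ x) nodes back into one terminal. It is not the filter's own code: the function name is invented, and the sketch assumes that a node either has only (+ x) children or none, and that characters are never brackets or whitespace.

```python
import re

# A node whose children are all "(+ x)" nodes, e.g. "(VB (+ 終) (+ わ) (+ る))".
PLUS_PARENT = re.compile(r"\(([^\s()]+)((?:\s+\(\+ [^\s()]+\))+)\s*\)")
# A single character node, e.g. "(+ 終)".
PLUS_NODE = re.compile(r"\(\+ ([^\s()]+)\)")


def collapse_character_tree(text):
    """Replace every run of (+ x) children by one joined terminal."""
    def join(match):
        chars = PLUS_NODE.findall(match.group(2))
        return "(%s %s)" % (match.group(1), "".join(chars))
    return PLUS_PARENT.sub(join, text)


print(collapse_character_tree("(VB (+ 終) (+ わ) (+ る))"))
```

Because the pattern accepts any whitespace between the character nodes, the indented multi-line layout of the example input is handled as well.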

SEE ALSO

tree_to_charactertree(1)



entr_tree(1)

NAME

entr_tree - show up-to-date tree

SYNOPSIS

entr_tree [OPTIONS]

DESCRIPTION

Script to start draw-tree, showing the content of /home/glenda/tmp/tree. Whenever the content of /home/glenda/tmp/tree is changed to a new tree, the new tree is redrawn.

OPTIONS

--stop|-stop) stop the currently active entr_tree
-s|--size) set size of text
*) show this help message


extract_data(1)

NAME

extract_data - access parsed data

SYNOPSIS

extract_data dir
extract_data dir i
extract_data dir i j
extract_data dir pattern
extract_data dir pattern j

DESCRIPTION

Allows easy access to the content of all (CorpusSearch parsed) files with .psd extension in directory dir. Returned trees include ID tags by default.

Directory name dir must be supplied, optionally followed either by one or two numbers, i and j respectively, or by a pattern with or without a number j.

With no numbers or pattern, a numbered list of all the .psd files goes to stdout.

With i supplied, the content of the i-th .psd file goes to stdout.

With a pattern supplied, the content of all files with names that contain pattern goes to stdout.

With i and j supplied, the j-th tree of the i-th .psd file goes to stdout.

With pattern and j supplied, the j-th tree of each file with a name containing pattern goes to stdout.

j can also be a range of numbers. For example, the command

extract_data dir 2 3,7

will send to stdout trees 3 to 7 of the 2nd *.psd file in dir.

Selecting non-consecutive examples is also possible. For example,

extract_data dir 2 "3p;7"

will send to stdout trees 3 and 7 (but not trees 4-6) of the 2nd .psd file in dir.

Also,

extract_data dir 2 "3p;3p;3"

will send to stdout three instances of tree 3.

There are further options:

-v,--edit (for quick editing access)
-id,--id (to return the example id only)
-ptb,--ptb (to return the example with ID removed and given TOP as root node)

For example,

extract_data dir 4 6 --edit

will open an editor (vim) at the point of tree 6 in the 4th .psd file of dir.

This script requires munge-trees to work.

OPTIONS

--extension|-e) specify extension name of source data, e.g., --extension mrg (default psd)
--data|--dir) specify data location
--edit) echo edit number and file command
--show) echo location of file
--look) collect words
-*) show this help message
[0-9]*) collect numbers
*) collect words


inline_to_slice(1)

NAME

inline_to_slice - convert format

SYNOPSIS

inline_to_slice

DESCRIPTION

Filter to convert inline tagged constituent information into slice format.

OPTIONS

--example) show an example
*) show this help message

EXAMPLE

$ cat << EOF | inline_to_slice
> 輪_N@0:0_NP@0:1_PP@0:2_IP-MAT@0:3 が_P@1:0_PP@0:2_IP-MAT@0:3 *が*_NP-SBJ@2:0_IP-MAT@0:3 回る_VB@3:0_IP-MAT@0:3 。_PU@4:0_IP-MAT@0:3 ID_1_textbook_kisonihongo;page_13;AT1-1;JP
> EOF
-|  IP-MAT@0:3 PP@0:2 NP@0:1 N@0:0 輪
-|  IP-MAT@0:3 PP@0:2 P@1:0 が
-|  IP-MAT@0:3 NP-SBJ@2:0 *が*
-|  IP-MAT@0:3 VB@3:0 回る
-|  IP-MAT@0:3 PU@4:0 。
-|  ID 1_textbook_kisonihongo;page_13;AT1-1;JP
-| 
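
The relation between the two formats in the example can be sketched as follows. This is an illustrative Python reading of the example only, not the filter itself: the function name is invented, and the sketch assumes that every tag has the form LABEL@n:m, that tags are joined to the word with underscores, and that the ID token starts with ID_.

```python
import re

# A tag such as "IP-MAT@0:3": a label, "@", and two colon-separated numbers.
TAG = re.compile(r"^[^_@]+@\d+:\d+$")


def inline_to_slice(line):
    """Turn one line of inline tagged tokens into slice lines."""
    slices = []
    for token in line.split():
        if token.startswith("ID_"):
            # The ID itself may contain underscores, so split only once.
            slices.append(" ID " + token[3:])
            continue
        parts = token.split("_")
        # Count how many trailing parts look like tags.
        n = len(parts)
        while n > 1 and TAG.match(parts[n - 1]):
            n -= 1
        word = "_".join(parts[:n])
        tags = parts[n:]
        # Inline order is innermost first; slice order is outermost first.
        slices.append(" " + " ".join(list(reversed(tags)) + [word]))
    return "\n".join(slices)


print(inline_to_slice("輪_N@0:0_NP@0:1_PP@0:2_IP-MAT@0:3 が_P@1:0_PP@0:2_IP-MAT@0:3"))
```

Each slice line starts with a space, following the output shown in the example.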

SEE ALSO

slice_to_inline(1)



multi-sentence_to_single(1)

NAME

multi-sentence_to_single - change multi-sentence trees

SYNOPSIS

multi-sentence_to_single

DESCRIPTION

Filter to change a multi-sentence parse tree into multiple parse trees.

OPTIONS

--example) show an example
*) show this help message

EXAMPLE

$ cat << EOF | multi-sentence_to_single
> ( (multi-sentence (IP-MAT (NP-SBJ (D A) (N boy)) (BEP is) (ADJP-PRD2 (ADJ happy)) (PU .)) (IP-MAT (NP-SBJ (PRO He)) (VBP laughs) (PU .))) (ID 1_ex;EN))
> EOF
-| ( (IP-MAT (NP-SBJ (D A) (N boy)) (BEP is) (ADJP-PRD2 (ADJ happy)) (PU .)) (ID 1_ex;EN))
-| ( (IP-MAT (NP-SBJ (PRO He)) (VBP laughs) (PU .)) (ID 1_ex;EN))


obfuscate_to_tree(1)

NAME

obfuscate_to_tree - transform tree

SYNOPSIS

obfuscate_to_tree filename

DESCRIPTION

Filter to add words to an obfuscated tree. Information on the words to add comes from a file given as a required argument that has one character per line.

OPTIONS

--number) number all characters
--example) show an example
-*) show this help message

EXAMPLE

$ cat WORDS
-| 授
-| 業
-| が
-| 終
-| わ
-| る
-| 。
$ cat << EOF | obfuscate_to_tree WORDS | munge-trees -p
> ( (IP-MAT (PP (NP (N ⛔⛔))
>               (P ⛔))
>           (NP-SBJ *が*)
>           (VB ⛔⛔⛔)
>           (PU ⛔))
>   (ID 7_textbook_kisonihongo;page_13;JP))
> EOF
-| ( (IP-MAT (PP (NP (N 授業))
-|               (P が))
-|           (NP-SBJ *が*)
-|           (VB 終わる)
-|           (PU 。))
-|   (ID 7_textbook_kisonihongo;page_13;JP))

SEE ALSO

tree_to_obfuscate(1)



parse_binarize(1)

NAME

parse_binarize - modify treebank data

SYNOPSIS

parse_binarize [OPTIONS]

DESCRIPTION

Binarize treebank data.

OPTIONS

--left) make tree output binary branching for left headed language (e.g., English)
--keepconj) do not add coordination at the IP level
--example) show examples
*) show this help message

EXAMPLES

$ cat << EOF | parse_binarize | munge-trees -p
> ( (IP-MAT (PP-SBJ (NP (N ゴスタック))
>                   (P-OPTR は))
>           (PP-OB1 (NP (N ドッシュ))
>                   (P-ROLE を))
>           (VB ディスティム)
>           (VB0 し)
>           (AX ます)
>           (PU 。))
>   (ID 20_BUFFALO;JP))
> EOF
-| ( (IP-MAT (IML (PP-SBJ (NP (N ゴスタック))
-|                        (P-OPTR は))
-|                (IML (PP-OB1 (NP (N ドッシュ))
-|                             (P-ROLE を))
-|                     (IML (IML (VB ディスティム)
-|                               (VB0 し))
-|                          (AX ます))))
-|           (PU 。))
-|   (ID 20_BUFFALO;JP))
$ cat << EOF | parse_binarize --left | munge-trees -p
> ( (IP-MAT (NP-SBJ (PRO I))
>           (VBD went)
>           (PP (P on)
>               (NP (D a)
>                   (N trip)))
>           (PP (P to)
>               (NP (NPR Kyoto)))
>           (NP-TMP (ADJ last)
>                   (N week))
>           (. .))
>   (ID 41_textbook_djg_basic;page_116;AT2-11;EN))
> EOF
-| ( (IP-MAT (NP-SBJ (PRO I))
-|           (IML (IML (IML (IML (VBD went)
-|                               (PP (P on)
-|                                   (NP (D a)
-|                                       (N trip))))
-|                          (PP (P to)
-|                              (NP (NPR Kyoto))))
-|                     (NP-TMP (ADJ last)
-|                             (N week)))
-|                (. .)))
-|   (ID 41_textbook_djg_basic;page_116;AT2-11;EN))


parse_decorate(1)

NAME

parse_decorate - filter to change tree

SYNOPSIS

parse_decorate

DESCRIPTION

Filter that by default decorates nodes with extended tags. In particular, various P roles are elaborated. Also selects word_1 of terminals.

OPTIONS

--script) send tsurgeon script to stdout
--keep) neither decorate nodes nor reposition functional information to remove stars (other consequences depend on presence of other flags)
--frame2) keep frame information
--frame*|--sense) keep frame information
--pruneframe) remove frame information
--comment) keep comments
--essence|-e) retain only essential aspects of parse
--luw) make long unit words
--removeluw) remove long unit words
--example) show an example
*) show this help message

EXAMPLE

$ cat << EOF | parse_decorate | munge-trees -p
> ( (IP-MAT (NP-SBJ;{SPEAKER_28} *pro*)
>           (PP (NP (NPR O王))
>               (P を))
>           (VB 追お)
>           (MD う))
>   (ID 28_misc_BUFFALO;JP))
> EOF
-| ( (IP-MAT (NP-SBJ;{SPEAKER_28} *pro*)
-|           (PP (NP (NPR O王))
-|               (P-ROLE を))
-|           (VB 追お)
-|           (MD う))
-|   (ID 28_misc_BUFFALO;JP))

SEE ALSO

parse_undecorate(1)



parse_discourse_split(1)

NAME

parse_discourse_split - creates filter program for discourse splits

SYNOPSIS

parse_discourse_split file

DESCRIPTION

Takes parsed sentences from stdin and calculates the discourse split of the sentences with respect to file, which should contain only the yield of the parsed sentences, segmented with one discourse per line. Sends a bash filter program to stdout.

OPTIONS

--example) show an example
-*) show this help message
*) input file containing yield with one discourse per line

EXAMPLE

$ cat example.txt
-| A cat enters.  It departs.
-| A cat entered.
$ cat << EOF | parse_discourse_split example.txt
> ( (IP-MAT (NP-SBJ (D A) (N cat)) (VBP enters) (. .)) (ID 1_ex;EN))
> ( (IP-SUB (NP-SBJ (PRO It)) (VBP departs) (. .)) (ID 2_ex;EN))
> ( (IP-MAT (NP-SBJ (D A) (N cat)) (VBD entered) (. .)) (ID 3_ex;EN))
> EOF
-| #!/bin/bash
-| 
-| function implode () {
-| awk ' { printf("%s%s", NR == 1 ? "" : " ", $0) } ; END { printf("\n") } '
-| }
-| 
-| TEMP=${TMP:-"/tmp"}/output_split$$
-| 
-| cat > "${TEMP}"
-| 
-| cat "${TEMP}" | sed -n 1,2p | implode
-| cat "${TEMP}" | sed -n 3p | implode
-| 
-| rm "${TEMP}"
-| 

SEE ALSO

select_data(1)



parse_to_alpino(1)

NAME

parse_to_alpino - transform to Alpino XML format

SYNOPSIS

parse_to_alpino [OPTIONS]

DESCRIPTION

Filter to take a Penn parsed tree from stdin and change to Alpino XML format.

OPTIONS

--raw) without extra post processing
--example) show examples
*) show this help message

EXAMPLE

$ cat << EOF | parse_to_alpino
> (IP (ADVP (ADV __)) (ADVP (ADV __)) (AX で) (VB2 __))
> EOF
-| 
-| <alpino_ds id="_" version="1.3">
-|   <node cat="ip" id="1" begin="0" end="4">
-|     <node cat="advp" id="2" begin="0" end="1">
-|       <node pt="adv" id="3" begin="0" end="1">
-|       </node>
-|     </node>
-|     <node cat="advp" id="4" begin="1" end="2">
-|       <node pt="adv" id="5" begin="1" end="2">
-|       </node>
-|     </node>
-|     <node pt="ax" word="で" id="6" begin="2" end="3">
-|     </node>
-|     <node pt="vb2" id="7" begin="3" end="4">
-|     </node>
-|   </node>
-|   <sentence>__ __ で __</sentence>
-| </alpino_ds>
-| 


parse_undecorate(1)

NAME

parse_undecorate - remove node decorations

SYNOPSIS

parse_undecorate [OPTIONS]

DESCRIPTION

Filter to undecorate nodes by removing extended tags and creating star information.

Requires tsurgeon_script(1) to work.

OPTIONS

--script) send tsurgeon script to stdout
--extra) make extra changes, notably to remove SORT information
--essence|-e) retain only essential aspects of parse
--example) show an example
*) show this help message

EXAMPLE

$ cat << EOF | parse_undecorate | munge-trees -p
> ( (IP-MAT (PP (NP (IP-REL (NP-SBJ *T*)
>                           (PP (NP (N 両耳受聴))
>                               (P-ROLE によって))
>                           (VB 得る))
>                   (N 情報))
>               (P-ROLE に)
>               (P-OPTR は))
>           (PP-SBJ (NP (CONJP (NP (N パワースペクトル情報))
>                              (P-CONN と))
>                       (NP (N 両耳間位相差)))
>                   (P-ROLE が))
>           (VB あり)
>           (AX ます))
>   (ID example;JP))
> EOF
-| ( (IP-MAT (PP (NP (IP-REL (NP-SBJ *T*)
-|                           (PP (NP (N 両耳受聴))
-|                               (P によって))
-|                           (VB 得る))
-|                   (N 情報))
-|               (P に)
-|               (P は))
-|           (PP (NP (CONJP (NP (N パワースペクトル情報))
-|                          (P と))
-|                   (NP (N 両耳間位相差)))
-|               (P が))
-|           (NP-SBJ *が*)
-|           (VB あり)
-|           (AX ます))
-|   (ID example;JP))

SEE ALSO

parse_decorate(1)



select_data(1)

NAME

select_data - returns i-th (to j-th) data

SYNOPSIS

select_data
select_data i [j]

DESCRIPTION

Filter to select the i-th through to the j-th data from stdin.

Assumes data is separated either by EOS or a blank line.

Requires that j >= i.

If j is missing, the i-th data only is returned.

If i is also missing and there is no -n (or --number) option given, the number of data instances is printed as the only output.
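
The selection rules above can be sketched in Python as follows. This is a sketch under stated assumptions rather than the script's actual logic: the function names are invented, items are counted from 1, and the separator lines (blank lines and EOS lines) are dropped from the returned items, which may differ from what select_data prints.

```python
def split_items(text):
    """Split text into items separated by a blank line or an EOS line."""
    items, current = [], []
    for line in text.splitlines():
        if line.strip() in ("", "EOS"):
            # A separator closes the item collected so far, if any.
            if current:
                items.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        items.append("\n".join(current))
    return items


def select_items(text, i=None, j=None):
    """Return items i..j (1-based, inclusive), or the item count if i is None."""
    items = split_items(text)
    if i is None:
        return len(items)
    if j is None:
        j = i
    if j < i:
        raise ValueError("requires j >= i")
    return items[i - 1:j]


DATA = "(A a)\n\n(B b)\n(B c)\n\n(C d)\n"
print(select_items(DATA))
print(select_items(DATA, 2, 3))
```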

OPTIONS

-n|--number) send stdin to stdout, but with items numbered
*) show this help message


slice_to_inline(1)

NAME

slice_to_inline - convert format

SYNOPSIS

slice_to_inline

DESCRIPTION

Filter to convert slice format constituent information into inline tagged format.

OPTIONS

--example) show an example
*) show this help message

EXAMPLE

$ cat << EOF | slice_to_inline
>  IP-MAT@0:3 PP@0:2 NP@0:1 N@0:0 輪
>  IP-MAT@0:3 PP@0:2 P@1:0 が
>  IP-MAT@0:3 NP-SBJ@2:0 *が*
>  IP-MAT@0:3 VB@3:0 回る
>  IP-MAT@0:3 PU@4:0 。
>  ID 1_textbook_kisonihongo;page_13;AT1-1;JP
> 
> EOF
-| 輪_N@0:0_NP@0:1_PP@0:2_IP-MAT@0:3 が_P@1:0_PP@0:2_IP-MAT@0:3 *が*_NP-SBJ@2:0_IP-MAT@0:3 回る_VB@3:0_IP-MAT@0:3 。_PU@4:0_IP-MAT@0:3 ID_1_textbook_kisonihongo;page_13;AT1-1;JP

SEE ALSO

inline_to_slice(1)



slice_to_tree(1)

NAME

slice_to_tree - output tree from vertical slices

SYNOPSIS

slice_to_tree [OPTIONS]

DESCRIPTION

Filter to return a bracketed tree from tree information given as vertical slices. A blank line of the input signals creation of a new tree.

OPTIONS

--top|-top) output with TOP as root
--example) show an example
*) show this help message

EXAMPLES

$ cat << EOF | slice_to_tree | munge-trees -p
> TOP IP-MAT NP-SBJ D The
> TOP IP-MAT NP-SBJ N gostak
> TOP IP-MAT VP VBP distims
> TOP IP-MAT VP NP-OB1 D the
> TOP IP-MAT VP NP-OB1 N doshes
> TOP IP-MAT . .
> 
> EOF
-| (TOP (IP-MAT (NP-SBJ (D The)
-|                      (N gostak))
-|              (VP (VBP distims)
-|                  (NP-OB1 (D the)
-|                          (N doshes)))
-|              (. .)))
$ cat << EOF | slice_to_tree --top | munge-trees -p
> can have :ARG0 mankind
> can have :ARG1 noblest cause
> can have :AT stake
> fight :ARG0 they
> fight :FOR freedom
> undertake :ARG0 they
> undertake :ARG1 noblest cause
> 
> EOF
-| (can (have (:ARG0 mankind)
-|            (:ARG1 (noblest cause))
-|            (:AT stake)))
-| (fight (:ARG0 they)
-|        (:FOR freedom))
-| (undertake (:ARG0 they)
-|            (:ARG1 (noblest cause)))
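
One way to rebuild trees from slices like those above is to keep the path of currently open nodes and reuse it for as long as the labels of the next slice agree. The Python sketch below illustrates that idea only; it is not the filter's implementation. The function names are invented, the last field of a line is taken to be the word, the label directly above the word always opens a new node, and adjacent sibling phrases that share a label would wrongly be merged.

```python
def slices_to_trees(lines):
    """Build nested lists [label, child, ...] from vertical slices."""
    trees = []   # finished top-level trees
    path = []    # currently open nodes, root first
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            # Blank (or label-less) line: start a new tree.
            path = []
            continue
        labels, word = fields[:-1], fields[-1]
        # Reuse open nodes while labels agree, never the node above the word.
        keep = 0
        while (keep < len(path) and keep < len(labels) - 1
               and path[keep][0] == labels[keep]):
            keep += 1
        path = path[:keep]
        for label in labels[keep:]:
            node = [label]
            if path:
                path[-1].append(node)
            else:
                trees.append(node)
            path.append(node)
        path[-1].append(word)
    return trees


def to_bracket(node):
    """Render a nested list as a bracketed tree on one line."""
    if isinstance(node, str):
        return node
    return "(" + " ".join([node[0]] + [to_bracket(c) for c in node[1:]]) + ")"


SLICES = [
    "TOP IP-MAT NP-SBJ D The",
    "TOP IP-MAT NP-SBJ N gostak",
    "TOP IP-MAT VP VBP distims",
    "TOP IP-MAT VP NP-OB1 D the",
    "TOP IP-MAT VP NP-OB1 N doshes",
    "TOP IP-MAT . .",
]
for tree in slices_to_trees(SLICES):
    print(to_bracket(tree))
```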

SEE ALSO

tree_to_slice(1)



table_to_tree(1)

NAME

table_to_tree - convert tabular information to trees

SYNOPSIS

table_to_tree [OPTIONS]

DESCRIPTION

Filter to convert tabular information that orients tree structure around parts-of-speech nodes into trees.

OPTIONS

--example) show an example
*) show this help message

EXAMPLE

$ cat << EOF | table_to_tree | munge-trees -p
> (IP-MAT(NP-SBJ* D @0_0 The
> * ADJ @0_1 quick
> * ADJ @0_2 brown
> *) N @0_3 fox
> * VBD @0_4 jumped
> (PP* P @0_5 over
> (NP* D @0_6 the
> * ADJ @0_7 lazy
> *)) N @0_8 dog
> *) . @0_9 .
> 
> EOF
-| (IP-MAT (NP-SBJ (D The)
-|                 (ADJ quick)
-|                 (ADJ brown)
-|                 (N fox))
-|         (VBD jumped)
-|         (PP (P over)
-|             (NP (D the)
-|                 (ADJ lazy)
-|                 (N dog)))
-|         (. .))
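
Reading the example above, each row appears to carry a bracket skeleton in which * marks where the part-of-speech node belongs. Under that reading, a conversion can be sketched in Python as below. The function name is invented, each row is assumed to have exactly four whitespace-separated fields (skeleton, tag, index, word), and the output is a single unformatted line rather than munge-trees layout.

```python
import re


def table_to_tree(rows):
    """Substitute each (tag word) pair for the * in its row's skeleton."""
    pieces = []
    for row in rows:
        skeleton, tag, _index, word = row.split()
        pieces.append(skeleton.replace("*", "(%s %s)" % (tag, word), 1))
    flat = "".join(pieces)
    # Insert a space before any "(" that directly follows a label or ")".
    return re.sub(r"(?<=[^\s(])\(", " (", flat)


ROWS = [
    "(IP-MAT(NP-SBJ* D @0_0 The",
    "* ADJ @0_1 quick",
    "* ADJ @0_2 brown",
    "*) N @0_3 fox",
    "* VBD @0_4 jumped",
    "(PP* P @0_5 over",
    "(NP* D @0_6 the",
    "* ADJ @0_7 lazy",
    "*)) N @0_8 dog",
    "*) . @0_9 .",
]
print(table_to_tree(ROWS))
```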

SEE ALSO

tree_to_table(1)



tgrep_to_xpath(1)

NAME

tgrep_to_xpath - transform TGrep to XPath

SYNOPSIS

tgrep_to_xpath [OPTIONS]

DESCRIPTION

Filter to take a TGrep search expression and output an XPath search expression for Alpino XML format.

OPTIONS

--debug) print any error messages on stdout
--example) show examples
*) show this help message

EXAMPLE

$ cat << EOF | tgrep_to_xpath
> PP < /IP/ > IP
> EOF
-| //node[self::node[@word='PP'] and node[matches(@word,'IP')] and parent::node[self::node[@word='IP']]]


tiger_to_dot(1)

NAME

tiger_to_dot - transform TIGER XML parse

SYNOPSIS

tiger_to_dot [OPTIONS]

DESCRIPTION

Filter to transform TIGER XML parsed data from stdin into a graphviz dot graph drawing script following the tree drawing method of PaQu.

OPTIONS

*) show this help message

SEE ALSO

alpino_to_tiger(1)



tnt_character(1)

NAME

tnt_character - split tnt analysis on characters

SYNOPSIS

tnt_character

DESCRIPTION

Filter to split tnt analysis on characters.

OPTIONS

--example) show an example
*) show this help message

EXAMPLE

$ cat << EOF | tnt_character
> 授業	N
> が	P
> 終わる	VB
> 。	PU
> EOS
> EOF
-| 授	<N
-| 業	N>
-| が	<P>
-| 終	<VB
-| わ	VB
-| る	VB>
-| 。	<PU>
-| EOS
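
The bracket marking in the example output can be sketched in Python as follows. The function name is invented for illustration, and the sketch assumes tab-separated word and tag columns, with lines that contain no tab (such as EOS) passed through unchanged.

```python
def split_on_characters(lines):
    """Give each character its own line, marking word edges with < and >."""
    out = []
    for line in lines:
        if "\t" not in line:
            out.append(line)  # EOS and other untagged lines pass through
            continue
        word, tag = line.split("\t", 1)
        last = len(word) - 1
        for k, char in enumerate(word):
            left = "<" if k == 0 else ""      # first character opens the word
            right = ">" if k == last else ""  # last character closes it
            out.append("%s\t%s%s%s" % (char, left, tag, right))
    return out


for row in split_on_characters(["授業\tN", "が\tP", "終わる\tVB", "。\tPU", "EOS"]):
    print(row)
```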

SEE ALSO

tree_to_tnt(1)



tree_clip_fragment(1)

NAME

tree_clip_fragment - prune tree structure

SYNOPSIS

tree_clip_fragment [OPTIONS]

DESCRIPTION

Filter to prune structure from a parsed tree in Penn bracketed format.

OPTIONS

--left) push negated items to the left
--right) push negated items to the right
--example) show examples
*) show this help message

EXAMPLE

$ cat << EOF | tree_clip_fragment
> ( (IP-MAT (PP (NP (IP-EMB (VB __)) (N 日々)))) (ID example;JP))
> EOF
-| ( (NP (IP-EMB (VB __)) (N 日々)) (ID example;JP))


tree_to_charactertree(1)

NAME

tree_to_charactertree - transform tree

SYNOPSIS

tree_to_charactertree

DESCRIPTION

Filter to transform a tree so that its terminals are split into characters.

OPTIONS

--number) number all characters
--example) show an example
*) show this help message

EXAMPLE

$ cat << EOF | tree_to_charactertree | munge-trees -p
> ( (IP-MAT (PP (NP (N 授業))
>               (P が))
>           (NP-SBJ *が*)
>           (VB 終わる)
>           (PU 。))
>   (ID 7_textbook_kisonihongo;page_13;AT1-7;JP))
> EOF
-| ( (IP-MAT (PP (NP (N (+ 授)
-|                      (+ 業)))
-|               (P (+ が)))
-|           (NP-SBJ *が*)
-|           (VB (+ 終)
-|               (+ わ)
-|               (+ る))
-|           (PU (+ 。)))
-|   (ID 7_textbook_kisonihongo;page_13;AT1-7;JP))
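
The split shown above can be approximated with a single regular-expression pass, sketched here in Python. The function name is invented, and two behaviours are assumptions read off the example: the ID node is left alone, and words starting with * (such as *が*) are treated as null elements and kept whole.

```python
import re

# A terminal node: "(TAG word)" with no brackets or spaces inside the word.
TERMINAL = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def to_character_tree(text):
    """Split each terminal word into one (+ x) node per character."""
    def split(match):
        tag, word = match.group(1), match.group(2)
        if tag == "ID" or word.startswith("*"):
            return match.group(0)  # leave IDs and null elements untouched
        return "(%s %s)" % (tag, " ".join("(+ %s)" % c for c in word))
    return TERMINAL.sub(split, text)


print(to_character_tree("(VB 終わる)"))
```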

SEE ALSO

charactertree_to_tree(1)



tree_to_obfuscate(1)

NAME

tree_to_obfuscate - transform tree

SYNOPSIS

tree_to_obfuscate

DESCRIPTION

Filter to obfuscate a tree by removing its words.

OPTIONS

--number) number all characters
--example) show an example
*) show this help message

EXAMPLE

$ cat << EOF | tree_to_obfuscate | munge-trees -p
> ( (IP-MAT (PP (NP (N 授業))
>               (P が))
>           (NP-SBJ *が*)
>           (VB 終わる)
>           (PU 。))
>   (ID 7_textbook_kisonihongo;page_13;AT1-7;JP))
> EOF
-| ( (IP-MAT (PP (NP (N ⛔⛔))
-|               (P ⛔))
-|           (NP-SBJ *が*)
-|           (VB ⛔⛔⛔)
-|           (PU ⛔))
-|   (ID 7_textbook_kisonihongo;page_13;AT1-7;JP))
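
A minimal Python sketch of the masking shown above follows. It is an illustration, not the filter's code: the function name is invented, and it assumes, based on the example only, that the ID node and words starting with * (null elements such as *が*) are left unchanged while every other character of a terminal word becomes the mask character.

```python
import re

# A terminal node: "(TAG word)" with no brackets or spaces inside the word.
TERMINAL = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")
MASK = "\u26d4"  # the NO ENTRY sign used in the example output


def obfuscate(text):
    """Replace each character of each terminal word with the mask."""
    def mask(match):
        tag, word = match.group(1), match.group(2)
        if tag == "ID" or word.startswith("*"):
            return match.group(0)  # keep IDs and null elements readable
        return "(%s %s)" % (tag, MASK * len(word))
    return TERMINAL.sub(mask, text)


print(obfuscate("( (IP-MAT (PP (NP (N 授業)) (P が)) (NP-SBJ *が*) (VB 終わる) (PU 。)) (ID 7_x;JP))"))
```

Keeping one mask character per original character preserves word lengths, which is what allows a character list with one character per line to restore the words later.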

SEE ALSO

obfuscate_to_tree(1)



tree_to_slice(1)

NAME

tree_to_slice - slice tree vertically

SYNOPSIS

tree_to_slice [OPTIONS]

DESCRIPTION

Filter to return all vertical slices of an input tree.

Requires Python with the NLTK library (http://www.nltk.org) to work.

OPTIONS

--top|-top) prepend TOP to each line of output
--example) show an example
*) show this help message

EXAMPLES

$ cat << EOF | tree_to_slice --top
> (IP-MAT (NP-SBJ (D The)
>                 (N gostak))
>         (VP (VBP distims)
>             (NP-OB1 (D the)
>                     (N doshes)))
>         (. .))
> EOF
-| TOP IP-MAT NP-SBJ D The
-| TOP IP-MAT NP-SBJ N gostak
-| TOP IP-MAT VP VBP distims
-| TOP IP-MAT VP NP-OB1 D the
-| TOP IP-MAT VP NP-OB1 N doshes
-| TOP IP-MAT . .
-| 
$ cat << EOF | tree_to_slice
> (TOP (can (have (:ARG0 mankind)
>                 (:ARG1 (noblest cause))
>                 (:AT stake)))
>      (fight (:ARG0 they)
>             (:FOR freedom))
>      (undertake (:ARG0 they)
>                 (:ARG1 (noblest cause))))
> EOF
-| TOP can have :ARG0 mankind
-| TOP can have :ARG1 noblest cause
-| TOP can have :AT stake
-| TOP fight :ARG0 they
-| TOP fight :FOR freedom
-| TOP undertake :ARG0 they
-| TOP undertake :ARG1 noblest cause
-| 
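
A vertical slice is the path of labels from the root down to one word. The Python sketch below illustrates this with a small hand-written bracket parser instead of NLTK, so it runs with the standard library alone. The function names are invented, and the sketch assumes that every bracket has a label, so the unlabelled wrapper of trees of the form ( (IP-MAT ...) (ID ...)) is not handled.

```python
import re

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse(text):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    stack = [[]]
    for token in TOKEN.findall(text):
        if token == "(":
            stack.append([])
        elif token == ")":
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(token)
    return stack[0][0]


def slices(node, prefix=()):
    """Yield one line per word: the labels above it, then the word."""
    here = prefix + (node[0],)
    for child in node[1:]:
        if isinstance(child, str):
            yield " ".join(here + (child,))
        else:
            yield from slices(child, here)


TREE = parse("""(IP-MAT (NP-SBJ (D The)
                (N gostak))
        (VP (VBP distims)
            (NP-OB1 (D the)
                    (N doshes)))
        (. .))""")
for line in slices(TREE, ("TOP",)):
    print(line)
```

Passing ("TOP",) as the prefix plays the role of the --top option in the first example.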

SEE ALSO

slice_to_tree(1)



tree_to_table(1)

NAME

tree_to_table - convert trees to tabular format

SYNOPSIS

tree_to_table

DESCRIPTION

Filter to convert trees to a tabular format that orients tree structure around parts-of-speech nodes. Blank lines separate tabular trees.

OPTIONS

--overt) number only overt words
--pad) word number output should be padded with zeros
--basic) output without word numbering
--example) show an example
*) show this help message

EXAMPLE

$ cat << EOF | tree_to_table
> (TOP (IP-MAT (NP-SBJ (D The)
>                      (ADJ quick)
>                      (ADJ brown)
>                      (N fox))
>              (VBD jumped)
>              (PP (P over)
>                  (NP (D the)
>                      (ADJ lazy)
>                      (N dog)))
>              (. .)))
> EOF
-| (TOP(IP-MAT(NP-SBJ* D @0_0 The
-| * ADJ @0_1 quick
-| * ADJ @0_2 brown
-| *) N @0_3 fox
-| * VBD @0_4 jumped
-| (PP* P @0_5 over
-| (NP* D @0_6 the
-| * ADJ @0_7 lazy
-| *)) N @0_8 dog
-| *)) . @0_9 .
-| 
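How the bracket column relates to the tree can be sketched in Python. This is an illustration under assumptions, not the tree_to_table implementation: the function name and the sent parameter are hypothetical, columns are joined with single spaces, and every opening bracket is assumed to carry a label.

```python
import re

def tree_to_table(tree, sent=0):
    """Return table rows: bracket column, part-of-speech, word id, word.

    Hypothetical helper; assumes every "(" is followed by a node label.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", tree)
    rows = []     # each row is [bracket_column, pos, word_id, word]
    opens = ""    # phrase openings seen since the previous word
    i = 0
    while i < len(tokens):
        if tokens[i] == "(":
            label = tokens[i + 1]
            if tokens[i + 2] not in ("(", ")"):
                # (POS word): "*" marks where this node sits in the brackets
                word_id = "@%d_%d" % (sent, len(rows))
                rows.append([opens + "*", label, word_id, tokens[i + 2]])
                opens = ""
                i += 4            # skip "(", POS, word, ")"
            else:
                opens += "(" + label
                i += 2
        else:
            rows[-1][0] += ")"    # a phrase closes after the latest word
            i += 1
    return [" ".join(row) for row in rows]

if __name__ == "__main__":
    for line in tree_to_table("(TOP (IP-MAT (NP-SBJ (D The) (N fox)) (VBD jumped)))"):
        print(line)
```

Concatenating the bracket column of all rows, with each * replaced by its (POS word) pair, gives back the original bracketing.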

SEE ALSO

table_to_tree(1)



tree_to_tnt(1)tree_to_tnt(1)

NAME

tree_to_tnt - convert trees to TnT format

SYNOPSIS

tree_to_tnt

DESCRIPTION

Filter to convert trees to TnT format where each line contains one word token and one part-of-speech tag separated by a single tab character. EOS indicates end-of-sentence.

OPTIONS

--framenet)support for seeding framenet information
--pron)separate pronunciation information
--merge)remove EOS indicator
--clip)clip extended tag information
--all|-all)keep all terminal nodes
--number|-n)number EOS
--sent|-s)provide SENT as tag for EOS
--example)show an example
*)show this help message

EXAMPLE

$ cat << EOF | tree_to_tnt
> ( (IP-MAT (PP (NP (N 授業))
>               (P が))
>           (NP-SBJ *が*)
>           (VB 終わる)
>           (PU 。))
>   (ID 7_textbook_kisonihongo;page_13;AT1-7;JP))
> EOF
-| 授業	N
-| が	P-ROLE
-| 終わる	VB
-| 。	PU
-| EOS
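For downstream processing, the default output layout (one word and tag per line, separated by a tab, with EOS closing each sentence) can be read back with a short Python function. This is a sketch only; read_tnt is a hypothetical name and not part of these utilities.

```python
def read_tnt(lines):
    """Group TnT-style lines into sentences of (word, tag) pairs.

    Hypothetical helper; assumes a single tab between word and tag.
    """
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line == "EOS":
            sentences.append(current)   # EOS closes the current sentence
            current = []
        elif line:
            word, tag = line.split("\t")
            current.append((word, tag))
    if current:                          # tokens not followed by EOS
        sentences.append(current)
    return sentences

if __name__ == "__main__":
    print(read_tnt(["終わる\tVB", "。\tPU", "EOS"]))
```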

SEE ALSO

tnt_character(1)



treetee(1)treetee(1)

NAME

treetee - tee data

SYNOPSIS

treetee [OPTIONS]

DESCRIPTION

Copies stdin to stdout and also sends the data to /home/glenda/tmp/tree.

OPTIONS

--pool)pool the data and send output one tree at a time before waiting for user interaction
*)show this help message


tregex(1)tregex(1)

NAME

tregex - wrapper script for stanford-tregex

SYNOPSIS

tregex PATTERN [OPTIONS] treebank_file

DESCRIPTION

Wrapper script to run stanford-tregex (http://nlp.stanford.edu/software/tregex.shtml) in tgrep command line mode.

OPTIONS

-C  suppresses printing of matches, so only the number of matches is printed.
-w  causes the whole of a tree that matches to be printed.
-f  causes the filename to be printed.
-i <filename>  causes the pattern to be matched to be read from <filename> rather than the command line. Don't specify a pattern when this option is used.
-o  specifies that each tree node can be reported only once as the root of a match (by default a node will be printed once for every way the pattern matches).
-s  causes trees to be printed all on one line (by default they are pretty printed).
-n  causes the number of the tree in which the match was found to be printed before every match.
-u  causes only the label of each matching node to be printed, not complete subtrees.
-t  causes only the yield (terminal words) of the selected node to be printed (or the yield of the whole tree, if the -w option is used).
-h <node-handle>  If a -h option is given, the root tree node will not be printed. Instead, for each node-handle specified, the node matched and given that handle will be printed. Multiple nodes can be printed by using the -h option multiple times on a single command line.
-hf <headfinder-class-name>  use the specified HeadFinder class to determine headship relations.
-hfArg <string>  pass a string argument in to the HeadFinder class's constructor. -hfArg can be used multiple times to pass in multiple arguments.
-trf <TreeReaderFactory-class-name>  use the specified TreeReaderFactory class to read trees from files.
-v  print every tree that contains no matches of the specified pattern, but print no matches to the pattern.
-x  Instead of the matched subtree, print the matched subtree's identifying number as defined in tgrep2: a unique identifier for the subtree in the form s:n, where s is an integer specifying the sentence number in the corpus (starting with 1), and n is an integer giving the order in which the node is encountered in a depth-first search, starting with 1 at the top node of the sentence tree.
-extract <code> <tree-file>  extracts the subtree s:n specified by code from the specified tree-file. Overrides all other behavior of tregex. Can't specify multiple encodings etc. yet.
-extractFile <code-file> <tree-file>  extracts every subtree specified by the subtree codes in code-file, which must appear exactly one per line, from the specified tree-file. Overrides all other behavior of tregex. Can't specify multiple encodings etc. yet.
-filter  causes this to act as a filter, reading tree input from stdin.
-T  causes all trees to be printed as processed (for debugging purposes). Otherwise only matching nodes are printed.
-encoding <charset_encoding>  allows specification of the character encoding of trees.
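The s:n codes used by -x, -extract and -extractFile can be illustrated with a small Python sketch. It is not part of tregex: the function name subtree_codes is hypothetical, every bracket is assumed to carry a label, and words are assumed to count as nodes in the depth-first numbering.

```python
import re

def subtree_codes(trees):
    """Pair every node with an s:n code.

    s is the tree number and n the depth-first position, both from 1.
    Hypothetical helper; assumes words are numbered like any other node.
    """
    codes = []
    for s, tree in enumerate(trees, start=1):
        n = 0
        # In bracketed notation, labels and words already appear in
        # depth-first (preorder) order, so a left-to-right scan suffices.
        for tok in re.findall(r"\(|\)|[^\s()]+", tree):
            if tok not in ("(", ")"):
                n += 1
                codes.append(("%d:%d" % (s, n), tok))
    return codes

if __name__ == "__main__":
    for code, node in subtree_codes(["(NP (D the) (N dog))", "(VP (VB run))"]):
        print(code, node)
```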


tregex_location(1)tregex_location(1)

NAME

tregex_location - print tregex location

SYNOPSIS

tregex_location

DESCRIPTION

Send location details of tregex jar files to stdout.



tsurgeon_script(1)tsurgeon_script(1)

NAME

tsurgeon_script - pipeline tsurgeon

SYNOPSIS

tsurgeon_script script

DESCRIPTION

This wrapper script runs stanford-tregex.jar in the tsurgeon mode as a filter changing stdin. At least one tsurgeon script must be supplied.

Before tsurgeon gets to see any script content, the script content is passed through the m4 macro processor.

A tsurgeon script is a file containing a list of pattern and transformation operation list pairs. That is, it is a sequence of pairs of a Tregex pattern on one or more lines, then a blank line (empty or whitespace), then a list of transformation operations one per line to apply when the pattern is matched, and then another blank line (empty or whitespace). Note the need for blank lines: The code crashes if they are not present as separators.
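The layout of pattern and operation blocks can be made concrete with a Python sketch that splits script text on blank lines. This only illustrates the file layout and is not how tsurgeon itself reads scripts; parse_pairs is a hypothetical name, and unlike the real tool the sketch tolerates runs of several blank lines.

```python
def parse_pairs(script):
    """Split script text into (pattern, operations) pairs.

    Hypothetical helper; blocks are separated by blank lines, and blocks
    alternate between a Tregex pattern and a list of operations.
    """
    blocks, current = [], []
    for line in script.splitlines():
        if line.strip():
            current.append(line)
        elif current:               # a blank line ends the current block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    if len(blocks) % 2:
        raise ValueError("a pattern block has no operations block")
    return [("\n".join(blocks[i]), blocks[i + 1])
            for i in range(0, len(blocks), 2)]

if __name__ == "__main__":
    script = "NP=np < (D=d < the)\n\nrelabel d DT\n\n"
    print(parse_pairs(script))
```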

The character % introduces a comment that extends to the end of the line. All other intended uses of % must be escaped as \% .
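The comment rule can be sketched in Python as follows. This is not the wrapper's own code: strip_comment is a hypothetical name, and the sketch assumes that an escaped \% ends up as a literal % in the script content.

```python
def strip_comment(line):
    """Drop a % comment from one script line.

    Hypothetical helper; assumes an escaped \\% stands for a literal %.
    """
    out = []
    i = 0
    while i < len(line):
        if line.startswith("\\%", i):
            out.append("%")      # escaped percent sign is kept
            i += 2
        elif line[i] == "%":
            break                # unescaped %: the rest is a comment
        else:
            out.append(line[i])
            i += 1
    return "".join(out)

if __name__ == "__main__":
    print(strip_comment("relabel d DT % rename the determiner"))
```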

Also, a block of lines that begins with .R and ends with .E is commented out. The .R need not be closed, in which case all lines from the .R to the end of the given script are commented out.

OPTIONS

-*)show this help message
*)file name from where to source the tsurgeon script content

SEE ALSO

tregex(1), tsurgeon_script_animate(1)



tsurgeon_script_animate(1)tsurgeon_script_animate(1)

NAME

tsurgeon_script_animate - animate tsurgeon changes

SYNOPSIS

tsurgeon_script_animate script

DESCRIPTION

This wrapper script runs stanford-tregex.jar in the tsurgeon mode as a filter changing stdin. A tsurgeon script name must be supplied.

Before tsurgeon gets to see any script content, the script content is passed through the m4 macro processor.

Input must be in one tree per line format.

The output shows snapshot trees of the series of changes that the tsurgeon script brings about to each input tree.

The character % introduces a comment that extends to the end of the line. All other intended uses of % must be escaped as \% .

Also, a block of lines that begins with .R and ends with .E is commented out. The .R need not be closed, in which case all lines from the .R to the end of the given script are commented out.



yield(1)yield(1)

NAME

yield - obtain yield

SYNOPSIS

yield [OPTIONS]

DESCRIPTION

Write the yield (i.e., the words) of a tree to stdout on one line.

Requires tsurgeon_script to work.

OPTIONS

--keep)keep punctuation unaltered
--ascii)keep punctuation unaltered
--luw)keep long unit words
--all)return all terminals
-j|--nospaces)remove spaces between words
-i|--id|--retain)retain ID information from input in CorpusSearch format
--mark)preserve marked nodes
--pos)remove word layer so parts-of-speech are returned
--name)output named entity information
--unhyphen)output named entity information
--example)show an example
*)show this help message

EXAMPLE

$ cat << EOF | yield --name
> ( (IP-MAT (NP-SBJ;{PERSON} (NPR Rose))
>           (VBD rose)
>           (IP-INF-PRP (TO to)
>                       (VB put)
>                       (NP-OB1 (ADJ rose)
>                               (NS roes))
>                       (PP (P-ROLE on)
>                           (NP (NP-POS;{PERSON} (PRO her))
>                               (NS rows)
>                               (PP (P-ROLE of)
>                                   (NP (NS roses))))))
>           (PU .))
>   (ID 49_misc_BUFFALO;EN))
> EOF
-| <ENAMEX TYPE="PERSON">Rose</ENAMEX> rose to put rose roes on <ENAMEX TYPE="PERSON">her</ENAMEX> rows of roses .
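The mapping from ;{TYPE} decorations to ENAMEX markup can be sketched in Python. This is an illustration under assumptions rather than the yield implementation (which relies on tsurgeon_script): the names parse, words and tree_yield are hypothetical, the ID node is simply skipped, and no other options are modelled.

```python
import re

def parse(tokens, pos=0):
    """Parse the bracket opening at pos; return ((label, children), next_pos)."""
    pos += 1                                  # step over "("
    label = ""
    if tokens[pos] not in ("(", ")"):         # the outer wrapper has no label
        label = tokens[pos]
        pos += 1
    children = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)
            children.append(child)
        else:
            children.append(tokens[pos])      # a word
            pos += 1
    return (label, children), pos + 1

def words(node, names=False):
    """Collect the words under node, wrapping named entities if requested."""
    label, children = node
    if label == "ID":                         # ID information is not a word
        return []
    out = []
    for child in children:
        out.extend([child] if isinstance(child, str) else words(child, names))
    match = re.search(r";\{([^}]*)\}", label)
    if names and match and out:
        out[0] = '<ENAMEX TYPE="%s">%s' % (match.group(1), out[0])
        out[-1] += "</ENAMEX>"
    return out

def tree_yield(tree, names=False):
    tokens = re.findall(r"\(|\)|[^\s()]+", tree)
    node, _ = parse(tokens)
    return " ".join(words(node, names))

if __name__ == "__main__":
    print(tree_yield("( (IP-MAT (NP-SBJ;{PERSON} (NPR Rose)) (VBD rose)) (ID x))", names=True))
```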


Last updated: May 01, 2019