
Welcome to

Tree Search Tools

What?

These tools provide ways to process trees for alpinocorpus, dact, PaQu, TGrep2, Tregex, CorpusSearch, TigerSearch, and Salto.

The tools are distributed freely.

How?

The tools are all command-line scripts, written with sed, bash, gawk (with the XML library extension), xsltproc, Python (with NLTK and lxml), and munge-trees.

Feedback

Feedback is extremely welcome. Please email: ajb129 __AT__ hotmail __DOT__ com.



Manual pages

    add_zero(1) - modify parse
    alpino_to_dot(1) - transform Alpino XML parse
    alpino_to_html(1) - render Alpino XML parse
    alpino_to_parse(1) - transform parse
    alpino_to_pattern(1) - create search pattern
    alpino_to_svg(1) - output SVG image
    alpino_to_tiger(1) - transform Alpino XML to TigerXML
    parse_to_alpino(1) - transform to Alpino XML format
    tgrep_to_xpath(1) - transform TGrep to XPath
    tiger_to_dot(1) - transform TIGER XML parse
    tree_clip_fragment(1) - prune tree structure


add_zero(1)

NAME

add_zero - modify parse

SYNOPSIS

add_zero [OPTIONS]

DESCRIPTION

Filter to modify a parse by adding ZERO as a node to dominate all null elements.

OPTIONS

--word)     add WORD to all non-ZERO terminals
--example)  show an example
*)          show this help message

EXAMPLE

$ cat << EOF | add_zero | munge-trees -p
> ( (IP-MAT (NP-SBJ *speaker*)
>           (PP-OB1 (NP (IP-REL (NP-OB1 *T*)
>                               (NP-SBJ *pro*)
>                               (VB 落とし)
>                               (AXD た))
>                       (N お金))
>                   (P-ROLE を))
>           (VB 拾い)
>           (AX まし)
>           (AXD た)
>           (PU 。))
>   (ID 853_textbook_particles;o_page_170;AT68-2;JP))
> EOF
-| ( (IP-MAT (NP-SBJ (ZERO *speaker*))
-|           (PP-OB1 (NP (IP-REL (NP-OB1 (ZERO *T*))
-|                               (NP-SBJ (ZERO *pro*))
-|                               (VB 落とし)
-|                               (AXD た))
-|                       (N お金))
-|                   (P-ROLE を))
-|           (VB 拾い)
-|           (AX まし)
-|           (AXD た)
-|           (PU 。))
-|   (ID 853_textbook_particles;o_page_170;AT68-2;JP))
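
The transformation in the example can be approximated in a few lines of Python. This is only a sketch and not the script's own implementation. It assumes that a null element is any terminal beginning with `*` (as with `*speaker*`, `*T*`, and `*pro*` above); the names `NULL_TERMINAL` and `add_zero` are illustrative.

```python
import re

# Assumption: a null element is a terminal that starts with "*".
# The lookahead skips terminals already sitting under a ZERO node.
NULL_TERMINAL = re.compile(r"\((?!ZERO\s)([^\s()]+)\s+(\*[^\s()]*)\)")


def add_zero(tree: str) -> str:
    """Wrap each null terminal in a ZERO node; leave other terminals alone."""
    return NULL_TERMINAL.sub(r"(\1 (ZERO \2))", tree)


print(add_zero("(IP-MAT (NP-SBJ *pro*) (VB 落とし) (AXD た))"))
# (IP-MAT (NP-SBJ (ZERO *pro*)) (VB 落とし) (AXD た))
```

Because of the lookahead, applying the function twice gives the same result as applying it once.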


alpino_to_dot(1)

NAME

alpino_to_dot - transform Alpino XML parse

SYNOPSIS

alpino_to_dot [OPTIONS]

DESCRIPTION

Filter to transform Alpino XML parsed data from stdin into a graphviz dot graph drawing script following the tree drawing method of PaQu.

OPTIONS

--lower)    use lower case for nodes
--pron)     use pron information for terminals
--example)  show an example
*)          show this help message

EXAMPLE

$ cat << EOF | alpino_to_dot
> <alpino_ds id="23_BUFFALO;EN" version="1.3">
>   <node cat="ip-mat" id="1" begin="0" end="4">
>     <node cat="np-sbj" id="2" begin="0" end="2">
>       <node pt="d" word="A" id="3" begin="0" end="1"/>
>       <node pt="n" word="cat" id="4" begin="1" end="2"/>
>     </node>
>     <node pt="vbp" word="enters" id="5" begin="2" end="3"/>
>     <node pt="." word="." id="6" begin="3" end="4"/>
>   </node>
>   <sentence>A cat enters .</sentence>
> </alpino_ds>
> EOF
-| strict graph gr {
-|   dpi="65"
-|   ranksep=".4 equally"
-|   nodesep=.05
-|   ordering=out
-|   splines=polyline
-|   node [shape=plaintext, height=0, width=0, fontsize=12, fontname="Helvetica"];
-|   n1 [label="IP-MAT", tooltip="<table class=\"attr\"><tr><td class=\"lbl\">id<td>1<tr><td class=\"lbl\">begin<td>0<tr><td class=\"lbl\">end<td>4</table>"];
-|   n2 [label="NP-SBJ", tooltip="<table class=\"attr\"><tr><td class=\"lbl\">id<td>2<tr><td class=\"lbl\">begin<td>0<tr><td class=\"lbl\">end<td>2</table>"];
-|   n3 [label="D", tooltip="<table class=\"attr\"><tr><td class=\"lbl\">id<td>3<tr><td class=\"lbl\">begin<td>0<tr><td class=\"lbl\">end<td>1</table>"];
-|   n4 [label="N", tooltip="<table class=\"attr\"><tr><td class=\"lbl\">id<td>4<tr><td class=\"lbl\">begin<td>1<tr><td class=\"lbl\">end<td>2</table>"];
-|   n5 [label="VBP", tooltip="<table class=\"attr\"><tr><td class=\"lbl\">id<td>5<tr><td class=\"lbl\">begin<td>2<tr><td class=\"lbl\">end<td>3</table>"];
-|   n6 [label=".", tooltip="<table class=\"attr\"><tr><td class=\"lbl\">id<td>6<tr><td class=\"lbl\">begin<td>3<tr><td class=\"lbl\">end<td>4</table>"];
-|   node [fontname="Helvetica", shape=box, color="#d3d3d3", style=filled];
-|   t3 [label="A", tooltip="<table class=\"attr\"></table>"];
-|   t4 [label="cat", tooltip="<table class=\"attr\"></table>"];
-|   t5 [label="enters", tooltip="<table class=\"attr\"></table>"];
-|   t6 [label=".", tooltip="<table class=\"attr\"></table>"];
-|   { rank=same; t3 t4 t5 t6 }
-|   edge [penwidth="1.5", sametail=true, color="#000000"];
-|   n1 -- n2;
-|   n1 -- n5;
-|   n1 -- n6;
-|   n2 -- n3;
-|   n2 -- n4;
-|   n3 -- t3;
-|   n4 -- t4;
-|   n5 -- t5;
-|   n6 -- t6;
-| }
-| 

The above rendered with "dot -Tsvg" gives:

[Figure: the rendered tree. IP-MAT dominates NP-SBJ, VBP, and "."; NP-SBJ dominates D and N; the terminals "A", "cat", "enters", and "." sit on one row beneath their tags.]
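
To make the link between the XML input and the edge lines of the dot script concrete, here is a minimal Python sketch. It is not the tool's implementation: it emits only the edge statements, using the `n<id>` and `t<id>` naming seen in the example output, and the function name `dot_edges` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def dot_edges(xml_text: str) -> List[str]:
    """List 'parent -- child' edges for every <node> in an Alpino XML tree."""
    root = ET.fromstring(xml_text)
    edges = []
    for parent in root.iter("node"):  # document (pre-order) traversal
        pid = parent.get("id")
        for child in parent.findall("node"):
            edges.append(f"n{pid} -- n{child.get('id')};")
        if parent.get("word") is not None:
            # A tag node carrying @word is linked to its terminal box.
            edges.append(f"n{pid} -- t{pid};")
    return edges


SAMPLE = """<alpino_ds id="x" version="1.3">
  <node cat="np" id="1" begin="0" end="2">
    <node pt="d" word="A" id="2" begin="0" end="1"/>
    <node pt="n" word="cat" id="3" begin="1" end="2"/>
  </node>
</alpino_ds>"""

print("\n".join(dot_edges(SAMPLE)))
# n1 -- n2;
# n1 -- n3;
# n2 -- t2;
# n3 -- t3;
```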

SEE ALSO

alpino_to_html(1), parse_to_alpino(1)



alpino_to_html(1)

NAME

alpino_to_html - render Alpino XML parse

SYNOPSIS

alpino_to_html [OPTIONS]

DESCRIPTION

Filter to transform Alpino XML parsed data from stdin into SVG-rendered trees embedded in an HTML page that is sent to stdout. Also produces the auxiliary files tooltip.css and tooltip.js, which are needed to render the HTML output successfully.

OPTIONS

--dot|--graphviz)  draw tree with graphviz
--mouseover)       introduce mouseover instructions
--css|--js)        make css and js files
--lower)           use lower case for nodes
--yield)           include yield
--search)          include search information
--id)              include id information as h3 level heading
*)                 show this help message

SEE ALSO

alpino_to_dot(1), parse_to_alpino(1)



alpino_to_parse(1)

NAME

alpino_to_parse - transform parse

SYNOPSIS

alpino_to_parse [OPTIONS]

DESCRIPTION

Filter to transform a parse from Alpino XML format into Penn bracketed format.

OPTIONS

--example)  show an example
*)          show this help message

EXAMPLE

$ cat << EOF | alpino_to_parse
> <alpino_ds id="23_BUFFALO;EN" version="1.3">
>   <node cat="ip-mat" id="1" begin="0" end="4">
>     <node cat="np-sbj" id="2" begin="0" end="2">
>       <node pt="d" word="A" id="3" begin="0" end="1"/>
>       <node pt="n" word="cat" id="4" begin="1" end="2"/>
>     </node>
>     <node pt="vbp" word="enters" id="5" begin="2" end="3"/>
>     <node pt="." word="." id="6" begin="3" end="4"/>
>   </node>
>   <sentence>A cat enters .</sentence>
> </alpino_ds>
> EOF
-| ( (IP-MAT (NP-SBJ (D A) (N cat)) (VBP enters) (. .)) (ID 23_BUFFALO;EN))
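
The mapping from Alpino XML to Penn bracketed format shown above can be sketched with the Python standard library. This is a rough illustration, not the script itself. It assumes that non-terminals carry their label in `@cat`, that terminals carry it in `@pt` with the word in `@word`, and that labels are upper-cased on output, all as seen in the example. The function names are illustrative.

```python
import xml.etree.ElementTree as ET


def node_to_penn(node: ET.Element) -> str:
    """Render one Alpino <node> as a Penn bracket, upper-casing its label."""
    children = node.findall("node")
    if not children:
        # Terminal: label from @pt, word from @word.
        return f"({node.get('pt', '').upper()} {node.get('word', '')})"
    label = node.get("cat", "").upper()
    return f"({label} {' '.join(node_to_penn(c) for c in children)})"


def alpino_to_penn(xml_text: str) -> str:
    """Wrap the tree and its ID in the outer bracket used in the example."""
    root = ET.fromstring(xml_text)
    return f"( {node_to_penn(root.find('node'))} (ID {root.get('id')}))"


SAMPLE = """<alpino_ds id="23_BUFFALO;EN" version="1.3">
  <node cat="ip-mat" id="1" begin="0" end="4">
    <node cat="np-sbj" id="2" begin="0" end="2">
      <node pt="d" word="A" id="3" begin="0" end="1"/>
      <node pt="n" word="cat" id="4" begin="1" end="2"/>
    </node>
    <node pt="vbp" word="enters" id="5" begin="2" end="3"/>
    <node pt="." word="." id="6" begin="3" end="4"/>
  </node>
  <sentence>A cat enters .</sentence>
</alpino_ds>"""

print(alpino_to_penn(SAMPLE))
# ( (IP-MAT (NP-SBJ (D A) (N cat)) (VBP enters) (. .)) (ID 23_BUFFALO;EN))
```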

SEE ALSO

parse_to_alpino(1)



alpino_to_pattern(1)

NAME

alpino_to_pattern - create search pattern

SYNOPSIS

alpino_to_pattern [OPTIONS]

DESCRIPTION

Filter to take an Alpino XML parsed tree from stdin and create a search pattern that is able to find the tree. Options select the language in which the pattern is written. The default is XPath over Alpino XML.

OPTIONS

--xpath)      create XPath patterns (default)
--tregex)     create tregex patterns
--tgrep2)     create tgrep2 patterns
--tgreplite)  create tgrep2 patterns
--csearch)    create CorpusSearch patterns
--tiger)      create TIGERSearch patterns
--example)    show examples
*)            show this help message

EXAMPLES

$ cat << EOF | alpino_to_pattern
> <node cat="ip-mat" id="1" begin="0" end="4">
>   <node cat="advp" id="2" begin="0" end="1">
>     <node pt="adv" word="__" id="3" begin="0" end="1"/>
>   </node>
>   <node cat="advp" id="4" begin="1" end="2">
>     <node pt="adv" word="__" id="5" begin="1" end="2"/>
>   </node>
>   <node pt="ax" word="で" id="6" begin="2" end="3"/>
>   <node pt="vb2" word="__" id="7" begin="3" end="4"/>
>   <sentence>__ __ で __</sentence>
> </node>
> EOF
-| //node[matches(@cat,'(^|\W)ip-mat($|\W)') and node[matches(@cat,'(^|\W)advp($|\W)') and node[matches(@pt,'(^|\W)adv($|\W)')] and number(@end) <= ../node[matches(@cat,'(^|\W)advp($|\W)') and node[matches(@pt,'(^|\W)adv($|\W)')] and number(@end) <= ../node[matches(@pt,'(^|\W)ax($|\W)') and @word='で' and number(@end) <= ../node[matches(@pt,'(^|\W)vb2($|\W)')]/number(@begin)]/number(@begin)]/number(@begin)] and node[matches(@cat,'(^|\W)advp($|\W)') and node[matches(@pt,'(^|\W)adv($|\W)')] and number(@end) <= ../node[matches(@pt,'(^|\W)ax($|\W)') and @word='で' and number(@end) <= ../node[matches(@pt,'(^|\W)vb2($|\W)')]/number(@begin)]/number(@begin)] and node[matches(@pt,'(^|\W)ax($|\W)') and @word='で' and number(@end) <= ../node[matches(@pt,'(^|\W)vb2($|\W)')]/number(@begin)] and node[matches(@pt,'(^|\W)vb2($|\W)')]]
$ cat << EOF | alpino_to_pattern --tregex
> <node cat="ip-mat" id="1" begin="0" end="4">
>   <node cat="advp" id="2" begin="0" end="1">
>     <node pt="adv" word="__" id="3" begin="0" end="1"/>
>   </node>
>   <node cat="advp" id="4" begin="1" end="2">
>     <node pt="adv" word="__" id="5" begin="1" end="2"/>
>   </node>
>   <node pt="ax" word="で" id="6" begin="2" end="3"/>
>   <node pt="vb2" word="__" id="7" begin="3" end="4"/>
>   <sentence>__ __ で __</sentence>
> </node>
> EOF
-| /IP-MAT\b/ < (/ADVP\b/ < (/ADV\b/ < __) $.. (/ADVP\b/ < (/ADV\b/ < __) $.. (/AX\b/ < /で\b/ $.. (/VB2\b/ < __))))
$ cat << EOF | alpino_to_pattern --csearch
> <node cat="ip-mat" id="1" begin="0" end="4">
>   <node cat="advp" id="2" begin="0" end="1">
>     <node pt="adv" word="__" id="3" begin="0" end="1"/>
>   </node>
>   <node cat="advp" id="4" begin="1" end="2">
>     <node pt="adv" word="__" id="5" begin="1" end="2"/>
>   </node>
>   <node pt="ax" word="で" id="6" begin="2" end="3"/>
>   <node pt="vb2" word="__" id="7" begin="3" end="4"/>
>   <sentence>__ __ で __</sentence>
> </node>
> EOF
-| query: ([1]IP-MAT* exists)
-|    AND ([1]IP-MAT* iDominates [2]ADVP*)
-|    AND ([2]ADVP* iDominates [3]ADV*)
-|    AND ([3]ADV* iDominates .*)
-|    AND ([2]ADVP* Precedes [4]ADVP*)
-|    AND ([4]ADVP* iDominates [5]ADV*)
-|    AND ([5]ADV* iDominates .*)
-|    AND ([4]ADVP* Precedes [6]AX*)
-|    AND ([6]AX* iDominates で)
-|    AND ([6]AX* Precedes [7]VB2*)
-|    AND ([7]VB2* iDominates .*)


alpino_to_svg(1)

NAME

alpino_to_svg - output SVG image

SYNOPSIS

alpino_to_svg [OPTIONS]

DESCRIPTION

Filter to output an SVG image of an input tree.

OPTIONS

--html)   embeddable into html
--line*)  add lines to output
*)        show this help message


alpino_to_tiger(1)

NAME

alpino_to_tiger - transform Alpino XML to TigerXML

SYNOPSIS

alpino_to_tiger [OPTIONS]

DESCRIPTION

Filter to transform Alpino XML parsed data from stdin into TigerXML parsed data.

OPTIONS

--name)      use given name
--addhead)   add head to headless output
--headless)  go headless
--example)   show an example
*)           show this help message

EXAMPLE

$ cat << EOF | alpino_to_tiger
> <alpino_ds id="23_BUFFALO;EN" version="1.3">
>   <node cat="ip-mat" id="1" begin="0" end="4">
>     <node cat="np-sbj" id="2" begin="0" end="2">
>       <node pt="d" word="A" id="3" begin="0" end="1"/>
>       <node pt="n" word="cat" id="4" begin="1" end="2"/>
>     </node>
>     <node pt="vbp" word="enters" id="5" begin="2" end="3"/>
>     <node pt="." word="." id="6" begin="3" end="4"/>
>   </node>
>   <sentence>A cat enters .</sentence>
> </alpino_ds>
> EOF
-| <?xml version="1.0" encoding="UTF-8" standalone="no"?>
-| <corpus xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="./schema/TigerXML.xsd" id="BUFFALO">
-|   <head>
-|     <meta>
-|       <name>BUFFALO</name>
-|       <format>TigerXML</format>
-|     </meta>
-|     <annotation>
-|       <feature name="word" domain="T">
-|         <value name="."/>
-|         <value name="A"/>
-|         <value name="cat"/>
-|         <value name="enters"/>
-|       </feature>
-|       <feature name="pos" domain="T">
-|         <value name=".">.</value>
-|         <value name="D">D</value>
-|         <value name="N">noun</value>
-|         <value name="VBP">VBP</value>
-|       </feature>
-|       <feature name="cat" domain="NT">
-|         <value name="IP" />
-|         <value name="NP" />
-|         <value name="TOP" />
-|       </feature>
-|       <edgelabel>
-|         <value name="--" />
-|         <value name="MAT" />
-|         <value name="SBJ" />
-|       </edgelabel>
-|       <secedgelabel>
-|         <value name="--" />
-|       </secedgelabel>
-|     </annotation>
-|   </head>
-|   <body>
-|     <s id="23_BUFFALO">
-|       <graph root="23_BUFFALO_0">
-|         <terminals>
-|           <t id="23_BUFFALO_3" word="A" pos="D" />
-|           <t id="23_BUFFALO_4" word="cat" pos="N" />
-|           <t id="23_BUFFALO_5" word="enters" pos="VBP" />
-|           <t id="23_BUFFALO_6" word="." pos="." />
-|         </terminals>
-|         <nonterminals>
-|           <nt id="23_BUFFALO_0" cat="TOP">
-|             <edge idref="23_BUFFALO_1" label="MAT" />
-|           </nt>
-|           <nt id="23_BUFFALO_2" cat="NP">
-|             <edge idref="23_BUFFALO_3" label="--" />
-|             <edge idref="23_BUFFALO_4" label="--" />
-|           </nt>
-|           <nt id="23_BUFFALO_1" cat="IP">
-|             <edge idref="23_BUFFALO_2" label="SBJ" />
-|             <edge idref="23_BUFFALO_5" label="--" />
-|             <edge idref="23_BUFFALO_6" label="--" />
-|           </nt>
-|         </nonterminals>
-|       </graph>
-|     </s>
-|   </body>
-| </corpus>
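
In the output above, a hyphenated Alpino category such as `ip-mat` appears as the TigerXML category `IP` plus the label `MAT` on the incoming edge, while unhyphenated tags get the edge label `--`. The sketch below illustrates that split in Python. It is inferred from this one example only; how the tool treats labels with several hyphens is not documented here, and splitting at the first hyphen is an assumption. The name `split_label` is hypothetical.

```python
from typing import Tuple


def split_label(label: str) -> Tuple[str, str]:
    """Split 'IP-MAT' into ('IP', 'MAT'); a label with no hyphen gets '--'."""
    # Assumption: split at the first hyphen only.
    cat, sep, func = label.upper().partition("-")
    return (cat, func) if sep else (cat, "--")


for label in ("ip-mat", "np-sbj", "vbp"):
    print(label, "->", split_label(label))
# ip-mat -> ('IP', 'MAT')
# np-sbj -> ('NP', 'SBJ')
# vbp -> ('VBP', '--')
```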

SEE ALSO

alpino_to_parse(1), parse_to_alpino(1)



parse_to_alpino(1)

NAME

parse_to_alpino - transform to Alpino XML format

SYNOPSIS

parse_to_alpino [OPTIONS]

DESCRIPTION

Filter to take a Penn parsed tree from stdin and change to Alpino XML format.

OPTIONS

--raw)      without extra post processing
--example)  show examples
*)          show this help message

EXAMPLE

$ cat << EOF | parse_to_alpino
> (IP (ADVP (ADV __)) (ADVP (ADV __)) (AX で) (VB2 __))
> EOF
-| 
-| <alpino_ds id="_" version="1.3">
-|   <node cat="ip" id="1" begin="0" end="4">
-|     <node cat="advp" id="2" begin="0" end="1">
-|       <node pt="adv" id="3" begin="0" end="1">
-|       </node>
-|     </node>
-|     <node cat="advp" id="4" begin="1" end="2">
-|       <node pt="adv" id="5" begin="1" end="2">
-|       </node>
-|     </node>
-|     <node pt="ax" word="で" id="6" begin="2" end="3">
-|     </node>
-|     <node pt="vb2" id="7" begin="3" end="4">
-|     </node>
-|   </node>
-|   <sentence>__ __ で __</sentence>
-| </alpino_ds>
-| 
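
The numbering in the output (ids assigned in pre-order, `begin`/`end` as token offsets) can be reproduced with a short Python sketch. It is an approximation under stated assumptions, not the script's code: the input is a single bracketed tree without the outer `( ... (ID ...))` wrapper, every node holds either one word or only bracketed children, and the wildcard `__` is left out of `@word` as in the example. The names `read_tree` and `to_alpino` are illustrative.

```python
import re
import xml.etree.ElementTree as ET

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def read_tree(tokens, pos=0):
    """Parse one bracketed node at tokens[pos]; return (node, next_pos)."""
    label = tokens[pos + 1]
    pos += 2
    children = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = read_tree(tokens, pos)
            children.append(child)
        else:
            children.append(tokens[pos])  # terminal word
            pos += 1
    return (label, children), pos + 1


def to_alpino(penn: str) -> ET.Element:
    """Build an <alpino_ds> element from a Penn bracketed tree."""
    tree, _ = read_tree(TOKEN.findall(penn))
    root = ET.Element("alpino_ds", {"id": "_", "version": "1.3"})
    state = {"id": 0, "word": 0}
    words = []

    def build(node, parent):
        label, children = node
        state["id"] += 1  # ids follow pre-order
        elem = ET.SubElement(parent, "node", {"id": str(state["id"])})
        begin = state["word"]
        if len(children) == 1 and isinstance(children[0], str):
            elem.set("pt", label.lower())
            if children[0] != "__":  # wildcard: no @word attribute
                elem.set("word", children[0])
            words.append(children[0])
            state["word"] += 1
        else:
            elem.set("cat", label.lower())
            for child in children:
                build(child, elem)
        elem.set("begin", str(begin))
        elem.set("end", str(state["word"]))

    build(tree, root)
    ET.SubElement(root, "sentence").text = " ".join(words)
    return root


doc = to_alpino("(IP (ADVP (ADV __)) (ADVP (ADV __)) (AX で) (VB2 __))")
print(ET.tostring(doc, encoding="unicode"))
```

The attribute order in the serialized XML differs from the tool's output, which does not affect the values.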


tgrep_to_xpath(1)

NAME

tgrep_to_xpath - transform TGrep to XPath

SYNOPSIS

tgrep_to_xpath [OPTIONS]

DESCRIPTION

Filter to take a TGrep search expression and output an XPath search expression for Alpino XML format.

OPTIONS

--debug)    print any error messages on stdout
--example)  show examples
*)          show this help message

EXAMPLE

$ cat << EOF | tgrep_to_xpath
> PP < /IP/ > IP
> EOF
-| //node[self::node[@word='PP'] and node[matches(@word,'IP')] and parent::node[self::node[@word='IP']]]


tiger_to_dot(1)

NAME

tiger_to_dot - transform TIGER XML parse

SYNOPSIS

tiger_to_dot [OPTIONS]

DESCRIPTION

Filter to transform TIGER XML parsed data from stdin into a graphviz dot graph drawing script following the tree drawing method of PaQu.

OPTIONS

*)  show this help message

SEE ALSO

alpino_to_tiger(1)



tree_clip_fragment(1)

NAME

tree_clip_fragment - prune tree structure

SYNOPSIS

tree_clip_fragment [OPTIONS]

DESCRIPTION

Filter to prune structure from a parsed tree in Penn bracketed format.

OPTIONS

--left)     push negated items to the left
--right)    push negated items to the right
--example)  show examples
*)          show this help message

EXAMPLE

$ cat << EOF | tree_clip_fragment
> ( (IP-MAT (PP (NP (IP-EMB (VB __)) (N 日々)))) (ID example;JP))
> EOF
-| ( (NP (IP-EMB (VB __)) (N 日々)) (ID example;JP))


Last updated: April 15, 2018