The Keyaki Treebank
The Keyaki Treebank is a parsed corpus that aims to instantiate a
coherent descriptive grammar of the Japanese language,
allowing searches for a wide variety of grammatical phenomena.
Bracket search takes as its search term a tree written in traditional Penn Treebank “bracketed notation”. The results are trees that contain the input reference tree, and the results page presents them graphically. The reference tree can be a full tree or a partial tree, and a partial tree may, but need not, include terminal nodes.
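For readers unfamiliar with bracketed notation, here is a minimal Python sketch of how such a string can be read into a nested structure. It is not the parser used by the search interface: the function names `tokenize` and `parse` and the list-based representation are assumptions made purely for illustration, and the sketch assumes that labels and terminals contain neither whitespace nor parentheses.

```python
def tokenize(text):
    """Split a bracketed string into '(', ')' and label/terminal tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(text):
    """Read one bracketed tree into nested lists.

    Each node becomes [label, child, child, ...]; a terminal is a plain string.
    A node with no children, as in a partial tree, becomes [label].
    """
    tokens = tokenize(text)
    pos = 0

    def read_node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        if pos >= len(tokens) or tokens[pos] in ("(", ")"):
            raise ValueError("expected a node label")
        node = [tokens[pos]]
        pos += 1
        # Collect children until the bracket that closes this node.
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read_node())
            else:
                node.append(tokens[pos])
                pos += 1
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1
        return node

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected text after the tree")
    return tree
```

Under these assumptions, `parse("(NP (N 人))")` yields `['NP', ['N', '人']]`, and a partial tree without terminal nodes such as `parse("(IP (PP (IP)))")` yields `['IP', ['PP', ['IP']]]`.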
As an introductory example, consider the query:
This will return trees where the noun “人” is inside a noun phrase. Note that the search term also matches noun phrases that contain elements in addition to “(N 人)”, as well as noun phrases whose labels carry tag extensions, e.g., NP-SBJ or NP-PRD.
Now consider the query:
This will return trees where the noun “人” is inside a noun phrase and followed by a relative clause of any kind, which the wildcard (“__”) in the query allows for.
As a final example, consider the query:
This query consists of a partial tree with no commitment as to how the tree terminates, and so will find trees where an IP dominates a PP that in turn dominates an IP.
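The matching behaviour described in these examples can be approximated with a short Python sketch. This is not the search engine's actual implementation, and several points are assumptions rather than documented behaviour: trees are represented as nested lists of the form `[label, child, ...]` with terminals as plain strings, a query label matches a node label that is identical or that extends it with a hyphenated tag extension, the children of a query node must match children of the tree node in the same order but not necessarily adjacently, and "dominates" is read as immediate dominance. The function names and the sample tree are hypothetical, and the sample tree is not taken from the corpus.

```python
WILDCARD = "__"


def label_matches(query_label, node_label):
    """A query label matches itself or an extended label, e.g. NP ~ NP-SBJ."""
    return (
        query_label == WILDCARD
        or node_label == query_label
        or node_label.startswith(query_label + "-")
    )


def node_matches(query, node):
    """Check whether the query matches at this exact node."""
    if isinstance(query, str):
        # A terminal in the query: the wildcard matches anything,
        # otherwise the tree must have the same terminal here.
        return query == WILDCARD or query == node
    if isinstance(node, str):
        return False
    if not label_matches(query[0], node[0]):
        return False
    return children_match(query[1:], node[1:])


def children_match(query_children, node_children):
    """Match query children against an ordered subsequence of node children."""
    if not query_children:
        return True  # nothing left to find; extra tree material is allowed
    for i, child in enumerate(node_children):
        if node_matches(query_children[0], child) and children_match(
            query_children[1:], node_children[i + 1:]
        ):
            return True
    return False


def contains(query, tree):
    """Check whether the query matches at the root or anywhere below it."""
    if node_matches(query, tree):
        return True
    if isinstance(tree, str):
        return False
    return any(contains(query, child) for child in tree[1:])


# Hypothetical sample tree, invented for illustration only.
sample = ["IP-MAT", ["NP-SBJ", ["D", "その"], ["N", "人"]], ["VB", "来る"]]
```

With this sketch, `contains(["NP", ["N", "人"]], sample)` is true even though the noun phrase is labelled `NP-SBJ` and has an additional child, while a query node with no children, such as `["IP"]`, matches any IP regardless of what it contains.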
The following tag prefixes can be added to change the search behaviour: