Alastair Butler - Homepage

Contact details:
Faculty of Humanities and Social Sciences
Hirosaki University
1 Bunkyo-cho, Hirosaki-shi, Aomori, 036-8560, JAPAN

email: ajb129 __AT__ hotmail __DOT__ com, or ajb129 __AT__ hirosaki-u __DOT__ ac __DOT__ jp

Thanks for visiting my homepage. This page offers a summary of my research interests, and then gathers links to my work.

Research interests

My research (see Tools, Developing Resources, Publications) combines syntactic analysis, formal semantics, and logic/functional programming, with the aim of building models and methods that help to characterise and understand properties of natural language. A particular concern is how dependencies are managed within languages. Typically my research involves implementing ideas as computer programs to observe their consequences at scale. My ambition is to better understand how natural language works while also building tools of practical utility. These range from automated methods that assist annotation creation through to systems for extracting information, preparing training data, checking language performance for automated systems and language learners, and exploring extinct or low-resource languages.

(More details)

Teaching

Help with searching for grammatical constructions:

Help with finding constructions of Grammar and Beyond 3B in the TSPC

Studying English with Computer Parsing:

Studying English with semantic dependencies:

English Puzzle Graphs

Help with building annotation:

How to build tree annotation with a spreadsheet

The Tsurgeon program by Roger Levy and Galen Andrew is a powerful tool for manipulating bracketed trees. The following may help you get started using the tool:

a guide to using Tsurgeon from the command line, and
a wrapper script for using Tsurgeon from the command line.

Tools

The following are tools developed for parsed corpus building as well as wider research efforts.

Treebank Semantics — automatically obtain meaning representations from utterances of natural language given as parsed expressions following treebank guidelines.
HARUNIWA2 Parser — pipeline for parsing Japanese trained on data of the NPCMJ.

Developing Resources

The following are corpus resources that can be viewed on the web.

Treebank Semantics Parsed Corpus (TSPC) — contains parsed corpus annotation for a sample of English texts (literature, law, newswire, nonfiction, poetry, textbooks, Wikipedia, TED talks, etc.).

The Oxford-NINJAL Corpus of Old Japanese (ONCOJ) — full corpus of Old Japanese poetic texts, including the Man’yōshū, parsed.

Publications

(More details)

Legacy Resources

SUSANNETS — a conversion of Geoffrey Sampson's SUSANNE treebank into the same format as the TSPC. Note that SUSANNETS is now a component of the TSPC.

NINJAL Parsed Corpus of Modern Japanese (NPCMJ) — as an official product of the National Institute for Japanese Language and Linguistics, this extends and develops the Keyaki Treebank, and presents the parsed data with web-based interfaces.

Keyaki Treebank — annotates phrase structure with functional and zero information for Japanese sentences. Note that the Keyaki Treebank is no longer developed, with all effort being directed to improving the NPCMJ.

The Man'yoshu97 Parsed Corpus (MYS97) — provides detailed parsed annotation for the first 97 poems of the Man’yōshū, which contains the oldest attested forms of the Japanese language, Old Japanese. Note that the MYS97 is no longer developed, with all effort being directed to improving the ONCOJ.

Last updated: August 27, 2022