Alastair Butler - Homepage

Contact details:
Faculty of Humanities and Social Sciences
Hirosaki University
1 Bunkyo-cho, Hirosaki-shi, Aomori, 036-8560, JAPAN

email: ajb129 __AT__ hotmail __DOT__ com, or ajb129 __AT__ hirosaki-u __DOT__ ac __DOT__ jp

Alastair Butler

Thanks for visiting my homepage. This page offers a summary of my research interests and then gathers links to my work.

Research interests

My research (see Tools, Developing Resources, Publications) combines syntactic analysis, formal semantics, and logic/functional programming, with the aim of building models and methods that help characterise and understand properties of natural language. A particular concern is how dependencies are managed within languages. My research typically involves implementing ideas as computer programs, so that their consequences can be observed at scale. My ambition is to reach a fuller understanding of how natural language works, while also building tools of practical use. These range from automated methods that assist the creation of annotation, through to systems for extracting information, preparing training data, checking language performance for automated systems and language learners, and exploring extinct or low-resource languages.



Help with searching for grammatical constructions:

Studying English with Computer Parsing:

Studying English with semantic dependencies:

Help with building annotation:

The Tsurgeon program by Roger Levy and Galen Andrew is a powerful tool for manipulating bracketed trees. The following may help you get started with the tool:


The following are tools developed for building parsed corpora, as well as for wider research efforts.

Developing Resources

The following are corpus resources that can be viewed on the web.

Treebank Semantics Parsed Corpus (TSPC): contains parsed corpus annotation for a sample of English texts (literature, law, newswire, nonfiction, poetry, textbooks, Wikipedia, TED talks, etc.).


The Oxford-NINJAL Corpus of Old Japanese (ONCOJ): the full corpus of Old Japanese poetic texts, including the Man’yōshū, with parsed annotation.



Publications

Linguistic Expressions and Semantic Processing. A Practical Approach
The Semantics of Grammatical Dependencies
The Syntax and Semantics of Split Constructions. A Comparative Study


Legacy Resources

SUSANNETS: a conversion of Geoffrey Sampson's SUSANNE treebank into the same format as the TSPC. Note that SUSANNETS is now a component of the TSPC.


NINJAL Parsed Corpus of Modern Japanese (NPCMJ): an official product of the National Institute for Japanese Language and Linguistics. It extends and develops the Keyaki Treebank and presents the parsed data through web-based interfaces.


Keyaki Treebank: annotates Japanese sentences for phrase structure, together with functional information and information about zero elements. Note that the Keyaki Treebank is no longer being developed; all effort is now directed at improving the NPCMJ.


The Man'yoshu97 Parsed Corpus (MYS97): provides detailed parsed annotation for the first 97 poems of the Man’yōshū, which contains the oldest attested form of the Japanese language, Old Japanese. Note that the MYS97 is no longer being developed; all effort is now directed at improving the ONCOJ.


Last updated: August 27, 2022