Alastair Butler - Homepage

Contact details:
Faculty of Humanities and Social Sciences
Hirosaki University
1 Bunkyo-cho, Hirosaki-shi, Aomori, 036-8560, JAPAN

email: ajb129 __AT__ hotmail __DOT__ com, or ajb129 __AT__ hirosaki-u __DOT__ ac __DOT__ jp

Thanks for visiting my homepage. This page offers a summary of my research interests, and then gathers links to my work.

Research interests

Being interested in all aspects of natural language syntax and semantics, my research (see Tools, Resources, Publications) mixes generative linguistics, dynamic semantics, and functional programming. The aim is to build models and methods that aid in characterising and understanding properties of natural language. A particular concern is how dependencies, formally captured with operator bindings, are managed. My research typically involves implementing ideas as computer programs in order to: (i) check whether things work out, and (ii) observe the consequences at scale. My ambition is to reach a deeper understanding of how natural language works, while also building tools of practical value. These range from automated methods that assist parsed corpus annotation to systems for extracting information, preparing training data, checking language performance (of both automated systems and second language learners), and exploring extinct or low-resource languages.

(More details)

Tools
The following lists tools developed for building parsed corpora, as well as for wider research efforts.

Treebank Semantics — automatically obtains meaning representations from natural language utterances given as parsed expressions that follow treebank guidelines.

View Semantics — illustrates ways to further process results from Treebank Semantics.

HARUNIWA2 Parser — a pipeline for parsing Japanese, trained on data from the Keyaki Treebank.

English parser — a pipeline for parsing English, trained on data from the Treebank Semantics Parsed Corpus and the SUSANNETS treebank.

Treebank Utilities — assists with building treebanks by providing methods to access, (re-)process, and visualise parsed data, among other things.

Resources
The following corpus resources can be viewed on the web and downloaded.

Treebank Semantics Parsed Corpus (TSPC) — contains parsed corpus annotation for a sample of English texts (literature, law, newswire, nonfiction, poetry, textbooks, Wikipedia, TED talks).

SUSANNETS — a conversion of Geoffrey Sampson's SUSANNE treebank into the same scheme as the TSPC.

LUCYTS — a conversion of Geoffrey Sampson's LUCY Corpus into the same scheme as the TSPC.

LOBTS — a (growing) parsed fragment of the Lancaster-Oslo-Bergen corpus of modern English (LOB), using the same scheme as the TSPC.

Keyaki Treebank — annotates Japanese sentences with phrase structure, including functional information and zero elements.

NINJAL Parsed Corpus of Modern Japanese (NPCMJ) — an official product of the National Institute for Japanese Language and Linguistics. It extends and develops the Keyaki Treebank, adds lemmas and romanisation, and presents the parsed data through web-based interfaces.

The Man'yoshu97 Parsed Corpus (MYS97) — provides detailed parsed annotation for the first 97 poems of the Man’yōshū, which contains the oldest attested form of the Japanese language, Old Japanese.

The Oxford-NINJAL Corpus of Old Japanese (ONCOJ) — a parsed corpus of the full body of Old Japanese poetic texts, including the Man’yōshū.
(More details)

Last updated: September 26, 2018