The Man'yoshu97 Parsed Corpus (MYS97)

Authors: Stephen Wright Horn and Alastair Butler


The Man'yoshu is a compilation of texts containing some of the oldest attested forms of the Japanese language, known as Old Japanese (OJ). This corpus fragment applies the principles developed in the Penn Historical Parsed Corpora family to the description of OJ. It also incorporates techniques under development in the following projects: the Oxford-NINJAL Corpus of Old Japanese, the Corpus of Historical Japanese, the Keyaki Treebank, and the NINJAL Parsed Corpus of Modern Japanese.

Highlights include:

The parsed data and further results of analysis (e.g., derived indexing, word dependencies, generated semantic representations) are accessible through a web-based interface.

We invite you to:

Explore the MYS97


Text: The MYS97 contains the first 97 poems of the Man'yoshu (159 trees, 2549 words): the entirety of Book 1 and 13 poems from Book 2.

Source: The text of the corpus fragment is a transliteration of the Man'yoshu that corresponds to its recension in the Shogakkan Shinpen Zenshu edition.

Segmentation: The segmentation analysis is taken from the UniDic/MeCab morphological analyser.


Part of the research that went into the development of this corpus fragment was supported by a Hakuho Foundation Japanese Research Fellowship through the Hakuho Zaidan (Awardee: Stephen Wright Horn, 2015-2016), with the help of the National Institute for Japanese Language and Linguistics as receiving organization (Hosting Professor: Toshinobu Ogiso).