The Man'yoshu97 Parsed Corpus (MYS97)

Authors: Stephen Wright Horn and Alastair Butler

Introduction

The Man'yoshu is a compilation of texts containing some of the oldest attested forms of the Japanese language, Old Japanese (OJ). This corpus fragment applies the principles developed in the Penn Historical Parsed Corpora family to the description of OJ. It also incorporates techniques under development in the following projects: Oxford-NINJAL Corpus of Old Japanese, Corpus of Historical Japanese, Keyaki Treebank, and NINJAL Parsed Corpus of Modern Japanese.

Highlights include:

The parsed data and further results of analysis (e.g., derived indexing, word dependencies, generated semantic representations) are made accessible through a web-based interface.

We invite you to:

Explore and download the MYS97

Data

Text: The MYS97 contains the first 97 poems of the Man'yoshu (159 trees, 2549 words): the entirety of Book 1 and 13 poems from Book 2.
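
As a rough illustration of how tree and word counts like those above could be checked against downloaded data, the following Python sketch reads Penn-style bracketed trees and counts trees and terminal words. It assumes the data is available as plain-text bracketed trees, which this page does not confirm. The sample trees, their labels, and the function names parse_trees and count_words are hypothetical and are not taken from the corpus.

```python
def tokenize(text):
    """Split bracketed text into parentheses and symbols."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_trees(text):
    """Parse zero or more top-level bracketed trees into nested lists."""
    tokens = tokenize(text)
    pos = 0

    def parse_node():
        nonlocal pos
        pos += 1  # skip the opening "("
        node = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(parse_node())
            else:
                node.append(tokens[pos])
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets: missing ')'")
        pos += 1  # skip the closing ")"
        return node

    trees = []
    while pos < len(tokens):
        if tokens[pos] != "(":
            raise ValueError("expected '(' but found %r" % tokens[pos])
        trees.append(parse_node())
    return trees


def count_words(node):
    """Count terminals, taken here to be nodes of the form (TAG word).

    Assumption: every (TAG word) pair is a word; a real corpus may
    contain traces or other empty elements that need filtering first.
    """
    if len(node) == 2 and all(isinstance(part, str) for part in node):
        return 1
    return sum(count_words(child) for child in node if isinstance(child, list))


# Hypothetical sample, invented for illustration only.
SAMPLE = """
(IP-MAT (NP-SBJ (N yama)) (VB miyu))
(FRAG (N sora))
"""

trees = parse_trees(SAMPLE)
total_words = sum(count_words(tree) for tree in trees)
print(len(trees), "trees,", total_words, "words")
```

Running the same two functions over the full downloaded text would give figures to compare with the counts stated above, provided the assumptions about the file format and about what counts as a word hold.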

Source: The text of the corpus fragment is a transliteration of the Man'yoshu that corresponds to its recension in the Shogakkan Shinpen Zenshu edition.

Segmentation: The segmentation analysis is taken from the UniDic/MeCab morphological analyser.

Acknowledgements

Part of the research that went into the development of this corpus fragment was supported by a Hakuho Foundation Japanese Research Fellowship through the Hakuho Zaidan (Awardee: Stephen Wright Horn, 2015-2016), with the help of the National Institute for Japanese Language and Linguistics as receiving organization (Hosting Professor: Toshinobu Ogiso).