Center for Corpus Development

Introduction

The aim of the Center for Corpus Development is to develop Japanese language resources in collaboration with the Research Department, thereby enhancing research activity at Japanese universities.

More concretely, in addition to developing language corpora, the Center is developing machine-readable dictionaries for automatic morphological analysis and a range of corpus search tools.

Language corpora developed by the Center include the Corpus of Spontaneous Japanese, the Taiyo Corpus, the Balanced Corpus of Contemporary Written Japanese, and the Corpus of Historical Japanese. In addition, the NINJAL Web Japanese Corpus, containing some 20 billion words, will soon be released.

Language resources other than corpora include UniDic, a machine-readable dictionary for automatic morphological analysis, and online corpus search tools such as Shonagon and Chunagon.

Taken together, these language resources form an important infrastructure for the study of Japanese; as of February 2016, more than 1,400 research articles had cited them.

An important objective for the coming years is to build a new online environment that allows multiple language corpora to be searched at once. Besides the corpora mentioned above, new corpora to be developed by the Research Department, including corpora of learners' Japanese, dialect speech, and conversational speech, will also be searchable in this environment.

Researchers List

Listed in alphabetical order.

Research and Education Staff

Temporary Researcher

  • KATO Sachi
    Adjunct Researcher
  • KONDO Asuko
    Adjunct Researcher
  • NISHIKAWA Kenya
    Adjunct Researcher
  • OMURA Mai
    Adjunct Researcher
  • WATANABE Michiko
    Adjunct Researcher