KOTONOHA

Today's linguistic studies, both basic and applied, are becoming increasingly data-oriented, and there is accordingly a growing need for large and reliable corpora. NIJLA's initiative to meet this need is the KOTONOHA project. KOTONOHA (meaning 'word of language' in classical Japanese) is a cover term for a series of language corpora that NIJLA is compiling. As shown in the figure, two component corpora of KOTONOHA are already publicly available: the Taiyo Corpus and the Corpus of Spontaneous Japanese (CSJ). These corpora are designed specifically for the study of the written register of Modern Japanese and the spoken register of present-day Japanese, respectively.

For more information about the Taiyo Corpus and the CSJ, see the "Taiyō Corpus" and "CSJ" websites. Also shown in the upper right corner of the figure is the Balanced Corpus of Contemporary Written Japanese (BCCWJ). This corpus, currently under construction, will contain more than 100 million words and is expected to become publicly available in 2011. Further details are available on the BCCWJ website.