| Abbreviation | : | Text Classification Indices |
|---|---|---|
| Project Leader | : | KASHINO Wakako, Associate Professor in the Department of Corpus Studies, National Institute for Japanese Language and Linguistics |
| Research field | : | Japanese Linguistics |
| Keywords | : | Text classification, Writing style, Corpus |

The generally available classification indices for books are limited to the Nippon Decimal Classification (NDC) for genre and the Japan book classification codes (C codes) for marketing targets and sales outlets. These are not sufficient for studying texts or for using corpora. The project will therefore design and verify a classification scheme for book texts that captures the varieties of format, content, and expression needed for text research and corpus use.

First, an index is provided to indicate whether the text structure is a simple type (e.g., chapter-and-section structure) or an atypical type (e.g., conversation, Q&A format, illustrations, a glossary, etc.). Second, an index is provided to classify texts, mainly those with simple structure, according to features of their content and expression: difficult or easy, stiff or relaxed, polite or chatty, written or spoken, subjective or objective, etc.

The classification indices will be assigned, some manually and some automatically, to the more than 10,000 text samples to be included in the "Balanced Corpus of Contemporary Written Japanese" (BCCWJ), and the scheme will be verified systematically.
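As an illustration only, the two kinds of index described above could be stored per text sample roughly as follows. This is a minimal sketch, not the project's actual data format: the class names, the axis labels (`difficulty`, `formality`, and so on), the two-valued scale, and the sample ID are all assumptions introduced here. Only the pole names (easy/difficult, relaxed/stiff, etc.) and the manual/automatic distinction come from the description.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Structure(Enum):
    """First index: overall text structure."""
    SIMPLE = "simple"        # e.g. chapter-and-section structure
    ATYPICAL = "atypical"    # e.g. conversation, Q&A, glossary


class Source(Enum):
    """How a label was assigned."""
    MANUAL = "manual"
    AUTOMATIC = "automatic"


# Second index: content/expression axes. The axis names are hypothetical;
# the pole names are the ones listed in the project description.
STYLE_AXES = {
    "difficulty": ("easy", "difficult"),
    "formality": ("relaxed", "stiff"),
    "politeness": ("chatty", "polite"),
    "mode": ("spoken", "written"),
    "stance": ("subjective", "objective"),
}


@dataclass
class TextIndexRecord:
    """Classification indices for one text sample (hypothetical layout)."""
    sample_id: str
    structure: Structure
    structure_source: Source
    style: Dict[str, str] = field(default_factory=dict)

    def set_style(self, axis: str, value: str) -> None:
        """Record one style label after checking it against STYLE_AXES."""
        if axis not in STYLE_AXES:
            raise ValueError(f"unknown axis: {axis}")
        if value not in STYLE_AXES[axis]:
            raise ValueError(f"{value!r} is not a pole of {axis}")
        self.style[axis] = value


# Usage: label one sample (the ID is made up for the example).
record = TextIndexRecord("sample-0001", Structure.SIMPLE, Source.MANUAL)
record.set_style("difficulty", "easy")
record.set_style("mode", "written")
```

Keeping the assignment source on each record is one way to let manually and automatically assigned labels be compared later during verification; a real scheme might also use graded scales rather than two poles per axis.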