Development of Classification Indices to Treat a Variety of Texts

Abbreviation: Text Classification Indices
Project Leader: KASHINO Wakako
Associate Professor in the Department of Corpus Studies, National Institute for Japanese Language and Linguistics
Research field: Japanese Linguistics
Keywords: Text classification, Writing style, Corpus

Summary

The classification indices generally available for books are limited to the Nippon Decimal Classification (NDC) for genre and the Japanese book classification codes (C codes) for marketing targets and sales outlets. These are not sufficient for studying texts or for using corpora. The project will therefore design and verify a classification scheme for book texts that captures the variation in format, content, and expression that text research and corpus use require.

The scheme has two parts. The first index indicates whether the structure of a text is a simple type (e.g., a chapter-and-section structure) or an atypical type (e.g., conversation, Q&A format, illustrations, a glossary). The second index classifies texts, mainly those with a simple structure, by features of their content and expression: difficult or easy, stiff or relaxed, polite or chatty, written or spoken, subjective or objective, and so on.
The classification indices will be assigned, some manually and some automatically, to the more than 10,000 text samples to be included in the "Balanced Corpus of Contemporary Written Japanese," and will then be verified systematically.
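
As an illustration only, the two indices described above could be recorded per text sample roughly as follows. This is a minimal sketch, not the project's actual data format: the class names, the axis names, the 1 to 5 scale, and the sample ID are all assumptions introduced here for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Structure(Enum):
    """First index: overall text structure."""
    SIMPLE = "simple"      # e.g. chapter-and-section layout
    ATYPICAL = "atypical"  # e.g. conversation, Q&A format, glossary


# Hypothetical axis names, derived from the paired features in the summary.
AXES = (
    "difficult_easy",
    "stiff_relaxed",
    "polite_chatty",
    "written_spoken",
    "subjective_objective",
)


@dataclass
class TextLabel:
    """Classification record for one text sample (illustrative only)."""
    sample_id: str
    structure: Structure
    # Second index: assumed 1-5 scale, 1 = first pole, 5 = second pole.
    # Left empty when the axes are not applied (e.g. atypical texts).
    scores: Dict[str, int] = field(default_factory=dict)
    source: str = "manual"  # how the label was assigned: "manual" or "automatic"

    def __post_init__(self) -> None:
        if self.source not in ("manual", "automatic"):
            raise ValueError(f"unknown source: {self.source!r}")
        for axis, value in self.scores.items():
            if axis not in AXES:
                raise ValueError(f"unknown axis: {axis!r}")
            if not 1 <= value <= 5:
                raise ValueError(f"score out of range for {axis}: {value}")


# Example record with a made-up sample ID.
label = TextLabel(
    sample_id="SAMPLE_0001",
    structure=Structure.SIMPLE,
    scores={"difficult_easy": 4, "written_spoken": 2},
)
```

Keeping the structure index separate from the content and expression scores mirrors the summary's two-step design, and the `source` field records whether a label came from manual or automatic assignment so the two can be compared during verification.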