| Abbreviation | : | Vocabulary and Sentence Structure |
|---|---|---|
| Project Leader | : | YAMAZAKI Makoto, Associate Professor in the Department of Corpus Studies, National Institute for Japanese Language and Linguistics |
| Research field | : | Japanese Linguistics |
| Keywords | : | Time-series distribution of vocabulary, Sentence structure, Cohesion |
Conventional research on vocabulary has treated vocabulary as static, in set-theoretic terms. For example, a frequency table is created for an item under investigation, or the percentage distribution of word types or part-of-speech categories is calculated. Such analyses do not take into account the time axis of a text, that is, the order in which its words appear.
However, because each of the individual words composing the vocabulary is used in its own contexts, it is possible to do dynamic lexicography by targeting actual usage. In other words, lexicography can be based on the time-series distribution of vocabulary in texts.
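The contrast between the static and the time-series view can be illustrated with a minimal sketch. It is not the project's method: the function names, the equal-width binning of the text, and the toy token list are all assumptions made for illustration, and real data would first need to be segmented into words.

```python
from collections import Counter, defaultdict

def frequency_table(tokens):
    """Static view: how often each word occurs, ignoring where it occurs."""
    return Counter(tokens)

def time_series_distribution(tokens, n_bins=4):
    """Dynamic view: per-word counts in each of n_bins equal slices of the text.

    The binning scheme is an illustrative assumption, not the project's method.
    """
    series = defaultdict(lambda: [0] * n_bins)
    total = len(tokens)
    for i, tok in enumerate(tokens):
        # Map token index i to a slice of the text (0 .. n_bins - 1).
        bin_index = min(i * n_bins // total, n_bins - 1)
        series[tok][bin_index] += 1
    return dict(series)

# Hypothetical toy text, already split into word tokens.
tokens = "a b a c a b d d d d c d".split()
freq = frequency_table(tokens)
series = time_series_distribution(tokens, n_bins=4)
print(freq["a"], series["a"])
print(freq["d"], series["d"])
```

In this toy text the frequency table records only that "a" occurs three times and "d" five times, while the binned series also shows that "a" is confined to the first half of the text and "d" to the second half.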
As an example, this study is conducting a quantitative analysis, from the viewpoint of sentence structure, of the dynamic vocabulary formed during the text production process. The data include a collection of complete texts from the "Balanced Corpus of Contemporary Written Japanese," which is being built at the Center for Corpus Development. Using these texts, the study will explore a method of visually describing vocabulary distribution and will examine the relationship between the frequency of a word and the circumstances in which it appears, especially the relationship between sentence structure and the circumstances in which a word (independent word or function word) appears. The study will also analyze how these patterns correlate with characteristics of the texts under investigation (expressive intention, genre, writing style, etc.) to illuminate the sentence formation function inherent in vocabulary.
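One simple way to relate a word to sentence structure is to record where in each sentence it occurs. The sketch below is only an assumption about how such a measure might be set up, not the study's actual procedure: the function names are hypothetical, sentences are assumed to be pre-tokenized lists, and the English toy sentences stand in for corpus data.

```python
from collections import defaultdict

def relative_positions(sentences):
    """Collect, for each word, its relative positions within sentences.

    0.0 means sentence-initial and 1.0 means sentence-final.
    A one-word sentence is assigned position 0.0 (an arbitrary convention).
    """
    positions = defaultdict(list)
    for sent in sentences:
        n = len(sent)
        for i, tok in enumerate(sent):
            rel = i / (n - 1) if n > 1 else 0.0
            positions[tok].append(rel)
    return dict(positions)

def mean_position(sentences):
    """Average relative position of each word across all sentences."""
    return {w: sum(p) / len(p) for w, p in relative_positions(sentences).items()}

# Hypothetical toy data: each sentence is a list of word tokens.
sentences = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran", "away"],
]
means = mean_position(sentences)
print(means["the"], means["cat"], means["away"])
```

Combining such positional values with word frequency, or with a split into independent words and function words, would give one rough quantitative handle on the relationship the paragraph describes.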