Study of the History of the Japanese Language Using Statistics and Machine Learning

Project leader: OGISO Toshinobu
Associate Professor, Department of Corpus Studies, NINJAL


With advances in natural language processing (NLP) technologies and the development of electronic dictionaries, morphological analysis of historical Japanese texts has become viable. This has opened the way to applying innovative research methods, such as statistical and corpus-based analysis, to the study of the history of the Japanese language.

In this project, machine learning is used to develop tools for constructing a historical corpus of the Japanese language with a sophisticated annotation schema, and existing software is adapted to create a user interface for research. These tools make it possible, for the first time, to explore the application of statistical methods such as multivariate analysis to a historical corpus of Japanese.

The software developed in this project will be used in the construction of the historical corpus currently being planned at NINJAL.