Study of the History of the Japanese Language Using Statistics and Machine-Learning

Project Leader	:	OGISO Toshinobu, Associate Professor, Department of Corpus Studies, NINJAL
Project Period	:	November 2010 - October 2013

Summary

With advances in natural language processing (NLP) technologies and the development of electronic dictionaries, morphological analysis of historical Japanese texts has become viable. This has opened the way to applying innovative research methods, such as statistical and corpus-based analysis, to the study of the history of the Japanese language.

In this project, machine learning is used to develop tools for constructing a historical corpus of the Japanese language with a sophisticated annotation schema, and existing software is adapted to create a user interface for research. These tools make it possible, for the first time, to explore the application of statistical methods such as multivariate analysis to a historical corpus of Japanese.

The software developed in this project will be employed in the construction of the historical corpus that is currently being planned at NINJAL.

National Institute for Japanese Language and Linguistics
