Co-creation of Research Infrastructure Through the Integration of Diverse Lexical Resources

Project Leader
OGISO Toshinobu (Professor, NINJAL)
Project Period
April 2022 -


Background and Purpose

The National Institute for Japanese Language and Linguistics (NINJAL) has developed various corpora of the Japanese language and conducted empirical research using them. These corpora are now indispensable for Japanese language research. NINJAL has also developed several sets of research data on Japanese vocabulary (lexical resources), such as "Bunrui Goihyō" (Word List by Semantic Principles) and "UniDic." These resources are widely used not only as basic materials for academic research but also in industry.

The purpose of this project is to develop new and diverse lexical resources in addition to these, and to conduct research by linking them to the corpora. This will increase the overall value of the research resources, including the corpora themselves, and greatly expand the scope of research and application.


Objectives and Methods

In this project, five groups will work together to develop the following diverse vocabulary resources and conduct surveys and research utilizing them.

  • A Survey on the Use of Online Dictionary Resources by Japanese Language Learners
    Survey learners' use of dictionary tools, then develop and release open-source prototypes of the basic words needed for learner dictionary tools that can help solve the problems the survey identifies.
  • Spatial Information Addition to Language Resources
    Connect linguistic information to geographic space by adding spatial information to language map databases, classical place name databases, and ancient dialect dictionaries. In this way, spatial information will be added to lexical resources.
  • Building Lexical Resources for Learner Dictionaries
    As data needed to construct a learner's dictionary, create a vocabulary list by level according to learning objectives, and annotate the "Bunrui Goihyō" with the information such a dictionary requires.
  • Extended Development of the Database on Japanese Word History and Frequency
    By accumulating lexical statistics obtained from the Corpus of Historical Japanese and other sources, databases of dictionaries, linguistic maps and articles, and lexical research literature information, a database on Japanese word history and frequency will be developed and expanded to provide a comprehensive view of the history of Japanese vocabulary.
  • Development of a Japanese Function Expression Wordbank for Japanese Language Learners
    Construct and publish the "Japanese Case Particle Database," which assigns semantic roles to the case components of 1,200 Japanese verb examples, and the "Japanese Sentence Pattern Bank," which also contains 1,200 examples.


Project Structure

These five groups will work together to develop an integrated ID for lexical resources that will serve as the key to linking the entire data set and the corpora, and to create an environment that enables comprehensive use of NINJAL's linguistic resources. The headwords of Shogakukan's Nihon Kokugo Daijiten, the largest Japanese language dictionary in Japan, will serve as the basis of the data. In addition, the groups will collaborate in holding symposia and tutorials, publishing research results, and promoting research using lexical resources and their applications.
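
As a rough illustration of how an integrated ID might act as a linking key, the sketch below groups records from several resources under one shared identifier. The ID format, the resource names, and every field and value are hypothetical placeholders chosen for the example; they are assumptions, not the project's actual schema or data.

```python
from dataclasses import dataclass

# NOTE: the ID format, resource names, and payload values here are
# hypothetical placeholders, not the project's real schema or data.


@dataclass(frozen=True)
class LexicalEntry:
    lexeme_id: str  # hypothetical integrated ID shared across resources
    resource: str   # name of the resource this record comes from
    payload: dict   # resource-specific information for the headword


def build_index(entries):
    """Group records from every resource under their shared integrated ID."""
    index = {}
    for entry in entries:
        index.setdefault(entry.lexeme_id, {})[entry.resource] = entry.payload
    return index


def lookup(index, lexeme_id):
    """Return everything known about one headword across all resources."""
    return index.get(lexeme_id, {})


entries = [
    LexicalEntry("LEX-000001", "thesaurus", {"category": "placeholder-category"}),
    LexicalEntry("LEX-000001", "morph_dictionary", {"reading": "placeholder-reading"}),
    LexicalEntry("LEX-000001", "corpus_stats", {"frequency": 42}),
    LexicalEntry("LEX-000002", "thesaurus", {"category": "another-category"}),
]

index = build_index(entries)
linked = lookup(index, "LEX-000001")
print(sorted(linked))  # names of the resources linked through this ID
```

Because each resource only needs to carry the shared ID, a new resource could in principle be linked without changing the existing ones, which is the property the integrated ID is meant to provide.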
