Development of All-Words WSD Systems and Construction of a Correspondence Table between WLSP (Word List by Semantic Principles) and IJD (Iwanami Japanese Dictionary) by These Systems

Project Leader: SHINNOU Hiroyuki (Ibaraki University)
Project Period: October 2016 - March 2018

Summary

Background and Purpose

This research aims to create and publish 1) an all-words WSD system that takes its senses from the Iwanami Kokugo Jiten, 2) an all-words WSD system that assigns code numbers from the Bunrui-goi-hyou as senses, and 3) a correspondence chart between the Bunrui-goi-hyou and the Iwanami Kokugo Jiten. Although word sense disambiguation (WSD) is a fundamental processing step for semantic parsing, it is not employed in actual systems. This is because ordinary WSD restricts the set of target words. An all-words WSD system, which does not restrict the set of target words, makes semantic parsing systems more realistic and thereby advances research in semantic parsing. In addition, although each code in the Bunrui-goi-hyou represents a concept, that concept is not adequately explained. Drawing correspondences with the Iwanami Kokugo Jiten makes explicit what each concept means. These correspondences also reveal portions of the Bunrui-goi-hyou that are inadequate, or conversely portions that are more detailed than the Iwanami Kokugo Jiten, which makes an evaluation of the Bunrui-goi-hyou possible.

Objectives and Methods

An all-words WSD system assigns senses to all the words in a text, and such a system is necessary for realistic semantic parsing. In this study, two types of all-words WSD systems will be built. One takes its senses from the Iwanami Kokugo Jiten, and the other takes its senses from the code numbers of the Bunrui-goi-hyou. The former will basically be constructed by supervised learning, using as training data the core data of the BCCWJ annotated with senses from the Iwanami Kokugo Jiten. The latter will basically be constructed using the same core data annotated with code numbers from the Bunrui-goi-hyou. Using these two all-words WSD systems, a chart giving correspondences between the Iwanami Kokugo Jiten and the Bunrui-goi-hyou will be produced. Specifically, for a word w in a sentence s, let g be the sense of w in the Iwanami Kokugo Jiten, and let h be the code number of w in the Bunrui-goi-hyou. We approximate P(g, h|s, w) as P(g|s, w)*P(h|s, w). Because P(g|s, w) and P(h|s, w), and therefore P(g, h|s, w), can be estimated from the two all-words WSD systems, the model P(g, h) can be learned as a result. By building the correspondence chart between the Bunrui-goi-hyou and the Iwanami Kokugo Jiten on this model, we can extend the completeness of the Bunrui-goi-hyou and evaluate it.
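
The approximation P(g, h|s, w) = P(g|s, w)*P(h|s, w) can be illustrated with a minimal Python sketch. It is not the project's implementation. It assumes that each WSD system returns a probability distribution per word occurrence, and that P(g, h) is estimated by averaging the per-occurrence products over a corpus; the summary does not specify the aggregation step, so that choice is an assumption. The function names and the identifiers g1, g2, h1, h2 are hypothetical placeholders, not real dictionary senses or code numbers.

```python
from collections import defaultdict


def joint_sense_table(occurrences):
    """Estimate P(g, h) from per-occurrence WSD outputs.

    occurrences: iterable of (p_g, p_h) pairs, one per word occurrence, where
    p_g maps Iwanami sense ids to P(g|s, w) and p_h maps Bunrui-goi-hyou
    code numbers to P(h|s, w).
    Averaging over occurrences is an assumption made for this sketch.
    """
    totals = defaultdict(float)
    count = 0
    for p_g, p_h in occurrences:
        count += 1
        for g, prob_g in p_g.items():
            for h, prob_h in p_h.items():
                # Independence approximation: P(g, h|s, w) = P(g|s, w) * P(h|s, w)
                totals[(g, h)] += prob_g * prob_h
    if count == 0:
        return {}
    return {pair: total / count for pair, total in totals.items()}


def best_correspondence(table):
    """For each sense g, pick the code h with the highest estimated P(g, h)."""
    best = {}
    for (g, h), prob in table.items():
        if g not in best or prob > best[g][1]:
            best[g] = (h, prob)
    return best


# Hypothetical outputs of the two systems for two occurrences of one word.
occurrences = [
    ({"g1": 0.9, "g2": 0.1}, {"h1": 0.8, "h2": 0.2}),
    ({"g1": 0.2, "g2": 0.8}, {"h1": 0.3, "h2": 0.7}),
]
table = joint_sense_table(occurrences)
chart = best_correspondence(table)
for g, (h, prob) in sorted(chart.items()):
    print(g, h, round(prob, 2))
```

In this toy input, sense g1 pairs most strongly with code h1 and sense g2 with code h2. A correspondence chart built this way would list such pairs for every sense, and senses or codes with no strong counterpart would point to the gaps and differences in granularity discussed above.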

Project Members

National Institute for Japanese Language and Linguistics