Unshared Task at LENLS 13

Theory and System analysis with FraCaS, MultiFraCaS and JSeM Test Suites

NOTE: Unshared Task attendance is FREE to all!

Workshop Site: National Institute for Japanese Language and Linguistics
10-2 Midori-cho, Tachikawa City, Tokyo, 190-8561, Japan
http://www.ninjal.ac.jp/english/utility/access

Room Number: 201

Date: November 13th, 2016

Contact Person: Alastair Butler (National Institute for Japanese Language and Linguistics (NINJAL))

Contact Email: lenls13[[at]]easychair.org

LENLS 13 website: http://www.is.ocha.ac.jp/~bekki/lenls

Unshared Task website: http://www.compling.jp/fracas_task/index.html

Introduction

This one-day task, focused on theory and system analysis with FraCaS and FraCaS-inspired test suites, is to be held as part of Logic and Engineering of Natural Language Semantics 13 (LENLS 13). See http://www.is.ocha.ac.jp/~bekki/lenls/ for full information about LENLS 13, which is to take place on November 13-15, 2016. LENLS is an annual international workshop on formal syntax, semantics and pragmatics.

The FraCaS test suite was created by the FraCaS Consortium as a benchmark for measuring and comparing the competence of semantic theories and semantic processing systems. It contains inference problems that collectively demonstrate basic linguistic phenomena that a semantic theory has to account for, including quantification, plurality, anaphora, ellipsis, tense, comparatives, and propositional attitudes. Each problem has the same form: a natural language input T is followed by a natural language claim H, and the task is to determine whether H follows from T. Problems are designed to include exactly one target phenomenon, to exclude other phenomena, and to be independent of background knowledge.
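
As a minimal sketch of the problem form just described, an inference problem can be represented as a list of premises T, a hypothesis H and an expected answer. The class name, the field names and the answer labels ("yes", "no", "unknown") are illustrative assumptions and do not reflect the layout of any distributed file. The example sentences are invented and are not taken from the test suite.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InferenceProblem:
    """One problem: does the hypothesis H follow from the premises T?"""
    problem_id: str       # hypothetical identifier
    premises: List[str]   # the natural language input T (one or more sentences)
    hypothesis: str       # the natural language claim H
    answer: str           # assumed label set: "yes", "no" or "unknown"
    phenomenon: str       # the single target phenomenon of the problem


def describe(problem: InferenceProblem) -> str:
    """Render a problem as a short plain-text summary."""
    lines = [f"T{i}: {p}" for i, p in enumerate(problem.premises, start=1)]
    lines.append(f"H: {problem.hypothesis}")
    lines.append(f"Does H follow from T? {problem.answer}")
    return "\n".join(lines)


# Invented example for illustration only.
example = InferenceProblem(
    problem_id="demo-1",
    premises=["Every student passed the exam.", "Kim is a student."],
    hypothesis="Kim passed the exam.",
    answer="yes",
    phenomenon="quantification",
)
print(describe(example))
```

Keeping the target phenomenon as a field makes it straightforward to select the subset of problems for one phenomenon.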

Following FraCaS, overlapping test suites are now available for a number of languages (in addition to the original English, notably Farsi, German, Greek, Japanese, and Mandarin), which together cover both universal and language-specific semantic phenomena. With the problem sets categorised according to the semantic phenomena they involve, it is possible to focus on obtaining results for specific phenomena (within a language or cross-linguistically), as well as to strive for wide coverage.

The data

We have invited papers to apply either theoretical or computational analyses or other ideas to any of the following datasets, or subsets thereof, and describe findings:

FraCaS textual inference test suite (English)
Download machine readable version: http://www-nlp.stanford.edu/~wcmac/downloads
For the original: ftp://ftp.cogsci.ed.ac.uk/pub/FRACAS/del16.ps.gz

MultiFraCaS (Farsi, German, Greek, Mandarin)
Download: http://www.ling.gu.se/~cooper/multifracas

Japanese Semantics Test Suite (JSeM)
Download JSeM_beta.zip from: http://researchmap.jp/community-inf/JSeM

Goals

Shared tasks typically provide "gold" analysed data with clear evaluation criteria for competing systems and have become popular within NLP fields. The concept of a so-called "unshared task" is an alternative to shared tasks. In an unshared task, there are neither quantitative performance measures nor set problems that have to be solved. Instead, participants are given a common ground (e.g., data) and an open-ended prompt.

With the availability of the FraCaS, MultiFraCaS and JSeM Test Suites, the aim of this unshared task is for participants to put these resources to work as the basis for inspiring analysis, e.g., for showcasing a semantic theory, a semantic processing system, or a syntactic annotation model for the data.

We are also interested to hear about the creation of complementary data for languages not yet represented by the existing test suites, about work concerning properties of the existing test suites, about cross-linguistic comparisons using the test suites, etc.

Since this is an unshared task, the use made of the datasets is up to the authors. Any of the datasets might serve as a benchmark for testing the approach taken (or even a computational model, for participants who go that far) and reporting success levels on the problems (if applicable).
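
For participants who do report success levels, one possible way to summarise them is accuracy per semantic phenomenon. The sketch below is only an illustration: the unshared task prescribes no evaluation measure, the function name and the tuple layout are assumptions, and the results shown are invented rather than taken from any real system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_phenomenon(
    results: Iterable[Tuple[str, str, str]]
) -> Dict[str, float]:
    """Map each phenomenon to the fraction of problems answered correctly.

    Each result is a (phenomenon, gold_answer, predicted_answer) triple.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for phenomenon, gold, predicted in results:
        total[phenomenon] += 1
        if gold == predicted:
            correct[phenomenon] += 1
    return {name: correct[name] / total[name] for name in total}


# Invented results for illustration only.
demo_results = [
    ("quantification", "yes", "yes"),
    ("quantification", "no", "unknown"),
    ("anaphora", "unknown", "unknown"),
    ("anaphora", "yes", "yes"),
]
scores = accuracy_by_phenomenon(demo_results)
for name, score in sorted(scores.items()):
    print(f"{name}: {score:.2f}")
```

Reporting per phenomenon rather than a single overall figure matches the way the problem sets are categorised.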

Program

9:30 -- 10:00	Reception and coffee
10:00 -- 10:15	Opening remarks
10:15 -- 11:15	Robin Cooper (joint work with Stergios Chatzikyriakidis and Simon Dobnik), Testing the FraCaS test suite
11:15 -- 11:30	Coffee Break
11:30 -- 11:50	Daisuke Bekki, FraCaS, JSeM and the 'Inferences as Tests' paradigm
11:50 -- 12:10	Alastair Butler, Ai Kubota, Shota Hiyama and Kei Yoshimoto, Treebank annotation of FraCaS and JSeM
12:10 -- 12:30	Oleg Kiselyov, Transformational Semantics on a tree bank
12:30 -- 14:00	Lunch
14:00 -- 15:00	Tim O'Gorman, Improving AMR performance on FraCaS
15:00 -- 15:15	Coffee Break
15:15 -- 15:45	Ran Tian and Kentaro Inui, Unshared Task of Natural Language Inference
15:45 -- 16:15	Koji Mineshima (joint work with Pascual Martínez-Gómez, Ribeka Tanaka, Yusuke Miyao and Daisuke Bekki), How ccg2lambda solves FraCaS/JSeM
16:15 -- 16:45	Yusuke Kubota, FraCaS meets transformational grammar
16:45 -- 17:00	Coffee Break
17:00 -- 18:00	Masaaki Nagata, Can semantics contribute to neural machine translation?

Talk Abstracts

Testing the FraCaS test suite

Robin Cooper (joint work with Stergios Chatzikyriakidis and Simon Dobnik)
University of Gothenburg

In this talk I will present some of the background to the project FraCaS which led to the FraCaS test suite. I will also talk about the MultiFraCaS project and some of our ideas for extending this in the future. The examples in the original test suite were created by semanticists. I will discuss a number of ways in which one could go about verifying these examples. In particular, I will present some preliminary work we have been doing using web-based forms to collect judgements via crowd-sourcing. We will discuss the implications of some preliminary results that we have obtained, in particular the possibility of developing a probabilistic semantics.

Crowd-sourcing allows us to extend the notion of inference from strict logical inference to inference that is gradient and is prevalent in lexical meaning. The probabilities obtained through crowd-sourcing tell us the likelihood that a native speaker draws a particular conclusion. Eventually, we hope to extend the entire FraCaS suite this way.

The original aim of the test suite was to provide a way of evaluating computational systems that perform natural language inference. I will talk about some work applying type theory to this task, with as yet partial coverage of the test suite.

The crowd-sourcing methods we have been using to evaluate the test suite can also be used to give an empirical basis to predictions made by semantic theories that are difficult to ascertain by only relying on intuitions of a single linguist. I will present some work we have been doing on the semantics of verbal restructuring, a phenomenon whose semantics has been debated and disagreed upon in the literature, and discuss the preliminary results we have.

FraCaS, JSeM and the 'Inferences as Tests' paradigm

Daisuke Bekki
Ochanomizu University/JST CREST/AIST AIRC/NII

Treebank annotation of FraCaS and JSeM

Alastair Butler, Ai Kubota, Shota Hiyama and Kei Yoshimoto
National Institute for Japanese Language and Linguistics and Tohoku University

This talk will describe treebank annotation of the FraCaS (English data) and JSeM (Japanese data) test suites, with a shared annotation scheme in the style of the Penn Historical family of corpora. Syntactic analysis is often taken to be a necessary prerequisite for building semantic analysis, and we will argue that it is helpful to cash out what are likely to be shared syntactic assumptions with gold standard trees as transformable references of analysis. We will also detail work on transforming syntactic trees into meaning representations following Treebank Semantics (Butler 2015), and explore overlaps that arise when corresponding English and Japanese data are considered together.

Transformational Semantics on a tree bank

Oleg Kiselyov
Tohoku University

The recently introduced Transformational Semantics (TS) formalizes, restrains and makes rigorous the transformational approach epitomized by QR and Transformational Grammars: deriving a meaning (in the form of a logical formula or a logical form) by a series of transformations from a suitably abstract (tecto-) form of a sentence. Unlike QR, each transformation in TS is rigorously and precisely defined, typed, and deterministic. The restraints of TS and the sparsity of the choice points (in the order of applying the deterministic transformation steps) make it easier to derive negative predictions and control over-generation.

The rigorous nature of TS makes it easier to carry out analyses mechanically, by a computer. The current implementation takes the form of a domain-specific language embedded in Haskell. It is intended as a 'semantic calculator', to interactively try various transformations and observe their results or failures. We report on the first experiments in using the calculator in 'batch mode', to process tree bank data.

Improving AMR performance on FraCaS

Tim O'Gorman
University of Colorado Boulder

Abstract Meaning Representation (Banarescu et al. 2014) is a useful formalism, in part, because of its amenability to large-scale manual semantic annotation. AMR annotates the meaning of a sentence directly (rather than over a representation of the syntax), and represents the intended meaning of a sentence in context, rather than building underspecified representations of what it means out of context. Such an approach enables quick and useful annotation of even the most ungrammatical sentences, and provides a clear representation that can be easily understood and easily parsed. However, this flexibility comes at the price of relatively weak treatments of classic semantic phenomena such as quantification, tense, and monotonicity.

This talk will illustrate how the AMR-style annotation of meaning would handle the complex issues in the FraCaS test suite. The problematic portions of the test suite will then be used to discuss how AMR semantic coverage might be improved. I will first discuss the ongoing efforts to improve the AMR treatment of quantification. Secondly, I will discuss the use of event annotation methodologies to express tense, aspect and modality. These exemplify the trade-offs and hard decisions that are faced when designing a semantic representation intended for large-scale annotation.

Unshared Task of Natural Language Inference

Ran Tian and Kentaro Inui
Tohoku University

We propose a natural language inference engine which, at its lowest level, relies on algebraic calculations of word vectors to compose meanings of phrases and exploit semantic similarities. On top of that, it can handle negation and universal quantifiers logically, with an expressivity equivalent to first-order logic and a complete inference mechanism. We plan to use the FraCaS dataset to test the inference ability of our system.

How ccg2lambda solves FraCaS/JSeM

Koji Mineshima (joint work with Pascual Martínez-Gómez, Ribeka Tanaka, Yusuke Miyao and Daisuke Bekki)
Ochanomizu University/JST CREST

I present ongoing work on developing a formal compositional semantics and inference system for English and Japanese wide-coverage statistical CCG parsers. The focus of the talk is on how our system can solve linguistically challenging inference problems compiled in the FraCaS and JSeM datasets. I will introduce a pipeline ("ccg2lambda") presented in Mineshima et al. (EMNLP2015, 2016) and Martínez-Gómez et al. (ACL2016) and evaluate the current system on FraCaS and JSeM. I will also discuss how to extend the system with semantic underspecification using the idea of Dependent Type Semantics (DTS; see ESSLLI2016 lecture course: http://esslli2016.unibz.it/?page_id=216), and thereby illustrate one way in which a logic-based system solves some of the most challenging problems in FraCaS, in particular, those in the "nominal anaphora" section of the dataset.

FraCaS meets transformational grammar

Yusuke Kubota
University of Tsukuba

I aim to do two things in this talk: (i) attempt (at least some beginnings of) a meta-evaluation of the linguistic significance of the work reported in Mineshima et al. (EMNLP 2015, 2016) and (ii) try to motivate the use of a more powerful linguistic formalism than the one employed in Mineshima et al.'s work for more or less the same task. For the latter component, I discuss possible advantages (and disadvantages) of using Hybrid Type-Logical Categorial Grammar (Hybrid TLCG) as a replacement for the CCG syntax of Mineshima et al.'s system. I will argue that the use of Hybrid TLCG is especially promising in dealing with complex linguistic phenomena (such as ellipsis) with which Mineshima et al.'s system struggles due to the inflexibility of the CCG syntax it adopts. In particular, Hybrid TLCG enables incorporating various analytic techniques introduced in the 'mainstream' transformational generative syntax and semantics over the last several decades much more straightforwardly than is practically possible with CCG. This has the further potential advantage of bringing computational linguistics and mainstream 'pencil-and-paper' theoretical linguistics much closer to each other than they currently are.

Can semantics contribute to neural machine translation?

Masaaki Nagata
NTT Communication Science Laboratories

Neural machine translation (NMT) is a recently developed translation technology which has outperformed "conventional" statistical machine translation (SMT). Unlike SMT, which requires syntax to overcome word order differences between distant languages such as Japanese and English, NMT is good at word reordering, and it seems that no linguistic theory is required. In this talk, I will illustrate by examples that it is semantics that can contribute to solving the remaining problems in NMT, such as zero pronoun resolution and article generation.

Organisers

Daisuke Bekki (Ochanomizu University/JST CREST/AIST AIRC/NII)
Alastair Butler (National Institute for Japanese Language and Linguistics (NINJAL))
Ai Kubota (National Institute for Japanese Language and Linguistics (NINJAL))
Yusuke Kubota (University of Tsukuba)
Koji Mineshima (Ochanomizu University/JST CREST)
