Julie Hunter


2023

FREDSum: A Dialogue Summarization Corpus for French Political Debates
Virgile Rennard | Guokan Shang | Damien Grari | Julie Hunter | Michalis Vazirgiannis
Findings of the Association for Computational Linguistics: EMNLP 2023

Recent advances in deep learning, and especially the invention of encoder-decoder architectures, have significantly improved the performance of abstractive summarization systems. While the majority of research has focused on written documents, there has been increasing interest in the summarization of dialogues and multi-party conversations over the past few years. In this paper, we present a dataset of French political debates to enhance the resources available for multilingual dialogue summarization. Our dataset consists of manually transcribed and annotated political debates, covering a range of topics and perspectives. We highlight the importance of high-quality transcription and annotation for training accurate and effective dialogue summarization models, and emphasize the need for multilingual resources to support dialogue summarization in non-English languages. We also provide baseline experiments using state-of-the-art methods, and encourage further research in this area to advance the field of dialogue summarization. Our dataset will be made publicly available for use by the research community, enabling further advances in multilingual dialogue summarization.
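As a rough illustration of the baseline setting the abstract describes, the sketch below fine-tunes a French encoder-decoder model on transcript/summary pairs with the Hugging Face transformers library. The checkpoint name, file name, and field names ("transcript", "summary") are illustrative assumptions, not details of the FREDSum release.

    # Minimal sketch: fine-tuning a seq2seq summarizer on debate transcripts.
    # The checkpoint and data fields are assumptions, not the paper's setup.
    from datasets import load_dataset
    from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                              DataCollatorForSeq2Seq, Seq2SeqTrainer,
                              Seq2SeqTrainingArguments)

    checkpoint = "moussaKam/barthez"  # a French BART; any seq2seq model works
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

    # Hypothetical JSON-lines file with "transcript" and "summary" fields.
    data = load_dataset("json", data_files="fredsum_train.jsonl")["train"]

    def preprocess(batch):
        enc = tokenizer(batch["transcript"], max_length=1024, truncation=True)
        enc["labels"] = tokenizer(text_target=batch["summary"],
                                  max_length=256, truncation=True)["input_ids"]
        return enc

    data = data.map(preprocess, batched=True, remove_columns=data.column_names)

    trainer = Seq2SeqTrainer(
        model=model,
        args=Seq2SeqTrainingArguments(output_dir="fredsum-baseline",
                                      per_device_train_batch_size=2,
                                      num_train_epochs=3),
        train_dataset=data,
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()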

Limits for learning with language models
Nicholas Asher | Swarnadeep Bhar | Akshay Chaturvedi | Julie Hunter | Soumya Paul
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)

With the advent of large language models (LLMs), the trend in NLP has been to train LLMs on vast amounts of data to solve diverse language understanding and generation tasks. The list of LLM successes is long and varied. Nevertheless, several recent papers provide empirical evidence that LLMs fail to capture important aspects of linguistic meaning. Focusing on universal quantification, we provide a theoretical foundation for these empirical findings by proving that LLMs cannot learn certain fundamental semantic properties, including semantic entailment and consistency as they are defined in formal semantics. More generally, we show that LLMs are unable to learn concepts beyond the first level of the Borel hierarchy, which imposes severe limits on the ability of LMs, both large and small, to capture many aspects of linguistic meaning. This means that LLMs will operate without formal guarantees on tasks that require entailment and deep linguistic understanding.
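For readers who want the formal notions pinned down, here is a standard gloss of entailment and consistency in my own notation; the paper's exact formalization may differ.

    % Entailment: \Gamma entails \varphi iff every model of \Gamma satisfies \varphi.
    % Consistency: \Gamma is consistent iff it has at least one model.
    \Gamma \models \varphi \;\Longleftrightarrow\; \forall M \,\bigl(M \models \Gamma \;\Rightarrow\; M \models \varphi\bigr)
    % Universal quantification, the paper's focus, must hold of every instance,
    % which is what makes such properties hard to verify from finitely many examples:
    \forall x\, P(x) \models P(t) \quad \text{for every closed term } t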

A simple but effective model for attachment in discourse parsing with multi-task learning for relation labeling
Zineb Bennis | Julie Hunter | Nicholas Asher
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

In this paper, we present a discourse parsing model for conversation trained on the STAC corpus. We fine-tune a BERT-based model to encode pairs of discourse units and use a simple linear layer to predict discourse attachments. We then exploit a multi-task setting to predict relation labels. The multi-task approach effectively aids in the difficult task of relation type prediction; our F1 score of 57 surpasses the state of the art with no loss in performance for attachment, confirming the intuitive interdependence of these two tasks. Our method also improves over previous discourse parsing models in allowing longer input sizes and in permitting attachments in which one node has multiple parents, an important feature of multi-party conversation.
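A minimal sketch of the architecture as described: a shared BERT encoder over discourse-unit pairs, a linear attachment head, and a second head for relation labels in the multi-task setting. The checkpoint, label count, and example text are illustrative assumptions, not the authors' exact configuration.

    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    class PairwiseParser(nn.Module):
        """Scores a pair of discourse units for attachment and relation type."""
        def __init__(self, encoder_name="bert-base-uncased", num_relations=16):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(encoder_name)
            hidden = self.encoder.config.hidden_size
            self.attach_head = nn.Linear(hidden, 2)               # attached or not
            self.relation_head = nn.Linear(hidden, num_relations) # multi-task head

        def forward(self, input_ids, attention_mask):
            # Encode "[CLS] DU_1 [SEP] DU_2 [SEP]" and read off the [CLS] vector.
            out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
            cls = out.last_hidden_state[:, 0]
            return self.attach_head(cls), self.relation_head(cls)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    batch = tokenizer(["anyone want to trade wood?"], ["no thanks, I have plenty"],
                      return_tensors="pt", padding=True, truncation=True)
    attach_logits, relation_logits = PairwiseParser()(batch["input_ids"],
                                                      batch["attention_mask"])
    # Training sums a cross-entropy loss on each head (the multi-task setting).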

Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus
Laurent Prevot | Julie Hunter | Philippe Muller
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

While discourse parsing has made considerable progress in recent years, discourse segmentation of conversational speech remains a difficult issue. In this paper, we exploit a French dataset that has been manually segmented into discourse units to compare two approaches to discourse segmentation: fine-tuning existing systems on the manual segmentation vs. using hand-crafted labelling rules to develop a weakly supervised segmenter. Our results show that both approaches yield similar performance in terms of F-score, while the data-programming approach requires less manual annotation work. In a second experiment, we vary the amount of training data used for fine-tuning and show that a small amount of hand-labelled data is enough to obtain good results (although significantly lower than in the first experiment, which used all of the available annotated data).
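To make the hand-crafted labelling rules concrete, here is the flavor of rule the weakly supervised approach relies on: each rule votes on whether a token position opens a new elementary discourse unit, or abstains. The specific cues below are my own illustrations, not the rules used in the paper.

    # Illustrative data-programming-style labelling rules for EDU boundaries.
    BOUNDARY, NOT_BOUNDARY, ABSTAIN = 1, 0, -1

    DISCOURSE_MARKERS = {"mais", "donc", "alors", "enfin", "bon"}
    DETERMINERS = {"le", "la", "les", "un", "une", "des"}

    def rule_discourse_marker(tokens, i):
        # French discourse markers often open a new unit.
        return BOUNDARY if tokens[i].lower() in DISCOURSE_MARKERS else ABSTAIN

    def rule_long_pause(pauses, i, threshold=0.5):
        # A silent pause (in seconds) before a token suggests a boundary.
        return BOUNDARY if pauses[i] >= threshold else ABSTAIN

    def rule_inside_np(tokens, i):
        # A unit rarely starts right after a determiner (i.e., mid noun phrase).
        if i > 0 and tokens[i - 1].lower() in DETERMINERS:
            return NOT_BOUNDARY
        return ABSTAIN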

Abstractive Meeting Summarization: A Survey
Virgile Rennard | Guokan Shang | Julie Hunter | Michalis Vazirgiannis
Transactions of the Association for Computational Linguistics, Volume 11

A system that could reliably identify and sum up the most important points of a conversation would be valuable in a wide variety of real-world contexts, from business meetings to medical consultations to customer service calls. Recent advances in deep learning, and especially the invention of encoder-decoder architectures, have significantly improved language generation systems, opening the door to improved forms of abstractive summarization, a form of summarization particularly well suited to multi-party conversation. In this paper, we provide an overview of the challenges raised by the task of abstractive meeting summarization and of the datasets, models, and evaluation metrics that have been used to tackle these problems.
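Since the survey covers evaluation metrics, a quick sketch of ROUGE scoring, the most common automatic metric in this literature, may be useful; the example strings are invented.

    # ROUGE scoring with the Hugging Face evaluate library.
    # Requires: pip install evaluate rouge_score
    import evaluate

    rouge = evaluate.load("rouge")
    predictions = ["The team agreed to ship the beta next Friday."]
    references = ["The meeting ended with an agreement to release the beta on Friday."]
    print(rouge.compute(predictions=predictions, references=references))
    # -> {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}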

2021

Weakly supervised discourse segmentation for multiparty oral conversations
Lila Gravellier | Julie Hunter | Philippe Muller | Thomas Pellegrini | Isabelle Ferrané
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Discourse segmentation, the first step of discourse analysis, has been shown to improve results for text summarization, translation and other NLP tasks. While segmentation models for written text tend to perform well, they are not directly applicable to spontaneous, oral conversation, which has linguistic features foreign to written text. Segmentation is less studied for this type of language, where annotated data is scarce and existing corpora are more heterogeneous. We develop a weak supervision approach to adapt, with minimal annotation, a state-of-the-art discourse segmenter trained on written text to French conversation transcripts. Supervision is given by a latent model bootstrapped by manually defined heuristic rules that use linguistic and acoustic information. The resulting model improves on the original segmenter, especially in contexts where information on speaker turns is lacking or noisy, gaining up to 13% in F-score. Evaluation is performed not only on data similar to those used to define our heuristic rules, but also on transcripts from two other corpora.
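One way to read "a latent model bootstrapped by manually defined heuristic rules": the noisy votes cast by the rules are aggregated into probabilistic labels, which then supervise the segmenter. The toy aggregation below uses simple vote pooling as a stand-in for the paper's latent model.

    import numpy as np

    ABSTAIN = -1

    def aggregate(votes):
        # Pool noisy rule votes (n_positions x n_rules) into soft labels.
        # A real weak-supervision setup would fit a latent label model that
        # learns per-rule accuracies; mean pooling is the simplest stand-in.
        soft = []
        for row in votes:
            active = row[row != ABSTAIN]
            soft.append(0.5 if active.size == 0 else active.mean())
        return np.array(soft)

    # Three rules voting on four token positions (1 = boundary, 0 = not).
    votes = np.array([[1, 1, ABSTAIN],
                      [0, ABSTAIN, 0],
                      [ABSTAIN, ABSTAIN, ABSTAIN],
                      [1, 0, 1]])
    print(aggregate(votes))  # [1.0, 0.0, 0.5, 0.667] -> targets for fine-tuning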

2017

Proceedings of the IWCS workshop on Foundations of Situated and Multimodal Communication
Nicholas Asher | Julie Hunter | Alex Lascarides

2016

Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus
Nicholas Asher | Julie Hunter | Mathieu Morey | Farah Benamara | Stergos Afantenos
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes the STAC resource, a corpus of multi-party chats annotated for discourse structure in the style of SDRT (Asher and Lascarides, 2003; Lascarides and Asher, 2009). The main goal of the STAC project is to study the discourse structure of multi-party dialogues in order to understand the linguistic strategies adopted by interlocutors to achieve their conversational goals, especially when these goals are opposed. The STAC corpus is not only a rich source of data on strategic conversation, but also, to our knowledge, the first corpus to provide full discourse structures for multi-party dialogues. It has other remarkable features that make it an interesting resource for other topics: interleaved threads, creative language, and interactions between linguistic and extra-linguistic contexts.
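The SDRT-style annotations described here form graphs rather than trees: labelled attachment edges hold between elementary and complex discourse units, and a unit may have several parents. A minimal data-structure sketch (Python 3.10+; the field names are mine, not the corpus release format):

    from dataclasses import dataclass, field

    @dataclass
    class DiscourseUnit:
        uid: str
        speaker: str | None = None  # None for complex discourse units (CDUs)
        text: str | None = None
        members: list[str] = field(default_factory=list)  # CDU members, by uid

    @dataclass
    class Attachment:
        source: str    # parent unit
        target: str    # child unit; the same uid may be a target more than once
        relation: str  # e.g. "Question-answer_pair"

    # A tiny multi-parent fragment: one answer attached to two questions.
    units = [DiscourseUnit("e1", "A", "anyone got wood?"),
             DiscourseUnit("e2", "B", "or clay?"),
             DiscourseUnit("e3", "C", "nope")]
    graph = [Attachment("e1", "e3", "Question-answer_pair"),
             Attachment("e2", "e3", "Question-answer_pair")]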

2015

Integrating Non-Linguistic Events into Discourse Structure
Julie Hunter | Nicholas Asher | Alex Lascarides
Proceedings of the 11th International Conference on Computational Semantics

2014

Because We Say So
Julie Hunter | Laurence Danlos
Proceedings of the EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL)