Rosie Jones


2021

Detecting Extraneous Content in Podcasts
Sravana Reddy | Yongze Yu | Aasish Pappu | Aswin Sivaraman | Rezvaneh Rezapour | Rosie Jones
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Podcast episodes often contain material extraneous to the main content, such as advertisements, interleaved within the audio and the written descriptions. We present classifiers that leverage both textual and listening patterns in order to detect such content in podcast descriptions and audio transcripts. We demonstrate that our models are effective by evaluating them on the downstream task of podcast summarization and show that we can substantively improve ROUGE scores and reduce the extraneous content generated in the summaries.

Modeling Language Usage and Listener Engagement in Podcasts
Sravana Reddy | Mariya Lazarova | Yongze Yu | Rosie Jones
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

While there is an abundance of advice to podcast creators on how to speak in ways that engage their listeners, there has been little data-driven analysis of podcasts that relates linguistic style with engagement. In this paper, we investigate how various factors – vocabulary diversity, distinctiveness, emotion, and syntax, among others – correlate with engagement, based on analysis of the creators’ written descriptions and transcripts of the audio. We build models with different textual representations, and show that the identified features are highly predictive of engagement. Our analysis tests popular wisdom about stylistic elements in high-engagement podcasts, corroborating some pieces of advice and adding new perspectives on others.

Proceedings of the 2nd Workshop on NLP for Music and Spoken Audio (NLP4MusA)
Sergio Oramas | Elena Epure | Luis Espinosa-Anke | Rosie Jones | Massimo Quadrana | Mohamed Sordo | Kento Watanabe
Proceedings of the 2nd Workshop on NLP for Music and Spoken Audio (NLP4MusA)

2020

100,000 Podcasts: A Spoken English Document Corpus
Ann Clifton | Sravana Reddy | Yongze Yu | Aasish Pappu | Rezvaneh Rezapour | Hamed Bonab | Maria Eskevich | Gareth Jones | Jussi Karlgren | Ben Carterette | Rosie Jones
Proceedings of the 28th International Conference on Computational Linguistics

Podcasts are a large and growing repository of spoken audio. As an audio format, podcasts are more varied in style and production type than broadcast news, contain more genres than typically studied in video data, and are more varied in style and format than previous corpora of conversations. When transcribed with automatic speech recognition, they represent a noisy but fascinating collection of documents which can be studied through the lens of natural language processing, information retrieval, and linguistics. Paired with the audio files, they are also a resource for speech processing and the study of paralinguistic, sociolinguistic, and acoustic aspects of the domain. We introduce the Spotify Podcast Dataset, a new corpus of 100,000 podcasts. We demonstrate the complexity of the domain with a case study of two tasks: (1) passage search and (2) summarization. The corpus is orders of magnitude larger than previous speech corpora used for search and summarization. Our results show that the size and variability of this corpus open up new avenues for research.

Proceedings of the 1st Workshop on NLP for Music and Audio (NLP4MusA)
Sergio Oramas | Luis Espinosa-Anke | Elena Epure | Rosie Jones | Mohamed Sordo | Massimo Quadrana | Kento Watanabe
Proceedings of the 1st Workshop on NLP for Music and Audio (NLP4MusA)

2008

Syntactic and Semantic Structure in Web Search Queries
Rosie Jones
Proceedings of the Australasian Language Technology Association Workshop 2008

The Linguistic Structure of English Web-Search Queries
Cory Barr | Rosie Jones | Moira Regelson
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2006

N Semantic Classes are Harder than Two
Ben Carterette | Rosie Jones | Wiley Greiner | Cory Barr
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

2001

You’re Not From ’Round Here, Are You? Naive Bayes Detection of Non-Native Utterances
Laura Mayfield Tomokiyo | Rosie Jones
Second Meeting of the North American Chapter of the Association for Computational Linguistics