Sravana Reddy


2021

Detecting Extraneous Content in Podcasts
Sravana Reddy | Yongze Yu | Aasish Pappu | Aswin Sivaraman | Rezvaneh Rezapour | Rosie Jones
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Podcast episodes often contain material extraneous to the main content, such as advertisements, interleaved within the audio and the written descriptions. We present classifiers that leverage both textual and listening patterns in order to detect such content in podcast descriptions and audio transcripts. We demonstrate that our models are effective by evaluating them on the downstream task of podcast summarization and show that we can substantively improve ROUGE scores and reduce the extraneous content generated in the summaries.

Modeling Language Usage and Listener Engagement in Podcasts
Sravana Reddy | Mariya Lazarova | Yongze Yu | Rosie Jones
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

While there is an abundance of advice to podcast creators on how to speak in ways that engage their listeners, there has been little data-driven analysis of podcasts that relates linguistic style with engagement. In this paper, we investigate how various factors – vocabulary diversity, distinctiveness, emotion, and syntax, among others – correlate with engagement, based on analysis of the creators’ written descriptions and transcripts of the audio. We build models with different textual representations, and show that the identified features are highly predictive of engagement. Our analysis tests popular wisdom about stylistic elements in high-engagement podcasts, corroborating some pieces of advice and adding new perspectives on others.

2020

100,000 Podcasts: A Spoken English Document Corpus
Ann Clifton | Sravana Reddy | Yongze Yu | Aasish Pappu | Rezvaneh Rezapour | Hamed Bonab | Maria Eskevich | Gareth Jones | Jussi Karlgren | Ben Carterette | Rosie Jones
Proceedings of the 28th International Conference on Computational Linguistics

Podcasts are a large and growing repository of spoken audio. As an audio format, podcasts are more varied in style and production type than broadcast news, contain more genres than typically studied in video data, and are more varied in style and format than previous corpora of conversations. When transcribed with automatic speech recognition they represent a noisy but fascinating collection of documents which can be studied through the lens of natural language processing, information retrieval, and linguistics. Paired with the audio files, they are also a resource for speech processing and the study of paralinguistic, sociolinguistic, and acoustic aspects of the domain. We introduce the Spotify Podcast Dataset, a new corpus of 100,000 podcasts. We demonstrate the complexity of the domain with a case study of two tasks: (1) passage search and (2) summarization. This is orders of magnitude larger than previous speech corpora used for search and summarization. Our results show that the size and variability of this corpus opens up new avenues for research.

2016

Obfuscating Gender in Social Media Writing
Sravana Reddy | Kevin Knight
Proceedings of the First Workshop on NLP and Computational Social Science

Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations
John DeNero | Mark Finlayson | Sravana Reddy
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

2015

A Web Application for Automated Dialect Analysis
Sravana Reddy | James Stanford
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

2012

G2P Conversion of Proper Names Using Word Origin Information
Sonjia Waxmonsky | Sravana Reddy
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Decoding Running Key Ciphers
Sravana Reddy | Kevin Knight
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2011

Unsupervised Discovery of Rhyme Schemes
Sravana Reddy | Kevin Knight
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

What We Know About The Voynich Manuscript
Sravana Reddy | Kevin Knight
Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

2010

An MDL-based approach to extracting subword units for grapheme-to-phoneme conversion
Sravana Reddy | John Goldsmith
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2009

Understanding Eggcorns
Sravana Reddy
Proceedings of the Workshop on Computational Approaches to Linguistic Creativity

Substring-based Transliteration with Conditional Random Fields
Sravana Reddy | Sonjia Waxmonsky
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)