Jackson Liscombe

2010

pdf bib
How to Drink from a Fire Hose: One Person Can Annoscribe One Million Utterances in One Month
David Suendermann | Jackson Liscombe | Roberto Pieraccini
Proceedings of the SIGDIAL 2010 Conference

pdf bib abs
WITcHCRafT: A Workbench for Intelligent exploraTion of Human ComputeR conversaTions
Alexander Schmitt | Gregor Bertrand | Tobias Heinroth | Wolfgang Minker | Jackson Liscombe
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present Witchcraft, an open-source framework for the evaluation of prediction models for spoken dialogue systems based on interaction logs and audio recordings. The use of Witchcraft is two fold: first, it provides an adaptable user interface to easily manage and browse thousands of logged dialogues (e.g. calls). Second, with help of the underlying models and the connected machine learning framework RapidMiner the workbench is able to display at each dialogue turn the probability of the task being completed based on the dialogue history. It estimates the emotional state, gender and age of the user. While browsing through a logged conversation, the user can directly observe the prediction result of the models at each dialogue step. By that, Witchcraft allows for spotting problematic dialogue situations and demonstrates where the current system and the prediction models have design flaws. Witchcraft will be made publically available to the community and will be deployed as open-source project.

pdf bib abs
The Influence of the Utterance Length on the Recognition of Aged Voices
Alexander Schmitt | Tim Polzehl | Wolfgang Minker | Jackson Liscombe
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper addresses the recognition of elderly callers based on short and narrow-band utterances, which are typical for Interactive Voice Response (IVR) systems. Our study is based on 2308 short utterances from a deployed IVR application. We show that features such as speaking rate, jitter and shimmer that are considered as most meaningful ones for determining elderly users underperform when used in the IVR context while pitch and intensity features seem to gain importance. We further demonstrate the influence of the utterance length on the classifiers performance: for both humans and classifier, the distinction between aged and non-aged voices becomes increasingly difficult the shorter the utterances get. Our setup based on a Support Vector Machine (SVM) with linear kernel reaches a comparably poor performance of 58% accuracy, which can be attributed to an average utterance length of only 1.6 seconds. The automatic distinction between aged and non-aged utterances drops to random when the utterance length falls below 1.2 seconds.

Jackson Liscombe

2010

2009

2007

2006

Co-authors

Venues