Sara Candeias


2016

The LetsRead Corpus of Portuguese Children Reading Aloud for Performance Evaluation
Jorge Proença | Dirce Celorico | Sara Candeias | Carla Lopes | Fernando Perdigão
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper introduces the LetsRead Corpus of European Portuguese read speech from children aged 6 to 10. The motivation for creating this corpus stems from the lack of databases with recordings of reading tasks by Portuguese children with different performance levels that include all the common reading-aloud disfluencies. Such a corpus is also essential for developing techniques to fulfill the main objective of the LetsRead project: to automatically evaluate the reading performance of children through the analysis of reading tasks. The collected data amounts to 20 hours of speech from 284 children from private and public Portuguese schools, with each child carrying out two tasks: reading sentences and reading a list of pseudowords, both with varying levels of difficulty throughout the school grades. In this paper, the design of the reading tasks presented to children is described, as well as the collection procedure. Manually annotated data is analyzed according to disfluencies and reading performance. The considered word difficulty parameter is also confirmed to be suitable for the pseudoword reading tasks.

A Web Tool for Building Parallel Corpora of Spoken and Sign Languages
Alex Becker | Fabio Kepler | Sara Candeias
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we describe our work in building an online tool for manually annotating texts in any spoken language with SignWriting in any sign language. The existence of such a tool will allow the creation of parallel corpora between spoken and sign languages that can be used to bootstrap the creation of efficient tools for the Deaf community. As an example, a parallel corpus between English and American Sign Language could be used for training Machine Learning models for automatic translation between the two languages. Clearly, this kind of tool must be designed so that it eases the task of human annotators, not only by being easy to use, but also by giving smart suggestions as the annotation progresses, in order to save time and effort. By building a collaborative, online, easy-to-use annotation tool for building parallel corpora between spoken and sign languages, we aim to help the development of proper resources for sign languages that can then be used in the state-of-the-art models currently used in tools for spoken languages. There are several issues and difficulties in creating this kind of resource, and our tool already deals with some of them, such as adequate text representation of a sign and many-to-many alignments between words and signs.

2015

Coupling Natural Language Processing and Animation Synthesis in Portuguese Sign Language Translation
Inês Almeida | Luísa Coheur | Sara Candeias
Proceedings of the Fourth Workshop on Vision and Language

From European Portuguese to Portuguese Sign Language
Inês Almeida | Luísa Coheur | Sara Candeias
Proceedings of SLPAT 2015: 6th Workshop on Speech and Language Processing for Assistive Technologies

2014

HESITA(te) in Portuguese
Sara Candeias | Dirce Celorico | Jorge Proença | Arlindo Veiga | Carla Lopes | Fernando Perdigão
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Hesitations, also called disfluencies, are a characteristic of spontaneous speech, playing a primary role in its structure and reflecting aspects of language production and the management of inter-communication. In this paper we present a database of hesitations in European Portuguese speech - HESITA - as a relevant base of work for studying a variety of speech phenomena. Patterns of hesitations, hesitation distribution according to speaking style, and phonetic properties of the fillers are some of the characteristics we extrapolated from the HESITA database. This database also represents an important resource for improving the naturalness of synthetic speech as well as robust acoustic modelling for automatic speech recognition. The HESITA database is the output of a project in the speech-processing field for European Portuguese, carried out by an interdisciplinary group that closely combines engineering tools and experience with a linguistic approach.

2013

Acoustic, Phonetic and Prosodic Features of Parkinson’s disease Speech
Jorge Proença | Arlindo Veiga | Sara Candeias | Fernando Perdigão
Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology

2011

Generating a Pronunciation Dictionary for European Portuguese Using a Joint-Sequence Model with Embedded Stress Assignment
Arlindo Veiga | Sara Candeias | Fernando Perdigão
Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology