Emanuela Cresti


2004

The C-ORAL-ROM CORPUS. A Multilingual Resource of Spontaneous Speech for Romance Languages
Emanuela Cresti | Fernanda Bacelar do Nascimento | Antonio Moreno Sandoval | Jean Veronis | Philippe Martin | Khalid Choukri
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

The C-ORAL-ROM project has delivered a multilingual corpus of spontaneous speech for the main Romance languages (Italian, French, Portuguese and Spanish). The collection aims to represent the variety of speech acts performed in everyday language and to enable the description of prosodic and syntactic structures in the four Romance languages. Sampling criteria are defined in a corpus design scheme. C-ORAL-ROM adopts two different sampling strategies, one for the formal and one for the informal part: while a set of typical domains of application is selected to document the formal use of language, the informal part documents speech variation using parameters referring to the event’s structure (dialogue vs. monologue) and the sociological domain of use (family-private vs. public). The four Romance corpora are tagged with respect to terminal and non-terminal prosodic breaks. Terminal breaks are assumed to be the most relevant cues for identifying the relevant linguistic domains in spontaneous speech (utterances). Relations with other concurrent criteria are discussed. The multimedia storage of the C-ORAL-ROM corpus is based on this principle: each textual string ending with a terminal break is aligned, through the WinPitch speech software, to its acoustic counterpart, generating the database of all utterances.

2002

The C-ORAL-ROM Project. New methods for spoken language archives in a multilingual romance corpus
Emanuela Cresti | Massimo Moneglia | Fernanda Bacelar do Nascimento | Antonio Moreno Sandoval | Jean Veronis | Philippe Martin | Khalid Choukri | Valerie Mapelli | Daniele Falavigna | Antonio Cid | Claude Blum
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)