Hadda Cherroun


2023

AraBERT and mBert: Insights from Psycholinguistic Diagnostics
Basma Sayah | Attia Nehar | Hadda Cherroun | Slimane Bellaouar
Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023)

2021

User Generated Content and Engagement Analysis in Social Media case of Algerian Brands
Aicha Chorana | Hadda Cherroun
Proceedings of the 4th International Conference on Natural Language and Speech Processing (ICNLSP 2021)

2019

A Crowdsourcing-based Approach for Speech Corpus Transcription Case of Arabic Algerian Dialects
Ilyes Zine | Mohamed Cherif Zeghad | Soumia Bougrine | Hadda Cherroun
Proceedings of the 3rd International Conference on Natural Language and Speech Processing

2017

Toward a Web-based Speech Corpus for Algerian Dialectal Arabic Varieties
Soumia Bougrine | Aicha Chorana | Abdallah Lakhdari | Hadda Cherroun
Proceedings of the Third Arabic Natural Language Processing Workshop

The success of machine learning for automatic speech processing has raised the need for large-scale datasets. However, collecting such data is often challenging, as it requires a significant investment of time and money. In this paper, we devise a recipe for building large-scale speech corpora by harnessing Web resources, namely YouTube, other social media, and online radio and TV. We illustrate our methodology by building KALAM’DZ, an Arabic spoken corpus dedicated to Algerian dialectal varieties. The preliminary version of our dataset covers all major Algerian dialects. In addition, we ensure that this material takes into account numerous aspects that foster its richness; in particular, we have targeted various speech topics. Some automatic and manual annotations are provided. They gather useful information about the speakers, along with sub-dialect information at the utterance level. Our corpus encompasses the 8 major Algerian Arabic sub-dialects, with 4881 speakers and more than 104.4 hours of speech segmented into utterances of at least 6 s.