Table of contents

Anthology  Volume                                                          Year  Papers
W16-26     Proceedings of the 10th Web as Corpus Workshop                  2016  16
W14-04     Proceedings of the 9th Web as Corpus Workshop (WaC-9)           2014  7
W10-15     Proceedings of the NAACL HLT 2010 Sixth Web as Corpus Workshop  2010  6
W06-17     Proceedings of the 2nd International Workshop on Web as Corpus  2006  11


Proceedings of the 10th Web as Corpus Workshop
[W16-2600]: Paul Cook | Stefan Evert | Roland Schäfer | Egon Stemle

Automatic Classification by Topic Domain for Meta Data Generation, Web Corpus Evaluation, and Corpus Comparison
[W16-2601]: Roland Schäfer | Felix Bildhauer

Efficient construction of metadata-enhanced web corpora
[W16-2602]: Adrien Barbaresi

Topically-focused Blog Corpora for Multiple Languages
[W16-2603]: Andrew Salway | Dag Elgesem | Knut Hofland | Øystein Reigem | Lubos Steskal

The Challenges and Joys of Analysing Ongoing Language Change in Web-based Corpora: a Case Study
[W16-2604]: Anne Krause

Using the Web and Social Media as Corpora for Monitoring the Spread of Neologisms. The case of 'rapefugee', 'rapeugee', and 'rapugee'.
[W16-2605]: Quirin Würschinger | Mohammad Fazleh Elahi | Desislava Zhekova | Hans-Jörg Schmid

EmpiriST 2015: A Shared Task on the Automatic Linguistic Annotation of Computer-Mediated Communication and Web Corpora
[W16-2606]: Michael Beißwenger | Sabine Bartsch | Stefan Evert | Kay-Michael Würzner

SoMaJo: State-of-the-art tokenization for German web and social media texts
[W16-2607]: Thomas Proisl | Peter Uhrig

UdS-(retrain|distributional|surface): Improving POS Tagging for OOV Words in German CMC and Web Data
[W16-2608]: Jakob Prange | Andrea Horbach | Stefan Thater

Babler - Data Collection from the Web to Support Speech Recognition and Keyword Search
[W16-2609]: Gideon Mendels | Erica Cooper | Julia Hirschberg

A Global Analysis of Emoji Usage
[W16-2610]: Nikola Ljubešić | Darja Fišer

Genre classification for a corpus of academic webpages
[W16-2611]: Erika Dalan | Serge Sharoff

On Bias-free Crawling and Representative Web Corpora
[W16-2612]: Roland Schäfer

EmpiriST: AIPHES - Robust Tokenization and POS-Tagging for Different Genres
[W16-2613]: Steffen Remus | Gerold Hintz | Chris Biemann | Christian M. Meyer | Darina Benikova | Judith Eckle-Kohler | Margot Mieskes | Thomas Arnold

bot.zen @ EmpiriST 2015 - A minimally-deep learning PoS-tagger (trained for German CMC and Web data)
[W16-2614]: Egon Stemle

LTL-UDE @ EmpiriST 2015: Tokenization and PoS Tagging of Social Media Text
[W16-2615]: Tobias Horsmann | Torsten Zesch



Proceedings of the 9th Web as Corpus Workshop (WaC-9)
[W14-0400]: Felix Bildhauer | Roland Schäfer

Finding Viable Seed URLs for Web Corpora: A Scouting Approach and Comparative Study of Available Sources
[W14-0401]: Adrien Barbaresi

Focused Web Corpus Crawling
[W14-0402]: Roland Schäfer | Adrien Barbaresi | Felix Bildhauer

Less Destructive Cleaning of Web Documents by Using Standoff Annotation
[W14-0403]: Maik Stührenberg

Some Issues on the Normalization of a Corpus of Products Reviews in Portuguese
[W14-0404]: Magali Sanches Duran | Lucas Avanço | Sandra Aluísio | Thiago Pardo | Maria da Graça Volpe Nunes

{bs,hr,sr}WaC - Web Corpora of Bosnian, Croatian and Serbian
[W14-0405]: Nikola Ljubešić | Filip Klubička

The PAISÀ Corpus of Italian Web Texts
[W14-0406]: Verena Lyding | Egon Stemle | Claudia Borghetti | Marco Brunello | Sara Castagnoli | Felice Dell'Orletta | Henrik Dittmann | Alessandro Lenci | Vito Pirrelli



Proceedings of the NAACL HLT 2010 Sixth Web as Corpus Workshop
[W10-1500]: Adam Kilgarriff | Dekang Lin

NoWaC: a large web-based corpus for Norwegian
[W10-1501]: Emiliano Raul Guevara

Building a Korean Web Corpus for Analyzing Learner Language
[W10-1502]: Markus Dickinson | Ross Israel | Sun-Hee Lee

Sketching Techniques for Large Scale NLP
[W10-1503]: Amit Goyal | Jagadeesh Jagarlamudi | Hal Daumé III | Suresh Venkatasubramanian

Building Webcorpora of Academic Prose with BootCaT
[W10-1504]: George Dillon

Google Web 1T 5-Grams Made Easy (but not for the computer)
[W10-1505]: Stefan Evert



Proceedings of the 2nd International Workshop on Web as Corpus
[W06-1700]:

Web-based frequency dictionaries for medium density languages
[W06-1701]: András Kornai | Péter Halácsy | Viktor Nagy | Csaba Oravecz | Viktor Trón | Dániel Varga

BE: A search engine for NLP research
[W06-1702]: Mike Cafarella | Oren Etzioni

A comparative study on compositional translation estimation using a domain/topic-specific corpus collected from the Web
[W06-1703]: Masatsugu Tonoike | Mitsuhiro Kida | Toshihiro Takagi | Yasuhiro Sasaki | Takehito Utsuro | S. Sato

CUCWeb: A Catalan corpus built from the Web
[W06-1704]: Gemma Boleda | Stefan Bott | Rodrigo Meza | Carlos Castillo | Toni Badia | Vicente López

Annotated Web as corpus
[W06-1705]: Paul Rayson | James Walkerdine | William H. Fletcher | Adam Kilgarriff

Web coverage of the 2004 US Presidential election
[W06-1706]: Arno Scharl | Albert Weichselbraun

Corporator: A tool for creating RSS-based specialized corpora
[W06-1707]: Cédrick Fairon

The problem of ontology alignment on the Web: A first report
[W06-1708]: Davide Fossati | Gabriele Ghidoni | Barbara Di Eugenio | Isabel Cruz | Huiyong Xiao | Rajen Subba

Using the Web as a phonological corpus: A case study from Tagalog
[W06-1709]: Kie Zuraw

Web corpus mining by instance of Wikipedia
[W06-1710]: Rüdiger Gleim | Alexander Mehler | Matthias Dehmer