Benjamin K. Tsou

Also published as: B. K. T’sou, Benjamin K Tsou, Benjamin K. T’sou, Benjamin K.Y. Tsou, Benjamin Tsou

2023

pdf bib abs
Post-editing of Technical Terms based on Bilingual Example Sentences
Elsie K. Y. Chan | John Lee | Chester Cheng | Benjamin Tsou
Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track

As technical fields become ever more specialized, and with the continuous emergence of novel technical terms, it may not always be possible to engage bilingual experts in the field to perform translation. This paper investigates the performance of bilingual non-experts in Computer-Assisted Translation. The translators were asked to identify and correct errors in MT output of technical terms in patent materials, aided only by example bilingual sentences. Targeting English-to-Chinese translation, we automatically extract the example sentences from a bilingual corpus of English and Chinese patents. We identify the most frequent translation candidates of a term, and then select the most relevant example sentences for each candidate according to semantic similarity. Even when given only two example sentences for each translation candidate, the non-expert translators were able to post-edit effectively, correcting 67.2% of the MT errors while mistakenly revising correct MT output in only 17% of the cases.
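
The selection step described in this abstract (most frequent translation candidates first, then the most similar example sentences per candidate) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which similarity measure was used, so a bag-of-words cosine stands in for it, and the function names, the triple layout of the corpus, and the toy data are all hypothetical.

```python
from collections import Counter
import math


def bag(sentence):
    """Lower-cased bag of words for an English sentence."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_examples(query, corpus, top_candidates=2, per_candidate=2):
    """Pick example sentence pairs for the most frequent translation candidates.

    corpus: (english_sentence, chinese_sentence, candidate) triples for ONE
    source term (hypothetical layout). Bag-of-words cosine is an assumed
    stand-in for the unspecified semantic similarity measure.
    """
    freq = Counter(cand for _, _, cand in corpus)
    q = bag(query)
    result = {}
    for cand, _ in freq.most_common(top_candidates):
        rows = [(en, zh) for en, zh, c in corpus if c == cand]
        # Most similar English side first.
        rows.sort(key=lambda r: cosine(q, bag(r[0])), reverse=True)
        result[cand] = rows[:per_candidate]
    return result


# Toy data for the term "carrier"; Chinese sentences are placeholders.
corpus = [
    ("the signal is modulated on a carrier wave", "(Chinese sentence 1)", "载波"),
    ("the carrier transports the wafer", "(Chinese sentence 2)", "载体"),
    ("a carrier signal is transmitted", "(Chinese sentence 3)", "载波"),
    ("the catalyst carrier is porous", "(Chinese sentence 4)", "载体"),
    ("carrier frequency of the signal", "(Chinese sentence 5)", "载波"),
    ("the carrier is rare", "(Chinese sentence 6)", "运营商"),
]
picked = select_examples("the signal is sent on a carrier wave", corpus)
```

With two candidates and two sentences per candidate, the post-editor sees at most four sentence pairs per term, which matches the "only two example sentences for each translation candidate" setting mentioned in the abstract.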

pdf bib abs
Comparing Chinese-English MT Performance Involving ChatGPT and MT Providers and the Efficacy of AI mediated Post-Editing
Larry Cady | Benjamin Tsou | John Lee
Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track

The recent introduction of ChatGPT has caused much stir in the translation industry because of its impressive translation performance against leaders in the industry. We review some major issues based on the BLEU comparisons of Chinese-to-English (C2E) and English-to-Chinese (E2C) machine translation (MT) performance by ChatGPT against a range of leading MT providers in mostly technical domains. Based on sample aligned sentences from a sizable bilingual Chinese-English patent corpus and other sources, we find that while ChatGPT generally performs better, it does not consistently perform better than others in all areas or cases. We also draw on novice translators as post-editors to explore a major component in MT post-editing: optimization of terminology. Many new technical words, including MWEs (Multi-Word Expressions), are problematic because they involve terminological developments which must strike a balance between properly encapsulating technical innovation and conforming to past traditions. Drawing on the above-mentioned corpus, we have been developing an AI-mediated MT post-editing (MTPE) system through the optimization of precedent rendition distribution and semantic association to enhance the work of translators and MTPE practitioners.

2020

pdf bib
A corpus-based comparative study of light verbs in three Chinese speech communities
Benjamin K Tsou | Ka-Fai Yip
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

pdf bib
Bilingual Multi-word Expressions, Multiple-correspondence, and their cultivation from parallel patents: The Chinese-English case
Benjamin K. Tsou | Ka Po Chow | John Lee | Ka-Fai Yip | Yaxuan Ji | Kevin Wu
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

pdf bib abs
Using Bilingual Patents for Translation Training
John Lee | Benjamin Tsou | Tianyuan Cai
Proceedings of the 28th International Conference on Computational Linguistics

While bilingual corpora have been instrumental for machine translation, their utility for training translators has been less explored. We investigate the use of bilingual corpora as pedagogical tools for translation in the technical domain. In a user study, novice translators revised Chinese translations of English patents through bilingual concordancing. Results show that concordancing with an in-domain bilingual corpus can yield greater improvement in translation quality of technical terms than a general-domain bilingual corpus.

2019

pdf bib abs
Towards a Proactive MWE Terminological Platform for Cross-Lingual Mediation in the Age of Big Data
Benjamin K. Tsou | Kapo Chow | Junru Nie | Yuan Yuan
Proceedings of the Human-Informed Translation and Interpreting Technology Workshop (HiT-IT 2019)

The emergence of China as a global economic power in the 21st century has brought about surging needs for cross-lingual and cross-cultural mediation, typically performed by translators. Advances in Artificial Intelligence and Language Engineering have been bolstered by machine learning and suitable Big Data cultivation. They have helped to meet some of the translator’s needs, though the technical specialists have not kept pace with the practical and expanding requirements in language mediation. One major technical and linguistic hurdle involves words outside the vocabulary of the translator or the lexical database he/she consults, especially Multi-Word Expressions (Compound Words) in technical subjects. A further problem is the multiplicity of renditions of a term in the target language. This paper discusses a proactive approach following the successful extraction and application of sizable bilingual Multi-Word Expressions (Compound Words) for language mediation in technical subjects, which do not fall within the expertise of typical translators, who have inadequate appreciation of the range of new technical tools available to help them. Our approach draws on the personal reflections of translators and teachers of translation and is based on prior R&D efforts relating to 300,000 comparable Chinese-English patents. The subsequent protocol we have developed aims to be proactive in meeting four identified practical challenges in technical translation (e.g. patents). It has broader economic implications in the Age of Big Data (Tsou et al., 2015) and Trade War, as the workload, if not the challenges, increasingly cannot be met by currently available front-line translators. We shall demonstrate how new tools can be harnessed to spearhead the application of language technology not only in language mediation but also in the “teaching” and “learning” of translation. We also show how a better appreciation of translators’ needs may enhance the contributions of the technical specialists, and thus enhance the resultant synergetic benefits.

pdf bib
Difficulty-aware Distractor Generation for Gap-Fill Items
Chak Yan Yeung | John Lee | Benjamin Tsou
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association

Implementation of legal bilingualism in Hong Kong after 1997 has necessitated the production of voluminous and extensive court proceedings and judgments in both Chinese and English. For the former, Cantonese, a dialect of Chinese, is the home language of more than 90% of the population in Hong Kong and is therefore used in the courts. To record speech in Cantonese verbatim, a Chinese Computer-Aided Transcription system has been developed. The transcription system converts stenographic codes into Chinese text, i.e. from phonetic to orthographic representation of the language. The main challenge lies in the resolution of the severe ambiguity resulting from homocode problems in the conversion process. Cantonese Chinese is typified by problematic homonymy, which presents serious challenges. The N-gram statistical model is employed to estimate the most probable character string of the input transcription codes. Domain-specific corpora have been compiled to support the statistical computation. To improve accuracy, scalable techniques such as domain-specific transcription and special encoding are used. Put together, these techniques deliver 96% transcription accuracy.
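
As a rough illustration of how an N-gram model can resolve homocode ambiguity, the sketch below runs a Viterbi search with a bigram model over the candidate characters for each input code. It is not the system described in the abstract: the function name, the code-to-character lexicon, the bigram probabilities, the smoothing floor, and the romanized stand-ins for stenographic codes are all assumptions invented for this example.

```python
import math


def decode(codes, lexicon, bigram, floor=1e-6):
    """Return the most probable character string for a sequence of codes.

    codes:   input codes (hypothetical stand-ins for stenographic codes)
    lexicon: code -> list of candidate characters sharing that code
    bigram:  (previous_char, char) -> probability, "<s>" marks the start
    floor:   assumed probability for unseen bigrams
    """
    # best[ch] = (log-probability, characters so far) of the best path ending in ch
    best = {"<s>": (0.0, [])}
    for code in codes:
        nxt = {}
        for ch in lexicon[code]:
            score, path = max(
                (s + math.log(bigram.get((prev, ch), floor)), p)
                for prev, (s, p) in best.items()
            )
            nxt[ch] = (score, path + [ch])
        best = nxt
    return "".join(max(best.values())[1])


# Toy model: each code is shared by two characters (illustrative values only).
lexicon = {"faat": ["法", "發"], "leot": ["律", "栗"]}
bigram = {
    ("<s>", "法"): 0.5,
    ("<s>", "發"): 0.5,
    ("法", "律"): 0.9,
    ("法", "栗"): 0.001,
    ("發", "律"): 0.01,
}
text = decode(["faat", "leot"], lexicon, bigram)
```

Neither code is decidable on its own here; the bigram context is what selects one character string over the other, which is the role the abstract assigns to the N-gram model. Domain-specific corpora would enter such a sketch through the bigram table.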

pdf bib abs
Toward a Pan-Chinese Thesaurus
Benjamin K. Tsou | Oi Yee Kwong
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper, we propose a corpus-based approach to the construction of a Pan-Chinese lexical resource, with the initial aim of enriching existing Chinese thesauri in the Pan-Chinese context. The resulting thesaurus is thus expected to contain not only the core senses and usages of Chinese lexical items but also usages specific to individual Chinese speech communities. We introduce the ideas behind the construction of the resource, outline the steps to be taken, and discuss some preliminary analyses. The work is backed by a unique and large Chinese synchronous corpus containing textual data from various Chinese speech communities including Hong Kong, Beijing, Taipei and Singapore.

2005

pdf bib
A Synchronous Corpus-Based Study on the Usage and Perception of Judgement Terms in the Pan-Chinese Context
Oi Yee Kwong | Benjamin K. Tsou
International Journal of Computational Linguistics & Chinese Language Processing, Volume 10, Number 4, December 2005: Special Issue on Selected Papers from CLSW-5

pdf bib
Using Multiple Discriminant Analysis Approach for Linear Text Segmentation
Jingbo Zhu | Na Ye | Xinzhi Chang | Wenliang Chen | Benjamin K Tsou
Second International Joint Conference on Natural Language Processing: Full Papers

pdf bib
Semantic Role Tagging for Chinese at the Lexical Level
Oi Yee Kwong | Benjamin K. Tsou
Second International Joint Conference on Natural Language Processing: Full Papers

pdf bib
Data Homogeneity and Semantic Role Tagging in Chinese
Oi Yee Kwong | Benjamin K. Tsou
Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition

2004

pdf bib
Morpheme-based Derivation of Bipolar Semantic Orientation of Chinese Words
Raymond W.M. Yuen | Terence Y.W. Chan | Tom B.Y. Lai | O.Y. Kwong | Benjamin K.Y. Tsou
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

pdf bib
A Synchronous Corpus-Based Study of Verb-Noun Fluidity in Chinese
Oi Yee Kwong | Benjamin K. Tsou
Proceedings of the 17th Pacific Asia Conference on Language, Information and Computation

pdf bib
Categorial Fluidity in Chinese and its Implications for Part-of-speech Tagging
Oi Yee Kwong | Benjamin K. Tsou
10th Conference of the European Chapter of the Association for Computational Linguistics

2002

pdf bib
Covering Ambiguity Resolution in Chinese Word Segmentation Based on Contextual Information
Xiao Luo | Maosong Sun | Benjamin K. Tsou
COLING 2002: The 19th International Conference on Computational Linguistics

pdf bib
Alignment and Extraction of Bilingual Legal Terminology from Context Profiles
Oi Yee Kwong | Benjamin K. Tsou | Tom B.Y. Lai | Robert W.P. Luk | Lawrence Y.L. Cheung | Francis C.Y. Chik
COLING-02: COMPUTERM 2002: Second International Workshop on Computational Terminology

2001

pdf bib
Proceedings of the 15th Pacific Asia Conference on Language, Information and Computation
Benjamin K. T’sou | Olivia O.Y. Kwong | Tom B.Y. Lai
Proceedings of the 15th Pacific Asia Conference on Language, Information and Computation

pdf bib abs
Evaluating Chinese-English translation systems for personal name coverage
Benjamin K. Tsou | Oi Yee Kwong
Workshop on MT2010: Towards a Road Map for MT

This paper discusses the challenges which Chinese-English machine translation (MT) systems face in translating personal names. We show that the translation of names between Chinese and English is complicated by different factors, including orthographic, phonetic, geographic and social ones. Four existing systems were tested for their capability in translating personal names from Chinese to English. Test data embodying geographic and sociolinguistic differences were obtained from a synchronous Chinese corpus of news media texts. It is obvious that systems vary considerably in their ability to identify personal names in the source language and render them properly in the target language. Given the criticality of personal name translation to the overall intelligibility of a translated text, the coverage of personal names should be one of the important criteria in the evaluation of MT performance. Moreover, name translation, which calls for a hybrid approach, will remain a central issue for the future development of MT systems, especially for online and real-time applications.

pdf bib
Identification of Chinese Personal Names in Unrestricted Texts
Lawrence Cheung | Benjamin K. Tsou | Maosong Sun
Proceedings of the 16th Pacific Asia Conference on Language, Information and Computation

2000

pdf bib
Textual Information Segmentation by Cohesive Ties
Samuel W.K. Chan | Benjamin K. T’sou | C.F. Choy
Proceedings of the 14th Pacific Asia Conference on Language, Information and Computation

pdf bib
Mining Discourse Markers for Chinese Textual Summarization
Samuel W. K. Chan | Tom B. Y. Lai | W. J. Gao | Benjamin K. T’sou
NAACL-ANLP 2000 Workshop: Automatic Summarization

pdf bib
Enhancement of a Chinese Discourse Marker Tagger with C4.5
Benjamin K. T’sou | Tom B.Y. Lai | Samuel W.K. Chan | Weijun Gao | Xuegang Zhan
Second Chinese Language Processing Workshop

1999

pdf bib
Anaphora Resolution as Lexical Cohesion Identification
Samuel W.K. Chan | Benjamin K. T’sou
Proceedings of the 13th Pacific Asia Conference on Language, Information and Computation

pdf bib abs
MT evaluation
Margaret King | Eduard Hovy | Benjamin K. Tsou | John White | Yusoff Zaharin
Proceedings of Machine Translation Summit VII

This panel deals with the general topic of evaluation of machine translation systems. The first contribution sets out some recent work on creating standards for the design of evaluations. The second, by Eduard Hovy, takes up the particular issue of how metrics can be differentiated and systematized. Benjamin K. T’sou suggests that whilst men may evaluate machines, machines may also evaluate men. John S. White focuses on the question of the role of the user in evaluation design, and Yusoff Zaharin points out that circumstances and settings may have a major influence on evaluation design.