Jaihyun Park


2023

A Quantitative Discourse Analysis of Asian Workers in the US Historical Newspapers
Jaihyun Park | Ryan Cordell
Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages

The digitization of historical texts invites researchers to explore large-scale corpora of historical texts with computational methods. In this study, we present a computational text analysis of a relatively understudied topic: how Asian workers are represented in historical newspapers in the United States. We found that the word “coolie” was semantically different in some states (e.g., Massachusetts, Rhode Island, Wyoming, Oklahoma, and Arkansas), with different discourses surrounding the term. By measuring over-represented words, we also found that then-Confederate and then-Union newspapers formed distinctive discourses. Newspapers from then-Confederate states associated “coolie” with slavery-related words. In addition, we found that Asians were perceived as inferior to European immigrants and were targets of racism. This study supplements the qualitative analysis of racism in the United States with quantitative discourse analysis.

Understanding Gender Stereotypes in Video Game Character Designs: A Case Study of Honor of Kings
Bingqing Liu | Kyrie Zhixuan Zhou | Danlei Zhu | Jaihyun Park
Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages

In this paper, we conduct a comprehensive analysis of gender stereotypes in the character design of Honor of Kings, a popular MOBA game in China. We probe gender stereotypes through the lens of role assignments, visual designs, lines, and background stories, combining qualitative analysis and text mining based on moral foundations. Male heroes are commonly designed as masculine fighters with power, and female heroes are designed as feminine “ornaments” with ideal looks. We contribute a multi-modal dataset for understanding gender bias in games and a moral-, visual-, and role-based inspection of gender.

2022

Raison d’être of the benchmark dataset: A Survey of Current Practices of Benchmark Dataset Sharing Platforms
Jaihyun Park | Sullam Jeoung
Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP

This paper critically examines the current practices of benchmark dataset sharing in NLP and suggests a better way to inform reusers of benchmark datasets. Because a dataset sharing platform plays a key role not only in distributing a dataset but also in informing potential reusers about it, we believe data-sharing platforms should provide comprehensive context for their datasets. We survey four benchmark dataset sharing platforms (HuggingFace, PaperswithCode, Tensorflow, and Pytorch) to diagnose how datasets are currently shared and which metadata is shared or omitted. Specifically, drawing on the concept of data curation, which considers future reuse when data is made public, we propose a direction that benchmark dataset sharing platforms should consider. We find that the four platforms differ in how they use metadata and that there is no consensus on what social impact metadata is. We believe the missing discussion of social impact on dataset sharing platforms stems from a lack of agreement on who should be in charge. We propose that social impact metadata be developed for benchmark datasets and that data curators take a role in managing it.