Soman K. P.

Also published as: Soman K P, Soman K.p.


2023

Improving Reinforcement Learning Agent Training using Text based Guidance: A study using Commands in Dravidian Languages
Nikhil Chowdary Paleti | Sai Aravind Vadlapudi | Sai Aashish Menta | Sai Akshay Menta | Vishnu Vardhan Gorantla V N S L | Janakiram Chandu | Soman K P | Sachin Kumar S
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages

Reinforcement learning (RL) agents have achieved remarkable success in various domains, such as game-playing and protein structure prediction. However, most RL agents rely on exploration to find optimal solutions without explicit guidance. This paper proposes a methodology for training RL agents using text-based instructions in Dravidian languages, namely Telugu, Tamil, and Malayalam, as well as in English. The agents are trained in a modified Lunar Lander environment, where they must follow specific paths to successfully land the lander. The methodology involves collecting a dataset of human demonstrations and textual instructions, encoding the instructions into numerical representations using text-based embeddings, and training RL agents using state-of-the-art algorithms. The results demonstrate that the trained Soft Actor-Critic (SAC) agent can effectively understand and generalize instructions in different languages, outperforming other RL algorithms such as Proximal Policy Optimization (PPO) and Deep Deterministic Policy Gradient (DDPG).

Enhancing Telugu News Understanding: Comparative Study of ML Algorithms for Category Prediction
Manish Rama Gopal Nadella | Venkata Krishna Rayalu Garapati | Eswar Sudhan S.k. | Gouthami Jangala | Soman K.p. | Sachin Kumar
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages

As one of the most widely used languages in India, Telugu has a sizable audience and a large body of news articles. Predicting the categories of Telugu news items not only helps with efficient organization but also enables trend analysis, advertising targeted at specific demographics, and individualized recommendations. To identify the most effective method for accurate Telugu news category prediction, this study compares various machine learning (ML) techniques, including support vector machines (SVM), random forests, and naive Bayes. Accuracy, precision, recall, and F1-score are used as performance indicators to gauge how well these algorithms perform. The outcomes of this comparative analysis address the particular difficulties and complexities of the Telugu language and add to the body of knowledge on news category prediction. For Telugu-speaking consumers, the study aims to improve news organization and recommendation systems, giving them more relevant and customized news consumption experiences. Our results emphasize that, although other models can be considered for further research and comparison, W2Vec skip-gram with a polynomial SVM is the best-performing combination.
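
The four performance indicators named in the abstract can be illustrated with a minimal sketch. The macro averaging, the function name `classification_scores`, and the toy category labels are assumptions made for illustration; the abstract does not state how the scores were averaged.

```python
from collections import Counter


def classification_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Macro averaging is an assumption of this sketch, not a detail
    taken from the paper.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # predicted this class, but it was wrong
            fn[truth] += 1  # missed the true class

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        prec = tp[label] / predicted if predicted else 0.0
        rec = tp[label] / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical news categories, used only to exercise the function.
y_true = ["sports", "sports", "politics", "politics"]
y_pred = ["sports", "politics", "politics", "politics"]
scores = classification_scores(y_true, y_pred)
```

Reporting all four numbers side by side makes it easier to see when a classifier trades recall on one category for precision on another, which accuracy alone would hide.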

2018

Amrita_student at SemEval-2018 Task 1: Distributed Representation of Social Media Text for Affects in Tweets
Nidhin A Unnithan | Shalini K. | Barathi Ganesh H. B. | Anand Kumar M | Soman K. P.
Proceedings of the 12th International Workshop on Semantic Evaluation

In this paper we analyse “Affects in Tweets”, one of the tasks conducted at SemEval-2018. The task was to build a model able to perform regression and classification of different emotions on the given tweet data set. We developed a base model for all the subtasks using a distributed representation (Doc2Vec) and applied machine learning techniques for classification and regression. Doc2Vec is an unsupervised algorithm capable of learning fixed-length feature representations from variable-length texts. Linear regression is used for the regression subtasks, while a random forest is used for classification. The empirical results obtained by our model for all the subtasks are reported in this paper.

CENNLP at SemEval-2018 Task 1: Constrained Vector Space Model in Affects in Tweets
Naveen J R | Barathi Ganesh H. B. | Anand Kumar M | Soman K P
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper discusses Task 1, the “Affect in Tweets” shared task, conducted at SemEval-2018. The task comprises various subtasks, which required participants to analyse different emotions and sentiments in the provided tweet data and to measure the intensity of these emotions in subsequent subtasks. Our approach to these tasks was to build a model on a count-based representation and use machine learning techniques for the regression and classification subtasks. In this work, we use a simple bag-of-words technique for a supervised text classification model, to show that significant accuracy can still be achieved in comparison with more advanced distributed representation models. Further, by fine-tuning various parameters of the bag-of-words representation model, we obtained better scores than various other baseline models (Vinayan et al.) that participated in the shared task.
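
A count-based bag-of-words representation of the kind the abstract refers to can be sketched as follows. The whitespace tokenizer, the helper names `build_vocabulary` and `bag_of_words`, and the example tweets are illustrative assumptions; the paper's actual preprocessing and parameters are not given here.

```python
from collections import Counter


def build_vocabulary(texts):
    """Map every distinct lowercased token to a column index."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: index for index, tok in enumerate(tokens)}


def bag_of_words(text, vocab):
    """Return the term-count vector of `text` over `vocab`.

    Tokens that are not in the vocabulary are ignored.
    """
    vector = [0] * len(vocab)
    for tok, count in Counter(text.lower().split()).items():
        if tok in vocab:
            vector[vocab[tok]] = count
    return vector


# Hypothetical tweets, used only to exercise the functions.
tweets = ["so happy happy today", "so sad today"]
vocab = build_vocabulary(tweets)
features = [bag_of_words(tweet, vocab) for tweet in tweets]
```

The resulting fixed-length count vectors can be passed to any standard classifier or regressor, which is what makes this representation a convenient baseline.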

TeamCEN at SemEval-2018 Task 1: Global Vectors Representation in Emotion Detection
Anon George | Barathi Ganesh H. B. | Anand Kumar M | Soman K P
Proceedings of the 12th International Workshop on Semantic Evaluation

Emotions are a way of expressing human sentiments. In the modern era, social media is a platform where we convey our emotions. These emotions can be joy, anger, sadness and fear. Understanding the emotions in written sentences is an interesting part of getting to know the writer. Of the digital text shared through social media, a considerable amount reflects sentiment or emotion towards some product, person or organization. Since these texts come from users with diverse social backgrounds, they can be used to enrich applications related to business intelligence. Beyond the sentiment itself, identifying the intensity of the sentiment will improve the performance of the end application. In this paper we treat intensity prediction as a text classification problem and evaluate a distributed representation of text obtained by the aggregated sum and dimensionality reduction of the GloVe vectors of the words present in the respective texts.
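
The "aggregated sum" step can be illustrated with a small sketch that adds up the word vectors of a text. The three-dimensional toy vectors stand in for pre-trained GloVe embeddings and are purely hypothetical, as is the function name `sum_embedding`; the dimensionality reduction step mentioned in the abstract is omitted.

```python
def sum_embedding(text, vectors, dim):
    """Sum the vectors of all known words in `text`.

    Words missing from `vectors` are skipped, so a text with no known
    words maps to the zero vector.
    """
    total = [0.0] * dim
    for token in text.lower().split():
        vec = vectors.get(token)
        if vec is None:
            continue
        total = [a + b for a, b in zip(total, vec)]
    return total


# Toy vectors standing in for GloVe embeddings (assumption).
toy_vectors = {
    "happy": [1.0, 0.0, 2.0],
    "day": [0.5, 1.0, 0.0],
}
text_vector = sum_embedding("Happy day unknown", toy_vectors, dim=3)
```

Summing gives every text a vector of the same length regardless of how many words it contains, at the cost of discarding word order.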

CENNLP at SemEval-2018 Task 2: Enhanced Distributed Representation of Text using Target Classes for Emoji Prediction Representation
Naveen J R | Hariharan V | Barathi Ganesh H. B. | Anand Kumar M | Soman K P
Proceedings of the 12th International Workshop on Semantic Evaluation

Emoji is one of the “fastest growing languages” in pop culture, especially on social media, and its usage is very unlikely to decrease. Emojis are generally used to bring an extra level of meaning to texts posted on social media platforms. Such added information gives more insight into the plain text and can reveal interpretations hidden within it. This paper explains our analysis of Task 2, the “Multilingual Emoji Prediction” shared task conducted at SemEval-2018. In the task, each piece of Twitter text is labelled with one of 20 classes (the most commonly used emojis); these classes are learnt and predictions are then made for unseen Twitter text. In this work, we have experimented with and analysed emoji prediction from Twitter text as a classification problem in which the accompanying emoji is treated as the label of each individual text. We have implemented this using a distributed representation of text through fastText. We have also made an effort to demonstrate how the fastText framework can be useful for emoji prediction. The task is divided into two subtasks, based on datasets in two different languages, English and Spanish.

AmritaNLP at SemEval-2018 Task 10: Capturing discriminative attributes using convolution neural network over global vector representation.
Vivek Vinayan | Anand Kumar M | Soman K P
Proceedings of the 12th International Workshop on Semantic Evaluation

The “Capturing Discriminative Attributes” shared task is the tenth task of SemEval-2018. The task is to predict whether a word captures an attribute that distinguishes one word from another. We use GloVe word embeddings, pre-trained on an openly sourced corpus, for this task. A base representation is initially established over varied dimensions. These representations are evaluated on validation scores over two models: first an SVM-based classifier and second a one-dimensional CNN model. The scores are used to further develop the representation with vector combinations, by considering various distance measures. These measures correspond to offset vectors, which are concatenated as features, mainly to improve the F1 score along with the best accuracy. The features are then further tuned on the validation scores to achieve the highest F1 score. Our evaluation narrowed down to two representations, classified with CNN models, with total dimension lengths of 1204 and 1203 for the final submissions. Of the two, the latter feature representation delivered our best F1 score of 0.658024 (as per the results).
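
One way to picture "offset vectors concatenated as features" is the sketch below, which joins the three word vectors of a (word 1, word 2, attribute) triple with their offset and a few cosine similarities. The exact feature layout, the choice of cosine similarity as the distance measure, the function names, and the two-dimensional toy vectors are all assumptions for illustration; they are not claimed to match the submitted 1204- and 1203-dimensional representations.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def triple_features(word1, word2, attribute):
    """Concatenate base vectors, their offset and similarity scores.

    Layout (an assumption of this sketch):
    [word1 | word2 | attribute | word1 - word2 | 3 cosine scores]
    """
    offset = [a - b for a, b in zip(word1, word2)]
    similarities = [
        cosine(word1, attribute),
        cosine(word2, attribute),
        cosine(offset, attribute),
    ]
    return word1 + word2 + attribute + offset + similarities


# Toy 2-d vectors standing in for GloVe embeddings (assumption).
w1, w2, attr = [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]
features = triple_features(w1, w2, attr)
```

With d-dimensional embeddings this layout yields a vector of length 4d + 3, which a classifier such as an SVM or a one-dimensional CNN can consume directly.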

2016

AmritaCEN at SemEval-2016 Task 11: Complex Word Identification using Word Embedding
Sanjay S.P | Anand Kumar M | Soman K P
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)