A PERFORMANCE EVALUATION OF ACTIVE LEARNING BY GREEDY APPROACH AND THOMPSON SAMPLING APPROACH ON TEXT-BASED DATA

Authors

  • Arnon PROMJUN
  • Seksan KIATSUPAIBUL

Abstract

This study investigates how well active learning strategies select informative data points to improve model performance in text classification. Specifically, it compares random sampling, greedy selection, and Thompson sampling with Laplace approximation for labeling tweets related to tourism in Bangkok. A logistic regression model was trained over 100 iterations on the data selected by each method. The results indicate that greedy selection consistently outperformed the other approaches in the early iterations, enabling rapid model improvement. However, its effectiveness declined in later iterations as fewer informative tweets remained available. Thompson sampling with Laplace approximation improved more slowly at first and required more time for data selection, but its performance rose steadily across iterations. Random sampling was the fastest method but did not substantially improve the model, whose accuracy remained low throughout the experiment. These findings suggest that greedy selection is well suited to applications requiring rapid learning, whereas Thompson sampling is promising for long-term learning scenarios. The insights from this research can inform the design of active learning frameworks for natural language processing tasks, such as sentiment analysis and customer opinion mining, across various industries.
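The abstract does not state the acquisition criterion, the features, or the prior used. The sketch below rests on several assumptions. It assumes a Bayesian logistic regression with a Gaussian prior of precision `alpha`. It approximates the posterior by the Laplace method, which uses a Gaussian centred at the MAP weights with covariance equal to the inverse Hessian of the negative log posterior. It also assumes that greedy selection and Thompson sampling both rank unlabeled candidates by their score for the positive (relevant) class. Under those assumptions, the three strategies might look like this. All names (`fit_laplace`, `select_greedy`, `select_thompson`, `select_random`, `run`), the parameter values, and the synthetic data are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function: 1 / (1 + exp(-z)).
    return np.exp(-np.logaddexp(0.0, -z))


def fit_laplace(X, y, alpha=1.0, n_iter=50, tol=1e-8):
    """MAP fit of logistic regression with a N(0, alpha^-1 I) prior (assumed),
    via Newton's method. Returns the MAP weights and the Hessian H of the
    negative log posterior; the Laplace posterior is N(w_map, H^-1)."""
    d = X.shape[1]
    w = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) + alpha * w
        H = X.T @ (X * (p * (1 - p))[:, None]) + alpha * np.eye(d)
        step = np.linalg.solve(H, grad)
        w -= step
        if np.max(np.abs(step)) < tol:
            break
    p = sigmoid(X @ w)
    H = X.T @ (X * (p * (1 - p))[:, None]) + alpha * np.eye(d)
    return w, H


def select_greedy(X_pool, w_map):
    # Greedy: trust the point estimate and pick the highest-scoring candidate.
    return int(np.argmax(X_pool @ w_map))


def select_thompson(X_pool, w_map, H, rng):
    # Thompson sampling: draw w ~ N(w_map, H^-1) and act greedily on the draw.
    # With H = L L^T, w_map + L^-T z has covariance L^-T L^-1 = H^-1.
    L = np.linalg.cholesky(H)
    z = rng.standard_normal(len(w_map))
    w_sample = w_map + np.linalg.solve(L.T, z)
    return int(np.argmax(X_pool @ w_sample))


def select_random(n_pool, rng):
    # Random baseline: any unlabeled candidate with equal probability.
    return int(rng.integers(n_pool))


def run(strategy, X, y, n_init=10, n_rounds=20, seed=0):
    """Pool-based active learning loop; y acts as the labeling oracle."""
    rng = np.random.default_rng(seed)
    labeled = rng.choice(len(X), size=n_init, replace=False).tolist()
    labeled_set = set(labeled)
    pool = [i for i in range(len(X)) if i not in labeled_set]
    for _ in range(n_rounds):
        w, H = fit_laplace(X[labeled], y[labeled])
        if strategy == "greedy":
            j = select_greedy(X[pool], w)
        elif strategy == "thompson":
            j = select_thompson(X[pool], w, H, rng)
        else:
            j = select_random(len(pool), rng)
        # Move the chosen candidate from the pool into the labeled set.
        labeled.append(pool.pop(j))
    w, _ = fit_laplace(X[labeled], y[labeled])
    return w, labeled


if __name__ == "__main__":
    # Hypothetical synthetic data standing in for tweet feature vectors.
    rng = np.random.default_rng(42)
    X = np.hstack([np.ones((500, 1)), rng.normal(size=(500, 3))])
    true_w = np.array([-1.0, 2.0, -1.0, 0.5])
    y = (rng.random(500) < sigmoid(X @ true_w)).astype(float)
    for s in ("random", "greedy", "thompson"):
        w, labeled = run(s, X, y)
        acc = np.mean((sigmoid(X @ w) > 0.5) == y)
        print(s, len(labeled), round(float(acc), 3))
```

The only difference between the two model-based strategies in this sketch is where the weights come from. Greedy selection ranks candidates with the MAP point estimate, so it exploits what the model already believes. Thompson sampling ranks them with a random draw from the approximate posterior, so uncertain regions are sometimes explored. The draw also requires an extra Cholesky factorization at every round, which is one plausible source of the longer selection time that the study reports.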

Published

2025-05-06