Text Learning Project
Extracting sentiments from unstructured text has emerged as an important problem in many disciplines. An accurate method would enable us, for example, to mine online opinions from the Internet and learn customers' preferences for economic or marketing research, or to gain a strategic advantage. In this paper, we propose a two-stage Bayesian algorithm that captures the dependencies among words and, at the same time, finds a vocabulary that is efficient for the purpose of extracting sentiments. Experimental results on the Movie Reviews data set show that our algorithm selects a parsimonious feature set with substantially fewer predictor variables than the full data set and yields better predictions of sentiment orientation than several state-of-the-art machine learning methods. Our findings suggest that sentiments are captured by conditional dependence relations among words, rather than by keywords or high-frequency words.
Keywords: Opinion, Sentiments, Semantic Orientation, Semantic Learning, Information Retrieval, Text Analysis, Text Classification, Bayesian Network, Markov Blanket, Tabu Search, Local Dependencies
Xue Bai, Rema Padman, and Edoardo Airoldi. Sentiment Extraction from Unstructured Text using Tabu Search-Enhanced Markov Blanket. Workshop on Mining the Semantic Web, at the 10th ACM SIGKDD Conference, Seattle, WA, 2004. [Earlier version available under the same title as Carnegie Mellon University, School of Computer Science, Technical Report CMU-ISRI-04-127. Pittsburgh: July 2004. (PDF, PS)].
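
The abstract refers to a Markov blanket as the basis for selecting a small vocabulary. As a rough illustration of that concept only, and not the authors' two-stage algorithm, the sketch below computes the Markov blanket of a target node in a Bayesian network whose structure is already given: the node's parents, its children, and the other parents of its children. The network, the word nodes, and the function name `markov_blanket` are hypothetical; the sketch does not learn the structure from data and does not include the tabu search step.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return the Markov blanket of `target` in a DAG.

    `parents` maps each node to the set of its parent nodes.
    The blanket is: parents of target, children of target,
    and the other parents of those children.
    """
    blanket = set(parents.get(target, set()))  # parents of the target
    for node, node_parents in parents.items():
        if target in node_parents:             # node is a child of the target
            blanket.add(node)
            blanket |= node_parents            # the child's other parents
    blanket.discard(target)
    return blanket


# Hypothetical toy network (assumed for illustration, not from the paper):
# the class node "sentiment" points to the words "great" and "boring";
# "boring" also depends on "not"; "plot" depends only on "great".
toy_parents = {
    "sentiment": set(),
    "great": {"sentiment"},
    "not": set(),
    "boring": {"sentiment", "not"},
    "plot": {"great"},
    "movie": set(),
}

selected = markov_blanket(toy_parents, "sentiment")
print(sorted(selected))
```

In this toy network, "plot" and "movie" fall outside the blanket of the sentiment node, which is the sense in which a Markov blanket can yield a feature set much smaller than the full vocabulary.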