ANALYZING SENTIMENTS IN ONE GO: A SUPERVISED JOINT TOPIC MODELING APPROACH

 

ABSTRACT

In this work, we focus on modeling user-generated review and overall rating pairs, and aim to identify semantic aspects and aspect-level sentiments from review data as well as to predict overall sentiments of reviews. We propose a novel probabilistic supervised joint aspect and sentiment model (SJASM) to deal with the problems in one go under a unified framework. SJASM represents each review document in the form of opinion pairs, and can simultaneously model aspect terms and the corresponding opinion words of the review for hidden aspect and sentiment detection. It also leverages sentimental overall ratings, which often come with online reviews, as supervision data, and can infer semantic aspects and aspect-level sentiments that are not only meaningful but also predictive of the overall sentiments of reviews. Moreover, we develop an efficient inference method for parameter estimation of SJASM based on collapsed Gibbs sampling. We evaluate SJASM extensively on real-world review data, and the experimental results demonstrate that the proposed model outperforms seven well-established baseline methods on sentiment analysis tasks.
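The paper itself contains no code, but the opinion-pair representation named in the abstract can be pictured with a small sketch. The Python example below is an illustrative assumption rather than the authors' preprocessing pipeline: it pairs each adjective with the nearest following noun in a POS-tagged review to produce (aspect term, opinion word) pairs of the kind SJASM models.

# Minimal sketch of the opinion-pair representation described in the abstract.
# The pairing heuristic (adjective paired with the next noun) is a simplification
# for illustration only; it is not the extraction procedure used in the paper.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class OpinionPair:
    aspect_term: str   # e.g. "battery"
    opinion_word: str  # e.g. "great"

def to_opinion_pairs(tagged_tokens: List[Tuple[str, str]]) -> List[OpinionPair]:
    """Pair each adjective (JJ) with the next noun (NN*) in a POS-tagged review."""
    pairs = []
    for i, (word, tag) in enumerate(tagged_tokens):
        if tag.startswith("JJ"):
            for next_word, next_tag in tagged_tokens[i + 1:]:
                if next_tag.startswith("NN"):
                    pairs.append(OpinionPair(aspect_term=next_word, opinion_word=word))
                    break
    return pairs

# Toy POS-tagged review snippet.
review = [("great", "JJ"), ("battery", "NN"), ("but", "CC"),
          ("awful", "JJ"), ("screen", "NN")]
print(to_opinion_pairs(review))
# -> pairs ("battery", "great") and ("screen", "awful")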

EXISTING SYSTEM:

By formulating overall sentiment analysis as a classification problem, Pang et al. built supervised models on standard n-gram text features to classify review documents into positive or negative sentiments. Moreover, to prevent a sentiment classifier from considering non-subjective sentences, Pang and Lee used a subjectivity detector to filter out non-subjective sentences of each review, and then applied the classifier to the resulting subjectivity extracts for sentiment prediction. A similar two-stage method was also proposed for document-level sentiment analysis. A variety of features (indicators) have been evaluated for overall sentiment classification tasks. Zhao et al. employed a conditional random fields based model to incorporate contextual dependency and label redundancy constraint features for sentence-level sentiment classification, while Yang and Cardie incorporated lexical and discourse constraints at the intra-/inter-sentence level via a similar model for the same problem.

Sentiment analysis of social media data, such as tweets, blogs, and forums, has attracted extensive attention, and can perhaps be viewed as sentiment analysis at the document or sentence level. Abbasi et al. first selected stylistic and syntactic features via an entropy weighted genetic method, and then trained a supervised classification model on the features for sentiment prediction in Web forums. To analyze overall sentiments of blog (and review) documents, Melville et al. incorporated background/prior lexical knowledge based on a pre-compiled sentiment lexicon into a supervised pooling multinomial text classification model. Hu et al. combined sentimental consistency and emotional contagion with supervised learning for sentiment classification in micro-blogging.

Unsupervised linguistic methods rely on developing syntactic rules or dependency patterns to cope with the fine-grained sentiment analysis problem. Qiu et al. proposed a syntactic parsing based double propagation method for feature-specific sentiment analysis. Based on dependency grammar, they first defined eight syntactic rules, and employed the rules to recognize pair-wise word dependencies in each review sentence. Then, given opinion word seeds, they iteratively extracted more opinion words and the related features by relying on the identified syntactic dependency relations. They inferred the sentiment polarities on the features via a heuristic contextual evidence based method during the iterative extraction process.
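The supervised n-gram baselines discussed above can be summarized with a short sketch. The example below is a hedged illustration of that family of classifiers, not a reproduction of any cited system; the tiny inline training set and the choice of scikit-learn's CountVectorizer and LinearSVC are assumptions made purely for demonstration.

# Illustrative n-gram sentiment classifier in the spirit of the supervised
# baselines above (e.g. Pang et al.); the data and model choices are assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_docs = ["the plot was brilliant and moving",
              "a dull, predictable and boring film",
              "great acting and a wonderful script",
              "terrible pacing and a weak ending"]
train_labels = ["pos", "neg", "pos", "neg"]

# Binary presence of unigrams and bigrams, as in standard n-gram feature sets.
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), binary=True),
    LinearSVC()
)
model.fit(train_docs, train_labels)
print(model.predict(["a brilliant script but terrible pacing"]))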

PROPOSED SYSTEM:

This work makes the following main contributions:

- It presents a new supervised joint topic model called SJASM, which forms the prediction for overall ratings/sentiments of reviews via a normal linear model based on the inferred hidden aspects and sentiments in the reviews (a minimal sketch of this supervision step follows below).
- It formulates overall sentiment analysis and aspect-based sentiment analysis in a unified framework, which allows SJASM to leverage the inter-dependency between the two problems and lets them improve each other.
- It presents a detailed inference method for SJASM based on collapsed Gibbs sampling.
- It compares SJASM with seven strong representative baselines, and experimentally shows the benefits of SJASM over them for the sentiment analysis problems.
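As a hedged illustration of the supervision step named in the first contribution, the sketch below regresses observed overall ratings on per-document aspect/sentiment frequencies with an ordinary least-squares fit of a normal linear model, in the spirit of sLDA-style supervised topic models. The frequency matrix z_bar and the ratings are made-up placeholders for quantities SJASM would infer via collapsed Gibbs sampling; this is not the authors' estimation code.

# Sketch of the supervised response component: overall rating modeled as
# Normal(eta^T z_bar, sigma^2), with eta estimated by ordinary least squares.
# z_bar and ratings below are placeholder values, not inferred quantities.
import numpy as np

# Each row: empirical frequencies of (aspect, sentiment) assignments in one review.
z_bar = np.array([[0.50, 0.10, 0.30, 0.10],
                  [0.10, 0.45, 0.10, 0.35],
                  [0.40, 0.20, 0.30, 0.10],
                  [0.05, 0.50, 0.15, 0.30]])
ratings = np.array([4.5, 2.0, 4.0, 1.5])   # observed overall ratings

# Maximum-likelihood regression weights under the normal linear model.
eta, *_ = np.linalg.lstsq(z_bar, ratings, rcond=None)

# Predict the overall rating of a new review from its aspect/sentiment frequencies.
new_review = np.array([0.45, 0.15, 0.30, 0.10])
print("predicted rating:", float(new_review @ eta))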

 

 

CONCLUSIONS

In this work, we focus on modeling online user-generated review data, and aim to identify hidden semantic aspects and sentiments on the aspects, as well as to predict overall ratings/sentiments of reviews. We have developed a novel supervised joint aspect and sentiment model (SJASM) to deal with the problems in one go under a unified framework. SJASM treats review documents in the form of opinion pairs, and can simultaneously model aspect terms and their corresponding opinion words of the reviews for semantic aspect and sentiment detection. Moreover, SJASM also leverages overall ratings of reviews as supervision and constraint data, and can jointly infer hidden aspects and sentiments that are not only meaningful but also predictive of the overall sentiments of the review documents. We conducted experiments using publicly available real-world review data, and extensively compared SJASM with seven well-established representative baseline methods. For the semantic aspect detection and aspect-level sentiment identification problems, SJASM outperforms all the generative benchmark models: sLDA, JST, ASUM, and LARA. For overall sentiment prediction, SJASM again outperforms the six benchmark methods sLDA, Pooling, SVM, JST, ASUM, and Lexicon.

Online user-generated reviews are often associated with location or time-stamp information. For future work, we will extend the proposed model to handle such metadata and cope with spatio-temporal sentiment analysis of online reviews. Probabilistic topic modeling approaches to sentiment analysis often require the number of latent topics to be specified before analyzing review data. Another interesting future direction of our work is to develop a Bayesian nonparametric model that can automatically estimate the number of latent topics from review data, and also allow the number of topics to grow as new data examples appear.

ACKNOWLEDGMENTS

This work was supported in part by a Singapore MOE AcRF Tier 2 Grant (ARC30/12) and a Singapore MOE AcRF Tier 1 Grant (RG 66/12).

REFERENCES

[1] B. Liu, “Sentiment analysis and opinion mining,” Synthesis Lectures on Human Language Technologies, vol. 5, no. 1, pp. 1–167, May 2012.

[2] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up?: Sentiment classification using machine learning techniques,” in Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing – Volume 10, ser. EMNLP ’02. Stroudsburg, PA, USA: Association for Computational Linguistics, 2002, pp. 79–86.

[3] V. Ng, S. Dasgupta, and S. M. N. Arifin, “Examining the role of linguistic knowledge sources in the automatic identification and classification of reviews,” in Proceedings of the COLING/ACL on Main Conference Poster Sessions, ser. COLING-ACL ’06. Stroudsburg, PA, USA: Association for Computational Linguistics, 2006, pp. 611–618.

[4] J. Zhao, K. Liu, and G. Wang, “Adding redundant features for CRFs-based sentence sentiment classification,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing, ser. EMNLP ’08. Stroudsburg, PA, USA: Association for Computational Linguistics, 2008, pp. 117–126.

[5] P. Melville, W. Gryc, and R. D. Lawrence, “Sentiment analysis of blogs by combining lexical knowledge with text classification,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’09. New York, NY, USA: ACM, 2009, pp. 1275–1284.

[6] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, “Learning word vectors for sentiment analysis,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies – Volume 1, ser. HLT ’11. Stroudsburg, PA, USA: Association for Computational Linguistics, 2011, pp. 142–150.

[7] B. Yang and C. Cardie, “Context-aware learning for sentence-level sentiment analysis with posterior regularization,” in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, June 22–27, 2014, Baltimore, MD, USA, Volume 1: Long Papers, 2014, pp. 325–335.

[8] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–1022, March 2003.