INTERPRETING DOCUMENT COLLECTIONS WITH TOPIC MODELS USING LATENT DIRICHLET ALLOCATION
National Conference on Computer and Communication (NCCC’17), Sri Vasavi College, Erode, Self-Finance Wing, 3rd February 2017
Topic models belong to the area of text mining and provide a way to analyse large text corpora. A topic is a cluster of words with related meaning; in probabilistic models such as LDA, a topic is formally a probability distribution over the vocabulary. Topics are identified from how strongly words are used together across large repositories of information. Topic modeling can be used in various application areas, such as document clustering, text classification, and characterizing core and distributed genes within a species. A variety of models are available for identifying topics in large collections of text documents, including Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). This paper gives an overview of the LDA model and applies LDA to sample corpora taken for study.
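The abstract does not say how LDA was fitted to the sample corpora. As a minimal sketch, assuming collapsed Gibbs sampling (one common way of fitting LDA, not necessarily the procedure used in this paper) and symmetric Dirichlet priors `alpha` and `beta` whose values are chosen here purely for illustration, the Python below fits LDA to a toy corpus and reads off the document-topic mixtures (theta) and topic-word distributions (phi). The functions `lda_gibbs` and `top_words` and the four-document corpus are hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling (illustrative sketch).

    Returns (vocab, theta, phi): theta[d][k] is the topic mixture of
    document d, phi[k][w] is the word distribution of topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: tokens per (doc, topic), per (topic, word), and per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    z = []  # z[d][i]: topic currently assigned to token i of document d

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for word in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[word]] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w, k = word_id[word], z[d][i]
                # Take the token out of the counts before resampling its topic.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z_di = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Point estimates of the document-topic and topic-word distributions.
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
           for k in range(K)]
    return vocab, theta, phi


def top_words(vocab, phi, n=3):
    """Return the n most probable words of each topic."""
    return [[vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
            for row in phi]


if __name__ == "__main__":
    # Hypothetical toy corpus with two themes (biology and transport).
    corpus = [
        "gene dna protein gene cell",
        "dna cell protein gene",
        "road traffic bus train road",
        "bus train traffic road",
    ]
    docs = [text.split() for text in corpus]
    vocab, theta, phi = lda_gibbs(docs, num_topics=2)
    for k, words in enumerate(top_words(vocab, phi)):
        print(f"topic {k}: {words}")
```

Each token's topic is resampled in proportion to how much its document already uses a topic times how much that topic already uses the word, so co-occurring words drift into the same topic. For real corpora, a tested library implementation would normally be preferred over this pure-Python loop.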
Keywords: Topic models; LDA; LSA; correlation.