Domain-Based Similarity Calculation Method for Calculating Document Similarity

Herath, HML G; Kumara, BTGS

dc.contributor.author	Herath, HML G
dc.contributor.author	Kumara, BTGS
dc.date.accessioned	2020-12-31T20:53:01Z
dc.date.available	2020-12-31T20:53:01Z
dc.date.issued	2020
dc.identifier.uri	http://ir.kdu.ac.lk/handle/345/2965
dc.description.abstract	Abstract: Document similarity is important in many areas that deal with textual data, such as knowledge management, information extraction, natural language processing, and artificial intelligence. Several methods exist for calculating document similarity, but the results of most approaches are unsatisfactory because they do not take the specific domain and contextual similarity into account. This paper proposes a domain-based method for calculating document similarity that integrates context, the World Wide Web (WWW), and WordNet similarity. Context is gathered by applying a topic modeling algorithm to generate a domain context; of the many topic modeling algorithms available, Latent Dirichlet Allocation (LDA) is used here. The World Wide Web is used to capture the latest knowledge. The method makes it possible to assign a similarity value to words in different domains. The resulting model is evaluated against human judgment to assess the accuracy of the calculation. The results indicate that the calculation is accurate and that the proposed model can overcome the limitations of existing measures.	en_US
dc.language.iso	en	en_US
dc.subject	Domain-based Similarity	en_US
dc.subject	Topic modeling	en_US
dc.subject	WordNet Similarity	en_US
dc.subject	World Wide Web	en_US
dc.title	Domain-Based Similarity Calculation Method for Calculating Document Similarity	en_US
dc.type	Article Full Text	en_US
dc.identifier.journal	13th International Research Conference General Sir John Kotelawala Defence University	en_US
dc.identifier.pgnos	155-162	en_US

Files in this item

Name: FOC 155-162.pdf
Size: 460.1Kb
Format: PDF

This item appears in the following Collection(s)

Computer Science
