IR@KDU Repository

Domain-Based Similarity Calculation Method for Calculating Document Similarity

Herath, HML G; Kumara, BTGS 2020-12-31T20:53:01Z 2020-12-31T20:53:01Z 2020
dc.description.abstract Abstract: Document similarity is important in many areas that deal with textual data, such as knowledge management, information extraction, natural language processing, and artificial intelligence. Several methods exist for calculating document similarity, but the results of most approaches are unsatisfactory because domain-specific and contextual similarity are not taken into consideration. In this paper, a domain-based method for calculating document similarity is proposed that integrates context, the World Wide Web (WWW), and WordNet similarity. Context is gathered by applying a topic modeling algorithm to generate a domain context. Of the many topic modeling algorithms available, Latent Dirichlet Allocation (LDA) is used here. The World Wide Web is used to capture the latest knowledge. The method makes it possible to obtain a similarity value for words in different domains. The obtained model is evaluated by comparing its output against human judgment to assess the accuracy of the calculation. Results indicate that the calculation is accurate and that the proposed model can overcome the limitations of existing measures. en_US
dc.language.iso en en_US
dc.subject Domain-based Similarity en_US
dc.subject Topic modeling en_US
dc.subject WordNet Similarity en_US
dc.subject World Wide Web en_US
dc.title Domain-Based Similarity Calculation Method for Calculating Document Similarity en_US
dc.type Article Full Text en_US
dc.identifier.journal 13th International Research Conference General Sir John Kotelawala Defence University en_US
dc.identifier.pgnos 155-162 en_US
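
The abstract names three signals (domain context from LDA, the World Wide Web, and WordNet) but this record does not say how they are combined. The sketch below is only an illustration under assumptions: it takes a weighted average of three word-level scores and then aggregates word scores into a document score by symmetric best-match averaging. The function names, the weights, and the `exact_match` stand-in scorer are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, Sequence

# A word-level scorer maps two words to a similarity in [0, 1].
WordSim = Callable[[str, str], float]


def combine_word_similarity(
    w1: str,
    w2: str,
    scorers: Dict[str, WordSim],
    weights: Dict[str, float],
) -> float:
    """Weighted average of several word-level scores.

    ASSUMPTION: the paper may integrate its signals differently;
    a weighted average is just one simple possibility.
    """
    total = sum(weights[name] for name in scorers)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * scorers[name](w1, w2) for name in scorers) / total


def document_similarity(
    doc_a: Sequence[str],
    doc_b: Sequence[str],
    word_sim: WordSim,
) -> float:
    """Symmetric best-match average over the words of two documents.

    ASSUMPTION: this aggregation is illustrative, not the paper's.
    """
    if not doc_a or not doc_b:
        return 0.0

    def directed(src: Sequence[str], dst: Sequence[str]) -> float:
        # For each source word, keep its best score against the other document.
        return sum(max(word_sim(s, d) for d in dst) for s in src) / len(src)

    return 0.5 * (directed(doc_a, doc_b) + directed(doc_b, doc_a))


def exact_match(w1: str, w2: str) -> float:
    """Hypothetical stand-in for a real context, web, or WordNet scorer."""
    return 1.0 if w1 == w2 else 0.0


# Hypothetical configuration: every signal uses the stand-in scorer.
scorers = {"context": exact_match, "web": exact_match, "wordnet": exact_match}
weights = {"context": 0.5, "web": 0.25, "wordnet": 0.25}


def word_sim(a: str, b: str) -> float:
    return combine_word_similarity(a, b, scorers, weights)


if __name__ == "__main__":
    print(document_similarity(["topic", "model"], ["topic", "web"], word_sim))
```

In a real setting, each entry in `scorers` would be replaced by an actual measure, for example one derived from LDA topic distributions, one from web co-occurrence statistics, and one from a WordNet-based metric, and the weights would need to be chosen or tuned against human judgments.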
