Big Data and Hadoop: A Review
Abstract
Most people today use electronic data storage systems that are powered by new technologies and that share data with other users. As a result, data have grown larger in size and increasingly unstructured. This paper reviews big data and Hadoop with respect to security, data analysis, data storage methods, and speed. It identifies the problems associated with big data and explains how Hadoop and its architecture address them. It then critically reviews the limitations of Hadoop and discusses solutions based on technologies such as multi-agent systems and machine learning. Big data refers to data sets ranging roughly from terabytes up to zettabytes in size. Big data is commonly characterised by four parameters: volume (the scale of the data), variety (the different forms of data), velocity (the analysis of streaming data), and veracity (the uncertainty of the data). Relational database management systems (RDBMS) have traditionally been used to store such data, but they present many difficulties: they are costly, rely on fixed schemas, struggle to store and access very large files, and take a long time to perform analytics. Hadoop is a framework that enables this kind of data to be stored and analysed. It uses a distributed file system (HDFS) to store very large volumes of data and an implementation of Google's MapReduce algorithm to analyse them. Relational databases deal with structured data, whereas Hadoop also handles unstructured data. Hadoop is an open-source data management system with no licensing cost. It can be deployed on a single server and scaled up to thousands of servers, using parallelism with a high degree of fault tolerance.
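To illustrate the MapReduce model that the review refers to, the sketch below shows the canonical word-count job written against Hadoop's standard Java MapReduce API. It is a minimal sketch for orientation only; the class names and input/output paths are illustrative assumptions, not material taken from the paper.

```java
// Minimal sketch of a Hadoop MapReduce job (the classic word count),
// using the standard org.apache.hadoop.mapreduce API.
// Class names and HDFS paths are illustrative assumptions.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word across all mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory (assumed)
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory (assumed)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Such a job is typically packaged as a jar and submitted to the cluster (for example, `hadoop jar wordcount.jar WordCount <input> <output>`), with HDFS handling the distributed storage and the framework handling scheduling, the shuffle, and fault tolerance.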