Saturday, November 11, 2023

What is Big Data

Big Data, as the name suggests, is data so huge, complex and ever-growing that conventional technologies are not able to store or process it.

Examples of Big Data

Some examples of the volume of data involved are as follows-

  1. Wal-Mart Inc. handles more than a million customer transactions every hour and stores data related to these transactions in databases estimated to contain more than 2.5 petabytes of data.
  2. Social media giant Facebook, which stores data about users' posts, likes and photos, has an incoming daily rate of about 600 TB of data. Its warehouse stores upwards of 300 PB of data.
  3. As per an estimate by Cisco, there would be about 50 billion devices connected to the internet by 2020.
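To put these figures in perspective, the short Python sketch below turns them into average per-second rates. It assumes decimal units (1 TB = 10^12 bytes) and traffic spread evenly over the hour or day, which real workloads rarely have, so treat the results as rough averages only.

```python
# Back-of-the-envelope rates derived from the figures quoted above.
# Assumptions: decimal units (1 TB = 10**12 bytes, 1 GB = 10**9 bytes)
# and traffic spread evenly over time, which real workloads rarely have.

SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def per_second(amount: float, period_seconds: int) -> float:
    """Convert an amount observed over a period into an average per-second rate."""
    return amount / period_seconds


# Wal-Mart: more than a million transactions every hour
walmart_tps = per_second(1_000_000, SECONDS_PER_HOUR)

# Facebook: about 600 TB of incoming data every day, expressed in GB per second
facebook_gb_per_s = per_second(600 * 10**12, SECONDS_PER_DAY) / 10**9

print(f"Wal-Mart: ~{walmart_tps:.0f} transactions per second")     # ~278
print(f"Facebook: ~{facebook_gb_per_s:.1f} GB of new data per second")  # ~6.9
```

Even as smooth averages, that is hundreds of transactions and several gigabytes of new data every second, before accounting for peaks.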

Processing Big Data

Big data is not only about the volume of data generated but also about how that data is stored and processed to gain analytical insights. Big data analysis helps businesses make fast decisions, analyze consumer behavior and detect fraud by finding anomalies in the data.
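As an illustration of the fraud detection point, here is a minimal Python sketch of anomaly detection using a z-score: a value is flagged when it lies more than a chosen number of standard deviations from the mean. The function name flag_anomalies and the transaction amounts are hypothetical, and real fraud detection systems use far richer models; this only shows the basic idea.

```python
from statistics import mean, stdev


def flag_anomalies(amounts, threshold=3.0):
    """Return the indices of amounts whose z-score exceeds the threshold.

    Hypothetical helper: a value counts as an anomaly if it lies more than
    `threshold` sample standard deviations away from the mean.
    """
    mu = mean(amounts)
    sigma = stdev(amounts)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [i for i, x in enumerate(amounts) if abs(x - mu) / sigma > threshold]


# Hypothetical transaction amounts with one suspicious spike at the end.
transactions = [20, 25, 22, 19, 24, 21, 23, 20, 22, 5000]
print(flag_anomalies(transactions, threshold=2.5))  # [9]
```

Note the threshold of 2.5 in the example: with only ten values, the sample z-score of a single outlier cannot exceed about 2.85, so the common threshold of 3 would miss it. On larger data sets a higher threshold, or a more robust measure such as the median absolute deviation, is often a better fit.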

Types of data in Big Data

Since data is collected from various sources such as logs, emails, sensors and mobile devices, it can be structured, unstructured or semi-structured.

  • Structured data – Data that is organized according to a pre-defined format can be categorized as structured data. Examples of structured data are data stored in relational databases or spreadsheets.
  • Semi-Structured data – Semi-structured data doesn’t have a fixed format like structured data, but it still has some organizational structure, such as tags or markup, to separate its elements.

    Examples of semi-structured data are XML and JSON.

  • Unstructured data – Unstructured data is data that does not fit into any pre-defined format. It usually mixes different kinds of content, such as text and multimedia, in no particular order. Unstructured data is hard to format, manipulate and analyze because it doesn’t follow any pre-defined format.

    Examples of unstructured data are web pages, email messages and logs.
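The difference between the three types shows up in how the data has to be read. The Python sketch below, using made-up sample data, reads structured CSV rows with fixed columns, semi-structured JSON records whose fields can vary from record to record, and unstructured free text from which a value has to be extracted with a regular expression.

```python
import csv
import io
import json
import re

# Structured: every row follows the same pre-defined columns, like a table.
csv_text = "id,name,amount\n1,Alice,250\n2,Bob,120\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total = sum(int(row["amount"]) for row in rows)

# Semi-structured: keys label each element, but records can differ in shape.
json_text = '[{"id": 1, "tags": ["sale"]}, {"id": 2, "address": {"city": "Pune"}}]'
records = json.loads(json_text)
cities = [r.get("address", {}).get("city") for r in records]  # missing field -> None

# Unstructured: free text with no schema; any structure has to be extracted.
email_body = "Hi, please refund order 4521 and also check order 7730. Thanks!"
order_ids = re.findall(r"order (\d+)", email_body)

print(total)      # 370
print(cities)     # [None, 'Pune']
print(order_ids)  # ['4521', '7730']
```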

Characteristics of Big data

Big data can be described by the following five characteristics-

  1. Volume– One of the characteristics of Big data is the large volume of data, which is measured in terabytes and petabytes, if not more.
  2. Variety– Big data may be any type of data: structured, semi-structured or unstructured. Data may also come from various sources such as sensors, click streams and logs. The different sources and nature of the data create challenges in formatting, storing and analyzing it. At the same time, analyzing all these types of data together provides better analytical insight.
  3. Velocity– This characteristic is about the speed at which data is generated and processed. Many processes are time-sensitive, so Big data technologies should be able to analyze streamed data as soon as possible.
  4. Variability– This characteristic is about the inconsistency of the data flow. There are often peak times in the data flow that can make it hard to handle the data effectively.
  5. Veracity– This characteristic is about data quality, which can vary. Because data comes from multiple sources, it is hard to format, match and manipulate. The quality of the data affects how useful the analytical insight is.
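Velocity, variability and veracity can be seen together in a small example. The hypothetical Python sketch below drops malformed click-stream events (a simple veracity check) and counts the remaining events in fixed 60-second (tumbling) windows, which exposes a peak in the data flow. Real streaming systems do this continuously over unbounded streams; here a short in-memory list stands in for the stream, and the helper names are illustrative only.

```python
from collections import Counter

# Hypothetical click-stream events: (timestamp in seconds, user id).
# Two of them are malformed: one has no user id, one has a bad timestamp.
events = [
    (0, "u1"), (12, "u2"), (45, "u1"),
    (61, "u3"), (62, None), (63, "u2"), (64, "u4"), (65, "u1"), (70, "u5"),
    (130, "u2"),
    ("bad", "u9"),
]


def is_valid(event):
    """Veracity: keep only events with an integer timestamp and a user id."""
    ts, user = event
    return isinstance(ts, int) and user is not None


def counts_per_window(events, window_seconds=60):
    """Velocity/variability: count valid events per fixed (tumbling) window."""
    counts = Counter()
    rejected = 0
    for event in events:
        if not is_valid(event):
            rejected += 1
            continue
        window_start = (event[0] // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items())), rejected


per_window, rejected = counts_per_window(events)
print(per_window)  # {0: 3, 60: 5, 120: 1}
print(rejected)    # 2
```

The window starting at 60 seconds receives five events while its neighbours receive three and one, which is the kind of uneven flow that variability refers to.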

Big Data technologies

As already stated, Big data does not refer only to the huge volume of data but also to the technologies that can store, process and analyze that data.

  • Apache Hadoop– Hadoop is almost synonymous with Big data. A whole ecosystem has been built around Hadoop to work with Big data: MapReduce, Hive and Pig for data processing; HDFS as the data layer for storing data; Flume, Kafka and Sqoop for data ingestion; Oozie for job scheduling; and ZooKeeper for coordination services.
  • NoSQL Databases– NoSQL databases don’t follow the rigid structure of relational databases and provide fast performance when storing data, especially unstructured or semi-structured data. Examples of NoSQL databases are MongoDB, Cassandra and HBase.
  • R– R is one of the most popular statistical analysis packages and helps with analyzing data.
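MapReduce, listed above as part of the Hadoop ecosystem, processes data in a map phase, a shuffle phase that groups the mapped values by key, and a reduce phase. The Python sketch below simulates a word count in a single process to show that data flow; it does not use the Hadoop API, and the function names are hypothetical. In Hadoop the same phases run in parallel across the nodes of a cluster, reading input stored in HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle/sort: group all values emitted for the same key, sorted by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reducer: combine the grouped values for one key into a single result."""
    return key, sum(values)


lines = ["Big data needs big storage", "Hadoop stores big data"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
grouped = shuffle_phase(mapped)
word_counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(word_counts)
# {'big': 3, 'data': 2, 'hadoop': 1, 'needs': 1, 'storage': 1, 'stores': 1}
```

Keeping the mapper and reducer as small, independent functions mirrors why MapReduce scales: each mapper call only sees one line and each reducer call only sees one key, so the framework can spread them across machines.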

That's all for this topic, What is Big Data. If you have any doubts or suggestions, please drop a comment. Thanks!


