Ever-evolving technologies and a growing array of connected devices have pushed communication to unprecedented levels, and the volume of data has grown exponentially with it. Just a few decades ago, the world's data was relatively small in size and limited in scope. Today it is vast and still expanding rapidly, driven by sources such as stock markets, social networking sites, and the widespread use of the internet.
As a measure of this data explosion, by the beginning of 2003 the world had produced about 5 billion gigabytes (5 exabytes) of data, and the amount generated keeps growing each day. By one widely quoted estimate, 2.5 quintillion bytes (2.5 exabytes) of data are created daily, and 90 percent of the world's data was generated in just the past two years.
To put these numbers into perspective, take the example of Wal-Mart, one of the world's largest retailers. Wal-Mart processes about one million customer transactions per hour (roughly 280 per second), all stored in databases estimated to contain more than 2.5 petabytes of data. Handling that volume reliably and without delay is the kind of problem big data technology is built to solve.
So, what exactly is “big data”? In essence, big data refers to the handling and analysis of extremely large and complex data sets. These data sets originate from a multitude of sources, including:
- Social Media Data: Social networks generate an enormous amount of data through user interactions, posts, and multimedia content.
- Stock Exchange Data: Financial markets produce vast volumes of data related to stock prices, trading volumes, and market trends.
- Power Grid Data: Utilities and power companies collect data about electricity consumption and distribution to ensure a stable power supply.
- Import-Export Data: Trade and logistics industries generate data on imports, exports, and shipping.
- Search Engine Data: Search engines like Google generate data from user queries and website indexing.
- Weather Forecasting: Meteorological agencies gather data from various sensors and satellites to predict weather conditions.
- Satellite Data: Earth observation satellites capture vast amounts of data related to land, oceans, and the atmosphere.
Traditionally, data can be classified into three types:
- Structured Data: This type of data is highly organized and is typically found in relational databases, making it easy to query and analyze.
- Semi-Structured Data: This data falls somewhere between structured and unstructured data. It includes formats like XML and JSON and is often encountered in web applications and document management systems.
- Unstructured Data: Unstructured data lacks a specific format and includes text documents, images, videos, and social media posts.
Big data technologies matter because they make accurate, meaningful analysis of these diverse data sets possible. That accuracy supports informed decision-making across sectors ranging from business and finance to healthcare and environmental monitoring.
In conclusion, the world’s data landscape has undergone a profound transformation, thanks to advancements in technology and the proliferation of communication devices. The rapid growth of data from various sources has given rise to the field of big data technology, which plays a crucial role in managing, analyzing, and deriving insights from this vast sea of information. As we move forward, big data technology will continue to evolve, shaping industries and enabling data-driven decision-making on an unprecedented scale.
Installation of Hadoop
Hadoop runs on Linux. To install Hadoop on Windows, you first need to install Cygwin on your machine.
Cygwin provides a Linux-like environment on Windows. You can download it from https://cygwin.com/install.html
Hadoop can be installed as either a multi-node cluster or a single-node cluster.
This post covers installing a single-node cluster on a machine running Linux.
Step 1:
Java is required to run Hadoop, so first check whether Java is installed on your machine.
To check for Java:
$java -version
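If you want to script this check, here is a minimal sketch assuming a POSIX shell. The helper name have_cmd is made up for this example; it only tests whether a command exists on the PATH before calling it.

```shell
#!/bin/sh
# have_cmd is a hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    # java prints its version banner to stderr, so merge it into stdout.
    java -version 2>&1
else
    echo "java not found: install a JDK before continuing"
fi
```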
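Step 2:
If Java is not installed, install it. On Debian or Ubuntu:
$sudo apt-get install oracle-java8-installer
This package is not in the standard repositories. It was distributed through a third-party PPA that may no longer be available, in which case an OpenJDK 8 package such as openjdk-8-jdk is a possible substitute (adjust JAVA_HOME in Step 6 to match).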
Step 3:
Hadoop requires SSH access to manage its nodes, that is, the remote machines plus the local machine where Hadoop runs. For a single-node cluster, you need to configure SSH access to localhost for your user.
Generate an RSA key pair with an empty passphrase:
$ssh-keygen -t rsa -P ""
Then authorize the new key for access to your local machine:
$cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
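Running the cat command twice adds the key twice, and sshd may ignore the file if its permissions are too open. The sketch below, assuming a POSIX shell, appends the key only when it is missing and then tightens the permissions. The function name authorize_key is hypothetical.

```shell
#!/bin/sh
# authorize_key is a hypothetical helper: append a public key to an
# authorized_keys file only if it is not already listed, then restrict
# permissions, since sshd may reject files that other users can write to.
authorize_key() {
    pub_key_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -x: match the whole line, -F: treat the key as a fixed string.
    if ! grep -qxF -- "$(cat "$pub_key_file")" "$auth_file"; then
        cat "$pub_key_file" >> "$auth_file"
    fi
    chmod 700 "$(dirname "$auth_file")"
    chmod 600 "$auth_file"
}

# On a real machine (not run here):
#   authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
#   ssh localhost    # should log in without asking for a password
```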
Step 4:
Hadoop is free, open-source software from the Apache Software Foundation. Go to the Apache Hadoop site and download the release that suits your machine. The steps below use hadoop-1.2.1; newer releases use a different directory layout.
Extract the downloaded tar file.
$tar xvfz hadoop-1.2.1.tar.gz
After extracting it, make the following changes to the files under the conf directory of the extracted Hadoop folder.
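Change 1:
conf/core-site.xml (TEMPORARY-DIR-FOR-HADOOPDATASTORE is a placeholder for a directory of your choice):
    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>TEMPORARY-DIR-FOR-HADOOPDATASTORE</value>
        <description>A base for other temporary directories</description>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:54310</value>
      </property>
    </configuration>
Change 2:
conf/mapred-site.xml:
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:54311</value>
      </property>
    </configuration>
Change 3:
conf/hdfs-site.xml:
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>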
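The directory you set for hadoop.tmp.dir has to exist and be writable by the user that runs Hadoop. Here is a minimal sketch for preparing it, assuming a POSIX shell. The function name prepare_tmp_dir and the example path are assumptions for illustration, not part of Hadoop.

```shell
#!/bin/sh
# prepare_tmp_dir is a hypothetical helper: create the directory that
# hadoop.tmp.dir will point to and confirm the current user can write to it.
prepare_tmp_dir() {
    dir=$1
    mkdir -p "$dir" || return 1
    chmod 750 "$dir" || return 1
    if [ -w "$dir" ]; then
        echo "ok: $dir is writable"
    else
        echo "error: $dir is not writable by $(id -un)" >&2
        return 1
    fi
}

# Example call; the path is an assumption, use the value you put in
# core-site.xml:
#   prepare_tmp_dir "$HOME/hadoop-tmp"
```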
Step 5:
Edit conf/slaves and conf/masters so that each file contains only the single line:
localhost
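In Hadoop 1.x, conf/slaves lists the hosts that run the worker daemons and conf/masters lists the host for the secondary NameNode. On a single node both are just localhost. A small sketch of writing both files, assuming a POSIX shell; the function name set_single_node_hosts is hypothetical.

```shell
#!/bin/sh
# set_single_node_hosts is a hypothetical helper: make localhost the only
# entry in the slaves and masters files of the given conf directory.
set_single_node_hosts() {
    conf_dir=$1
    for name in slaves masters; do
        echo "localhost" > "$conf_dir/$name"
    done
}

# From the extracted Hadoop directory (not run here):
#   set_single_node_hosts conf
```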
Step 6:
Set the environment variables for Hadoop and Java.
For a temporary setup that lasts only for the current shell session, run:
$export JAVA_HOME=/usr/lib/jvm/jdk1.8.0
$export HADOOP_COMMON_HOME=/home/hadoop/hadoop-install/hadoop-1.2.1
For a permanent setting, append the same two lines to the end of ~/.bashrc.
To open .bashrc:
$gedit ~/.bashrc
Add these two lines at the end of the file (without the $ prompt), adjusting the paths to match your installation:
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-install/hadoop-1.2.1
Save the file, then reload it:
$source ~/.bashrc
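A wrong path here is a common cause of failures in the later steps, so it can help to confirm that both variables point to real directories. This is a minimal sketch assuming a POSIX shell; check_dir_var is a hypothetical helper name.

```shell
#!/bin/sh
# check_dir_var is a hypothetical helper: verify that the named environment
# variable is set and points to an existing directory.
check_dir_var() {
    name=$1
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    elif [ ! -d "$value" ]; then
        echo "$name points to a missing directory: $value"
        return 1
    fi
    echo "$name=$value"
}

for var in JAVA_HOME HADOOP_COMMON_HOME; do
    check_dir_var "$var" || echo "fix $var in ~/.bashrc, then run: source ~/.bashrc"
done
```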
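Step 7:
Format the Hadoop file system from the Hadoop directory. Do this only when setting up a new cluster, because formatting erases any data already in HDFS.
$./bin/hadoop namenode -format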
Step 8:
Running the cluster.
$./bin/start-all.sh
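To confirm the cluster came up, you can list the running Java processes with jps, which ships with the JDK. In Hadoop 1.x, start-all.sh should start five daemons: NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker. The sketch below, assuming a POSIX shell, reports any that are missing; missing_daemons is a hypothetical helper name.

```shell
#!/bin/sh
# missing_daemons is a hypothetical helper: given the text printed by jps,
# print the name of each expected Hadoop 1.x daemon that is not listed.
missing_daemons() {
    jps_output=$1
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # jps prints "<pid> <name>"; comparing the second field exactly keeps
        # NameNode from matching SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# On the machine running the cluster (not run here):
#   missing_daemons "$(jps)"
```

An empty result means all five daemons are running. With default settings, the NameNode web page should also be reachable at http://localhost:50070 and the JobTracker page at http://localhost:50030.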
Step 9:
To stop the cluster.
$./bin/stop-all.sh
For more detail, see the official Apache Hadoop documentation.