Big Data Tools & Techniques for MSc Case Study Analysis
Table Creation in MySQL from the Data Folder
We create the tables from the previously extracted data and load them into MySQL by running the SQL scripts as the root user (the Cloudera VM password cloudera is supplied with the -p flag):
mysql -u root -pcloudera < db_setup.sql
mysql -u root -pcloudera < diagnoses.sql
mysql -u root -pcloudera < imaging.sql
mysql -u root -pcloudera < hearing_evaluation.sql
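To confirm that the scripts ran successfully, the resulting tables can be listed; this assumes db_setup.sql creates the assignment database that the commands below use:
mysql -u root -pcloudera assignment -e "show tables;"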
First, we exported each table as a tab-separated file using the commands below (the -B flag makes mysql print tab-separated batch output):
mysql-uroot-pclouderaassignment-e"select*fromimaging"-B>imaging.csv
mysql-uroot-pclouderaassignment-e"select*fromdiagnoses"-B>diagnoses.csv
mysql-uroot-pclouderaassignment-e"select*fromhearing_evaluation"-B>hearing_evaluation.tsv
To create proper CSV files for the three tables (diagnoses, hearing_evaluation, and imaging), we run the same query for each table and pipe the tab-separated output through the following sed command, which escapes single quotes, turns each tab into a "," separator, and wraps every line in double quotes:
mysql-uroot-pclouderaassignment-e"select*fromdiagnoses"-B|sed"s/'/\'/;s/\t/\",\"/g;s/^/\"/;s/$/\"/;s/\n//g">diagnoses.csv
mysql-uroot-pclouderaassignment-e"select*fromhearing_evaluation"-B|sed"s/'/\'/;s/\t/\",\"/g;s/^/\"/;s/$/\"/;s/\n//g">hearing_evaluation.csv
mysql-uroot-pclouderaassignment-e"select*fromimaging"-B|sed"s/'/\'/;s/\t/\",\"/g;s/^/\"/;s/$/\"/;s/\n//g">imaging.csv
The sed expression above formats each row as a quoted, comma-separated record. The same filtering is applied to all three tables before the CSV files are used, as the example below illustrates.
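As a minimal illustration (the field values are invented for the example, not taken from the case data), a tab-separated row such as
101	conductive loss	2014-06-01
is rewritten by the sed pipeline into the quoted CSV row
"101","conductive loss","2014-06-01"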
Importing data to Hadoop
After loading the data into MySQL, we import it into Hadoop for analysis. We used the following Sqoop commands to transfer each table from MySQL into HDFS, giving every table its own target directory so the imports do not collide:
- sqoop import --connect jdbc:mysql://localhost:3306/assignment --table diagnoses --username root --password cloudera --target-dir /sqoop_import/diagnoses -m 1
- sqoop import --connect jdbc:mysql://localhost:3306/assignment --table hearing_evaluation --username root --password cloudera --target-dir /sqoop_import/hearing_evaluation -m 1
- sqoop import --connect jdbc:mysql://localhost:3306/assignment --table imaging --username root --password cloudera --target-dir /sqoop_import/imaging -m 1
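Once the jobs finish, the imported data can be inspected directly in HDFS; the paths below match the target directories used above:
hdfs dfs -ls /sqoop_import/diagnoses
hdfs dfs -cat /sqoop_import/diagnoses/part-m-00000 | head
Because each import ran with a single mapper (-m 1), Sqoop writes one output file, part-m-00000, per table.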
Analyzing the data
This step analyses the data with Hive to derive the outcomes we need. Hive translates queries into MapReduce jobs that run on the Hadoop cluster, providing an SQL-like language (HiveQL) for retrieving structured data from the Hadoop data warehouse. It sits on top of Hadoop to summarize large datasets and makes them simple to query and evaluate. Hive was originally developed by Facebook and was subsequently taken on by the Apache Software Foundation as the open-source project Apache Hive; it is used by many companies, for example by Amazon in Amazon Elastic MapReduce.
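As a minimal sketch of this step, an external Hive table can be declared over one of the Sqoop output directories and queried with HiveQL. The column names and types below are assumptions for illustration, not the actual case-study schema; Sqoop's default text output is comma-delimited, which is why the table uses ',' as the field terminator.
-- Hypothetical schema: these column names are assumptions, not the real case data.
CREATE EXTERNAL TABLE diagnoses (
  patient_id INT,
  diagnosis STRING,
  diagnosis_date STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/sqoop_import/diagnoses';
-- Example analysis: count how often each diagnosis occurs.
SELECT diagnosis, COUNT(*) AS total
FROM diagnoses
GROUP BY diagnosis
ORDER BY total DESC;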