Big Data Query & Analysis by Apache Hive 
1. This task uses Apache Hive to convert big raw data into useful information for
end users. First, study the dataset carefully. Then write at least 4 Hive
queries (refer to the marking scheme). Apply appropriate visualization tools to present
your findings numerically and graphically, and briefly interpret them.
Finally, include screenshots of your outcomes (e.g., tables and plots) together with the
scripts/queries in the report.
Tip: The mark for this section depends on the complexity of your Hive queries; for
instance, a simple SELECT query on its own will not earn full marks.
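As a rough illustration of what goes beyond a simple SELECT, the sketch below combines aggregation, a subquery join, HAVING and ordering in a single query. The table name flows and the columns attack_cat and sbytes are hypothetical placeholders, not taken from the assignment dataset. Python's built-in sqlite3 module stands in for Hive here only so the query can be tried without a cluster; the SQL text is limited to constructs that HiveQL also supports, but it has not been run against Hive itself.

```python
import sqlite3

# Hypothetical query: table and column names are placeholders for illustration.
# Per category: record count, mean source bytes, and share of all records,
# keeping only categories with at least two records.
CATEGORY_SUMMARY_SQL = """
SELECT a.attack_cat AS attack_cat,
       COUNT(*) AS n_records,
       AVG(a.sbytes) AS avg_sbytes,
       COUNT(*) * 100.0 / t.total AS pct_of_all
FROM flows a
CROSS JOIN (SELECT COUNT(*) AS total FROM flows) t
GROUP BY a.attack_cat, t.total
HAVING COUNT(*) >= 2
ORDER BY n_records DESC, attack_cat
"""


def category_summary(rows):
    """Load (attack_cat, sbytes) rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE flows (attack_cat TEXT, sbytes INTEGER)")
        conn.executemany("INSERT INTO flows VALUES (?, ?)", rows)
        return conn.execute(CATEGORY_SUMMARY_SQL).fetchall()
    finally:
        conn.close()


# Made-up sample rows, only to exercise the query.
sample_rows = [
    ("Normal", 100), ("Normal", 300), ("Normal", 200),
    ("DoS", 1000), ("DoS", 3000),
    ("Exploits", 500), ("Exploits", 700),
    ("Worms", 50),
]

summary = category_summary(sample_rows)
for row in summary:
    print(row)
```

In Hive, the same query text would be run against a real table, and the result table is what would then be charted and interpreted in the report.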
2. 
Advanced Analytics using PySpark 
In this section, you will conduct advanced analytics using PySpark. 
3.1. Analyze and Interpret Big Data
Study the data using at least 4 analytical methods
(descriptive statistics, correlation, hypothesis testing, density estimation, etc.).
Present your work numerically and graphically. Add tooltip text, legends, titles, X-Y axis labels,
etc. as appropriate to help end users gain insights.
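To make two of the listed methods concrete, the sketch below computes descriptive statistics and a Pearson correlation coefficient in plain Python, so the arithmetic is visible. It is not PySpark code; in PySpark, DataFrame.describe() and DataFrame.stat.corr() provide comparable results at scale. The feature names duration and sbytes and the sample values are made up for illustration.

```python
import math
import statistics


def describe(values):
    """Return count, mean, sample standard deviation, min and max."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical feature values, for illustration only.
duration = [2, 4, 4, 4, 5, 5, 7, 9]
sbytes = [120, 260, 250, 240, 330, 310, 420, 560]

print(describe(duration))
print(pearson(duration, sbytes))
```

A coefficient close to +1 or -1 indicates a strong linear relationship, which is usually worth showing as a labelled scatter plot alongside the number.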
3.2. Design and Build a Classifier 
a) Design and build a binary classifier over the dataset. Explain your algorithm and its
configuration. Present your findings in both numerical and graphical
form. Evaluate the performance of the model and verify its accuracy
and effectiveness.
b) Apply a multi-class classifier to classify the data into ten classes (categories): one
normal and nine attacks (e.g., Fuzzers, Analysis, Backdoors, DoS, Exploits,
Generic, Reconnaissance, Shellcode and Worms). Briefly explain your model with
supporting statements on its parameters, accuracy and effectiveness.
Tip: you can use this link (https://spark.apache.org/docs/2.2.0/ml-classification-regression.html) for more information on modelling.
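Both parts of 3.2 ask for an evaluation of accuracy and effectiveness. Overall accuracy alone can hide poor performance on rare classes, so per-class precision, recall and F1 are worth reporting as well. The sketch below shows that arithmetic in plain Python, working from lists of true and predicted labels; it applies equally to two classes or ten. It is an illustrative sketch rather than PySpark code (Spark ML ships its own evaluators in pyspark.ml.evaluation), and the sample labels are made up.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred):
    """Precision, recall and F1 for every label seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have equal length")
    labels = sorted(set(y_true) | set(y_pred))
    # confusion[(true, predicted)] = number of records with that pairing
    confusion = Counter(zip(y_true, y_pred))
    metrics = {}
    for label in labels:
        tp = confusion[(label, label)]
        fp = sum(confusion[(other, label)] for other in labels if other != label)
        fn = sum(confusion[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def macro_f1(metrics):
    """Unweighted mean of the per-class F1 scores."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)


# Made-up labels for illustration; a real run would use model predictions.
y_true = ["Normal", "Normal", "Normal", "DoS", "DoS",
          "Exploits", "Exploits", "Exploits"]
y_pred = ["Normal", "Normal", "DoS", "DoS", "DoS",
          "Exploits", "Normal", "Exploits"]

scores = per_class_metrics(y_true, y_pred)
print(accuracy(y_true, y_pred))
for label, m in scores.items():
    print(label, m)
print(macro_f1(scores))
```

The per-class values also map directly onto a confusion-matrix heatmap or a grouped bar chart, which covers the graphical side of the evaluation.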
