Instructions
By the end of this class, we hope you will be able to write and understand Python code to solve unique data analytics tasks on your own.
We feel an employer would expect this skill if this class were on your resume.
It is expected that you have studied the material by reading and completing the zyBook activities, asking questions, and trying out code.
For each assignment it is expected that:
You go through the program development cycle:
Understand the problem task thoroughly. UNDERSTAND
Plan your code by producing an algorithm showing all of the steps. ANALYZE
This could be done as putting pseudocode comments in the code.
Write your code. APPLY
Test your code thoroughly. EVALUATE – FINISH CREATION
Double check the assignment requirements.
Programming Submission Rubric for DAT 535
80% Assignment Requirements Fulfilled. The program runs correctly.
20% Divided equally among the items in the list below:
Only zipped folders or single files should be submitted to Blackboard. All files require the correct extension: .py for Spyder IDE files or .ipynb for Jupyter Notebook files.
Zip multiple files in a folder to submit.
All submission folder and file names must include the student’s last name.
All individual files should have self-documenting names.
All variables should have self-documenting names.
Use of comments, including a comment block at the top of each file with your name and other details.
Include this sentence in the comment block at the top and type in your name:
I certify, that this computer program submitted by me is all of my own work. Signed: Your Name
All sources cited.
Correct spelling and grammar.
Neat, clearly presented code.
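As an illustration of the rubric items above, here is a minimal sketch of a file that opens with the required comment block, plans its steps as pseudocode comments, and uses self-documenting names. The file name, the description, and the miles_per_gallon function are hypothetical examples chosen for this sketch; they are not part of any assignment.

```python
# File: lastname_trip_mpg.py (hypothetical file name that includes the last name)
# Author: Your Name
# Course: DAT 535
# Description: Computes miles per gallon for a single trip.
# I certify, that this computer program submitted by me is all of my own work. Signed: Your Name
# Sources: none beyond the course materials.

# Plan (pseudocode written before the code):
#   1. Store the distance driven and the fuel used.
#   2. Divide the distance by the fuel used.
#   3. Print the result.


def miles_per_gallon(miles_driven, gallons_used):
    """Return the fuel economy for one trip (hypothetical helper)."""
    return miles_driven / gallons_used


# Steps 1 and 2: values are made up for illustration.
trip_mpg = miles_per_gallon(miles_driven=300.0, gallons_used=12.0)

# Step 3
print("Trip fuel economy:", trip_mpg, "mpg")
```

In a Jupyter Notebook, the same comment block could go in the first code cell.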
Session 3 Programming Assignment Week 5 & 6
Use one Jupyter Notebook file for all parts of the assignment.
Upload your .ipynb file to MyCourses.
Part 1 – 20 Points
The dataset mpg contains information on miles per gallon (mpg) and engine size for cars sold from 1970 through 1982. The dataset has the features mpg, cylinders, displacement, horsepower, weight, acceleration, model_year, origin, and name.
Load the dataset mpg
Create a new dataframe using the columns weight and mpg
Use matplotlib to make a scatter plot of weight vs mpg labelling the x-axis Weight and the y-axis MPG
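The three steps above could be sketched roughly as follows. This is only an outline under stated assumptions: the small data frame is a hand-typed stand-in with made-up values so the snippet runs on its own, and the commented-out pd.read_csv("mpg.csv") call assumes the dataset is supplied as a file with that name, which the assignment text does not confirm.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumption: in the real notebook the data would be loaded from the course
# files, for example:
#   mpg = pd.read_csv("mpg.csv")
# The rows below are illustrative stand-ins, not the actual dataset.
mpg = pd.DataFrame({
    "mpg": [18.0, 15.0, 24.0, 31.0, 27.0],
    "cylinders": [8, 8, 4, 4, 4],
    "weight": [3500, 3700, 2400, 2000, 2200],
})

# New data frame holding only the two columns of interest.
weight_mpg = mpg[["weight", "mpg"]]
print(weight_mpg)

# Scatter plot of weight (x-axis) against mpg (y-axis).
fig, ax = plt.subplots()
ax.scatter(weight_mpg["weight"], weight_mpg["mpg"])
ax.set_xlabel("Weight")
ax.set_ylabel("MPG")
```

In a Jupyter Notebook the figure normally renders when the cell finishes; in a plain script, plt.show() would be called at the end. Because the data are stand-ins, the printed values will not match the real dataset.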
If displacement and horsepower were used instead of weight and mpg, the output would be:
     displacement  horsepower
0           307.0       130.0
1           350.0       165.0
2           318.0       150.0
3           304.0       150.0
4           302.0       140.0
..            ...         ...
393         140.0        86.0
394          97.0        52.0
395         135.0        84.0
396         120.0        79.0
397         119.0        82.0
[398 rows x 2 columns]
Part 2 – 20 Points
The titanic dataset contains data on 887 Titanic passengers, including each passenger’s survival status, embarkation location, cabin class, and sex. Write a program that performs the following tasks:
Load the dataset in titanic.csv as titanic.
Create a new data frame, firstSouth, by subsetting titanic to include instances where a passenger is in the first class cabin (pclass feature is 1) and boarded from Southampton (embarked feature is S).
Create a new data frame, secondThird, by subsetting titanic to include instances where a passenger is either in the second (pclass feature is 2) or third class (pclass feature is 3) cabin.
Create bar charts for the following:
Passengers in first class who embarked in Southampton grouped by sex
Passengers in second and third class grouped by survival status
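One possible way to approach the subsetting and the two bar charts is sketched here. It is an outline, not the reference solution: the six-row data frame is a made-up stand-in so the snippet is self-contained, and the real notebook would load titanic.csv instead. The column names pclass, embarked, sex, and survived come from the task description; the axis variable names and chart titles are choices made for this sketch.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumption: the real notebook would use
#   titanic = pd.read_csv("titanic.csv")
# The rows below are illustrative stand-ins only.
titanic = pd.DataFrame({
    "survived": [0, 1, 1, 0, 0, 1],
    "pclass": [3, 1, 3, 1, 2, 1],
    "sex": ["male", "female", "female", "male", "female", "male"],
    "embarked": ["S", "S", "S", "S", "C", "C"],
})

# First class AND boarded at Southampton: combine two boolean masks with &.
firstSouth = titanic[(titanic["pclass"] == 1) & (titanic["embarked"] == "S")]

# Second OR third class: isin keeps rows whose pclass is 2 or 3.
secondThird = titanic[titanic["pclass"].isin([2, 3])]

print(firstSouth.head())
print(secondThird.head())

# One bar chart per subset, counting passengers in each group.
fig, (ax_sex, ax_survived) = plt.subplots(1, 2, figsize=(9, 3))
firstSouth["sex"].value_counts().plot(kind="bar", ax=ax_sex)
ax_sex.set_title("First class, Southampton, by sex")
ax_sex.set_ylabel("Passengers")
secondThird["survived"].value_counts().plot(kind="bar", ax=ax_survived)
ax_survived.set_title("Second and third class, by survival")
ax_survived.set_ylabel("Passengers")
fig.tight_layout()
```

Each comparison needs its own parentheses when masks are combined with &, because & binds more tightly than ==. With the stand-in rows, the printed frames will differ from the real dataset.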
The output should be:
    survived  pclass     sex   age  ...  deck  embark_town  alive  alone
3          1       1  female  35.0  ...     C  Southampton    yes  False
6          0       1    male  54.0  ...     E  Southampton     no   True
11         1       1  female  58.0  ...     C  Southampton    yes   True
23         1       1    male  28.0  ...     A  Southampton    yes   True
27         0       1    male  19.0  ...     C  Southampton     no  False
[5 rows x 15 columns]
   survived  pclass     sex   age  ...  deck  embark_town  alive  alone
0         0       3    male  22.0  ...   NaN  Southampton     no  False
2         1       3  female  26.0  ...   NaN  Southampton    yes   True
4         0       3    male  35.0  ...   NaN  Southampton     no   True
5         0       3    male   NaN  ...   NaN   Queenstown     no   True
7         0       3    male   2.0  ...   NaN  Southampton     no  False
[5 rows x 15 columns]
Part 3 – 20 Points
The nbaallelo_slr dataset contains information on 126315 NBA games between 1947 and 2015. The columns report the points made by one team, the Elo rating of that team coming into the game, the Elo rating of the team after the game, and the points made by the opposing team. The Elo rating measures the relative skill of teams in a league.
The code creates a new column y in the data frame that is the difference between pts and opp_pts.
Split the data into 70 percent training set and 30 percent testing set using sklearn’s train_test_split function. Set random_state=0.
Store elo_i and y from the training data as the variables X and y.
The code performs a simple linear regression on X and y.
Perform 10-fold cross-validation with the default scorer using scikit-learn’s cross_val_score function.
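The steps above might be strung together as in the following sketch. It rests on assumptions: the data are randomly generated stand-ins (200 synthetic games) so the code runs without the course file, and the commented-out file name nbaallelo_slr.csv is a guess at how the dataset is delivered. The column names pts, opp_pts, and elo_i come from the task description. For LinearRegression, the default scorer used by cross_val_score is the R-squared value.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Assumption: the real notebook would load the course data, for example
#   nba = pd.read_csv("nbaallelo_slr.csv")
# Synthetic stand-in data are generated here so the sketch is self-contained.
rng = np.random.default_rng(0)
number_of_games = 200
elo_before_game = rng.normal(1500, 100, number_of_games)
nba = pd.DataFrame({
    "pts": 100 + 0.03 * (elo_before_game - 1500) + rng.normal(0, 10, number_of_games),
    "elo_i": elo_before_game,
    "opp_pts": 100 + rng.normal(0, 10, number_of_games),
})

# New column y: point difference between the team and its opponent.
nba["y"] = nba["pts"] - nba["opp_pts"]

# 70 percent training set, 30 percent testing set, reproducible split.
train, test = train_test_split(nba, test_size=0.3, random_state=0)

# scikit-learn expects a 2D feature table, hence the double brackets.
X = train[["elo_i"]]
y = train["y"]

# Simple linear regression of y on elo_i.
model = LinearRegression()
model.fit(X, y)

# 10-fold cross-validation with the estimator's default scorer (R-squared).
scores = cross_val_score(model, X, y, cv=10)
print("The cross-validation scores are", scores)
```

Because the data are synthetic, the printed scores will not match the values expected for the real dataset.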
Ex: If random_state=1 is used, the output is:
The cross-validation scores are [0.08005753 0.06323433 0.0703416 0.06041472 0.07528804 0.06335341 0.07329385 0.06774416 0.06069483 0.07071991]
Please follow the instructions.
Parts 1, 2, and 3 should be in one file. Four different files are needed in total: the same questions, answered separately for four people.