
Instructions
When you finish this class, we hope you will be able to write and understand Python code to solve unique data analytics tasks on your own.
We feel this is something an employer would expect if this class were on your resume.
It is expected that you have studied the material by reading and completing the zyBook activities, asking questions, and trying out code.
For each assignment it is expected that:
You would go through the program development cycle.
Understand the problem task thoroughly. UNDERSTAND
Plan your code by producing an algorithm showing all of the steps. ANALYZE
This could be done by putting pseudocode comments in the code.
Write your code. APPLY
Test your code thoroughly. EVALUATE – FINISH CREATION
Double check the assignment requirements.
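As a rough illustration of the cycle above, the plan can be written as pseudocode comments first and the code filled in underneath each one. The task, function name, and scores below are hypothetical and are not part of any assignment.

```python
# Task (hypothetical): report the proportion of exam scores above a cutoff.

def proportion_above_threshold(values, threshold):
    # Step 1: guard against an empty list so we never divide by zero.
    if not values:
        return 0.0
    # Step 2: count the values strictly greater than the threshold.
    count_above = sum(1 for value in values if value > threshold)
    # Step 3: divide the count by the total number of values.
    return count_above / len(values)


# Test the code with a small input whose answer can be checked by hand.
exam_scores = [55, 72, 88, 91, 64]  # made-up scores for illustration
passing_proportion = proportion_above_threshold(exam_scores, 70)
print(f"Proportion above 70: {passing_proportion:.2f}")
```

Writing the comments first keeps the algorithm visible next to the code that implements it.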
Programming Submission Rubric for DAT 535
80% Assignment Requirements Fulfilled. The program runs correctly.
20% Divided equally among the items in the list below:
Only zipped folders or single files should be submitted to Blackboard.
All files require the correct extensions: .py for Spyder IDE files or .ipynb for Jupyter Notebook files.
Zip multiple files in a folder to submit.
All submission folder and file names must include the student’s last name.
All individual files should have self-documenting names.
All variables should have self-documenting names.
Use of comments, including a comment block at the top of each file with your name and other details.
Include this sentence in the comment block at the top and type in your name:
I certify, that this computer program submitted by me is all of my own work. Signed: Your Name
All sources cited.
Correct spelling and grammar.
Neat, clearly presented code.
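One possible layout for the comment block and naming items in the list above is sketched here. The file name, the placeholder fields, and the numbers are assumptions for illustration only; the rubric does not prescribe a specific layout.

```python
# ---------------------------------------------------------------
# File:    lastname_session4_assignment.py   (hypothetical name)
# Author:  Your Name
# Date:    (submission date)
# Purpose: (one-line description of what the program does)
# Sources: (list any references you consulted)
#
# I certify, that this computer program submitted by me is all of
# my own work. Signed: Your Name
# ---------------------------------------------------------------

# Self-documenting names say what a value means, not just what it is.
total_points_scored = 8450  # made-up value for illustration
games_played = 82           # made-up value for illustration
average_points_per_game = total_points_scored / games_played
print(f"Average points per game: {average_points_per_game:.1f}")
```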
Session 4 Programming Assignment Week 7 & 8
Please use one Jupyter Notebook file for all parts of the assignment.
Upload your .ipynb file to MyCourses.
Part 1 – 20 Points
The dataset SDSS contains 17 observational features and one class feature for 10000 deep sky objects observed by the Sloan Digital Sky Survey. Use sklearn’s KNeighborsClassifier function to perform kNN classification to classify each object by the object’s redshift and u-g color.
Import the necessary modules for kNN classification
Create a dataframe X with features redshift and u_g
Create dataframe y with feature class
Initialize a kNN model with k=3
Fit the model using the training data
Find the predicted classes for the test data
Calculate the accuracy score and confusion matrix
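The steps above can be sketched roughly as follows. This is not a solution to the assignment: the real SDSS file is not loaded, so the data frame is a synthetic stand-in with placeholder class labels, and the `test_size`, `random_state`, and `stratify` settings are assumptions that the assignment does not specify.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the SDSS data (assumption: three made-up classes).
rng = np.random.default_rng(0)
objects_per_class = 100
sky_objects = pd.DataFrame({
    "redshift": np.concatenate([
        rng.normal(0.0, 0.05, objects_per_class),
        rng.normal(0.5, 0.05, objects_per_class),
        rng.normal(2.0, 0.05, objects_per_class),
    ]),
    "u_g": np.concatenate([
        rng.normal(1.0, 0.3, objects_per_class),
        rng.normal(1.5, 0.3, objects_per_class),
        rng.normal(0.3, 0.3, objects_per_class),
    ]),
    "class": np.repeat(["CLASS_A", "CLASS_B", "CLASS_C"], objects_per_class),
})

# Input features and output feature.
X = sky_objects[["redshift", "u_g"]]
y = sky_objects["class"]

# Hold out part of the data for testing (split settings are assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Initialize the kNN model with k=3, fit it, and predict the test classes.
knn_model = KNeighborsClassifier(n_neighbors=3)
knn_model.fit(X_train, y_train)
y_pred = knn_model.predict(X_test)

# Accuracy score and confusion matrix (rows: true class, columns: predicted).
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print("Accuracy score is %.3f" % accuracy)
print(conf_matrix)
```

With the real dataset, the synthetic frame would be replaced by the loaded SDSS data, and the printed numbers would differ from anything this sketch produces.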
Ex: If the feature u is used rather than u_g, the output is:
Accuracy score is 0.979
[[1452   18    0]
 [   7  268    0]
 [  38    0 1217]]
Part 2 – 20 Points
The nbaallelo_slr dataset contains information on 126315 NBA games between 1947 and 2015. The columns report the points made by one team, the Elo rating of that team coming into the game, the Elo rating of the team after the game, and the points made by the opposing team. The Elo score measures the relative skill of teams in a league.
Load the dataset into a data frame.
Create a new column y in the data frame that is the difference between the points made by the two teams.
Use sklearn’s LinearRegression() function to perform a simple linear regression on the y and elo_i columns.
Compute the proportion of variation explained by the linear regression using the LinearRegression object’s score method.
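A minimal sketch of these steps is shown below, under stated assumptions: the data are synthetic rather than the nbaallelo_slr file, and the column names `pts` and `opp_pts` are guesses at how the two point totals might be labeled. Only `elo_i` and `y` come from the text above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the game data (assumption: not the real file).
rng = np.random.default_rng(1)
number_of_games = 500
elo_before_game = rng.normal(1500, 100, number_of_games)
games = pd.DataFrame({
    "elo_i": elo_before_game,
    # Made-up relationship: stronger teams score slightly more on average.
    "pts": 100 + 0.05 * (elo_before_game - 1500)
           + rng.normal(0, 8, number_of_games),
    "opp_pts": 100 + rng.normal(0, 8, number_of_games),
})

# New column y: difference between the points made by the two teams.
games["y"] = games["pts"] - games["opp_pts"]

# Simple linear regression of y on elo_i.
X = games[["elo_i"]]  # 2-D input, as scikit-learn expects
y = games["y"]
slr_model = LinearRegression()
slr_model.fit(X, y)

intercept = slr_model.intercept_
slope = slr_model.coef_[0]
# score() returns R^2, the proportion of variation explained by the model.
r_squared = slr_model.score(X, y)

print("The intercept of the linear regression line is %.3f." % intercept)
print("The slope of the linear regression line is %.3f." % slope)
print("The proportion of variation explained by the linear regression "
      "model is %.3f." % r_squared)
```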
Ex: If the Elo rating of the team after the game, elo_n, is used instead of elo_i, the output is:
The intercept of the linear regression line is -59.135.
The slope of the linear regression line is 0.040.
The proportion of variation explained by the linear regression model is 0.111.
Part 3 – 20 Points
The nbaallelo_log file contains data on 126314 NBA games from 1947 to 2015. The dataset includes the features pts, elo_i, win_equiv, and game_result. Using the csv file nbaallelo_log.csv and scikit-learn’s LogisticRegression function, construct a logistic regression model to classify whether a team will win or lose a game based on the team’s elo_i score.
Encode the game_result variable as a numeric variable, with 0 for L and 1 for W.
Use the LogisticRegression function to construct a logistic regression model with game_result as the target and elo_i as the predictor.
Predict the probability of a win from an elo_i score of 1310.
Predict whether a team with an elo_i score of 1310 will win.
Note: Use ravel() from numpy to flatten the second argument of LogisticRegression.fit() into a 1-D array.
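The steps and note above might look roughly like the sketch below. The data are synthetic rather than read from nbaallelo_log.csv, and `max_iter=1000` is an added precaution for an unscaled feature, not something the assignment asks for.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the game data (assumption: not the real csv file).
rng = np.random.default_rng(2)
number_of_games = 1000
elo_before_game = rng.normal(1500, 100, number_of_games)
# Made-up rule: a higher rating means a higher chance of winning.
true_win_probability = 1 / (1 + np.exp(-(elo_before_game - 1500) / 100))
games = pd.DataFrame({
    "elo_i": elo_before_game,
    "game_result": np.where(
        rng.random(number_of_games) < true_win_probability, "W", "L"
    ),
})

# Encode game_result as numeric: 0 for L and 1 for W.
games["game_result"] = games["game_result"].map({"L": 0, "W": 1})

X = games[["elo_i"]]
y = games[["game_result"]]

# ravel() flattens the one-column target into the 1-D array fit() expects.
logistic_model = LogisticRegression(max_iter=1000)
logistic_model.fit(X, np.ravel(y))

# Predict for a single team with an elo_i score of 1310.
team_to_predict = pd.DataFrame({"elo_i": [1310]})
# Columns follow logistic_model.classes_, here [0, 1] (losing, winning).
win_loss_probability = logistic_model.predict_proba(team_to_predict)
predicted_result = logistic_model.predict(team_to_predict)

print("A team with the given elo_i score has predicted probability:")
print("%.3f losing" % win_loss_probability[0][0])
print("%.3f winning" % win_loss_probability[0][1])
print("and the overall prediction is", predicted_result[0])
```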
Ex: If an elo_i score of 1410 is used instead of 1310, the output is:
A team with the given elo_i score has predicted probability:
0.593 losing
0.407 winning
and the overall prediction is 0
