Slides

Video for the 19th November 2020 session

Video for the 23rd November 2020 session

Video for the 1st December 2020 session

Linear Regression handout

Logistic Regression handout


Files for this topic's exercises


Assignment 5

1) Choose a dataset for regression analysis with at least 5 predictors (e.g. https://towardsdatascience.com/regression-analysis-on-life-expectancy-6914775a77e2). Describe the dataset (predictors, response, number of data points). Perform exploratory data analysis (with a scatterplot matrix and heatmaps) to find out how the variables are related. Split the data into training and test sets. Use stepwise regression to fit a model and identify the most significant variables in it. How well does the model fit the test data? Report the model equation and the root mean squared error (RMSE), and comment on whether the regression assumptions are met. If they are not met, how would you improve the model? Describe in words the relationship between the significant predictors and the response. Submit the notebook (the code and screenshots of the visualizations) with the assignment.

2) Choose a dataset for classification analysis with at least 3 predictors (e.g. accelerometer data from a smartphone for human activity recognition). Describe the dataset (predictors, response, number of data points). Perform exploratory data analysis to understand the relationships between the variables. Split the data into training and test sets. Which algorithm did you use to fit the model? What accuracy does the fitted model achieve? Which of the predictors are significant? How well does the model fit the test data? What is the confusion matrix for this model, and what do you infer from it? Describe in words the relationship between the significant predictors and the response. Submit the notebook (the code and screenshots of the visualizations) with the assignment.
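
As a starting point for task 1, the split, stepwise fit and test RMSE can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the helper names (`design`, `fit`, `predict`, `aic`, `forward_stepwise`) are made up for this sketch, and forward selection by AIC is just one common variant of stepwise regression. Your platform may use p-values or a different criterion.

```python
import numpy as np


def design(X, cols):
    """Design matrix: an intercept column plus the chosen predictor columns."""
    return np.column_stack([np.ones(len(X))] + [X[:, c] for c in cols])


def fit(X, y, cols):
    """Ordinary least squares on the chosen columns (intercept first)."""
    coef, *_ = np.linalg.lstsq(design(X, cols), y, rcond=None)
    return coef


def predict(X, cols, coef):
    return design(X, cols) @ coef


def aic(X, y, cols):
    """AIC of a Gaussian linear model, up to an additive constant."""
    resid = y - predict(X, cols, fit(X, y, cols))
    n = len(y)
    return n * np.log(np.sum(resid ** 2) / n) + 2 * (len(cols) + 1)


def forward_stepwise(X, y):
    """Add the predictor that lowers AIC most; stop when none lowers it."""
    selected, remaining = [], list(range(X.shape[1]))
    best = aic(X, y, selected)  # intercept-only model
    while remaining:
        score, col = min((aic(X, y, selected + [c]), c) for c in remaining)
        if score >= best:
            break
        best = score
        selected.append(col)
        remaining.remove(col)
    return selected


# Synthetic data (an assumption): 5 predictors, only columns 0 and 2 matter.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

# 80/20 split into training and test sets.
idx = rng.permutation(n)
train, test = idx[:160], idx[160:]

selected = forward_stepwise(X[train], y[train])
coef = fit(X[train], y[train], selected)
rmse = np.sqrt(np.mean((y[test] - predict(X[test], selected, coef)) ** 2))

print("selected predictors (in order added):", selected)
print("coefficients (intercept, then selected order):", np.round(coef, 2))
print("test RMSE:", round(float(rmse), 3))
```

Selection uses only the training rows, so the test RMSE stays an honest estimate of how the chosen model generalises. Checking the assumptions (for example with residual plots) still has to be done separately.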

Use any data science platform of your choice. Provide references for the sources of your datasets.
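
For task 2, the confusion matrix and accuracy can be computed from the true and predicted test labels alone. The sketch below uses only the Python standard library. The labels and predictions are hypothetical, made up for illustration, and the function names are not taken from any particular platform.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Share of correct predictions: diagonal sum over the total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical test-set labels for a three-class activity recognition task.
labels = ["sitting", "walking", "running"]
y_true = ["sitting", "sitting", "walking", "walking", "walking",
          "running", "running", "sitting", "walking", "running"]
y_pred = ["sitting", "sitting", "walking", "running", "walking",
          "running", "walking", "sitting", "walking", "running"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>8}", row)
print("accuracy:", accuracy(matrix))
```

Off-diagonal entries show which classes are confused with each other. In this made-up example the only errors are between walking and running, which is the kind of inference the task asks you to draw.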

Deadline

Assignment due on 5th January 2021.