A train accident dataset was obtained from the internet. It contains many variables, such as whether each passenger survived, sex, age, class travelled and other data. An attempt has been made to implement a logistic regression classifier and find the survival probabilities.

Import Required Libraries

Check the total number of rows and columns of the imported data. In total, 619 x 10 data points are available.

Now check for null values in the data
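A minimal sketch of such a null check, assuming the data sits in a pandas DataFrame. The toy rows and the column names (`Survived`, `Sex`, `Age`, `Station`) are made up for illustration and are not taken from the original CSV.

```python
import pandas as pd

# Toy data for illustration only; column names are assumptions.
df = pd.DataFrame({
    "Survived": [1, 0, 1, 0],
    "Sex": ["female", "male", "female", "male"],
    "Age": [29.0, None, 41.0, 35.0],
    "Station": ["A", "B", None, "A"],
})

# Count the missing values in each column.
null_counts = df.isnull().sum()
print(null_counts)
```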

Remove the unnecessary data

Check for nulls again

There are nulls in the station onboard column; they can be dropped or substituted. I am dropping them.
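Both options can be sketched in pandas as follows. The column name `Station` and the toy values are assumptions for illustration, and filling with the most frequent value is just one possible substitution.

```python
import pandas as pd

# Toy data for illustration only; "Station" is an assumed column name.
df = pd.DataFrame({
    "Survived": [1, 0, 1, 0],
    "Station": ["A", None, "C", "A"],
})

# Option 1: drop the rows where the station is missing.
dropped = df.dropna(subset=["Station"])

# Option 2: substitute the missing value with the most frequent station.
most_common = df["Station"].mode()[0]
filled = df.fillna({"Station": most_common})

print(len(df), len(dropped), filled["Station"].tolist())
```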

Check the data types of the columns read from the csv

Convert sex into categorical

Convert it into an indicator (dummy) variable
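The two conversion steps might look like the sketch below. The column name `Sex` and its values are assumed; `drop_first=True` is my choice here so that a single 0/1 column remains, and the original notebook may have done it differently.

```python
import pandas as pd

# Toy data for illustration only; "Sex" is an assumed column name.
df = pd.DataFrame({"Sex": ["male", "female", "female", "male"]})

# Step 1: store the column as a categorical type.
df["Sex"] = df["Sex"].astype("category")

# Step 2: build indicator columns. Categories are sorted (female, male),
# so drop_first=True drops "female" and keeps a single "male" column.
sex_dummies = pd.get_dummies(df["Sex"], drop_first=True).astype(int)

# Replace the text column with the 0/1 indicator.
df = pd.concat([df.drop(columns="Sex"), sex_dummies], axis=1)
print(df)
```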

The total number of males vs females in the coach

Details of the station boarded at

Let's plot a heat map

PassengerID doesn't play any role, so it is better to remove it

Now plot the heat map again
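The numbers behind such a heat map are just the pairwise correlations, which can be sketched as below. The column names and the toy values are assumptions, and the toy data is deliberately contrived so that `male` and `Survived` are perfectly anti-correlated.

```python
import pandas as pd

# Toy data for illustration only; column names are assumptions.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived":    [1, 0, 1, 0, 1, 0],
    "male":        [0, 1, 0, 1, 0, 1],
    "Age":         [29, 40, 35, 52, 23, 61],
})

# Drop the identifier column, then compute the correlation matrix.
corr = df.drop(columns="PassengerId").corr()
print(corr.round(2))
```

Passing `corr` to a plotting call such as `seaborn.heatmap(corr, annot=True)` would render it as a heat map.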

There is a strong correlation between Sex and Survived. Now let's build a classifier, using the independent variables as x and the dependent variable as y.
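Separating the variables could be sketched like this, assuming `Survived` is the name of the dependent column and everything else is used as a predictor. The names and values are illustrative only.

```python
import pandas as pd

# Toy data for illustration only; column names are assumptions.
df = pd.DataFrame({
    "Survived": [1, 0, 1, 0],
    "male":     [0, 1, 0, 1],
    "Age":      [29, 40, 35, 52],
})

# Independent variables: every column except the target.
X = df.drop(columns="Survived")

# Dependent variable: the survival label.
y = df["Survived"]

print(X.shape, y.shape)
```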

Now it's time to split the dataset into training and test sets

Checking the shape of test and train datasets
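A sketch of the split and the shape check using scikit-learn's `train_test_split`. The 80/20 ratio and `random_state=0` are assumptions, since the ratio actually used is not stated, and the arrays are toy placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy placeholders: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Hold out 20% of the rows for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Check the shapes of the resulting pieces.
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```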

Applying Logistic regression

Fitting on x train & y train

Storing the predictions in the y_pred variable.
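The fit and predict steps might look like the following sketch with scikit-learn's `LogisticRegression` at its default settings. The single `male` feature and the toy labels are assumptions chosen so the result is easy to read; only the name `y_pred` comes from the text above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: one feature (1 = male, 0 = female), label 1 = survived.
X_train = np.array([[0], [0], [0], [1], [1], [1], [0], [1]])
y_train = np.array([1, 1, 1, 0, 0, 0, 1, 0])
X_test = np.array([[0], [1]])

# Fit the classifier on the training data.
model = LogisticRegression()
model.fit(X_train, y_train)

# Store the predicted labels for the test rows.
y_pred = model.predict(X_test)
print(y_pred)
```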

Computing the confusion matrix to evaluate the accuracy of the classifier used

Let's check the accuracy further

The computed matrix gives an accuracy of 81%. That is, for every 100 passengers, the status of around 81 is correctly predicted as survived or not survived.
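To make the arithmetic concrete, here is a small sketch of how accuracy follows from a confusion matrix. The four counts are hypothetical, chosen only so that they sum to 100 and give 81%; they are not the actual matrix from this data. The layout follows the one scikit-learn's `confusion_matrix` uses for labels 0 and 1.

```python
# Hypothetical counts, NOT the real results; rows = actual, columns = predicted.
#               predicted 0  predicted 1
matrix = [[50, 8],    # actual 0: true negatives, false positives
          [11, 31]]   # actual 1: false negatives, true positives

tn, fp = matrix[0]
fn, tp = matrix[1]

# Accuracy is the share of correct predictions (the diagonal) over all cases.
total = tn + fp + fn + tp
accuracy = (tn + tp) / total
print(f"accuracy = {accuracy:.2f} on {total} passengers")
```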

Let's report the other metrics derived from the confusion matrix
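Precision and recall come from the same four cells. The sketch below uses hypothetical counts for illustration, not the actual results of this analysis.

```python
# Hypothetical confusion-matrix cells, NOT the real results.
tn, fp = 50, 8    # actual "not survived": correctly / wrongly predicted
fn, tp = 11, 31   # actual "survived": wrongly / correctly predicted

# Precision: of those predicted as survived, how many really survived.
precision = tp / (tp + fp)

# Recall: of those who really survived, how many were predicted as survived.
recall = tp / (tp + fn)

# F1 score: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```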

The algorithm has decent precision and recall scores. The coefficients and the fitted line can now be generated.

Now the intercept is a sensible value and we can check the probabilities.

The first column is the probability that an entry has the "not survived" label, and the second column is the probability that it has the "survived" label.
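The column order can be checked with a sketch like this one, assuming scikit-learn's `LogisticRegression` and labels coded as 0 (not survived) and 1 (survived). The toy data is made up. The columns of `predict_proba` follow `model.classes_`, and the second column equals the sigmoid of intercept plus coefficient times the feature.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: one feature (1 = male, 0 = female), label 1 = survived.
X = np.array([[0], [0], [0], [1], [1], [1]])
y = np.array([1, 1, 1, 0, 0, 0])
model = LogisticRegression().fit(X, y)

X_new = np.array([[0], [1]])
proba = model.predict_proba(X_new)

# Column order matches model.classes_: column 0 -> label 0, column 1 -> label 1.
print(model.classes_)
print(proba.round(3))

# The same "survived" probability computed by hand from intercept and coefficient.
z = model.intercept_[0] + model.coef_[0][0] * X_new[:, 0]
manual_p1 = 1.0 / (1.0 + np.exp(-z))
```

Each row sums to 1, so the first column is simply one minus the second.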

The plot above shows that more females survived, and the one below shows that more males did not survive.

On a cautionary note, below I tried a Class vs Survival plot, but it suffers from a lurking variable and should not be taken into account.

Even Cost vs Survival,