Covid-19 Mortality Risk Prediction based on Patient Blood Reports

Based on blood test reports, participants need to predict the mortality risk of a Covid-19 patient.

This hackathon, sponsored by Flip Robo Technologies, aims to give our learners hands-on experience, let them compete with their peers, and help them achieve higher credibility in the analytics space.

FAQs

1. Which platform do I need to use?

Ans. You need to use Python.

2. Is there any prize or reward?

Ans. There’s no prize or reward.

Problem Statement:

Predict the risk of mortality of a patient (due to coronavirus) based on the patient's blood report, given the dataset of patient hospitalization records. Ensure that appropriate features and data rows are chosen. Choose appropriate hyper-parameters for the model and justify them based on the chosen performance measure. Once the model is trained, a separate test set is also available, and the predictions should score well on the chosen performance measure on that test set. Justify your model based on the performance measure, and list the most important features identified by training the model. You can choose from a variety of ML algorithms.
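
One way to choose hyper-parameters and justify them by the performance measure is a cross-validated grid search scored on accuracy. The sketch below is only an illustration under stated assumptions: the data is synthetic, and the random forest and the grid values are arbitrary choices, since the problem statement leaves the algorithm open.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the hackathon data (assumption, not the real dataset).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# Hypothetical grid; the values are illustrative, not recommended settings.
param_grid = {"max_depth": [2, 4], "n_estimators": [25, 50]}

# 5-fold cross-validation, with accuracy as the selection criterion.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=5,
)
search.fit(X, y)

best_params = search.best_params_
best_cv_accuracy = search.best_score_
print("Best hyper-parameters:", best_params)
print("Mean cross-validated accuracy:", round(best_cv_accuracy, 3))
```

The table in `search.cv_results_` holds the mean accuracy for every combination tried, which can serve as the justification for the selected values.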

Data Available:

We have some real-time data available for about 375 + 100 patients. These patients took multiple blood tests, and the results were recorded over time. Patients who survived have an outcome of 0, and patients who succumbed have an outcome of 1. A test data set is also supplied.

Task:

1) Write a model to predict the mortality likelihood of the patient based on the data given. This needs to be done using the following methods:

a. Do not impute any missing data. Substitute -1 for every missing value.

i. Take the final data report of each patient as the input data for that patient, and fit the model. This implies that the size of the training data is only 375 rows.

ii. Augment the training data by adding relevant rows. The expectation is not to have as many rows as the datasheet given, but to use some criterion to group rows together.

b. Fill the missing data using typical methods: mean, most correlated value, etc.

i. Take the final data report of each patient as the input data for that patient, and fit the model. This implies that the size of the training data is only 375 rows.

ii. Augment the training data by adding relevant rows. The expectation is not to have as many rows as the datasheet given, but to use some criterion to group rows together.

iii. Can you identify the most important features and use those features in model creation? How do that model's performance metrics compare to those of the model consuming all 75 features?

2) Choose Accuracy as the performance measure

3) Identify the correlated features using multiple measures, and plot their dependencies on each other and on the target variable

a. Understand and analyze the data

i. Identify from the data any dependencies among the features and their impact on the target variable

ii. Show multiple visualizations of the feature dependencies

4) Can we identify the most important features from the trained model?

5) Create ML model(s) for the outcome.

a. Can you try using an ensemble of models?

b. What are the correct hyper-parameters (for that algorithm)?
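
The two data-preparation strategies in task 1 (final report per patient, with missing values either set to -1 or filled with the mean) can be sketched with pandas as follows. The column names `patient_id`, `test_time`, `marker_a`, `marker_b`, and `outcome` are hypothetical and the records are toy values; the real datasheet's layout may differ.

```python
import numpy as np
import pandas as pd

# Toy long-format records; all names and values are illustrative assumptions.
records = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 2, 3],
    "test_time": pd.to_datetime([
        "2020-02-01", "2020-02-03",
        "2020-02-02", "2020-02-04", "2020-02-06",
        "2020-02-05",
    ]),
    "marker_a": [1.0, np.nan, 2.0, 2.5, 3.0, np.nan],
    "marker_b": [np.nan, 10.0, 12.0, np.nan, np.nan, 8.0],
    "outcome": [0, 0, 1, 1, 1, 0],
})
feature_cols = ["marker_a", "marker_b"]


def final_report_per_patient(df):
    """Keep only the latest row of each patient (tasks 1a.i and 1b.i)."""
    ordered = df.sort_values(["patient_id", "test_time"])
    return ordered.groupby("patient_id").tail(1).reset_index(drop=True)


def fill_with_sentinel(df, cols, sentinel=-1):
    """Task 1a: substitute -1 for every missing value."""
    out = df.copy()
    out[cols] = out[cols].fillna(sentinel)
    return out


def fill_with_mean(df, cols):
    """Task 1b: fill missing values with the mean of each column."""
    out = df.copy()
    out[cols] = out[cols].fillna(out[cols].mean())
    return out


final_rows = final_report_per_patient(records)
train_1a = fill_with_sentinel(final_rows, feature_cols)
train_1b = fill_with_mean(final_rows, feature_cols)
print(train_1a)
print(train_1b)
```

When the same idea is applied to the test set, the column means would normally be computed on the training rows only and then reused, so that no information from the test set leaks into the imputation.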

The expectation is that 5 models will be created (1a.i, 1a.ii, 1b.i, 1b.ii, and 1b.iii), each with accuracy as the performance measure. Please also output the appropriate loss.
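
Reporting accuracy together with a loss, and reading feature importances from the trained model, might look like the sketch below. It assumes a random forest (one possible ensemble, not one prescribed by the task), log loss as the "appropriate loss", and synthetic data in place of the real 375-row training set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 375 patients, 10 features (assumption, not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(375, 10))
# The outcome depends mostly on feature 0, so it should rank as most important.
y = (X[:, 0] + 0.1 * rng.normal(size=375) > 0).astype(int)

# Hold out a validation split to estimate performance before using the test set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Accuracy uses hard predictions; log loss uses predicted probabilities.
accuracy = accuracy_score(y_val, model.predict(X_val))
loss = log_loss(y_val, model.predict_proba(X_val), labels=[0, 1])

# Feature indices sorted from most to least important.
ranking = np.argsort(model.feature_importances_)[::-1]

print("Validation accuracy:", round(accuracy, 3))
print("Validation log loss:", round(loss, 3))
print("Features by importance:", ranking.tolist())
```

Running the same report for each of the five models gives directly comparable numbers, including the comparison between the reduced-feature model and the model that uses all features.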

1	Test File	Download
2	Train File	Download
3	Sample Submission	Download

Accuracy of the model.

Code File * (only .ipynb file accepted)

Solution File * (only .csv file accepted)

Solution Description *

Rank	User	Points	Score
1	Mark	10	1000
2	Emma	8	900
3	Sophia	6	8000
