
This hackathon, sponsored by Flip Robo Technologies, aims to give our learners hands-on experience, a chance to compete with their peers, and greater credibility in the analytics space.

 

FAQs

1.       Which platform do I need to use?

Ans. You need to use Python.

2.       Is there any prize or reward?

Ans. There’s no prize or reward.

Problem Statement:

Predict a patient's risk of mortality from coronavirus based on their blood reports, using the supplied dataset of patient hospitalization records. Ensure that appropriate features and data rows are chosen. Choose appropriate hyper-parameters for the model and justify them using the chosen performance measure. A separate test set is also available; once the model is trained, its predictions should score well on the test set under that measure. Justify your model based on the chosen performance measure, and list the most important features identified by training the model. You may use a variety of ML algorithms.
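The workflow described above (choose hyper-parameters by the performance measure, check the held-out score, list the most important features) can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, the names `feature_0`, `feature_1`, ... are placeholders, and a random forest with this small parameter grid is one possible choice, not a prescribed model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the hackathon data (assumption: the real
# columns and values are not reproduced here).
X, y = make_classification(n_samples=375, n_features=20,
                           n_informative=5, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Small illustrative grid; each candidate is scored by
# 5-fold cross-validated accuracy on the training split only.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, scoring="accuracy", cv=5)
search.fit(X_train, y_train)

# Accuracy on data the search never saw.
test_accuracy = accuracy_score(y_test, search.predict(X_test))

# Rank features by the impurity-based importance of the best model.
importances = search.best_estimator_.feature_importances_
top_features = [feature_names[i] for i in np.argsort(importances)[::-1][:5]]

print("best hyper-parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
print("test accuracy:", round(test_accuracy, 3))
print("top features:", top_features)
```

The cross-validated accuracy of the winning grid point is what justifies the hyper-parameters; the test accuracy is reported separately so that the test set plays no part in the selection.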

Data Available:

We have real-world data available for 375 + 100 patients. These patients took multiple blood tests, and the results were recorded over time. Patients who survived have an outcome of 0, and patients who succumbed have an outcome of 1. A test data set is also supplied.

Task:

1)      Write a model to predict the patient's likelihood of mortality from the given data. Do this using each of the following methods:

a.       Do not impute missing data. Replace every missing value with -1.

        i.      Take each patient's final data report as that patient's input and fit the model. This means the training data has only 375 rows.

        ii.     Augment the training data by adding relevant rows. The expectation is not to use every row in the given datasheet, but to group rows together by some criterion.

b.       Fill in missing data using typical methods: mean, most correlated value, etc.

        i.      Take each patient's final data report as that patient's input and fit the model. This means the training data has only 375 rows.

        ii.     Augment the training data by adding relevant rows. The expectation is not to use every row in the given datasheet, but to group rows together by some criterion.

        iii.    Can you identify the most important features and use only those features in model creation? How do that model's performance metrics compare with those of the model using all 75 features?
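The data preparation in task 1 can be sketched with pandas as below. The column names (`PATIENT_ID`, `RE_DATE`, `ldh`, `crp`, `outcome`) and the toy values are hypothetical, and grouping a patient's tests by calendar day is just one possible grouping criterion for the augmentation step, not the required one.

```python
import numpy as np
import pandas as pd

def last_record_per_patient(df, id_col="PATIENT_ID", time_col="RE_DATE"):
    """Keep only the most recent row for each patient (tasks 1a.i, 1b.i)."""
    ordered = df.sort_values([id_col, time_col])
    return ordered.groupby(id_col).tail(1).reset_index(drop=True)

def fill_sentinel(df, feature_cols, value=-1):
    """Replace every missing feature value with a sentinel (method a)."""
    out = df.copy()
    out[feature_cols] = out[feature_cols].fillna(value)
    return out

def fill_mean(df, feature_cols):
    """Replace missing feature values with the column mean (method b)."""
    out = df.copy()
    out[feature_cols] = out[feature_cols].fillna(out[feature_cols].mean())
    return out

def group_by_day(df, feature_cols, id_col="PATIENT_ID",
                 time_col="RE_DATE", target_col="outcome"):
    """One row per patient per calendar day, averaging that day's tests.

    This is an assumed grouping criterion for the augmentation step.
    """
    day = df[time_col].dt.floor("D")
    grouped = df.groupby([df[id_col], day])
    agg = grouped[feature_cols].mean()          # NaNs are skipped in the mean
    agg[target_col] = grouped[target_col].first()
    return agg.reset_index()

# Toy records: hypothetical columns and values, for illustration only.
raw = pd.DataFrame({
    "PATIENT_ID": [1, 1, 1, 2, 2],
    "RE_DATE": pd.to_datetime([
        "2020-01-01 08:00", "2020-01-01 20:00", "2020-01-03 09:00",
        "2020-01-02 10:00", "2020-01-04 10:00"]),
    "ldh": [300.0, np.nan, 250.0, 900.0, np.nan],
    "crp": [np.nan, 40.0, 10.0, np.nan, 180.0],
    "outcome": [0, 0, 0, 1, 1],
})
features = ["ldh", "crp"]

final_rows = last_record_per_patient(raw)
train_1a_i = fill_sentinel(final_rows, features)    # one row per patient, -1 fill
train_1b_i = fill_mean(final_rows, features)        # one row per patient, mean fill

daily = group_by_day(raw, features)
train_1a_ii = fill_sentinel(daily, features)        # grouped rows, -1 fill
train_1b_ii = fill_mean(daily, features)            # grouped rows, mean fill
print(train_1a_i)
print(train_1a_ii)
```

Keeping the sentinel fill and the mean fill as separate functions makes it easy to build the same row selection under both strategies and compare the resulting models.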

 

2)      Choose Accuracy as the performance measure

3)      Identify the correlated features using multiple measures, and plot their dependencies on each other and on the target variable.

a.       Understand and analyze the data.

        i.      Identify any dependencies among the features and their impact on the target variable.

                                                             ii.      Show multiple visualizations of the feature dependencies

4)      Can we identify the most important features from the trained model?

5)      Create ML model(s) for the outcome.

a.       Can you try using an ensemble of models?

b.       What are the correct hyper-parameters (for that algorithm)?
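For task 3, one way to compare features under more than one correlation measure is sketched below with pandas. The data is synthetic and the names `marker_a`, `marker_b`, `marker_c` are placeholders; Pearson and Spearman are used as two example measures, and the 0.8 threshold is an arbitrary illustrative cut-off.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
outcome = rng.integers(0, 2, size=n)

# Synthetic features (placeholder names, not the real blood markers).
df = pd.DataFrame({
    "marker_a": outcome * 2.0 + rng.normal(0, 0.5, n),  # tied to the outcome
    "marker_b": rng.normal(0, 1, n),                    # pure noise
    "outcome": outcome,
})
df["marker_c"] = df["marker_a"] ** 3  # monotone but nonlinear in marker_a

# Two measures: Pearson captures linear association,
# Spearman captures any monotone association (rank-based).
pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")

# Correlation of each feature with the target, side by side.
target_corr = pd.DataFrame({
    "pearson": pearson["outcome"],
    "spearman": spearman["outcome"],
}).drop(index="outcome")

def correlated_pairs(corr, threshold=0.8):
    """List feature pairs whose absolute correlation meets the threshold."""
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if abs(corr.loc[a, b]) >= threshold:
                pairs.append((a, b, float(corr.loc[a, b])))
    return pairs

feature_spearman = spearman.drop(index="outcome", columns="outcome")
pairs = correlated_pairs(feature_spearman)
print(target_corr)
print(pairs)
```

Either correlation matrix can then be passed to a plotting routine (for example a heatmap drawn with matplotlib's `imshow`) to produce the visualizations the task asks for.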

The expectation is that five models will be created (1a.i, 1a.ii, 1b.i, 1b.ii, and 1b.iii), each with accuracy as the performance measure. Please also report the appropriate loss.
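As a minimal sketch of reporting both numbers for a binary outcome, the snippet below computes accuracy and binary log loss (cross-entropy) in plain Python. Treating log loss as the "appropriate loss" is an assumption, as are the 0.5 decision threshold and the toy labels and probabilities.

```python
import math

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of patients whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(int(p == t) for p, t in zip(preds, y_true))
    return correct / len(y_true)

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Toy example: 0 = survived, 1 = succumbed; values are illustrative only.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.6, 0.8, 0.9]  # predicted probability of outcome 1

acc = accuracy(y_true, y_prob)
loss = log_loss(y_true, y_prob)
print("accuracy:", acc)
print("log loss:", round(loss, 4))
```

Accuracy only looks at which side of the threshold each prediction falls on, while log loss also penalizes confident wrong probabilities, so reporting both gives a fuller picture of each of the five models.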

1 Test File Download
2 Train File Download
3 Sample Submission Download

Accuracy of the model.

