Phase 1 Data Collection
• Describe the opportunities and challenges of using clinical data.
• Apply the framework for conceptualizing data usage in healthcare.
• Select the correct answer, mark it in 14-point font, and explain your choice in 2 or 3 sentences.
Q 1
Part 1.
How are images commonly represented when given to a deep learning model?
As a 1-dimensional vector, where each number is a hand-picked feature
As layers of number grids, where each number is pixel intensity
As a sequence of 1-dimensional vectors, where each number is pixel intensity
As a 1-dimensional vector, where each number is pixel intensity
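For reference, the arrangements named in the options above can be placed side by side. The following is a minimal sketch in plain Python, assuming a hypothetical 2x2 image with three color channels; the pixel values and the helper names (`shape`, `flatten`, `as_row_sequence`) are made up for illustration and do not come from the course material.

```python
# Hypothetical 2x2 image with 3 color channels (values are made up).
# "Layers of number grids": one 2-D grid of pixel intensities per channel.
image_layers = [
    [[255, 0], [0, 255]],    # red channel
    [[0, 255], [0, 255]],    # green channel
    [[0, 0], [255, 255]],    # blue channel
]


def shape(layers):
    """Return (channels, height, width) of a nested-list image."""
    return (len(layers), len(layers[0]), len(layers[0][0]))


def flatten(layers):
    """Unroll every channel grid into a single 1-dimensional vector."""
    return [pixel for grid in layers for row in grid for pixel in row]


def as_row_sequence(layers, channel=0):
    """View one channel as a sequence of 1-dimensional row vectors."""
    return [list(row) for row in layers[channel]]


print(shape(image_layers))         # (3, 2, 2)
print(len(flatten(image_layers)))  # 12
```

Flattening keeps every intensity but discards which pixels were neighbors, whereas the grid form keeps that spatial layout.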
Part 2.
What deep neural network architecture is most commonly used for image classification?
Recurrent Neural Network (RNN)
Multi-layer Perceptron (MLP)
Generative Adversarial Network (GAN)
Convolutional Neural Network (CNN)
Part 3.
What kind of question does the COVID detector answer?
Multi-label classification
Binary classification
Sequence-to-sequence translation
Linear regression
Part 4.
You are interested in further leveraging hospital resources to boost the performance of your COVID detector. Which of the following actions would improve the likelihood of a high-performing model? Check all that apply.
Giving the machine learning team segmentation labels for a small subset of the COVID chest x-ray dataset.
Giving the machine learning team access to an existing COVID detector.
Giving the machine learning team a large dataset of chest x-rays, even if they do not originate from COVID-positive patients.
Giving the machine learning team the text reports associated with each of the COVID chest x-ray examinations.
Q 2
Part 1.
How can we represent a patient’s electronic health record, a form of structured data, to a machine learning model?
As layers of number grids, where each number is pixel intensity
As a 1-dimensional vector, where each number is a hand-picked feature
As a 1-dimensional vector, where each number is pixel intensity
As a sequence of 1-dimensional vectors, where each number is pixel intensity
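To make the idea of a vector of hand-picked features concrete, here is a minimal sketch in plain Python. The record fields, their values, and the names `FEATURE_ORDER` and `to_feature_vector` are hypothetical; a real health record would have a different schema and would need unit handling and missing-value treatment that this sketch omits.

```python
from datetime import date

# Hypothetical structured record; field names and values are illustrative only.
record = {
    "birth_date": date(1960, 5, 17),
    "arrival_date": date(2020, 4, 2),
    "ferritin": 410.0,
    "d_dimer": 1.2,
    "wbc_count": 7.8,
}

# Hand-picked features, in a fixed order so every patient maps to the same layout.
FEATURE_ORDER = ["age_years", "ferritin", "d_dimer", "wbc_count"]


def to_feature_vector(rec):
    """Turn one structured record into a 1-dimensional list of floats."""
    # Derived feature: approximate age at arrival, computed from two dates.
    age_years = (rec["arrival_date"] - rec["birth_date"]).days / 365.25
    derived = {**rec, "age_years": age_years}
    return [float(derived[name]) for name in FEATURE_ORDER]


vector = to_feature_vector(record)
print(len(vector))  # 4
```

Fixing the feature order up front means position 0 always holds age, position 1 always holds ferritin, and so on, which is what lets a model compare patients.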
Part 2.
Given the type of data available, which of the following are reasonable alternative framings of the task at hand, from a machine learning perspective? Check all that apply.
A regression model that predicts the patient’s date of death.
A model that predicts the number of days before a patient requires invasive mechanical ventilation. This model would be trained only on patients who required invasive mechanical ventilation.
A binary classification model that predicts whether or not the patient will require hospitalization.
A model that predicts the range of days within which a patient will require invasive mechanical ventilation. The four categories are: [“0-4 days”, “5-9 days”, “10-14 days”, “14+ days or will not need one”]
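The day-range framing in the last option amounts to turning a number of days into a class label. A minimal sketch of that mapping follows, in plain Python. The function name `ventilation_category` and the use of `None` for patients who are never ventilated are assumptions made for illustration. Because the listed ranges overlap at day 14, the sketch also assumes that day 14 belongs to "10-14 days".

```python
from typing import Optional

# Category labels as listed in the option above.
CATEGORIES = ["0-4 days", "5-9 days", "10-14 days", "14+ days or will not need one"]


def ventilation_category(days_until_ventilation: Optional[int]) -> str:
    """Map days until ventilation (None = never ventilated) to a class label."""
    if days_until_ventilation is None:
        return CATEGORIES[3]
    if days_until_ventilation < 0:
        raise ValueError("days_until_ventilation must be non-negative")
    if days_until_ventilation <= 4:
        return CATEGORIES[0]
    if days_until_ventilation <= 9:
        return CATEGORIES[1]
    # Assumption: day 14 itself is assigned to "10-14 days",
    # since the listed ranges overlap at 14.
    if days_until_ventilation <= 14:
        return CATEGORIES[2]
    return CATEGORIES[3]


print(ventilation_category(3))     # 0-4 days
print(ventilation_category(None))  # 14+ days or will not need one
```

Unlike a model trained only on ventilated patients, this labeling gives every patient a target, including those who never need a ventilator.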
Part 3.
Given that we are training a model to predict whether or not the patient requires invasive mechanical ventilation, which of these values should NOT be passed into the model as a feature? Check all that apply.
Ferritin
Invasive mechanical ventilation date
Patient birth date
D-dimer
White Blood Cell count
Ventilator setting
Patient inpatient arrival date
Part 4.
Imagine the path that the patient data took through the healthcare system. What are some possible errors that might have been introduced into the data before it was published? Check all that apply.
The patient was a recent transfer from another system
The patient arrives at the ED and is immediately intubated, so no labs are available
Labs are logged AFTER the invasive mechanical ventilation
The patient had been to the hospital multiple times
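Timing scenarios like the ones listed above can be checked mechanically when each record carries a timestamp. The following is a minimal sketch in plain Python that flags lab events logged at or after the start of invasive mechanical ventilation. The event layout (`name`, `logged_at`), the example timestamps, and the function name are hypothetical and are not taken from any published dataset.

```python
from datetime import datetime


def labs_logged_after_ventilation(lab_events, ventilation_time):
    """Return lab events timestamped at or after ventilation start.

    Returns an empty list when the patient was never ventilated
    (ventilation_time is None).
    """
    if ventilation_time is None:
        return []
    return [e for e in lab_events if e["logged_at"] >= ventilation_time]


# Made-up example: one lab before ventilation, one after.
labs = [
    {"name": "ferritin", "logged_at": datetime(2020, 4, 2, 9, 0)},
    {"name": "d_dimer", "logged_at": datetime(2020, 4, 3, 15, 30)},
]
vent_start = datetime(2020, 4, 3, 8, 0)

flagged = labs_logged_after_ventilation(labs, vent_start)
print([e["name"] for e in flagged])  # ['d_dimer']
```

Flagged values describe the patient after the outcome has already occurred, so a review step would decide whether to exclude them before training.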