Paper Summary
The objective is to learn models from limited labeled data in a medical image analysis setting that are robust to distribution shifts. They use SimCLR, not SimCLRv2.
They study the effectiveness of self-supervised learning as a pre-training strategy for medical image classification, comparing supervised pre-training against self-supervised pre-training approaches.
Self-supervised learning on ImageNet, followed by additional self-supervised learning on unlabeled domain-specific medical images significantly improves the accuracy of medical image classifiers.
Big self-supervised models are robust to distribution shift and can learn efficiently with a small number of labeled medical images.
MICLe - uses multiple images of the underlying pathology per patient case to construct more informative positive pairs for self-supervised learning.
We observe that self-supervised pretraining outperforms supervised pretraining, even when the full ImageNet dataset (14M images and 21.8K classes) is used for supervised pretraining. We attribute this finding to the domain shift and discrepancy between the nature of recognition tasks in ImageNet and medical image classification. Self-supervised approaches bridge this domain gap by leveraging in-domain medical data for pretraining.
Multi-Instance Contrastive Learning (MICLe)
A strategy that adapts contrastive learning to multiple images of the underlying pathology per patient case. Such multi-instance data is often available in medical imaging datasets, e.g., frontal and lateral views of mammograms, retinal fundus images from each eye, etc.
Given multiple images of a given patient case, they construct a positive pair for self-supervised contrastive learning by drawing two crops from two distinct images of the same patient case. These images are typically from different viewing angles and show different body parts with the same underlying pathology. This enables self-supervised learning algorithms to learn representations that are robust to changes of viewpoint, imaging conditions, and other factors in a direct way.
MICLe does not require class label information.
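As a rough sketch of this pairing scheme (not the authors' code; `bag` and `crop_fn` are hypothetical names standing in for a patient's image set and a random-crop/augmentation function):

```python
import random

def micle_positive_pair(bag, crop_fn):
    """Draw a MICLe positive pair: two crops taken from two randomly
    selected, distinct images of the same patient bag. When the bag
    holds only one image, this reduces to the standard SimCLR case of
    two crops of the same image. No class labels are needed."""
    if len(bag) >= 2:
        img_a, img_b = random.sample(bag, 2)  # two distinct views
    else:
        img_a = img_b = bag[0]
    return crop_fn(img_a), crop_fn(img_b)

# Toy usage with strings standing in for images and an identity "crop":
pair = micle_positive_pair(["frontal_view", "lateral_view"], lambda x: x)
```

Because the two crops come from different views of the same pathology, the resulting representation is pushed to be invariant to viewpoint and imaging conditions, not just to pixel-level augmentations.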
1. Self-supervised pretraining on unlabeled ImageNet using SimCLR.
2. Additional self-supervised pretraining using unlabeled medical images. If multiple images of each medical condition are available, the novel Multi-Instance Contrastive Learning (MICLe) strategy is used to construct more informative positive pairs based on different images.
3. Supervised fine-tuning on labeled medical images. Note that unlike step (1), steps (2) and (3) are task and dataset specific.
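The three steps can be sketched as control flow (a toy sketch only: the stage functions here are stubs that record which stage ran, standing in for real SimCLR/MICLe training loops and a supervised fine-tuning loop):

```python
# Hypothetical stage functions, stubbed to show the order of the recipe.
def simclr_pretrain(model, data):
    return model + ["simclr:" + data]

def micle_pretrain(model, data):
    return model + ["micle:" + data]

def supervised_finetune(model, data):
    return model + ["finetune:" + data]

def pretrain_pipeline(multi_instance=True):
    """Three-step recipe: step (1) is task-agnostic, while steps (2)
    and (3) are task and dataset specific."""
    model = []
    model = simclr_pretrain(model, "unlabeled_imagenet")      # step 1
    stage2 = micle_pretrain if multi_instance else simclr_pretrain
    model = stage2(model, "unlabeled_medical")                # step 2
    model = supervised_finetune(model, "labeled_medical")     # step 3
    return model
```

The `multi_instance` flag mirrors the paper's condition: MICLe is used in step (2) only when multiple images per medical condition are available.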
First, they perform self-supervised pretraining on unlabeled images using contrastive learning to learn visual representations.
For contrastive learning they use a combination of unlabeled ImageNet dataset and task specific medical images. Then, if multiple images of each medical condition are available the Multi-Instance Contrastive Learning (MICLe) is used for additional self-supervised pretraining.
Each bag X = {x₁, x₂, ..., x_M} contains images from the same patient (i.e., the same pathology) captured from different views, where M = |X| ≥ 2 can vary across bags. Positive pairs are constructed by drawing two crops from two randomly selected images in the bag. Following SimCLR, two fully connected layers are used to map the output of ResNets to a 128-dimensional embedding, which is used for contrastive learning.
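A minimal NumPy sketch of such a two-layer projection head, assuming 2048-d ResNet-50 features (the hidden width and random placeholder weights are assumptions; only the 128-d output is fixed by the summary):

```python
import numpy as np

def projection_head(h, w1, b1, w2, b2):
    """Two fully connected layers (FC -> ReLU -> FC) mapping ResNet
    features to a 128-d embedding, L2-normalized as is typical before
    a contrastive (NT-Xent) loss. A sketch, not the authors' code."""
    z = np.maximum(h @ w1 + b1, 0.0)   # first FC layer + ReLU
    z = z @ w2 + b2                    # second FC layer, down to 128 dims
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
h = rng.standard_normal((4, 2048))               # batch of ResNet features
w1, b1 = rng.standard_normal((2048, 2048)) * 0.01, np.zeros(2048)
w2, b2 = rng.standard_normal((2048, 128)) * 0.01, np.zeros(128)
z = projection_head(h, w1, b1, w2, b2)           # shape (4, 128), unit norm
```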
SimCLR pre-training is performed on unlabeled dermatology/chest X-ray samples both with and without initialization from pre-trained ImageNet self-supervised weights.
Dermatology - same augmentation as SimCLR.
Unlike the original set of augmentations proposed in SimCLR, they do not use Gaussian blur: they reason that blurring makes it impossible to distinguish local texture variations and other areas of interest, thereby changing the underlying disease interpretation of the X-ray image.
The augmentations that led to the best performance on the validation set for this task are random cropping, random color jittering (strength = 0.5), rotation (up to 45 degrees), and horizontal flipping.
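A self-contained sketch of that pipeline (NumPy only, as an illustration: the color jitter is simplified here to brightness/contrast scaling, and the rotation uses nearest-neighbor resampling; in practice library transforms would be used, and note the deliberate absence of Gaussian blur):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(img, size):
    """Random spatial crop to (size, size)."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def color_jitter(img, strength=0.5):
    """Random brightness/contrast scaling in [1 - strength, 1 + strength]
    (a simplification of full color jitter)."""
    brightness = 1.0 + rng.uniform(-strength, strength)
    contrast = 1.0 + rng.uniform(-strength, strength)
    out = (img - img.mean()) * contrast + img.mean()
    return np.clip(out * brightness, 0.0, 1.0)

def random_rotate(img, max_deg=45):
    """Rotate by a random angle in [-max_deg, max_deg] degrees using
    nearest-neighbor lookup into the source image."""
    theta = np.deg2rad(rng.uniform(-max_deg, max_deg))
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.cos(theta) * (ys - cy) + np.sin(theta) * (xs - cx) + cy
    src_x = -np.sin(theta) * (ys - cy) + np.cos(theta) * (xs - cx) + cx
    src_y = np.clip(np.rint(src_y), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(src_x), 0, w - 1).astype(int)
    return img[src_y, src_x]

def augment(img, crop=160):
    """Compose crop, jitter, rotation, and random horizontal flip."""
    img = random_crop(img, crop)
    img = color_jitter(img, 0.5)
    img = random_rotate(img, 45)
    if rng.random() < 0.5:  # horizontal flip with probability 0.5
        img = img[:, ::-1]
    return img

view = augment(rng.random((224, 224)))  # one augmented "view"
```

Calling `augment` twice on the same image yields the two views of a standard SimCLR positive pair.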
The primary metrics for the dermatology task are top-1 accuracy and Area Under the Curve (AUC).
For the chest X-ray task, given the multi-label setup, they report mean AUC averaged between the predictions for the five target pathologies.
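That mean AUC could be computed as follows (a NumPy sketch using the rank-sum formulation of ROC AUC, without tie correction; the one-column-per-pathology layout is an assumption):

```python
import numpy as np

def binary_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation;
    ties among scores are not corrected in this sketch."""
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores)
    ranks = np.empty(len(order))
    ranks[order] = np.arange(1, len(order) + 1)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def mean_auc(label_matrix, score_matrix):
    """Mean AUC across pathologies (columns) for a multi-label setup
    such as the five-pathology chest X-ray task described above."""
    return float(np.mean([binary_auc(label_matrix[:, j], score_matrix[:, j])
                          for j in range(label_matrix.shape[1])]))

# Perfect predictions give AUC 1.0 for every pathology, hence mean 1.0:
y = np.array([[0, 1, 0, 1, 1],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 1],
              [0, 0, 1, 1, 0]])
m = mean_auc(y, y.astype(float))
```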
They note that when only using ImageNet for self-supervised pretraining, the model performs worse in this setting compared to using in-domain data for pretraining.
They show that self-supervised models are more robust and generalize better than baseline models when evaluated on shifted test sets, without any fine-tuning.