Estimating Example Difficulty using Variance of Gradients

Chirag Agarwal, Daniel D'souza, Sara Hooker

How do we identify examples that are challenging for a model to classify?

In machine learning, a question of great interest is which examples are challenging for a model to classify. Identifying challenging examples helps inform safe deployment of models, isolates examples that require further human inspection, and provides interpretability into model behavior. We start with a simple hypothesis – examples that a model has difficulty learning will exhibit higher variance in gradient updates over the course of training. Conversely, we expect the backpropagated gradients of samples that are relatively easy to learn to have lower variance.

In this work, we propose Variance of Gradients (VOG) as a valuable and efficient proxy metric for detecting outliers in the data distribution. We provide quantitative and qualitative support that VOG is a meaningful way to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing. Data points with high VOG scores are far more difficult for the model to learn and over-index on corrupted or memorized examples.

During the early stages of training, lower VoG images appear to capture a color (red) bias. During late-stage training, lower VoG images show uncluttered backgrounds. High VoG images present unusual vantage points throughout training.

VOG offers an efficient method to rank the global difficulty of examples and automatically surface a possible subset to aid human interpretability. VOG can be computed using checkpoints stored over the course of training and is model-agnostic. VOG can also be computed using the predicted label, which makes it an unsupervised auditing tool at test time.
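As a rough illustration of the idea, the sketch below computes a per-example variance-of-gradients score with NumPy. It assumes that a gradient has already been extracted for every example at each of K stored checkpoints (for instance, the gradient of a class score with respect to the input) and stacked into a single array. The function name `vog_scores` and the array layout are hypothetical and are not taken from the released code; the exact formulation in the paper may differ in its details.

```python
import numpy as np


def vog_scores(grads):
    """Per-example variance of gradients across checkpoints (illustrative sketch).

    grads: array of shape (K, N, ...) holding, for each of K checkpoints,
           one gradient per example for N examples. This layout is an
           assumption made for the sketch.
    Returns an array of shape (N,) with one unnormalized score per example.
    """
    grads = np.asarray(grads, dtype=np.float64)
    k, n = grads.shape[:2]
    # Flatten each gradient (e.g. all pixels and channels) to a vector.
    flat = grads.reshape(k, n, -1)
    # Variance of every gradient element across the K checkpoints.
    per_element_var = flat.var(axis=0)      # shape (N, D)
    # Average over elements to obtain a single scalar per example.
    return per_element_var.mean(axis=1)     # shape (N,)
```

An example whose gradient stays the same at every checkpoint receives a score of zero, while an example whose gradient keeps changing during training receives a larger score.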

The primary contributions of our work can be summarized as follows:

1. We propose Variance of Gradients (VOG) – a class-normalized gradient variance score for determining the relative ease of learning data samples within a given class.
2. We show that VOG is an effective auditing tool for ranking high-dimensional datasets by difficulty.
3. VOG identifies clusters of images with clearly distinct semantic properties.
4. VOG effectively surfaces out-of-distribution (OOD) and memorized examples.
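The class normalization mentioned in the first contribution can be sketched as follows. This is a minimal example that assumes normalization means standardizing each raw score by the mean and standard deviation of the scores within its class; the helper name `class_normalize` and the `eps` guard are hypothetical choices made for illustration.

```python
import numpy as np


def class_normalize(scores, labels, eps=1e-12):
    """Standardize scores within each class (illustrative sketch).

    scores: array of shape (N,) with one raw score per example.
    labels: array of shape (N,) with the class of each example
            (true labels, or predicted labels at test time).
    eps:    small constant that avoids division by zero when all
            scores in a class are identical (an assumption of this sketch).
    """
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels)
    normalized = np.empty_like(scores)
    for c in np.unique(labels):
        mask = labels == c
        mu = scores[mask].mean()
        sigma = scores[mask].std()
        normalized[mask] = (scores[mask] - mu) / (sigma + eps)
    return normalized
```

After this step, a score describes how an example compares with other examples of the same class, so classes with different raw score ranges can be ranked on a common scale.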

Auditing Datasets

VoG can be an effective tool to audit high-dimensional datasets. Below, we plot images from late-stage training for randomly selected classes from CIFAR-10 and CIFAR-100. We observe consistent results in differences between Low VoG and High VoG samples across both datasets.

Low VoG exemplars: Images that are assigned low VoG scores tend to show clear, distinct views of the class in question. We also observe different color bias patterns emerging during early training for certain classes.
High VoG exemplars: Images that are assigned high VoG scores usually show unusual vantage points and/or obstructed views of the class.

CIFAR-10
[Image grids: Low VoG and High VoG exemplars for the classes CAR, FROG, BIRD, PLANE, TRUCK, and HORSE.]

CIFAR-100
[Image grids: Low VoG and High VoG exemplars for the classes BED, BICYCLE, LEOPARD, SPIDER, SUNFLOWER, and TURTLE.]
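A per-class audit of this kind can be approximated from a vector of scores by selecting the lowest- and highest-scoring examples in each class. The sketch below is one way to do that with NumPy; the function name `extremes_per_class` and the parameter `k` are hypothetical, and the scores are assumed to have been computed beforehand.

```python
import numpy as np


def extremes_per_class(scores, labels, k=3):
    """Indices of the k lowest- and k highest-scoring examples per class.

    scores: array of shape (N,) with one score per example.
    labels: array of shape (N,) with the class of each example.
    Returns a dict mapping each class to a (lowest, highest) pair of
    index arrays; `highest` is ordered from the largest score downwards.
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    result = {}
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        # Dataset indices of this class, sorted by ascending score.
        order = idx[np.argsort(scores[idx], kind="stable")]
        result[c.item()] = (order[:k], order[-k:][::-1])
    return result
```

The returned indices can then be used to pull the corresponding images from the dataset for visual inspection.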

Learn More

Pre-computed VoG scores for MNIST/CIFAR-10/CIFAR-100 are available here.

We welcome additional discussion and code contributions on the topic of this work. A comprehensive introduction to the methodology, experimental framework, and results can be found in our paper and open-source code.

Citation

If you use this software, please consider citing:

@article{agarwal2020estimating,
title={Estimating Example Difficulty using Variance of Gradients},
author={Agarwal, Chirag and D'souza, Daniel and Hooker, Sara},
journal={arXiv preprint arXiv:2008.11600},
year={2020}
}
