How do I know how much the output changes with a given input feature? Looking at the gradient is a good start, but it breaks down wherever the gradient vanishes, such as at saddle points (which is also why we added momentum to SGD) or in saturated regions. Numerical methods have their problems too. This paper uses path integration.
Two Axioms
- Sensitivity (a): If an input and a baseline differ in exactly one feature and the outputs differ, that feature should be given a non-zero attribution.
- Implementation Invariance: If two models always produce the same output for the same input (that is, they are functionally equivalent), their attributions should always be identical.
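To see why the plain gradient can violate Sensitivity (a), here is a minimal sketch using the toy one-variable function f(x) = 1 - ReLU(1 - x), which I believe is the example the paper itself uses. The baseline 0 and input 2 are illustrative choices, and the helper names are my own.

```python
def relu(z):
    return max(z, 0.0)

def f(x):
    # Toy one-variable "network": f(x) = 1 - ReLU(1 - x)
    return 1.0 - relu(1.0 - x)

def grad_f(x):
    # Analytic derivative of f: 1 while x < 1, and 0 once the ReLU switches off
    return 1.0 if x < 1.0 else 0.0

baseline, x = 0.0, 2.0

# The output really does change between baseline and input...
output_change = f(x) - f(baseline)

# ...but the gradient at the input is flat, so a gradient-based
# attribution assigns nothing to the only feature that changed.
gradient_at_input = grad_f(x)

print(output_change, gradient_at_input)
```

The function flattens out for x >= 1, so the local gradient at the input says nothing about the change that happened on the way there from the baseline.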
While the axioms are pretty cool, I doubt that we need implementation invariance. Is it really that important if the input features are correlated with each other?
Integrated Gradients
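It’s just the straight-line case of the path methods. For a path $\gamma(\alpha)$, $\alpha \in [0, 1]$, from the baseline $x'$ to the input $x$, a path method attributes to feature $i$:

$$\text{PathIntegratedGrads}^{\gamma}_i(x) = \int_0^1 \frac{\partial F(\gamma(\alpha))}{\partial \gamma_i(\alpha)} \, \frac{\partial \gamma_i(\alpha)}{\partial \alpha} \, d\alpha$$

In this case, $\gamma(\alpha) = x' + \alpha (x - x')$, so $\partial \gamma_i(\alpha) / \partial \alpha = x_i - x'_i$ and

$$\text{IntegratedGrads}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F(x' + \alpha (x - x'))}{\partial x_i} \, d\alpha$$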
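It also has the completeness property, among others. The attributions add up to the difference between the output at the input $x$ and at the baseline $x'$:

$$\sum_i \text{IntegratedGrads}_i(x) = F(x) - F(x')$$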
I’m lazy, but it satisfies some other axioms too. Not so important though, as it’s just a fancy line integral to overcome the gradient’s problem at saddle points.
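To make the line integral concrete, here is a rough sketch that approximates integrated gradients with a midpoint Riemann sum along the straight line from a baseline to the input, then checks that the attributions sum to the change in output (completeness). The model `tanh(w . x)`, the weights `W`, the all-zero baseline, and the step count are all assumptions I made up for illustration, not anything from the paper.

```python
import math

# Hypothetical weights for a tiny made-up model F(x) = tanh(w . x)
W = [1.0, -2.0, 0.5]

def model(x):
    return math.tanh(sum(w * xi for w, xi in zip(W, x)))

def model_grad(x):
    # Analytic gradient: dF/dx_i = (1 - tanh(w . x)^2) * w_i
    s = 1.0 - model(x) ** 2
    return [s * w for w in W]

def integrated_gradients(x, baseline, grad_fn, steps=300):
    # Midpoint Riemann sum of the gradient along the straight line
    # baseline + alpha * (x - baseline), for alpha in [0, 1]
    diff = [xi - bi for xi, bi in zip(x, baseline)]
    grad_sum = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * d for bi, d in zip(baseline, diff)]
        for i, g in enumerate(grad_fn(point)):
            grad_sum[i] += g
    # Scale the averaged gradient by (x_i - baseline_i)
    return [d * g / steps for d, g in zip(diff, grad_sum)]

x = [3.0, 1.0, 2.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, model_grad)

# Completeness: the attributions should sum to F(x) - F(baseline),
# up to the numerical error of the Riemann sum
completeness_gap = sum(attributions) - (model(x) - model(baseline))

print(attributions)
print(completeness_gap)
```

At this input the tanh is already fairly saturated, so the plain gradient is small, while the averaged gradient along the path still accounts for the full change in output. With a real network you would swap `model_grad` for whatever your autodiff framework gives you.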