Gradient Sparse Autoencoders
11/15/2024
Introduces a novel approach to extracting neural network features by considering both activation values and their downstream effects.
Traditional sparse autoencoders select features by activation magnitude alone. Our approach also incorporates gradient information, so that it identifies features that are not only active but also influential in the network's downstream computations.
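To make the contrast concrete, here is a minimal sketch of one way gradient information could enter feature selection. It is an illustration under stated assumptions, not necessarily the method used in this work: the function name `select_top_k`, the scoring rule |activation × gradient| (a first-order estimate of how much a feature affects a downstream loss), and the toy numbers are all hypothetical.

```python
def select_top_k(activations, gradients, k):
    """Return the indices of the k features with the largest |activation * gradient|.

    Assumption: `gradients` holds the derivative of some downstream loss with
    respect to each feature activation. The product is a first-order estimate
    of the feature's effect on that loss.
    """
    if len(activations) != len(gradients):
        raise ValueError("activations and gradients must have the same length")
    scores = [abs(a * g) for a, g in zip(activations, gradients)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical feature activations and downstream gradients (illustrative only).
activations = [0.9, 0.1, 0.5, 0.0]
gradients = [0.01, 2.0, 0.8, 5.0]

# Magnitude-only selection: equivalent to giving every feature the same gradient.
by_magnitude = select_top_k(activations, [1.0] * len(activations), k=2)

# Gradient-aware selection: weights each activation by its downstream effect.
by_influence = select_top_k(activations, gradients, k=2)

print(by_magnitude)  # [0, 2]
print(by_influence)  # [1, 2]
```

In this toy case, feature 0 has the largest activation but almost no downstream effect, so gradient-aware selection drops it in favor of feature 1, which is weakly active but strongly influential. Feature 3 has a large gradient but zero activation, so it is selected by neither rule.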
