Abstract
The Forward-Forward (FF) algorithm offers a biologically plausible alternative to backpropagation,
enabling neural networks to learn through local updates. However, FF's efficacy relies heavily on the
definition of "goodness", which is a scalar measure of neural activity. While current implementations
predominantly utilize a simple sum-of-squares metric, it remains unclear if this default choice is
optimal. To address this, we benchmarked 21 distinct goodness functions across four standard image
datasets (MNIST, FashionMNIST, CIFAR-10, STL-10), evaluating classification accuracy, energy
consumption, and carbon footprint. We found that several alternative goodness functions, inspired by diverse domains, significantly outperform the standard baseline. Specifically,
game_theoretic_local achieved 97.15% accuracy on MNIST,
softmax_energy_margin_local reached 82.84% on FashionMNIST, and
triplet_margin_local attained 37.69% on STL-10. Furthermore, we observed substantial
variability in computational efficiency, highlighting a critical trade-off between predictive
performance and environmental cost. These findings demonstrate that the goodness function is a pivotal
hyperparameter in FF design.
Methodology
In this study, we systematically benchmark 21 distinct goodness functions for the Forward-Forward (FF) algorithm. The FF algorithm replaces the global backward pass of backpropagation with two local forward passes, maximizing "goodness" for positive data and minimizing it for negative data.
The Forward-Forward Algorithm
For each layer, the objective is to have high "goodness" for positive data (real samples) and low "goodness" for negative data (corrupted or generated samples). The local loss function is defined as:
L = log(1 + exp(-(G(y_pos) - θ))) + log(1 + exp(G(y_neg) - θ))
where G(y) is the goodness function applied to the layer's activations y, and θ is a scalar threshold.
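For concreteness, a minimal PyTorch-style sketch of this local objective is given below. The helper names (`ff_layer_loss`, `sum_of_squares_goodness`) and the threshold value θ = 2.0 are illustrative assumptions rather than the exact benchmark implementation; the study swaps each of the 21 goodness functions into the `goodness` argument.

```python
import torch
import torch.nn.functional as F

def sum_of_squares_goodness(y: torch.Tensor) -> torch.Tensor:
    # Baseline goodness: mean squared activation per sample -> shape (batch,).
    return y.pow(2).mean(dim=1)

def ff_layer_loss(y_pos: torch.Tensor, y_neg: torch.Tensor,
                  goodness=sum_of_squares_goodness,
                  theta: float = 2.0) -> torch.Tensor:
    # theta = 2.0 is an illustrative default, not the paper's setting.
    # L = log(1 + exp(-(G(y_pos) - theta))) + log(1 + exp(G(y_neg) - theta)),
    # i.e. softplus terms that push positive goodness above the threshold
    # and negative goodness below it, using only this layer's activations.
    g_pos = goodness(y_pos)  # goodness of positive (real) samples
    g_neg = goodness(y_neg)  # goodness of negative (corrupted) samples
    return (F.softplus(-(g_pos - theta)) + F.softplus(g_neg - theta)).mean()
```

Because the loss depends only on the current layer's activations, each layer can be trained with its own local update and no gradients need to flow between layers.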
Goodness Functions Evaluated
We categorize the 21 goodness functions into five groups:
- Baseline: Sum of Squares.
- Distance and Energy-Based: L2 Normalized Energy, Huber Norm, Triplet Margin, Tempered Energy, Outlier Trimmed Energy, Softmax Energy Margin.
- Biologically Inspired: Hebbian, Oja's Rule, BCM Theory.
- Information Theoretic: InfoNCE, Predictive Coding, NT-Xent.
- Statistical and Other Approaches: Decorrelation, Game Theoretic, Fractal Dimension, Whitened Energy, PCA Energy, Gaussian Energy, Sparse L1, Attention Weighted.
Figure: Overview of the Forward-Forward algorithm and the diverse goodness functions evaluated in this study.
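To make the interface concrete, here are minimal sketches of three of these goodness functions: the sum-of-squares baseline plus plausible forms of the Sparse L1 and Huber Norm variants. The exact formulations used in the benchmark may differ; what matters for FF is that each maps a layer's activations to one scalar per sample.

```python
import torch

def sum_of_squares(y: torch.Tensor) -> torch.Tensor:
    # Baseline (Hinton, 2022): mean squared activation per sample.
    return y.pow(2).mean(dim=1)

def sparse_l1(y: torch.Tensor) -> torch.Tensor:
    # One plausible "Sparse L1" goodness: mean absolute activation;
    # the name suggests it is meant to encourage sparse, strongly
    # active codes under the FF threshold objective.
    return y.abs().mean(dim=1)

def huber_norm(y: torch.Tensor, delta: float = 1.0) -> torch.Tensor:
    # One plausible "Huber Norm" goodness: quadratic for small activations,
    # linear for large ones, reducing the influence of outlier units.
    a = y.abs()
    quadratic = 0.5 * a.pow(2)
    linear = delta * (a - 0.5 * delta)
    return torch.where(a <= delta, quadratic, linear).mean(dim=1)
```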
Results
We evaluated the 21 goodness functions on four datasets: MNIST, FashionMNIST, CIFAR-10, and STL-10. For each, we report Classification Accuracy (a linear classifier trained on frozen FF embeddings), Multi-pass Accuracy (FF's native label-overlay inference), and environmental impact (tabulated below as CO2 emissions in grams; energy consumption was also tracked).
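Multi-pass Accuracy uses FF's native inference scheme: each candidate label is overlaid on the input, the sample is run through the trained layers once per class, and the class whose pass accumulates the highest total goodness is predicted. Below is a minimal sketch under these assumptions; the label-overlay scheme (writing a one-hot code into the first pixels, as in Hinton's MNIST setup) and the helper names are illustrative.

```python
import torch

@torch.no_grad()
def multi_pass_predict(x, layers, goodness, num_classes: int = 10):
    # x: (batch, features) flattened inputs in [0, 1]; layers: trained FF layers.
    scores = []
    for c in range(num_classes):
        # Overlay candidate label c on the input (one-hot code in the
        # first `num_classes` positions -- an assumption for illustration).
        x_c = x.clone()
        x_c[:, :num_classes] = 0.0
        x_c[:, c] = 1.0
        # Accumulate goodness across layers for this candidate label.
        h, total = x_c, torch.zeros(x.size(0), device=x.device)
        for layer in layers:
            h = layer(h)
            total = total + goodness(h)
        scores.append(total)
    # Predict the label whose forward pass produced the highest goodness.
    return torch.stack(scores, dim=1).argmax(dim=1)
```

Classification Accuracy, in contrast, trains a separate linear classifier on the frozen FF activations, so the two metrics can disagree (as seen for oja_local on FashionMNIST).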
1. MNIST
On MNIST, game_theoretic_local achieved the highest Multi-pass Accuracy of
98.17%. The baseline sum_of_squares was the most energy-efficient.
| Goodness Function | Class. Acc. | Multi-pass Acc. | Emissions (g CO2) |
|---|---|---|---|
| attention_weighted_local | 0.9737 | 0.9803 | 13.14 |
| bcm_local | 0.0986 | 0.0979 | 12.56 |
| decorrelation_local | 0.9738 | 0.9795 | 12.84 |
| fractal_dimension_local | 0.9676 | 0.9803 | 12.62 |
| game_theoretic_local | 0.9715 | 0.9817 | 12.78 |
| gaussian_energy_local | 0.9690 | 0.9805 | 13.11 |
| hebbian_local | 0.9690 | 0.9805 | 13.44 |
| huber_norm_local | 0.9696 | 0.9815 | 12.93 |
| info_nce_local | 0.9564 | 0.9799 | 13.47 |
| l2_normalized_energy_local | 0.9690 | 0.9805 | 13.04 |
| nt_xent_local | 0.9564 | 0.9799 | 12.92 |
| oja_local | 0.9056 | 0.9740 | 13.55 |
| outlier_trimmed_energy_local | 0.3645 | 0.4055 | 13.20 |
| pca_energy_local | 0.9690 | 0.9805 | 13.52 |
| predictive_coding_local | 0.9788 | 0.9803 | 13.19 |
| softmax_energy_margin_local | 0.9568 | 0.9791 | 13.60 |
| sparse_l1_local | 0.9719 | 0.9811 | 13.64 |
| sum_of_squares (baseline) | 0.9690 | 0.9805 | 12.32 |
| tempered_energy_local | 0.9690 | 0.9805 | 13.27 |
| triplet_margin_local | 0.9750 | 0.9806 | 13.28 |
| whitened_energy_local | 0.9690 | 0.9805 | 13.06 |
2. FashionMNIST
For FashionMNIST, softmax_energy_margin_local achieved the highest Multi-pass Accuracy of
86.32%. whitened_energy_local was the most efficient.
| Goodness Function | Class. Acc. | Multi-pass Acc. | Emissions (g CO2) |
|---|---|---|---|
| attention_weighted_local | 0.8338 | 0.8594 | 12.52 |
| bcm_local | 0.1023 | 0.1000 | 12.91 |
| decorrelation_local | 0.8243 | 0.8556 | 12.76 |
| fractal_dimension_local | 0.8449 | 0.8540 | 12.56 |
| game_theoretic_local | 0.8323 | 0.8586 | 12.92 |
| gaussian_energy_local | 0.8246 | 0.8487 | 12.71 |
| hebbian_local | 0.8246 | 0.8487 | 13.33 |
| huber_norm_local | 0.8341 | 0.8573 | 12.49 |
| info_nce_local | 0.7860 | 0.8471 | 12.69 |
| l2_normalized_energy_local | 0.8246 | 0.8487 | 12.90 |
| nt_xent_local | 0.7860 | 0.8471 | 12.55 |
| oja_local | 0.1337 | 0.8059 | 13.31 |
| outlier_trimmed_energy_local | 0.3155 | 0.1678 | 12.85 |
| pca_energy_local | 0.8246 | 0.8487 | 13.39 |
| predictive_coding_local | 0.8539 | 0.8625 | 12.91 |
| softmax_energy_margin_local | 0.8284 | 0.8632 | 12.97 |
| sparse_l1_local | 0.8209 | 0.8536 | 13.17 |
| sum_of_squares (baseline) | 0.8246 | 0.8487 | 13.12 |
| tempered_energy_local | 0.8246 | 0.8487 | 13.27 |
| triplet_margin_local | 0.7887 | 0.8585 | 13.03 |
| whitened_energy_local | 0.8246 | 0.8487 | 11.91 |
3. CIFAR-10
On CIFAR-10, sparse_l1_local achieved the highest Multi-pass Accuracy of 43.82% while also ranking among the most energy-efficient functions, highlighting the importance of sparsity.
| Goodness Function | Class. Acc. | Multi-pass Acc. | Emissions (g CO2) |
|---|---|---|---|
| attention_weighted_local | 0.2173 | 0.3980 | 14.66 |
| bcm_local | 0.2608 | 0.3521 | 14.51 |
| decorrelation_local | 0.2309 | 0.4146 | 14.44 |
| fractal_dimension_local | 0.2617 | 0.4235 | 14.50 |
| game_theoretic_local | 0.2347 | 0.4305 | 14.31 |
| gaussian_energy_local | 0.2857 | 0.3959 | 13.91 |
| hebbian_local | 0.2857 | 0.3959 | 14.63 |
| huber_norm_local | 0.2363 | 0.3976 | 14.52 |
| info_nce_local | 0.2560 | 0.3867 | 15.22 |
| l2_normalized_energy_local | 0.2857 | 0.3959 | 14.57 |
| nt_xent_local | 0.2560 | 0.3867 | 14.29 |
| oja_local | 0.1753 | 0.1618 | 14.50 |
| outlier_trimmed_energy_local | 0.3747 | 0.1000 | 14.41 |
| pca_energy_local | 0.2857 | 0.3959 | 14.17 |
| predictive_coding_local | 0.4452 | 0.4342 | 14.29 |
| softmax_energy_margin_local | 0.2523 | 0.3869 | 14.16 |
| sparse_l1_local | 0.2733 | 0.4382 | 14.02 |
| sum_of_squares (baseline) | 0.2857 | 0.3959 | 14.99 |
| tempered_energy_local | 0.2857 | 0.3959 | 14.71 |
| triplet_margin_local | 0.2516 | 0.4101 | 15.62 |
| whitened_energy_local | 0.2857 | 0.3959 | 15.11 |
4. STL-10
For STL-10, triplet_margin_local achieved the highest Multi-pass Accuracy of 37.72%, suggesting that explicit margin-based separation is beneficial in data-scarce settings.
| Goodness Function | Class. Acc. | Multi-pass Acc. | Emissions (g CO2) |
|---|---|---|---|
| attention_weighted_local | 0.3647 | 0.3647 | 6.45 |
| bcm_local | 0.2132 | 0.1447 | 6.18 |
| decorrelation_local | 0.3649 | 0.3689 | 6.46 |
| fractal_dimension_local | 0.3554 | 0.3614 | 6.33 |
| game_theoretic_local | 0.3770 | 0.3561 | 6.72 |
| gaussian_energy_local | 0.3655 | 0.3664 | 6.75 |
| hebbian_local | 0.3655 | 0.3664 | 6.56 |
| huber_norm_local | 0.3699 | 0.3479 | 6.68 |
| info_nce_local | 0.3494 | 0.3354 | 6.61 |
| l2_normalized_energy_local | 0.3655 | 0.3664 | 6.44 |
| nt_xent_local | 0.3494 | 0.3354 | 6.33 |
| oja_local | 0.3632 | 0.1096 | 6.41 |
| outlier_trimmed_energy_local | 0.2764 | 0.1004 | 6.77 |
| pca_energy_local | 0.3655 | 0.3664 | 6.56 |
| predictive_coding_local | 0.4152 | 0.3607 | 6.73 |
| softmax_energy_margin_local | 0.3475 | 0.3231 | 6.66 |
| sparse_l1_local | 0.3581 | 0.3657 | 6.44 |
| sum_of_squares (baseline) | 0.3655 | 0.3664 | 10.16 |
| tempered_energy_local | 0.3655 | 0.3664 | 6.22 |
| triplet_margin_local | 0.3769 | 0.3772 | 6.46 |
| whitened_energy_local | 0.3655 | 0.3664 | 6.55 |
Classification Loss Plots
Figure: Classification loss curves on MNIST, FashionMNIST, CIFAR-10, and STL-10.
Accuracy Per Layer
Figure: Per-layer accuracy on MNIST, FashionMNIST, CIFAR-10, and STL-10.
Multi-pass Accuracy
Figure: Multi-pass accuracy on MNIST, FashionMNIST, CIFAR-10, and STL-10.
BibTeX
@misc{shah2025searchgoodness,
  title={In Search of Goodness: Large Scale Benchmarking of Goodness Functions for the Forward-Forward Algorithm},
  author={Arya Shah and Vaibhav Tripathi},
  year={2025},
  eprint={placeholder},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={placeholder},
}