🧠 Chan’s Curiosity Log — Nov 03, 2025
Daily reflections on new papers, theories, and open questions.
🧩 Paper 1 (Worth Reading in Detail): Optimal learning of a multi-layer perceptron near interpolation
1.1 Background
Until now, quantitative theories of neural networks that predict which features they can extract, how much data (n) they require, and how well they generalize have often relied on over-simplified assumptions about architecture or data abundance.
1.2 Questions
These limitations have prevented a precise characterization of how depth and nonlinearity jointly shape learning when networks are trained with just enough data to fully exercise their representational power.
Main questions the paper addresses:
- Q1: Assuming training data are generated by a target MLP, how much data are needed to reach a given generalization performance using a Bayes-optimal MLP with the same architecture?
- Q2: Given a fixed data budget, which target features can the MLP learn?
- Q3: Given finite compute and data, can practical algorithms reach the Bayes-optimal solution, or are they limited by statistical–computational gaps?
1.3 Key Idea
The authors use the replica method to study supervised learning of a multi-layer perceptron in the Bayesian optimal teacher–student framework, under two key conditions:
- Network width scales with input dimension — promoting feature learning while avoiding the degeneracy of ultra-wide networks.
- The system lies in the interpolation regime, where data size and parameter count are comparable, forcing the model to adapt precisely to the task.
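To fix intuition for these two scalings, here is a minimal sketch of the teacher–student data-generating setup. All concrete choices (sizes, the `tanh` activation, the `teacher_mlp` helper) are my own illustrative assumptions, not the paper's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 50                        # input dimension
width = d                     # hidden width scales linearly with d (feature-learning regime)
n_params = d * width + width  # first-layer weights + readout weights
n = 2 * n_params              # interpolation regime: data size comparable to parameter count

def teacher_mlp(X, W1, w2):
    """Two-layer teacher network that generates the labels."""
    return np.tanh(X @ W1) @ w2 / np.sqrt(width)

# A random teacher defines the target function the student must recover
W1 = rng.normal(size=(d, width)) / np.sqrt(d)
w2 = rng.normal(size=width)

# Training set: Gaussian inputs, noiseless labels from the teacher
X = rng.normal(size=(n, d))
y = teacher_mlp(X, W1, w2)

print(X.shape, y.shape)
```

In the Bayes-optimal analysis the student has the same architecture and prior as the teacher; the interesting question is how the student's posterior concentrates on (a permutation of) the teacher's weights as n grows.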
1.4 Key Conclusions & Results
- A rich phenomenology of learning transitions emerges.
- With sufficient data, optimal performance is achieved via specialization toward the target — yet training algorithms may get trapped in sub-optimal states predicted by theory.
- Specialization proceeds inhomogeneously across layers (from shallow to deep) and across neurons within each layer.
- Deeper targets are systematically harder to learn.
1.5 Why It Is Interesting
This is the first statistical-physics analysis of deep MLPs via replicas — technically impressive though analytically intricate.
While the final results are algebraically heavy, the conceptual progress is substantial.
I should study their derivation carefully and explore how similar analyses could be applied to other questions in learning theory.
1.6 Questions Worth Exploring
- Can the derivations be simplified using renormalization-group (RG) ideas?
- How do the learning transitions correspond to phase transitions in the order-parameter landscape?
- Could one define a “critical surface” separating efficient and inefficient learning regimes?
🧠 Paper 2: Bayesian continual learning and forgetting in neural networks
2.1 Background
A long-standing challenge in artificial networks is catastrophic forgetting — the tendency to lose previously learned information when new tasks are introduced.
Biological synapses, in contrast, maintain a remarkable balance between stability and plasticity, retaining memories while remaining adaptable.
2.2 Question
Why can the brain manage this equilibrium effortlessly, while AI models struggle?
2.3 Key Idea
The paper advances the hypothesis that biological synapses operate under Bayesian principles, maintaining uncertainty estimates (error bars) over synaptic weights.
They propose Metaplasticity from Synaptic Uncertainty (MESU) — a continual-learning method inspired by this idea, where metaplasticity allows synapses to flexibly adjust while preserving long-term information.
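As a toy sketch of the underlying idea (this is my own illustration, not MESU's actual update rule; the class name, the precision-scaled step size, and the Fisher-like variance update are all assumptions): each weight carries a mean and a variance, and a synapse's effective plasticity shrinks as its uncertainty shrinks, protecting consolidated weights from being overwritten by new tasks.

```python
import numpy as np

class BayesianSynapses:
    """Diagonal-Gaussian weights; per-weight variance acts as a metaplasticity knob."""

    def __init__(self, n_weights, prior_var=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.mu = rng.normal(scale=0.1, size=n_weights)  # weight means
        self.var = np.full(n_weights, prior_var)         # per-weight uncertainty

    def step(self, grad, lr=0.1):
        # Uncertain synapses (large var) move freely; certain ones are protected.
        self.mu -= lr * self.var * grad
        # Accumulate evidence: precision grows with squared gradients (illustrative).
        self.var = 1.0 / (1.0 / self.var + lr * grad**2)

syn = BayesianSynapses(4)
var_before = syn.var.copy()
syn.step(grad=np.ones(4))
print(syn.var < var_before)  # uncertainty decreases as data arrive
```

The design point this illustrates: forgetting and consolidation both emerge from the same Bayesian update, rather than from an explicit task-boundary regularizer.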
2.4 Key Conclusions & Results
- Across image-classification benchmarks, MESU mitigates forgetting while maintaining adaptability.
- On task-incremental CIFAR-100, MESU consistently outperforms standard training methods.
- The boundary-free streaming formulation enables efficient continual updates without catastrophic interference.
2.5 Why It Is Interesting
This approach resonates closely with my earlier work on variational continual learning
(Phys. Rev. E 108, 014309 (2023)).
I’m curious to see exactly how their formulation extends or refines that probabilistic framework.
2.6 Questions Worth Exploring
- How does MESU’s Bayesian update connect to non-equilibrium learning dynamics?
- Could a probability-distribution-level training protocol (a direction I plan to pursue) generalize this idea?
- Are there analogies between synaptic uncertainty and fluctuation–dissipation mechanisms in physics-based learning?
🧩 Overall reflection: both papers highlight how probabilistic and statistical-physics perspectives are converging to explain deep learning — from phase transitions in MLPs to Bayesian plasticity in continual learning.
