[R] Unraveling the Mysteries: Why is AdamW Often Superior to Adam+L2 in Practice?

Hello, ML enthusiasts! 🚀🤖

We analyzed rotational equilibria in our latest work, **Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks**.

💡 **Our findings:** Balanced average rotational updates (effective learning rate) across all network components may play a key role in the effectiveness of AdamW.

🔗 [Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks](https://arxiv.org/abs/2305.17212)

Looking forward to hearing your thoughts! Let's discuss this fascinating topic together!
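For readers less familiar with the distinction in the title, here is a minimal NumPy sketch (not the paper's code, just an illustration under the standard Adam/AdamW formulations): Adam+L2 folds the decay term λθ into the gradient before the moment estimates, so the decay gets rescaled by the adaptive normalization, while AdamW applies decay as a separate decoupled step. The `angular_update` helper is a hypothetical illustration of the per-step weight rotation that the "average rotational update" notion refers to.

```python
import numpy as np

def adam_l2_step(theta, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999),
                 eps=1e-8, wd=1e-2):
    """Adam + L2: the decay term enters the gradient, so it is also
    rescaled by the adaptive second-moment normalization."""
    g = grad + wd * theta                      # L2 term folded into the gradient
    m = betas[0] * m + (1 - betas[0]) * g
    v = betas[1] * v + (1 - betas[1]) * g ** 2
    m_hat = m / (1 - betas[0] ** t)            # bias correction
    v_hat = v / (1 - betas[1] ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def adamw_step(theta, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, wd=1e-2):
    """AdamW: weight decay is decoupled and applied directly to the
    weights, untouched by the adaptive normalization."""
    m = betas[0] * m + (1 - betas[0]) * grad
    v = betas[1] * v + (1 - betas[1]) * grad ** 2
    m_hat = m / (1 - betas[0] ** t)
    v_hat = v / (1 - betas[1] ** t)
    theta = theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * theta)
    return theta, m, v

def angular_update(theta_old, theta_new):
    """Angle (radians) between the weight vector before and after one step;
    averaging this per layer gives an effective-learning-rate-style measure."""
    cos = np.dot(theta_old, theta_new) / (
        np.linalg.norm(theta_old) * np.linalg.norm(theta_new))
    return np.arccos(np.clip(cos, -1.0, 1.0))
```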
