In this blog post, we introduce NAVER Cloud’s paper titled “Peri-LN: Revisiting Normalization Layer in the Transformer Architecture,” which we presented at ICML 2025.

Why was large-scale LLM training more unstable on V100 GPUs? V100 GPUs lacked hardware support for BF16, so mixed-precision training had to rely on FP16 (16-bit floating point), whose narrow dynamic range made large-scale model training inherently fragile. Even minor instabilities during training […]
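To make that fragility concrete, here is a minimal PyTorch sketch (our own illustration, not code from the paper) of FP16’s narrow dynamic range: large intermediate values overflow to inf, while small gradients underflow to zero.

```python
import torch

# FP16's representable range is far narrower than FP32's.
print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.float32).max)   # ~3.4e38

# Large activations overflow to inf: 300 * 300 = 90000 > 65504.
x = torch.tensor([300.0], dtype=torch.float16)
print(x * x)  # tensor([inf], dtype=torch.float16)

# Small gradients underflow to zero: 1e-8 is below FP16's
# smallest subnormal (~6e-8), so the value is lost entirely.
g = torch.tensor([1e-8], dtype=torch.float16)
print(g)  # tensor([0.], dtype=torch.float16)
```

In practice, FP16 mixed-precision training uses loss scaling to mitigate the underflow side, but overflow of large intermediate values is harder to guard against.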