Why transformers normalize, LayerNorm vs RMSNorm internals, pre-norm gradient highways, and BatchNorm side notes—all in a lean, hack-ready walkthrough.
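As a taste of what that walkthrough covers, here is a minimal PyTorch-style sketch contrasting the two normalizers; the function names are illustrative and not MiniMind's actual modules.

```python
import torch

def layer_norm(x, gamma, beta, eps=1e-5):
    # LayerNorm: center and scale each token's feature vector over the last
    # dimension, then apply a learned gain (gamma) and bias (beta).
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return (x - mean) / torch.sqrt(var + eps) * gamma + beta

def rms_norm(x, gamma, eps=1e-6):
    # RMSNorm: skip the mean subtraction and rescale by the root mean square
    # of the features; only a gain, no bias.
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x / rms * gamma

d = 8
x = torch.randn(2, 4, d)                        # (batch, seq, features)
gamma, beta = torch.ones(d), torch.zeros(d)
print(layer_norm(x, gamma, beta).shape, rms_norm(x, gamma).shape)
```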
Engineering dissection of Rotary Positional Embedding (RoPE) mechanics, scaling hacks, and MiniMind implementation details.
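Roughly, RoPE rotates the two halves of each query/key head by a position-dependent angle. The sketch below assumes a LLaMA-style half-split pairing and is illustrative rather than MiniMind's exact implementation.

```python
import torch

def apply_rope(x, base=10000.0):
    # x: (batch, seq, heads, head_dim). Rotate the two halves of head_dim by
    # angles theta_i = base^(-2i/head_dim) scaled by the token position.
    # (Half-split pairing; an interleaved-pair convention also exists.)
    b, s, h, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)        # (half,)
    angles = torch.arange(s, dtype=torch.float32)[:, None] * freqs[None, :]  # (seq, half)
    cos = angles.cos()[None, :, None, :]   # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(1, 16, 4, 64)              # (batch, seq, heads, head_dim)
print(apply_rope(q).shape)                 # torch.Size([1, 16, 4, 64])
```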
Dissecting sinusoidal positional embedding in the Transformer model.
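For reference, the fixed encoding from "Attention Is All You Need" assigns sine to even dimensions and cosine to odd ones, each pair sharing a frequency. A self-contained sketch, assuming an even model width:

```python
import torch

def sinusoidal_pe(seq_len, d_model, base=10000.0):
    # PE[pos, 2i]   = sin(pos / base^(2i/d_model))
    # PE[pos, 2i+1] = cos(pos / base^(2i/d_model))
    pos = torch.arange(seq_len, dtype=torch.float32)[:, None]        # (seq, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)[None, :]    # (1, d_model/2)
    angles = pos / base ** (i / d_model)                             # (seq, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

print(sinusoidal_pe(128, 512).shape)  # torch.Size([128, 512])
```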
From zero to insight.