Superficial Drive

Why transformers normalize, LayerNorm vs RMSNorm internals, pre-norm gradient highways, and BatchNorm side notes—all in a lean, hack-ready walkthrough.
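As a quick preview of the contrast that article draws, here is a minimal sketch (not the article's own code) of LayerNorm versus RMSNorm over the last dimension; function names and shapes are illustrative assumptions.

```python
import torch

def layer_norm(x, gamma, beta, eps=1e-5):
    # LayerNorm: subtract the mean and divide by the standard deviation, then scale and shift.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return gamma * (x - mean) / torch.sqrt(var + eps) + beta

def rms_norm(x, gamma, eps=1e-6):
    # RMSNorm: skip mean subtraction and rescale by the root-mean-square only.
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return gamma * x / rms

x = torch.randn(2, 4, 8)                  # (batch, seq, hidden)
gamma, beta = torch.ones(8), torch.zeros(8)
print(layer_norm(x, gamma, beta).shape, rms_norm(x, gamma).shape)
```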

Engineering dissection of Rotary Positional Embedding (RoPE) mechanics, scaling hacks, and MiniMind implementation details
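For orientation, a minimal sketch of applying rotary embeddings to a query/key tensor; the interleaved rotate-pair formulation below is one common variant and an assumption here, not necessarily MiniMind's exact implementation.

```python
import torch

def rope(x, base=10000.0):
    # x: (batch, seq, heads, head_dim) with even head_dim
    b, t, h, d = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2).float() / d))   # (d/2,)
    angles = torch.arange(t).float()[:, None] * inv_freq[None, :]    # (t, d/2)
    cos = angles.cos()[None, :, None, :]                             # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Rotate each 2D pair (x1, x2) by a position-dependent angle.
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(1, 16, 4, 64)
print(rope(q).shape)   # (1, 16, 4, 64)
```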

Dissecting sinusoidal positional embeddings in the Transformer model
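A minimal sketch of the sinusoidal encoding from "Attention Is All You Need", PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)); the helper name and NumPy usage are illustrative assumptions.

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model, base=10000.0):
    # Build the (seq_len, d_model) table of position encodings.
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]          # (1, d_model/2), even d_model assumed
    angles = pos / np.power(base, i / d_model)     # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions get cosine
    return pe

print(sinusoidal_pe(128, 512).shape)   # (128, 512)
```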

Dissecting BPE tokenization — the critical first layer between human text and neural networks
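As a taste of what that article covers, a minimal sketch of one BPE training step, counting adjacent symbol pairs and merging the most frequent; the toy corpus and helper names are assumptions for illustration only.

```python
from collections import Counter

def most_frequent_pair(corpus):
    # corpus: list of words, each a tuple of symbols; count all adjacent pairs.
    pairs = Counter()
    for word in corpus:
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(corpus, pair):
    # Replace every occurrence of the chosen pair with a single merged symbol.
    merged = []
    for word in corpus:
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged.append(tuple(out))
    return merged

corpus = [tuple("lower"), tuple("lowest"), tuple("newer"), tuple("wider")]
for _ in range(3):                       # learn three merges
    pair = most_frequent_pair(corpus)
    corpus = merge_pair(corpus, pair)
    print("merged", pair, "->", corpus)
```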
