SubNorm (Attention)
Applies an additional layer normalization inside the attention sublayer, just before the output projection.
```python
PreNorm(
    Attention(
        dim=768,
        plugins=[
            SubNorm(dim=768)
        ],
    ),
    dim=768,
)
```
This plugin implements Sub-LN from Foundation Transformers. Note that Sub-LN presumes Pre-LN rather than Post-LN, so the attention sublayer should be wrapped in PreNorm as shown above.
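To make the placement concrete, here is a minimal NumPy sketch of a single-head attention sublayer with Sub-LN. All names (`subln_attention`, `w_qkv`, `w_out`) are illustrative, not this library's API; learned LayerNorm scale/bias and multi-head splitting are omitted for brevity. The key point is the second `layer_norm` applied to the attention output immediately before the output projection, on top of the usual Pre-LN at the sublayer input:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the feature dimension (learned gain/bias omitted).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def subln_attention(x, w_qkv, w_out):
    # Pre-LN: normalize the sublayer input.
    h = layer_norm(x)
    q, k, v = np.split(h @ w_qkv, 3, axis=-1)
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    attn = scores @ v
    # Sub-LN: an extra LayerNorm right before the output projection.
    out = layer_norm(attn) @ w_out
    # Residual connection around the whole sublayer.
    return x + out

# Tiny smoke test with random weights (dim=8 stands in for dim=768).
rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((4, d))
w_qkv = rng.standard_normal((d, 3 * d)) * 0.1
w_out = rng.standard_normal((d, d)) * 0.1
y = subln_attention(x, w_qkv, w_out)
```

Without the Pre-LN wrapper, the input to the second normalization would already sit on a Post-LN residual stream, which is not the regime Sub-LN was designed for.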