SubNorm (Attention)
Applies an additional layer normalization inside the attention sublayer, just before the output projection.
```python
PreNorm(
    Attention(
        dim=768,
        plugins=[
            SubNorm(dim=768)
        ],
    ),
    dim=768,
)
```
This plugin implements Sub-LN from Foundation Transformers. Note that Sub-LN presumes Pre-LN rather than Post-LN, so the attention sublayer should be wrapped in PreNorm as shown above.
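To make the placement concrete, here is a minimal NumPy sketch of a single-head attention sublayer with Sub-LN. All names (`subln_attention`, `w_qkv`, `w_out`) are illustrative, not this library's API; learned LayerNorm scale/bias and multi-head splitting are omitted for brevity. The key point is the second `layer_norm` applied to the attention output immediately before the output projection, on top of the usual Pre-LN at the sublayer input:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the feature dimension (learned gain/bias omitted).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def subln_attention(x, w_qkv, w_out):
    # Pre-LN: normalize the sublayer input.
    h = layer_norm(x)
    q, k, v = np.split(h @ w_qkv, 3, axis=-1)
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    attn = scores @ v
    # Sub-LN: an extra LayerNorm right before the output projection.
    out = layer_norm(attn) @ w_out
    # Residual connection around the whole sublayer.
    return x + out

# Tiny smoke test with random weights (dim=8 stands in for dim=768).
rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((4, d))
w_qkv = rng.standard_normal((d, 3 * d)) * 0.1
w_out = rng.standard_normal((d, d)) * 0.1
y = subln_attention(x, w_qkv, w_out)
```

Without the Pre-LN wrapper, the input to the second normalization would already sit on a Post-LN residual stream, which is not the regime Sub-LN was designed for.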