VeOmni Flash Attention Custom Name Adapter (Transformers 5.x)

Problem Background

VeOmni uses custom attention implementation names:

veomni_flash_attention_2_with_sp
veomni_flash_attention_3_with_sp
veomni_flash_attention_4_with_sp

These names are registered into ALL_ATTENTION_FUNCTIONS and routed to VeOmni’s SP-aware attention wrapper.

With Transformers 5.x, model init and flash-attention preload logic may still call transformers.modeling_flash_attention_utils._lazy_imports(...) for the configured implementation string. For non-native names, _lazy_imports falls back to hub-kernel loading and can fail with:

ValueError: Could not find the currently requested flash attention implementation at veomni_flash_attention_2_with_sp

even though VeOmni already registered the custom attention function.

Why This Happens

The failure path is:

1. The model config keeps the VeOmni custom name in _attn_implementation.
2. The Transformers flash preload code tries to resolve low-level flash kernels from the implementation string.
3. Custom VeOmni names are not hub kernel identifiers.
4. The hub fallback returns no valid kernel entry for this name.
5. _lazy_imports raises before the normal ALL_ATTENTION_FUNCTIONS dispatch takes effect.
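
The ordering problem in the steps above can be illustrated with a toy model. This is only a sketch: `ATTENTION_REGISTRY`, `HUB_KERNELS`, `toy_preload`, and `toy_model_init` are hypothetical stand-ins invented for illustration, not Transformers internals.

```python
# Toy model only: every name here is a hypothetical stand-in.
ATTENTION_REGISTRY = {
    # The custom name IS registered, as VeOmni does with ALL_ATTENTION_FUNCTIONS.
    "veomni_flash_attention_2_with_sp": lambda *args, **kwargs: "sp-attention",
}
HUB_KERNELS = {}  # custom VeOmni names are never hub kernel identifiers


def toy_preload(impl_name):
    """Stand-in for the flash preload step: resolve kernels from the name."""
    kernel = HUB_KERNELS.get(impl_name)
    if kernel is None:
        raise ValueError(
            "Could not find the currently requested flash attention "
            f"implementation at {impl_name}"
        )
    return kernel


def toy_model_init(impl_name):
    """Preload runs first, so a failure there hides the valid registration."""
    toy_preload(impl_name)                # raises for the custom name
    return ATTENTION_REGISTRY[impl_name]  # never reached in that case
```

In this sketch the registry entry is perfectly valid, yet `toy_model_init` never reaches it, which mirrors why registration alone does not prevent the error.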

Adapter Strategy Implemented

Instead of patching _lazy_imports directly, VeOmni patches:

transformers.integrations.hub_kernels.load_and_register_attn_kernel

and intercepts VeOmni custom names only. This compatibility adapter is applied only when transformers>=5.0.0.

For VeOmni names, the adapter returns a local kernel-like object exposing:

flash_attn_func
flash_attn_varlen_func

mapped to local FA2/FA3/FA4 backends:

veomni_flash_attention_2_with_sp -> flash_attn.flash_attn_func / flash_attn.flash_attn_varlen_func
veomni_flash_attention_3_with_sp -> flash_attn_interface.flash_attn_func / flash_attn_interface.flash_attn_varlen_func
veomni_flash_attention_4_with_sp -> flash_attn.cute.flash_attn_func / flash_attn.cute.flash_attn_varlen_func

For simplicity, paged VeOmni aliases (for example paged|veomni_flash_attention_2_with_sp) are not handled by this adapter.

All non-VeOmni implementations are delegated to the original Transformers loader unchanged.
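
A minimal sketch of this interception pattern follows. It is not VeOmni's actual code: `make_adapter` and `VEOMNI_BACKENDS` are hypothetical names, the wrapper forwards `*args, **kwargs` rather than assuming the exact signature of the Transformers loader, and delegating paged aliases to the original loader is an assumption about what "not handled" means.

```python
import importlib
from types import SimpleNamespace

# Custom name -> backend module, taken from the mapping listed above.
VEOMNI_BACKENDS = {
    "veomni_flash_attention_2_with_sp": "flash_attn",
    "veomni_flash_attention_3_with_sp": "flash_attn_interface",
    "veomni_flash_attention_4_with_sp": "flash_attn.cute",
}


def make_adapter(original_loader, import_module=importlib.import_module):
    """Wrap a kernel loader so VeOmni names resolve locally (hypothetical helper)."""

    def adapter(attn_implementation, *args, **kwargs):
        module_name = VEOMNI_BACKENDS.get(attn_implementation)
        if module_name is None:
            # Anything else, including paged aliases, goes to the
            # original loader unchanged (assumed behavior for aliases).
            return original_loader(attn_implementation, *args, **kwargs)
        backend = import_module(module_name)
        # Kernel-like object exposing the two expected entry points.
        return SimpleNamespace(
            flash_attn_func=backend.flash_attn_func,
            flash_attn_varlen_func=backend.flash_attn_varlen_func,
        )

    return adapter
```

Installing such a wrapper would amount to rebinding transformers.integrations.hub_kernels.load_and_register_attn_kernel to the wrapped version of the original function, guarded by the transformers>=5.0.0 check. The `import_module` parameter exists only so the sketch can be exercised without a flash-attention install.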

Design Goals

Keep VeOmni custom implementation names unchanged.
Keep existing VeOmni ALL_ATTENTION_FUNCTIONS.register(...) behavior unchanged.
Avoid hub-kernel lookup for VeOmni private names.
Minimize patch surface by touching a single integration point.
Fail fast with a clear ImportError when the required FA backend is missing.
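
The fail-fast goal could look roughly like the sketch below. `load_backend` and the wording of the error message are hypothetical; only `importlib.import_module` and the standard exception chaining are real APIs.

```python
import importlib


def load_backend(module_name, impl_name):
    """Import an FA backend or fail immediately with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package is caught here as well.
        raise ImportError(
            f"attn_implementation={impl_name!r} requires the module "
            f"{module_name!r}, which could not be imported. "
            "Install the matching flash-attention backend."
        ) from exc
```

Raising at load time, with the requested implementation name in the message, keeps the failure close to its cause rather than surfacing later as a missing attribute during the first forward pass.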

Expected Runtime Behavior

After import veomni:

VeOmni custom names remain registered in ALL_ATTENTION_FUNCTIONS.
_lazy_imports("veomni_flash_attention_2_with_sp") and _lazy_imports("veomni_flash_attention_4_with_sp") can resolve through the adapter.
No spurious “kernel hub name not found” error is raised for VeOmni custom names.
Paged VeOmni aliases are outside the adapter scope.
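
The first expectation can be checked with a small helper such as the one below. This is a sketch: `missing_veomni_names` is a hypothetical function, and it assumes only that the registry passed in supports the `in` operator.

```python
VEOMNI_NAMES = (
    "veomni_flash_attention_2_with_sp",
    "veomni_flash_attention_3_with_sp",
    "veomni_flash_attention_4_with_sp",
)


def missing_veomni_names(registry):
    """Return the VeOmni custom names that are absent from `registry`."""
    return [name for name in VEOMNI_NAMES if name not in registry]
```

After import veomni, calling it on ALL_ATTENTION_FUNCTIONS should return an empty list if all three names are registered as described.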

Notes

This adapter is a compatibility bridge for Transformers 5.x behavior around flash preload.
It does not change VeOmni SP attention semantics.
It does not require the kernels Python package for VeOmni custom names.
FA2 and FA3 have dedicated branches in _lazy_imports (in both Transformers v4 and v5) and are resolved directly without reaching the hub-kernel path. In practice the adapter is therefore a no-op for those two, but it is kept as a safeguard.
FA4 (veomni_flash_attention_4_with_sp) has no such branch in _lazy_imports and always falls through to the hub-kernel path in Transformers v5. The adapter is the critical component that makes FA4 usable on v5.
On Transformers v4, FA4 is supported via the VeOmni SP variant (veomni_flash_attention_4_with_sp). Instead of the string name, VeOmni passes a SimpleNamespace object (from _load_veomni_local_flash_kernel) directly to _lazy_imports, which v4 accepts in its kernels-fallback branch via getattr(). The bare flash_attention_4 name still requires Transformers v5; for Transformers v4, use attn_implementation="veomni_flash_attention_4_with_sp".
FA4 requires the flash-attn-cute package (flash_attn.cute). To install FA4:
- Transformers v5: uv sync --extra gpu --extra fa4 --extra transformers5-exp --no-group transformers-stable
- Transformers v4: uv sync --extra gpu --extra fa4
