Transformers v5 Notes

This section documents VeOmni’s integration with HuggingFace transformers==5.2.0 (the only supported transformers version).
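
Because only one transformers version is supported, a fail-fast check at startup can surface a mismatched environment early. The following is a minimal sketch, not part of VeOmni: the helper names `is_supported_version` and `check_installed_transformers` are hypothetical, and the exact-match comparison is an assumption about how strictly the pin should be enforced.

```python
from importlib import metadata

# Version pin taken from the note above; the helper names are hypothetical.
SUPPORTED_TRANSFORMERS_VERSION = "5.2.0"


def is_supported_version(installed: str) -> bool:
    """Return True only for an exact match with the pinned version."""
    # Assumption: local or pre-release variants (e.g. "5.2.0.dev0") are rejected.
    return installed.strip() == SUPPORTED_TRANSFORMERS_VERSION


def check_installed_transformers() -> str:
    """Look up the installed transformers version and fail fast on a mismatch."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError as exc:
        raise RuntimeError("transformers is not installed") from exc
    if not is_supported_version(installed):
        raise RuntimeError(
            f"expected transformers=={SUPPORTED_TRANSFORMERS_VERSION}, "
            f"found {installed}"
        )
    return installed
```

Calling `check_installed_transformers()` before importing any model code would raise a `RuntimeError` with both the expected and the installed version when the environment does not match.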

Included Notes

Flash Attention custom-name handling: explains why _lazy_imports fails for VeOmni custom attention names and how the local hub-kernel loader adapter resolves it.
Patchgen workflow: explains the modeling-code generation workflow used for every supported model and how to regenerate the generated code.
MoE weight loading: explains how VeOmni expects MoE expert weights to be laid out and documents qwen3_moe handling.
Testing a new model: the standard procedure for adding test cases to test_models_patch.py and test_e2e_parallel.py when onboarding a new model.