Transformers v5 Notes

This section documents VeOmni’s integration with HuggingFace transformers==5.2.0 (the only supported transformers version).
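Because only one transformers version is supported, it can help to check the installed version before debugging anything else. The sketch below is illustrative only and is not part of VeOmni: the names SUPPORTED_TRANSFORMERS_VERSION, installed_transformers_version, and is_supported are hypothetical, and the check assumes an exact string match on the pin is what you want.

```python
from importlib import metadata
from typing import Optional

# The pin stated in this section; the constant name itself is hypothetical.
SUPPORTED_TRANSFORMERS_VERSION = "5.2.0"


def installed_transformers_version() -> Optional[str]:
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


def is_supported(installed: Optional[str],
                 required: str = SUPPORTED_TRANSFORMERS_VERSION) -> bool:
    """Exact-match check against the pinned version (assumed policy)."""
    return installed is not None and installed.strip() == required


if __name__ == "__main__":
    found = installed_transformers_version()
    if is_supported(found):
        print(f"transformers {found} matches the supported version")
    else:
        print(f"transformers {found!r} found; expected {SUPPORTED_TRANSFORMERS_VERSION}")
```

If the check fails, reinstalling with pip install "transformers==5.2.0" is the usual way to restore the pinned version.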

Included Notes

  • Flash Attention custom-name handling: explains why _lazy_imports fails for VeOmni custom attention names and how the local hub-kernel loader adapter resolves it.

  • Patchgen workflow: explains the modeling code generation workflow used for every supported model and how to regenerate the generated code.

  • MoE weight loading: explains how VeOmni expects MoE expert weights to be laid out and documents qwen3_moe handling.

  • Testing a new model: standard operating procedure (SOP) for adding test cases to test_models_patch.py and test_e2e_parallel.py when onboarding a new model.