Testing a New Model for Transformers v5
When adding a new model with transformers>=5.0.0 support, two test suites need updating:
tests/models/test_models_patch.py: single-GPU forward/backward correctness across attention and MoE backends
tests/e2e/test_e2e_parallel.py: multi-GPU e2e training with FSDP2, sequence parallelism (SP), and expert parallelism (EP)
Both files gate v5-only cases so they are skipped on v4 environments.
For VLM models, there is also a lightweight trainer-level smoke test for freeze_vit:
tests/models/test_vlm_trainer.py: builds a real toy VLM model on CPU and checks that vision parameters stay trainable when freeze_vit=False and are frozen when freeze_vit=True
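The v5-only gating mentioned above can be pictured as a skip condition on the installed transformers major version. The sketch below is an assumption about how such a gate might be built, not the repository's actual `_v5_only` definition; `parse_major`, `installed_major`, and `should_skip_v5_case` are hypothetical helper names.

```python
from importlib import metadata
from typing import Optional


def parse_major(version: str) -> int:
    """Return the leading integer of a version string such as '5.0.0.dev0'."""
    return int(version.split(".")[0])


def installed_major(package: str = "transformers") -> Optional[int]:
    """Major version of an installed package, or None if it is not installed."""
    try:
        return parse_major(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


def should_skip_v5_case(major: Optional[int]) -> bool:
    """v5-only cases are skipped when the package is missing or older than 5."""
    return major is None or major < 5


# Hypothetical wiring into pytest (shown as a comment, not executed here):
#   _v5_only = pytest.mark.skipif(
#       should_skip_v5_case(installed_major()),
#       reason="requires transformers>=5.0.0",
#   )
```

Keeping the version check in a plain function makes the gate easy to verify without importing the library itself.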
1. tests/models/test_models_patch.py
What it tests
Runs one forward + backward step on dummy data for every combination of:
HF attention backends (eager, flash_attention_2, flash_attention_3)
VeOmni attention backends (veomni_flash_attention_2_with_sp, veomni_flash_attention_3_with_sp)
MoE backends (for MoE models: eager, fused)
Then asserts that loss and grad norm match across all combinations within (rtol, atol).
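As a rough illustration of the check described above, the sketch below enumerates backend combinations and compares each one's loss and grad norm against the first. It is a minimal sketch under assumptions: the real test's helper names, its comparison formula, and whether it compares against a reference or pairwise are not shown in this guide, so `enumerate_modes`, `is_close`, and `assert_metrics_match` are hypothetical, and the tolerance rule used is the numpy-style `|a - b| <= atol + rtol * |b|`.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

HF_ATTN = ["eager", "flash_attention_2", "flash_attention_3"]
VEOMNI_ATTN = ["veomni_flash_attention_2_with_sp", "veomni_flash_attention_3_with_sp"]
MOE = ["eager", "fused"]

Mode = Tuple[str, Optional[str]]


def enumerate_modes(is_moe: bool) -> List[Mode]:
    """All (attention backend, MoE backend) pairs; dense models get None as the MoE slot."""
    moe_backends: List[Optional[str]] = list(MOE) if is_moe else [None]
    return list(product(HF_ATTN + VEOMNI_ATTN, moe_backends))


def is_close(a: float, b: float, rtol: float, atol: float) -> bool:
    """Assumed tolerance rule (numpy-style): |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)


def assert_metrics_match(
    results: Dict[Mode, Tuple[float, float]], rtol: float, atol: float
) -> None:
    """Compare every mode's (loss, grad_norm) against the first mode's values."""
    items = list(results.items())
    ref_mode, (ref_loss, ref_gnorm) = items[0]
    for mode, (loss, gnorm) in items[1:]:
        assert is_close(loss, ref_loss, rtol, atol), f"loss mismatch: {mode} vs {ref_mode}"
        assert is_close(gnorm, ref_gnorm, rtol, atol), f"grad norm mismatch: {mode} vs {ref_mode}"
```

With these lists, a dense model yields 5 combinations and an MoE model yields 10.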
How to add a case
Add an entry to _TEST_CASES_TRANSFORMERS_V5:
_TEST_CASES_TRANSFORMERS_V5 = [
    pytest.param(
        "./tests/toy_config/qwen3_5_toy/config.json",
        False,  # is_moe
        _DEFAULT_RTOL,
        _DEFAULT_ATOL,
        id="qwen3_5",
    ),
    # ← add your new model here
    pytest.param(
        "./tests/toy_config/<new_model>_toy/config.json",
        False,  # is_moe: set True for MoE models
        _DEFAULT_RTOL,
        _DEFAULT_ATOL,
        id="<new_model>",
    ),
]
The id= string is used as a key for:
Test node naming (pytest -k <id>)
Looking up custom weight sync functions in weight_sync_adapters.py (only needed if the model has non-standard state dict keys)
Filtering unsupported modes
If the model doesn’t support certain attention backends yet, add a filter block in test_models_patch_fwd_bwd keyed on case_id:
if case_id == "<new_model>":
    hf_model_modes = [mode for mode in hf_model_modes if mode.attn_implementation != "flash_attention_3"]
    veomni_model_modes = [
        mode for mode in veomni_model_modes if mode.attn_implementation != "veomni_flash_attention_3_with_sp"
    ]
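If several models need filters, the per-model `if` blocks could be replaced by a lookup table. The following is a sketch of that alternative, not the way test_models_patch_fwd_bwd is written today; `ModelMode` is a stand-in for the real mode objects, and `_UNSUPPORTED_ATTN` and `filter_modes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class ModelMode:
    """Stand-in for the test's mode objects; only attn_implementation is modelled."""
    attn_implementation: str


# Hypothetical table: case id -> attention backends the model cannot run yet.
_UNSUPPORTED_ATTN: Dict[str, Set[str]] = {
    "<new_model>": {"flash_attention_3", "veomni_flash_attention_3_with_sp"},
}


def filter_modes(case_id: str, modes: List[ModelMode]) -> List[ModelMode]:
    """Drop modes whose attention backend is listed as unsupported for case_id."""
    blocked = _UNSUPPORTED_ATTN.get(case_id, set())
    return [mode for mode in modes if mode.attn_implementation not in blocked]
```

The same function can be applied to both hf_model_modes and veomni_model_modes, and removing a table entry re-enables the backend once support lands.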
Toy config
Create a minimal config at tests/toy_config/<new_model>_toy/config.json with only a few layers. Add a README.md in the same folder that records:
Where the original config comes from
What was changed relative to the original config
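One way to produce both files together is to derive the toy config from the original programmatically, so the README always reflects the real differences. This is a minimal sketch under assumptions: `num_hidden_layers` is a common Hugging Face config key but the keys worth shrinking depend on the model, and `make_toy_config` and `write_toy_config` are hypothetical helpers, not part of the repository.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Assumed overrides; the right keys depend on the model's config schema.
TOY_OVERRIDES: Dict[str, Any] = {"num_hidden_layers": 2}


def make_toy_config(original: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the original config with the overrides applied."""
    toy = dict(original)
    toy.update(overrides)
    return toy


def write_toy_config(
    original: Dict[str, Any], overrides: Dict[str, Any], out_dir: Path, source: str
) -> None:
    """Write config.json plus a README.md recording the source and the changed keys."""
    out_dir.mkdir(parents=True, exist_ok=True)
    toy = make_toy_config(original, overrides)
    (out_dir / "config.json").write_text(json.dumps(toy, indent=2) + "\n")
    changes = "\n".join(
        f"- {key}: {original.get(key)!r} -> {value!r}"
        for key, value in sorted(overrides.items())
    )
    readme = f"Original config: {source}\n\nChanges from the original:\n{changes}\n"
    (out_dir / "README.md").write_text(readme)
```

Pass the location of the original config as `source` so the README answers both questions listed above.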
2. tests/e2e/test_e2e_parallel.py
What it tests
Launches full torchrun training runs (2 epochs, 2 steps) across parallel configurations (fsdp2 always enabled):
| Parameter | Values |
|---|---|
| SP size (sp_size) | 1, 2 |
| EP size | 1 (base models), 1×2 (MoE models) |
Each run produces a log_dict.json. The test asserts that loss and grad norm match across all SP/EP configurations within (rtol, atol).
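The cross-run comparison could look roughly like the sketch below. The schema of log_dict.json is not documented here, so the code assumes a mapping from metric name to a list of per-step values with `loss` and `grad_norm` keys; `load_metrics` and `compare_runs` are hypothetical names, and `math.isclose` stands in for whatever tolerance rule the real test applies.

```python
import json
import math
from pathlib import Path
from typing import Dict, List


def load_metrics(run_dir: Path) -> Dict[str, List[float]]:
    """Read log_dict.json from one run (assumed schema: metric name -> per-step values)."""
    return json.loads((run_dir / "log_dict.json").read_text())


def compare_runs(run_dirs: List[Path], rtol: float, atol: float) -> None:
    """Check every run against the first one, step by step, for loss and grad_norm."""
    baseline = load_metrics(run_dirs[0])
    for run_dir in run_dirs[1:]:
        current = load_metrics(run_dir)
        for key in ("loss", "grad_norm"):
            assert len(current[key]) == len(baseline[key]), (
                f"{key}: step count differs in {run_dir}"
            )
            for step, (got, want) in enumerate(zip(current[key], baseline[key])):
                # math.isclose allows a gap of max(rel_tol * max(|a|, |b|), abs_tol).
                assert math.isclose(got, want, rel_tol=rtol, abs_tol=atol), (
                    f"{key} differs at step {step} in {run_dir}: {got} vs {want}"
                )
```

Comparing against a single baseline run keeps the failure message pointed at the configuration that diverged.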
How to add a case
Add an entry to text_test_cases (for text-only models) with marks=_v5_only:
text_test_cases = [
    # ... existing v4 cases ...
    pytest.param(
        "<new_model>",
        "./tests/toy_config/<new_model>_toy/config.json",
        False,  # is_moe
        _DEFAULT_RTOL,
        _DEFAULT_ATOL,
        None,  # max_sp_size
        marks=_v5_only,
    ),
]
Parametrize fields
The text_test_cases parametrize string is:
"model_name, config_path, is_moe, rtol, atol, max_sp_size"
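| Field | Type | Description |
|---|---|---|
| model_name | str | Used for directory naming and log output |
| config_path | str | Path to toy config directory or config.json file |
| is_moe | bool | If True, the model is treated as an MoE model |
| rtol, atol | float | Tolerances for cross-config comparison |
| max_sp_size | int or None | Largest sp_size to run; 1 restricts the test to sp_size=1, None otherwise |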
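Because the six values are positional, it is easy to put one in the wrong slot. The sketch below gives the fields names so the order can be checked against the parametrize string. It is illustrative only: `TextCase` is a hypothetical type that the test file does not define, the annotated types are inferred from the example values, and the tolerance values are placeholders rather than the real `_DEFAULT_RTOL` and `_DEFAULT_ATOL`.

```python
from typing import NamedTuple, Optional

PARAM_NAMES = "model_name, config_path, is_moe, rtol, atol, max_sp_size"


class TextCase(NamedTuple):
    """Hypothetical named view of one text_test_cases entry, in parametrize order."""
    model_name: str
    config_path: str
    is_moe: bool
    rtol: float
    atol: float
    max_sp_size: Optional[int]


# Placeholder tolerances; the real defaults live in the test module.
case = TextCase(
    model_name="<new_model>",
    config_path="./tests/toy_config/<new_model>_toy/config.json",
    is_moe=False,
    rtol=1e-2,
    atol=1e-2,
    max_sp_size=None,
)

# The field order matches the parametrize string exactly.
assert ", ".join(TextCase._fields) == PARAM_NAMES
```

Since a NamedTuple is still a tuple, `*case` unpacks into the same positional arguments that `pytest.param` expects.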
Limiting sequence parallelism
If the model does not support SP yet, set max_sp_size=1 to only run with sp_size=1:
pytest.param(
    "qwen3_5",
    "./tests/toy_config/qwen3_5_toy/config.json",
    False,  # is_moe
    _DEFAULT_RTOL,
    _DEFAULT_ATOL,
    1,  # max_sp_size: remove once SP is supported
    marks=_v5_only,
),
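The effect of max_sp_size on the set of runs can be sketched as a simple filter over the candidate SP sizes. This assumes the field acts as an upper bound, which is inferred from its name and from the behavior described above; `select_sp_sizes` is a hypothetical helper, not a function from the test file.

```python
from typing import List, Optional, Sequence


def select_sp_sizes(candidates: Sequence[int], max_sp_size: Optional[int]) -> List[int]:
    """Keep the SP sizes that do not exceed max_sp_size; None keeps all candidates."""
    if max_sp_size is None:
        return list(candidates)
    return [size for size in candidates if size <= max_sp_size]
```

With the SP values 1 and 2 used by this suite, `max_sp_size=1` leaves only the sp_size=1 run and `None` leaves both.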
VLM / multimodal models
For vision-language or multimodal models, add to the appropriate test case list (qwen2vl_test_cases, qwen3vl_test_cases, etc.) and pair with the matching fixture and test function. The same max_sp_size field is available.
3. tests/models/test_vlm_trainer.py
What it tests
Builds a real toy VLM model and calls VLMTrainer._freeze_model_module() directly. The test only checks one behavior:
freeze_vit=False -> the vision tower parameters remain trainable
freeze_vit=True -> the vision tower parameters are frozen
This is intentionally simpler than an e2e training test. It is meant to catch model wrapper path changes such as model.visual vs model.model.visual.
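The wrapper path problem this test guards against can be shown without torch. The sketch below uses plain Python stand-ins: `Param` mimics only the `requires_grad` flag of a parameter, and `resolve_attr`, `find_vision_tower`, and `freeze_vision` are hypothetical helpers. It is not the implementation of `VLMTrainer._freeze_model_module()`.

```python
from types import SimpleNamespace
from typing import Optional, Sequence


class Param:
    """Minimal stand-in for a torch parameter; only requires_grad is modelled."""

    def __init__(self) -> None:
        self.requires_grad = True


def resolve_attr(root: object, dotted_path: str) -> Optional[object]:
    """Follow a dotted attribute path, returning None if any hop is missing."""
    obj: Optional[object] = root
    for name in dotted_path.split("."):
        obj = getattr(obj, name, None)
        if obj is None:
            return None
    return obj


def find_vision_tower(model: object, candidates: Sequence[str] = ("visual", "model.visual")):
    """Return the first candidate path that exists on the model."""
    for path in candidates:
        tower = resolve_attr(model, path)
        if tower is not None:
            return tower
    raise AttributeError(f"no vision tower found at any of {list(candidates)}")


def freeze_vision(model: object, freeze_vit: bool) -> None:
    """Freeze the vision tower's parameters when freeze_vit is True."""
    if freeze_vit:
        for param in find_vision_tower(model).params:
            param.requires_grad = False


# Two wrapper layouts: the tower directly on the model, or one level down.
flat = SimpleNamespace(visual=SimpleNamespace(params=[Param(), Param()]))
nested = SimpleNamespace(model=SimpleNamespace(visual=SimpleNamespace(params=[Param()])))
freeze_vision(flat, freeze_vit=True)
freeze_vision(nested, freeze_vit=False)
```

A freeze routine that hard-codes one path would silently skip the other layout, which is the regression a check on `requires_grad` catches.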
How to add a case
Add your toy config to the matching case list:
_FREEZE_VIT_VLM_CASES_TRANSFORMERS_V5 = [
    pytest.param("./tests/toy_config/qwen3_5_toy/config.json", id="qwen3_5"),
    pytest.param("./tests/toy_config/<new_vlm_model>_toy/config.json", id="<new_vlm_model>"),
]
For transformers v4 VLMs, add to _FREEZE_VIT_VLM_CASES_TRANSFORMERS_V4 instead.
Checklist
When adding a new v5 model, verify:
[ ] Toy config created under tests/toy_config/<model>_toy/
[ ] Entry added to _TEST_CASES_TRANSFORMERS_V5 in test_models_patch.py
[ ] Unsupported attention/MoE modes filtered in test_models_patch_fwd_bwd if needed
[ ] Entry added to text_test_cases (or VLM equivalent) in test_e2e_parallel.py with marks=_v5_only
[ ] For VLM models, toy config added to _FREEZE_VIT_VLM_CASES_TRANSFORMERS_V5 in tests/models/test_vlm_trainer.py
[ ] max_sp_size set appropriately (use 1 if SP not supported, None otherwise)
[ ] pytest --collect-only -k <model> shows expected test cases
[ ] Tests pass: pytest tests/models/test_models_patch.py -k <model> and pytest tests/e2e/test_e2e_parallel.py -k <model>
[ ] For VLM models, pytest tests/models/test_vlm_trainer.py -k <model> passes
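The commands from the checklist can be generated for a given model id, which helps avoid typos when running them by hand. This is a small convenience sketch; `checklist_commands` is a hypothetical helper and only assembles the command strings listed above without running them.

```python
from typing import List


def checklist_commands(model_id: str, is_vlm: bool = False) -> List[str]:
    """Return the pytest invocations from the checklist for one model id."""
    commands = [
        f"pytest --collect-only -k {model_id}",
        f"pytest tests/models/test_models_patch.py -k {model_id}",
        f"pytest tests/e2e/test_e2e_parallel.py -k {model_id}",
    ]
    if is_vlm:
        commands.append(f"pytest tests/models/test_vlm_trainer.py -k {model_id}")
    return commands


for command in checklist_commands("qwen3_5", is_vlm=True):
    print(command)
```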