Testing a New Model#

When adding a new model under veomni/models/transformers/<model>/, two test suites need updating:

  1. tests/models/test_models_patch.py — single-GPU forward/backward correctness across attention and MoE backends.

  2. tests/e2e/test_e2e_parallel.py — multi-GPU e2e training with FSDP2, sequence parallelism (SP), and expert parallelism (EP).

For VLM models, there is also a lightweight trainer-level smoke test for freeze_vit:

  1. tests/models/test_vlm_trainer.py — builds a real toy VLM model on CPU and checks that vision parameters stay trainable when freeze_vit=False and are frozen when freeze_vit=True.

For VLM / Omni models, an additional multi-GPU test covers the FSDP2 asymmetric-forward path:

  1. tests/distributed/test_dummy_forward.py — verifies rank-asymmetric multimodal batches (rank 0 has images/video/audio, others text-only) do not hang NCCL under FSDP2, exercising the model’s dummy_forward() hook.

1. tests/models/test_models_patch.py#

What it tests#

Runs one forward + backward step on dummy data for every combination of:

  • HF attention backends (eager, flash_attention_2, flash_attention_3)

  • VeOmni attention backends (veomni_flash_attention_2_with_sp, veomni_flash_attention_3_with_sp)

  • MoE backends (for MoE models: eager, fused)

Then asserts that loss and grad norm match across all combinations within (rtol, atol).

How to add a case#

Append a pytest.param(...) to the test-cases list:

pytest.param(
    "./tests/toy_config/<new_model>_toy/config.json",
    False,  # is_moe — set True for MoE models
    _DEFAULT_RTOL,
    _DEFAULT_ATOL,
    id="<new_model>",
),

The id= string is used as a key for:

  • Test node naming (pytest -k <id>)

  • Looking up custom weight sync functions in weight_sync_adapters.py (only needed if the model has non-standard state dict keys)

Filtering unsupported modes#

If the model doesn’t support certain attention backends yet, add a filter block in test_models_patch_fwd_bwd keyed on case_id:

if case_id == "<new_model>":
    hf_model_modes = [mode for mode in hf_model_modes if mode.attn_implementation != "flash_attention_3"]
    veomni_model_modes = [
        mode for mode in veomni_model_modes if mode.attn_implementation != "veomni_flash_attention_3_with_sp"
    ]

Toy config#

Create a minimal config under tests/toy_config/<new_model>_toy/config.json with few layers. Add a README.md under the same folder to indicate:

  1. Where the original config is from

  2. What changes are made from the original config

2. tests/e2e/test_e2e_parallel.py#

What it tests#

Launches full torchrun training runs (2 epochs, 2 steps) across parallel configurations (FSDP2 always enabled):

Parameter

Values

sp_size

1, 2

ep_size

1 (base models), 1×2 (MoE models)

Each run produces a log_dict.json. The test asserts that loss and grad norm match across all SP/EP configurations within (rtol, atol).

How to add a case#

Add an entry to text_test_cases (for text-only models):

pytest.param(
    "<new_model>",
    "./tests/toy_config/<new_model>_toy/config.json",
    False,  # is_moe
    _DEFAULT_RTOL,
    _DEFAULT_ATOL,
    None,  # max_sp_size
),

Parametrize fields#

The text_test_cases parametrize string is:

"model_name, config_path, is_moe, rtol, atol, max_sp_size"

Field

Type

Description

model_name

str

Used for directory naming and log output

config_path

str

Path to toy config directory or config.json

is_moe

bool

If True, also iterates over ep_size values

rtol, atol

float

Tolerances for cross-config comparison

max_sp_size

int | None

None = no limit (run sp=1,2). Set to 1 to skip sp=2 if SP is not yet supported

Limiting sequence parallelism#

If the model does not support SP yet, set max_sp_size=1 to only run with sp_size=1:

pytest.param(
    "qwen3_5",
    "./tests/toy_config/qwen3_5_toy/config.json",
    False,  # is_moe
    _DEFAULT_RTOL,
    _DEFAULT_ATOL,
    1,  # max_sp_size — remove once SP is supported
),

VLM / multimodal models#

For vision-language or multimodal models, add to the appropriate test case list (qwen2vl_test_cases, qwen3vl_test_cases, etc.) and pair with the matching fixture and test function. The same max_sp_size field is available.

3. tests/models/test_vlm_trainer.py#

What it tests#

Builds a real toy VLM model and calls VLMTrainer._freeze_model_module() directly. The test only checks one behavior:

  • freeze_vit=False -> the vision tower parameters remain trainable

  • freeze_vit=True -> the vision tower parameters are frozen

This is intentionally simpler than an e2e training test. It is meant to catch model wrapper path changes such as model.visual vs model.model.visual.

How to add a case#

Add your toy config to the freeze-ViT VLM cases list:

pytest.param("./tests/toy_config/<new_vlm_model>_toy/config.json", id="<new_vlm_model>"),

4. tests/distributed/test_dummy_forward.py (VLM / Omni only)#

What it tests#

Launches a 2-GPU torchrun job under FSDP2. Rank 0 receives a multimodal batch (images + video, and audio for Omni); other ranks receive text-only. Both ranks must complete forward + backward without an NCCL timeout, which proves that the model’s dummy_forward() hook (or equivalent) fires on text-only ranks so every rank participates in FSDP collectives.

Any model whose patchgen-generated modeling overrides <M>VisionTransformerPretrainedModel.dummy_forward (or similar) must add a case here — this suite is the only coverage for that override on multi-GPU.

How to add a case#

Append an entry to _vlm_cases (VLM) or _omni_cases (Omni):

pytest.param(
    "<new_vlm_model>",
    "./tests/toy_config/<new_vlm_model>_toy",
    partial(_vlm_batch, patch_size=<P>),
    id="<new_vlm_model>",
),

patch_size must match the model’s vision patch size (e.g. 14 for Qwen2.5-VL, 16 for Qwen3-VL). Run with pytest tests/distributed/test_dummy_forward.py -k <new_vlm_model> on a 2-GPU host.

Checklist#

When adding a new model, verify:

  • [ ] Toy config created under tests/toy_config/<model>_toy/

  • [ ] Entry added to the test-cases list in test_models_patch.py

  • [ ] Unsupported attention/MoE modes filtered in test_models_patch_fwd_bwd if needed

  • [ ] Entry added to text_test_cases (or VLM equivalent) in test_e2e_parallel.py

  • [ ] For VLM models, toy config added to the freeze-ViT VLM cases list in tests/models/test_vlm_trainer.py

  • [ ] max_sp_size set appropriately (use 1 if SP not supported, None otherwise)

  • [ ] pytest --collect-only -k <model> shows expected test cases

  • [ ] Tests pass: pytest tests/models/test_models_patch.py -k <model> and pytest tests/e2e/test_e2e_parallel.py -k <model>

  • [ ] For VLM models, pytest tests/models/test_vlm_trainer.py -k <model> passes

  • [ ] For VLM / Omni models, an entry added to _vlm_cases / _omni_cases in tests/distributed/test_dummy_forward.py (required on any model that overrides dummy_forward)

  • [ ] pytest tests/distributed/test_dummy_forward.py -k <model> passes on 2 GPUs