Get Started with Ascend NPU#
Key Updates#
2025/12/23: VeOmni supports training on Ascend NPU.
Installation#
Please refer to Installation with Ascend NPU.
Supported Models#
Environment Variables#
CPU_AFFINITY_CONF#
export CPU_AFFINITY_CONF=1
Enable coarse-grained or fine-grained CPU core binding. This configuration helps prevent thread contention, improves cache hit rates, avoids memory access across different NUMA (Non-Uniform Memory Access) nodes, and reduces task scheduling overhead—collectively optimizing task execution efficiency. Parameter Settings:
0: Disable the binding function. Default is0.1: Enable coarse-grained kernel binding.2: Enable fine-grained kernel binding.
PYTORCH_NPU_ALLOC_CONF#
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
expandable_segments:<value>: Enable the memory pool extension segment feature.
True: This configuration instructs the cache allocator to create specific memory blocks with the capability to be extended later. This allows for more efficient handling of scenarios where the required memory size frequently changes during runtime.False: The memory pool extension segment feature is disabled, and the original memory allocation method is used. Default isFalse.
MULTI_STREAM_MEMORY_REUSE#
export MULTI_STREAM_MEMORY_REUSE=2
Refer to: https://github.com/ByteDance-Seed/VeOmni/issues/575
Declarations#
The Ascend support code, Dockerfile and image provided in the documentation are for reference only. If you intend to use them in a production environment, please contact the official channels. Thank you.