Installation with Ascend NPU (x86)#

Required Environment#

CANN == 8.3.RC1

Prepare CANN#

Choose one of the following methods to use CANN:

Install CANN according to the official documentation
Download and use the CANN image

Install with uv or pip#

UV#

Recommend to use uv for faster and easier installation.

git clone https://github.com/ByteDance-Seed/VeOmni.git
cd VeOmni

# use the locked uv env
uv sync --locked  --extra npu
source .venv/bin/activate

You can use --extra to install other optional dependencies. Refer to pyproject.toml for more details.

# eg. install with video/audio processing dependencies (torchcodec, PyAV, librosa, soundfile)
# Note: `video` and `audio` extras are equivalent - both include video and audio processing
uv sync --locked  --extra npu --extra video
# or equivalently:
uv sync --locked  --extra npu --extra audio

Note: For video/audio processing with the video or audio extra, you also need to install ffmpeg separately:
# Ubuntu/Debian/openEuler
sudo apt-get install ffmpeg
# or
sudo yum install ffmpeg

Pip#

git clone https://github.com/ByteDance-Seed/VeOmni.git
cd VeOmni

pip install -e .[npu]
pip install transformers==5.2.0
pip install datasets==2.21.0

Set up CANN environment before installing torchcodec#

Make sure CANN_path is set to your CANN installation directory, e.g., export CANN_path=/usr/local/Ascend

source $CANN_path/ascend-toolkit/set_env.sh

To enable the NPU chunked cross-entropy loss, set model.ops_implementation.cross_entropy_loss_implementation: npu in your training YAML (replaces the legacy VEOMNI_ENABLE_CHUNK_LOSS environment variable).

Note: The NPU chunked cross-entropy backs both ForCausalLM and ForConditionalGeneration (VLMs) — chunk_loss now does the SP reduction itself, so VLMs with Ulysses SP enabled get the correct loss. Only ForSequenceClassification stays on the eager wrapper: chunk_loss hard-codes the causal labels[..., 1:] shift, which is incompatible with the token-level (no-shift) labels that ForSequenceClassificationLoss expects. A warning_rank0 is logged at install time; expect eager-level numbers for sequence-classification losses during profiling.

Video/Audio Processing Dependencies (Optional)#

For video/audio processing capabilities, you need to install torchcodec separately. Follow these steps:

# Clone the torchcodec repository
cd ..
git clone https://github.com/meta-pytorch/torchcodec.git
cd torchcodec

# Checkout to a specific version for compatibility
git checkout v0.5.0

# Copy the installation script to the torchcodec source directory
cp ../VeOmni/docs/get_started/installation/install_torchcodec_Ascend.sh .

# Note: Ensure Python is installed as a shared library (required for compiling C++ extensions)
# The installation script will automatically verify this requirement

# Run the installation script (replace with your actual CANN path)
bash install_torchcodec_Ascend.sh $CANN_path/ascend-toolkit/set_env.sh

# Verify installation
pip show torchcodec

# Test torchcodec import
python -c "from torchcodec.decoders import VideoDecoder; print('Success')"
# If the terminal outputs'Success', it indicates that the torchcodec installation was successful. If an error message is output, it indicates that the installation was not successful

Installation with Ascend NPU (x86)

Contents