Installation with Ascend NPU (x86)#

Required Environment#

CANN == 8.3.RC1
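If CANN is already installed, the version requirement above can be checked by reading the toolkit's install-info file. The sketch below is a minimal helper, not part of VeOmni: the function name `cann_version` is hypothetical, and both the example path and the `version=` line format are assumptions about a typical toolkit layout that may differ on your system.

```shell
# Print the CANN toolkit version recorded in an install-info file.
# Assumption: the file contains a line of the form "version=<value>".
cann_version() {
    info_file="$1"
    if [ ! -f "$info_file" ]; then
        echo "not found: $info_file" >&2
        return 1
    fi
    # Keep only the text after "version=" on the first matching line.
    sed -n 's/^version=//p' "$info_file" | head -n 1
}

# Example (the path is an assumption; adjust it to your installation):
# cann_version /usr/local/Ascend/ascend-toolkit/latest/x86_64-linux/ascend_toolkit_install.info
```

Compare the printed value against the required 8.3.RC1 before continuing.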

Prepare CANN#

Choose one of the following methods to use CANN:

  1. Install CANN according to the official documentation

  2. Download and use the CANN image

Install with uv or pip#

UV#

We recommend uv for faster and easier installation.

git clone https://github.com/ByteDance-Seed/VeOmni.git
cd VeOmni

# use the locked uv env
uv sync --locked --extra npu
source .venv/bin/activate

You can use --extra to install other optional dependencies. Refer to pyproject.toml for more details.

# e.g., install the video/audio processing dependencies (torchcodec, PyAV, librosa, soundfile)
# Note: the `video` and `audio` extras are equivalent; both include video and audio processing
uv sync --locked --extra npu --extra video
# or equivalently:
uv sync --locked --extra npu --extra audio

Note: The video and audio extras also require ffmpeg, which must be installed separately:

# Ubuntu/Debian
sudo apt-get install ffmpeg
# openEuler (or another yum-based distribution)
sudo yum install ffmpeg
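After installing ffmpeg, it can help to confirm that the binary is actually on PATH before running any video or audio pipeline. The helper below is a small sketch; `require_cmds` is a hypothetical name, not something VeOmni provides.

```shell
# Return 0 if every named command is on PATH; otherwise report the missing ones.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: print the ffmpeg version line only if ffmpeg is available.
# require_cmds ffmpeg && ffmpeg -version | head -n 1
```

The same helper can be reused for other prerequisites, for example `require_cmds git uv`.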

Pip#

git clone https://github.com/ByteDance-Seed/VeOmni.git
cd VeOmni

# Quote the extras so that shells such as zsh do not expand the brackets
pip install -e ".[npu]"
pip install transformers==5.2.0
pip install datasets==2.21.0

Set up CANN environment before installing torchcodec#

Make sure CANN_path points to your CANN installation directory (for example, export CANN_path=/usr/local/Ascend), then source the environment script:

source $CANN_path/ascend-toolkit/set_env.sh
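If CANN_path is unset or wrong, the source command fails with a fairly terse error. A guarded variant, sketched below, reports the exact path it looked for. `load_cann_env` is a hypothetical helper name, and the /usr/local/Ascend fallback in the example is only the directory used in the example above.

```shell
# Source the CANN environment script under a given root, failing loudly if it is absent.
load_cann_env() {
    env_script="$1/ascend-toolkit/set_env.sh"
    if [ ! -f "$env_script" ]; then
        echo "set_env.sh not found at $env_script" >&2
        return 1
    fi
    # Source into the current shell so the exported variables persist.
    . "$env_script"
}

# Example: load_cann_env "${CANN_path:-/usr/local/Ascend}"
```

The function has to be defined and called in the current shell, not in a subshell or a separate script, or the environment changes are lost.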

To enable the NPU chunked cross-entropy loss, set model.ops_implementation.cross_entropy_loss_implementation: npu in your training YAML. This setting replaces the legacy VEOMNI_ENABLE_CHUNK_LOSS environment variable.
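Written out as nested YAML, the dotted key above would look roughly like the fragment below. This is only a sketch of the nesting implied by the key path; the other keys that a real training config carries under `model` are omitted, and their layout is not shown here.

```yaml
# Fragment of a training YAML; only the keys named in the dotted path are shown.
model:
  ops_implementation:
    cross_entropy_loss_implementation: npu
```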

Note: The NPU chunked cross-entropy backs both ForCausalLM and ForConditionalGeneration (VLMs). chunk_loss now performs the SP reduction itself, so VLMs with Ulysses SP enabled get the correct loss. Only ForSequenceClassification stays on the eager wrapper, because chunk_loss hard-codes the causal labels[..., 1:] shift, which is incompatible with the token-level (no-shift) labels that ForSequenceClassificationLoss expects. A warning_rank0 is logged at install time; expect eager-level numbers for sequence-classification losses during profiling.

Video/Audio Processing Dependencies (Optional)#

For video/audio processing capabilities, you need to install torchcodec separately. Follow these steps:

# Clone the torchcodec repository
cd ..
git clone https://github.com/meta-pytorch/torchcodec.git
cd torchcodec

# Check out a specific version for compatibility
git checkout v0.5.0

# Copy the installation script to the torchcodec source directory
cp ../VeOmni/docs/get_started/installation/install_torchcodec_Ascend.sh .

# Note: Ensure Python is installed as a shared library (required for compiling C++ extensions)
# The installation script will automatically verify this requirement

# Run the installation script (pass the set_env.sh under your actual CANN path)
bash install_torchcodec_Ascend.sh $CANN_path/ascend-toolkit/set_env.sh

# Verify installation
pip show torchcodec

# Test torchcodec import
python -c "from torchcodec.decoders import VideoDecoder; print('Success')"
# If the terminal prints 'Success', torchcodec was installed successfully; an error message means the installation failed.
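The installation script verifies the shared-library requirement on its own, but the same property can be inspected by hand when debugging a failed build. The sketch below queries Python's standard `sysconfig` module. `python_shared_flag` is a hypothetical helper name, and this check is an independent illustration, not necessarily the test the script itself performs.

```shell
# Print 1 if the given Python interpreter was built with a shared libpython, else 0.
# The interpreter defaults to "python" when no argument is given.
python_shared_flag() {
    "${1:-python}" -c 'import sysconfig; print(int(bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))))'
}

# Example: python_shared_flag python
```

A result of 0 suggests the interpreter was built without `--enable-shared`, in which case compiling the torchcodec C++ extension is likely to fail.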