Ascend A2 Docker Image Build and Usage Guide#
Overview#
This guide provides step-by-step instructions for building and using the Ascend A2 Docker image for the VeOmni framework. The image is based on Huawei’s Ascend CANN platform and includes the dependencies needed to run multi-modal models on Ascend A2 accelerators.
Note: Ascend A2 supports both x86 and ARM64 architectures, with different environment management approaches for each:
x86 Architecture: Uses uv for dependency management; requires virtual environment activation
ARM64 Architecture: Uses pip for dependency management; no virtual environment required
Prerequisites#
Docker installed on your system
Access to Ascend A2 hardware accelerators
Network access to pull the base image and install dependencies
Proxy configuration (if required in your environment)
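The prerequisites above can be checked before starting. The following is a minimal sketch, not an official tool: the function name `missing_prereqs` is hypothetical, and the device node names are taken from the `docker run` command shown later in this guide.

```shell
# Preflight sketch: print one line per prerequisite that appears to be missing.
# Prints nothing when Docker and the management device nodes are all found.
missing_prereqs() {
    local dev_root="${1:-/dev}"   # overridable so the check can be dry-run
    local dev
    command -v docker >/dev/null 2>&1 || echo "docker: not found in PATH"
    for dev in davinci_manager devmm_svm hisi_hdc; do
        [ -e "$dev_root/$dev" ] || echo "$dev_root/$dev: device node not found"
    done
}

missing_prereqs
```

An empty result only means the files exist; it does not confirm that the driver is working or that your user has permission to open the devices.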
Step 1: Pull the Base Image#
First, pull the Huawei Ascend CANN base image. This image supports both x86 and ARM64 architectures.
You can find the latest official Ascend CANN images at: Ascend Hub
# for arm
docker pull --platform=arm64 swr.cn-south-1.myhuaweicloud.com/ascendhub/cann:9.0.0-910b-ubuntu22.04-py3.11
# for x86
docker pull --platform=amd64 swr.cn-south-1.myhuaweicloud.com/ascendhub/cann:9.0.0-910b-ubuntu22.04-py3.11
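If you are unsure which of the two commands applies, the host architecture can be mapped to the `--platform` value. This is a sketch under the assumption that `uname -m` reports `x86_64` or `aarch64` on your host; `platform_for_arch` is a hypothetical helper name.

```shell
# Map a `uname -m` value to the --platform value used in the pull commands.
platform_for_arch() {
    case "$1" in
        x86_64|amd64)  echo amd64 ;;
        aarch64|arm64) echo arm64 ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

IMAGE="swr.cn-south-1.myhuaweicloud.com/ascendhub/cann:9.0.0-910b-ubuntu22.04-py3.11"
if PLATFORM="$(platform_for_arch "$(uname -m)")"; then
    # Printed rather than executed so the command can be reviewed first.
    echo "docker pull --platform=$PLATFORM $IMAGE"
fi
```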
Step 2: Build the Custom Image#
Build the VeOmni Ascend A2 image using the appropriate Dockerfile for your architecture.
Note: Proxy settings are optional and only needed if your server requires proxy access to the internet. Remove the proxy arguments if not needed.
For x86 Architecture#
# Optional proxy settings (remove if not needed)
docker build \
--build-arg http_proxy=http://<user>:<pass>@<host>:<port> \
--build-arg https_proxy=http://<user>:<pass>@<host>:<port> \
--build-arg no_proxy=localhost,127.0.0.1 \
-t ascend-a2-env:v1 \
-f docker/ascend/Dockerfile.ascend_9.0.0_a2.x86 \
.
For ARM64 Architecture#
# Optional proxy settings (remove if not needed)
docker build \
--build-arg http_proxy=http://<user>:<pass>@<host>:<port> \
--build-arg https_proxy=http://<user>:<pass>@<host>:<port> \
--build-arg no_proxy=localhost,127.0.0.1 \
-t ascend-a2-env:v1 \
-f docker/ascend/Dockerfile.ascend_9.0.0_a2.arm \
.
Without proxy (simplified)#
For x86:
docker build \
-t ascend-a2-env:v1 \
-f docker/ascend/Dockerfile.ascend_9.0.0_a2.x86 \
.
For ARM64:
docker build \
-t ascend-a2-env:v1 \
-f docker/ascend/Dockerfile.ascend_9.0.0_a2.arm \
.
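The four build variants above differ only in the Dockerfile and in whether proxy arguments are passed. As an illustration, the sketch below selects the Dockerfile from the host architecture and adds `--build-arg` flags only for proxy variables that are exported in the current shell. The helper names `dockerfile_for_arch` and `proxy_build_args` are hypothetical, and the final command is echoed rather than run.

```shell
# Pick the Dockerfile that matches a `uname -m` value.
dockerfile_for_arch() {
    case "$1" in
        x86_64|amd64)  echo "docker/ascend/Dockerfile.ascend_9.0.0_a2.x86" ;;
        aarch64|arm64) echo "docker/ascend/Dockerfile.ascend_9.0.0_a2.arm" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Print one "--build-arg NAME=value" pair per proxy variable that is exported
# and non-empty; print nothing when no proxy is configured.
proxy_build_args() {
    local name value
    for name in http_proxy https_proxy no_proxy; do
        value="$(printenv "$name" || true)"
        if [ -n "$value" ]; then
            printf -- '--build-arg %s=%s\n' "$name" "$value"
        fi
    done
}

if DOCKERFILE="$(dockerfile_for_arch "$(uname -m)")"; then
    # Echoed for review instead of executed. The unquoted substitution relies
    # on word splitting, so it assumes proxy values that contain no spaces.
    echo docker build $(proxy_build_args) -t ascend-a2-env:v1 -f "$DOCKERFILE" .
fi
```

Run it from the repository root so that the relative `docker/ascend/` path and the `.` build context resolve as in the commands above.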
Image Components#
The built image includes:
Ubuntu 22.04 with Python 3.11
Ascend CANN 9.0.0 runtime
VeOmni framework with NPU support
TorchCodec for efficient video processing
All necessary development tools and dependencies
Step 3: Run the Container#
Basic Container Start#
Start the container with Ascend device access. The example below uses a wildcard to include all Ascend cards, but you can also specify individual devices if needed:
docker run --runtime=runc -it \
--ulimit nproc=65535 \
--ulimit nofile=65535 \
--device=/dev/davinci* \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64:ro \
-v /usr/local/Ascend/driver/tools:/usr/local/Ascend/driver/tools:ro \
-v /usr/local/Ascend/add-ons:/usr/local/Ascend/add-ons:ro \
--name ascend-a2-container \
ascend-a2-env:v1 \
/bin/bash
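How the `*` in `--device=/dev/davinci*` is handled may depend on your shell and Docker version, which this guide does not spell out. As a hedged alternative, the sketch below lists the numbered device nodes explicitly and prints one `--device` flag for each. The function name `davinci_device_flags` is hypothetical and not part of Docker or CANN.

```shell
# Print one --device flag per device node whose name starts with "davinci"
# followed by a digit. davinci_manager does not match this pattern and must
# still be passed separately, as in the command above.
davinci_device_flags() {
    local dev_root="${1:-/dev}"
    local node
    for node in "$dev_root"/davinci[0-9]*; do
        [ -e "$node" ] || continue   # the glob stays literal when nothing matches
        printf -- '--device=%s\n' "$node"
    done
}

davinci_device_flags
```

The printed flags can be pasted into the `docker run` command in place of the wildcard line, or spliced in with `$(davinci_device_flags)`.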
Advanced Configuration Options#
You can enhance the basic command with the following optional configurations:
Add more Ascend devices by either listing them individually or using a wildcard to include all cards matching the naming pattern:
# Option 1: List individual devices
--device=/dev/davinci1 \
--device=/dev/davinci2 \
# Option 2: Use wildcard to include all davinci devices
--device=/dev/davinci* \
Increase shared memory (recommended for larger models):
--shm-size=64G \
Add proxy environment variables (if needed):
-e http_proxy="http://<user>:<pass>@<host>:<port>" \
-e https_proxy="http://<user>:<pass>@<host>:<port>" \
-e no_proxy="localhost,127.0.0.1,.huawei.com" \
Mount checkpoints (example):
-v /path/to/your/checkpoints:/app/ckpt/:ro \
Mount datasets (example):
-v /path/to/your/dataset.json:/app/dataset/dataset.json:ro \
-v /path/to/your/images:/app/dataset/images:ro \
Example: Complete Advanced Command#
Here’s an example combining all these options:
docker run --runtime=runc -it \
--ulimit nproc=65535 \
--ulimit nofile=65535 \
--device=/dev/davinci* \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
--shm-size=64G \
-v /usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64:ro \
-v /usr/local/Ascend/driver/tools:/usr/local/Ascend/driver/tools:ro \
-v /usr/local/Ascend/add-ons:/usr/local/Ascend/add-ons:ro \
-v /path/to/your/checkpoints:/app/ckpt/:ro \
-v /path/to/your/dataset:/app/dataset/:ro \
--name ascend-a2-container \
ascend-a2-env:v1 \
/bin/bash
Step 4: Run Training Inside the Container#
Environment Activation (x86 Architecture Only)#
For x86 architecture, you need to activate the virtual environment created by uv before running training commands:
# Activate the virtual environment (x86 only)
source /app/.venv/bin/activate
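Because only the x86 image has a virtual environment, a script shared between both architectures can activate it conditionally. This is a sketch that assumes the `/app/.venv` path shown above; `activate_if_present` is a hypothetical helper name.

```shell
# Activate the virtual environment if its activate script exists;
# otherwise leave the shell unchanged and say so.
activate_if_present() {
    local venv="${1:-/app/.venv}"
    if [ -f "$venv/bin/activate" ]; then
        . "$venv/bin/activate"
        echo "activated $venv"
    else
        echo "no virtual environment at $venv; using system Python"
    fi
}

activate_if_present
```

The function must run in the current shell, not in a subshell, for the activation to persist.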
Training Command Example#
After starting the container with appropriate mounts (and activating the environment for x86), you can run training commands. Here’s an example for Qwen3-VL training using generic paths:
bash train.sh tasks/train_vlm.py configs/multimodal/qwen3_vl/qwen3_vl_dense.yaml \
--model.model_path /app/ckpt/your-model-checkpoint \
--data.train_path /app/dataset/your-dataset.json \
--data.datasets_type iterable \
--data.source_name sharegpt4v_sft \
--data.max_seq_len 1024 \
--train.global_batch_size 8
Note: Replace /app/ckpt/your-model-checkpoint and /app/dataset/your-dataset.json with the actual paths you used in your mount configuration.
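A wrong mount usually shows up only after the training script has started. As a small safeguard, the paths can be checked first; the sketch below uses the placeholder paths from the example, and `require_paths` is a hypothetical helper name.

```shell
# Return non-zero and name the first path among the arguments that is missing.
require_paths() {
    local p
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            return 1
        fi
    done
}

MODEL_PATH="/app/ckpt/your-model-checkpoint"   # placeholder from the example above
TRAIN_PATH="/app/dataset/your-dataset.json"    # placeholder from the example above

if require_paths "$MODEL_PATH" "$TRAIN_PATH"; then
    echo "paths found; ready to launch train.sh"
else
    echo "fix the mounts or paths before launching training"
fi
```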
Step 5: Stop and Remove the Container#
When you’re done, stop and remove the container:
docker stop ascend-a2-container && docker rm ascend-a2-container
Important Notes#
Architecture Differences#
| Feature | x86 Architecture | ARM64 Architecture |
|---|---|---|
| Dependency Manager | uv | pip |
| Virtual Environment | Required | Not Required |
| Activation Command | source /app/.venv/bin/activate | N/A |
| Installation Command | | |
Device Access#
The container needs access to the Ascend NPU devices it will use (/dev/davinci*) together with the management device nodes (/dev/davinci_manager, /dev/devmm_svm, /dev/hisi_hdc). The --device flags in the run command grant this access.
Mounts#
Driver directories: Required for Ascend runtime functionality
Checkpoints: Mount pre-trained models to /app/ckpt/
Datasets: Mount training data to appropriate locations
Shared memory: Increase --shm-size for larger models or datasets
Proxy Settings#
Update the proxy settings in both the build and run commands to match your environment. Remove the proxy arguments if not needed.
Dockerfile Details#
x86 Dockerfile (Dockerfile.ascend_9.0.0_a2.x86)#
Sets up the Ubuntu 22.04 base with Ascend CANN
Configures system dependencies and development tools
Installs and configures uv for dependency management
Uses uv to install VeOmni framework with NPU support
Sets up the virtual environment
ARM64 Dockerfile (Dockerfile.ascend_9.0.0_a2.arm)#
Sets up the Ubuntu 22.04 base with Ascend CANN
Configures system dependencies and development tools
Uses pip to install VeOmni framework with NPU support
Clones and builds TorchCodec for video processing
Sets up the working environment
Troubleshooting#
Device Access Issues#
Ensure you have the correct permissions to access Ascend devices
Verify the device paths exist on your host system
Check that the Ascend driver is properly installed on the host
Proxy Problems#
Verify proxy credentials and addresses are correct
Ensure the proxy allows access to required domains
Try removing proxy settings if running in an internal network
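When checking the points above, it helps to see which proxy variables the shell actually exports without exposing credentials. The sketch below masks the `user:pass` part of each URL before printing; `mask_proxy_credentials` is a hypothetical helper name.

```shell
# Replace "user:pass@" after the scheme with "***@" so the URL can be shared
# in logs or bug reports. URLs without credentials pass through unchanged.
mask_proxy_credentials() {
    printf '%s\n' "$1" | sed -E 's#(://)[^/@]*@#\1***@#'
}

# Show the proxy variables exported in the current shell, credentials masked.
for name in http_proxy https_proxy no_proxy; do
    value="$(printenv "$name" || true)"
    if [ -n "$value" ]; then
        echo "$name=$(mask_proxy_credentials "$value")"
    else
        echo "$name is not set"
    fi
done
```

Run it both on the host and inside the container to see whether the `-e` flags passed the values through as intended.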
Build Failures#
Check network connectivity for pulling dependencies
Ensure sufficient disk space is available
Review the full build log for specific error messages
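The disk space check can be scripted. The sketch below reads the available space with `df` and compares it to a threshold. It assumes Docker's default data root `/var/lib/docker`. The 20 GiB threshold is a hypothetical value chosen for illustration, not a documented requirement of this image.

```shell
# Print the free space, in KiB, of the filesystem that holds the given path.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Assumed default Docker data root; fall back to the current directory
# if it does not exist on this machine.
TARGET="/var/lib/docker"
[ -d "$TARGET" ] || TARGET="."
MIN_FREE_KIB=$((20 * 1024 * 1024))   # hypothetical 20 GiB threshold

FREE="$(free_kib "$TARGET")"
case "$FREE" in
    ''|*[!0-9]*)
        echo "could not determine free space on $TARGET" ;;
    *)
        if [ "$FREE" -lt "$MIN_FREE_KIB" ]; then
            echo "low disk space on $TARGET: ${FREE} KiB free"
        else
            echo "disk space on $TARGET looks sufficient: ${FREE} KiB free"
        fi ;;
esac
```

To keep the full log for review, the build output can be saved by appending `2>&1 | tee build.log` to the `docker build` command.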
Environment Activation Issues (x86 Only)#
If you encounter “command not found” errors, ensure you’ve activated the virtual environment with source /app/.venv/bin/activate
Verify the virtual environment directory exists at /app/.venv/
Support#
For additional help, please refer to:
VeOmni documentation
Ascend CANN documentation
Docker documentation for container management