The Azure Big Compute team has announced the release of version 1.0.0 of Batch Shipyard, which enables simple deployment of Dockerized workloads to Azure Batch compute pools and allows users to run parallel jobs in the cloud. The solution is optimized for parametric sweeps, Deep Learning with NVIDIA GPUs, and simulations using MPI and InfiniBand, and can be used to run containerized jobs on thousands of machines.
Batch Shipyard combines features of Azure Batch (i.e. the ability to handle large-scale and complex VM deployment and management, high-throughput and highly available job scheduling, auto-scaling, and pay-per-use billing) with those of Docker containers. This allows for deployment consistency and isolation of batch-style and HPC containerized workloads at any scale, without requiring development directly against the Azure Batch SDK.
The initial version of Batch Shipyard features the following:
- Automated Docker Host Engine installation tuned for Azure Batch compute nodes
- Automated deployment of required Docker images to compute nodes
- Accelerated Docker image deployment at scale to compute pools consisting of a large number of VMs via private peer-to-peer distribution of Docker images among the compute nodes
- Automated Docker Private Registry instance creation on compute nodes, with Docker images backed by Azure Storage if specified
- Automatic shared data volume support, including Azure File Docker Volume Driver installation and GlusterFS distributed network file system installation
- Seamless integration with Azure Batch job, task and file concepts along with full pass-through of the Azure Batch API to containers executed on compute nodes
- Support for Azure Batch task dependencies allowing complex processing pipelines and graphs with Docker containers
- Transparent support for GPU accelerated Docker applications on Azure N-Series VM instances
- Support for multi-instance tasks to accommodate Dockerized MPI and multi-node cluster applications on compute pools with automatic job cleanup
- Transparent support for running Docker containers utilizing InfiniBand/RDMA for MPI on low-latency HPC Azure VM instances
- Automatic setup of SSH tunneling to Docker Hosts on compute nodes if specified
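To illustrate the configuration-driven model behind the job and task integration above: rather than writing code against the Azure Batch SDK, users describe jobs in JSON configuration files. A minimal jobs configuration might look roughly like the following sketch; the field names reflect the project's documented jobs schema, but the identifiers and command shown here are illustrative and should be verified against the Batch Shipyard documentation on GitHub.

```json
{
  "job_specifications": [
    {
      "id": "dockerjob",
      "tasks": [
        {
          "image": "busybox",
          "remove_container_after_exit": true,
          "command": "echo hello world"
        }
      ]
    }
  ]
}
```

Submitting this file through the Batch Shipyard command-line tool schedules the task on the compute pool, where it runs inside the specified Docker container image.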
Azure has also released a directory of recipes on GitHub that enable Deep Learning, Computational Fluid Dynamics, Molecular Dynamics, and Video Processing on Batch Shipyard, along with sample batch-style Docker workloads.