Guidelines and Best Practices for Multi-stream Virtual Machine Backup


Article ID: 100060290


Description

 

This guide provides guidelines for effective use of the multi-stream Virtual Machine backup feature and discusses the best practices to derive optimum performance. Infrastructure recommendations, configuration best practices and tools and techniques for troubleshooting bottlenecks are discussed. 

 

Key Factors that Impact Performance

Below is a generic representation of the backup infrastructure: Backup Exec components, the various types of storage, the network elements, and the connectivity between them. It can help in understanding the factors that affect data flow efficiency and the common causes of bottlenecks described below.



 

  • Available I/O Capacity: Performance is affected by a bottleneck anywhere in the end-to-end Read-Transfer-Write chain, in particular:
      • IOPS of the source and target storage.
      • Available network capacity between the source and target storage.
  • A significant difference in I/O capacity between the source and target storage, typically caused by the combination of storage types used, i.e. SSD, HDD, and NAS.
  • The number of concurrent streams configured and the number of backup jobs running in parallel on the same target storage.
  • Datastore locations of the source disks, i.e. whether the disks of a virtual machine are configured on the same datastore or spread across different datastores.
  • Size of the data being backed up across streams. There could be minimal or no gain when the amount of data to back up across streams is small.

 

Resource Recommendations

As network structures and infrastructure configurations are diverse, the following can be considered baseline recommendations for multi-stream VM backup.

  • A target storage (B2D or Dedupe) configured on a dedicated physical disk. 
  • Storage with a write throughput of at least 200 MB/s for a 2-stream backup. A throughput of 125 MB/s could be considered for each additional stream when there are no bottlenecks elsewhere in the backup chain, i.e. in the source storage and the network.
  • A target storage with lower throughput could still be considered if the cumulative transfer rate of data from all source storage matches or is below the transfer rate of the target storage.
  • In the case of network attached storage (Fiber Channel, iSCSI), a dedicated low latency network with adequate bandwidth to meet the data transfer requirement needs to be provisioned. 
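The throughput figures above can be turned into a quick sizing check. The following is a minimal sketch, not a Backup Exec tool: the function names are hypothetical, and it simply encodes the baseline of 200 MB/s for 2 streams plus 125 MB/s per additional stream, together with the exception for sources whose cumulative transfer rate does not exceed the target rate.

```python
def required_target_throughput_mbps(streams: int,
                                    base_two_stream: float = 200.0,
                                    per_extra_stream: float = 125.0) -> float:
    """Baseline write throughput (MB/s) suggested for a given stream count."""
    if streams < 2:
        raise ValueError("the baseline figures apply to 2 or more streams")
    return base_two_stream + per_extra_stream * (streams - 2)


def target_is_adequate(target_write_mbps: float,
                       source_rates_mbps: list[float],
                       streams: int) -> bool:
    """True if the target meets the baseline, or if the cumulative source
    transfer rate matches or is below the target transfer rate."""
    meets_baseline = target_write_mbps >= required_target_throughput_mbps(streams)
    sources_fit = sum(source_rates_mbps) <= target_write_mbps
    return meets_baseline or sources_fit


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, "streams:", required_target_throughput_mbps(n), "MB/s")
```

For instance, a 150 MB/s target is below the 2-stream baseline, but the check still passes when the sources together deliver no more than 150 MB/s.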

 

Identifying I/O Capacity of Backup Chain

The performance of multi-stream VM backup depends largely on the available I/O capacity of the disk storage and the network, both of which are specific to the target environment. It is therefore recommended to run sample backups with different stream counts to determine the configuration that gives the best backup performance.

The cumulative throughput rate for all data-streams in the Job Log could be compared to the throughput of a single-stream backup to identify gain or loss in performance. This also helps in identifying the available end-to-end I/O capacity and planning a suitable concurrency model for the environment. In addition, as described in the sections below, performance statistics in the Job Log can help identify the source of bottleneck in the backup chain. 
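One way to compare sample runs is sketched below. It assumes the cumulative throughput for each tested stream count has been copied manually from the Job Log into a dictionary; the function name and the 10% minimum-gain threshold are illustrative assumptions, not product settings. The sample figures are the throughput values reported in Examples 1 to 3 of this guide.

```python
def pick_stream_count(runs: dict[int, float], min_gain: float = 0.10) -> int:
    """Return the highest tested stream count that still improved cumulative
    throughput by at least `min_gain` (relative) over the previous tested count.

    runs maps stream count -> cumulative throughput in MB/s.
    """
    counts = sorted(runs)
    best = counts[0]
    for prev, cur in zip(counts, counts[1:]):
        gain = (runs[cur] - runs[prev]) / runs[prev]
        if gain < min_gain:
            break  # adding streams no longer pays off
        best = cur
    return best


if __name__ == "__main__":
    example_1 = {1: 326, 2: 581, 3: 700}
    example_2 = {1: 132, 3: 187, 6: 195}
    example_3 = {1: 496, 2: 506}
    for name, runs in (("Example 1", example_1),
                       ("Example 2", example_2),
                       ("Example 3", example_3)):
        print(name, "-> suggested streams:", pick_stream_count(runs))
```

A stricter or looser threshold may suit a given environment; the point is only to make the gain-or-loss comparison against the single-stream run explicit.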

 

Popular tools such as iPerf (network performance) and fio or diskspd (storage performance) can also be used to measure the I/O capacity.

 

Backup Concurrency Configuration

The minimum and maximum number of streams configured for the job determine the number of disks that are backed up simultaneously. In addition, the number of 'Write Sessions' configured for the target storage controls the number of concurrent streams that get created across jobs. Note that each multi-stream job uses one Write Session for a control stream that backs up the metadata of the VM, and this needs to be considered while configuring device concurrency. It is essential to consider the available I/O capacity of the backup infrastructure and use these parameters to limit the number of streams, so that the workload is balanced and performance bottlenecks are avoided.
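The Write Session accounting can be illustrated with a small calculation. This sketch assumes that every data stream occupies one Write Session (an assumption made for illustration) in addition to the one control-stream session per multi-stream job described above; the function names are hypothetical.

```python
def write_sessions_needed(max_streams_per_job: list[int]) -> int:
    """Write Sessions needed for jobs running in parallel on one target:
    one per data stream (assumed) plus one control stream per job."""
    return sum(streams + 1 for streams in max_streams_per_job)


def data_streams_available(write_sessions: int, parallel_jobs: int) -> int:
    """Sessions left for data streams after reserving one control
    stream per parallel multi-stream job."""
    return max(0, write_sessions - parallel_jobs)


if __name__ == "__main__":
    # Two parallel jobs configured with a maximum of 3 and 2 streams.
    print(write_sessions_needed([3, 2]))
    # A device limited to 8 Write Sessions shared by those two jobs.
    print(data_streams_available(8, 2))
```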

 

A simple approach to arriving at the stream count that provides maximum performance without added overhead is to measure the I/O capacity of the backup chain and match the concurrency to it, i.e. the available capacity of the target and the network should at least match the cumulative throughput generated by all streams at the source. Below are a few example configurations, with performance validation results, that give more insight into the considerations for configuring concurrency.

 

Example 1. High-Capacity Single Source and Target Storage 

Single Source Storage, 3 Virtual Disks, 550GB Backup Size 

Source IO Capacity (SSD) : 750 MB/s; Target IO Capacity (SSD) : 750 MB/s 

1-Stream Throughput : 326 MB/s; 2-Stream Throughput : 581 MB/s; 3-Stream Throughput : 700 MB/s 

 

Analysis : Significant performance gains were observed up to the 3rd stream, beyond which performance may degrade due to lack of capacity.

 

Example 2. Dual Low-Capacity Sources and Higher-Capacity Target  

Dual Source Storage, 6 Virtual Disks, 750GB Backup Size 

Source IO Capacity (Fiber Channel) : 150 MB/s, 60 MB/s; Target IO Capacity (HDD) : 350 MB/s 

1-Stream Throughput : 132 MB/s; 3-Stream Throughput : 187 MB/s; 6-Stream Throughput : 195 MB/s 

 

Analysis : A stream count beyond 3 didn’t result in any meaningful performance gain as the source storage is low in capacity. Distributing the 6 virtual disks across more than 2 source storage locations may improve the throughput significantly.

 

Example 3. Dual Higher-Capacity Sources and Low-Capacity Target 

Dual Source Storage, 2 Virtual Disks, 1200GB Backup Size 

Source IO Capacity (Fiber Channel) : 500 MB/s, 500 MB/s; Target IO Capacity (HDD) : 550 MB/s 

1-Stream Throughput : 496 MB/s; 2-Stream Throughput : 506 MB/s 

 

Analysis : As the cumulative IO capacity of the source storage is much higher than that of the target storage, even 2 streams didn’t result in any meaningful performance gain. A target storage with higher IO capacity needs to be configured to gain performance.

 

Example 4. Dual Higher-Capacity Sources and Very Low-Capacity Target 

Dual Source Storage, 2 Virtual Disks, 300GB Backup Size 

Source IO Capacity (HDD) : 350 MB/s, 350 MB/s; Target IO Capacity (Fiber Channel) : 150 MB/s 

1-Stream Throughput : 135 MB/s; 2-Stream Throughput : 144 MB/s 

 

Analysis : There is a clear mismatch of IO capacity with very slow target storage and multi-stream backup will likely result in degraded backup performance in such configurations. 

 

Example 5. Dual Very Low-Capacity Sources and Higher-Capacity Target 

Dual Source Storage, 4 Virtual Disks, 600GB Backup Size 

Source IO Capacity (HDD) : 65 MB/s, 65 MB/s; Target IO Capacity (Fiber Channel) : 350 MB/s 

1-Stream Throughput : 61 MB/s; 4-Stream Throughput : 211 MB/s 

 

Analysis : Although the absolute throughput is low, such a configuration provides significant performance gains because the target storage has higher IO capacity than the source.

 

Example 6. Dual Higher-Capacity Sources and Lower-Capacity Target 

Dual Source Storage, 6 Virtual Disks, 750GB Backup Size 

Source IO Capacity (HDD) : 350 MB/s, 350 MB/s; Target IO Capacity (Fiber Channel) : 650 MB/s 

1-Stream Throughput : 346 MB/s; 2-Stream Throughput : 585 MB/s; 3-Stream Throughput : 619 MB/s 

 

Analysis : Performance gains in this configuration could be observed up to a stream count of 3, beyond which the limited capacity of the target could degrade throughput.
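The capacity comparison behind these analyses can be sketched as follows. The code sums the source IO capacities listed in each example and compares the total with the target IO capacity; network capacity is left out because the examples do not list it. The function name and the 5% tolerance used to call a configuration "balanced" are assumptions made for illustration.

```python
# (source IO capacities in MB/s, target IO capacity in MB/s), per example above
EXAMPLES = {
    1: ([750], 750),
    2: ([150, 60], 350),
    3: ([500, 500], 550),
    4: ([350, 350], 150),
    5: ([65, 65], 350),
    6: ([350, 350], 650),
}


def limiting_side(source_caps: list[float], target_cap: float,
                  tolerance: float = 0.05) -> str:
    """Report which side of the chain has less cumulative IO capacity."""
    total_source = sum(source_caps)
    if abs(total_source - target_cap) <= tolerance * max(total_source, target_cap):
        return "balanced"
    return "source" if total_source < target_cap else "target"


if __name__ == "__main__":
    for number, (sources, target) in EXAMPLES.items():
        print(f"Example {number}: sources={sum(sources)} MB/s, "
              f"target={target} MB/s, limiting side: "
              f"{limiting_side(sources, target)}")
```

The side reported as limiting is where added streams are least likely to help, which is consistent with the analysis given for each example.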

 

Configuration Guidelines for Performance

  • Multi-stream backup could add significant overhead on source storage if multiple disks configured for the VM reside on the same datastore. To avoid a bottleneck from multiple concurrent reads, especially with slower disks, it is generally advisable to distribute the disks across different datastores.
     
  • As multi-stream backup multiplies data movement to the target storage, it is recommended that faster disks or disks with higher throughput that meet the target throughput requirements be configured. 
     
  • Where multiple VMs are being backed-up in a single Job, it is advisable to reconsider mapping of VMs across different jobs, keeping in view the datastore locations of disks prior to enabling multi-stream for such jobs. Further, different target storage could be configured across jobs scheduled to run in parallel to load balance and get optimum performance from multi-stream backup. 
     
  • As each additional stream configured adds significant overhead to the read and write IO, it is essential to consider the load on source and target storage during backup and ensure that the concurrency level being configured reflects the IOPS of the storage devices. 
     
  • To optimize multi-stream backup performance for VMware workloads, it is recommended to use advanced transport modes that provide the fastest read performance, like HotAdd and SAN where possible. 

 

Identifying Performance Bottlenecks

Below are the scenarios where a bottleneck with multi-stream backups could most likely be observed. 

  • Target storage is significantly slower than source storage. For example, an HDD with low IOPS is used as the target while the source storage is configured on faster disks such as SSD or Fiber Channel.
  • Source storage is configured on significantly slower disks than the target storage, which is configured on faster SSD or Fiber Channel.
  • The number of streams configured for the job does not reflect the available IO capacity of the source or target storage.
  • The available network capacity does not match the IO capacity needed based on the number of streams configured and the I/O capacity of source and target storage.

The job log provides statistics on the performance of a multi-stream backup, i.e. the efficiency of data flow from source to target, which helps in identifying bottlenecks during the data transfer process.
Four metrics, described below, are captured for each disk backed up across all streams and help assess backup efficiency.

 

  • Read Time: Higher read times compared to single-stream backup indicate a bottleneck with source storage I/O. 
  • Read Buffer Wait Time: Higher wait times indicate a bottleneck with target storage I/O and potentially with network transport to the target storage. Comparing against the single-stream throughput can help identify the actual bottleneck.
  • Write Time: A higher value of write time indicates a bottleneck with target storage I/O. 
  • Write Buffer Wait Time: A higher value of wait time indicates a bottleneck on the source storage I/O and potentially network transport from source storage. 
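The interpretation of the four metrics can be expressed as a simple lookup. The sketch below assumes the values for one disk have been copied by hand from the Job Log of a multi-stream run and of a single-stream baseline run; the field names, the `DiskStats` type, and the 1.5x threshold for "higher" are hypothetical and do not reflect the actual Job Log format.

```python
from dataclasses import dataclass


@dataclass
class DiskStats:
    """Per-disk timings copied from a Job Log (field names are illustrative)."""
    read_time: float
    read_buffer_wait: float
    write_time: float
    write_buffer_wait: float


# Likely bottleneck suggested by each metric, as described in the list above.
HINTS = {
    "read_time": "source storage I/O",
    "read_buffer_wait": "target storage I/O, possibly network transport to target",
    "write_time": "target storage I/O",
    "write_buffer_wait": "source storage I/O, possibly network transport from source",
}


def suspects(multi: DiskStats, baseline: DiskStats, factor: float = 1.5) -> list[str]:
    """List the metrics that exceed `factor` times the single-stream baseline."""
    found = []
    for name, hint in HINTS.items():
        if getattr(multi, name) > factor * getattr(baseline, name):
            found.append(f"{name}: {hint}")
    return found


if __name__ == "__main__":
    single = DiskStats(read_time=100, read_buffer_wait=10,
                       write_time=100, write_buffer_wait=10)
    multi = DiskStats(read_time=110, read_buffer_wait=40,
                      write_time=180, write_buffer_wait=12)
    for line in suspects(multi, single):
        print(line)
```

In this made-up sample, both flagged metrics point at the target side, which would suggest checking target storage throughput first.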

 

Additional benchmarking can be performed with performance measurement tools such as iPerf, fio, diskspd, and perfmon. Network and storage vendor tools can also be used to pinpoint bottlenecks in the backup chain.

Additional Information

ETrack: 4131650