System Backup and High-Availability Disaster Recovery Architecture White Paper


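Disaster Recovery and Backup Strategies for High-Availability System Architectures

In today's Internet era, information technology is advancing at a rapid pace.

To keep systems highly available and the business running normally, it is essential to build effective disaster recovery and backup strategies.

This article discusses the principles and applications of disaster recovery and backup strategies in high-availability system architectures.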

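I. Principles and Applications of Disaster Recovery Strategies

1. The concept and necessity of disaster recovery

Disaster recovery (DR) means restoring a system and continuing to provide service, with minimal loss and in the shortest possible recovery time, after an outage caused by a system failure, natural disaster, malicious attack, or similar event. It relies on backup arrangements that were planned and deployed in advance.

DR is necessary because it lowers the risk of business interruption, reduces losses, and improves system reliability.

2. Main principles of a DR strategy

(1) Geographic diversity: when building a DR solution, place backup facilities, servers, and similar resources in different geographic locations so that no single site becomes a single point of failure.

(2) Redundancy and backup: use redundancy and backup techniques so that, when a failure occurs, the system can switch quickly to standby equipment and the business keeps running.

(3) Automation and monitoring: use automation tools and monitoring systems to detect failures automatically, raise alerts, and recover quickly, reducing reliance on manual intervention.

(4) Continuous testing and drills: run DR drills regularly to test the availability and recovery performance of standby equipment and confirm that the backup plan actually works.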

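3. Applying a DR strategy

(1) Network DR: configure redundant network devices and multiple Internet access points to keep network connectivity highly available.

(2) Data DR: use data redundancy and backup techniques to synchronize data to standby equipment in real time, preserving data availability and integrity.

(3) Application DR: build a highly available application architecture that uses load balancing, multiple deployed instances, and automatic failover to keep applications continuous and available.

(4) Power DR: use separate power supply paths, backup power, and automatic transfer equipment so the system keeps running during a power failure.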

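II. Principles and Applications of Backup Strategies

1. The concept and necessity of backup

A backup is a copy of a system or its data kept on independent storage media to guard against data loss, system failure, and similar events.

The purpose of backup is to add an extra layer of data protection so that data can be recovered when needed.

2. Main principles of a backup strategy

(1) Differentiated backup: use incremental and differential backups so that full backups need not run as often, reducing the storage space and time that backups require.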
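The trade-off behind combining incremental and differential backups shows up at restore time: with incrementals, every set since the last full backup must be applied in order, while with differentials only the full backup plus the latest differential are needed. The sketch below computes that restore chain. It is illustrative only: the function name and the time-ordered list of `(name, kind)` records are assumptions, not the catalog format of any backup product, and it assumes a full backup exists at or before the restore target.

```python
def restore_chain(backups, target):
    """Return, oldest first, the backup sets needed to restore backups[target].

    backups: time-ordered list of (name, kind) pairs (hypothetical layout), where
    kind is "full", "differential" (all changes since the last full backup) or
    "incremental" (changes since the previous backup of any kind).
    """
    # Latest full backup at or before the target (raises if there is none).
    full = max(i for i in range(target + 1) if backups[i][1] == "full")
    name, kind = backups[target]
    if kind == "full":
        return [name]
    if kind == "differential":
        return [backups[full][0], name]
    # Incremental: walk back through earlier incrementals to the nearest
    # differential or full backup, which serves as the base.
    chain = [name]
    i = target - 1
    while backups[i][1] == "incremental":
        chain.append(backups[i][0])
        i -= 1
    if backups[i][1] == "differential":
        chain.append(backups[i][0])
    chain.append(backups[full][0])
    return chain[::-1]


incremental_week = [("sun", "full"), ("mon", "incremental"), ("tue", "incremental"), ("wed", "incremental")]
differential_week = [("sun", "full"), ("mon", "differential"), ("tue", "differential"), ("wed", "differential")]
print(restore_chain(incremental_week, 3))   # ['sun', 'mon', 'tue', 'wed']
print(restore_chain(differential_week, 3))  # ['sun', 'wed']
```

Incremental sets are smaller to take but lengthen the restore chain; differential sets grow until the next full backup but keep restores to two steps.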

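Disaster Recovery: Off-Site DR Backup, DR All-in-One Appliances, Data Security Backup, and Database Active-Active Gateways

As computer technology develops rapidly, storage, one of the computer's essential functions, keeps evolving.

However, in the traditional architecture where compute and storage resources are separate, ever faster CPUs with ever more cores are not enough. The bottleneck is that traditional hard disks read and write too slowly: most of the CPU capacity on compute hosts sits idle, waiting for data to arrive from storage. This mismatched architecture can no longer keep up with the rapid growth and change of enterprise IT data centers.

Difficult scaling: traditional SAN/NAS storage is expanded by adding new storage enclosures (scale-up), but this does not deliver a linear performance gain.

Storage access performance does not grow linearly as virtual machine data grows, so storage access eventually becomes the performance and capacity bottleneck of the data center.

Performance bottleneck: virtualization packages multiple business systems into independent virtual machines that run simultaneously. With many virtual machines running at once, the storage system sees almost entirely random I/O, and existing storage, typically built on SATA/SAS mechanical disks, cannot handle large volumes of concurrent random reads and writes.

Quality-of-service guarantees: a virtualized data center runs many different applications, which usually correspond to different service levels.

Existing storage designs date back 20 years and did not anticipate application workloads in virtualized environments, so it is hard to use them to define storage performance policies tailored to different workloads.

Complex management: IT administrators must manage two separate systems, compute and storage, and often also configure proprietary equipment, which is tedious.

Traditional SAN/NAS network storage architectures were originally designed for static workloads, so operating and maintaining them under dynamically changing workloads becomes comparatively complex.

Proprietary hardware and high cost: going forward, the design and provisioning of the storage system should be what enterprise users focus on most when building information systems.

Mainstream storage vendors use proprietary chips of their own design to optimize the I/O path, for example for data compression and deduplication.

The high R&D and manufacturing costs of this proprietary hardware inevitably raise the total cost of ownership of the storage system.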

Huawei FusionCloud Desktop Cloud: System High Availability Technical White Paper

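All rights reserved.

No part of this document may be excerpted or reproduced, in whole or in part, by any organization or individual without the prior written consent of the company, nor transmitted in any form.

Trademark notice: the HUAWEI logo and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.

All other trademarks and registered trademarks mentioned in this document are the property of their respective owners.

Notice: the products, services, and features you purchase are subject to Huawei's commercial contracts and terms. All or part of the products, services, or features described in this document may not be within the scope of your purchase or use.

Unless otherwise agreed in the contract, Huawei makes no express or implied representations or warranties regarding the contents of this document.

The contents of this document are updated from time to time because of product version upgrades or other reasons.

Unless otherwise agreed, this document serves only as a usage guide, and no statement, information, or recommendation in it constitutes a warranty of any kind, express or implied.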

Huawei Technologies Co., Ltd.
Address: Huawei Headquarters Office Building, Bantian, Longgang District, Shenzhen
Postal code: 518129
Website:
Customer service email: support@
Customer service hotline: 4008302118

Contents
1 Huawei Desktop Cloud Solution
2 System Availability Metrics
3 System Hardware and Software Reliability
3.1 Cabinets
3.2 Servers
3.2.1 Memory Reliability
3.2.2 Hard Disk Reliability
3.2.3 Online Scheduled Disk Fault Detection and Early Warning
3.2.4 Power Supply Reliability
3.2.5 System Monitoring
3.2.6 Onboard Software Reliability
3.3 Storage Devices
3.4 Network Devices
3.4.1 NIC Load Sharing
3.4.2 Switch Stacking
3.4.3 Switch Interconnection Redundancy
3.4.4 Virtual Router Redundancy Protection
3.4.5 Network Plane Separation
3.5 Cloud Platform Software
3.5.1 Management Node HA
3.5.2 Management Node Data Backup
3.5.3 Virtual Machine Backup
3.5.4 Virtual Machine HA
3.5.5 Virtual Machine Fault Detection and Handling
3.5.6 Virtual Machine Live Migration
3.5.7 Storage Migration
3.5.8 Virtual Machine Load Balancing
3.5.9 Black Box
3.5.10 Data Consistency Assurance
3.5.11 Health Check Tool and Fault Information Collection Tool
3.6 FusionAccess Desktop Access System Availability
3.6.1 High Availability of FusionAccess Services
3.6.2 High Availability of Desktop Access
3.6.3 FusionAccess Management Data Backup
3.6.4 Power-On Recovery Reliability Design
4 Virtual Machine Desktop Service Reliability
5 Glossary

1 Huawei Desktop Cloud Solution

The architectural components of the desktop cloud solution are deployed in virtual machines provided by the cloud computing platform and deliver desktop services to users. The architecture is shown in the following figure.

FusionSphere Disaster Recovery Solution White Paper: Server Virtualization

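1 Introduction to Disaster Recovery

1.1 Overview of Disaster Recovery for Cloud Computing

With the rapid growth of cloud computing, more and more important computer information systems now run in the cloud.

Because users and enterprises in every industry depend ever more heavily on network applications and data, sudden disasters such as fire, flood, earthquake, regional power outages, or deliberate sabotage can severely affect an enterprise's data and business operations, causing loss of important information, service interruption, financial loss, and customer attrition.

Therefore, to ensure business continuity and data reliability for computer information systems in the cloud, Huawei offers a disaster recovery solution for cloud computing that keeps critical data from being lost and restores system services as quickly as possible when a disaster occurs.

1.1.1 Introduction to Disaster Recovery

A disaster recovery system consists of two or more functionally identical systems built at sites far apart from each other. The systems monitor each other's health and can switch functions between them: when one site stops working because of an unexpected event (fire, flood, earthquake, deliberate sabotage, and so on), the entire application can switch over to another site and keep operating normally.

A DR system needs fairly complete data protection and disaster recovery capabilities. When the production center cannot work normally, it must preserve data integrity and business continuity, let the DR center take over in the shortest possible time, and restore normal operation of the business systems, minimizing losses.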

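1.1.2 Evaluation Criteria for Disaster Recovery Systems

A DR system exists to keep the business running when disaster strikes, so what do users care about most when that happens? SHARE 78, the internationally used evaluation standard for DR systems, lists the following criteria, which users can apply when assessing and choosing a DR solution:

● Scope of backup and recovery
● Status of the DR plan
● Distance between the production center and the DR center
● How the production center and the DR center are connected
● How data is transferred between the two centers
● How much data loss is acceptable
● How updated data is guaranteed to be applied at the DR center
● The DR center's ability to start the recovery process

The design of a DR system therefore revolves mainly around these user requirements.

Because budgets are limited, reaching SHARE 78 tier 6 with little investment is clearly difficult. A system designed under real-world constraints can only minimize outage duration and recover as much data as possible, and these are also the measures by which the quality of the DR system we design is judged.

In practical DR system design, we focus on two metrics: RTO (recovery time objective, the maximum tolerable time to restore service) and RPO (recovery point objective, the maximum tolerable data loss, measured as a span of time).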
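A quick way to check a candidate design against RPO and RTO targets is to estimate its worst-case values. The sketch below assumes a simple periodic-backup model; the `DrPlan` fields and function names are hypothetical, and all times are in minutes. Real RTO figures should come from DR drills rather than arithmetic like this.

```python
from dataclasses import dataclass


@dataclass
class DrPlan:
    """Hypothetical timing parameters of a DR setup, all in minutes."""
    backup_interval: float   # time between successive recovery points
    replication_lag: float   # delay before a recovery point is usable at the DR site
    detect: float            # time to notice the disaster and decide to fail over
    failover: float          # time to switch to the DR site
    restore: float           # time to restore data and restart services


def worst_case_rpo(plan: DrPlan) -> float:
    # A disaster just before the next recovery point loses a full interval,
    # plus any data that had not yet reached the DR site.
    return plan.backup_interval + plan.replication_lag


def estimated_rto(plan: DrPlan) -> float:
    # Service is down from the moment of the disaster until restore completes.
    return plan.detect + plan.failover + plan.restore


def meets_targets(plan: DrPlan, rpo_target: float, rto_target: float) -> bool:
    return worst_case_rpo(plan) <= rpo_target and estimated_rto(plan) <= rto_target


nightly = DrPlan(backup_interval=24 * 60, replication_lag=60, detect=15, failover=30, restore=120)
print(worst_case_rpo(nightly), estimated_rto(nightly))        # 1500 165
print(meets_targets(nightly, rpo_target=60, rto_target=240))  # False
```

Here a nightly backup easily meets a four-hour RTO but can lose up to 25 hours of data, so a one-hour RPO would require more frequent recovery points or continuous replication.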

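High-Availability Computing System Architecture and Disaster Recovery Backup Design

As computing systems become ever more important in modern life and business, building a high-availability computing architecture and an effective DR backup design has become critical.

The goal of a high-availability architecture is to keep the system available at all times, so that it continues to serve even when hardware or software fails.

DR backup design ensures that, after a catastrophic event, the system can be restored quickly and the business can keep running.

Building a high-availability computing system starts with defining its availability requirements and goals.

Availability is the ability of a system to provide normal service during the period in which it is required.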

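A high-availability computing system needs the following characteristics:

1. No single point of failure: design the architecture so that the failure of any single component or node does not bring down the whole system.

This can be achieved with redundant components, backup nodes, and multiple data centers.

2. Fast failure detection and recovery: the system must detect failures quickly and monitor its own health in real time.

Once a failure is detected, the system should switch automatically to a backup node or component and recover quickly.

3. Load balancing: distribute load sensibly so that every component or node runs at an appropriate level, avoiding both overload and resource starvation.

4. Horizontal scaling: the system should be able to scale out to handle growing numbers of users and growing data volumes.

Processing capacity is expanded by adding more nodes, components, or data centers.

5. Data consistency and reliability: a high-availability computing system must keep its data consistent and reliable.

This can be achieved with distributed databases, data replication, and backups.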
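Redundancy removes single points of failure because independent redundant components rarely all fail at once. Under the usual simplifying assumptions (independent failures, instantaneous failover), the availability of components in series is the product of their availabilities, while for redundant components in parallel it is one minus the product of their unavailabilities. The helper names below are hypothetical, and the 0.99 figure is an illustrative input, not a measured value.

```python
def series(*availabilities: float) -> float:
    """Availability of components that must ALL work (each is a single point of failure)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities: float) -> float:
    """Availability of redundant components where ANY ONE working is enough."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


def downtime_hours_per_year(availability: float) -> float:
    return (1.0 - availability) * 365 * 24


node = 0.99                      # a single server
pair = parallel(node, node)      # a redundant pair of such servers
stack = series(pair, node)       # redundant web tier in front of a single database
print(round(downtime_hours_per_year(node), 1))   # 87.6
print(round(downtime_hours_per_year(pair), 2))   # 0.88
print(round(downtime_hours_per_year(stack), 1))  # 88.5
```

The last line shows why every tier matters: a redundant front end does not help much while a single database remains a single point of failure.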

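High availability can be achieved with a variety of architectural patterns and technologies.

Common patterns include active-standby, multi-node, cluster, and distributed architectures.

The active-standby (primary/standby) pattern is one of the most common. It achieves high availability with one primary node and one standby node.

The primary node handles user requests while the standby serves as a hot standby; when the primary fails, the standby automatically takes over the service.

The active-standby pattern relies on heartbeat detection for failover and on data replication for data synchronization.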
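Heartbeat-based failover in the active-standby pattern can be sketched as a monitor that promotes the standby once the primary has been silent for longer than a timeout. The class and method names are hypothetical, and the sketch covers only the detection logic: a real deployment also needs data replication and fencing or arbitration so that a primary that is merely slow cannot keep serving alongside the new one (split brain). A shorter timeout detects failures sooner but raises the risk of false failovers.

```python
import time
from typing import Callable


class FailoverMonitor:
    """Hypothetical heartbeat monitor for an active-standby pair (detection logic only)."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.active = "primary"
        self.last_beat = clock()

    def heartbeat(self, node: str) -> None:
        # Only heartbeats from the currently active node reset the timer.
        if node == self.active:
            self.last_beat = self.clock()

    def check(self) -> str:
        """Call periodically; promotes the standby once the primary is silent too long."""
        silent_for = self.clock() - self.last_beat
        if self.active == "primary" and silent_for > self.timeout_s:
            self.active = "standby"
            self.last_beat = self.clock()
        return self.active


# Simulated clock so the example is deterministic.
now = [0.0]
mon = FailoverMonitor(timeout_s=3.0, clock=lambda: now[0])
now[0] = 2.0
mon.heartbeat("primary")
now[0] = 4.0
print(mon.check())  # primary  (2 s since the last heartbeat)
now[0] = 6.0
print(mon.check())  # standby  (4 s exceeds the 3 s timeout)
```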

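The multi-node pattern spreads load across nodes and improves availability.

Several nodes run the system at the same time, and each of them can handle user requests.

A load balancer distributes user requests across the nodes, keeping the system's load balanced.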
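The load balancer in the multi-node pattern can be illustrated with a round-robin dispatcher that skips nodes a health check has marked as down. This is a minimal sketch with hypothetical names; production load balancers add weighting, connection tracking, and active health probes.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical round-robin dispatcher that skips nodes marked unhealthy."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)

    def mark_down(self, node) -> None:
        self.healthy.discard(node)   # e.g. after a failed health check

    def mark_up(self, node) -> None:
        self.healthy.add(node)

    def pick(self):
        # Look at each node at most once per request.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy node available")


lb = RoundRobinBalancer(["node-a", "node-b", "node-c"])
lb.mark_down("node-b")
print([lb.pick() for _ in range(4)])  # ['node-a', 'node-c', 'node-a', 'node-c']
```

Because unhealthy nodes are skipped rather than removed, `mark_up` returns a repaired node to rotation without rebuilding the balancer.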

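Cloud Computing Architecture and Disaster Recovery Backup: Building Cloud Architectures with High Availability and DR Capability

Cloud computing, a relatively new computing model, is now widely used in the IT industry.

It provides efficient, flexible, and scalable computing resources, bringing great convenience and economic benefit to enterprises and individual users.

However, security and reliability have always been major concerns when using cloud computing.

To ensure high availability and DR capability for cloud systems, a well-designed cloud architecture and DR backups are essential.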

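1. The concept and importance of cloud computing architecture

Cloud computing architecture is the system architecture used in a cloud environment, including the cloud's components and layers and how they relate to and interact with each other.

The design of the cloud architecture directly affects the performance, security, and reliability of the cloud system.

A well-designed cloud architecture provides efficient compute and storage resources while meeting users' reliability and security requirements.

2. Building a high-availability cloud architecture

High availability means that a system keeps running normally and providing reliable service when it encounters failures or abnormal conditions.

In a cloud environment, a high-availability architecture is an important means of ensuring business continuity and data integrity for users.

Key elements of a high-availability cloud architecture include:

a) Load balancing: distribute user requests evenly across multiple servers to balance network and service load, improving response time and availability.

b) Multi-active data centers: deploy several data centers in different geographic locations to provide redundant copies and distributed storage of user data, improving data reliability and DR capability.

c) Elastic scaling: add and remove compute nodes dynamically to match system size to user demand, improving elasticity and scalability.

d) Fault-tolerant design: use redundant equipment and backup systems to provide automatic failover and data recovery, keeping the system continuously available.

Combined, these elements yield a high-availability cloud architecture that keeps providing reliable service despite failures or abnormal conditions.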
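Elastic scaling is often driven by a target-tracking rule: resize the pool so that average utilization moves back toward a target, within fixed minimum and maximum sizes. The function below is a hypothetical sketch of that rule; the default target and bounds are illustrative assumptions, and real autoscalers also apply cooldown or stabilization periods so the pool does not oscillate.

```python
import math


def desired_instances(current: int, avg_utilization: float,
                      target_utilization: float = 0.5,
                      min_instances: int = 2, max_instances: int = 20) -> int:
    """Target-tracking rule (hypothetical): size the pool so average utilization
    approaches the target, never leaving the [min, max] range."""
    if current <= 0:
        return min_instances
    wanted = math.ceil(current * avg_utilization / target_utilization)
    return max(min_instances, min(max_instances, wanted))


print(desired_instances(4, 0.9))    # 8  (scale out: 4 * 0.9 / 0.5 = 7.2, rounded up)
print(desired_instances(10, 0.12))  # 3  (scale in)
print(desired_instances(3, 0.05))   # 2  (never below the minimum)
```

Keeping the minimum at two instances preserves redundancy even when load is low, which ties elastic scaling back to the fault-tolerance element above.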

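3. Why a DR backup strategy matters and how to implement it

DR backup means using redundant backups, DR switchover, and similar measures so that, after a catastrophic failure of a system or its data, the system can be restored quickly and the data is protected and recoverable.

A DR backup strategy is an important means of ensuring the reliability and continuity of cloud systems.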

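Backup/Recovery and DR Switchover Strategies and Patterns in Database DR and High-Availability Architectures

In practicing and optimizing database DR and high-availability architectures, backup/recovery and DR switchover strategies and patterns are a crucial part.

They ensure that, after unavoidable events such as hardware failures, natural disasters, or human error, database service can be restored quickly while data integrity and reliability are preserved.

This article discusses strategies and patterns for backup/recovery and DR switchover.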

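I. Backup and Recovery Strategies

A backup and recovery strategy is the plan and process that determines which types of backup to take, how often to take them, and how data will be restored.

Common backup and recovery strategies include:

1. Full and incremental backups: a full backup copies the entire database to disk or tape, and the database can be restored from that backup point.

An incremental backup copies only the data that has changed since the previous backup, whether that was a full or an incremental backup. (A backup that copies everything changed since the last full backup is a differential backup.)

This shortens backup time and reduces storage requirements, but restores take somewhat longer because the last full backup and every later incremental must be applied in order.

2. Scheduled and real-time backups: a scheduled backup runs periodically at planned times, for example daily or weekly.

A real-time backup captures changes as soon as they occur, minimizing data loss.

Real-time backup usually relies on log files to record changes.

3. Remote and local backups: a remote backup stores data at a remote location, providing data redundancy and the ability to recover remotely.

A local backup is stored on-site and allows fast recovery, but it can be destroyed by the same local disaster.

4. Checksums and verification: after a backup completes, check and verify the backup data to make sure it can actually be restored.

Verification can include test restores and data consistency checks.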
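The checksum part of backup verification can be automated by recording a digest for each backup file right after the backup and re-checking it later. The sketch below uses SHA-256 from Python's standard `hashlib`; the function names and the dictionary manifest format are hypothetical choices for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backup files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(paths) -> dict:
    """Record a digest for each backup file right after the backup completes."""
    return {str(p): sha256_of(Path(p)) for p in paths}


def verify(manifest: dict) -> list:
    """Return the backup files that are missing or no longer match their digest."""
    failed = []
    for name, expected in manifest.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != expected:
            failed.append(name)
    return failed
```

An empty list from `verify` means every file still matches its recorded digest. A matching checksum proves a file is intact, not that it is restorable, so it complements rather than replaces test restores and consistency checks.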

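II. DR Switchover Strategies and Patterns

DR switchover strategies and patterns define how applications are switched quickly to the standby database when the primary database fails, so that the system stays available.

Common DR switchover strategies and patterns include:

1. Cold standby and hot standby: in cold standby, the standby database is synchronized with the primary periodically, without continuous real-time synchronization.

During switchover, the standby database must be started manually.

In hot standby, real-time data synchronization runs between the primary and standby databases; with synchronous replication, this can achieve zero data loss.

2. Master-slave replication and multi-point replication: master-slave replication synchronizes data from the primary database to a standby database, which can be located off-site for disaster recovery.

Multi-point replication synchronizes the primary database to several standby databases to increase redundancy and availability.

3. Dual-machine hot standby and mutual DR backup: in dual-machine hot standby, both the primary and the standby database are kept in a hot-standby state with data synchronization.

In mutual DR backup, the primary and standby databases back each other up, and either can serve as the primary or the standby.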
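Whether a hot standby really avoids data loss depends on when the primary acknowledges a write. The toy model below, with hypothetical class names, contrasts synchronous replication (the client is acknowledged only after the standby has the change) with asynchronous replication (the client is acknowledged first and the change is shipped later), and reports which acknowledged writes a failover would lose. It ignores network failures, retries, and latency.

```python
class Standby:
    def __init__(self):
        self.log = []


class Primary:
    """Toy model of primary-to-standby log shipping (hypothetical classes).

    synchronous=True: a write is acknowledged only after the standby has it.
    synchronous=False: a write is acknowledged at once and shipped later.
    """

    def __init__(self, standby: Standby, synchronous: bool):
        self.standby = standby
        self.synchronous = synchronous
        self.log = []
        self.unshipped = []

    def write(self, record) -> None:
        self.log.append(record)
        if self.synchronous:
            self.standby.log.append(record)   # must succeed before the client sees "OK"
        else:
            self.unshipped.append(record)     # client sees "OK" immediately

    def ship(self) -> None:
        """Background shipping step used in asynchronous mode."""
        self.standby.log.extend(self.unshipped)
        self.unshipped.clear()


def lost_on_failover(primary: Primary) -> list:
    """Acknowledged writes the standby never received (its log is a prefix of the primary's)."""
    return primary.log[len(primary.standby.log):]


for sync in (True, False):
    p = Primary(Standby(), synchronous=sync)
    p.write("t1")
    p.write("t2")
    p.ship()
    p.write("t3")          # the primary site fails right after this write
    print(sync, lost_on_failover(p))
# True []
# False ['t3']
```

The price of synchronous mode is that every write waits for the standby, which is why it is usually limited to sites close enough for low round-trip latency, with asynchronous replication used for distant DR sites.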

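SunrunVas Shangyun (尚云) Platform White Paper: Scalability, High Availability, and Disaster Recovery

1 What Is Shangyun?

The SunrunVas Shangyun Multi-System Cloud Application Integration Service Platform (Shangyun for short) is based on application virtualization technology. It helps enterprises quickly and securely give users in any location, on any device, fine-grained access to applications and desktops, while keeping strict, centralized control over the use and distribution of sensitive data.

Based on application virtualization, the Shangyun platform extends the interfaces of government and enterprise customers' applications directly, securely, and quickly to users' mobile and portable devices without any device-specific adaptation work, so users can use applications continuously and flexibly at any time, in any place, in any way, over any network.

With Shangyun, users can access any application from any device in any location, while the enterprise's sensitive data never leaves the data center.

Enterprises can also manage applications efficiently and centrally, which reduces complexity and cost compared with traditional deployment.

2 Application Virtualization

As enterprise IT keeps developing, IT hardware and the applications it runs multiply. Enterprises face the challenge of lowering hardware, software, and management costs while deploying applications more securely and efficiently, making them quick and convenient to use, and keeping them easy to maintain. Application virtualization emerged to address this.

In its most basic form, virtualization separates logical computing resources from the physical hardware.

Application virtualization, that is, virtualization of application software, lets users reach an application virtualization server over the network and obtain a virtual runtime environment in which they can run application software directly, without installing it locally.

Application virtualization separates an application's human-computer interaction logic (the application interface, keyboard and mouse actions, audio input and output, card readers, print output, and so on) from its computing logic.

Specifically, when a user opens a virtualized application, the client sends only the interaction input over the network to the server; the server runs the application's computing logic and sends the updated interaction output back to the client, giving the user the same experience and results as running the application locally.

3 Shangyun System Architecture

The cloud platform consists of two parts and is organized into the following layers: the hardware layer (physical resource layer); the logical resource layer (physical machine virtualization, storage virtualization, network virtualization, and so on); the cloud drive service layer (distributed cloud drive services); the core service layer (application virtualization, virtual channels, video auditing, and other services); the gateway (network access services); and the portal (web interfaces for management and use).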


Contents
1. Architecture Overview
1.1 BCDR Solution Conceptual Design
1.1.1 Data Protection Conceptual Design
1.1.2 High Availability Conceptual Design
1.1.3 Disaster Recovery Conceptual Design
2. Data Protection Design
2.1 Logical Design
2.2 Backup Datastore
2.2.1 Backup Data Store Target Location
2.3 Performance
2.4 Volume Sizing
2.5 Other Considerations
2.6 Backup Policies
2.6.1 Virtual Machine Backup Options
2.6.2 Schedule Window
2.6.3 Retention Policies
3. High Availability Design – Application Services
3.1 Logical Design
3.2 Backup and Recovery
4. Disaster Recovery Design
4.1 Logical Design
4.2 vCenter Server Logical Design
4.3 Server for vCenter Site Recovery Manager
4.4 Database for vCenter Site Recovery Manager
4.4.1 Database Sizing
4.5 Licensing for vCenter Site Recovery Manager
4.6 Networking for vCenter Site Recovery Manager
4.7 Storage for vCenter Site Recovery Manager
4.7.1 Datastore Configuration Specifications
4.7.2 Virtual Disk (.vmdk) Configuration Specifications
4.7.3 Datastore Naming Conventions
4.7.4 Virtual Machine Swap and Operating System Paging Files
4.7.5 Storage Replication
4.7.6 Third-Party Storage-Based Replication Versus vSphere Replication
4.7.7 Placeholder Virtual Machines
4.7.8 Snapshot Space
4.8 Naming Conventions for vCenter Site Recovery Manager Site
4.9 Inventory Mappings for vCenter Site Recovery Manager
4.9.1 Resource Mappings
4.9.2 Folder Mappings
4.9.3 Network Mappings
4.9.4 Protection Groups
4.10 Messages and Commands for vCenter Site Recovery Manager
4.10.1 Messages
4.10.2 Commands
4.11 Bulk IP Address Updates
4.12 Recovery Plans for vCenter Site Recovery Manager
4.12.1 Startup Order and Response Time
4.12.2 Recovery Plan Test Network
4.12.3 Recovery Site Local Virtual Machines
4.12.4 vCenter Site Recovery Manager Recovery Plan Scenarios
4.13 Monitoring
4.14 Users, Groups, Permissions, and Roles
Network Communication and Firewalls

1. Architecture Overview

1.1 BCDR Solution Conceptual Design

The solution to support business continuity and disaster recovery (BCDR) of the workloads running on VMware vSphere contains various components that must be integrated to work well together. These components range from networking devices, infrastructure services, and storage area network (SAN) devices to vSphere and applications. Each of these components has a large number of potentially valid configurations, but only a few of these configurations result in an integrated, functional system that meets the specified business and technical requirements.

The service management and operations processes affected by the BCDR solution are out of scope for this engagement and are not addressed in this document.

The following figure illustrates the conceptual design for the solution.

Replace the diagram with an updated version that reflects the customer's requirements. Use the Layers option in Visio to remove components that will not be deployed.

Figure 1. BCDR Solution Conceptual Design

1.1.1 Data Protection Conceptual Design

The solution to support data protection of the workloads running on vSphere must perform the following tasks:
• Back up and restore virtual machines.
• Store data according to the specified retention policies.
• Send email reports about VMware vSphere Data Protection Advanced activities.

Figure 2. Data Protection Conceptual Design

1.1.2 High Availability Conceptual Design

The solution to support high availability of application services within workloads running on VMware vSphere must perform the following tasks:
• Display the location and availability status of the applications.
• Apply user-defined remediation if a service is unavailable or unstable. Remediation actions include restarting the service and resetting the virtual machine.
• Trigger alerts and notifications when services become unavailable or unstable.
• Allow a remediation action to be suspended while maintenance is performed.
• Integrate with VMware vSphere High Availability for the reset-virtual-machine function and remain compatible with VMware vSphere vMotion®.

Figure 3. High Availability Conceptual Design

1.1.3 Disaster Recovery Conceptual Design

The solution to support disaster recovery of the workloads running on vSphere must support the following provisions:
• All protected virtual machines must reside on vSphere at <Site_A> (protected site) and will be recovered to <Site_B> (recovery site).
• All protected virtual machines must reside on vSphere at <Site_C> (protected site) and will be recovered to <Site_B> (recovery site).
• All protected virtual machines are recovered to a single site, <Site_B> (recovery site).

Note: Detailed design of the storage, network, and replication solution is out of scope for this effort.

Replace the diagram with an updated version that reflects the customer's requirements. Use the Layers option in Visio to remove the multi-site option if it is not required.

Figure 4. Disaster Recovery Conceptual Design

2. Data Protection Design

Provide background information, any necessary context, and general considerations to keep in mind for the technology or solution.

<Customer> has one data center, Data Center_A on Site_A, and wants to protect the organization's data against events such as data loss, hardware failure, accidental deletion, or other disasters. Data backup is an insurance plan.

2.1 Logical Design

The logical design is a more detailed design that includes all major components and entities and their relationships. The data flows and connections are detailed at this stage. Logical designs do not include physical server names and IP addresses; they include business service application names and details and other relevant information.

The following figure represents the logical design of the vSphere Data Protection Advanced solution.
vSphere Data Protection Advanced protects the VMware infrastructure through the VMware vCenter Server™ management layer. Connectivity through vCenter Server gives vSphere Data Protection Advanced visibility into all VMware ESXi™ servers, and therefore into the virtual machines that must be backed up.

Replace the diagram with an updated version that reflects the customer's requirements. Use the Layers option in Visio to remove the High Availability option if it is not required.

Figure 5. vSphere Data Protection Advanced Logical Design

2.2 Backup Datastore

vSphere Data Protection Advanced uses deduplication technology to back up virtual environments at the block level, which enables efficient disk utilization.

To optimize backups and take advantage of the technology available in vSphere, all hosts should be able to access the production storage, so that vSphere Data Protection Advanced can back up production virtual machines optimally.

2.2.1 Backup Data Store Target Location

The backup datastore stores all the production data required to recover production services to a recovery point objective (RPO) after a disaster recovery event or data loss. It is important to choose the target location carefully and to meet the minimum performance requirements for such a scenario.

2.2.1.1 Option 1: Store Production and Backup Data on the Same Storage Platform

The benefits of this choice are:
• You do not need to ask the storage team for a new storage configuration to deploy and store the vSphere Data Protection Advanced appliance and the backup store.
• You can take full advantage of vSphere capabilities.

The risk is that if the destination datastore or the production storage is unrecoverable, you lose the ability to recover your data.

2.2.1.2 Option 2: Store the Backup Data on Dedicated Storage

The benefits of this configuration are:
• If the production storage becomes unavailable, you can still recover your data, because the backup data is not on the same shared storage.
• Production and backup workloads are separated.
• The backup schedule does not affect production storage performance, because the backup storage is completely separate.

The drawback is that you might need to install and configure a dedicated storage volume for backups.

The vSphere Data Protection Advanced storage replication feature mitigates the risk of backup store failure.

Table 1. VMware Backup Store Target – Design Decisions

For this design, <Customer> has made the decisions listed in this table.

DD001
Design decision: Dedicated storage volume for virtual machine backup.
Design justification: The production virtual machine needs to be accessible at all times with limited performance impact.
Design implication: The appliance cannot use VMware vSphere Distributed Resource Scheduler™ or HA features if direct-attached storage is used.

2.3 Performance

vSphere Data Protection Advanced generates a significant amount of I/O, especially when performing multiple concurrent backups, and the storage platform must be able to handle it. If the storage does not meet the performance requirements, backups can fail and error messages can be generated. vSphere Data Protection Advanced includes a performance analysis feature, which can be run during or after virtual appliance deployment.

2.4 Volume Sizing

vSphere Data Protection Advanced can dynamically expand the backup store destination from 2 TB to 8 TB. However, this capability is governed by specific memory requirements placed on the appliance.

Table 2. VMware vSphere Data Protection Sizing Guide

Backup store capacity / Appliance memory
2 TB / 6 GB
4 TB / 8 GB
6 TB / 10 GB
8 TB / 12 GB

It is important not to let the backup store fill up, so that you can still restore your data and avoid interruptions to virtual machine backups.

2.5 Other Considerations

vSphere Data Protection Advanced can be deployed on VMware Virtual SAN™ and can protect virtual machines that reside on it. The virtual machine storage policy is not backed up with the virtual machine, but you can restore it by replacing the existing virtual machine.

The default storage policy includes "Number Of Failures To Tolerate = 1", which means that each storage object is mirrored.

2.6 Backup Policies

2.6.1 Virtual Machine Backup Options

vSphere Data Protection Advanced provides the following methods for backing up a virtual machine:
• Hot Add: provides full image backups of virtual machines, regardless of the guest operating system.
  o The virtual machine's base disk is attached directly to vSphere Data Protection Advanced, and Changed Block Tracking is used to identify and back up the blocks that have changed.
  o Backup and restore performance is faster because the data flows through the VMkernel layer.
  o A quiesced snapshot can be used to interrupt the I/O of the virtual machine to swap the virtual machine disk (VMDK) or consolidate the data when the backup is done.
  o Hot add does not work with disks in multi-writer mode.
• Network Block Device: the virtual machine data is transmitted across the network so that vSphere Data Protection Advanced can back it up.
  o Virtual machine network traffic performance might be affected.
  o It takes a quiesced snapshot and might interrupt the I/O of the virtual machine to swap the VMDK or consolidate the data once the backup is done.
  o The time to complete the virtual machine backup might exceed the backup window.
  o Network block device does not work in multi-writer disk mode.
• vSphere Data Protection Advanced agent inside the guest operating system:
  o Enables application-consistent backup and recovery with Microsoft SQL Server, Microsoft SharePoint, and Microsoft Exchange support.
  o Provides more granularity and flexibility for file-level restores.

For a virtual machine with a vFlash-based disk, vSphere Data Protection Advanced reverts to network block device transport for backup.

Table 3. Virtual Machine Transport Mode – Design Decisions

For this design, <Customer> has made the decisions listed in this table.

…vSphere Data Protection Advanced agent for backup. …instance of business application backup. …Protection Advanced agent and maintain it.

DD003
Design decision: Hot add transport mode will be used to back up virtual machines.
Design justification: It will optimize and speed up virtual machine backups and avoid any impact on the management network.
Design implication: All ESXi hosts must be able to access the same set of virtual machine datastores.

2.6.2 Schedule Window

Even though vSphere Data Protection Advanced uses Changed Block Tracking to improve the backup success rate, it is crucial to avoid windows when the production storage is in high demand, so that backups do not affect the business.

No backup or administrative activities are allowed during the vSphere Data Protection Advanced blackout window; restores, however, can be performed.

By default, the blackout window begins at 8 a.m. local server time and continues uninterrupted for three hours, until 11 a.m. the same morning.

2.6.3 Retention Policies

Retention policies are properties of a backup job, so it is important to group virtual machines by business priority and by the retention requirements set by the business.

3. High Availability Design – Application Services

Provide background information, any necessary context, and general considerations to keep in mind for the technology or solution.

vSphere App HA works with vSphere HA host and virtual machine monitoring to improve application uptime and provide high availability.
The feature can restart an application service if it detects a failure, and can reset a virtual machine if the application fails to restart. vSphere App HA uses VMware vCenter™ Hyperic® to monitor applications. vCenter Hyperic stores and manages App HA policies, which are configured in the administration section of the VMware vSphere Web Client. Policies define items such as the number of times vSphere App HA attempts to restart a service, the number of minutes it waits for the service to start, and when to reset a virtual machine if a service is unstable.

The vSphere App HA solution lets the administrator define high availability for applications running in the virtual environment, and it provides visibility and control through the vSphere Web Client interface. It provides escalating levels of remediation: restarting failed application components, or, if the application does not restart correctly, using the Application Awareness API through vSphere HA to reset the virtual machine.

<Customer> has a number of critical business applications, running the following services, whose uptime it wants to improve:
• Apache Tomcat
• Microsoft IIS
• Microsoft SQL Server
• Apache HTTP Server
• Microsoft SharePoint
• SpringSource tc Runtime
• PostgreSQL
• Oracle Database

Figure 6. vSphere App HA Architecture Overview

3.1 Logical Design

The logical design is a more detailed design that includes all major components and entities and their relationships. The data flows and connections are detailed at this stage. Logical designs do not include physical server names and IP addresses; they include business service application names and details, and other relevant information.

vSphere App HA is shipped as an appliance, which must be installed on an HA-enabled cluster. It integrates tightly with vCenter Server. A plug-in for App HA management is also installed in the vSphere Web Client.
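The escalation that an App HA policy describes (restart the service a configured number of times, wait for it to come up, and reset the virtual machine only if every restart fails) can be sketched as a small control loop. This is a hypothetical illustration of the control flow only, not the vSphere App HA or vCenter Hyperic API: the function name, the callables, and the parameter names are assumptions, and the polling loop stands in for the configured wait time.

```python
from typing import Callable


def remediate(restart_service: Callable[[], None],
              service_is_up: Callable[[], bool],
              reset_vm: Callable[[], None],
              max_restarts: int = 2,
              checks_per_restart: int = 5) -> str:
    """Hypothetical App HA-style escalation (not the vSphere App HA API).

    Restart the failed service up to max_restarts times; after each restart,
    poll its health up to checks_per_restart times (a real monitor would sleep,
    for example one minute, between polls). Reset the VM only if every
    restart attempt fails.
    """
    for attempt in range(1, max_restarts + 1):
        restart_service()
        for _ in range(checks_per_restart):
            if service_is_up():
                return f"service recovered after restart {attempt}"
    reset_vm()
    return "virtual machine reset"


# Simulated service that only comes back after its second restart.
restarts, resets = [], []
print(remediate(lambda: restarts.append(1),
                lambda: len(restarts) >= 2,
                lambda: resets.append(1)))   # service recovered after restart 2
```

Restarting the service first is cheaper and less disruptive than a reset, which is why the reset is kept as the last step of the escalation.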
App HA communicates with vCenter Hyperic to gather service availability information. A vCenter Hyperic HQ agent is installed for each monitored application and reports application health to the vCenter Hyperic HQ virtual appliance. vCenter Hyperic HQ enforces the vSphere App HA policies by issuing commands to either restart the application or reset the virtual machine. A new vSphere HA application monitoring command allows the vCenter Hyperic HQ agent to request a virtual machine reset.

Replace the diagram with an updated version that reflects the customer's requirements. Use the Layers option in Visio to remove the High Availability option if it is not required.

Figure 7. VMware vSphere App HA Logical Design

Table 4. vSphere App HA Logical Design Components

vSphere App HA appliance: remediation policy configuration; custom services.
vCenter Hyperic Server: communicates with App HA for service status. (This is not included in this service's scope.)
vCenter Hyperic Agent: real-time monitoring of middleware and applications running on the virtual machine. (This is not included in this service's scope.)

3.2 Backup and Recovery

To protect the vSphere App HA appliance, make a proper backup plan that includes its vPostgres database.

4. Disaster Recovery Design

Provide background information, any necessary context, and general considerations to keep in mind for the technology or solution.

<Customer> has two sites: protected <Site_A> at Location_A and recovery <Site_B> at Location_B. The vCenter Site Recovery Manager design provides a solution for automating the setup and execution of the disaster recovery plans or workflows.

<Site_A> hosts the virtual machine workloads that are being protected and is referred to as the protected site in this document. <Site_B> is the disaster recovery site and is referred to as the recovery site.

4.1 Logical Design

The logical design is a more detailed design that includes all major components and entities, and their relationships.
The data flows and connections are detailed at this stage. Logical designs do not include physical server names and IP addresses; they include business service application names and details, and other relevant information. Check the VMware Virtualization Technical Materials ZIP file for the requirement.

The following represents the overall logical design for the disaster recovery solution.

<Customer> has various business applications and services, referred to as business applications in this document, that must be available in the event of a disaster. These business applications run as virtual machines on vSphere but depend on some other applications and services that run on physical systems.

Users access these business applications over the corporate local area network (LAN) in <Site_A>, over the wide area network (WAN) from <Site_B>, and over a virtual private network (VPN) from other branch offices and remote locations.

There is network connectivity between <Site_A> and <Site_B>, and data from <Site_A> is replicated to <Site_B> using VMware vSphere Replication™ or the appropriate storage vendor solution.

<Site_A> has a cluster of ESXi hosts with pre-production virtual machines that must be protected. <Site_B> has a cluster of ESXi hosts with non-production virtual machines. Each site has an instance of vCenter Server that manages the ESXi hosts within the site. Each site also has a vCenter Site Recovery Manager server and a vCenter Site Recovery Manager database.
A Storage Replication Adapter (SRA), provided by the storage vendor, is installed on the vCenter Site Recovery Manager server to provide communication between the vCenter Site Recovery Manager server and the storage array.

vSphere Replication can be configured to replicate between disparate storage arrays at <Site_A> and <Site_B>.

Read the SRA release notes included in the vendor's SRA bundle to obtain the list of requirements for configuring the LUNs to be managed by vCenter Site Recovery Manager.

4.2 vCenter Server Logical Design

<Customer>'s vCenter Server design includes a total of two virtual vCenter Server 5.5 systems: one at <Site_A> and one at <Site_B>, each deployed within a three-node ESXi cluster at its site. Each vCenter Server provides specific functions:
• VMware vCenter Server 1: located in the <Site_A> data center; manages the Pre-Production Cluster and integrates with vCenter Site Recovery Manager.
• VMware vCenter Server 2: located in the <Site_B> data center on the Recovery Cluster; manages the Recovery Cluster and integrates with vCenter Site Recovery Manager.

As part of the vCenter Site Recovery Manager project, the vCenter Server instances at <Site_A> and <Site_B> must run the same version; they work as peers. The vCenter Site Recovery Manager version depends on the vCenter Server version (see the VMware Product Interoperability Matrixes at /comp_guide2/sim/interop_matrix.php?).

Figure 8. VMware vCenter Site Recovery Manager Logical Design (primary site and secondary site)

See the VMware Software-Defined Data Center Services Architecture Design Diagrams to check the possibilities for a DR scenario.

Figure 9. BCDR SDDC Management Cluster Logical Diagram

4.3 Server for vCenter Site Recovery Manager

The following is a sample design decision based on VMware recommended practice. Multiple valid design choices are possible.
Update this to reflect the <Customer>-specific information in order to determine the most appropriate design choice.

A vCenter Site Recovery Manager server is required at both the primary site (sometimes referred to as the protected site) and the secondary site (sometimes referred to as the recovery site). The vCenter Site Recovery Manager server operates as an extension to the vCenter Server at its site. Because the vCenter Site Recovery Manager server depends on vCenter Server for some services, you must install and configure vCenter Server at a site before vCenter Site Recovery Manager can be implemented there.

vCenter Site Recovery Manager takes advantage of vCenter Server services such as storage management, authentication, authorization, and guest customization. It also uses the standard set of vSphere administrative tools to manage these services.

You can use vCenter Site Recovery Manager and vSphere Replication with the VMware vCenter Server Appliance™ or with a standard vCenter Server installation, and you can have a vCenter Server Appliance at one site and a standard vCenter Server installation at the other. Deployment of the vCenter Site Recovery Manager server is supported in several scenarios, including:

• Deployed as either a physical or a virtual system.
• Deployed on a shared system, such as the vCenter Server, or on a dedicated system.

For this design, <Customer> has made the decisions listed in this table.

DD004
Decision: The vCenter Site Recovery Manager Server will be deployed within a virtual machine.
Justification: All components of the solution should be delivered using the highest levels of availability. Running the vCenter Site Recovery Manager Server as a virtual machine allows it to make use of the underlying vSphere cluster capabilities.
Implication: VMware recommends a clear separation of management infrastructure from managed infrastructure. Use resource pools to make sure that adequate resources are available.
If no dedicated management cluster exists, one might be needed.

DD005
Decision: The vCenter Site Recovery Manager Server will be deployed on a dedicated server.
Justification: The disaster recovery solution must not impact the performance of day-to-day operations, so separation must be provided. Deploying on a dedicated system also allows for easier upgrades.
Implication: Deploying the vCenter Site Recovery Manager Server on a dedicated system will increase the overall licensing costs of the solution.

4.4 Database for vCenter Site Recovery Manager

The following is a sample of the design choices, evaluation, and design decision. Multiple valid design choices are possible. Update this to reflect the <Customer>-specific information to determine the most appropriate design choice.

The vCenter Site Recovery Manager server requires its own database to store data such as recovery plans and inventory information. The vCenter Site Recovery Manager database is a critical part of a vCenter Site Recovery Manager installation, and it cannot use the vCenter Server database because it has different database schema requirements.

Each vCenter Site Recovery Manager site requires its own instance of the vCenter Site Recovery Manager database. vCenter Site Recovery Manager does not require the databases at each site to be identical: you can run different versions of a supported database from the same vendor at each site, or databases from different vendors at each site. Deployment of the vCenter Site Recovery Manager database is supported in several scenarios, including:

• A shared database system
• A dedicated database system

For this design, <Customer> has made the decisions listed in this table.

Decision: The vCenter Site Recovery Manager database will be deployed on a shared database system where a dedicated instance will be created.
Justification: … database system that is highly available, so a shared vCenter Site Recovery Manager database system offers better availability.

4.4.1 Database Sizing

Use the vCenter Site Recovery Manager database sizing calculator to get an estimate of the size of the vCenter Site Recovery Manager database.

The vCenter Site Recovery Manager database at each site holds information about virtual machine configuration, protection groups, and recovery plans. The VMware vCenter Site Recovery Manager Administration Guide provides the configuration maximums for vCenter Site Recovery Manager. Based on these configuration maximums, the vCenter Site Recovery Manager database instance is less than 1 GB at both the protected site and the recovery site.

Because the database space requirements are small and either site can be set up as a protected site, VMware recommends that you allocate at least 3 GB to the vCenter Site Recovery Manager database at both the protected and recovery sites.

Update this to reflect the <Customer>-specific information to determine the correct number of vCenter Site Recovery Manager licenses required by <Customer>.

If vCenter Site Recovery Manager is configured to fail over virtual machines only from the primary site to the secondary site, you need vCenter Site Recovery Manager licenses only for the protected virtual machines at the primary site.

If vCenter Site Recovery Manager is configured to fail over one set of virtual machines from the primary site to the secondary site, and also to fail over a different set of virtual machines from the secondary site to the primary site, you need vCenter Site Recovery Manager licenses for the protected virtual machines at both sites.

Note: Verify that you have properly installed the software for vCenter Site Recovery Manager as well as the necessary plug-ins.

VMware recommends that <Customer> purchase adequate licenses for the primary and recovery sites to make the failback
process simpler. This also provides the capability for bidirectional failover, if necessary.

Table 7. vCenter Site Recovery Manager Licenses – Single Direction Protection

                                                              Site_A   Site_B
Number of protected virtual machines on ESXi hosts               10        0
Number of vCenter Site Recovery Manager licenses                  1        0
(25 protected virtual machines per license)

Total per-virtual machine 25-pack vCenter Site Recovery Manager licenses needed = 1 (up to 25 protected virtual machines total).

Table 8. vCenter Site Recovery Manager Licenses – Bi-Directional Protection

                                                              Site_A   Site_B
Number of protected virtual machines on ESXi hosts               10        5
Number of vCenter Site Recovery Manager licenses                  1        1
(25 protected virtual machines per license)

Total per-virtual machine 25-pack vCenter Site Recovery Manager licenses needed = 2 (up to 25 protected virtual machines in Site_A and up to 25 protected virtual machines in Site_B).
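The license arithmetic behind Tables 7 and 8 can be sketched as a small Python helper. This is an illustration only, assuming (as the tables show) that licenses come in packs of 25 protected virtual machines and are counted per site; `packs_per_site` and `total_packs` are hypothetical names, and the sketch is not a statement of VMware licensing terms.

```python
VMS_PER_PACK = 25  # from Tables 7 and 8: 25 protected virtual machines per license

def packs_per_site(protected_vms: dict) -> dict:
    """Return the number of 25-pack licenses needed at each site.

    `protected_vms` maps a site name to the number of virtual machines protected
    at that site (VMs that fail over away from it). Counts are rounded up per
    site rather than pooled across sites."""
    packs = {}
    for site, count in protected_vms.items():
        if count < 0:
            raise ValueError(f"{site}: protected VM count cannot be negative")
        packs[site] = -(-count // VMS_PER_PACK)  # integer ceiling division
    return packs

def total_packs(protected_vms: dict) -> int:
    """Sum the per-site pack counts."""
    return sum(packs_per_site(protected_vms).values())

# Table 7: single direction protection (only Site_A VMs are protected)
single = {"Site_A": 10, "Site_B": 0}
print(packs_per_site(single), total_packs(single))  # → {'Site_A': 1, 'Site_B': 0} 1

# Table 8: bidirectional protection (VMs are protected at both sites)
bidirectional = {"Site_A": 10, "Site_B": 5}
print(packs_per_site(bidirectional), total_packs(bidirectional))  # → {'Site_A': 1, 'Site_B': 1} 2
```

Rounding up per site reproduces Table 8: pooling the 15 protected virtual machines would suggest a single pack, whereas the table lists one pack for each site.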