May 5, 2022
Leading Media Platform Protects Critical SAP S/4HANA in AWS EC2 Cloud

SIOS Chosen Based on Certifications and Validations for SAP, Amazon Web Services, and Red Hat Linux

This leading media organization reaches virtually all adults in Singapore weekly via the widest range of media platforms in the country, spanning digital, television, radio, print, and out-of-home media and more than 50 products and brands in four languages (English, Mandarin, Malay, and Tamil).

The Environment
The company uses an SAP S/4HANA application and HANA database to power operations throughout the organization, unifying work across multiple departments. They were running these critical applications and the database in a Red Hat Linux environment in their on-premises data center. Protecting these essential systems from downtime and disasters is a top priority for the organization's IT team.

The Challenge
The media organization's IT team recognized that they could realize significant cost savings by moving their SAP applications and HANA database to the AWS EC2 cloud. However, for the migration to succeed, they needed a high availability (HA) and disaster recovery (DR) solution that would "lift and shift" with their existing SAP landscape to AWS without disruption.

The Evaluation
The company's IT team wanted an HA/DR solution they could rely on to meet their 99.99% availability SLA in the cloud. The solution needed to be certified by both SAP and AWS and support the Red Hat Linux operating system. Finally, to ensure full DR protection for these critical workloads, they needed a clustering solution that could fail over across multiple AWS Availability Zones (AZs). An AWS solution architect recommended SIOS LifeKeeper for Linux clustering software. The company's IT team had a short timeline to complete the project and needed an HA vendor who could meet their requirements without impeding their progress.

The Solution
They chose SIOS LifeKeeper because it not only met all of their criteria, but also because the SIOS team got organized very quickly, keeping the cloud migration project on schedule. SIOS LifeKeeper is certified by both AWS and SAP for high availability on Red Hat in the AWS EC2 cloud environment. The SIOS solution also met another key criterion: it is able to replicate and provide redundancy across AWS Availability Zones. The organization's IT team was impressed with SIOS' dedicated local support team, who were available to answer questions and provide support 24 hours a day, 7 days a week. The organization currently runs five pairs of failover clusters using SIOS LifeKeeper for Linux to protect S/4HANA and SAP HANA applications across multiple Availability Zones in AWS EC2.

The Results
"The availability has been reliable – we have not had any extended downtime since using the software. By enabling us to migrate these important workloads to the cloud without sacrificing HA/DR or application performance, SIOS allowed us to achieve significant cost savings," he said. SIOS LifeKeeper's ease of use saved Mediacorp even more by minimizing IT admin time.

Reproduced with permission from SIOS
May 1, 2022
Protect Systems from Downtime

In today's business environment, organizations rely on applications, databases, and ERP systems such as SAP, SQL Server, Oracle, and more. These applications unify and streamline your most critical business operations. When they fail, they cost you more than just money. It is critical to protect these complicated systems from downtime.

Proven High Availability & Disaster Recovery
SIOS has 20+ years of experience in high availability and disaster recovery. SIOS knows there isn't a one-size-fits-all solution. Today's data systems are a combination of on-premises, public cloud, hybrid cloud, and multicloud environments. The applications themselves can create even more complexity. And configuring open-source cluster software can be painstaking, time-consuming, and prone to human error. SIOS has solutions that provide high availability and disaster recovery for critical applications, developed from real-world experience across different industries and use cases. Our products include SIOS DataKeeper Cluster Edition for Windows and SIOS LifeKeeper for Linux or Windows. These powerful applications provide failover protection. The Application Recovery Kits included with LifeKeeper speed up application configuration by automating setup and validating inputs.

System Protection On-Premises, in the Cloud, or in Hybrid Environments
SIOS provides the protection you need for business-critical applications and reduces the complexity of managing them, whether on-premises, in the cloud, or in hybrid cloud environments. Learn more about us in the video below, or contact us to learn more about high availability and disaster recovery for your business-critical applications.

Reproduced with permission from SIOS
April 29, 2022
How to Achieve High Availability in the Cloud Using WSFC

Microsoft Windows Server includes Windows Server Failover Clustering (WSFC) software to ensure the availability of critical applications. In an on-premises environment, the primary and standby nodes in the cluster are connected to the same shared storage. However, this infrastructure cannot be taken directly to the cloud. Shared storage that spans both primary and standby systems is essential to WSFC, but shared storage cannot be used with public cloud IaaS (Infrastructure as a Service) offerings such as AWS, Azure, or Google Cloud.

Geographically Separated Shared Storage for WSFC is Not Available in the Cloud
When migrating on-premises applications to the cloud, companies prefer to move their entire infrastructure to the cloud, including WSFC, without changing their on-premises operating processes. This allows them to minimize disruption by applying the same WSFC skills and know-how in the cloud. The servers that make up the cluster are divided into the primary node – where the application runs – and standby node(s). WSFC software monitors the application and the server nodes to ensure they are operational. If WSFC detects something wrong with the primary node, it switches operation of the application to the standby node in a process called "failover". In a WSFC environment, the primary server and the standby server are connected to shared storage – typically a SAN (Storage Area Network) or iSCSI SAN. To fail over operations from the primary server to the standby server, the network link must be switched so the standby server can read from and write to the SAN that normally serves the primary server. In this way, the service can be restarted in a short time, allowing the standby server to access the same data as the primary node and meet low Recovery Point Objectives (RPOs). See related content: Disaster Recovery Fundamentals. However, when migrating WSFC to the cloud, there is no SAN available. For example, in Amazon Web Services (AWS) and Microsoft Azure you cannot attach one disk to multiple nodes (servers) to use as shared storage. The same applies to IaaS from other cloud services. It is possible to build an HA cluster configuration based on WSFC without shared storage, but it requires extremely advanced skills, such as writing your own program to recover data on the standby node. The operation is complicated, and it is not easy to verify what happened when an incident occurs.

Data Replication Software Solves the Problem
To solve this problem, you can install data replication software specialized for HA clusters – such as SIOS DataKeeper Cluster Edition – and synchronize storage between the cluster nodes' local disks. Data on the local disks of the primary and standby nodes is synchronized in real time using host-based, block-level replication. With this method you do not need shared storage; instead, you can build an HA cluster configuration using familiar WSFC without disrupting established processes. With DataKeeper, the synchronized volumes appear as if they were SAN-backed shared storage in the WSFC management console (Failover Cluster Manager). If your operations managers have used WSFC, they will require little to no retraining with this approach.
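A minimal sketch of this approach from PowerShell (the node names, address, and witness share are hypothetical, and the DataKeeper resource-type filter is an assumption based on how SIOS documentation describes the registered resource type):

# Validate the cloud VMs for clustering (hypothetical node names).
Test-Cluster -Node "sqlnode1", "sqlnode2"

# Create the cluster with no shared storage; host-based replication
# will stand in for the SAN.
New-Cluster -Name "sqlcluster" -Node "sqlnode1", "sqlnode2" -StaticAddress 10.0.1.10 -NoStorage

# With no shared disk available for a quorum device, use a file share witness.
Set-ClusterQuorum -NodeAndFileShareMajority "\\witness\quorum"

# After a DataKeeper mirror is created, the replicated volume shows up as a
# cluster resource that WSFC manages like a shared disk (type name assumed).
Get-ClusterResource | Where-Object { $_.ResourceType -like "DataKeeper*" }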
High Availability in the Cloud Surpasses On-Premises HA with SIOS DataKeeper and WSFC
DataKeeper Cluster Edition is a software add-on that seamlessly integrates with Windows Server Failover Clustering (WSFC) to add performance-optimized, host-based synchronous or asynchronous replication. In the unlikely event that the primary node malfunctions, WSFC orchestrates the failover of operations to the standby node(s), which access the replicated storage as if it were shared storage. This simple mechanism makes it possible to move to AWS without changing the operation of the existing system. Without compromising familiar WSFC operations, DataKeeper makes it possible to guarantee high availability in the cloud that is equivalent to or better than on-premises high availability. The advantage of this cluster configuration is that it is very simple and can be easily applied to any cloud environment.

Seamless Integration with WSFC
SIOS DataKeeper Cluster Edition seamlessly integrates with and extends Windows Server Failover Clustering (WSFC) by providing a performance-optimized, host-based data replication mechanism. While WSFC manages the software cluster, SIOS performs the data replication to enable disaster protection and ensure zero data loss in cases where shared storage clusters are impossible or impractical, such as in cloud, virtual, and high-performance storage environments.

Reproduced with permission from SIOS
April 26, 2022
The single best way to deploy quorum/witness

During a recent meeting, a customer asked a question about High Availability (HA) and the need for a quorum/witness. Their question was, "What is the best way to deploy quorum/witness?" The answer is simple: there is no single best way to deploy quorum. To understand why, let's start by defining three key things: a witness resource, a quorum resource, and a split-brain scenario.

What is split brain?
In a normal cluster environment, the protected application runs on the primary node in the cluster. In the event of an application failure on that primary node, the clustering software moves the application to a secondary or remote node, which assumes the role of primary. At any given time, there is only one primary node. Split brain is a condition that occurs when members of a cluster are unable to communicate with each other but are in a running and operable state, and subsequently take ownership of common resources simultaneously. In effect, you have two bus drivers fighting for the steering wheel. Split brain, due to its destructive nature, can cause data loss or data corruption and is best avoided through the use of fencing, quorum, witness, or combined quorum/witness functionality for cluster arbitration. In most cluster managers, quorum is maintained when more than half of the cluster's voting members (nodes, plus any witness) can communicate with one another.
In most cluster managers, quorum is lost when half or fewer of the voting members can communicate, leaving no partition with a clear majority.
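A generic sketch of that majority rule in PowerShell (an illustration of the arithmetic, not any particular vendor's implementation):

# A partition keeps quorum only while it holds a strict majority of votes.
function Test-Quorum {
    param([int]$VotesVisible, [int]$TotalVotes)
    return $VotesVisible -gt [math]::Floor($TotalVotes / 2)
}

Test-Quorum -VotesVisible 2 -TotalVotes 3   # True: 2 of 3 votes, quorum maintained
Test-Quorum -VotesVisible 1 -TotalVotes 2   # False: 1 of 2 votes, quorum lost

The second case shows why a two-node cluster needs a witness: without a tie-breaking third vote, losing either node (or the link between them) leaves the survivor with exactly half the votes, and it cannot safely continue.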
What is a witness resource (or server)?
A witness resource is a server, network endpoint, or device that is used to achieve and maintain quorum when a cluster has an even number of members. A cluster with an odd number of members, using cluster majority, does not need a witness resource, because all members of the cluster serve to arbitrate majority membership.

What is quorum and a quorum resource?
A quorum resource is a resource (device, system, block storage, file storage, file share, etc.) that serves as a means of arbitrating cluster state and membership. In some cluster managers, quorum is a resource within the cluster that aids in, or is required for, any cluster state and membership decisions. In other cluster managers, quorum functions as a tie-breaker to avoid split brain.

More than One Way to Deploy a Quorum
Given the critical nature of quorum, it is essential that HA architectures deploy quorum/witness resources properly, and fortunately (or unfortunately) there is no single best way to deploy quorum. Several factors may shape the way your witness and quorum resources behave. These factors include:

1. Whether your deployment will be on-premises, cloud, or hybrid
Deploying in an on-premises data center where additional storage devices – such as Fibre Channel storage, power control devices or connections, or traditional STONITH devices – are present gives customers options for quorum and witness functionality that may not exist in the cloud. Likewise, cloud and hybrid environments differ in what can be deployed and in which failure scenarios quorum is being deployed to prevent. Additionally, latency requirements and differences may limit which types of devices and resources are available for a quorum/witness configuration.

2. Your recovery objectives
Recovery objectives are also important to consider when designing and architecting your quorum and witness resources. In an example two-node cluster (node A and node B), when node A loses connectivity to node B, what is the highest priority for recovery? If the witness/quorum resources are in the same network as node A, node A could remain online but severed from clients, while node B is unable to establish quorum and take over. Likewise, if the quorum device lives only in the region, data center, or network with node B, a loss could result in failover of resources to a defunct network or data center, or away from a functional and operational primary node.

3. Redundancy of Available Data Centers (or Regions) Within Your Infrastructure
The redundancy of the data center or region is also an important factor in your HA topology with quorum/witness. If your data center has only two levels of redundancy, you must understand the tradeoff between placing the quorum/witness in the same data center as the primary or the standby cluster node. If the data center has more than two redundant tiers, such as a third availability zone or access to a second region, that option provides a higher level of redundancy for the cluster.

4. Disaster Recovery Requirements
Understanding your true disaster recovery requirements is also a major factor in your design. If your cluster manager software requires access to the quorum/witness in order to recover from a total data center outage (or region failure), you'll need to understand the impact on your design.
Many high availability software packages have tools or methods for this scenario, but if your software does not, the design and placement of your quorum/witness may need to accommodate this reality.

5. Number of Members Within the Cluster, and Their Location
An additional quorum/witness server is typically not required when the cluster contains an odd number of nodes. However, using only two nodes in a cluster, or deploying a DR node that is not always available, may change your architecture. As VP of Customer Experience, I have worked with customers who deployed three-node architectures but, for cost savings, automated periodic shutdown of the third server.

6. Operating System and Cluster Manager
The final factor to mention on quorum/witness is the cluster manager and operating system. Not all HA software and cluster managers are equal when it comes to deploying a quorum/witness or arbitrating quorum status. Some clustering software requires shared disks for arbitration; others are more flexible, allowing shares (NFS, SMB, EFS, Azure Files, and S3). Knowing what your cluster manager requires, and which quorum modes it supports (simple majority, witness, file share, etc.), will affect not only what you deploy but how you deploy it. The single best way to deploy a quorum/witness server is to understand your vendor's definition of quorum/witness and their available options, know your requirements, factor in the limitations or opportunities presented by your data center (or cloud environment), and architect the solution that gives your critical systems the highest level of protection against split brain, false failovers, and downtime.

-Cassius Rhue, VP, Customer Experience

Reproduced from SIOS
April 21, 2022
Measuring and Improving Write Throughput Performance on GCP Using SIOS DataKeeper for Windows

Background
This post documents my findings regarding write performance to a disk being replicated in GCP. But first, some background information. A customer expressed concern that DataKeeper was adding a tremendous amount of overhead to their write performance when testing a synchronous mirror between Google zones in the same region. Their original test was run with the bitmap file on the C drive, which was a persistent SSD. In this configuration they were only pushing about 70 MBps. They tried relocating the bitmap to a GCP extreme persistent disk, but performance did not improve.

Moving the Bitmap to a Local SSD
I suggested that they move the bitmap to a local SSD, but they were hesitant because they believed the extreme disk they were using for the bitmap had latency and throughput as good as or better than a local SSD, so they doubted it would make a difference. In addition, adding a local SSD is not a trivial task, since it can only be added when the VM is originally provisioned.

Selecting the Instance Type
As I set out to complete my task, the first thing I discovered was that not every instance type supports a local SSD. For instance, the E2-Standard-8 does not support local SSD. For my first test I settled on a C2-Standard-8 instance type, which is considered "compute optimized". I attached a 500 GB persistent SSD, started running write performance tests, and quickly discovered that I could only get the disk to write at about 140 MBps rather than its maximum speed of 240 MBps. The customer confirmed that they saw the same thing. It was perplexing, but we decided to move on and try a different instance type. The second instance type we selected was an N2-Standard-8. With this instance type we were able to push the disk to its maximum throughput of 240 MBps when not replicating the disk. I moved the bitmap to the local SSD I had provisioned and repeated the same tests on a synchronous mirror (DataKeeper v8.8.2), with the results shown below.

The Results
Diskspd test parameters: diskspd.exe -c96G -d10 -r -w100 -t8 -o3 -b64K -Sh -L D:\data.dat

The Data
[Results chart: write throughput in MBps and average latency for the 64k, 8k, and 4k write sizes, with and without synchronous replication.]
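For anyone reproducing the setup, a sketch under my assumptions (the instance name, zone, and Windows image are hypothetical; the diskspd flag annotations follow Microsoft's documentation for the tool):

# Local SSDs can only be attached when the VM is created, so the instance
# must be provisioned with one up front (names and zone are hypothetical).
gcloud compute instances create dk-test-node --zone=us-central1-a --machine-type=n2-standard-8 --image-family=windows-2019 --image-project=windows-cloud --local-ssd=interface=NVME

# The test command above, annotated:
#   -c96G  create a 96 GiB test file      -d10   run for 10 seconds
#   -r     random I/O                     -w100  100% writes
#   -t8    8 worker threads               -o3    3 outstanding I/Os per thread
#   -b64K  64 KiB block size              -Sh    disable software and hardware caching
#   -L     capture latency statistics
diskspd.exe -c96G -d10 -r -w100 -t8 -o3 -b64K -Sh -L D:\data.dat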
Conclusions
The 64k and 4k write sizes both incur overhead that could be considered "acceptable" for synchronous replication. The 8k write size seems to incur a more significant amount of overhead, although the average latency of 3.183 ms is still fairly low.

-Dave Bermingham, Director, Customer Success

Reproduced with permission from SIOS