January 1, 2022 |
Four Avoidance Strategies for Improving Cluster Resilience, Performance, and Outcomes
Simple Steps for Deployment in SIOS Protection Suite Cluster Environment
Avoiding something: we’ve all done it before. An old flame we see in the store while walking with our spouse, a salesperson when we aren’t “ready to buy”, and even a boss while we are out on “vacation”. When I was the manager of a development team, I caught a glimpse of a direct report browsing in a store while they were supposed to be out of the office sick. They ducked between clothing racks, scurried down the next aisle, and hurried away. We’ve all done it before, and in some cases, for mental health, physical health, or reasons that remain private and personal, we all need some measure of avoidance. Even in HA. So, how do you add avoidance to your High Availability environment, and why?
Four Reasons To Use An Avoidance Strategy In High Availability
1. Better Performance (minimizing server overload)
One reason to use avoidance strategies in HA is to increase application and server performance. Consider the case of three servers running production workloads; let’s call them Server Alpha, Server Beta, and Server Gamma. Servers Alpha and Beta are running critical applications backed by a database, while Server Gamma is running reports and data transformation jobs. In the event of a failure of Server Alpha, a failover to Server Beta would traditionally occur. However, because Server Beta is already running a large workload, the additional application load might overload the server and degrade performance for both applications. It might therefore be wise to deploy an avoidance strategy to make sure that Server Gamma is chosen as the failover target.
2. Performance Optimization
Consider again the scenario of three servers, Alpha, Beta, and Gamma.
Servers Alpha and Beta are scaled to handle peak workloads, while Server Gamma is a cost-optimized server. If both Server Alpha and Server Beta fail, their workloads fail over to the cost-optimized server, Gamma. However, Gamma is scaled neither for peak workloads nor for the combined workloads of Servers Alpha and Beta. In this instance, an avoidance strategy can optimize performance by automatically moving one or both workloads off Server Gamma as soon as another host is available.
3. High Availability Optimization
HA optimization is another scenario for deploying avoidance strategies. Like performance optimization, HA optimization is used to ensure that your environment can survive most failure scenarios and that your applications are positioned to provide the highest level of availability possible at any point in time. HA optimization is important for an application such as SAP with replicated enqueue processes. In any SAP environment, you do not want the ASCS (ABAP SAP Central Services) and ERS (Enqueue Replication Server) instances residing on the same server for extended periods of time: if that single server fails, both the enqueue lock table and its replica are lost, risking lost locks and canceled jobs. To prevent this, you can use an avoidance strategy that causes the ASCS and ERS instances to always run on opposite cluster nodes. Consider the case of three servers running production workloads: Servers Alpha, Beta, and Gamma. Server Alpha is running the ASCS instance, while Server Beta is running the ERS instance. Server Gamma functions as a third node for failovers of both Server Alpha (ASCS) and Server Beta (ERS). If Beta crashes, you wouldn’t want the ERS resource failing over to Alpha, the node running the ASCS instance. To ensure this, you can deploy an avoidance strategy that automatically checks first that the two applications will be on separate servers (here, by sending ERS to Gamma), maintaining SAP ASCS/ERS best practices for lock failover.
4. DR Avoidance
Suppose you have two data centers, in City Alpha and City Beta, about 70 miles apart, with most of your clients centrally located between them. However, due to recent internal organizational changes, mergers, closures, acquisitions, and governance requirements, your IT team has to add a third data center in City Gamma, about 350 miles from Alpha and Beta. The resources that were primarily protected in Alpha and Beta are now also extended to the Gamma location. Because most users and teams are near the Alpha and Beta locations, and even the most remote users are in neighboring cities, your team needs to avoid a failover to the Gamma location. Like the other strategies, DR avoidance seeks to optimize performance, inbound/outbound regional data costs, latency, and client access by avoiding the DR node when only one node within the Alpha/Beta region fails. It also ensures that even if both nodes fail at different times, failover always occurs to the other node in the primary cluster or data center before moving to DR.
So, how do you deploy an avoidance strategy? Many providers have affinity rules that can be configured, while others use a combination of server priorities or manual steps. In the case of SIOS Protection Suite for Linux, you can use a number of built-in methods, including:
1. Resource prioritization
In the event of a failure, resources fail over to the available server where they have the highest remaining priority (the lowest priority number, with 1 being the highest) and cascade to any additional servers. Consider three servers, Alpha, Beta, and Gamma: Server Alpha is the primary server for Resource.HR, Server Beta is the primary server for Resource.MFG, and Server Gamma is the backup server for all resources and servers. Using resource prioritization, Resource.HR would have a priority of one (1) on Server Alpha and a priority of two (2) on Server Gamma, while Resource.MFG could have a priority of one (1) on Server Beta and a priority of two (2) on Server Gamma.
If customers wanted to make fuller use of the environment, Resource.HR could also have a priority of three (3) on Server Beta, and Resource.MFG a priority of three (3) on Server Alpha. In the event of a failure of Server Alpha, Resource.HR would fail over to Server Gamma first before trying to come in service (be restored) on Server Beta. SIOS Protection Suite for Linux (both the UI and the CLI) allows users to specify a priority for each server and resource combination.
2. Policy or affinity rules
Policy rules can also be used to prevent a resource recovery from occurring on a given server, allowing a resource to avoid a specified server that may be running a more critical or resource-intensive workload. Typical policies include:
The SIOS Protection Suite for Linux CLI allows users to specify policy rules that can disable failover of a specific resource to a specified server, apply temporal (time-based) policies to failures, disable failovers for a specific application type, and define constraint policies and custom policies.
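To make the interplay of priorities, policy rules, and avoidance more concrete, here is a minimal Python sketch of how a cluster manager might pick a recovery target. It is a simplified model under stated assumptions, not SIOS Protection Suite’s actual logic or API: the `Resource` class, the `choose_target()` function, and the `disabled_on` and `avoid` fields are hypothetical. It assumes lower priority numbers are preferred, treats a policy rule as a set of servers where recovery is disabled, and models avoidance as “whenever possible”, falling back to the best-priority server if every eligible server already hosts an avoided resource. Server and resource names come from the examples above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical model of a protected resource (not a SIOS API)."""
    name: str
    # server -> priority; 1 is the most preferred server
    priorities: dict[str, int]
    # policy rule: servers where this resource must never be recovered
    disabled_on: set[str] = field(default_factory=set)
    # names of resources this one should avoid sharing a server with
    avoid: set[str] = field(default_factory=set)


def choose_target(res: Resource, alive: set[str],
                  placement: dict[str, str]) -> str | None:
    """Return the server `res` should be recovered on, or None.

    alive     -- servers currently up
    placement -- resource name -> server it currently runs on
    """
    # Policy step: keep only live servers where recovery is not disabled,
    # ordered by priority (lowest number first).
    candidates = sorted(
        (prio, server)
        for server, prio in res.priorities.items()
        if server in alive and server not in res.disabled_on
    )
    if not candidates:
        return None
    # Avoidance step: skip servers already hosting an avoided resource.
    occupied = {placement[r] for r in res.avoid if r in placement}
    for _, server in candidates:
        if server not in occupied:
            return server
    # "Whenever possible": no clean option, so take the best-priority server.
    return candidates[0][1]


# Resource prioritization: Alpha fails, HR goes to Gamma, then Beta.
hr = Resource("Resource.HR", {"Alpha": 1, "Gamma": 2, "Beta": 3})
print(choose_target(hr, {"Beta", "Gamma"}, {}))  # Gamma
print(choose_target(hr, {"Beta"}, {}))           # Beta

# Avoidance: Beta (ERS) fails while ASCS runs on Alpha.
ers = Resource("ERS", {"Beta": 1, "Alpha": 2, "Gamma": 3}, avoid={"ASCS"})
print(choose_target(ers, {"Alpha", "Gamma"}, {"ASCS": "Alpha"}))  # Gamma

# Policy rule: recovery of MFG on Alpha is disabled.
mfg = Resource("Resource.MFG", {"Beta": 1, "Gamma": 2, "Alpha": 3},
               disabled_on={"Alpha"})
print(choose_target(mfg, {"Alpha"}, {}))         # None
```

In this model, DR avoidance from the fourth scenario needs no extra rule: giving the DR node the largest priority number means it is chosen only after every same-region node is unavailable.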
The most granular way to establish a resource avoidance strategy is to deploy specific avoidance scripts within each hierarchy. This method allows the user to configure specific applications (e.g., app1 and app2) to avoid one another whenever possible while allowing other applications to run without restriction. In the case of our three servers, Alpha, Beta, and Gamma, and three resources, app1, app2, and app3, this method provides the greatest flexibility: app1 and app2 will seek to avoid collocation when a server fails, but app3 will fail over to the next available node based on priorities without any collocation restrictions. If a customer has two applications, app1 and app2, that must run on different nodes whenever possible, the customer can create two avoidance terminal leaf node resources using the SIOS Protection Suite for Linux gen/app resource and the ‘/opt/LifeKeeper/lkadm/bin/avoid_restore’ script. For additional examples of avoidance strategies and resources, see the SIOS Protection Suite for Linux documentation.
– Cassius Rhue, VP, Customer Experience
Reproduced from SIOS |
December 28, 2021 |
Windows Clustering
How to Achieve High Availability in Windows
To mitigate system downtime and ensure high availability for Windows, IT best practice recommends that you cluster servers (or nodes) so that if one node fails, one or more other nodes automatically take over processing. This is also referred to as Windows clustering. Clustering requires software that monitors the health of the primary node and initiates recovery actions if it detects an issue. HA clustering also requires a way to ensure that, in the event of a failure, the secondary node is accessing the most current version of the data in storage. In most cases, this is achieved by connecting all nodes of the cluster to the same shared storage. To protect applications from sitewide and regional disasters, the cluster nodes should also be separated geographically. In Windows Server environments, Microsoft includes Windows Server Failover Clustering (WSFC) in the Windows Server platform.
What is Windows Server Failover Clustering?
With WSFC, each active node typically has a standby node with the same hardware specifications that shares the same storage. A witness (such as a disk witness or file share witness) is often configured to provide a tie-breaking quorum vote, so that the cluster can reliably determine which nodes should keep running and, if the primary node fails, fail operation over to the standby node. In addition to monitoring the health of the cluster, the nodes in a WSFC also work together to collectively provide:[1]
How SIOS DataKeeper Complements WSFC
WSFC requires shared storage to ensure all cluster nodes are accessing the most up-to-date data in the event of a failover. Companies often use expensive SAN hardware to provide data redundancy, yet the SAN itself represents a single point of failure. And if you want to run your application in the cloud with the same Windows Server Failover Clustering protection, there is no SAN available. SIOS DataKeeper Cluster Edition seamlessly integrates with and extends WSFC and SQL Server Always On Failover Cluster Instances by eliminating the need for shared storage. It provides performance-optimized, host-based replication to synchronize local storage in all cluster nodes, creating a SANless cluster. While WSFC manages the cluster, SIOS DataKeeper performs synchronous or asynchronous replication of the storage, giving the standby nodes immediate access to the most current data in the event of a failover. SIOS DataKeeper not only eliminates the cost, complexity, and single-point-of-failure risk of a SAN, but also allows you to use the latest fast PCIe flash and SSD local storage for performance and protection in a single cost-efficient solution. With SIOS DataKeeper, you can also balance network bandwidth and CPU utilization for each application.
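The difference between synchronous and asynchronous replication comes down to when a write is acknowledged. The toy Python model below is an illustration only, not how DataKeeper is implemented; the `Replica` and `Mirror` classes and the `drain()` method are hypothetical. In synchronous mode a write is applied to the target before it is acknowledged; in asynchronous mode it is acknowledged immediately and queued, so any writes still in the queue when the source fails would be lost.

```python
from __future__ import annotations

from collections import deque


class Replica:
    """Hypothetical target volume: block address -> data."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}

    def apply(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = data


class Mirror:
    """Toy host-based block mirror; not DataKeeper's implementation."""

    def __init__(self, target: Replica, synchronous: bool = True) -> None:
        self.local: dict[int, bytes] = {}
        self.target = target
        self.synchronous = synchronous
        self.pending: deque = deque()  # async writes not yet shipped

    def write(self, lba: int, data: bytes) -> str:
        self.local[lba] = data
        if self.synchronous:
            # Acknowledge only after the target has applied the write.
            self.target.apply(lba, data)
        else:
            # Acknowledge now; ship to the target later.
            self.pending.append((lba, data))
        return "ack"

    def drain(self) -> None:
        """Ship every queued asynchronous write to the target."""
        while self.pending:
            self.target.apply(*self.pending.popleft())


sync_target = Replica()
sync_mirror = Mirror(sync_target, synchronous=True)
sync_mirror.write(0, b"order-1")
print(sync_target.blocks)          # {0: b'order-1'}

async_target = Replica()
async_mirror = Mirror(async_target, synchronous=False)
async_mirror.write(0, b"order-1")
async_mirror.write(1, b"order-2")
print(async_target.blocks)         # {}
print(len(async_mirror.pending))   # 2 writes at risk if the source failed now
async_mirror.drain()
print(async_target.blocks)         # {0: b'order-1', 1: b'order-2'}
```

The trade-off this illustrates: synchronous mode keeps the target fully current at the cost of adding a network round trip to every write, while asynchronous mode keeps writes fast but allows a small window of unreplicated data.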
In addition, SIOS DataKeeper’s Target Snapshots feature lets you run point-in-time reports from a secondary node, offloading workloads that could impact performance on the primary node. This lets you query and run reports faster and make faster decisions. Working with WSFC, SIOS DataKeeper Cluster Edition protects business-critical Windows environments, including Microsoft SQL Server, SAP, SharePoint, Lync, Dynamics, and Hyper-V, using your choice of industry-standard hardware and locally attached storage in a “shared-nothing” or SANless configuration.[2] SIOS DataKeeper also provides high availability and disaster recovery protection for your business-critical applications in cloud environments, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud, without sacrificing performance.
SIOS Protection Suite: Protecting a Windows Environment Without WSFC
SIOS Protection Suite for Windows includes DataKeeper, SIOS LifeKeeper, and optional Application Recovery Kits for leading applications and infrastructure. It is a tightly integrated clustering solution that combines high availability failover clustering, continuous application monitoring, data replication, and configurable recovery policies to protect your business-critical applications and data from downtime and disasters.
Distributed metadata and notifications: The WSFC service and node metadata/status are hosted on each node in the cluster. When changes occur on any node, updated information is automatically propagated to all other nodes.
SIOS Protection Suite does not require WSFC, because SIOS monitors the health of the application environment, including servers, operating systems, and databases. It can stop and restart an application both locally and on another cluster server at the same site or in another location. When a problem is detected, SIOS Protection Suite automatically performs the recovery actions and manages cascading and prioritized failovers.
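The recovery behavior described above (restart locally first, then cascade through other servers in priority order) can be sketched in Python as follows. This is a simplified illustration, not SIOS Protection Suite’s implementation: `recover()`, its `restart` callback, and the `max_local_attempts` parameter are hypothetical, and the sketch ignores stopping the failed instance and moving dependent resources such as storage or IP addresses.

```python
from __future__ import annotations

from typing import Callable, Sequence


def recover(restart: Callable[[str], bool], local: str,
            ordered_servers: Sequence[str],
            max_local_attempts: int = 1) -> str | None:
    """Hypothetical recovery routine: local restart first, then cascade.

    restart            -- tries to start the application on a server,
                          returning True on success
    local              -- server where the application just failed
    ordered_servers    -- all cluster servers in priority order
    max_local_attempts -- local restarts to try before failing over
    """
    # Local recovery avoids a full failover when a restart is enough.
    for _ in range(max_local_attempts):
        if restart(local):
            return local
    # Cascading failover: try the other servers in priority order.
    for server in ordered_servers:
        if server != local and restart(server):
            return server
    return None  # no server could run the application


attempts: list[str] = []


def fake_restart(server: str) -> bool:
    """Simulated restart that only succeeds on Gamma."""
    attempts.append(server)
    return server == "Gamma"


print(recover(fake_restart, "Alpha", ["Alpha", "Beta", "Gamma"],
              max_local_attempts=2))  # Gamma
print(attempts)  # ['Alpha', 'Alpha', 'Beta', 'Gamma']
```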
With SIOS Protection Suite, you can use your choice of SAN or SANless clusters with a wide array of storage devices, including direct-attached storage, iSCSI, Fibre Channel, and more. SIOS Protection Suite for Windows can meet your high availability and disaster recovery needs within a single site and across multiple sites.
Popular SIOS Windows Clustering Solutions
Some of the most popular SIOS Windows clustering solutions, for SQL Server, SAP, and cloud-based environments, are discussed in more detail below.
Windows Clustering for SQL Server, SAP, S/4HANA, and Oracle
SIOS provides comprehensive SAP-certified protection for both applications and data, including high availability, data replication, and disaster recovery. To protect SAP in a Windows environment, SIOS Protection Suite includes SIOS LifeKeeper, which monitors the entire application stack. SIOS protects your Oracle Database whether you are using it with SAP or running standalone Oracle applications; you simply select the Application Recovery Kit that matches your configuration.
Windows Clustering in the Cloud
Whether you need SIOS DataKeeper to enable Windows Server Failover Clustering in the cloud, or SIOS Protection Suite for Windows for application monitoring, failover orchestration, and efficient block-level data replication, SIOS delivers complete configuration flexibility. SIOS allows you to create a cluster in any combination of physical, virtual, cloud, or hybrid cloud infrastructures. For example, working with WSFC, SIOS DataKeeper can:
SIOS DataKeeper Cluster Edition can provide high availability cluster protection across cloud
Conclusion
SIOS provides offerings that support a breadth of applications, operating systems, and infrastructure environments, providing a single solution that can handle all your high availability needs. Here are just a few examples that demonstrate the power of SIOS.
For more information on high availability/disaster recovery solutions to support your Windows environment, click here.
References
[1] https://www.techopedia.com/definition/24358/windows-clustering; https://searchwindowsserver.techtarget.com/definition/Windows-Server-failover-clustering
[2] A shared-nothing architecture (SN) is a distributed-computing architecture in which each update request is satisfied by a single node (processor/memory/storage unit). https://en.wikipedia.org/wiki/Shared-nothing_architecture
Reproduced from SIOS |
December 23, 2021 |
Linux Clustering
What is Linux Clustering?
A high availability Linux cluster is a group of Linux computers (nodes) and storage devices that work together and are managed as a single system. In a traditional clustering configuration, two nodes are connected to shared storage (typically a SAN). With Linux clustering, an application runs on one node, and clustering software monitors its operation. If the software detects an issue, it moves operation of the application to the secondary node in a process called failover. Because the secondary node shares storage with the primary, operation can continue quickly, meeting very short (seconds to minutes) recovery time and recovery point objectives.
Linux Open Source High Availability Clustering
Some Linux operating system vendors offer clustering software, such as the SUSE Linux Enterprise High Availability Extension (HAE), the Red Hat Enterprise Linux (RHEL) High Availability Add-On, and Oracle Real Application Clusters (RAC). While these allow you to create a failover cluster, they present a variety of challenges. The first is choosing which software to use for each component of the HA configuration, which, at a minimum, must include three related capabilities: data replication, server clustering, and a resource manager with a heartbeat monitor. With SUSE and Red Hat, you are also locked into the OS. If you want to use other less expensive or free OS versions, such as CentOS or Oracle Enterprise Linux (OEL), you will need to buy a separate HA solution. Whichever you choose, creating a Linux clustering solution with open source software for high availability is a “Do-It-Yourself” (DIY) project that is highly manual and prone to human error. Linux open source HA extensions require a high degree of technical skill, creating complexity and reliability issues that challenge most operators.
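Failure detection by a heartbeat monitor, one of the three capabilities listed above, can be sketched as a timeout check. This minimal Python model is hypothetical (the `HeartbeatMonitor` class, its `beat()` and `check()` methods, and the five-second timeout are assumptions, not any vendor's API): if the active node sends no heartbeat for longer than the timeout, the standby node becomes active. An injectable clock keeps the example deterministic.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Toy two-node failure detector (hypothetical, not a vendor API).

    If the active node sends no heartbeat for more than `timeout` seconds,
    check() swaps the roles so the standby node becomes active.
    """

    def __init__(self, active: str, standby: str, timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.active = active
        self.standby = standby
        self.timeout = timeout
        self.clock = clock
        self.last_beat = clock()

    def beat(self, node: str) -> None:
        """Record a heartbeat; only the active node's heartbeats matter."""
        if node == self.active:
            self.last_beat = self.clock()

    def check(self) -> bool:
        """Fail over if the active node has gone silent; True if it did."""
        if self.clock() - self.last_beat > self.timeout:
            self.active, self.standby = self.standby, self.active
            self.last_beat = self.clock()  # fresh window for the new active node
            return True
        return False


# A fake clock makes the example deterministic.
now = [0.0]
monitor = HeartbeatMonitor("node1", "node2", timeout=5.0, clock=lambda: now[0])
now[0] = 3.0
monitor.beat("node1")
print(monitor.check(), monitor.active)  # False node1
now[0] = 9.0                            # 6 s of silence from node1
print(monitor.check(), monitor.active)  # True node2
```

A production cluster needs more than this: before promoting the standby, it must confirm (typically through quorum and fencing) that the old active node is really down, or both nodes may write to the data at once (split-brain).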
SUSE Linux Enterprise Server and Red Hat Enterprise Linux both support SAN and SANless environments, but they require replication software called DRBD to be installed and configured in the OS to support data replication in the SANless environment. Unfortunately, this requires heavy custom scripting, which can take a long time to test and validate, and it requires retesting whenever the environment is updated. Because these companies are operating system companies first and foremost, their support is geared toward operating system-level issues, and there is often little to no HA expertise available to help customers with their issues. Oracle RAC is a high availability solution, but it is primarily architected for the database management tier. This means you will need a different HA solution to monitor, manage, and recover your application tiers. Oracle RAC is also very expensive compared to other Linux clustering solutions, such as SIOS Protection Suite: it requires you to upgrade to Oracle Database Enterprise Edition in addition to paying for the RAC option, typically costing hundreds of thousands of dollars.
SIOS Protection Suite for Linux Clustering
The SIOS Protection Suite for Linux provides a tightly integrated combination of high availability failover clustering, continuous application monitoring, data replication, and configurable recovery policies, protecting your business-critical applications from downtime and disasters. While SIOS Protection Suite can operate in a SAN environment to support a traditional hardware-based HA cluster, its architecture takes a shared-nothing approach to server clustering, allowing it to run SANless. It delivers a robust, versatile, and easily configurable solution with automatic and manual failover/failback recovery policies for a wide variety of applications. SIOS Protection Suite for Linux includes:
It is the SIOS team’s depth of knowledge in application recovery, and the solution’s automation of application monitoring and recovery, that make it easier to use and a better, less expensive choice than the Linux clustering solutions offered by SUSE, Red Hat, and Oracle. In addition, SIOS LifeKeeper supports all major Linux distributions, including Red Hat Enterprise Linux, SUSE Linux Enterprise Server, CentOS, and Oracle Linux, and accommodates a wide range of storage architectures. SIOS software has been adapted and optimized to run on these operating systems, and its components are tested to ensure the SANless cluster solution works on each OS. Lastly, with SIOS Protection Suite for Linux, you can run your business-critical applications in a flexible, scalable cloud environment, such as Amazon Web Services (AWS), without sacrificing performance, high availability, or disaster protection.
Linux Clustering in AWS
While cloud providers such as AWS provide high availability options, they do not provide the level of high availability and breadth of protection across the whole application infrastructure that customers demand and that you once achieved by using clusters before cloud computing. That is why AWS is partnering with SIOS. SIOS Protection Suite for Linux achieves these desired levels of high availability for our mutual customers and the critical applications they are moving to the AWS cloud. SIOS Protection Suite for Linux on AWS provides all the elements you need to create a high availability Linux cluster in a virtual private cloud (VPC) within a single AWS Region across two Availability Zones. It also supports out-of-the-box protection for SAP systems, Oracle databases, and other business-critical applications. SIOS and AWS offer the SIOS Protection Suite Quick Start on AWS, which helps you create a fully configured and operational Linux high availability cluster in a few short steps.
It sets up an AWS architecture for SIOS Protection Suite for Linux and deploys it into your AWS account in about half an hour. This Quick Start, available in the AWS Marketplace, is for enterprise users who want to deploy SIOS Protection Suite for Linux on AWS into their test or production environment.
SIOS Clustering for Linux
SIOS is a high availability company that has spent the past 20 years focused on delivering HA that is specifically designed for SAP, SQL, Linux, Oracle, and other applications. Its experience is built into its product, and installation and configuration take a fraction of the time and cost of custom scripting with the Linux distributions. In addition, SIOS tests and validates new versions of operating systems and applications so its customers don’t have to. When customers call SIOS for support, they are connected to a high availability expert, someone who focuses only on HA and has been doing so for a very long time. For more information, refer to the SIOS white paper, “Implementing High Availability in a Linux Environment.”
Reproduced from SIOS
|
December 18, 2021 |
Failover Cluster
Failover Cluster Software Solutions: What You Need to Know |
December 13, 2021 |
Data Replication |