How to Protect Applications in Cloud Platforms – SANless Clusters for Cloud Environments
Reproduced with permission from SIOS
Unless you’ve been under a rock or frozen in time you’ve likely heard from one source or another that employers and employees are in the midst of a trend being called “The Great Resignation”. As reported in US News and World Report, “According to the U.S. Bureau of Labor Statistics, 4 million Americans quit their jobs in July 2021 and the trend isn’t slowing down.” No matter your company size or current revenue stream, if it hasn’t already, this trend will impact your IT team in the near future. Yes, let that sink in. The same team that is responsible for ensuring your mission-critical application availability is vulnerable in one way or another to the effects of “The Great Resignation.”
So, how do you recognize the warning signs, come to terms with the reality, and navigate with empathy and clarity through “The Great Resignation” so that it doesn’t cause a “Great Disaster” for your critical applications?
Don’t quit. Seriously! As colleagues and good people choose to change jobs or careers, or otherwise leave the workforce, it can be tempting to quit, especially when you begin to consider the prospect of carrying your already heavy workload with an even shorter bench. But don’t quit.
Of course, this process of identifying risks is two-pronged. After a resignation, your team is at risk of further personnel changes. But your High Availability is also at risk due to a loss of capacity, technical knowledge, or expertise. To prevent your enterprise from experiencing unplanned downtime in the wake of new team resignations, you’ll need to identify key areas of risk. Some technical risks include:
Often, as people begin to leave a company, it is very easy to say that it is “them, not us!” We want to focus on all the reasons why their issues led to them leaving, quitting, or choosing a different career or job. It is quite possible that their reason for leaving is entirely personal. Sometimes, however, the issue is in the mirror, and it is not them, but us. Why does figuring out whether the problem lies with them or with you matter for HA? Well, if the problem is with your company, such as its mission, vision, culture around HA and IT, or hiring and staffing issues for IT and HA system management, then simply adding headcount will be a temporary fix. In addition, team morale, commitment, and knowledge transfer may be further eroded as the focus remains on blame shifting rather than issue resolution.
Almost every company has had someone quit their team over the past two years. No matter whether they were seeking higher pay, staying at home to care for family members, retiring or pursuing other options, they have left. If you’ve lost a team member, it is essential to assess the remaining team. This assessment will be both technical and non-technical in nature. Technically, you will need to:
a. Identify current skills, abilities and knowledge gaps
What skills remain on the team, and what is the level of technical expertise and ability? Where are the knowledge gaps, especially those between theory and practice?
b. Understand both existing and missing roles.
Many of your team members may be covering multiple roles and responsibilities. The loss of a single team member may actually mean the loss of coverage for multiple roles and responsibilities.
c. Evaluate immediate training or augmentation needs
Where are you covered, but in need of additional training to stabilize and solidify the team? In which areas do you lack coverage that could be mitigated by training existing personnel or by some form of contract professional services? As VP of Customer Experience, I see this firsthand. Our team recently worked with a company needing professional services after losing key team members responsible for their HA environment.
Non-Technically, you will need to:
a. Understand how remaining team members feel
Even prior to the COVID pandemic and the period of “The Great Resignation,” many teams were running on fumes. A 24/7 world of HA leaves a lot of work to be done even with normal team numbers, norms, and tasks. If your team has been impacted, checking in and listening to the stories of remaining team members is as critical as a down production server. Find out who is depleted, burned out, confused, or nearing a collapse, or conversely, fully alive and ready for a new challenge. Be sure to listen to verbal and non-verbal cues, and empathize (not just with the loss of a colleague, but with their emotions, concerns, and fears).
b. Understand the reasons that the remaining team members are still on board
Knowing how team members feel is both a technical and non-technical necessity, but nearly equal to this task is discovering their reasons for staying. Of course, some reasons may surprise you. Author and speaker Carey Nieuwhof states that some team members are only staying because they “feel trapped on the team because they didn’t leave first.” Other reasons team members stay may not surprise you, but regardless of the reason (comfort, opportunity, salary, location, stock options, passion, teamwork, or culture), all of the reasons your team members stay are important.
c. Evaluate the impact of being short-handed
There is obviously a technical component of being short-handed, discussed previously: assessing skills gaps and so on. But there is a non-technical corollary to that technical assessment. Be sure to assess and evaluate the impact that being short-handed, even if only momentarily, will have on the mental, emotional, and personal health of remaining team members. Early in my career as a manager, our team dealt with a downsizing event that left several employees emotionally vulnerable and mentally exhausted. This led to higher fatigue, more mental fog, and increased rates of defects and mistakes by those team members. If your team is severely impacted mentally and physically by being short-handed, the risk to your HA could increase. Your team may be scrambling to pick up the slack, and they may rally quickly to cover for the leader or team member who has resigned, but it is critical that you understand whether those who remain are also exhausted, feeling trapped, or at risk of leaving.
Years ago, a senior executive left the company. Despite having transitioned his roles and tasks over nearly a year, there were still roles and tasks that surprised the remaining staff. In today’s wave of resignations, you don’t have a full year of transition. Furthermore, if your team has experienced more than one resignation, you probably haven’t completed the analysis and transition of the first person, so it is critical to identify and prioritize the most important tasks and assign responsibilities. Be sure to list out tasks such as: security scans, updates, maintenance, backups, tests, new application deployments, cost analysis, cloning and redeployment of images, patch application, and vulnerability remediation. These tasks will all remain necessary despite the losses and can have devastating effects if left to linger.
Tasks, roles and responsibilities still need to be covered. Critical issues will need to be addressed. Unplanned downtime will not wait to happen after you have rebuilt your staff, trained existing personnel, and fitted your company to be more resilient to the transitions and changes of the Great Resignation. In order to navigate in the short term, you will need to develop a smart, realistically achievable short term plan. This plan should map out the procedures, tasks and processes identified so that maintenance and operation can continue. Furthermore, it should define how existing critical infrastructure policies can be managed carefully through the tumultuous seasons to come.
The previous steps have led up to this. With an assessment of the current team, an identification of your key risks, and a transition plan in place, the next step is to focus on the future. You still have a mission. You still have critical applications that need to be highly available. You still have data that needs to be protected, mined, replicated, and available for your business. Start making plans for the future team.
Not all of the news about “The Great Resignation” is bad news for your team and HA. In the wake of team members leaving for new or different positions and opportunities, you have a real and rare opportunity to take all the information of your assessments and turn them into tools for growth and alignment and a better HA future. Building this brighter future includes defining the duties, roles, and skills needed, updating architectures and designs, planning for new hires and services engagements, and focusing on building a healthier team.
I discussed this subject in more detail in this recent TFir interview.
-Cassius Rhue, VP, Customer Experience
Reproduced from SIOS
Downtime has become more costly than ever before for modern businesses. The ITIC 2021 Hourly Cost of Downtime Survey found that in 91% of organizations, one hour of downtime in a business-critical system, database, or application costs an average of more than $300,000, and for 18% of large enterprises, the cost of an hour of downtime exceeds $5 million.
High availability (HA) is an attribute of a system, database, or application that’s designed to operate continuously and reliably for extended periods. The goal of HA is to reduce or eliminate unplanned downtime for critical applications. This is achieved by eliminating single points of failure by incorporating redundant components and other technologies in the design of a business-critical system, database, or application.
Service-level agreements (SLAs) are used by service providers to guarantee that a customer’s business-critical systems, databases, or applications are up and running when the business needs them.
IDC has created an SLA model that defines uptime requirements at five levels as follows:
According to ITIC, 89% of surveyed organizations now require “four-nines” availability for their business-critical systems, databases, and applications, and 35% of those organizations further endeavor to achieve “five-nines” availability.
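To make the “nines” concrete, the arithmetic below converts an availability target into the unplanned downtime it allows per year. This is only a minimal sketch: the function name and the 365-day year are assumptions chosen for illustration, not part of any SLA model cited here.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a non-leap year (525,600 minutes)


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


# Print the budget for a few common targets.
for label, pct in [("three-nines", 99.9), ("four-nines", 99.99), ("five-nines", 99.999)]:
    minutes = allowed_downtime_minutes(pct)
    print(f"{label} ({pct}%): about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, four-nines leaves a budget of roughly 52.6 minutes per year and five-nines roughly 5.3 minutes, which is why a single slow manual recovery can consume the entire annual allowance.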
In addition to uptime and availability, two other important HA metrics are Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs). RTO is the maximum tolerable duration of any outage and RPO is the maximum amount of data loss that can be tolerated when a failure happens. Unlike RTO and RPO metrics for disaster recovery which are typically defined in hours and days, RTO and RPO metrics for business-critical systems, databases, and applications are often only a few seconds (RTO) and zero (RPO).
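As a rough illustration of how the two objectives are measured, the sketch below checks a single outage against an RTO and an RPO. The `Outage` record and its field names are hypothetical, and the model is deliberately simplified: it treats everything written after the last replicated write as lost.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """Hypothetical record of one failure; all times are in seconds."""
    last_replicated_write_s: float  # last write safely copied to the standby
    failure_s: float                # moment the primary failed
    service_restored_s: float       # moment the application served requests again


def meets_objectives(outage: Outage, rto_s: float, rpo_s: float) -> bool:
    """True if both recovery time and the data-loss window are within limits."""
    recovery_time = outage.service_restored_s - outage.failure_s
    data_loss_window = outage.failure_s - outage.last_replicated_write_s
    return recovery_time <= rto_s and data_loss_window <= rpo_s


# An outage restored in 12 seconds with no unreplicated writes.
outage = Outage(last_replicated_write_s=100.0, failure_s=100.0, service_restored_s=112.0)
print(meets_objectives(outage, rto_s=30.0, rpo_s=0.0))
```

An RPO of zero, as described above, only holds if every write reached the standby before the failure, so the data-loss window in this model must be exactly zero.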
HA clustering typically consists of server nodes, storage, and clustering software.
A traditional, on-premises HA cluster is a group of two or more server nodes connected to shared storage (typically, a storage area network, or SAN) that are configured with the same operating system, databases, and applications (see Figure 1).
One of the nodes is designated as the primary (or active) node and the other(s) are designated as secondary (or standby) nodes. If the primary node fails, clustering allows a system, database, or application to automatically fail over to one or more secondary nodes and continue operating with minimal disruption. Since the secondary node is connected to the same storage, operation continues with zero data loss.
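The failover decision described above can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular clustering product works: the `Node` and `Cluster` classes are hypothetical, health is a simple flag, and real clusters must also deal with quorum and split-brain conditions that are ignored here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    healthy: bool = True


@dataclass
class Cluster:
    nodes: List[Node]  # assumption: nodes[0] starts as the primary
    active: Optional[Node] = field(init=False)

    def __post_init__(self) -> None:
        self.active = self.nodes[0]

    def check_and_failover(self) -> Optional[Node]:
        """Keep the active node if healthy; otherwise promote the first healthy standby."""
        if self.active is not None and self.active.healthy:
            return self.active
        self.active = next((node for node in self.nodes if node.healthy), None)
        return self.active


primary, standby = Node("node-a"), Node("node-b")
cluster = Cluster([primary, standby])
primary.healthy = False  # simulate a primary failure
print(cluster.check_and_failover().name)  # the standby takes over
```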
However, the use of shared storage in the traditional clustering model creates several challenges, including:
SANless or “shared nothing” clusters (see Figure 2) address the challenges associated with shared storage. In these configurations, every cluster node has its own local storage. Efficient host-based, block-level replication is used to synchronize storage on the cluster nodes, keeping them identical. In the event of a failover, secondary nodes access an identical copy of the storage used by the primary node.
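The idea of keeping each node’s local storage identical can be illustrated with a small model. The sketch below is an assumption-laden simplification, with hypothetical class names: each disk is a dictionary of block numbers, and a write is applied to the primary and then copied to every secondary before the call returns.

```python
from typing import Dict, List


class LocalDisk:
    """A node's own local storage, modeled as block number -> block contents."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}

    def write_block(self, number: int, data: bytes) -> None:
        self.blocks[number] = data


class ReplicatedVolume:
    """Mirrors every block write from the primary disk to each secondary disk."""

    def __init__(self, primary: LocalDisk, secondaries: List[LocalDisk]) -> None:
        self.primary = primary
        self.secondaries = secondaries

    def write_block(self, number: int, data: bytes) -> None:
        self.primary.write_block(number, data)
        for disk in self.secondaries:
            disk.write_block(number, data)  # copied before the write returns


primary_disk, secondary_disk = LocalDisk(), LocalDisk()
volume = ReplicatedVolume(primary_disk, [secondary_disk])
volume.write_block(7, b"order #1001")
print(primary_disk.blocks == secondary_disk.blocks)
```

Because replication happens at the block level, the layer above (file system, database, application) does not need to know that a second copy exists.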
Clustering software lets you configure your servers as a cluster so that multiple servers can work together to provide HA and prevent data loss. A variety of clustering software solutions are available for Windows, Linux distributions, and various virtual machine hypervisors. However, each of these solutions limits your flexibility and deployment options and introduces various challenges such as technical complexity and expensive licensing.
HA is crucial for business-critical systems, databases, and applications. But with the myriad platforms available, complexity ramps up significantly. That’s why an application-aware solution makes so much sense. What you need is a trusted partner who has extensive expertise in high availability—a partner like SIOS, which has the technological know-how to ensure that your business stays up and running.
Don’t wait for an outage or disaster to find out if you have the resiliency your business needs. Schedule a personalized demo today at https://us.sios.com to see what SIOS can do for your business.
A failover cluster provides high availability protection for applications by eliminating single points of failure. It runs the same operating system, databases, and applications on multiple servers, all of which either share the same storage or connect to storage that is continuously synchronized. Oracle runs on one of these servers, called the primary. If it fails, application orchestration software (clustering software) moves operations over to one or more secondary servers in a process called a failover. Since the primary and secondary servers access the same or identical storage, Oracle operation can continue with minimal recovery time or data loss. Many organizations consider Oracle to be the backbone of their operations, especially if they are using an Oracle-based SAP system or Oracle ERP system.
Oracle’s clustering software is called Oracle Real Application Clusters (RAC). RAC “enables you to combine smaller commodity servers into a cluster to create scalable environments that support mission-critical business applications.” [1] With Oracle RAC, you can cluster Oracle databases and use Oracle Clusterware to connect multiple servers, so they operate as a single system.
While RAC was previously bundled with Oracle Database Standard Edition (at no extra charge), Oracle has now removed the RAC feature from Standard Edition from version 19c onward. You can purchase Oracle RAC for an additional cost with Oracle Database Enterprise Edition. Unfortunately, this means that any customer wanting to use RAC must upgrade to Oracle Database Enterprise or migrate to the Oracle cloud, both of which are substantially more expensive solutions than the Standard Edition.
SIOS provides a high availability Oracle clustering solution without upgrading to the Enterprise Edition, saving up to 70 percent on licensing costs.
The SIOS Protection Suite for Linux provides a tightly integrated combination of high availability failover clustering, continuous application monitoring, data replication, and configurable recovery policies, protecting your Oracle database and applications from downtime and disasters. Unlike other clustering solutions that only monitor the server’s operation, SIOS LifeKeeper monitors the health of servers, network connections, storage, all Oracle processes, and any associated applications. Problems are immediately corrected via a set of policy-defined actions ensuring fast recovery without disruption to end-users.
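One way to picture “policy-defined actions” is an escalation rule: try to recover the failed component in place, and fail over only if that does not work. The sketch below is purely illustrative. The function, its parameters, and the restart-then-failover ordering are assumptions made for this example and do not represent the SIOS LifeKeeper API or its actual policies.

```python
from typing import Callable


def apply_recovery_policy(
    is_healthy: Callable[[], bool],
    restart_locally: Callable[[], None],
    fail_over: Callable[[], None],
    max_local_restarts: int = 1,
) -> str:
    """Return the action that ended the check: 'none', 'local-restart' or 'failover'."""
    if is_healthy():
        return "none"
    for _ in range(max_local_restarts):
        restart_locally()
        if is_healthy():
            return "local-restart"
    fail_over()  # local recovery did not help, so move to a standby node
    return "failover"


# Hypothetical service that comes back after one local restart.
service = {"up": False}
action = apply_recovery_policy(
    is_healthy=lambda: service["up"],
    restart_locally=lambda: service.update(up=True),
    fail_over=lambda: print("failing over"),
)
print(action)
```

In practice the health check would cover each monitored layer (server, network, storage, database processes), with a policy like this applied per resource.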
SIOS Protection Suite can operate in a shared storage (SAN) environment to support a traditional HA cluster, or in a shared-nothing (SANless) storage configuration in cloud, hybrid, and other environments where shared storage is impractical or impossible. It delivers a robust, versatile, and easily configurable cluster with automatic and manual failover/failback recovery policies for your Oracle databases and applications.
SIOS Protection Suite for Linux includes:
SIOS LifeKeeper supports all major Linux distributions, including Red Hat Enterprise Linux, SUSE Linux Enterprise Server, CentOS, and Oracle Linux and accommodates a wide range of storage architectures. SIOS software has been adapted and optimized to run on these operating systems and the components are tested to ensure the SANless cluster solution will work on each OS.
With the SIOS Protection Suite for Linux, you can run your Oracle applications in a flexible, scalable public cloud environment, such as Amazon Web Services (AWS) or Microsoft Azure, without vendor lock-in or sacrificing performance, high availability, or disaster protection.
SIOS Protection Suite for Linux on AWS or Azure provides the elements you need to create a high availability Linux cluster across cloud Fault Domains and Availability Zones, giving you geographical separation for protection from sitewide and regional disasters and outages.
In a Windows Server Failover Clustering (WSFC) environment, you can use SIOS DataKeeper Cluster Edition to synchronize local storage using efficient host-based replication for SANless clustering. SIOS DataKeeper Cluster Edition software protects your business-critical Windows environments, including Oracle, from downtime and data loss.
SIOS SANless cluster software provides the enterprise-grade high availability, reliability, and flexibility needed for your Oracle databases and applications when operating in VMware, Hyper-V, KVM, and XenServer environments.
SIOS Protection Suite for Linux protects your Oracle databases and applications running on Linux in a virtual environment. If you are running Oracle on Windows in a virtual environment, SIOS DataKeeper Cluster Edition protects your business-critical Windows environments, including your Oracle databases and applications.
SIOS offers integrated data replication, high availability clustering and disaster recovery solutions supporting Oracle on both Linux and Windows to provide fault-resilient protection for small and large organizations alike at a fraction of the cost of other Oracle clustering solutions. With SIOS SANless clusters, you do not need expensive shared storage to achieve full high availability application and database protection. Instead, you can run your Oracle databases and applications in the cloud where there is no SAN. And, SIOS can protect your Oracle database and applications on-premises and in virtual and hybrid environments as well.
For more information on how SIOS can protect your Oracle databases and applications, click here or request a personalized demo.
[1] https://docs.oracle.com/cd/B28359_01/rac.111/b28254/admcon.htm#RACAD7148
Your SAP system is the lifeblood of your organization and if the system is down, your operations stop. To support high availability of your SAP system, your IT team can install SAP in a cluster environment.
A cluster is a group of two or more connected servers that are configured with the same operating system, databases, and applications. These connected servers are referred to as “nodes.” One of the nodes is designated as the primary node. If a primary node fails, clustering allows your organization to automatically fail over application operation to one or more secondary nodes, mitigating downtime, eliminating data loss, and maintaining data integrity.
High availability SAP clustering solutions are available for servers that run in Linux or in Windows environments.
The front-end application (i.e., S/4HANA) needs high availability, as does any other application that depends on HANA.
There are several open-source HA solutions for SAP from Linux vendors, such as SUSE and Red Hat, that include HA extensions with their “Enterprise for SAP” subscriptions. These vendors bundle in open-source software you can use to build high-availability clusters for the HANA database, ABAP SAP Central Services (ASCS), Enqueue Replication Server (ERS), and other SAP components.[1]
SUSE HAE (and other open-source clustering options) are highly manual and only protect individual components. For example, integrating SUSE HAE and other open-source solutions with SAP or SAP HANA can be time-consuming and complex, requiring careful, manual scripting and tedious confirmation steps. Deep, specific expertise in the applications and database is also required to create an application-aware HA solution.
SAP also offers HANA System Replication, a feature that comes with the HANA software. It provides continuous synchronization of an SAP HANA database to a secondary location, whether in the same data center, at a remote site, or in the cloud. The data is replicated to the secondary site and preloaded into memory. When a failure happens, the secondary site takes over without a database restart, which helps to reduce the recovery time and meet the Recovery Time Objective (RTO). Unfortunately, failback to the primary node must be triggered manually by issuing separate commands. There is also no integrated HA failover orchestration with SAP Central Services and other components. [2]
SIOS HA clustering software provides comprehensive SAP-certified protection for your applications and data, including high availability, data replication, and disaster recovery in an easy, cost-efficient solution. SIOS software lets you protect SAP in Windows or Linux environments, using the server hardware of your choice in any combination of physical, virtual, cloud (public, private, and hybrid) and high-performance flash storage environments. SIOS software is easily configured and provides fast replication, comprehensive monitoring, and protection of the entire SAP application environment. It offers continuous data availability in either a shared (SAN) storage or shared-nothing (SANless) storage environment.
For SAP S/4HANA and the SAP HANA databases, SIOS can be used to complement what SAP is already doing with the HANA system replication to provide complete automated high-availability – automated monitoring of key SAP HANA application processes, and automated failover and failback.[3]
The SIOS Protection Suite for Linux provides a tightly integrated combination of high availability failover clustering, continuous SAP application monitoring, data replication, and configurable recovery policies, protecting your SAP application from downtime and disasters. While SIOS Protection Suite can operate in a SAN environment to support a traditional HA hardware-based cluster, the architecture takes a shared-nothing approach to server clustering allowing it to run SANless. It delivers a robust, versatile and easily configurable solution with automatic and manual failover/failback recovery policies for a wide variety of applications.
SIOS Protection Suite for Linux Supports SAP Clustering as Follows:
Application Recovery Kits (ARKs) provide application-specific awareness and connect the application stack to the HA solution in context, including all dependent components. For example, SIOS offers an SAP HANA Application Recovery Kit, which provides host auto-failover, storage replication, and system replication to increase availability.
Lastly, with the SIOS Protection Suite for Linux, you can run your business-critical applications in a flexible, scalable cloud environment, such as Amazon Web Services (AWS) and Azure, without sacrificing performance, high availability, or disaster protection.
SIOS DataKeeper Cluster Edition is a software add-on that simply and seamlessly integrates with WSFC to add performance-optimized, host-based synchronous or asynchronous replication. With DataKeeper, you can easily create a SANless cluster to achieve high availability and disaster recovery for your SAP application, whether operating in the cloud, in a virtualized environment such as VMware, or on physical servers using only local storage. It adds efficient replication to synchronize local storage on each cluster node, creating a SANless cluster that appears to WSFC like traditional storage. With it, you can create a Windows cluster in a cloud, hybrid cloud, or extend a traditional on-premises SAN-based cluster with a node in the cloud for disaster recovery.
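The practical difference between synchronous and asynchronous replication is when a write is acknowledged relative to when it reaches the target. The toy model below, which uses hypothetical names and is not DataKeeper’s implementation, shows the asynchronous case: writes are acknowledged immediately and queued, so whatever is still in the queue when the source fails is the data at risk.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class AsyncMirror:
    """Toy asynchronous mirror: acknowledge first, replicate later."""

    def __init__(self) -> None:
        self.source: Dict[int, bytes] = {}
        self.target: Dict[int, bytes] = {}
        self.pending: Deque[Tuple[int, bytes]] = deque()

    def write(self, block: int, data: bytes) -> None:
        """Acknowledge as soon as the source has the block; queue it for the target."""
        self.source[block] = data
        self.pending.append((block, data))

    def drain(self, max_blocks: int) -> None:
        """Ship up to max_blocks queued writes to the target, oldest first."""
        for _ in range(min(max_blocks, len(self.pending))):
            block, data = self.pending.popleft()
            self.target[block] = data

    def unreplicated_writes(self) -> int:
        """Writes that would be lost if the source failed right now."""
        return len(self.pending)


mirror = AsyncMirror()
for number in range(3):
    mirror.write(number, b"data")
mirror.drain(max_blocks=2)  # the link only keeps up with two of the three writes
print(mirror.unreplicated_writes())
```

A synchronous mirror would instead copy each block to the target inside `write`, so the pending queue is always empty at the cost of added write latency. That trade-off is why synchronous mode is typically chosen for low-latency links and asynchronous mode for longer distances.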
Using SIOS DataKeeper Cluster Edition, you can provide high availability protection for critical SAP components, including the ABAP SAP Central Services (ASCS) instance, back-end databases (Microsoft SQL Server, Oracle, DB2, MaxDB, MySQL, and PostgreSQL), and the SAP Central Services (SCS) instance.
SIOS DataKeeper not only eliminates the cost, complexity, and single-point-of-failure risk of a SAN, but also lets you use the latest in fast PCIe Flash and SSD in your local storage for performance and protection in a single cost-efficient solution.
SIOS DataKeeper also provides SAP high availability and disaster recovery in cloud environments, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Services, without sacrificing performance.
If your organization does not use WSFC, SIOS offers Protection Suite for Windows, which includes SIOS DataKeeper, SIOS LifeKeeper, and optional Application Recovery Kits (ARKs) for leading applications such as SAP, and infrastructure operations. It is a tightly integrated SAP clustering solution that combines high availability failover clustering, continuous application monitoring, data replication, and configurable recovery policies to protect your business-critical SAP application and data from downtime and disasters.
Organizations across the globe use SIOS HA solutions to protect their SAP application, whether running in a Windows or Linux environment. Here are just a few examples:
For more information on high availability SAP clustering, click here.
References
[1] Ibid.
[2] https://blogs.sap.com/2020/05/03/high-availability-and-dr-for-sap-hana-sap-s-4hana-and-sap-central-services/
[3] Ibid.