Stages of IT Disaster Recovery Grief

March 8, 2021 by Jason Aw

Disaster recovery grief can hit you out of nowhere if you haven’t implemented the right enterprise availability architecture. Meet our friend Dave in IT to walk us through the 5 stages of disaster grief.

Stage 1: Denial

Dave in IT: “Uh oh. What’s that alert? It’s just a little application crash, right? No big deal. I’ll have things up and running in no time.”

In the land of enterprise availability, there is no such thing as a little application crash or no big deal. Companies have SLAs with real money on the line. Your selective reality is probably not the same as the perspective of your customers and stakeholders.

Stage 2: Anger

Dave in IT: “Are you kidding me. Of all the… [censored]...times, today the application won’t start. Ughh. I hate this[censored]...[censored]... application. Wait, what’s this new alert. Seriously, now, the datacenter is down!”

Things get messy really, really fast in fast-paced, high-stakes environments. When alerts and failures go unchecked, problems mount quickly, along with pressure, frustration, and anger.

Stage 3: Bargaining

Dave in IT: “Hey Ard in Applications, this is Dave in IT. Do you guys have any backups for the App1 environment? . . .Ard are you sure? Could you just check again? I know you’ve checked twice, but can you check one more time. I’ll buy drinks on Taco Tuesday!”

Dave in IT: “Hey Donna DBA, this is Dave in IT. Ard in Applications said you might help me out. Did you by chance set up any database replication for that finance database or the inventory management system? . . . Are you sure? Umh, do you remember if we have any way to recover from a umh . . . datacenter crash?”

When my daughter gets in trouble, bargaining is her first go-to. Okay, second. The first is to disappear, but you’re too smart to just walk away from the flames. Dave in IT isn’t the only one to realize that bargaining and begging are a poor substitute for a well-defined strategy for high availability and disaster recovery. Skip the bargaining and begging about your disaster because “80% of the people don’t care, and 20% are glad it’s you” (paraphrased from Les Brown).

Stage 4: Sadness

Dave in IT: “This is just great. The application server crashed, the datacenter is down, and backups, if I can find them and if I can load them, will take hours to get restored. There is no way I’m getting out of this… where did I put that updated resume.”

Of course you have backups, and you’ve validated them. But going back to those backups has an RTO and RPO impact. Are you able to absorb this time? That is, of course, after your data center recovers.

Stage 5: Acceptance

Dave in IT: “It’s been two hours. I never knew we had this many Executive stakeholders before. No way I’m making it to my 2nd year anniversary after this. Well, I guess I’ll clean out my office tomorrow. No way I’m making it through this!”

Failures happen. Datacenters go down. Applications fail. There is no denying the possibility of losing a data center, having a server fail, or having an application crash. This type of acceptance is normal, and a part of improving your availability. Accepting that you may lose your job or worse because you failed to implement an availability strategy is something the experts at SIOS Technology Corp. want to make sure you avoid.

Don’t be like Dave in IT. Avoid the stages of disaster grief, and the hours of disaster recovery and downtime, by architecting and implementing an enterprise availability architecture that includes the best of hybrid, on-premises, or cloud, coupled with the best solution for monitoring, recovery, and system failover automation.

– Cassius Rhue, VP Customer Experience

Reproduced from SIOS

Why Does High Availability Have To Be So Complicated?

March 1, 2021 by Jason Aw

Why Does Love, er, High Availability Have To Be So Complicated?

It’s the Hallmark movie season, I mean Christmas season, I mean Hallmark Christmas movie season… (don’t judge too harshly, I’m a father of six young ladies, a hopeless romantic, and married to an amazing spouse who enjoys a good holiday laugh and happy ending). If you are in the Hallmark movie season, you know that it is highly likely that you’ll hear the phrase, “Why is love so complicated?” It will be spoken just after the heartbroken young person has developed feelings for a new love interest, and is ready to dance the night away in their arms, just as the old flame walks into the party. If you aren’t into the Hallmark holiday romances, maybe it isn’t love that you are wondering about. Perhaps you want to know: “Why does high availability have to be so complex?”

Ten Reasons That High Availability Is So ‘Gosh Darn’ Complicated:

The speed of innovation

Cloud computing, edge computing, hyperconverged infrastructure, multi-cloud, containers, and machine learning are changing the landscape of enterprise availability at a blistering pace. By conservative estimates, AWS currently has over 175 services, and “provides a highly reliable, scalable, low-cost infrastructure platform in the cloud that powers hundreds of thousands of businesses in 190 countries around the world.” Choosing an HA solution that allows consistent management across all of these environments, with infrastructure and application awareness, is an important way to reduce complexity.

Randomness of disasters

Someone once said, “Make your solution disaster proof, and the universe will build a better disaster.” Not only are we seeing innovations in the realm of technology, but also in the world of disasters. Resource starvation, cooling system disasters, natural disasters, power grid failures, and a host of new and random disasters often make it harder to insulate the entirety of your enterprise. Last year’s solutions will likely need updates to handle this year’s unprecedented outages. It’s important to work with a vendor that has focused on high availability for many years – who has firsthand experience with finding solutions to the randomness of disasters.

Application complexity

As technology moves ahead in the realm of virtualization and cloud computing, applications are following suit. As application vendors add new options to take advantage of the cloud, they are also adding complexity. Your applications should be protected by solutions designed for higher availability and clustering in AWS, Azure, GCP or other environments. Look for vendors who provide greater application awareness and understanding of best practices, and who deliver availability solutions that take into account how the application itself was architected and can optimize the application’s orchestration in the cloud.

Advances in threats

The threats to your enterprise also impact your availability. Systems have always had to handle attacks from intruders, hackers, and even the self-inflicted. These attacks have become more sophisticated, and the solutions and methods to avoid being victimized often impact the layout, architecture, and software that is deployed within your organization. This software has to “play nice” with your availability solution and your applications. As VP of Customer Experience for SIOS Technology, I have seen how an overly aggressive virus scanner can impact your application and your availability solution. Ensure you understand the impact of your security systems on your HA/DR environment and choose an HA solution that works with, not against, your security goals.

Regulatory requirements

Data breaches impact the architecture for your application, hypervisor and environment, but so too do regulatory requirements. Businesses that have become global now have to make sure they are compliant with data handling regulations in multiple countries. This can impact what region your solutions can be deployed in, and how many zones you can use for redundancy. Additionally, regulatory requirements can limit which teams can support your organization, which may in turn affect your choices for availability software and support.

Shrinking windows

In the world of 24/7 searches, shopping, gaming, banking, and research, the windows are shrinking. Queries must run faster and take less time. Responses have to be quicker and have better data. This means that the allowable downtime for your environment is shrinking faster than you previously imagined. It also means that maintenance windows are tighter and more tightly packed, and have to be optimized and highly coordinated. Work with an HA vendor that can provide guidance on optimizing your cluster configuration for both application performance and fast recovery time.

Increasing competitive pressure

I grew up in a small town. The hardware store had one competitor. The grocery store had one competitor. The bookstore, antique shop, car dealership, rental office, and bank all had one competitor. Today, you have thousands upon thousands of competitors who want nothing more than to see your customers in their checkout carts. This competition impacts the complexity of your entire business. It weighs heavily on what can and cannot be done in maintenance windows, with upgrades, and at what speed you innovate. Environments that may have been refreshed once every five years have moved to the cloud, where optimizations and advancements in processor speed and memory can be had in seconds or minutes. Systems that once had a single run book covering a simple list of applications now look closer to “War and Peace” and cover the growing number of processes, products, services and intelligence being added to increase profits while simultaneously working to reduce risks and downtime.

High availability solution costs

We all wish we had an unlimited budget, but the reality is that what you have available is often somewhere between a little and not enough. Teams are often forced to balance consumption versus fixed cost, license costs for applications on the standby clusters, and associated costs for availability software. Enterprise licenses often add a ‘tough to swallow’ price tag for a standby server in an availability environment. Architecting an availability solution is never free, even if you are a hard-core ‘DIY’ team. DIY comes with additional costs in maintenance, management, source control, testing, deployment, version management and version control, patches, and patch management. While your team of experts may be clearly up for the challenge, your business likely would prefer their highly valued talents be applied to creating more revenue opportunities.

Business growth

Growth of your business due to innovation means that your teams are now responsible for more critical applications, more sites, more offices, and more data that needs to be accessible and highly available. As your business grows and thrives, the challenges that come with scaling up and scaling out add to the complexities mentioned previously and expand what you have to prepare and plan for.

Team turnover

The complexity of the environments, speed of innovation, growth of your business, advances in the application tier, and growth in the competitive landscape bring with them the challenge of retaining top talent to keep your infrastructure running smoothly. Most companies understand that availability is a merger of people, process, product, and architecture, among other things. So finding ways to reduce the complexity of clustering environments with automated configuration, documented run books, and products with consistent HA strategies across the infrastructure is key both to retaining the talent that installs and manages your infrastructure and to reducing the risks and heavy lifting for those responsible for the key components of availability.

Let’s face it, love takes hard work, good communication, time, investment, skill and determination. There are no shortcuts to a successful relationship. The same can be said about achieving the best outcomes in an ever emerging, increasingly complex, and fluid technology space within your enterprise. Availability, clustering, disaster recovery and uptime are so ‘gosh darn’ hard because they require a serious, dedicated, non-stop, top-to-bottom cultural shift accounting for the speed of innovation, the complexity of applications and orchestration, competition and growth, and the other components of keeping applications, databases, and critical infrastructure available to those who need them, when they need them.

-Cassius Rhue, Vice President, Customer Experience

How to Fix Inherited Application Availability Problems

February 26, 2021 by Jason Aw

What to do when you inherit a mess

I grew up in a large immediate family, with an even larger group of well-meaning aunts, uncles, and family friends. Anyone who has ever been a part of a large family has probably, on more than one occasion, received a hand-me-down or had well-intentioned relatives hand over a freebie. And if so, you know that beneath the surface of that cool-sounding inheritance, the rumored stylish clothes, or the old “family car”, a nightmare could be lurking. Suddenly, your good fortune on four wheels feels like a curse that is two-thirds money pit and one-third eyesore.

So what do you do when you inherit a mess of application availability problems? Well, some DIYers bring in the dumpsters and start fresh. But this isn’t HGTV, and we aren’t talking about inherited furniture but an inherited application availability problem. You usually know you have a mess on your hands the first time you try to do a cluster switchover for simple, planned maintenance and your application goes offline. Now, what do you do when you have inherited a high availability mess?

Two Practical Tips For When You Inherit A High Availability Mess (I mean responsibility)

I. Research

Perhaps one of the best things you can do before taking action is to gather as much data as quickly as possible. Of course, the state of your inheritance might dictate the speed at which you’ll need to gather your data. Some key things to consider during your research to solve your application availability problems:

Previous owner. Research the previous owner of the configuration, including their chain of command, reach of authority, background, team dynamics and, if possible, charter. Find out what the original organizational structures were.
Research what was done in the past to achieve high or higher availability, and what was left out. In some environments, the focus for high availability falls squarely on a portion of the infrastructure while neglecting the larger workflow. Dig into any available requirements, as well as any changes that have been implemented or added since the requirements were originally instated. If you’re in the midst of a cloud migration, understand the goals of moving this environment to the cloud.
Owners and requirements provide a lot of history. However, you’ll also want to research why key decision makers made the choices and tradeoffs they did on designs and solutions, as well as on software and hardware architecture requirements. Evaluate whether these choices were successful or unsuccessful. Your research should focus on original problems and proposed solutions.
You may also want to consider why the environment you inherited feels like a mess. For example, is it due to a lack of documentation or training, poor or missing design details, the absence of a run book, or other missing specification details?
Research what, if any, enterprise grade high availability software solutions have been used to complement the architecture of virtual machines, networks, and applications. Is there a current incumbent? If not, what were the previous methods for availability?

II. Act

Once you’ve gathered this research, your next step is to act: update, improve, implement, or replace. Don’t make the mistake of crossing your fingers and hoping you never need a cluster failover.

Upgrade

In some cases, your research will lead to a better understanding of the incumbent solution and a path to upgrade that solution to the latest version. Honestly, we have been there with our own customers. Transitions are mishandled. A solution that works flawlessly for years becomes outdated.

Improve

If an upgrade is not warranted, consider alternatives. The data may point to other areas of improvement such as software or hardware tuning, migration to cloud or hybrid, network tuning, or some other identified risk or single point of failure. Perhaps your environment is due for a health check, or the increases in your workload warrant an improvement in your instance sizes, disk types, or other parameters.

Implement

In other cases, your research will uncover some startling details regarding the lack of a higher availability strategy or solution, in which case you will use your research as a catalyst to design and implement a high availability solution. This solution might necessitate private cloud, public cloud, or hybrid cloud architectures coupled with enterprise grade HA software to enable successful monitoring and recovery.

Replace

In extreme cases, your research will lead you to a full replacement of the current environment. Sometimes this is required when a customer or partner has migrated to the cloud but their high availability software offering was not cloud ready. While many applications boast of being cloud ready, in some cases this is more slideware than reality. If your on-premises solution is not cloud ready, your only recourse may be to go with a solution that is capable of making the cloud journey with you, such as the SIOS Protection Suite products.

As VP of Customer Experience for SIOS Technology, I experienced a situation that shows the importance of these steps when our Services team was engaged by an enterprise partner to deploy SIOS Protection Suite products. As we worked jointly with the customer, doing research, we uncovered a wealth of history. The customer professed to have only a limited number of downtime or availability issues. But our research revealed an unsustainable and highly complex hierarchy of alerts, manually executed scripts, global teams, and a hodgepodge of tools kludged together. With this information, we were able to architect a much more elegant and automated solution and replace their homemade one. Best part, it was wizard based, including automated monitoring, recovery, and system failover protection. No more kludge. No more trial-and-error DIY. Just simple, reliable application failover and failback for HA/DR protection.

If you have inherited a host of application availability problems, contact the deployment and availability experts at SIOS Technology Corp. Our team can walk you through the research process, help you hone your requirements, and finally upgrade, improve, replace or implement the solution to provide your enterprise with higher availability.

– Cassius Rhue, Vice President, Customer Experience

Reproduced from SIOS

Quick Start Guide to High Availability for SQL Server Using SIOS Protection Suite for Linux

February 18, 2021 by Jason Aw

This guide is intended to illustrate Microsoft SQL Server protection using SIOS Protection Suite for Linux. The environment used here is VMware ESXi with virtual machines running CentOS 7.6. Microsoft SQL Server 2017 is used to create a database server. Database and transaction logs will be stored on local disks that are replicated between nodes using DataKeeper; shared storage could be used as a simple replacement for the replicated local disks.

This guide is available here as a pdf.

Download Required Microsoft Software

Open the following Microsoft guide to installing SQL at https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-setup?view=sql-server-ver15

Plan SQL Environment Configuration

The following configuration settings will be used for creating the cluster environment described by this quick-start guide. Adapt your configuration settings according to your specific system environment.

General Configuration

The example installed in this quick start guide uses CentOS. The Red Hat instructions apply since CentOS is binary compatible with Red Hat.
The steps in this quick start guide are very similar whether you are running in a VMware environment, in the cloud, or on physical hardware.

Node 1 configuration

Hostname: IMAMSSQL-1
Public IP: 192.168.4.21
Private IP: 10.1.4.21
/dev/sdb (10GiB)
/dev/sdc (10GiB)

Node 2 configuration

Hostname: IMAMSSQL-2
Public IP: 192.168.4.22
Private IP: 10.1.4.22
/dev/sdb (10GiB)
/dev/sdc (10GiB)

Virtual IP used for SQL Access

192.168.4.20: this will be protected by LifeKeeper and “floats” between nodes

Operating System

CentOS 7.6

SQL Database Configuration

SQL Database:
SQL Virtual Hostname: IMAMSSQL
SQL Virtual IP: 192.168.4.20

SQL File System Mount Points

/database/data
/database/xlog

PREPARE SYSTEM FOR INSTALLATION

Installing MS-SQL

Initial SQL install

In this section we will add the Microsoft package location into our Linux OS and then instruct the OS to install SQL Server.

Open the following Microsoft guide to installing SQL Server:
https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-setup?view=sql-server-ver15
Log in with root privileges, or prefix each command with sudo.
curl -o /etc/yum.repos.d/mssql-server.repo https://packages.microsoft.com/config/rhel/7/mssql-server-2017.repo
yum install -y mssql-server
/opt/mssql/bin/mssql-conf setup (I installed my SQL Server with an Evaluation license)
yum install -y mssql-tools unixODBC-devel
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
source ~/.bashrc
systemctl stop mssql-server.service (we stop the SQL service here and cannot start it again until we have configured the disks used as storage in the section titled “Create database and transaction log file-systems and mount points”)
/opt/mssql/bin/mssql-conf set filelocation.masterdatafile /database/data/master.mdf
/opt/mssql/bin/mssql-conf set filelocation.masterlogfile /database/xlog/mastlog.ldf

Create database and transaction log file-systems and mount points

We will use the xfs file-system type for this installation. Refer to the LifeKeeper supported file-system types to determine which file-system you want to configure. Make sure you configure the disk to use GUID identifiers. Here we will partition and format the locally attached disks, create the mount points, mount the file-systems, and set ownership on the database locations we want SQL to use. Finally, we will start SQL, which will create a new Master DB and transaction logs in the location we specified. Note that when creating the partition, DataKeeper requires the number of blocks in the partition to be odd, e.g. 20973567 (end) – 2048 (start) = 20971519.
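
The odd-block rule above is easy to check before running mkfs. The following is a minimal sketch that follows the guide's own arithmetic (end sector minus start sector); `partition_blocks` and `is_odd` are hypothetical helper names, not part of LifeKeeper or DataKeeper. Note that fdisk itself reports a partition's size as end - start + 1 sectors, so confirm which count your DataKeeper version expects.

```shell
# Hypothetical helper: block count using the guide's arithmetic (end - start).
partition_blocks() {
    echo $(( $2 - $1 ))
}

# Succeeds (exit status 0) only when the given count is odd.
is_odd() {
    [ $(( $1 % 2 )) -eq 1 ]
}

# Start and end sectors taken from the example in the text.
blocks=$(partition_blocks 2048 20973567)
if is_odd "$blocks"; then
    echo "$blocks blocks: odd"
else
    echo "$blocks blocks: even, move the end sector by one"
fi
```

On a real system the start and end sectors would come from the partition table printed by fdisk rather than being typed in by hand.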

fdisk /dev/sdb
mkfs -t xfs /dev/sdb1
fdisk /dev/sdc
mkfs -t xfs /dev/sdc1
mkdir /database; mkdir /database/data; mkdir /database/xlog
chown mssql /database/; chgrp mssql /database/
chown mssql /database/data/; chgrp mssql /database/data/
chown mssql /database/xlog/; chgrp mssql /database/xlog/
vi /etc/fstab
1. Add /dev/sdb1 mounting to /database/data, e.g. /dev/sdb1 /database/data xfs defaults 0 0
2. Add /dev/sdc1 mounting to /database/xlog, e.g. /dev/sdc1 /database/xlog xfs defaults 0 0
mount /dev/sdb1
mount /dev/sdc1
chown mssql /database/data/; chgrp mssql /database/data/
chown mssql /database/xlog/; chgrp mssql /database/xlog/
systemctl start mssql-server.service (we start the SQL service now that the local disks are mounted; this will create a new Master DB and transaction logs)
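
A quick sanity check on the fstab entries can catch a copy-and-paste slip before the service is started. The sketch below assumes the data and log file systems live on separate partitions (/dev/sdb1 and /dev/sdc1, matching the mount output shown later in this guide). `fstab_device_for` is a hypothetical helper, and the check runs against a scratch copy rather than the real /etc/fstab.

```shell
# Hypothetical helper: print the device an fstab-style file ($1) maps to a mount point ($2).
fstab_device_for() {
    awk -v mp="$2" '$1 !~ /^#/ && $2 == mp { print $1 }' "$1"
}

# Illustrative copy of the two entries, written to a scratch file instead of /etc/fstab.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/dev/sdb1 /database/data xfs defaults 0 0
/dev/sdc1 /database/xlog xfs defaults 0 0
EOF

data_dev=$(fstab_device_for "$sample" /database/data)
xlog_dev=$(fstab_device_for "$sample" /database/xlog)

# Both mount points must be present and must not share a device.
if [ -n "$data_dev" ] && [ -n "$xlog_dev" ] && [ "$data_dev" != "$xlog_dev" ]; then
    echo "data on $data_dev, xlog on $xlog_dev"
else
    echo "check fstab: data='$data_dev' xlog='$xlog_dev'" >&2
fi
rm -f "$sample"
```

To check a live system, pass /etc/fstab as the first argument instead of the scratch file.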

Installing LifeKeeper

Refer to the Installation Guide
http://docs.us.sios.com/spslinux/9.5.1/en/topic/sios-protection-suite-for-linux-installation-guide

Create LifeKeeper Resource Hierarchies

Open the LifeKeeper GUI on the primary node:

# /opt/LifeKeeper/bin/lkGUIapp &

Communication Paths

Create backend and/or frontend IP routes. In our case the backend is 10.1.4.21 & 22 and the frontend is 192.168.4.21 & 22.

[AWS only] Right-click on each instance in the AWS Management Console and select Networking → Change Source/Dest. Check. Ensure that source/destination checking is disabled.
In the LifeKeeper GUI, click Create Comm Path.
In the Remote Server(s) dialog, add the host names of the other cluster nodes and select them.

Select the appropriate local (10.1.4.21) and remote (10.1.4.22) IP addresses.
Repeat this process, creating communication paths between all pairs of remote nodes for each network (e.g., 12.0.1.30 and 12.0.2.30). After completion, communication paths should exist between all pairs of cluster nodes.
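
For clusters larger than two nodes, it helps to list the pairs that need a communication path before clicking through the GUI. The sketch below is only a planning aid and does not talk to LifeKeeper; `all_pairs` is a hypothetical helper and `node-3` is an invented third host added for illustration.

```shell
# Hypothetical helper: print every unordered pair from the arguments, one per line.
all_pairs() {
    for a in "$@"; do
        shift                      # drop the current node so each pair appears once
        for b in "$@"; do
            echo "$a <-> $b"
        done
    done
}

# The two hosts from this guide plus an invented third node.
all_pairs IMAMSSQL-1 IMAMSSQL-2 node-3
```

With n nodes this prints n(n-1)/2 pairs, and each pair needs one communication path per network (backend and frontend in this guide).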

IP Resources

The IP resource is the virtual IP that will be used to access the SQL server – in this case 192.168.4.20

1. Verify that all of the virtual IPs have been removed from the network interface by running ‘ip addr show’.
2. Create the IP resource for the MSSQL virtual IP.
3. In the LifeKeeper GUI, click Create Resource Hierarchy and select IP.

4. When prompted, enter the IP 192.168.4.20 and choose the subnet mask 255.255.0.0.

5. Enter a tag name such as ip-192.168.4.20-MSSQL.
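
The check in the first step can be scripted so it is repeatable on every node. This is a minimal sketch: `vip_is_assigned` is a hypothetical helper that scans `ip addr show`-style text on standard input, and the sample listing (including the interface name ens192) is made up for illustration.

```shell
# Hypothetical helper: succeeds when the IPv4 address ($1) appears in
# `ip addr show`-style text read from stdin.
vip_is_assigned() {
    grep -F -q "inet $1/"
}

# Made-up interface listing; on a real node pipe in `ip addr show` instead.
sample='2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 192.168.4.21/16 brd 192.168.255.255 scope global ens192'

if printf '%s\n' "$sample" | vip_is_assigned 192.168.4.20; then
    echo "192.168.4.20 is still assigned; remove it before creating the resource"
else
    echo "192.168.4.20 is free"
fi
```

On a cluster node the equivalent call would be `ip addr show | vip_is_assigned 192.168.4.20`.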

DataKeeper Resources

These are the drives used to store the database and transaction logs, /database/data and /database/xlog.

Data Replication Resources

1. Ensure that all SQL file systems are mounted at the appropriate mount points under /database on the primary cluster node.
# mount
…
/dev/sdb1 on /database/data type xfs (rw,relatime,attr2,inode64,noquota)

/dev/sdc1 on /database/xlog type xfs (rw,relatime,attr2,inode64,noquota)
…

2. Ensure that the file systems are not mounted on the backup cluster node(s).

3. In the LifeKeeper GUI, click Create Resource Hierarchy and select Data Replication.

4. For Hierarchy Type, select Replicate Existing Filesystem.

5. For Existing Mount Point, select /database/data

6. Select the appropriate values for the rest of the creation dialogs as appropriate for your environment

Repeat steps 3-6 for the /database/xlog file system.
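
The mount checks in steps 1 and 2 can be sketched as a small script. `is_mounted` is a hypothetical helper that reads `mount`-style text from standard input; the sample text is the two lines from the mount output shown above.

```shell
# Hypothetical helper: succeeds when the mount point ($1) appears in
# `mount`-style text ("<device> on <mount point> type ...") read from stdin.
is_mounted() {
    awk -v mp="$1" '$2 == "on" && $3 == mp { found = 1 } END { exit (found ? 0 : 1) }'
}

# The two lines from the guide's mount output; on a real node pipe in `mount` instead.
sample='/dev/sdb1 on /database/data type xfs (rw,relatime,attr2,inode64,noquota)
/dev/sdc1 on /database/xlog type xfs (rw,relatime,attr2,inode64,noquota)'

for mp in /database/data /database/xlog; do
    if printf '%s\n' "$sample" | is_mounted "$mp"; then
        echo "$mp is mounted"
    else
        echo "$mp is NOT mounted"
    fi
done
```

On the primary node both mount points should report as mounted; on the backup node(s) neither should.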

Quick-Service Protection

We will use LifeKeeper’s Quick Service Protection ARK to protect the mssql-server service. This will monitor the MSSQL service and make sure it’s running.

Use systemctl status mssql-server.service on node 1 to ensure MSSQL is running.
Use systemctl status mssql-server.service on node 2 to ensure that MSSQL isn’t running. If it is, stop the service using systemctl stop mssql-server.service, then unmount the /database/data and /database/xlog directories.
In the LifeKeeper GUI, click add resource.
Select the QSP ARK from the drop-down.
When the list of available services populates, choose mssql-server.service.
Select the appropriate values for the rest of the creation dialogs for your environment.
Extend the hierarchy to node 2.
At the Linux CLI on node 1, run /opt/LifeKeeper/bin/lkpolicy -g -v and check the LocalRecovery setting for QSP-mssql-server in the output.
If LocalRecovery: On is set for QSP-mssql-server, disable local recovery by executing the following on both nodes:
/opt/LifeKeeper/bin/lkpolicy -s LocalRecovery -E tag="QSP-mssql-server"
Confirm that Local Recovery is disabled on both nodes by running /opt/LifeKeeper/bin/lkpolicy -g -v again.

Reproduced from SIOS

SIOS Protection Suite for Linux Quick Service Protection

February 6, 2021 by Jason Aw

How to add custom application support to SIOS Protection Suite

Using SIOS Protection Suite for Linux Quick Service Protection Resource

On a recent engagement with the SIOS Professional Services team, a customer inquired about how to protect a custom application with the SIOS Protection Suite for Linux solution. One of the highly experienced high availability experts at SIOS Technology Corp. worked to understand the customer’s application and laid out the methods SIOS provides for custom application support.

SIOS Protection Suite for Linux provides multiple methods for adding high availability and application monitoring to custom applications. These options include the following:

Creating a custom application recovery kit (ARK)¹
Creating a generic application resource hierarchy
Creating a quick service protection resource

Type                                        Coding Complexity   Monitoring   Recovery
Custom Application Recovery Kit Resource¹   Highest             Highest      Highest
Generic Application Resource                Medium              High         High
Quick Service Protection Resource           Low                 Medium       Medium

Definitions Used in Chart

Monitoring – defined as the ability to make a determination of the availability, accessibility and functioning of the protected application, database or service. A low level of application, database, or service monitoring provides basic coverage, such as a check for a running process, the existence of a pid_file, or that the status command returns a ‘true’ result when executed. Note: A ‘true’ or ‘0 (zero)’ return code does not mean that the application, database, or service is running, only that the command executed was able to complete successfully with a positive (‘true’ or ‘0 (zero)’) status result. The highest level of monitoring indicates that application-specific knowledge is applied to determine the health and functioning of the application beyond lower level methods such as process status, ps output, or systemd status returns. The highest level of monitoring typically applies knowledge of the recommended order of healthcheck operations, knowledge of dependencies, and analysis of the results obtained from status and monitoring commands.
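
To make the low end of that scale concrete, here is a minimal sketch of a check that looks only at a status command's return code. `basic_check` is a hypothetical helper and not how any SIOS kit is implemented; `true` and `false` stand in for a real status command such as `systemctl is-active --quiet example`.

```shell
# Hypothetical low-level check: run whatever status command is passed in
# and report on its exit status alone.
basic_check() {
    if "$@" >/dev/null 2>&1; then
        echo "status command succeeded (the service may still be unhealthy)"
        return 0
    fi
    echo "status command failed"
    return 1
}

basic_check true                                   # exit status 0
basic_check false || echo "recovery would be triggered here"
```

A zero exit status here says nothing about whether the application can actually serve requests, which is the gap that application-specific monitoring is meant to close.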

Recovery – defined as the ability to restart a failed application, database or service. A low level of recovery capability implies that commands for a restart are issued and the expected output is obtained from the issuance of the command. The highest level of recovery indicates that application-specific knowledge is applied to determine how to initiate an orderly restart of the application, database, or service, which may require knowledge of the recommended order of operations, dependencies, rollbacks or other related remediation of a failed service.
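
A low level of recovery, in the sense defined above, can be sketched as "issue the restart, then re-run the status command". `basic_recover` is a hypothetical helper, not a SIOS interface; both commands are passed as strings and run through `sh -c` to keep the example short.

```shell
# Hypothetical low-level recovery: $1 = restart command, $2 = status command.
basic_recover() {
    sh -c "$1" >/dev/null 2>&1 || { echo "restart command failed"; return 1; }
    if sh -c "$2" >/dev/null 2>&1; then
        echo "restart issued and status is good"
    else
        echo "restart issued but status is still bad"
        return 1
    fi
}

# `true` and `false` stand in for real restart and status commands.
basic_recover true true
basic_recover true false || echo "escalate: a simple restart did not help"
```

On a systemd host the call might look like `basic_recover "systemctl restart example" "systemctl is-active --quiet example"`. Nothing in this sketch knows about ordering, dependencies, or rollbacks, which is what separates it from the highest level of recovery.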

Solution: Quick Service Protection Resource

In this engagement, the customer’s application had systemd compatibility. Based on their overall requirements for avoiding coding, minimal monitoring needs, and simple recovery procedures, we recommended the Quick Service Protection (QSP) Resource.

The QSP resource works to quickly add support of a systemd service to the SIOS Protection Suite for Linux resource protection. In the case of Customer Example.com, they have a systemd compatible service, with the minimal required definition needed to start and stop their application.

[Unit]
Description=SIOS ‘as-is’ Example Service 2020
After=network.target

[Service]
Type=simple
Restart=always
RestartSec=3
User=root
ExecStart=/example_app/bin/exampleapp start
ExecStop=/example_app/bin/exampleapp stop

[Install]
WantedBy=multi-user.target

Example.com systemd file

SIOS recommends that, before attempting to protect the resource with the SIOS Protection Suite for Linux product, you verify via systemctl that the example application stops and starts as expected:

# systemctl status example
* example.service - SIOS ‘as-is’ Example Service 2020
   Loaded: loaded (/usr/lib/systemd/system/example.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

# systemctl start example
# systemctl status example
* example.service - SIOS ‘as-is’ Example Service 2020
   Loaded: loaded (/usr/lib/systemd/system/example.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2020-08-21 14:53:27 EDT; 5s ago
 Main PID: 19937 (exampleapp)
   CGroup: /system.slice/example.service
           `-19937 /usr/bin/perl /example_app/bin/exampleapp start

# systemctl stop example
# systemctl status example
* example.service - SIOS ‘as-is’ Example Service 2020
   Loaded: loaded (/usr/lib/systemd/system/example.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

After verifying that the application functions correctly via systemd, restart the service and ensure that the service is running.

# systemctl start example
# systemctl status example
* example.service - SIOS ‘as-is’ Example Service 2020
   Loaded: loaded (/usr/lib/systemd/system/example.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2020-08-21 15:59:44 EDT; 3min 2s ago
 Main PID: 30740 (exampleapp)
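
The manual start/stop verification above can be wrapped in a small script so it is easy to repeat. This is a sketch under stated assumptions: `verify_cycle` and `fake_ctl` are hypothetical names, and `fake_ctl` is a stand-in for systemctl that tracks state in a scratch file so the example runs on any machine.

```shell
# Run a start/stop/start cycle through a control command ($1) for a unit ($2).
# Pass "systemctl" on a real host; any command accepting the same actions works.
verify_cycle() {
    ctl=$1 unit=$2
    "$ctl" start "$unit" || return 1
    "$ctl" is-active --quiet "$unit" || { echo "not active after start"; return 1; }
    "$ctl" stop "$unit" || return 1
    if "$ctl" is-active --quiet "$unit"; then
        echo "still active after stop"
        return 1
    fi
    # Leave the service running, as the guide requires before protecting it.
    "$ctl" start "$unit" && echo "$unit: start/stop cycle OK, left running"
}

# Stand-in for systemctl: records "active" or "inactive" in a scratch file.
state=$(mktemp)
fake_ctl() {
    case $1 in
        start)     echo active > "$state" ;;
        stop)      echo inactive > "$state" ;;
        is-active) [ "$(cat "$state")" = active ] ;;
    esac
}

verify_cycle fake_ctl example
```

On the customer's host the equivalent call would be `verify_cycle systemctl example`, run as root.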

Refer to the SIOS Protection Suite for Linux Quick Service Protection documentation for additional details on the resource create process.

Using the SPS-L UI, select the Create option in the Global UI Resource Toolbar.

Once the create wizard is launched, select the Quick Service Protection option in the Create Resource Wizard Window

In the next prompt for ‘Switchback Type’, choose whether you will use intelligent switchback or automatic switchback.

After selecting the ‘Switchback Type’, the Server dialogue appears allowing you to choose the primary server for the custom application.

(Note: If the service requires storage, be sure to choose the same primary server previously selected for the storage resources.)

In the Service Name dialog box, find the service for your custom application.

Once you’ve selected the correct service (example, in this case), determine whether to enable or disable monitoring for the service. Refer to the documentation to gain an understanding of the monitoring provided by the QSP resource.²

Next, choose a resource tag. A resource tag should be a meaningful name that will help your IT team quickly identify which SPS-L resource protects your application or service.

Lastly, follow the final dialogue to complete the resource creation process. Once the resource is created, use the UI to extend the resource to additional servers. If necessary, create dependencies between the newly protected custom service/application and any other required resources such as storage or IP resources.

NOTES:

¹ Creating a custom application recovery kit can be accomplished via an engagement with the SIOS Technology Corp. Professional Services Team. For more information contact professional-services@us.sios.com

² The QSP Recovery Kit quickCheck can only perform simple health checks (using the “status” action of the service command). QSP doesn’t guarantee that the service is provided or that the process is functioning. If complicated starting and/or stopping is necessary, or more robust health checking operations are necessary, using a Generic Application or Custom Application ARK is recommended.

Reproduced from SIOS