April 12, 2024
Disaster Recovery Solutions: How to Handle “Recommendations” Versus “Requirements”
Let’s say you experience an issue in your cloud cluster environment and have to contact one of your application vendors to get it resolved. They give you the resolution, but they note in their response that the way you have these systems configured is “not recommended”. How do you handle this information? After all, everything has been working well so far, and it could take a lot of time and resources to reconfigure the systems the “recommended” way. On the other hand, the vendor surely recommends it for a reason, right? What if it causes other complications down the road? Let’s take a look at what exactly constitutes a recommendation, and at ways you can approach one whether or not you accept it.
DR Solution Recommended Configuration
Start by taking the word “recommendation” literally: “a suggestion or proposal as to the best course of action”. The words “suggestion” and “proposal” already hint at how to approach one. Seen this way, it is easy to turn down a vendor recommendation because it is inconvenient or seems unnecessary. However, before acting on a recommendation, take a more pragmatic look at it. After all, the vendor has a reason for suggesting this particular configuration. As part of an ongoing relationship, the vendor is just as interested in your success as you are, so the recommendation most likely carries some benefit. Without the recommended configuration, you may be more susceptible to certain types of errors. Or it may be a case of degraded performance, where everything works fine but could work better or faster.
Taking this into account, wouldn’t it be better to invest the time and effort to meet these recommendations now, rather than after you have been affected by the drawbacks of not following them?
How to Handle DR Solution Configurations Outside of the Recommendation
Now we can build a full perspective on recommendations by drawing together both sides of this discussion. The summarized version is: “It is okay not to follow vendor recommendations, as long as you know why they are recommended and accept the potential drawbacks of not following them.” The crucial first step is always simply talking to the vendor. Ask why they recommend the configuration, what impact having it versus not having it will have, whether they have any methods or procedures to ease the transition to a recommended environment, and anything else that helps you and your internal teams make an informed decision. Once you understand the impact, you are in a position to decline the recommendation, provided you have proper justification.
An example of good justification for turning down a recommendation is security. Perhaps the recommended environment would turn off or circumvent certain security measures you have in place. Using that environment would not only make you more vulnerable; it could also lead to a violation of SLAs, partner agreements, or standards you are bound by. In this case, tell the vendor why you are not following the recommended configuration. This feedback can benefit the vendor as well, since they can use it to make future improvements that allow the recommended configuration and your security measures to coexist. As stated earlier, the vendor is also invested in your success, so this is a win for everyone.
Disaster Recovery Solution Requirements
Sometimes, though, it’s not so easy to say “no” to what the vendor is telling you.
This is where you cross the border from a vendor “recommendation” to a vendor “requirement”. When something is presented to you as a requirement, you cannot simply decline to follow it. Still, as with recommendations, it is important to understand why it is a requirement and what exactly it is a requirement for. Certain practices can be required as part of an SLA you agreed on with the vendor, or a TSA for the products, applications, or services. In these cases, the change needed to meet the requirement does have to be made. Requirements also commonly fall on the more technical side: for example, specifications for disk size, I/O capacity, or available machine resources, to name just a few. These tend to be necessary for the application to work as intended, so the value of meeting them is readily apparent.
Disaster Recovery Solution Flexibility
Having to follow a requirement does not mean you must simply resign yourself to it. There is still real value in understanding why the requirement is in place. As with a recommendation, talking to your vendor is vital. Perhaps your objection to the requirement is rooted in a misunderstanding, and discussing the reasoning with your vendor can reveal that and clear away some apprehension. Again, your feedback on these requirements can be very important for your vendor: it helps them improve their products or services and understand the value you see in being able to do something a different way. All it takes is starting a dialog.
SIOS High Availability and Disaster Recovery
SIOS Technology Corporation provides high availability and Disaster Recovery products that protect & optimize IT infrastructures with cluster management for your most important applications. Contact us today for more information about our services and professional support.
Reproduced with permission from SIOS
April 6, 2024
A Step-by-Step Guide to Setting Up an NFS File Witness with SIOS LifeKeeper on Linux
Getting Started with SIOS LifeKeeper and NFS-Based File Witness
In high availability clustering, a witness plays a crucial role in ensuring the integrity and reliability of the cluster. Without a third node, it can be hard to achieve quorum, because there is nothing to break the tie when both nodes think they should go live (a situation known as split-brain). You can solve this problem in many ways: for example, by providing a dedicated witness server, a shared storage path seen by the whole cluster, or simply more nodes in the cluster itself (minimum 3!). SIOS LifeKeeper offers robust solutions for setting up high availability clusters in Linux environments, and configuring a witness to improve quorum is an essential feature. In this guide, we’ll walk you through the steps to set up an NFS-based file witness with SIOS LifeKeeper on Linux, helping you enhance the availability and resilience of your clustered applications.
Goal: To build a 2-node cluster that uses an NFS-based storage witness, as shown in the diagram below:
Prerequisites: Before getting started, ensure you have the following:
Step 1: Install/Modify SIOS LifeKeeper
At this stage, either install LifeKeeper or re-run the setup to add the witness functionality, unless you already included it earlier. In my case, I’m using RHEL 8.8, so I mount the ISO and then run the setup with the supplementary package needed for RHEL 8.8:
[root@server1-LK ~]# mount /root/sps.img /mnt/loop -t iso9660 -o loop
[root@server1-LK ~]# cd /mnt/loop/
[root@server1-LK loop]# ./setup --addHADR /root/HADR-RHAS-4.18.0-477.10.1.el8_8.x86_64.rpm
The important part for our purposes is enabling the witness function, as in the screenshot below. You will also need an additional license file, which you can either add here or add later via the command line, at your discretion:
Otherwise, configure LifeKeeper for your purposes; or, if it was already configured, simply proceed through the setup once you’ve selected the “Use Quorum / Witness Function” option. If you chose to add the license via the command line, run the following command on each node in the cluster, with the correct path to your license file:
[root@server1-LK ~]# /opt/LifeKeeper/bin/lkkeyins /<path-to-license-file>/quorum-disk.lic
Step 2: Set up and mount shared storage
Ensure that you have shared storage accessible to all servers in the cluster. On each server, use either the ‘mount’ command or ‘findmnt’ to verify that the share is mounted locally:
[root@server1-LK loop]# mount | grep nfs
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
172.16.200.254:/var/nfs/general on /nfs/general type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,
or
[root@server1-LK ~]# findmnt -l /nfs/general
TARGET SOURCE FSTYPE OPTIONS
/nfs/general 172.16.200.254:/var/nfs/general nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,
Should you still need to mount the share yourself, follow these steps. First, confirm that you can see the NFS share exported by the host server:
[root@server1-LK ~]# showmount -e 172.16.200.254
Export list for 172.16.200.254:
/home 172.16.205.244,172.16.205.151
/var/nfs/general 172.16.205.244,172.16.205.151
In my case, I want to mount the ‘/var/nfs/general’ share. First, make sure the directory you plan to mount it on exists. If not, create it:
[root@server1-LK ~]# mkdir -p /nfs/general
Now mount the share manually with the following command to confirm that you can connect and that it works:
[root@server1-LK ~]# mount 172.16.200.254:/var/nfs/general /nfs/general
Finally, once you are happy with it, add the mount point to your /etc/fstab file so it is mounted at boot:
[root@server1-LK ~]# cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Thu Jan 25 12:07:15 2024
#
# Accessible filesystems, by reference, are maintained under '/dev/disk/'.
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
#
# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
#
/dev/mapper/rhel-root / xfs defaults 0 0
UUID=6b22cebf-8f1c-405b-8fa8-8f12e1b6b56c /boot xfs defaults 0 0
/dev/mapper/rhel-swap none swap defaults 0 0
#added for NFS share
172.16.200.254:/var/nfs/general /nfs/general nfs4 defaults 0 0
Now confirm that it is mounted using the mount command:
[root@server1-LK ~]# mount -l | grep nfs
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
172.16.200.254:/var/nfs/general on /nfs/general type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,
As the last line of the output shows, the share is now mounted successfully. Repeat these steps on every server, and make sure all servers have the share mounted before proceeding.
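If you prefer to script the per-server mount check, the following is a minimal Python sketch. It assumes the mount table is read in the standard /proc/mounts layout (source, target, type, options, dump, pass); `find_mount` and `is_nfs_mounted` are hypothetical helpers, not LifeKeeper tools, and the sample text simply mirrors the share used in this guide so the example runs anywhere.

```python
def find_mount(mounts_text, target):
    """Return (source, fstype, options) for the mount at `target`, or None.

    Each line of `mounts_text` is assumed to follow the /proc/mounts layout:
    <source> <target> <fstype> <options> <dump> <pass>
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == target:
            # Keep the last match: a later mount on the same path hides earlier ones.
            found = (fields[0], fields[2], fields[3].split(","))
    return found


def is_nfs_mounted(mounts_text, target, expected_source):
    """True if `target` is an NFS mount of `expected_source`."""
    entry = find_mount(mounts_text, target)
    if entry is None:
        return False
    source, fstype, _options = entry
    return source == expected_source and fstype in ("nfs", "nfs4")


# Sample in /proc/mounts format, matching the share used in this guide.
SAMPLE_MOUNTS = """\
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
172.16.200.254:/var/nfs/general /nfs/general nfs4 rw,relatime,vers=4.2 0 0
"""

print(is_nfs_mounted(SAMPLE_MOUNTS, "/nfs/general", "172.16.200.254:/var/nfs/general"))
```

On a real node, you would pass the contents of /proc/mounts instead of SAMPLE_MOUNTS and run the check on each server before moving on.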
Step 3: Check your hostnames and configure /etc/default/LifeKeeper settings
You can see the hostname LifeKeeper knows for each of your servers by running the following command on each node:
/opt/LifeKeeper/bin/lcduname
Example of the settings you’ll need to add to the /etc/default/LifeKeeper file:
WITNESS_MODE=storage
QWK_STORAGE_TYPE=file
QWK_STORAGE_HBEATTIME=6
QWK_STORAGE_NUMHBEATS=9
QWK_STORAGE_OBJECT_server1_LK_localdomain=/nfs/general/nodeA
QWK_STORAGE_OBJECT_server2_LK_localdomain=/nfs/general/nodeB
You need to declare a ‘QWK_STORAGE_OBJECT_<server-name>’ setting for each node. The variable name is formed from that node’s hostname, and its value is the path and name of the node’s witness file. Note that if the hostname contains a “-” or “.”, you must replace each of them with an underscore “_”.
In my example, I had the following hostnames:
server1-LK.localdomain
server2-LK.localdomain
This meant adding the following ‘QWK_STORAGE_OBJECT_’ definitions:
QWK_STORAGE_OBJECT_server1_LK_localdomain=/nfs/general/nodeA
QWK_STORAGE_OBJECT_server2_LK_localdomain=/nfs/general/nodeB
In addition, we need to adjust one of the existing settings in /etc/default/LifeKeeper:
QUORUM_MODE=storage
To understand why we have set both WITNESS_MODE and QUORUM_MODE to storage, take a look at the following table:
Supported Combinations of a Quorum Mode and Witness Mode
LifeKeeper supports the following combinations.
We have a two-node cluster that uses external storage for quorum, so the only supported combination is ‘storage’ for both values. However, the table shows how flexible this can be when you need more nodes, offering many ways to establish communication and provide quorum.
Step 4: Initialize the Witness file
To initialize the witness file and enable its use, run the following command on each node:
[root@server1-LK ~]# /opt/LifeKeeper/bin/qwk_storage_init
The command pauses until it has been run on every node, so execute it on the first node in the cluster, then on the second, and so on, before coming back to check that each run completed with no errors. Example:
[root@server1-LK ~]# /opt/LifeKeeper/bin/qwk_storage_init
ok: LifeKeeper is running.
ok: The LifeKeeper license key is successfully installed.
ok: QWK parameter is valid.
QWK object of /nfs/general/nodeA is not yet avail.
/nfs/general/nodeA already exsits as not QWK_STORAGE_OBJECT: overwrite? (y/N): y
ok: The path of QWK object is valid.
ok: down: /opt/LifeKeeper/etc/service/qwk-storage: 1377s
ok: Initialization of QWK object of own node is completed.
QWK object of /nfs/general/nodeB is not yet avail.
QWK object of /nfs/general/nodeB is not yet avail.
QWK object of /nfs/general/nodeB is not yet avail.
QWK object of /nfs/general/nodeB is not yet avail.
QWK object of /nfs/general/nodeB is not yet avail.
QWK object of /nfs/general/nodeB is not yet avail.
QWK object of /nfs/general/nodeB is not yet avail.
ok: quorum system is ready.
ok: run: /opt/LifeKeeper/etc/service/qwk-storage: (pid 14705) 1s, normally down
Successful.
Step 5: Validate Configuration
You can validate the configuration by running the following command:
/opt/LifeKeeper/bin/lktest
If it finds any errors, it prints them to the terminal. In the example below, I hadn’t replaced all of the special characters in my hostnames (the “.” characters remain), so lktest reported that it was unable to find the storage settings.
[root@server1-LK ~]# /opt/LifeKeeper/bin/lktest
/opt/LifeKeeper/bin/lktest: /etc/default/LifeKeeper[308]: QWK_STORAGE_OBJECT_server1_LK.localdomain=/nfs/general/nodeA: not found
/opt/LifeKeeper/bin/lktest: /etc/default/LifeKeeper[309]: QWK_STORAGE_OBJECT_server2_LK.localdomain=/nfs/general/nodeB: not found
F S UID PID PPID C CLS PRI NI SZ STIME TIME CMD
4 S root 2348 873 0 TS 39 -20 7656 15:49 00:00:00 lcm
4 S root 2388 882 0 TS 39 -20 59959 15:49 00:00:00 ttymonlcm
4 S root 2392 872 0 TS 29 -10 10330 15:49 00:00:00 lcd
4 S root 8591 8476 0 TS 19 0 7670 15:58 00:00:00 lcdremexec -d server2-LK.localdomain -e -- cat /proc/mdstat
You can also confirm from the command line that the witness file is being updated:
[root@server1-LK ~]# cat /nfs/general/nodeA
signature=lifekeeper_qwk_object
local_node=server1-LK.localdomain
time=Thu Feb 15 14:10:56 2024
sequence=157
node=server2-LK.localdomain
commstat=UP
checksum=13903688106811808601
A Successful File Share Witness Using NFS
Setting up a file share witness using NFS is easy! It can be powerful if you are restricted to two nodes but need better resilience to split-brain events, especially in the cloud, where you can leverage something like AWS’s EFS. Using more communication paths is another essential part of resilience, but that’s a topic for a different blog. By following the steps outlined in this guide, you can enhance the resilience of your clustered applications and minimize the risk of downtime. Always refer to the SIOS documentation and best practices for further guidance on optimizing your high availability setup. It’s publicly available and extremely comprehensive!
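The hostname rule for the QWK_STORAGE_OBJECT_ settings and the witness-file check in this guide can both be scripted. This is a minimal Python sketch; `qwk_variable_name`, `parse_witness`, and `witness_is_updating` are hypothetical helpers, not LifeKeeper tools. It assumes the witness file holds one key=value pair per line, as in the `cat` output of the witness file, and it assumes (from the field name, not from SIOS documentation) that `sequence` increases each time the file is rewritten.

```python
import re

# A valid shell variable name: letter or underscore, then letters/digits/underscores.
_SHELL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def qwk_variable_name(hostname):
    """Build the QWK_STORAGE_OBJECT_<name> key for one node.

    Every "-" and "." in the hostname is replaced with "_".
    """
    name = "QWK_STORAGE_OBJECT_" + hostname.replace("-", "_").replace(".", "_")
    if not _SHELL_NAME.fullmatch(name):
        # Other unusual characters would still leave a name the shell cannot parse.
        raise ValueError(f"invalid variable name: {name}")
    return name


def parse_witness(text):
    """Parse witness-file text made of key=value lines into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def witness_is_updating(first_read, second_read):
    """True if `sequence` advanced between two reads and the peer is UP.

    Assumption: `sequence` grows on every rewrite of the witness file.
    """
    before = parse_witness(first_read)
    after = parse_witness(second_read)
    return (int(after["sequence"]) > int(before["sequence"])
            and after.get("commstat") == "UP")


print(qwk_variable_name("server1-LK.localdomain"))
```

For example, you could read /nfs/general/nodeA twice, a little more than QWK_STORAGE_HBEATTIME seconds apart, and pass both reads to `witness_is_updating`.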
Reproduced with permission from SIOS
March 30, 2024
The SIOS Product Management team is pleased to announce the general availability of SIOS LifeKeeper for Linux v9.8.1.
New in LifeKeeper Linux v9.8.1
Reproduced with permission from SIOS
March 25, 2024
First 30 days: Key things to know for a newbie to SIOS LifeKeeper or SIOS DataKeeper
Since I’m a relatively new employee, my boss asked me to write down my impressions of SIOS products and the things newbies to SIOS might like to know. Here are my thoughts.
Key Product Concepts: Clustering and Data Mirroring
LifeKeeper (Windows or Linux) is clustering software that monitors the whole application stack (network, storage, O/S, database, application software, and server hardware). It lets you specify backup physical or virtual resources (called nodes) and a communication path to connect them. On each node, you can create associations that represent a resource hierarchy; for example, an association between a database application and the database data. This association keeps the application and its data together when systems are migrated. LifeKeeper also lets you view the system logs of the nodes.
DataKeeper is a software tool that is bundled with LifeKeeper. It mirrors local source drives in real time to destination drives that reside elsewhere on the customer’s network or in the cloud, which provides resilience against a drive outage or failure. The SIOS software handles drive data mirroring, automatically synchronizing data from the source to the destination when changes occur on the source drive. A bitmap is used to map writes to specific blocks, and block-level writes are used to perform the copies.
Key DataKeeper and LifeKeeper Product Features and Details
Linux and Windows operating systems are supported for both products. LifeKeeper provides high IT resilience, keeping systems up and running when problems occur. If a problem is detected, the system attempts to restart the application. If this is unsuccessful, it performs a failover to the standby node.
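The restart-then-failover behavior described above can be pictured as a simple policy. This is a conceptual Python sketch, not LifeKeeper’s implementation: `handle_failure`, the `restart` and `failover` callbacks, and the `local_attempts` parameter are hypothetical stand-ins for whatever the product does internally.

```python
from typing import Callable


def handle_failure(
    resource: str,
    restart: Callable[[str], bool],
    failover: Callable[[str], None],
    local_attempts: int = 1,
) -> str:
    """Try local recovery first, then fail over to the standby node."""
    for _ in range(local_attempts):
        if restart(resource):  # restart succeeded on the active node
            return "recovered locally"
    failover(resource)  # local recovery failed: move the resource to the standby
    return "failed over"


moved = []
print(handle_failure("database", restart=lambda r: False, failover=moved.append))
```

Trying a local restart first avoids the cost of moving the whole resource hierarchy when a simple restart is enough.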
If a communication path goes down, LifeKeeper intervenes and determines which node becomes the source node, based on the data available to each node and the provisioned quorum settings.
DataKeeper lets you configure source and destination connections for synchronous or asynchronous writes. With synchronous writes, the system completes the write on the destination before it reports the write as complete; the response is slower, but safer. With asynchronous writes, the write operations to the destination are performed in the background, which gives a faster response. DataKeeper uses WAN throttling and data compression for efficiency. The combination of products can be used to migrate applications to new VMs, or to perform maintenance on secondary systems while keeping the primaries live.
DataKeeper and LifeKeeper Product Value
A main benefit of using SIOS DataKeeper is that you can use the locally attached drives that already exist on your system. There is no need to plan for and purchase storage hardware. Nor do you have to worry about a single RAID controller failure preventing access to all of the storage, or about the whole storage unit being targeted by attacks such as ransomware.
LifeKeeper is available as a cluster solution using multiple nodes, with resource failure detection and failover capability, or as a single-node variant (Single Server Protection) that provides resource failure detection and reboot capability for a single-server system. Both are available for Linux and Windows, offering protection for a variety of customer systems. LifeKeeper does not require any customized, fault-tolerant hardware. Linux LifeKeeper supports RHEL 9-7, SLES 15-12, Oracle Linux 9-6, CentOS 8-6, Rocky 8-6, and Miracle 9-8, and can be hosted using VMware vSphere, VMware Cloud on AWS, KVM, Oracle VM Server, and Nutanix Acropolis Hypervisor. The Linux LifeKeeper setup script uses package manager tools to install the product.
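To make the synchronous/asynchronous distinction and the write bitmap concrete, here is a conceptual Python model. It is an illustration under simplifying assumptions, not DataKeeper’s code: each “drive” is an in-memory list of blocks, `ToyMirror` and its methods are hypothetical, and real concerns such as the network link, compression, and WAN throttling are left out.

```python
from collections import deque


class ToyMirror:
    """Conceptual model of block-level mirroring (not DataKeeper's code).

    Each "drive" is a Python list of blocks. A bitmap with one flag per block
    records which source blocks have changed but are not yet on the destination.
    """

    def __init__(self, num_blocks, synchronous):
        self.source = [b""] * num_blocks
        self.destination = [b""] * num_blocks
        self.dirty = [False] * num_blocks  # the write bitmap
        self.synchronous = synchronous
        self.pending = deque()  # blocks waiting for background copy (async mode)

    def write(self, block, data):
        """Write to the source; return once the write counts as complete."""
        self.source[block] = data
        self.dirty[block] = True
        if self.synchronous:
            self._copy(block)  # sync: destination is updated before we return
        else:
            self.pending.append(block)  # async: copy happens later in drain()

    def drain(self):
        """Background step for async mode: copy every queued block."""
        while self.pending:
            self._copy(self.pending.popleft())

    def resync(self):
        """After an interruption, copy only the blocks the bitmap marks dirty."""
        for block, is_dirty in enumerate(self.dirty):
            if is_dirty:
                self._copy(block)
        self.pending.clear()

    def _copy(self, block):
        self.destination[block] = self.source[block]
        self.dirty[block] = False


mirror = ToyMirror(4, synchronous=False)
mirror.write(2, b"data")  # returns before the destination has the block
mirror.drain()            # the background copy catches the destination up
```

The bitmap is what makes resynchronization cheap in this model: after an interruption, only the blocks flagged as dirty need to be copied, not the whole drive.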
Key Points to Know
A newbie to SIOS LifeKeeper or DataKeeper can run into a few common points of confusion. Here are some to be aware of:
DataKeeper:
LifeKeeper:
SIOS Technical Documentation
Read the official SIOS technical documentation to learn more about the product details and how to troubleshoot issues. From the support page, you can go to the Support Portal, which has the following tabs:
– Solutions tab: takes you to a page showing Problem / Solution combinations.
– Cases tab: takes you to a page showing various cases in detail.
Both pages have search panels that let customers narrow their search to the relevant records.
Key Disaster Recovery Terminology
Automatic failover – The SIOS software detects the failure and switches the primary and standby drives, so the customer’s system continues to function properly should an outage occur.
Application Recovery Kits (ARKs) – Kits that protect your business-critical applications and data from downtime and disasters. ARKs provide setup capability, automation of manual tasks, and failover.
Cluster – A group of physical or virtual machines that behave as a single system, providing redundancy to create a high availability resource.
Mirroring – Intentionally synchronizing primary drive content changes to a standby drive in real time.
Switchover – A user-initiated switch of source and standby drives, used when system maintenance needs to be performed on a drive.
Lessons and Tips for the Next Newbie
What has proved most useful for retaining what I have learned so far is taking lots of notes and recording screen video of training sessions with peers. This gives you something concrete to refer back to later. Practicing setting up mirrors, getting them connected and working, and then performing switchovers has been very helpful to my understanding of the product. Practice, practice, practice. The official documentation is an excellent resource for reading up on how to perform an operation.
Reproduced with permission from SIOS
March 18, 2024
What is a License Rehost?
How to Perform a License Rehost in the SIOS Licensing Portal
When a SIOS perpetual product license is first activated in the SIOS Licensing Portal, it is tied to a unique identifier that is local to that server. That unique identifier is typically the MAC address (also referred to as the system’s hostid) assigned to a Network Interface Controller (NIC). When the system’s unique identifier changes from the one that was used to activate the license, a license rehost is required to continue using the SIOS products.
A license rehost is the procedure for activating an updated product license key when the system’s unique identifier (the MAC address, or hostid) no longer matches the identifier in the original product license key.
When is a License Rehost required?
A rehost is required when the system’s unique identifier changes from the original unique identifier used to create the original license key. Several things can cause the unique identifier to change:
What Problem Occurs When a License Rehost Is Required
The most common problem is that the SIOS product will not start running properly. When this happens, the logs show a failure due to an invalid license key, because the license key installed on the system does not match the system’s unique identifier. On a Windows system, this error appears in the Event Viewer under the Application logs. On a Linux system, this error appears in the LifeKeeper log located in /var/log.
What is the Procedure for Rehosting a License Key
The first step is to log in to the SIOS Licensing portal. Go to support.us.sios.com and select Manage Licenses.
After selecting Manage Licenses, log in to the Licensing site.
Once logged in, go to License Support and List Licenses.
Click the green plus sign.
Enter the new 12-character hostid / MAC address and click OK. Do not include commas, colons, or spaces between the 12 characters.
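The hostid entry rule above is easy to get wrong when copying a MAC address from `ip link` output (colon-separated) or Windows `ipconfig /all` output (hyphen-separated). Here is a minimal Python sketch with hypothetical helpers `normalize_hostid` and `rehost_needed`: it strips colons, commas, spaces, hyphens, and dots, then checks that exactly 12 hex characters remain. Case handling is an assumption: the post does not say whether the portal expects upper or lower case, so the sketch preserves case and compares case-insensitively.

```python
import re

# Separators commonly seen in MAC address notations.
_SEPARATORS = re.compile(r"[\s:,.\-]")
_HOSTID = re.compile(r"[0-9A-Fa-f]{12}")


def normalize_hostid(raw):
    """Strip separators so a MAC address becomes the 12-character hostid form.

    Raises ValueError if what remains is not exactly 12 hex characters.
    """
    hostid = _SEPARATORS.sub("", raw)
    if not _HOSTID.fullmatch(hostid):
        raise ValueError(f"not a 12-character hostid: {raw!r}")
    return hostid


def rehost_needed(licensed_hostid, current_mac):
    """True if the system's current MAC no longer matches the licensed hostid."""
    return (normalize_hostid(licensed_hostid).lower()
            != normalize_hostid(current_mac).lower())


print(normalize_hostid("00:15:5D:01:02:03"))
```

On Linux, the MAC address of an interface can be read from /sys/class/net/<interface>/address; which interface a given license was tied to is not covered here.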
Reproduced with permission from SIOS