April 6, 2024 |
A Step-by-Step Guide to Setting Up an NFS File Witness with SIOS LifeKeeper on Linux

Getting Started with SIOS LifeKeeper and NFS-Based File Witness

In high availability clustering, a witness plays a crucial role in ensuring the integrity and reliability of the cluster. Without a third node, it can be hard to achieve quorum because nothing breaks the tie when both nodes think they should go live (a situation known as split-brain). You can solve this problem in many ways, for example, by providing a dedicated witness server, a shared storage path seen by the whole cluster, or simply by having more nodes in the cluster itself (minimum 3!). Thankfully, SIOS LifeKeeper offers robust solutions for setting up high-availability clusters in Linux environments, and configuring a witness to improve quorum is an essential feature. In this guide, we’ll walk you through the steps to set up an NFS-based file witness with SIOS LifeKeeper on Linux, helping you enhance the availability and resilience of your clustered applications.

Goal: To achieve a 2-node cluster using an NFS-based storage witness as shown in the diagram below:

Prerequisites: Before getting started, ensure you have the following:
Step 1: Install/Modify SIOS LifeKeeper:

We will need to either install LifeKeeper at this stage or, if it is already installed without the Witness functionality, re-run the setup to add it. In my case, I’m using RHEL 8.8, so I will mount the ISO before running the setup with the supplementary package needed for RHEL 8.8.

[root@server1-LK ~]# mount /root/sps.img /mnt/loop -t iso9660 -o loop
[root@server1-LK ~]# cd /mnt/loop/
[root@server1-LK loop]# ./setup --addHADR /root/HADR-RHAS-4.18.0-477.10.1.el8_8.x86_64.rpm

Here the important part for our purposes is enabling the witness function like in the screenshot below. However, you will also need an additional license file, which you can either add here or add via the command line later at your discretion:

Otherwise, configure LifeKeeper for your purposes, or if it was already configured, simply proceed through the setup once you’ve included the “Use Quorum / Witness Function” option. If you decided to add the license via the command line, also run the following command on each node in the cluster with the correct path to your license file:

[root@server1-LK ~]# /opt/LifeKeeper/bin/lkkeyins /<path-to-license-file>/quorum-disk.lic

Step 2: Set up and mount shared storage:

Ensure that you have shared storage accessible to all servers in the cluster. You can check each server using either the ‘mount’ command or ‘findmnt’ to verify that you have it locally mounted:

[root@server1-LK loop]# mount | grep nfs
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
172.16.200.254:/var/nfs/general on /nfs/general type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,

or

[root@server1-LK ~]# findmnt -l /nfs/general
TARGET SOURCE FSTYPE OPTIONS
/nfs/general 172.16.200.254:/var/nfs/general nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,

Should you still need to mount the share yourself, please follow these steps:

First, confirm you can see the NFS share on the host server.
[root@server1-LK ~]# showmount -e 172.16.200.254
Export list for 172.16.200.254:
/home 172.16.205.244,172.16.205.151
/var/nfs/general 172.16.205.244,172.16.205.151

In my case, I want to mount the ‘/var/nfs/general’ share. To mount this share, first make sure the directory you plan to mount it to exists. If not, create it:

[root@server1-LK ~]# mkdir -p /nfs/general

Now you can manually mount the share using the following command to confirm that you can connect and that it works:

[root@server1-LK ~]# mount 172.16.200.254:/var/nfs/general /nfs/general

Finally, once happy, add the mount point to your /etc/fstab file so it will mount on boot:

[root@server1-LK ~]# cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Thu Jan 25 12:07:15 2024
#
# Accessible filesystems, by reference, are maintained under '/dev/disk/'.
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
#
# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
#
/dev/mapper/rhel-root / xfs defaults 0 0
UUID=6b22cebf-8f1c-405b-8fa8-8f12e1b6b56c /boot xfs defaults 0 0
/dev/mapper/rhel-swap none swap defaults 0 0
#added for NFS share
172.16.200.254:/var/nfs/general /nfs/general nfs4 defaults 0 0

Now, you can confirm it is mounted using the mount command:

[root@server1-LK ~]# mount -l | grep nfs
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
172.16.200.254:/var/nfs/general on /nfs/general type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,

As the 172.16.200.254:/var/nfs/general entry above shows, the share has now been mounted successfully. Repeat on every server, and make sure all of them have the share mounted before proceeding.
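If you would rather script the "is it mounted on every server?" check than read the mount output by eye, something like the following could work. It is only a sketch: the function names are made up for this example, and it assumes the mount point from this guide (/nfs/general) and a findmnt that supports the -n, -o and --mountpoint options, as the util-linux version shipped with RHEL 8 does.

```shell
# Return success if the given filesystem type is an NFS variant.
is_nfs_fstype() {
    case "$1" in
        nfs|nfs4) return 0 ;;
        *) return 1 ;;
    esac
}

# Return success if the given directory is a mount point backed by NFS.
# findmnt prints only the FSTYPE column (-o FSTYPE) without a header (-n)
# and exits non-zero when the directory is not a mount point.
check_nfs_mount() {
    fstype=$(findmnt -n -o FSTYPE --mountpoint "$1" 2>/dev/null) || return 1
    is_nfs_fstype "$fstype"
}
```

On each node you could then run `check_nfs_mount /nfs/general && echo mounted || echo "NOT mounted"` before moving on to the next step.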
Step 3: Check your hostnames and configure /etc/default/LifeKeeper settings:

You can see the hostname LifeKeeper knows for each of your servers by running the following command on each node:

/opt/LifeKeeper/bin/lcduname

Example of settings you’ll need to add to the /etc/default/LifeKeeper file:

WITNESS_MODE=storage
QWK_STORAGE_TYPE=file
QWK_STORAGE_HBEATTIME=6
QWK_STORAGE_NUMHBEATS=9
QWK_STORAGE_OBJECT_server1_LK_localdomain=/nfs/general/nodeA
QWK_STORAGE_OBJECT_server2_LK_localdomain=/nfs/general/nodeB

You need to declare a ‘QWK_STORAGE_OBJECT_<server-name>’ setting for each node. The setting name is formed from the node’s hostname, and its value is the path to the desired location of that node’s witness file. Note that if the hostname contains a “-” or “.”, you must replace each one with an underscore “_”.

In my example, I had the following hostnames:

server1-LK.localdomain
server2-LK.localdomain

Which meant adding the following ‘QWK_STORAGE_OBJECT_’ definitions:

QWK_STORAGE_OBJECT_server1_LK_localdomain=/nfs/general/nodeA
QWK_STORAGE_OBJECT_server2_LK_localdomain=/nfs/general/nodeB

In addition, we will need to adjust one of the existing settings in /etc/default/LifeKeeper:

QUORUM_MODE=storage

To help understand why we have set both our WITNESS_MODE and QUORUM_MODE to storage, take a look at the following table:

Supported Combinations of a Quorum Mode and Witness Mode

LifeKeeper supports the following combinations.
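The hostname rule from this step (replace every “-” and “.” with “_”) is easy to get wrong by hand, so here is a small shell sketch that builds the setting name for you. The helper names qwk_object_key and qwk_object_line are made up for this example and are not part of LifeKeeper; the hostnames and paths are the ones used in this guide.

```shell
# Build the QWK_STORAGE_OBJECT_<server-name> setting name for a hostname:
# every "-" and "." in the hostname is replaced with "_".
qwk_object_key() {
    printf 'QWK_STORAGE_OBJECT_%s\n' "$(printf '%s' "$1" | tr '.-' '__')"
}

# Print a ready-to-paste line for /etc/default/LifeKeeper.
# Arguments: hostname, path to that node's witness file.
qwk_object_line() {
    printf '%s=%s\n' "$(qwk_object_key "$1")" "$2"
}

qwk_object_line server1-LK.localdomain /nfs/general/nodeA
qwk_object_line server2-LK.localdomain /nfs/general/nodeB
```

Feed it the hostname exactly as reported by lcduname on each node, then paste the printed lines into /etc/default/LifeKeeper.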
We have a two-node cluster that wants to use external storage for a quorum, so the only supported combination would be ‘storage’ for both values. However, you can see from the table how flexible this can be when you require more nodes, offering many ways to achieve communication and provide a quorum.

Step 4: Initialize the Witness file:

To initialize the witness file and enable its use, you must run the following command on each node:

[root@server1-LK ~]# /opt/LifeKeeper/bin/qwk_storage_init

The command pauses until every node has run it, so execute it on the first node in the cluster, then the second, and so on, before coming back to check that it completed with no errors.

Example:

[root@server1-LK ~]# /opt/LifeKeeper/bin/qwk_storage_init
ok: LifeKeeper is running.
ok: The LifeKeeper license key is successfully installed.
ok: QWK parameter is valid.
QWK object of /nfs/general/nodeA is not yet avail.
/nfs/general/nodeA already exsits as not QWK_STORAGE_OBJECT: overwrite? (y/N): y
ok: The path of QWK object is valid.
ok: down: /opt/LifeKeeper/etc/service/qwk-storage: 1377s
ok: Initialization of QWK object of own node is completed.
QWK object of /nfs/general/nodeB is not yet avail.
QWK object of /nfs/general/nodeB is not yet avail.
QWK object of /nfs/general/nodeB is not yet avail.
QWK object of /nfs/general/nodeB is not yet avail.
QWK object of /nfs/general/nodeB is not yet avail.
QWK object of /nfs/general/nodeB is not yet avail.
QWK object of /nfs/general/nodeB is not yet avail.
ok: quorum system is ready.
ok: run: /opt/LifeKeeper/etc/service/qwk-storage: (pid 14705) 1s, normally down
Successful.

Step 5: Validate Configuration:

The configuration can be validated by running the following command:

/opt/LifeKeeper/bin/lktest

Should it find any errors, they will be printed to the terminal for you. In the example below, I hadn’t replaced the special characters in my hostname, so it reported that it was unable to find the storage.
[root@server1-LK ~]# /opt/LifeKeeper/bin/lktest
/opt/LifeKeeper/bin/lktest: /etc/default/LifeKeeper[308]: QWK_STORAGE_OBJECT_server1_LK.localdomain=/nfs/general/nodeA: not found
/opt/LifeKeeper/bin/lktest: /etc/default/LifeKeeper[309]: QWK_STORAGE_OBJECT_server2_LK.localdomain=/nfs/general/nodeB: not found
F S UID PID PPID C CLS PRI NI SZ STIME TIME CMD
4 S root 2348 873 0 TS 39 -20 7656 15:49 00:00:00 lcm
4 S root 2388 882 0 TS 39 -20 59959 15:49 00:00:00 ttymonlcm
4 S root 2392 872 0 TS 29 -10 10330 15:49 00:00:00 lcd
4 S root 8591 8476 0 TS 19 0 7670 15:58 00:00:00 lcdremexec -d server2-LK.localdomain -e -- cat /proc/mdstat

You can also confirm that the witness file is being updated via the command line like so:

[root@server1-LK ~]# cat /nfs/general/nodeA
signature=lifekeeper_qwk_object
local_node=server1-LK.localdomain
time=Thu Feb 15 14:10:56 2024
sequence=157
node=server2-LK.localdomain
commstat=UP
checksum=13903688106811808601

A Successful File Share Witness Using NFS

Setting up a file share witness using NFS is easy! It can be powerful if you are restricted to two nodes but need better resilience to split-brain events, especially in the cloud where you can leverage something like AWS’s EFS… Another essential part can be using more communication paths, but that’s a different blog. By following the steps outlined in this guide, you can enhance the resilience of your clustered applications and minimize the risk of downtime. Always refer to the SIOS documentation and best practices for further guidance and optimization of your high-availability setup. It’s publicly available and extremely comprehensive!

SIOS High Availability and Disaster Recovery

SIOS Technology Corporation provides high availability and Disaster Recovery products that protect & optimize IT infrastructures with cluster management for your most important applications. Contact us today for more information about our services and professional support.
Reproduced with permission from SIOS |
March 30, 2024 |
SIOS Product Management team is pleased to announce the general availability of SIOS LifeKeeper for Linux v 9.8.1.

New in LifeKeeper Linux v 9.8.1
Reproduced with permission from SIOS
March 25, 2024 |
First 30 days: Key things to know for a newbie to SIOS LifeKeeper or SIOS DataKeeper

As a relatively new employee, my boss asked me to write down my impressions of SIOS products and things that newbies to SIOS might like to know. Here are my thoughts.

Key Product Concepts: Clustering and Data Mirroring

LifeKeeper (Windows or Linux) is clustering software that monitors the whole application stack (network, storage, O/S, database, application software and server hardware). It allows you to specify backup physical or virtual resources (called nodes), and a communication path to connect them. Associations can be created on each node to represent a resource hierarchy; for example, an association can be made between a database application and the database data. This association keeps the app and the data together when systems are migrated. LifeKeeper also offers the ability to view the system logs of the nodes.

DataKeeper is a software tool that is bundled with LifeKeeper. It mirrors local source drives in real time to destination drives that reside elsewhere on the customer’s network or in the cloud. This provides resilience to a drive outage or failure. Drive data mirroring is handled by SIOS software, which automatically synchronizes data from the source to the destination when changes occur on the source drive. A bitmap is used to map the writes to specific blocks, and block-level writing is used to perform the copies.

Key DataKeeper and LifeKeeper Product Features and Details

Linux and Windows operating systems are supported for both products.

LifeKeeper offers high IT resilience to problems, keeping systems up and running. If a problem is detected, the system will attempt to restart the application. If this is unsuccessful, it will perform a failover to the standby node. If a communication path goes down, LifeKeeper intervenes and determines which node becomes the source node, based on the data available to each node and the provisioned quorum settings.

DataKeeper allows you to configure source and destination connections for synchronous or asynchronous drive writing. Synchronous writing means that the system completes the write to the destination before it reports that the write is complete; it responds more slowly, but it is safer. With asynchronous writing, the write operations are performed in the background, providing a faster response. DataKeeper uses WAN throttling and data compression for efficiency.

The combination of products can be used to migrate applications to new VMs or perform maintenance on secondary systems while keeping the primaries live.

DataKeeper and LifeKeeper Product Value

A main benefit of using SIOS DataKeeper is that you can use locally attached drives that already exist on your system. There is no need to plan for and purchase storage hardware. There is no concern about a failing RAID controller preventing access to all of the storage, or about the whole storage unit being targeted by attacks such as ransomware.

LifeKeeper is available as a cluster solution using multiple nodes with resource failure detection and failover capability, or in a single-node variant (Single Server Protection) providing resource failure detection and reboot capability for a single server system. Both are available for Linux and Windows, offering protection for a variety of customer systems. LifeKeeper does not require any customized, fault-tolerant hardware.

LifeKeeper for Linux supports RHEL 9-7, SLES 15-12, Oracle Linux 9-6, CentOS 8-6, Rocky 8-6, and Miracle 9-8, and can be hosted using VMware vSphere, VMware Cloud on AWS, KVM, Oracle VM Server and Nutanix Acropolis Hypervisor. The LifeKeeper for Linux installation setup script uses package manager tools to install the product.
Key Points to Know

A newbie to SIOS LifeKeeper or DataKeeper can run into a few common points of confusion. Here are some to be aware of:

DataKeeper:

LifeKeeper:
SIOS Technical Documentation

Read the official SIOS technical documentation to learn more about the product details and how to troubleshoot issues. From the support page, you can go to the Support Portal. The Support Portal has the following tabs:

– Solutions tab takes you to a page showing Problem / Solution combinations.
– Cases tab takes you to a page showing various cases in detail.

Both pages have search panels that let the customer home in on relevant records.

Key Disaster Recovery Terms and Terminology

Automatic failover – detection of a failure and switching of primary and standby drives is handled by the SIOS software, allowing the customer’s system to still function properly should an outage occur.

Application Recovery Kits (ARKs) – available to protect your business-critical applications and data from downtime and disasters. ARKs provide the capability for performing setup, automation of manual tasks and failover.

Cluster – a group of physical or virtual machines that behave as a single system, providing redundancy to create a high-availability resource.

Mirroring – intentionally synchronizing primary drive content changes to a standby drive in real time.

Switchover – user-initiated switching of source and standby drives. Used when system maintenance needs to be performed on a drive.

Lessons and Tips for the Next Newbie:

What has proved most useful for retaining what I have learned so far is taking lots of notes and recording screen video of training sessions with peers. This gives you something concrete to refer to at a later date. Practicing setting up mirrors, getting them connected and working, and then performing switchovers has been very helpful to my understanding of the product. Practice, practice, practice. The official documentation is an excellent resource to read up on how to perform an operation.
Reproduced with permission from SIOS
March 18, 2024 |
What is a License Rehost?

How to Perform a License Rehost in the SIOS Licensing Portal

When a SIOS perpetual product license is first activated in the SIOS Licensing Portal, it is tied to a unique identifier that is local to that server. That unique identifier is typically a MAC address (also referred to as the system’s hostid) that is assigned to a Network Interface Controller (NIC). When the system’s unique identifier no longer matches the one that was used to activate the license, a license rehost is required to continue using the SIOS products. A license rehost is the procedure that activates an updated product license key for the system’s new unique identifier.

When is a License Rehost required?

A rehost is required when the system’s unique identifier changes from the original unique identifier used to create the original license key. There are several things that can cause the unique identifier to change:
What Problem occurs when a License Rehost is required

The most common problem that occurs when a license rehost is required is that the SIOS product will not start properly. When this occurs, the logs will show a failure due to an invalid license key, because the license key that is installed on the system does not match the system’s unique identifier.

On a Windows system, this error appears in the Event Viewer under the Application logs.

On a Linux system, this error appears in the LifeKeeper log located in /var/log.

What is the Procedure for Rehosting a License Key

The first step in the procedure is to log in to the SIOS Licensing portal. Go to support.us.sios.com and select Manage Licenses.

After selecting Manage Licenses, log in to the Licensing site.

Once logged in, go to License Support and List Licenses.
Click on the Green Plus sign.

Enter the new 12 character hostid / MAC address and click OK. Do not include commas, colons, or spaces between the 12 characters.
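Most tools print a MAC address with separators, so it can help to clean the value up before pasting it into the portal. The following is a sketch only: the function names are invented for this example, the address is a made-up placeholder, and the check assumes the 12 characters are hexadecimal digits, as they are for a MAC address.

```shell
# Strip the separators the portal does not accept (colons, commas, spaces),
# plus hyphens, which some tools print between MAC address octets.
normalize_hostid() {
    printf '%s\n' "$1" | tr -d ':, -'
}

# Succeed only if the argument is exactly 12 hexadecimal characters.
is_valid_hostid() {
    case "$1" in
        *[!0-9A-Fa-f]*) return 1 ;;
    esac
    [ "${#1}" -eq 12 ]
}

hostid=$(normalize_hostid "00:1a:2b:3c:4d:5e")   # made-up example address
is_valid_hostid "$hostid" && echo "$hostid"
```

If the second command prints nothing, the value is not 12 hexadecimal characters and is worth double-checking before you submit it.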
Reproduced with permission from SIOS
March 8, 2024 |
Ensuring HA for Building Management Systems

Why BMS Serves as a Role Model for Other Industries

At the heart of every modern building, there is a building management system (BMS) that controls the heating, ventilation, and air conditioning (HVAC), the lighting and security systems, the fire suppression systems, and more. Building managers rely on BMS consoles to streamline and optimize a building’s operations and ensure the tenants’ environment is comfortable and safe. A BMS needs to operate reliably whether it is in an on-premises data center or a public cloud environment, such as AWS EC2, Azure, or Google Cloud Platform (GCP). There are several approaches to protecting a BMS from downtime and disasters, including fault tolerant (FT), high availability (HA), and disaster recovery (DR) solutions, but which approach is best depends on multiple considerations.

How SIOS Ensures HA for BMS

SIOS’ EMEA Technical Director Harry Aujla explains in this TFiR video interview why high availability is an integral part of BMS solutions and a good model for other critical applications. He goes on to explain some of the ways BMS has evolved over the years with the increasing reliance on IT. He says, “If BMS customers fully understand how high availability works on the cloud, then I think we’ll see a more significant migration of BMS solutions to cloud platforms.”

Reproduced with permission from SIOS