SIOS SANless clusters: High-availability Machine Learning monitoring

March 8, 2021	Stages of IT Disaster Recovery Grief
Disaster recovery grief can hit you out of nowhere if you haven’t implemented the right enterprise availability architecture. Meet our friend Dave in IT, who will walk us through the 5 stages of disaster grief.
Stage 1: Denial
Dave in IT: “Uh oh. What’s that alert? It’s just a little application crash, right? No big deal. I’ll have things up and running in no time.”
In the land of enterprise availability, there is no such thing as a little application crash or no big deal. Companies have SLAs with real money on the line. Your selective reality is probably not the perspective of your customers and stakeholders.
Stage 2: Anger
Dave in IT: “Are you kidding me? Of all the… [censored]… times, today the application won’t start. Ughh. I hate this [censored]… [censored]… application. Wait, what’s this new alert? Seriously, now the datacenter is down!”
It gets messy really, really fast in fast-paced, high-stakes environments. When alerts and failures go unchecked, problems can mount quickly, along with pressure, frustration and anger.
Stage 3: Bargaining
Dave in IT: “Hey Art in Applications, this is Dave in IT. Do you guys have any backups for the App1 environment? . . . Art, are you sure? Could you just check again? I know you’ve checked twice, but can you check one more time? I’ll buy drinks on Taco Tuesday!”
Dave in IT: “Hey Donna DBA, this is Dave in IT. Art in Applications said you might help me out. Did you by chance set up any database replication for that finance database or the inventory management system? . . . Are you sure? Umh, do you remember if we have any way to recover from a, umh . . . datacenter crash?”
When my daughter gets in trouble, bargaining is her first go-to. Okay, second. The first is to disappear, but you’re too smart to just walk away from the flames.
But Dave in IT isn’t the only one to realize that bargaining and begging are a poor substitute for a well-defined strategy for high availability and disaster recovery. Skip the bargaining and begging about your disaster because “80% of the people don’t care, and 20% are glad it’s you” (paraphrased from Les Brown).
Stage 4: Sadness
Dave in IT: “This is just great. The application server crashed, the datacenter is down, and backups, if I can find them and if I can load them, will take hours to get restored. There is no way I’m getting out of this… where did I put that updated resume?”
Of course you have backups, and you’ve validated them. But going back to those backups has an RTO and RPO impact. Are you able to absorb this time? And that is, of course, only after your data center recovers.
Stage 5: Acceptance
Dave in IT: “It’s been two hours. I never knew we had this many Executive stakeholders before. No way I’m making it to my 2nd anniversary after this. Well, I guess I’ll clean out my office tomorrow. No way I’m making it through this!”
Failures happen. Datacenters go down. Applications fail. There is no denying the possibility of losing a data center, having a server fail, or an application crash. This type of acceptance is normal and a part of improving your availability. Accepting that you may lose your job or worse because you failed to implement an availability strategy is something the experts at SIOS Technology Corp. want to make sure you avoid.
Don’t be like Dave in IT. Avoid the stages of disaster grief, and the hours of disaster recovery and downtime, by architecting and implementing an enterprise availability architecture that includes the best of hybrid, on-premise, or cloud, coupled with the best solution for monitoring, recovery, and system failover automation.
– Cassius Rhue, VP Customer Experience
Reproduced from SIOS
March 1, 2021	Why Does ~~love~~ High Availability Have To Be So Complicated?
It’s the Hallmark movie season, I mean Christmas season, I mean Hallmark Christmas movie season… (don’t judge too harshly; I’m a father of six young ladies, a hopeless romantic, and married to an amazing spouse who enjoys a good holiday laugh and a happy ending). If you are in the Hallmark movie season, you know that it is highly likely that you’ll hear the phrase, “Why is love so complicated?” It will be spoken just before the heartbroken young person has developed feelings for a new love interest, and is ready to dance the night away in their arms, just as the old flame walks into the party.
If you aren’t into the Hallmark holiday romances, maybe it isn’t love that you are wondering about. Perhaps you want to know: “Why does high availability have to be so complex?”
Ten Reasons That High Availability Is So ‘Gosh Darn’ Complicated:
The speed of innovation
Cloud computing, edge computing, hyper-converged, multi-cloud, containers, and machine learning are changing the landscape of enterprise availability at a blistering pace. By conservative estimates, AWS currently has over 175 services, and “provides a highly reliable, scalable, low-cost infrastructure platform in the cloud that powers hundreds of thousands of businesses in 190 countries around the world.” Choosing an HA solution that allows consistent management across all of these environments, with infrastructure and application awareness, is an important way to reduce complexity.
Randomness of disasters
Someone once said, “make your solution disaster proof, and the universe will build a better disaster.” Not only are we seeing innovations in the realm of technology, but also in the world of disasters.
Resource starvation, cooling system disasters, natural disasters, power grid failures, and a host of new and random disasters often make it harder to insulate the entirety of your enterprise. Last year’s solutions will likely need updates to handle this year’s unprecedented outages. It’s important to work with a vendor that has focused on high availability for many years and has firsthand experience with finding solutions to the randomness of disasters.
Application complexity
As technology moves ahead in the realm of virtualization and cloud computing, applications are following suit. As application vendors add new options to take advantage of the cloud, they are also adding complexity. Your applications should be protected by solutions designed for higher availability and clustering in AWS, Azure, GCP or other environments. Look for vendors who provide greater application awareness and an understanding of best practices, and who deliver availability solutions that take account of how the application was architected and can optimize the application’s orchestration in the cloud.
Advances in threats
The threats to your enterprise also impact your availability. Systems have always had to handle attacks from intruders and hackers, and even self-inflicted ones. These attacks have become more sophisticated, and the solutions and methods used to avoid being victimized often impact the layout, architecture, and software deployed within your organization. This software has to “play nice” with your availability solution and your applications. As VP of Customer Experience for SIOS Technology, I have seen how an overly aggressive virus scanner can impact your application and your availability solution. Ensure you understand the impact of your security systems on your HA/DR environment, and choose an HA solution that works with, not against, your security goals.
Regulatory requirements
Data breaches impact the architecture of your application, hypervisor and environment, but so too do regulatory requirements. Businesses that have become global now have to make sure they are compliant with data handling regulations in multiple countries. This can impact which region your solutions can be deployed in, and how many zones you can use for redundancy. Additionally, regulatory requirements can affect which teams can support your organization, which may in turn affect your choices for availability software and support.
Shrinking windows
In the world of 24/7 searches, shopping, gaming, banking, and research, the windows are shrinking. Queries must run faster and take less time. Responses have to be quicker and have better data. This means that the allowable downtime for your environment is shrinking faster than you previously imagined. It also means that maintenance windows are tighter and more packed, and have to be optimized and highly coordinated. Work with an HA vendor that can provide guidance on optimizing your cluster configuration for both application performance and fast recovery time.
Increasing competitive pressure
I grew up in a small town. The hardware store had one competitor. The grocery store had one competitor. The bookstore, antique shop, car dealership, rental office, and bank all had one competitor. Today, you have thousands upon thousands of competitors who want nothing more than to see your customers in their checkout carts. This competition impacts the complexity of your entire business. It weighs heavily on what can and cannot be done in maintenance windows and with upgrades, and on the speed at which you innovate. Environments that may have been refreshed once every five years have moved to the cloud, where optimizations and advancements in processor speed and memory can be had in seconds or minutes.
Systems that once had a single run book covering a simple list of applications now look closer to “War and Peace” and cover the growing number of processes, products, services and intelligence being added to increase profits while simultaneously working to reduce risks and downtime.
High availability solution costs
We all wish we had an unlimited budget, but the reality is that what you have available is often somewhere between a little and not enough. Teams are often forced to balance consumption versus fixed cost, license costs for applications on the standby clusters, and associated costs for availability software. Enterprise licenses often add a ‘tough to swallow’ price tag for a standby server in an availability environment. Architecting an availability solution is never free, even if you are a hard-core ‘DIY’ team. DIY comes with additional costs in maintenance, management, source control, testing, deployment, version management and version control, patches, and patch management. While your team of experts may be clearly up for the challenge, your business would likely prefer that their highly valued talents be applied to creating more revenue opportunities.
Business growth
Growth of your business due to innovation means that your teams are now responsible for more critical applications, more sites, more offices, and more data that needs to be accessible and highly available. As your business grows and thrives, the challenges that come with scaling up and scaling out add to the complexities mentioned previously, and also expand what you have to prepare and plan for.
Team turnover
The complexity of the environments, speed of innovation, growth of your business, advances in the application tier, and growth in the competitive landscape bring with them the challenge of retaining top talent to keep your infrastructure running smoothly. Most companies understand that availability is a merger of people, process, product, and architecture, among other things.
So finding ways to reduce the complexity of clustering environments, through automated configuration, documented run books, and products with consistent HA strategies across the infrastructure, is key both to retaining the talent that installs and manages your infrastructure and to reducing the risks and heavy lifting for those responsible for the key components of availability.
Let’s face it, love takes hard work, good communication, time, investment, skill and determination. There are no shortcuts to a successful relationship. The same can be said about achieving the best outcomes in an ever-emerging, increasingly complex, and fluid technology space within your enterprise. Availability, clustering, disaster recovery and uptime are so ‘gosh darn’ hard because they require a serious, dedicated, non-stop, top-to-bottom cultural shift accounting for the speed of innovation, the complexity of applications and orchestration, competition and growth, and the other components of keeping applications, databases, and critical infrastructure available to those who need them, when they need them.
– Cassius Rhue, Vice President, Customer Experience
Reproduced from SIOS
February 26, 2021	How to Fix Inherited Application Availability Problems
What to do when you inherit a mess
I grew up in a large immediate family, with an even larger group of well-meaning aunts, uncles, and family friends. Anyone who has ever been a part of a large family has probably, on more than one occasion, received a hand-me-down or had well-intentioned relatives give them a freebie. And if so, you know that beneath the surface of that cool-sounding inheritance, the rumored stylish clothes, or the old “family car”, a nightmare could be lurking. Suddenly, your fortune on four wheels feels like a curse that is two-thirds money pit and one-third eyesore.
So what do you do when you inherit a mess of Application Availability Problems? Well, some DIYers bring in the dumpsters and start fresh. But this isn’t HGTV, and we aren’t talking about inherited furniture but an inherited application availability problem. You usually know you have a mess on your hands the first time you try to do a cluster switchover for simple, planned maintenance and your application goes offline. Now, what do you do when you have inherited a high availability mess?
Two Practical Tips For When You Inherit A High Availability Mess (I mean responsibility)
I. Research
Perhaps one of the best things you can do before taking action is to gather as much data as quickly as possible. Of course, the state of your inheritance might dictate the speed at which you’ll need to gather your data. Some key things to consider during your research to solve your Application Availability Problems:
Previous owner. Research the previous owner of the configuration, including their chain of command, reach of authority, background, team dynamics and, if possible, charter. Find out what the original organizational structures were. Research what was done in the past to achieve high or higher availability, and what was left out.
In some environments, the focus for high availability falls squarely on a portion of the infrastructure while neglecting the larger workflow. Dig into any available requirements, as well as what changes have been implemented or added since the requirements were originally put in place. If you’re in the midst of a cloud migration, understand the goals of moving this environment to the cloud.
Owners and requirements provide a lot of history. However, you’ll also want to research why key decision makers made the choices and tradeoffs they did on designs and solutions, as well as on software and hardware architecture requirements. Evaluate whether these choices were successful or unsuccessful. Your research should focus on the original problems and proposed solutions. You may also want to consider why the environment you inherited feels like a mess. For example, is it due to a lack of documentation or training, poor or missing design details, the absence of a run book, or other missing specification details?
Research what, if any, enterprise-grade high availability software solutions have been used to complement the architecture of virtual machines, networks, and applications. Is there a current incumbent? If not, what were the previous methods for availability?
II. Act
Once you’ve gathered this research, your next step is to act: upgrade, improve, implement, or replace. Don’t make the mistake of crossing your fingers and hoping you never need a cluster failover.
Upgrade
In some cases, your research will lead to a better understanding of the incumbent solution and a path to upgrade that solution to the latest version. Honestly, we have been there with our own customers. Transitions are mishandled. A solution that works flawlessly for years becomes outdated.
Improve
Consider alternatives if an upgrade is not warranted.
The data may point to other areas of improvement, such as software or hardware tuning, migration to cloud or hybrid, network tuning, or some other identified risk or single point of failure. Perhaps your environment is due for a health check, or the increases in your workload warrant an improvement in your instance sizes, disk types, or other parameters.
Implement
In other cases, your research will uncover some startling details regarding the lack of a higher availability strategy or solution. In that case, you will use your research as a catalyst to design and implement a high availability solution. This solution might necessitate private cloud, public cloud, or hybrid cloud architectures coupled with enterprise-grade HA software to enable successful monitoring and recovery.
Replace
In extreme cases, your research will lead you to a full replacement of the current environment. Sometimes this is required when a customer or partner has migrated to the cloud but their high availability software offering was not cloud ready. While many applications boast of being cloud ready, in some cases this is more slideware than reality. If your on-premise solution is not cloud ready, your only recourse may be to go with a solution that is capable of making the cloud journey with you, such as the SIOS Protection Suite products.
As VP of Customer Experience for SIOS Technology, I experienced a situation that shows the importance of these steps, when our Services team was engaged by an enterprise partner to deploy SIOS Protection Suite products. As we worked jointly with the customer, doing research, we uncovered a wealth of history. The customer professed to have had only a limited number of downtime or availability issues. But our research revealed an unsustainable and highly complex hierarchy of alerts, manually executed scripts, global teams, and a hodgepodge of tools kludged together.
With this information, we were able to successfully architect a much more elegant and automated solution to replace their homemade one. Best part, it was wizard based, including automated monitoring, recovery, and system failover protection. No more kludge. No more trial-and-error DIY. Just simple, reliable application failover and failback for HA/DR protection.
If you have inherited a host of Application Availability Problems, contact the deployment and availability experts at SIOS Technology Corp. Our team can walk you through the research process, help you hone your requirements, and then upgrade, improve, replace or implement the solution to provide your enterprise with higher availability.
– Cassius Rhue, Vice President, Customer Experience
Reproduced from SIOS
February 18, 2021	Quick Start Guide to High Availability for SQL Server Using SIOS Protection Suite for Linux
This guide illustrates Microsoft SQL Server protection using SIOS Protection Suite for Linux. The environment used here is VMware ESXi with virtual machines running CentOS 7.6. Microsoft SQL Server 2017 is used to create a database server. The database and transaction logs are stored on local disks that are replicated between nodes using DataKeeper, demonstrating that shared storage could be used as a simple replacement for local disks. This guide is also available as a PDF.
Download Required Microsoft Software
Open the Microsoft guide to installing SQL Server at https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-setup?view=sql-server-ver15
Plan SQL Environment Configuration
The following configuration settings will be used for creating the cluster environment described by this quick-start guide. Adapt your configuration settings according to your specific system environment.
General Configuration
The example installed in this quick start guide uses CentOS. The Red Hat instructions apply because CentOS is binary compatible with Red Hat. The steps will be very similar whether the nodes are running in a VMware environment, in the cloud, or on physical hardware.
Node 1 configuration
    Hostname: IMAMSSQL-1
    Public IP: 192.168.4.21
    Private IP: 10.1.4.21
    /dev/sdb (10GiB)
    /dev/sdc (10GiB)
Node 2 configuration
    Hostname: IMAMSSQL-2
    Public IP: 192.168.4.22
    Private IP: 10.1.4.22
    /dev/sdb (10GiB)
    /dev/sdc (10GiB)
Virtual IP used for SQL access
    192.168.4.20; this will be protected by LifeKeeper and “floats” between nodes
Operating System
    CentOS 7.6
SQL Database Configuration
    SQL Database:
    SQL Virtual Hostname: IMAMSSQL
    SQL Virtual IP: 192.168.4.20
SQL File System Mount Points
    /database/data
    /database/xlog
PREPARE SYSTEM FOR INSTALLATION
Installing MS-SQL
Initial SQL install
In this section we will add the Microsoft package location to our Linux OS and then instruct the OS to install SQL Server. Open the following Microsoft guide to installing SQL Server: https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-setup?view=sql-server-ver15
Log in with root privileges, or prefix each command with sudo.
    curl -o /etc/yum.repos.d/mssql-server.repo https://packages.microsoft.com/config/rhel/7/mssql-server-2017.repo
    yum install -y mssql-server
    /opt/mssql/bin/mssql-conf setup
I installed my SQL Server with an Evaluation license.
    yum install -y mssql-tools unixODBC-devel
    echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile
    echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
    source ~/.bashrc
    systemctl stop mssql-server.service
We stop the SQL service here and cannot start it again until we have configured the disks used as storage in the section titled “Create database and transaction log file-systems and mount points”.
    /opt/mssql/bin/mssql-conf set filelocation.masterdatafile /database/data/master.mdf
    /opt/mssql/bin/mssql-conf set filelocation.masterlogfile /database/xlog/mastlog.ldf
Create database and transaction log file-systems and mount points
We will use the xfs file-system type for this installation. Refer to LifeKeeper supported file-system types to determine which file-system you want to configure.
Make sure you configure the disk to use GUID identifiers. Here we will partition and format the locally attached disks; create, mount and set permissions on the database locations we want SQL to use; and finally start SQL, which will create a new Master DB and transaction logs in the locations we specified.
Note: when creating the partition, DataKeeper requires the number of blocks in the partition to be odd, e.g. 20973567 (end) - 2048 (start) = 20971519.
    fdisk /dev/sdb
    mkfs -t xfs /dev/sdb1
    fdisk /dev/sdc
    mkfs -t xfs /dev/sdc1
    mkdir /database; mkdir /database/data; mkdir /database/xlog
    chown mssql /database/; chgrp mssql /database/
    chown mssql /database/data/; chgrp mssql /database/data/
    chown mssql /database/xlog/; chgrp mssql /database/xlog/
    vi /etc/fstab
Add /dev/sdb1 mounting to /database/data, e.g.
    /dev/sdb1 /database/data xfs defaults 0 0
Add /dev/sdc1 mounting to /database/xlog, e.g.
    /dev/sdc1 /database/xlog xfs defaults 0 0
    mount /dev/sdb1
    mount /dev/sdc1
    chown mssql /database/data/; chgrp mssql /database/data/
    chown mssql /database/xlog/; chgrp mssql /database/xlog/
    systemctl start mssql-server.service
We start the SQL service now that the local disks are mounted; this will create a new Master DB and transaction logs.
Installing LifeKeeper
Refer to the Installation Guide: http://docs.us.sios.com/spslinux/9.5.1/en/topic/sios-protection-suite-for-linux-installation-guide
Create LifeKeeper Resource Hierarchies
Open the LifeKeeper GUI on the primary node:
    # /opt/LifeKeeper/bin/lkGUIapp &
Communication Paths
Create backend and/or frontend IP routes; in our case the backend is 10.2.4.21 & 22 and the frontend is 192.168.4.21 & 22.
[AWS only] Right-click on each instance in the AWS Management Console and select Networking → Change Source/Dest. Check, and ensure that source/destination checking is disabled.
In the LifeKeeper GUI, click Create Comm Path.
In the Remote Server(s) dialog, add the host names of the other cluster nodes and select them.
Select the appropriate local (10.2.4.21) and remote (10.2.4.22) IP addresses.
Repeat this process, creating communication paths between all pairs of remote nodes for each network (e.g., 12.0.1.30 and 12.0.2.30). After completion, communication paths should exist between all pairs of cluster nodes.
IP Resources
The IP resource is the virtual IP that will be used to access the SQL server, in this case 192.168.4.20.
1. Verify that all of the virtual IPs have been removed from the network interface by running ‘ip addr show’.
2. Create the IP resource for the MSSQL virtual IP.
3. In the LifeKeeper GUI, click Create Resource Hierarchy and select IP.
4. When prompted, enter the IP 192.168.4.20 and choose the subnet mask 255.255.0.0.
5. Enter a tag name such as ip-192.168.4.20-MSSQL.
DataKeeper Resources
These are the drives used to store the database and transaction logs, /database/data and /database/xlog.
Data Replication Resources
1. Ensure that all SQL file systems are mounted at the appropriate mount points under /database on the primary cluster node.
    # mount
    …
    /dev/sdb1 on /database/data type xfs (rw,relatime,attr2,inode64,noquota)
    /dev/sdc1 on /database/xlog type xfs (rw,relatime,attr2,inode64,noquota)
    …
2. Ensure that the file systems are not mounted on the backup cluster node(s).
3. In the LifeKeeper GUI, click Create Resource Hierarchy and select Data Replication.
4. For Hierarchy Type, select Replicate Existing Filesystem.
5. For Existing Mount Point, select /database/data.
6. Select the appropriate values for the rest of the creation dialogs for your environment.
Repeat steps 3-6 for the /database/xlog file system.
Quick-Service Protection
We will use LifeKeeper’s Quick Service Protection ARK to protect the mssql-server service. This will monitor the MSSQL service and make sure it’s running.
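Creating the Quick Service Protection resource assumes a particular starting state: the mssql-server service running on node 1 and not running on node 2. A minimal sketch of that pre-check follows. The helper name service_state_ok and the role labels "primary" and "backup" are hypothetical and not part of LifeKeeper; the state string is assumed to be what `systemctl is-active mssql-server.service` prints on the node being checked.

```shell
# Sketch: decide whether a node is in the state the QSP steps expect.
# service_state_ok is a hypothetical helper, not a LifeKeeper command.
# ROLE is "primary" (node 1) or "backup" (node 2); STATE is the text printed
# by `systemctl is-active mssql-server.service` on that node.
service_state_ok() {
    role=$1
    state=$2
    case "$role" in
        primary) [ "$state" = "active" ] ;;   # service must be running
        backup)  [ "$state" != "active" ] ;;  # service must not be running
        *)       echo "unknown role: $role" >&2; return 2 ;;
    esac
}

# Demo with literal states, so the sketch runs without systemd.
for pair in "primary active" "backup inactive" "backup active"; do
    set -- $pair
    if service_state_ok "$1" "$2"; then
        echo "$1/$2: ready"
    else
        echo "$1/$2: fix before creating the QSP resource"
    fi
done
```

On a real node the state would be captured first, for example with state=$(systemctl is-active mssql-server.service), and then passed to the helper. Transitional states such as "activating" are treated here as "not running", which may or may not be what you want on the backup node.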
1. Use systemctl status mssql-server.service on node 1 to ensure MSSQL is running.
2. Use systemctl status mssql-server.service on node 2 to ensure that MSSQL isn’t running. If it is, stop the service using systemctl stop mssql-server.service, then unmount the /database/data and /database/xlog directories.
3. In the LifeKeeper GUI, click add resource.
4. Select the QSP ARK from the drop-down.
5. When the list of available services populates, choose mssql-server.service.
6. Select the appropriate values for the rest of the creation dialogs for your environment.
7. Extend the hierarchy to node 2.
8. At the Linux CLI on node 1, run the following to list the current policy settings:
    /opt/LifeKeeper/bin/lkpolicy -g -v
9. If LocalRecovery: On is set for QSP-mssql-server, disable local recovery by executing the following on both nodes:
    /opt/LifeKeeper/bin/lkpolicy -s LocalRecovery -E tag="QSP-mssql-server"
10. Confirm that Local Recovery is disabled on both nodes by running /opt/LifeKeeper/bin/lkpolicy -g -v again.
Reproduced from SIOS
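The odd-block requirement noted in the partitioning step of the quick start guide above can be checked with simple arithmetic before formatting a partition. The sketch below follows the guide's own convention (block count = end - start) and uses the start and end values quoted there. The function names partition_blocks and has_odd_blocks are hypothetical helpers, not DataKeeper or fdisk commands.

```shell
# Sketch: check DataKeeper's odd-block rule using the guide's convention
# (block count = end sector - start sector). Helper names are hypothetical.

# Print the block count for a partition given START and END.
partition_blocks() {
    echo $(( $2 - $1 ))
}

# Succeed (exit status 0) only when the block count is odd.
has_odd_blocks() {
    [ $(( $(partition_blocks "$1" "$2") % 2 )) -eq 1 ]
}

# Example using the start/end values quoted in the guide.
start=2048
end=20973567
if has_odd_blocks "$start" "$end"; then
    echo "OK: $(partition_blocks "$start" "$end") blocks (odd)"
else
    echo "Adjust the end value: $(partition_blocks "$start" "$end") blocks (even)"
fi
```

If the check fails, moving the end value by one is enough to flip the parity; the start and end values themselves would be read from the fdisk partition listing.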
February 17, 2021	Version 8.7.2 SIOS Protection Suite-Windows and DataKeeper Cluster Edition
We are pleased to announce the release of SIOS Protection Suite for Windows version 8.7.2, including DataKeeper Cluster Edition. The new release features the following:
New Oracle Pluggable Databases (PDB) Application Recovery Kit
    Oracle Pluggable Database is recommended on Oracle 19c and required on Oracle 20c onward.
    No additional SIOS license is required for the SIOS Protection Suite PDB application recovery kit, but an existing Oracle resource is needed.
Support for additional platforms and operating systems, including Azure Stack (Hub) (to include Windows Server 2019), vSphere 7, and PostgreSQL 12.
For our full list of supported products, visit the SIOS Protection Suite – Windows 8.7.2 support matrix.
Reproduced from SIOS