March 3, 2022 |
Disney’s Encanto – Lessons on High Availability, IT Teams & downtime

Over the weekend I joined the masses of people who have tuned in to Disney’s Encanto, and I became a fan of the story, a student of its lessons and opportunities, and an absolute fan of Lin-Manuel Miranda. What does Disney’s Encanto provide in relation to High Availability, Clustering, and Resiliency?

In Encanto you quickly learn that the Family Madrigal is a special family. In one of the opening songs, “The Family Madrigal,” we learn that the members of the family have unique and special gifts: superhuman strength, the ability to hear for miles, prophecy and prediction, the power to conjure beautiful flowers and plants, the ability to shape-shift, the ability to heal, and the ability to control the weather. Everyone, it seems, has a ‘gift’ except Mirabel.

Lesson 1: You don’t need superhuman gifts to make a difference.

Mirabel, while not gifted like her siblings and the other members of the family, is the central figure in understanding the health and disease of the family. Moreover, she is able to help the family put things back together when it all falls apart, without any of the other gifts. You need High Availability, but you don’t have to break the budget, develop supernatural abilities, or depend on a miracle to achieve it.

As the movie continues, Pepa’s youngest son Antonio is readied for his gift ceremony. However, during the party and celebration Mirabel notices cracks in the foundation of Casita. But her warnings go unheeded.

Lesson 2: Don’t ignore the cracks.

When Mirabel sees the cracks it leads her on a quest to find out what is endangering Casita and how she can help. Initially, she is ignored by the others and even rebuked.
How will you respond if you see cracks or shortcomings in your IT infrastructure, or cracks in your architecture and design? Will you ignore the cracks, pretend you don’t see them, or even rebuke the team for finding them? Don’t ignore the cracks. Responding to the first sign of an issue is most often the best way to prevent a greater one.

As Mirabel searches for answers and a way to save the miracle’s magic, Dolores tells her to talk to her super-strong older sister, Luisa, who initially insists that everything is okay and that there is absolutely nothing wrong. But Luisa eventually begins to reveal that the weight of knowing there is an issue is becoming too much for her to carry alone.

Lesson 3: The weight of HA is too big for a single person or team.

As Luisa put it, “It is pressure that breaks the camel’s back, pressure that’ll never stop.” Developing a High Availability solution, and designing and architecting for resilience and data availability, is not a simple process, and it is definitely not a task for a single person or single team. Your DBA, IT Admin, and ERP Administrators cannot handle the weight of maintaining critical enterprise availability alone. Likewise, a one-dimensional approach cannot carry the weight of four nines (99.99%) of availability. Instead, it takes a fully aligned team working in concert with a complete HA solution to understand, design, develop, and deploy the tools and techniques. How well are the roles and responsibilities on your IT teams distributed and defined? Ensure no one is bearing the responsibility for HA alone.

When Mirabel seeks out Bruno for the answers she is looking for, everyone says, “We Don’t Talk About Bruno.” Bruno’s gift is precognition, but because of his warnings and seemingly negative visions, he disappeared.

Lesson 4: Don’t be afraid of the person who sees trouble ahead.

As VP of Customer Experience, I’ve helped customers perform health assessments for their infrastructure and clustering solutions.
When the health check completes, not all customers are happy to hear that they have issues to resolve. We all do what we can to avoid bad news. But ignoring upgrades, forgetting to do maintenance, and downplaying risks identified by the Bruno of your team will not make the trouble disappear. In fact, it may make your worst fears a reality.

Mirabel eventually finds a secret passage leading to Bruno and discovers that Bruno never left, but felt that he had to destroy his vision of her to protect her and himself.

Lesson 5: Corporate culture can crush or create higher availability

Your culture can either crush or create a space for higher availability and resiliency.

Mirabel asks Bruno if he has been patching the cracks in Casita, but Bruno replies that he is afraid of the cracks.

Lesson 6: Don’t be afraid of the cracks

HA requires continuous, coordinated effort. An essential part of that effort is finding solutions and fixes for the IT cracks that could jeopardize your application, and for the gaps between architecture and execution.

Even as Bruno (or Hernando) tries to patch the cracks, it is apparent that the foundational issues are too much for spackle and superficial solutions.

Lesson 7: Spackle won’t fix a foundational problem

Take a look at your infrastructure and at the ways in which problems are being addressed. Are you deploying workarounds, band-aids, and temporary “hacks”, or are you looking at architectural and foundational solutions that address the root cause of the problem with your clusters, enterprise availability, and execution during disasters?

Lesson 8: Find your Jorge

If you’ve been deploying more hacks and workarounds than root cause solutions, find your Jorge. Find a skilled team member, partner, or solution provider and give them permission to grapple with implementing the foundational solution that will fix the problem or strengthen the infrastructure.

Bruno sees another vision: Casita could be saved if Mirabel hugged Isabela.
Mirabel offers Isabela an opportunity to blossom, but Abuela doesn’t see it that way. An argument between Mirabel and Abuela ensues, and Abuela blames Mirabel for the cracks in ‘Casita’. Mirabel blames Abuela for her impossible demands, unrealistic expectations, and misplaced hopes.

Lesson 9: Blame creates more problems

Pass the Blame is a great party game, but it is not great for HA, cluster resilience, or data protection. I once helped a customer whose organization illustrated the unproductiveness of blame. After a proof-of-concept cluster hit an issue that caused a delay, the Project Manager blamed the application team. The application team blamed the backup administrator, who in turn blamed the infrastructure admin. Throughout the blaming session, their cluster remained unavailable, the proof of concept remained stalled, and the only progress being made was in the cracks of anger growing between teams. It was only when they put these differences aside that they could make the adjustments they needed to resolve their issue and continue with a successful POC.

‘Casita’ collapses and Mirabel runs away. Later, Alma finds Mirabel, and after reconciling they join the family and village in building Casita back better than ever.

Lesson 10: Build it back stronger

Of course, the final scenes of Encanto are filled with lessons in the confession of Alma (Abuela) such as:
But the most important of the final lessons is to build back better, stronger, and together. After every outage, planned or unplanned, there will be lessons learned from root cause analysis, experience, and fresh understanding. As a result, there will also be an opportunity to build back a stronger solution and architecture for your high availability and disaster recovery. Consider the case of a customer who was able to create a standard deployment pipeline and QA system after discovering that an outage was caused by code deployed directly to production. Or another customer who uncovered that disk and database warnings had been suppressed for weeks before the outage. Don’t waste the time and opportunity that come with downtime. Be sure to work together to avoid silos, dependencies on single strengths, or placing the hope of your infrastructure on the wrong thing.

Of course, you should watch the whole movie for yourself; there are even more lessons for HA as you walk through the magic and music of the movie and pick up on the lives and lessons of a few of the other characters.
The movie closes with a great reunion as Mirabel and the Madrigals stand in front of the finished house. When Mirabel touches the doorknob to the door, ‘Casita’ springs back to life, and the home returns along with the family’s magical gifts. Try these ten lessons for High Availability from Encanto, enjoy the movie, and remember “There is nothing you can’t do… together” with your team of customers, partners, solution providers, and administrators. |
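Lesson 3 refers to four nines of availability. As a rough illustration of how small that budget is, the Python sketch below converts an availability target into allowed downtime. The function name `downtime_budget_minutes` and the 365-day year are assumptions made for this example; they do not come from any SIOS tool.

```python
def downtime_budget_minutes(availability_percent: float,
                            period_days: float = 365.0) -> float:
    """Return the downtime, in minutes, allowed by an availability target.

    availability_percent: target such as 99.99 (hypothetical input).
    period_days: length of the measurement period; 365 is an assumption.
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The budget is the fraction of the period the service may be down.
    return total_minutes * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        minutes = downtime_budget_minutes(target)
        print(f"{target}% availability allows {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, four nines leaves under an hour of downtime for the whole year, which is why the responsibility cannot rest on one person or one team.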
February 27, 2022 |
How To Activate a License for SIOS Protection Suite for Linux

Once you have acquired your SIOS Protection Suite for Linux software, you will need to activate your license. This seven-minute video will help you get started. It walks you through all of the steps needed to begin running your SIOS Protection Suite for Linux software.

Watch as a SIOS support representative demonstrates the steps needed to install SIOS licenses: how to enter entitlement/activation IDs, how to obtain and enter host IDs, and how to download the activation file. The video shows where to access the software for download, how to view and validate the host name and ID from purchased or trial entitlements, and how to download the activation files contained in your welcome email to complete the process.

You will also learn how to access the SIOS Documentation portal, where you can find release notes, installation guides, technical documentation, and in-depth information on SIOS Protection Suite for Linux as well as a wide range of topics for every SIOS product. You will pick up helpful tips on how to complete the steps quickly and easily. See how simple it is to start running SIOS Protection Suite for Linux.

Reproduced with permission from SIOS |
February 23, 2022 |
How To Install A SIOS Protection Suite for Linux License Key

Once you have installed SIOS Protection Suite for Linux software and have activated your license, you will need to install your license key before you can begin to run it. This four-minute video reviews how to install SIOS Protection Suite for Linux software and demonstrates how to activate your license so you can get started.

Watch as a SIOS support representative shows you how to check that your SPS image file is mounted, how to confirm you have the license file, and how to install it and enter the complete path name. Use our simple license key manager to validate your activated licenses from purchased entitlements, download and apply license keys, and start your SIOS Protection Suite for Linux software.

The video also walks through how to access the SIOS Documentation portal, where you can find release notes, installation guides, technical documentation, and detailed information on SIOS Protection Suite for Linux as well as a wide range of topics on everything SIOS. You will pick up tips on how to complete the steps quickly and simply. Now you can begin protecting your critical applications with SIOS Protection Suite for Linux. |
February 19, 2022 |
How to Eliminate Single Points of Failure in the Cloud with High Availability Clustering

When providing high availability protection, the general principle is to make all components redundant to avoid Single Points of Failure (SPOF). That is, ensure that no single element can stop the entire system if it fails. However, it is important to note that the operational infrastructure is hard to see into in the public cloud. In a cloud-based high availability cluster, there is a possibility that the standby node(s) will be located on the same host server, in the same rack, and on the same network switch as the active node. Unless these elements are redundant, any of them could be a SPOF and put the application at risk of catastrophic failure. It is therefore necessary to place cluster nodes in different cloud “regions” and “availability zones”, which physically separate the data center and operational infrastructure across different geographic locations.

What are the main principles for ensuring availability?

You cannot expect the various components that make up a physical IT infrastructure to operate according to specifications forever. Parts wear out, systems become incompatible, and settings change. Although regular maintenance can reduce the risk of downtime, it’s likely that something will fail over the course of the product lifecycle. In some rare cases, a serious latent bug in the OS or embedded software may cause the application to stop working. As you may have already noticed, the High Availability cluster configuration follows exactly this principle of redundancy: a single point of failure is eliminated by giving the important server and its resources a redundant counterpart to the active (production) system. However, it is important to remember two things. One, the server hardware is not the only critical component.
Two, other critical SPOF components may be invisible to you in a public cloud infrastructure.

Beware of the pitfalls of a single point of failure hidden in the cloud’s invisible infrastructure

Most public clouds operate in a so-called “multi-tenant” mode. That is, they run the VMs of multiple companies on the same physical host server. And with a regular contract, you can’t specify which host server your system runs on. This can cause problems, because the standby node in your cloud cluster may be placed on the same host server as the active node. Even if you configure an HA cluster, if that host server goes down, the active node and the standby node will both go down with it. In this scenario, your cloud operator decides when and how your system will be restored.

The host server that runs the active node and the host server that runs the standby node may also be in the same rack. In this case, the rack becomes a SPOF: if a failure occurs there, both the active and standby nodes in it will fail. Furthermore, in the upper layers of the infrastructure, such as network switches that aggregate multiple racks, gateways and routers, and data center power supply units, the active node and the standby node may depend on the same component. If these key components aren’t redundant, then you have an inescapable single point of failure. Again, for a company that is a public cloud user, such a data center infrastructure is a black box. It may be impossible to see into the detailed configuration to identify SPOFs.

Public cloud availability zones and regions should be leveraged for availability

How can we explicitly avoid hidden single points of failure in the public cloud? The most robust method is to use the “Availability Zones” and “Regions” provided by the cloud platform. An Availability Zone is a physically independent portion of the infrastructure within a region.
And regions are geographically separate areas, each with its own independent data centers. Some public clouds allow you to deliberately choose which Availability Zones or regions your nodes run in. For example, Amazon Web Services (AWS) and Microsoft Azure each operate regions around the world. By constructing an HA cluster configuration in which active nodes and standby nodes are distributed across different availability zones, or across two or more regions, almost all SPOFs can be avoided. If you adhere to these best practices, you can be far more confident in your availability, DR (Disaster Recovery), and BCP (Business Continuity Planning). |
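The placement rule above can be checked mechanically once you know where each node runs. The following Python sketch is a minimal illustration only: the `NodePlacement` type, the `shared_failure_domains` function, and the region and zone names are all hypothetical, and the placement data is assumed to have been gathered by hand or from your cloud provider's own tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePlacement:
    """Where a cluster node runs (hypothetical record for this example)."""
    name: str
    region: str
    zone: str


def shared_failure_domains(nodes):
    """Return the regions and zones that hold more than one cluster node.

    The result maps (level, identifier) to the node names sharing it.
    An empty result means no two nodes share a zone or a region.
    """
    groups = defaultdict(list)
    for node in nodes:
        groups[("region", node.region)].append(node.name)
        # Zone names are only unique within a region, so qualify them.
        groups[("zone", f"{node.region}/{node.zone}")].append(node.name)
    return {key: names for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    risky = [
        NodePlacement("active", "region-a", "zone-1"),
        NodePlacement("standby", "region-a", "zone-1"),
    ]
    spread = [
        NodePlacement("active", "region-a", "zone-1"),
        NodePlacement("standby", "region-b", "zone-1"),
    ]
    print("risky:", shared_failure_domains(risky))
    print("spread:", shared_failure_domains(spread))
```

A shared region is not always a problem (nodes in different zones of one region still avoid most hidden SPOFs), so treat the output as a list of things to review rather than a pass/fail verdict.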
February 15, 2022 |
How to Protect Applications in Cloud Platforms – Clusters for Microsoft Azure High Availability

High Availability & Clustering Solutions for Azure

What is Azure Clustering?

An Azure cluster is a set of technologies configured to ensure high availability protection for applications running in Microsoft Azure cloud environments. In an Azure cluster environment, two or more nodes are configured in a failover cluster and monitored with clustering software. The application runs on a primary node in the cluster. If the clustering software detects an application failure, it orchestrates a failover of application operation to the secondary node(s) in the cluster.

SIOS DataKeeper Cluster Edition clustering software is a unique add-on to Microsoft Windows Server Failover Clustering (WSFC) that enables Microsoft clusters to run in Azure and Azure Stack. SIOS Protection Suite for Linux protects critical Linux applications like SAP, HANA, Oracle, MySQL, or Postgres in Azure and Azure Stack. SIOS clusters uniquely enable cluster failover across Azure regions and availability zones for true 99.99% uptime and disaster recovery protection.

Microsoft Azure-Certified Software for HA Clusters w/WSFC

SIOS DataKeeper Cluster Edition software is Microsoft Azure-certified and available in the Azure Marketplace. It is the only Azure-certified software that enables customers to create a SANless high availability cluster in Azure or Azure Stack using Microsoft Windows Server Failover Clustering (WSFC). By adding SIOS DataKeeper software to WSFC, customers can quickly and easily protect business-critical Windows environments from downtime and data loss in the cloud or in any combination of physical, virtual, and hybrid cloud environments.
Now, for the first time, customers using SAN-based Windows server failover clusters to protect their most important applications are free to move them to Azure or Azure Stack and achieve the high availability protection they need. Find a step-by-step guide to creating an HA failover cluster in Azure here. Find SIOS DataKeeper in the Azure Marketplace here.

Azure Site Recovery Compatibility for High Availability and Disaster Protection

SIOS DataKeeper Cluster Edition is the only high availability solution certified for use with Microsoft Azure Site Recovery for cost-efficient high availability and disaster recovery protection for business-critical applications in Azure. SIOS DataKeeper’s compatibility enables customers to protect important applications, including SAP, SQL Server, and Oracle, in Azure cloud environments.

SIOS DataKeeper Cluster Edition provides a simple way to use Windows Server Failover Clustering (including SQL Server Always On Failover Clustering) in a cloud environment. Customers can replicate the cluster to a geographically separated location using Azure Site Recovery for cost-efficient, robust disaster protection. Learn more about SQL Server High Availability in Azure.

Together, SIOS DataKeeper and Microsoft Azure Site Recovery enable the only option for local high availability protection along with disaster recovery in a highly flexible, on-demand solution.

Protect Linux Applications in Azure

SIOS Protection Suite for Linux lets you run your business-critical applications in Azure or Azure Stack without sacrificing performance, high availability, or disaster protection. Learn more about SIOS SANless Software for Cloud High Availability.
Protect SAP Applications in Azure

SIOS Protection Suite and SIOS DataKeeper Cluster Edition provide comprehensive, fully SAP-certified protection for your SAP applications and data, including high availability, data replication, and disaster recovery, in an easy, cost-efficient solution that can operate in the cloud, on-premises, or in hybrid cloud configurations.

Learn more about High Performance and High Availability for SAP on Azure.
Microsoft High Availability for SAP HANA database on Azure using SIOS Protection Suite.
Learn more about SIOS Protection Suite for SAP.
See our latest blog posts about cloud high availability here.

Reproduced with permission from SIOS |
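The failover behavior described under "What is Azure Clustering?" (an application runs on a primary node, and clustering software moves it to a secondary node when a failure is detected) can be pictured with a toy model. The Python sketch below is only a conceptual illustration: the `Node` and `Cluster` classes and the `check_and_failover` method are hypothetical names, and nothing here reflects how WSFC, SIOS DataKeeper, or SIOS Protection Suite are actually implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """A cluster node with a simple health flag (hypothetical model)."""
    name: str
    healthy: bool = True


@dataclass
class Cluster:
    """Toy failover cluster: one active node, the rest on standby."""
    nodes: List[Node]
    active_index: int = 0

    @property
    def active(self) -> Node:
        return self.nodes[self.active_index]

    def check_and_failover(self) -> str:
        """Keep the active node if it is healthy; otherwise promote the
        first healthy node in the list. Returns the active node's name."""
        if self.active.healthy:
            return self.active.name
        for index, node in enumerate(self.nodes):
            if node.healthy:
                self.active_index = index
                return node.name
        raise RuntimeError("no healthy node available for failover")


if __name__ == "__main__":
    cluster = Cluster([Node("primary"), Node("secondary")])
    print("active:", cluster.check_and_failover())
    cluster.nodes[0].healthy = False  # simulate a detected failure
    print("active after failure:", cluster.check_and_failover())
```

A real clustering product also has to handle data replication, quorum, and split-brain avoidance, none of which this sketch attempts.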