January 8, 2021

How To Choose A Cloud When You Need High Availability

Understand the cloud market

A number of analyst firms are predicting an ever-increasing number of deployments of applications, databases, and solutions in the cloud. According to Gartner, firms are “moving to the cloud at an increasing rate.”[1] In fact, Gartner and other analysts expect the pace of cloud migration and deployment to continue to accelerate, driven in large part by the pace of innovation in the cloud. In a TechTarget article, Kurt Marko of MarkoInsights notes that the pace of innovation that is “being undertaken in the cloud likely can’t be replicated on premises due to the elastic, scalable, and on-demand nature of managed public cloud services.” More and more companies that had been using the cloud only for DevOps applications and databases that were not essential to their business are now moving mission-critical applications, ERPs, and databases that require high availability protection to the cloud. If you are considering a move to the cloud, and it seems likely that you are, there are several key points to understand when you need high availability.

Familiarize yourself with the cloud high availability options

To plan for the proper availability solution for a cloud or hybrid cloud deployment, consider what the pain points are with regard to both availability (99.9% uptime) and high availability (99.99% uptime). You also need to understand the options that are available for high availability, with an eye towards your plans to migrate to the cloud. Notable analysts and experts suggest looking for solutions that will not only reduce the pain of migrating your workloads, but will also provide a balanced and comprehensive approach to availability throughout the lifespan of your cloud architecture.
Note that it is also wise to consider solutions that can provide protection and high availability for portions of your workload that may one day be repatriated from the cloud back to your on-premises environment. Here are ten things to consider when comparing your availability options in the cloud:

1. The deployment method. Is it possible to deploy the availability solution you are considering using an image, CLI, UI, or other repeatable method such as a cloud formation template or packaged scripts?
2. The system requirements. Most notably, consider the operating system (OS), disk, CPU, and memory requirements.
3. The deployment environments. Do your availability options support on-premises only, one or more public clouds, or a mixture and/or hybrid cloud deployment? Is there a SaaS offering available as well?
4. The breadth and depth of application protection. “Breadth” means: what types of applications, databases, front-ends, networking, and infrastructure components can be protected? Is there a flexible framework for adding new applications and variants? “Depth” means: is the solution application-aware, and able to maintain application-specific best practices throughout the application failover/failback processes?
5. Performance requirements. We often think of RTO and RPO, but what about the other performance needs of your solution? Will your availability solution cause performance issues on failover?
6. Resilience requirements. How large a cluster can the availability solution support? How many faults and failures can it detect and recover from? How will replication be handled while keeping metadata in sync?
7. Supportability and maintenance. Does the availability vendor have experience with a wide range of availability needs and configurations? Do they have longevity, and a support system designed to address issues that may go beyond their solution? Can they help you minimize disruption and planned downtime during your system management and maintenance (patches, upgrades, and general maintenance)?
8. Total cost of ownership. There are entire industries and services dedicated to helping you calculate the total cost of ownership, so we won’t cover that here. Suffice it to say, your calculations will be unique to your organization, cloud provider, applications, and IT team. You should consider whether your availability solution vendor can help you identify strategies for saving on utilization, licensing, and other costs. Does the solution automate manual tasks and reduce IT labor time?
9. Licensing and pricing model. How do you consume the cost of the software? Is there a subscription fee, subscription model, pay-as-you-go offering, bring your own license (BYOL), or a combination of flexible options? How will you enable the product licensing? Is there a license server, licensing service, or encrypted key based on virtual machine deployment details, such as address, hostname, or MAC address?
10. The impact on IT staff. How much training will the solution require? How much manual intervention will be needed in the event of an application failure or disaster? Will it require specialized scripting that needs to be maintained? Who will be responsible for ongoing maintenance?

Weigh the benefits and trade-offs

As with every important decision, you need to understand your trade-offs and choose the best balance to meet your needs. For example, I recently asked a friend to recommend a good walking shoe. I bought a pair he raved about, noting how lightweight they were, how strong and durable the fabric was, and how stylish they were. I went for my first long walk-run in them, and I donated my first pair of “one run” shoes immediately thereafter. When I went to ‘Fleet Feet’ to get an expert’s opinion, I ended up with a heavier shoe, with more breathable fabric (also less durable), and an unrivaled level of hideousness.
I made a trade-off between appearance and function that worked for my needs and budget. As with running shoes, there is no silver-bullet solution that will be the right fit for every company, every application, every database, and every possible server and architecture. You are officially free to stop looking for it. Instead, settle into the activity of weighing the trade-offs to determine what is the right fit for your company’s needs. Think about your trade-offs. For example, if you’re sure you will be a full Microsoft shop, the importance of GCP and AWS support should be a little lower in your evaluation process.

Take your IT infrastructure dynamics into account

Think holistically about availability in your entire IT infrastructure, both on premises and in the cloud. The reasons to do so are best explained with another analogy. In 2018, I was the coordinator for an outreach program feeding the homeless and hungry in Columbia, South Carolina. Our group met once a week to serve a meal and a message of hope to over 100 men, women, and children. When we considered expanding (adding more days of the week, more hours, or additional services), we had to think well beyond simple scheduling requirements. Knowing that we were providing a critical service to clients who depended on us, we had to consider all the factors that affected our ability to deliver those services consistently for the long term, such as cost, the ages of our team members, outside obligations, alternative methods to achieve our goals, risk factors, and other dynamics within our parent organization.

When you are choosing your solution, after you’ve understood the market, familiarized yourself with the options, and weighed the trade-offs, the last step is to take into account the various other dynamics in your overall environment. Will the solution meet the needs of your business as a whole? Will your critical data be protected from loss? Will your end-user productivity be protected from downtime? What training will be required to move to the cloud, and how will that impact your ability to manage or maintain the solution that you choose? What IT roles will be added, removed, or changed in your cloud journey? Will any responsibilities for application availability move to any line-of-business owners? And how will the shifts in responsibilities or team makeup improve or decrease your overall potential for success? Consider whether your team needs to take a step-by-step approach, migrating smaller workloads first.

As VP of Customer Experience, I have seen a wide range of cloud migration planning, some straightforward, others extremely disruptive. In one instance, a customer’s move to the cloud was highly contentious because management saw it as an opportunity to eliminate an entire IT department. I’m not suggesting that you play politics, but you should be aware of all of the factors at play in these complex projects.

Migrating to the cloud is supposed to save money, time, and resources while affording improvements in availability and resilience. Regardless of which cloud you choose, make sure that you consider these tips and select the corresponding availability solution that gives you the flexibility to deliver the protection you need in the configuration you want. Learn more about cloud high availability options with SIOS.

– Cassius Rhue, VP of Customer Experience, SIOS

Reproduced with permission from SIOS
December 30, 2020

How To Clone Availability In The Cloud With Better Outcomes

Tips from the movies: Multiplicity

Multiplicity is a 1996 American science fiction comedy film starring Michael Keaton as Doug Kinney, a busy construction worker struggling to make time for his family and his demanding job. When a scientist offers to clone him, Doug agrees, just to make meeting his schedule and commitments easier. But then the copies of him begin making copies of themselves. By the time the last copy is made, the point is clear. Cloning may not be all it’s cracked up to be, or at the very least comes with some strong warnings, challenges, and side effects. The famous original Star Trek episode “The Trouble with Tribbles” illustrates a similar point. Like cloning on the big screen (or small), cloning in the cloud is a great tool, but not without its challenges.

Tips for how to get better outcomes when you clone availability in the cloud

1. Clone operational systems

This sounds obvious, but I have seen non-functional systems cloned more than once in real enterprise environments. If you clone your non-functional system, the clone will be equally non-functional and problematic when you restore it. Be sure that the clone you make comes from an operational and functional system.

2. Sync data to disk and resync on restore

File system integrity is critical. If you don’t ensure your application and/or VM are in a consistent state, most vendors will not guarantee the resulting image. Since snapshots only capture data that has been written to your volume at the time the snapshot command is issued, they might exclude any data that has been cached by applications or the operating system. Making sure data has been properly synced to the file system is an important step, and absolutely critical in a cluster environment. File system integrity is also critical to keep in mind when you restore from an image. If you are using data replication and you restore an image as source or target in the cluster, making sure the two nodes are in sync is paramount. Failing to do so may lead to file system errors on failover or switchover, or even potential data loss. Clone availability in the cloud to get the result you want.

3. Stop your instance

Many environments do not require you to stop an instance to create an image, and some, such as AWS, will power down the node before making the copy. However, many tools and sites recommend making sure applications are stopped and file system access is properly synced to avoid damage, loss of integrity, or images that have trouble starting, stopping, or running installed applications.

4. Label everything in the cloud (nodes, disks, NICs, everything)

While creating a clone is a free operation, the resulting disks and components typically are not. AWS states, for example, that you are “charged for the snapshots until you deregister the image and delete the snapshots.” When things aren’t labeled, knowing what is or is not in use, and why it was created, can become problematic. That knowledge also becomes subject to the fleeting memories or poor concentration of existing team members. Label everything.

5. Prune clones and snapshots often (cost savings and headache savings)

Pruning old snapshots and clones is not only good for cost savings; it is also good for reducing headaches. Older snapshots run the risk of reintroducing vulnerabilities that have been addressed or resolved in newer copies. As VP of Customer Experience at SIOS Technology Corp., I saw the consequences firsthand when we worked with a customer who restored from a snapshot. They ran into several problems as they restarted the application. After troubleshooting, we determined that the clone was running an older version of security software. The cached credentials and metadata stored in the user profile were no longer in sync with the actual application data stored on the externally mounted data drives.

6. Limit or restrict cloning of clones in the cloud

Lastly, not everything you do in the cloud needs to be cloned. Consider limiting the types of workloads that you will clone, and restrict the number of roles that can create clones in your environment. In the movie, when Doug’s clones spark their own series of duplications, an already overwhelmed Doug (Michael Keaton) is forced to exert extra energy to manage his many clones while trying to hide the mess he created from his wife.

Achieving clone availability in the cloud with better outcomes is not difficult. Clone carefully to avoid making more work and adding risk from a tool that was supposed to make your work easier and your environment safer.

– Cassius Rhue, Vice President, Customer Experience

Reproduced from SIOS
December 26, 2020

New Product Release: SIOS Protection Suite for Linux 9.5.1

SIOS is continually updating our products to meet our customers’ evolving needs for high availability for mission-critical applications. We are excited to announce the general availability of SIOS Protection Suite for Linux version 9.5.1! This release adds support for a wider range of platforms and enhancements to our command-line interface feature. Key updates include

Reproduced with permission from SIOS

December 22, 2020

Six Reasons Your Cloud Migration Has Stalled

Reproduced from SIOS

December 18, 2020
Calculating Application Availability In The Cloud

When deploying business-critical applications in the cloud, you want to make sure they are highly available. The good news is that if you plan properly, you can achieve 99.99% (4-nines) availability or more. However, calculating your true availability may not be as straightforward as it seems. When considering availability, you must consider the key components that make access to your application possible, which I’ll call the availability chain. The components of the availability chain are:
Your application is only as available as its weakest link, and because the availabilities of the links multiply, your expected downtime grows with each additional link you add to the chain. Let’s examine each of the links.

Compute Availability

The three major cloud service providers have some similarities. One thing in common across all three platforms is the service level agreement (SLA) they will commit to for compute. The SLA from all three public cloud providers for VMs, when you have two or more VMs configured across different availability zones, is 99.99%. Keep in mind that this SLA only guarantees the remote accessibility of one of the VMs at any given time; it makes no promises as to the availability of the services or applications running inside the VM. If you deploy a single VM within a single datacenter, this SLA varies from “90% of each hour” (AWS) to 99.5% (Azure and GCP) or 99.9% (Azure single VM when using Premium SSD).

True high availability starts at 99.99%, so the first step in ensuring your application is available is to make sure it is distributed across two or more VMs that span availability zones. With two VMs spread across two availability zones giving you 99.99% availability of at least one of those VMs, you could theorize that if you had three VMs spread across three availability zones, your availability would be even greater than 99.99%. Although the cloud providers’ SLAs will never guarantee beyond 99.99% availability regardless of the number of availability zones in use, if you use pure statistics you might come to the conclusion that your availability could jump to as high as 99.999999%, or 8-nines of availability: 26.30 milliseconds of downtime per month.

1 - (.0001 * .0001) = .99999999

99.999999% availability with three availability zones? Don’t go around quoting that number. Just keep in mind that if two availability zones can give you 99.99% availability, it stands to reason that three availability zones are going to give you something significantly more than 99.99% availability. Compute is just one link in the availability chain. We still have to address network, storage, and other dependent services, which all represent possible points of failure.

Network Availability

In order for your application to be available, every network hop between the client and the application, and all the resources that the application depends on, must be available and working within tolerable latency ranges. You need to understand the network links between database servers, application servers, web servers, and clients to know precisely where the network might fail. Remember, the more links in your availability chain, the lower your overall availability will be. Although network availability between VMs in the same vNet is covered under the standard compute SLA, there are other network services that you may be utilizing. Here are just a few examples of network services you could be utilizing which would impact overall application availability.

Express Route: 99.95%

Building on what we have learned so far, let’s take a look at the availability of an application that is deployed across two availability zones.

99.99% compute availability
99.99% load balancer availability
.9999 * .9999 = .9998
99.98% availability = ~9 minutes of downtime per month

Now that we have addressed compute and network availability, let’s move on to storage.

Storage Availability

Now here is where the story gets a little hairy. Have a look at the following storage SLAs:

https://azure.microsoft.com/en-us/support/legal/sla/storage/v1_5/
https://cloud.google.com/storage/sla
https://aws.amazon.com/compute/sla/

It seems pretty clear that Azure and Google are giving you a 99.9% SLA on block storage solutions. AWS doesn’t mention EBS specifically here.
They only talk about VMs, and they measure the availability of their single-instance VMs by the hour instead of by the month as the other cloud providers do. For the sake of discussion, let’s use the 99.9% availability guarantee that both Azure and GCP have published. Building upon our previous example, let’s add some storage to the equation.

99.99% compute availability
99.99% load balancer availability
99.9% managed disk
.9999 * .9999 * .999 = .9988
99.88% availability = ~53 minutes of downtime per month

53 minutes of downtime is a lot more than the 9 minutes of downtime we calculated in our previous example. What can we do to minimize the impact of the 99.9% storage availability? We have to build more redundancy into the storage! Fortunately, we usually include storage redundancy when planning for application availability. For instance, when we stand up web servers, each web server will typically store data on its locally attached disk. When deploying domain controllers, Microsoft Active Directory takes care of replicating AD information across all the domain controllers. In the case of something like SQL Server, we leverage things like Always On Availability Groups or SIOS DataKeeper to keep the data in sync across locally attached disks. The more copies of the data we have distributed across different availability zones, the more likely we will be able to survive a failure. For example, an application that stores its data across two different disks in different availability zones will benefit from the redundancy, and instead of 99.9% availability it is more likely to achieve 99.9999% availability of the storage.

1 - (.001 * .001) = .999999

If we throw that into the previous equation, the picture starts to look a little brighter.

.9999 * .9999 * .999999 = .9998
99.98% availability = ~9 minutes of downtime per month

By duplicating the data across multiple AZs, and therefore multiple disks, we have effectively mitigated the downtime associated with cloud storage.
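The arithmetic in the compute, network, and storage examples can be reproduced with a few lines of Python. This is only a sketch under stated assumptions, not a formula published by any cloud provider: the function names are hypothetical, failures are assumed to be independent of one another, and a month is taken as an average of about 43,830 minutes.

```python
from math import prod

# Assumption: an "average" month, i.e. 365.25 days / 12, expressed in minutes.
MINUTES_PER_MONTH = 365.25 * 24 * 60 / 12


def chain_availability(*links):
    """Availability of a chain where every link must be up (values multiply)."""
    return prod(links)


def redundant_availability(single, copies):
    """Availability of N independent copies when one surviving copy is enough."""
    return 1 - (1 - single) ** copies


def downtime_minutes_per_month(availability):
    """Expected downtime per average month for a given availability."""
    return (1 - availability) * MINUTES_PER_MONTH


# Compute + load balancer, as in the network example.
compute_and_lb = chain_availability(0.9999, 0.9999)

# Adding a single 99.9% managed disk to the chain.
with_single_disk = chain_availability(0.9999, 0.9999, 0.999)

# Replacing the single disk with two independent copies in different AZs.
mirrored_storage = redundant_availability(0.999, 2)
with_mirrored_disk = chain_availability(0.9999, 0.9999, mirrored_storage)

for label, value in [
    ("compute + load balancer", compute_and_lb),
    ("plus single disk", with_single_disk),
    ("plus mirrored disk", with_mirrored_disk),
]:
    minutes = downtime_minutes_per_month(value)
    print(f"{label}: {value:.4%} available, {minutes:.1f} min/month down")
```

Keeping the serial case (multiply) and the redundant case (one minus the product of the failure probabilities) in separate functions mirrors the two kinds of calculation used in the text, so additional links such as a dependent service can be added to the chain as extra arguments.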
Application And Dependent Services Availability

You’ve done all you can do to ensure compute, network, and storage availability. But what about the application itself? Some applications can scale out and provide redundancy by load balancing between multiple instances of the same application. Think of your typical web server farm, where you may load balance web requests between five servers. If you lose one server, the load balancer simply removes it from its rotation until it is once again responsive. Other applications require a little more care and monitoring. Take SQL Server, for instance. Typically, Always On Availability Groups or Failover Cluster Instances are used to monitor database availability and take recovery actions should a database become unresponsive due to application or system-level failures. While there is no published SLA for SQL Server availability solutions, it is commonly accepted that when configured properly for high availability, SQL Server can provide 99.99% availability. You may also rely on other cloud-based services, such as hosted Active Directory, hosted DNS, or microservices; these, and even the availability of the cloud portal itself, should all be factored into your overall availability equation.

Summary

Application availability depends on every one of the moving parts. Skimping in just one area can sharply reduce the overall availability of your application. Take your time and investigate all the links in your availability chain for weakness, including compute, network, storage, application, and dependent services. In general, the numbers presented here are hopefully worst-case scenarios, and your actual availability should exceed the published SLAs. Do your homework and be wary of any service that cannot guarantee 99.99% availability, the typical threshold of what is considered highly available.

Human error and security were not addressed in this article. You can make your application as highly available as possible. However, if you have not taken steps to secure your application against external threats and stupid human mistakes, then all bets are off when it comes to availability.