March 31, 2021 |
Seven Skills That Your Team Needs if You are Going with Open Source High Availability

In the realm of High Availability (HA), there are certain important skills your team needs if you decide to go the open source route. Open source, by definition, denotes software that is freely available to use. Today, there are numerous commercial implementations of high availability clusters for many operating systems, provided by vendors like Microsoft and SIOS Technology Corp. These commercial solutions provide resource monitoring, dependency management, failover and cluster policies, and some form of management, all prepackaged and priced. As an alternative to commercial implementations, several open source options also give companies the opportunity to provide high availability for their enterprise. As companies continue to look for optimizations, cost savings, and potentially tighter control, a growing number of companies and customers are also considering moving to open source availability solutions. Here are seven skills that your team may need for a move to open source HA:

1. Coding skills
In many cases, the lack of prepackaged and bundled support for enterprise applications means that your team will need to be able to develop solutions to protect components, fix issues with bundled components, or write application connectors to ensure application awareness is properly handled. Lots of people can write scripts, but your team will need to know how to create and adhere to sound development practices and standards. The basics of this include things such as:
2. Knowledge of the technology environment
Many enterprise applications require integration with multiple systems in order to provide high availability that meets the Service Level Agreements (SLAs) and Service Level Objectives (SLOs). Your team will require deep application awareness and knowledge of the technology environment to build protection and solutions for this integration with multiple enterprise systems. You need people who know the ins and outs of the critical applications, the technology environment for those applications, networking, hardware, and hypervisors, and who understand the environmental and application dependencies. You’ll also need team members who understand the architecture, features, and limitations of the set of open source HA technologies that you intend to use. Consider how many of these areas your team knows and understands:
3. Business process knowledge
You need someone who understands your business requirements and business processes. Your team needs professionals who understand the enterprise’s business and the processes that drive it. Your team will need to know how much budget is available for developing the solution, how much risk the business is willing to take, and how to gather additional requirements that may be unspoken or unspecified. The team will also need to know how, or hire someone who knows how, to convert those business requirements into software requirements, and to manage a process that brings a minimum viable high availability solution to fruition: one that meets the needs of the business, keeps pace with the business, and fits within its processes.

4. Experience with OS, Applications and Infrastructure
If you are looking to go all open, your team will need experience with operating systems, applications, and infrastructure. You’ll need to understand the various OS release cycles, including kernel versions for Linux and updates and hotfixes for Windows. You have applications in house that need to be supported, but you’ll also need to be diligent about understanding each application’s update cycle, its dependencies, and where the application and OS support matrices intersect. If your environment is homogeneous, great. Otherwise, your team will need to know the differences between RHEL, RHEL derivatives, and SUSE. If you run both Linux and Windows, you’ll need expertise in both. You’ll also need to understand the difference that the infrastructure makes to the application and OS combination. AWS and Azure present high availability considerations that differ from those of GCP, on-premises environments, and other hypervisors.

5. Change management capabilities
Imagine that you have the development team to create the solution, with technical and business knowledge along with a firm grasp of the OS, infrastructure, and applications.
But getting the scripts together is just the beginning. Your team will also need change management capabilities. How will your team keep track of code changes and versions, packages, and package locations? How will your team manage the release of updates and changes? Your team will need to be versed in source control such as Git, project management tools such as Jira, and release train practices. You’ll need a team that understands how to update code and deliver patches and fixes, all while avoiding unwanted impact.

6. Data analytics and troubleshooting experience
When you enter the space of delivering your own HA solution, your team will need analytics and troubleshooting experience. You’ll need people who understand the intersection of application code, system messages, and application error logs and trace files. When a system crash occurs, you’ll have to dig into the logs to troubleshoot and find the root cause, analyze the data to make recommendations, and be prepared to roll out changes (see #5 above). Don’t forget that your team will also need to understand what the data in these logs and trace files can tell you about the health of your environment even when there isn’t an error, failure, or system crash.

7. Connections (Dev, QA, Partners, Community)
Let’s be honest: your business isn’t about delivering high availability, but if you decide to dive into the realm of open source HA, you are going to need more help than just the brilliance on your team. Key to getting that additional help will be understanding where to start and then making the right connections to community developers, testing experts, HA and application partners, and the open source community. Open forums have been really helpful, but you’ll need to double-check whether their response times are compliant with your SLAs and SLOs.
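The log-analysis skill in #6 is easier to picture with a small example. The following Python sketch assumes a hypothetical plain-text log format (timestamp, level, component, message); real application logs, system messages, and trace files vary widely, and the `summarize` function and its `warn_threshold` parameter are illustrative names, not part of any HA product. It counts warnings and errors per component and flags components that are accumulating warnings before any error has been logged, the kind of health signal described above.

```python
import re
from collections import Counter

# Hypothetical log line format (an assumption, not any particular product's):
#   2021-03-31T03:00:05 WARN  db: replication lag 4.2s
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<component>[\w.-]+):\s*(?P<msg>.*)$"
)


def summarize(lines, warn_threshold=3):
    """Count WARN and ERROR lines per component, and list components that
    have at least `warn_threshold` warnings but no errors yet."""
    counts = {"WARN": Counter(), "ERROR": Counter()}
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m["level"] not in counts:
            continue  # skip INFO/DEBUG lines and lines in an unknown format
        counts[m["level"]][m["component"]] += 1
    early_warnings = sorted(
        comp
        for comp, n in counts["WARN"].items()
        if n >= warn_threshold and counts["ERROR"][comp] == 0
    )
    return counts, early_warnings


if __name__ == "__main__":
    sample = [
        "2021-03-31T03:00:01 INFO  app: request served",
        "2021-03-31T03:00:05 WARN  db: replication lag 4.2s",
        "2021-03-31T03:00:09 WARN  db: replication lag 6.8s",
        "2021-03-31T03:00:14 WARN  db: replication lag 9.1s",
        "2021-03-31T03:00:20 ERROR app: connection refused",
    ]
    counts, early = summarize(sample)
    print(dict(counts["ERROR"]))  # {'app': 1}
    print(early)  # ['db']
```

A fixed count threshold is the simplest possible signal; in practice a team would also look at trends over time and correlate messages across hosts and trace files.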
Using open source solutions is an option that many companies choose to pursue because of a perception of flexibility, lower cost, and less risk. But buyer beware: there may be hidden costs in the form of new skills and management, and hidden risks in the open source programs needed for any “roll your own” HA solution.
– Cassius Rhue, VP, Customer Experience
Reproduced from SIOS |
March 25, 2021 |
Cloud Migration Best Practices for High Availability

In 2020, we saw more enterprises migrating more of their mission-critical applications, ERPs, and databases to the cloud. However, not all of these migrations have been smooth. I have personally witnessed cloud migration projects dramatically slowed and even stopped due to a lack of planning for application availability, the complexity of retrofitting ‘DIY High Availability’, misunderstandings about what a ‘lift and shift’ entails, and unexpected costs. There are a number of best practices, cloud checklists, and other ways for organizations to prepare for the cloud. For those who have either hit pause on their 2020 cloud migration or plan to forge ahead in 2021, the following best practices should be factored into every migration strategy for high availability clustering.

Cloud Migration Best Practices

Gather the requirements
Many organizations moving to the cloud think that the cloud is simply an on-premises architecture moved to the cloud. This misunderstanding often leads to stalls and delays when on-premises assumptions about networking, storage, disk speeds, and system sizes collide with the cloud reality. A smoother transition to the cloud begins by gathering the real requirements for infrastructure, governance and compliance, security, sizing, and related controls and resources.

Design and Document
In the design phase, the architecture of the on-premises environment is mapped to the cloud environment that has been chosen for maximum availability, and it is thoroughly documented. In this phase, the architecture takes shape and you identify the strategy for IPs, load balancers, IOPS, and data availability. Teams need to look at how the availability native to the cloud needs to be augmented with a robust application and infrastructure availability solution capable of automating the complexities of the cloud.
At SIOS, our experts in AWS and Azure clustering and availability work with customers to swap on-premises NFS for AWS EFS, Azure NetApp Files (ANF), or a standalone NFS cluster tier. A key part of a successful implementation in this phase is documenting everything. Documentation is an often-neglected but essential element of migration success.

Plan for High Availability
Achieving high availability in the cloud requires understanding the requirements, creating the design, and documenting a plan that lays out a strategy for meeting those requirements. A basic plan should include staffing, staff training, deploying a QA system, testing, pre-production steps, deployment, post-deployment validation, and ongoing iterations. The best outcomes for cloud migration arise from a deliberate, planned process, not an ad hoc, break-fix approach.

Staff
How well is your team staffed for the cloud migration? Traditional help desk, client/server, or general IT teams may not be enough for the cloud migration. If your team is new to the cloud, it may be time to consider adding more resources or professional services. Migrating to the cloud can be taxing, tedious, and difficult without the proper insight, information, or training. Does your staff need training related to the cloud environment? And while you are looking into training and professional services to assist your IT team, check with your vendor for training related to the availability solution. Many vendors provide flexible training for their HA solution, and cloud training can be obtained from the cloud vendors or popular sites such as Udemy.

Deploy QA
The QA deployment phase is the phase in which the team executes the plans for deploying the actual systems into the cloud. Successful deployment teams validate their plans and strategy, understand the data migration process, uncover any missing dependencies, and prepare for the next step in the process, especially testing.
When this step is skipped or skimped on, once-promising migrations often stall or fail. In the QA system deployment phase, your team does the heavy lifting of the initial migration and configuration of the applications, databases, and critical data in the cloud.

Test Your High Availability
Testing in your QA environment is a critical step. These tests are not a waste of time; they are a time saver. Deploying environments in the cloud is often easier than deploying on-premises. Your QA environment can be scripted with tools like Ansible, deployed quickly from cloud marketplace templates or a cloned image, or built from CloudFormation templates. Once the environment is deployed, disaster scenarios can be ironed out and optimized before a disaster, not during one. Test scenarios can be used to identify overprovisioning, underprovisioning, or bottlenecks in networking or disk speeds. A full test scenario can also be used as part of an onboarding strategy for new staff. Testing should also be performed on snapshots and backups.

Deploy Production
When the testing phase is complete and your team has validated the test results, the next phase is to move from QA to pre-production, and from pre-production to go-live. This is the last phase of heavy lifting, involving final user acceptance testing, a final cutover and update of the production data, and then the cutover of users.

Review, Revise, and Repeat
A successful migration does not end once you reach the go-live phase; it continues through the lifecycle phases. In the post-go-live phase of the cloud migration strategy, your team continues to review, revise, and repeat the steps from ‘Gather’ through ‘Deploy Production’.
In fact, your team should repeat this process again and again, based on requirements specific to releases, application updates, security updates, related system maintenance, operating system versions, and disaster recovery planning, as well as your high availability vendor’s own best practices. The cloud platform is always evolving, adding new features, functionality, and updates that can enhance your existing HA solution and architecture. Reviewing, revising, and repeating the process will be a necessary step in successful onboarding. In 2021, we’ll see more enterprises migrating more mission-critical applications, ERPs, and databases to the cloud. A major factor in their success will be using cloud migration best practices to avoid delays and failures throughout the process. Understanding your business requirements and needs, documenting the design and plan, deploying in a QA environment with purpose-built clustering solutions, and executing extensive testing before go-live will be essential. Contact SIOS Technology to understand how the SIOS Protection Suite can be included in your thoughtful cloud migration best practices.
– Cassius Rhue, VP, Customer Experience
Reproduced from SIOS
|
March 21, 2021 |
The New Normal Will Still Include High Availability |
March 16, 2021 |
How To Build A Highly Available Server Solution?

A key component of any high availability solution is figuring out how to redirect client traffic. Almost every user-based application needs to connect to a server. Redirecting the client traffic allows users to connect without having to know where the application or the database actually resides. Most solutions recommend network-based IP redirection or network-based DNS redirection. This works. However, in our experience, the best solution for a highly available server is a virtual IP address that can be switched from one server to another. The application listens for connections on the virtual IP address, which may be hosted on one server today and switched to another server on another day. To take it one step further, you can automate the failover, so that the system makes the decision and switches the application when a failure is detected. Bear in mind that this step is key to building a highly available solution.

Benefits of Buy vs. Build High Availability Solution
This can be implemented using scripts and logic that check the status of processes and virtual IP addresses from one server to another. But one of the challenges in a buy vs. build high availability decision is how much time we really have to spend on building. This includes time for script coding and API development, such as CloudWatch API calls or Lambda functions. Let’s not forget testing and maintenance. When I was younger, I was eager to write that code. But after working for large Fortune 100 companies, and getting yelled at by a high-level manager when one of my scripts didn’t work at 3 a.m., I feel differently. The problem was compounded when I discovered a bug in code I had written a year earlier. My managers wanted the highly available solution to work 100% of the time. If it didn’t work, it was time to call someone up and yell at them.
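The “build” approach described above (check process status, move the virtual IP, restart the application) can be sketched in a few dozen lines of Python. This is a simulation under stated assumptions, not SIOS code or any real cluster API: the `Node` and `FailoverMonitor` classes are hypothetical, `healthy` stands in for a real process or application check, adding or removing an entry in `vips` stands in for actually configuring the address on a network interface, and `10.0.0.100` is an example address.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node. `healthy` stands in for a real process check."""
    name: str
    healthy: bool = True
    app_running: bool = False
    vips: set = field(default_factory=set)


class FailoverMonitor:
    """Sketch of automated failover: after `max_failures` consecutive failed
    health checks on the active node, move the virtual IP (VIP) and the
    application to the standby node."""

    def __init__(self, vip: str, active: Node, standby: Node, max_failures: int = 3):
        self.vip = vip
        self.active = active
        self.standby = standby
        self.max_failures = max_failures
        self.failures = 0
        self._bring_up(active)  # start in service on the active node

    def _bring_up(self, node: Node) -> None:
        # Assign the VIP to the node, then start the application listening on it.
        node.vips.add(self.vip)
        node.app_running = True

    def _take_down(self, node: Node) -> None:
        # Stop the application, then release the VIP.
        node.app_running = False
        node.vips.discard(self.vip)

    def check_once(self) -> str:
        """Run one health check and return 'ok', 'degraded', or 'failed_over'."""
        if self.active.healthy and self.active.app_running:
            self.failures = 0
            return "ok"
        self.failures += 1
        if self.failures < self.max_failures:
            return "degraded"
        # Release the VIP on the failed node before claiming it on the standby,
        # so the address is never active on both nodes at once.
        self._take_down(self.active)
        self._bring_up(self.standby)
        self.active, self.standby = self.standby, self.active
        self.failures = 0
        return "failed_over"


if __name__ == "__main__":
    a, b = Node("node-a"), Node("node-b")
    monitor = FailoverMonitor("10.0.0.100", a, b, max_failures=3)
    a.healthy = False  # simulate the application failing on node-a
    print([monitor.check_once() for _ in range(3)])  # ['degraded', 'degraded', 'failed_over']
    print(monitor.active.name, "10.0.0.100" in b.vips)  # node-b True
```

The consecutive-failure threshold avoids failing over on a single missed check. A production cluster also needs fencing (for example, STONITH) for the case where the failed node cannot be reached to release the address, plus the testing and maintenance described above.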
SIOS Automates High Availability
Isn’t it cheaper in the long run to buy the solution and spend a little time tweaking it to fit our setting? This is where SIOS high availability (HA) solutions come in, whatever the application or database. SIOS has the code to switch the stack of processes from one server to another. This gives users and managers the peace of mind that comes from automated failover orchestration and high availability. There are two things that I love about the SIOS HA umbrella. The first is the virtual IP code, which adds the IP address to the server and restarts the application to listen for connections. The second is the application-agnostic API set that SIOS provides, which allows anyone to protect any application through plugins. Contact SIOS today to learn more about high availability solutions specific to your environment.
– Edmond Melkomian, PMP, MCSD, consultant, SIOS Technology, Inc.
Reproduced from SIOS |
March 8, 2021 |
Stages of IT Disaster Recovery Grief

Disaster recovery grief can hit you out of nowhere if you haven’t implemented the right enterprise availability architecture. Meet our friend Dave in IT, who will walk us through the five stages of disaster grief.

Stage 1: Denial
Dave in IT: “Uh oh. What’s that alert? It’s just a little application crash, right? No big deal. I’ll have things up and running in no time.”
In the land of enterprise availability, there is no such thing as a little application crash or no big deal. Companies have SLAs with real money on the line. Your selective reality is probably not the same as the perspective of your customers and stakeholders.

Stage 2: Anger
Dave in IT: “Are you kidding me? Of all the… [censored]... times, today the application won’t start. Ughh. I hate this [censored]... [censored]... application. Wait, what’s this new alert? Seriously, now the datacenter is down!”
It gets messy really, really fast in fast-paced, high-stakes environments. When alerts and failures go unchecked, problems can mount quickly, along with pressure, frustration, and anger.

Stage 3: Bargaining
Dave in IT: “Hey Ard in Applications, this is Dave in IT. Do you guys have any backups for the App1 environment? . . . Ard, are you sure? Could you just check again? I know you’ve checked twice, but can you check one more time? I’ll buy drinks on Taco Tuesday!”
Dave in IT: “Hey Donna DBA, this is Dave in IT. Ard in Applications said you might help me out. Did you by chance set up any database replication for that finance database or the inventory management system? . . . Are you sure? Umh, do you remember if we have any way to recover from a umh . . . datacenter crash?”
When my daughter gets in trouble, bargaining is her first go-to. Okay, second. The first is to disappear, but you’re too smart to just walk away from the flames.
But Dave in IT isn’t the only one to realize that bargaining and begging are a poor substitute for a well-defined strategy for high availability and disaster recovery. Skip the bargaining and begging about your disaster because “80% of the people don’t care, and 20% are glad it’s you” (paraphrased from Les Brown).

Stage 4: Sadness
Dave in IT: “This is just great. The application server crashed, the datacenter is down, and backups, if I can find them and if I can load them, will take hours to restore. There is no way I’m getting out of this… where did I put that updated resume?”
Of course you have backups, and you’ve validated them. But going back to those backups has an impact on your recovery time objective (RTO) and recovery point objective (RPO). Are you able to absorb this time? That is, of course, after your data center recovers.

Stage 5: Acceptance
Dave in IT: “It’s been two hours. I never knew we had this many executive stakeholders before. No way I’m making it to my two-year anniversary after this. Well, I guess I’ll clean out my office tomorrow. No way I’m making it through this!”
Failures happen. Datacenters go down. Applications fail. There is no denying the possibility of losing a data center, having a server fail, or suffering an application crash. This type of acceptance is normal and is part of improving your availability. Having to accept that you may lose your job, or worse, because you failed to implement an availability strategy is what the experts at SIOS Technology Corp. want to make sure you avoid. Don’t be like Dave in IT. Avoid the stages of disaster grief, and the hours of disaster recovery and downtime, by architecting and implementing an enterprise availability architecture that includes the best of hybrid, on-premises, or cloud infrastructure, coupled with the best solution for monitoring, recovery, and system failover automation.
– Cassius Rhue, VP Customer Experience
Reproduced from SIOS |