Date: August 30, 2024
Tags: High Availability
6 High Availability Lessons Learned from Cybersecurity Nightmares
Recently, a security provider published best-practice recommendations for companies in light of rising security threats. While security threats deserve attention from every business, this advice isn’t limited to cybersecurity; it is equally relevant to HA partners and customers with critical applications and services to protect.
Six Takeaways for HA From Recent Articles on Cybersecurity
- Take IMMEDIATE steps to ramp up HA
Waiting for a downtime incident and then focusing on fast recovery is a bad strategy. Preparing for and preventing downtime is a better approach. Start by identifying critical Tier 1 and Tier 2 applications, databases, and services. Tier 1 should include all business-critical solutions: applications that must be available 24/7 and will cause serious business consequences if they go offline. Tier 2 applications should also be running as often as possible, but they are less critical, and your business can tolerate outages of up to a few hours without significant impact.
- Prepare a comprehensive HA protection plan
Develop a plan for protecting the key applications, databases, and services. Be sure this plan includes architecture and design documentation as well as personnel responsibilities for responding to downtime. Always prepare a process for deploying clusters in a QA or sandbox environment, and document the activities and details in a runbook. These sandbox and QA systems can also be used for testing, training, and validating upgrades, hotfixes, and maintenance.
- Recognize and address high availability risks
All organizations must recognize that no company is safe from the disruption of downtime, regardless of size or location. Small, medium, and large businesses are all susceptible to disasters, whether natural or man-made. Large organizations will experience their fair share of user errors, data center failures resulting from local construction, failed infrastructure, and outages from networking components. Small and medium entities, especially those with on-prem solutions or smaller IT teams, need to add HA protection as well. While larger companies may lose more money in an outage in absolute terms, small to medium businesses are likely to lose a comparable amount relative to their size. It is important to note that moving to the cloud is not enough to eliminate these risks.
- Business executives need to lead in high availability strategy
Preparing for and guarding against downtime is not just an IT team issue. Business executives need to be on board with protecting the business from known risks, exposure, and downtime threats. This means key executives and stakeholders need to proactively ask about HA coverage, plans, and staffing. Business executives should also be ready to invest in preparing for the unexpected by ensuring full coverage of Tier 1 applications, databases, services, and data. They should also proactively expand coverage to Tier 2 and beyond.
- The importance of ongoing communication for HA
Business leadership and admins from all areas (network, storage, compute, cloud, database, and applications) should convene frequently to discuss new and existing HA threats, new and existing challenges, and ongoing requirements. Keeping internal team discussions going to understand requirements and business continuity plans is a must. In fact, HA considerations need to become an integral part of all relevant internal planning and communications, from the C-level to the entry level. However, this communication cannot stop with in-house stakeholders. Business leaders and HA stakeholders must also review corporate posture, HA requirements, and critical findings with the HA vendor and its R&D and support teams.
- Don’t wait for a disaster to review your HA solution
Companies need to review their plans, and their ability to execute those plans, on a frequent basis. This review needs to go deeper than reading the runbook, documentation, or cluster design. Pair the runbook review with hands-on exercises to validate and update it. Test the process for restoring systems after downtime, including the restoration of client access and business operations.
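As a starting point for the first takeaway, the tiering step can be sketched as a small inventory script. This is a minimal illustration, not a prescribed method: the `Service` type, the service names, and both cutoff values are hypothetical. The article says only that Tier 1 must be available 24/7 and that Tier 2 can tolerate outages of "up to a few hours"; the four-hour figure below is one assumed reading of that phrase.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed cutoffs (not from the article): adjust to your own business requirements.
TIER1_MAX_OUTAGE_HOURS = 0.0  # hypothetical: Tier 1 tolerates no outage at all
TIER2_MAX_OUTAGE_HOURS = 4.0  # hypothetical reading of "up to a few hours"


@dataclass(frozen=True)
class Service:
    """An application, database, or service to be placed in an HA tier."""
    name: str
    tolerable_outage_hours: float  # longest outage the business can absorb


def classify_tier(service: Service) -> int:
    """Return 1, 2, or 3 based on how long an outage the business can tolerate."""
    if service.tolerable_outage_hours < 0:
        raise ValueError("tolerable_outage_hours must not be negative")
    if service.tolerable_outage_hours <= TIER1_MAX_OUTAGE_HOURS:
        return 1
    if service.tolerable_outage_hours <= TIER2_MAX_OUTAGE_HOURS:
        return 2
    return 3  # everything beyond Tier 2


def build_inventory(services: list[Service]) -> dict[int, list[str]]:
    """Group service names by tier so coverage gaps are easy to review."""
    inventory: dict[int, list[str]] = {1: [], 2: [], 3: []}
    for service in services:
        inventory[classify_tier(service)].append(service.name)
    return inventory


if __name__ == "__main__":
    # Hypothetical services, for illustration only.
    services = [
        Service("order-database", 0.0),
        Service("reporting-portal", 3.0),
        Service("internal-wiki", 24.0),
    ]
    for tier, names in build_inventory(services).items():
        print(f"Tier {tier}: {', '.join(names) or '(none)'}")
```

An inventory like this could be reviewed with business executives when deciding which tiers receive HA coverage first, and revisited whenever the runbook is updated.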
Act Now to Protect Your Systems from Downtime
As with the security recommendations, companies should take immediate action to protect their systems, solutions, applications, and data from downtime and disasters. Don’t wait for a disaster to reveal gaps in your HA strategy. Contact SIOS today to enhance your HA strategy and safeguard your business against unexpected disruptions.
Reproduced with permission from SIOS