Fifty Ways to Improve Your High Availability
I love the start of another year. Well, most of it. I love the optimism, the mystery, the potential, and the hope that seems to find its way into life as the calendar flips to another year. But there are some downsides to the turn of the calendar. Every year, the New Year brings a wave of “____ ways to do ____” lists. My inbox is always filled with “Twenty ways to lose weight,” “Ten ways to build your portfolio,” “Three tips for managing stress,” and “Nineteen ways to use your new iPhone.” The onslaught of lists for self-improvement, culture change, stress management, and weight loss reaches nearly every area of life and work, including “Thirteen ways to improve your home office.” But what about high availability? You only have so much time every week, so how do you make your HA solution more efficient and robust than ever? Where is your list? Here it is: fifty ways to make your high availability architecture and solution better:
- Get more information from the cluster faster
- Set up alerts for key monitoring metrics
- Add analytics. Multiply your knowledge
- Establish a succinct architecture from an authoritative perspective
- Connect more resources. Link up with similar partners and other HA professionals
- Hire a consultant who specializes in high availability
- 100x existing coverage. Expand what you protect
- Centralize your log and management platforms
- Remove busywork
- Remove hacks and workarounds
- Create solid, repeatable solution architectures
- Utilize your platforms: public, private, hybrid, or multi-cloud
- Discover your gaps
- Search for Single Points of Failure (SPOFs)
- Refuse to implement incomplete solutions
- Crowdsource ideas and enhancements
- Go commercial and purpose-built
- Establish a clear strategy for each life cycle phase
- Clarify your decision-making process
- Document your processes
- Document your operational playbook
- Document your architecture
- Plan staffing rotation
- Plan maintenance
- Perform regular maintenance (patches, updates, security fixes)
- Define and refine on-boarding strategies
- Clarify responsibility
- Improve your lines of communication
- Overcommunicate with stakeholders
- Implement crisis resolution before a crisis
- Upgrade your infrastructure
- Upsize your VM: CPU, memory, and IOPS
- Add redundancy at the zone or region level
- Add data replication and disaster recovery
- Go OS- and cloud-agnostic
- Get training for the team (cloud, OS, HA solution, etc.)
- Keep training the team
- Explore chaos testing
- Imitate best-in-class architectures
- Be creative. Innovation expands what you can protect and automate.
- Increase your automation
- Tune your systems
- Listen more
- Implement strict change management
- Deploy QA clusters. Test everything before updating/upgrading production
- Conduct root cause analysis exercises on any failures
- Address RCA and Closed Loop Corrective Action reports
- Learn your lesson the first time. Reuse key learnings.
- Declutter. Don’t run unnecessary services or applications on production clusters
- Be persistent. Keep working at it.
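A couple of the items above, searching for Single Points of Failure and adding redundancy at the zone level, lend themselves to a quick script. What follows is only a minimal sketch in Python, not a feature of any particular HA product: the inventory format, the service and node names, and the `find_spofs` helper are all hypothetical, made up for illustration. It flags any service that runs on a single node, or whose nodes all sit in one zone.

```python
from collections import defaultdict

# Hypothetical inventory: each entry is (service, node, zone).
# In practice this might come from a CMDB export or a cloud API listing.
INVENTORY = [
    ("database", "db-1", "zone-a"),
    ("database", "db-2", "zone-b"),
    ("app", "app-1", "zone-a"),
    ("app", "app-2", "zone-a"),
    ("license-server", "lic-1", "zone-b"),
]


def find_spofs(inventory):
    """Return {service: reason} for services lacking node or zone redundancy."""
    nodes = defaultdict(set)
    zones = defaultdict(set)
    for service, node, zone in inventory:
        nodes[service].add(node)
        zones[service].add(zone)

    findings = {}
    for service in nodes:
        if len(nodes[service]) < 2:
            # Only one node hosts this service.
            findings[service] = "single node"
        elif len(zones[service]) < 2:
            # Several nodes, but a zone outage would take them all down.
            findings[service] = "single zone"
    return findings


if __name__ == "__main__":
    for service, reason in sorted(find_spofs(INVENTORY).items()):
        print(f"{service}: {reason}")
```

A real environment has more places for a SPOF to hide (storage, network paths, DNS, licensing, people), so treat a check like this as a starting point for the conversation rather than a complete audit.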
So, what are the ideas and ways that you have learned to increase and improve your enterprise availability? Let us know!
-Cassius Rhue, VP, Customer Experience
Reproduced from SIOS