March 11, 2023
Cloud Repatriation and HA

There is a small but growing media buzz about a phenomenon called “cloud repatriation”. In simple terms, cloud repatriation means taking your workload out of the public cloud and bringing it back to your own data center. This move could boost demand for on-premises equipment, such as servers, storage, and networking gear. It could also increase the need for solutions that make it easy to manage both on-premises and cloud-based resources. For companies running critical workloads in the cloud, repatriation could have a significant impact on the way they deliver high availability protection. It’s worth noting that the impact of cloud repatriation on the high availability market depends on several factors, such as why organizations are choosing to return to on-premises data centers, as well as other industry trends and competition. So why might organizations opt to leave the cloud?

Common Reasons for Cloud Repatriation

Cost: Running workloads in the cloud can be expensive, and costs can be unpredictable, especially if an organization’s usage patterns and requirements change over time. Repatriating workloads on-premises can help organizations reduce costs, particularly if they have unused capacity or can leverage existing infrastructure. It can also make IT budgets more predictable.

Data sovereignty: Some organizations may be subject to regulations that dictate what country their data is stored in, who has access to it, and how it is protected. Repatriating workloads can give organizations more control over their data and help them comply with data sovereignty laws and regulations.

Security: Organizations may have security concerns about running workloads in the cloud, particularly if they handle sensitive data or are subject to strict regulatory requirements. While clouds offer a variety of security measures, misconfiguration is common and can result in security issues. By eliminating the need for cloud-specific knowledge, repatriating workloads can give organizations more control over their security posture.

Latency: Cloud providers may be located far from the organization’s users, which can result in higher latency and slower response times. Repatriating workloads on-premises can help organizations reduce latency and improve performance for their users.

Control: While moving to the cloud saves companies the cost of IT infrastructure management, these savings come at the cost of control. Cloud providers manage and maintain their IT environments according to their own schedules. Companies that repatriate their data centers regain complete control over their infrastructure, upgrades, updates, and maintenance.

Lack of a specific cloud provider service or feature: Organizations may find that a particular service or feature is not available in the public cloud and decide to repatriate the workload on-premises as a result.

There could be other factors at play, and these reasons may differ based on an organization’s industry and unique needs.

High Availability in the Context of Public Cloud Repatriation

For years, the public cloud has been popular as businesses flock to cloud-based solutions for their computing needs. But according to a recent InfoWorld [link to article] article, we might see a shift in 2023 as companies start to bring their data and workloads back in-house or to private clouds. One major reason for this move is the desire for greater availability and control over infrastructure. High availability (HA) is a critical aspect of modern IT infrastructure, ensuring that applications and services remain accessible and operational even in the face of hardware failures, software bugs, or other unforeseen events. In a public cloud environment, high availability is typically achieved through a combination of redundant infrastructure and automatic failover mechanisms, such as load balancing and auto-scaling.
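To make that failover idea concrete, here is a minimal sketch in Python of the kind of health probe that sits behind such mechanisms. Everything in it is hypothetical, including the endpoint URLs and thresholds; a production monitor would repoint DNS or a load balancer rather than print a message, and would alert an operator as well.

```python
import time
import urllib.request
import urllib.error

# Hypothetical endpoints: a primary on-premises service and a cloud standby.
PRIMARY = "https://app.on-prem.example.com/health"
STANDBY = "https://app.cloud.example.com/health"

def healthy(url, timeout=5):
    """Return True if the endpoint answers its health check with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def monitor(interval=10, failures_before_failover=3):
    """Probe the primary; after repeated failures, redirect traffic to the standby."""
    consecutive_failures = 0
    while True:
        if healthy(PRIMARY):
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= failures_before_failover:
                # A real deployment would repoint DNS or a load balancer here.
                print("Primary unhealthy; failing over to", STANDBY)
                return
        time.sleep(interval)
```

Requiring several consecutive failures before acting is what keeps a transient network blip from triggering an unnecessary failover.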
However, some businesses may find that the level of control they have over their cloud infrastructure is limited, and they may have concerns about data security, compliance, and vendor lock-in. These concerns can lead to a desire to bring workloads and data back on-premises or to private clouds.

How a Hybrid Cloud Model Can Solve Problems

One potential solution to these concerns is to adopt a hybrid cloud approach, where businesses leverage the best of both worlds by combining the scalability and flexibility of the public cloud with the control and security of on-premises or private cloud infrastructure. Hybrid cloud architectures can be designed to provide high availability by replicating data and services across multiple locations, both on-premises and in the cloud.

Implementing a hybrid cloud architecture requires careful planning and design, with a focus on ensuring that workloads and data are distributed in a way that maximizes availability while minimizing latency and other performance issues. Key considerations include selecting the appropriate cloud providers and on-premises infrastructure, ensuring that data is replicated and synchronized effectively, and designing failover mechanisms that can handle both planned and unplanned outages.

Another important consideration is the need for effective monitoring and management of the hybrid cloud environment. This includes implementing automated monitoring tools to detect and respond to outages, ensuring that backups are regularly performed and tested, and establishing clear processes and procedures for handling incidents and disasters.

SIOS High Availability Solutions

So, while public cloud adoption has been on the rise for several years, concerns about control, security, and availability are leading some businesses to consider repatriating workloads and data to on-premises or private cloud environments. A hybrid cloud approach that combines the scalability and flexibility of the public cloud with the control and security of on-premises infrastructure can be an effective way to address these concerns while maintaining high levels of availability. In short, nailing a hybrid cloud setup takes serious prep work and know-how. Luckily, SIOS High Availability Solutions has you covered. We invite you to learn more about our tools and services so you can confidently navigate your hybrid cloud journey.

Reproduced with permission from SIOS
March 7, 2023
Video: High Availability for State, Local Government, and Education (SLED)

In this video, Dave Bermingham, SIOS Director of Customer Success, discusses the company’s high availability solutions for state, local government, and education (SLED) organizations. Dave highlights the importance of high availability for SLED organizations, specifically mentioning communication and collaboration tools used by emergency services, financial management systems, student information systems, and learning management systems, all of which need to be constantly accessible. He outlines the key features a high availability solution should have: it should be cost-effective, reliable, and scalable; provide redundancy; maintain high performance levels; detect failures and perform recovery actions; and integrate with existing systems and infrastructure. Bermingham gives two examples of SIOS’s SANless clustering solution in action. The first is how SIOS provided high availability at both the application and data center level to eliminate downtime during university enrollment. The second is how SIOS worked with an integrator to ensure a call center CAD system was highly available and able to dispatch police, fire, or rescue teams during multiple disasters. It’s important to consider adding a high availability clustering solution like SIOS that can address application-level high availability needs and thereby help maintain application performance.

Reproduced with permission from SIOS
March 2, 2023
8 Changes That Can Undermine Your High Availability Solution

As VP of Customer Experience, I have observed that most organizations are conscious and careful about deploying any tools or processes that could have an impact on their business’s high availability. These companies typically take great care with regard to HA, including strict change vetting for any HA clustering, DR, security, or backup solution changes. Most companies understand that changes to these tools need to be carefully considered and tested to avoid impacting overall application availability and system stability. IT administrators are aware that even the most inconspicuous change in their HA clustering, disaster recovery, security, or backup solution can lead to a major disruption. However, changes in other workplace and productivity tools are often not considered with the same diligence. Here are eight changes that can undermine your HA solution:
1. Lost documentation. Your existing tools often encapsulate a lot of documentation about the company, decisions, integrations, and overall HA architecture. As teams transition to new tools, these documents are often lost, or access becomes blocked or hampered. Suggested Improvement: Export and import all existing documents into the new tool. Use archive storage and backups to retain complete copies of the data before the import.
2. Lost requirements. Similar to lost documents, requirements are often the first thing to be lost when transferring tools. Suggested Improvement: Document known requirements and export requirements-related documents from any existing productivity tools.
3. Lost history and mindshare. Almost as important as the documentation and requirements is the history behind changes, revisions, and decisions. Many organizations keep historical information within workplace and office productivity tools. Such information could include decisions around tools and solutions that have been previously evaluated. When these workplace tools are changed or transitioned, this type of history can be lost. Existing tools often carry a lot of tacit knowledge as well; as new tools are integrated, that knowledge and mindshare disappears. Two decades ago our team migrated bug tracking solutions. The knowledge gap between the tools was huge and impacted multiple departments, including the IT team then tasked with managing, backing up, and resolving issues. Suggested Improvement: Be sure to adequately train staff and transfer mindshare and knowledge between tools, and be sure that the history, context, and decisions around the current and previous tools are documented before terminating the current tool.
4. Mismatched security and access rules. Every new tool has a different set of security and access rules. Often in the transition, teams end up with too many admins, not enough admins, or too many restrictions on permissions. Suggested Improvement: Map access and user controls, based on requirements and security rules, in advance, and have a process for quick resolution.
5. Lost contacts. Email and contact system migrations are rarely seamless. Even upgrades between existing versions can have consequences. One downside of a migration from one tool to another (for example, Exchange to Gmail) could be lost contacts. Our team once worked with a customer who called our support team for help obtaining their partner contacts: their email system transition had stalled, and access to critical contacts was delayed. Suggested Improvement: Plan for contact migration and validation, and be sure that any critical contacts for your HA cluster are part of a validated migration step.
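As a simple illustration of what such a validation step might look like, the Python sketch below compares two hypothetical contact exports in CSV form and flags entries that did not survive the migration. The file names and the email column are assumptions; adjust them to match your own exports.

```python
import csv

def load_emails(path):
    """Collect the email column from a contact export."""
    with open(path, newline="") as f:
        return {row["email"].strip().lower() for row in csv.DictReader(f)}

# Hypothetical export file names from the old and new systems.
old_contacts = load_emails("exchange_contacts.csv")
new_contacts = load_emails("gmail_contacts.csv")

missing = old_contacts - new_contacts
if missing:
    print(f"{len(missing)} contacts did not survive the migration:")
    for email in sorted(missing):
        print("  ", email)
```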
6. Broken integrations. Broken integrations are a very common issue that impacts high availability, monitoring, and alerting. As companies move to newer productivity tools, existing integrations may no longer work and may require additional development. For example, when a company previously using Skype for messaging moved to Slack, many of the tools that delivered messages via Skype needed to be adjusted. In your HA environment, a broken integration between dashboards or alert systems could mean critical notifications are not received in a timely manner. Suggested Improvement: Map out any automated workflows to help identify integration points between tools. Also work to identify any new requirements and integration opportunities, and plan and test integrations during the proof of concept or controlled deployment phase.
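Alert integrations are a typical casualty. The sketch below, which assumes a hypothetical incoming-webhook URL of the kind chat platforms such as Slack issue, shows how small such an integration usually is, and therefore how easy it is to overlook; re-pointing and re-testing it belongs on the migration checklist.

```python
import json
import urllib.request

# Hypothetical webhook URL; each chat platform issues its own.
WEBHOOK_URL = "https://hooks.example.com/services/T000/B000/XXXX"

def send_cluster_alert(message):
    """Post an HA alert to the team chat channel via an incoming webhook."""
    payload = json.dumps({"text": message}).encode("utf-8")
    req = urllib.request.Request(
        WEBHOOK_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status == 200

send_cluster_alert("Node2 heartbeat lost; failover initiated on Node1.")
```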
7. Loss of a champion. Every tool set has a champion and a critic. The champion may or may not be your administrator. The role of the champion changes within each organization, and often with each tool, but what is common among champions is their willingness to address issues, problems, or challenges with the new productivity tool for the benefit of themselves and others. The champion is the first to find the new features, uncover and report new issues, and help onboard new people to the toolset. Champions go beyond mindshare and history. Often with the changing of tool sets, your team will lose a champion.
8. Lost productivity. New tools, even those not directly related to HA, have an impact on your team’s productivity. Even tools related to priority management, development, and code repositories require ramp-up and onboarding time. This time often translates into lost productivity, which can translate into risks to your cluster. Make sure your processes related to all of the existing and new tools are documented well so that the change to a new tool does not cause confusion, break process flow, and lead to even greater losses of productivity. Suggested Improvement: Reduce the risk of lost productivity by using training tools, leveraging a product champion, and making sure that the rollout focuses on shortening the learning curve.

Changing workplace productivity tools without undermining your high availability solution requires capturing requirements, identifying key documents, transferring mindshare, mapping dependencies, testing and configuring proper access, and identifying a toolset champion. It means making sure that your new tools actually improve productivity rather than pull your key resources away from maintaining uptime.

Cassius Rhue, VP Customer Experience

Reproduced with permission from SIOS
February 28, 2023
High Availability Options for SQL Server on Azure VMs

Microsoft Azure infrastructure is designed to provide high availability for your applications and data. Azure offers a variety of infrastructure options for achieving high availability, including Availability Zones, Paired Regions, redundant storage, and high-speed, low-latency network connectivity. All of these services are backed by Service Level Agreements (SLAs) to ensure the availability of your business-critical applications. This blog post will focus on high availability options when running SQL Server in Azure Virtual Machines.

Azure Infrastructure

Before we jump into the high availability options for SQL Server, let’s discuss the vital infrastructure that must be in place. Availability Zones, Regions, and Paired Regions are key concepts in Azure infrastructure that are important to understand when planning for the high availability of your applications and data.

Availability Zones are physically separate locations within a region that provide redundant power, cooling, and networking. Each Availability Zone consists of one or more data centers. By placing your resources in different Availability Zones, you can protect your applications and data from outages caused by planned or unplanned maintenance, hardware failures, or natural disasters. When leveraging Availability Zones for your SQL Server deployment, you qualify for the 99.99% availability SLA for Virtual Machines.

Regions are geographic locations where Azure services are available. Azure currently has more than 60 regions worldwide, each with multiple Availability Zones. By placing your resources in different regions, you can provide even greater protection against outages caused by natural disasters or other significant events.

Paired Regions are pre-defined region pairs that have unique relationships. Most notably, Paired Regions replicate data to each other when geo-redundant storage is in use. The other benefits of Paired Regions are region recovery sequence, sequential updating, physical isolation, and data residency. When designing your disaster recovery plan, it is advisable to use Paired Regions for your primary and disaster recovery locations.

Using Availability Zones and Paired Regions in conjunction with high availability options such as Availability Groups and Failover Cluster Instances, you can create highly available, resilient SQL Server deployments that can withstand a wide range of failures, minimizing downtime.

SQL Server Availability Groups and Failover Cluster Instances

SQL Server Availability Groups (AGs) and SQL Server Failover Cluster Instances (FCIs) are both high availability (HA) and disaster recovery (DR) solutions for SQL Server, but they work in different ways. An AG is a feature of SQL Server Enterprise Edition that provides an HA solution by replicating a database across multiple servers (called replicas) to ensure that the database is always available in case of failure. AGs can be used to provide HA for both a single database and multiple databases. SQL Server Standard Edition supports something called a Basic AG, which has some limitations. A Basic AG supports only a single database; if you have more than one database, you need an AG, along with the associated IP address and load balancer, for each one. Additionally, Basic AGs do not support read-only replicas. While Basic AGs provide a simple way to implement HA for a single database, they may not be suitable for more complex scenarios.
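To give a feel for what keeping an eye on an AG looks like in practice, here is a minimal monitoring sketch in Python using pyodbc and SQL Server’s availability group system views. The listener name and credentials are placeholders; the system views queried (sys.dm_hadr_availability_replica_states and sys.availability_groups) are standard.

```python
import pyodbc

# Placeholder listener name and credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:ag-listener.example.com,1433;"
    "DATABASE=master;UID=monitor;PWD=<password>;"
    "Encrypt=yes;TrustServerCertificate=yes;",
    timeout=10,
)

# Report each replica's role and synchronization health for every AG.
rows = conn.execute("""
    SELECT ag.name, ars.role_desc, ars.synchronization_health_desc
    FROM sys.dm_hadr_availability_replica_states AS ars
    JOIN sys.availability_groups AS ag ON ag.group_id = ars.group_id
""").fetchall()

for name, role, health in rows:
    print(f"{name}: {role} ({health})")
```

Run against the listener, this shows, for each replica, whether it is currently primary or secondary and whether synchronization is healthy.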
A SQL Server FCI, on the other hand, is built on a Windows Server Failover Cluster (WSFC) and provides an HA solution by creating a cluster of multiple servers (called nodes) that use shared storage. In the event of a failure, the SQL Server instance running on one node can fail over to another.

In SQL Server 2022 Enterprise Edition, the new Contained Availability Groups (CAGs) address some of the AG limitations by including system databases in the availability group so that they can be replicated. A CAG eliminates the need to manually synchronize things like SQL logins and SQL Agent jobs.

Availability Groups and Failover Cluster Instances each have their own pros and cons. AGs have advanced features like readable secondaries and synchronous and asynchronous replication. However, AGs require the Enterprise Edition of SQL Server, which can be cost-prohibitive, particularly if you don’t need any other Enterprise Edition features. FCIs protect the entire SQL Server instance, including all user-defined databases and system databases. FCIs make management easier since all changes, including those made to SQL Server Agent jobs, user accounts and passwords, and database additions and deletions, are automatically reconciled on all versions of SQL Server, not just SQL Server 2022 with CAGs. FCIs are available with SQL Server Standard Edition, which makes them more cost-effective. However, FCIs require shared storage, which presents challenges when deploying in environments that span Availability Zones, Regions, or hybrid cloud configurations. Read more about how SIOS software enables high availability for SQL Server.

Storage Options for SQL Server Failover Cluster Instances

For SQL Server Failover Cluster Instances that span Availability Zones, there are three storage options: Azure File Share, Azure Shared Disk with Zone Redundant Storage, and SIOS DataKeeper Cluster Edition. There is a fourth option, Storage Spaces Direct (S2D), but it is limited to single-AZ deployments, so clusters based on S2D would not qualify for the 99.99% SLA and would be susceptible to failures that impact an entire AZ.

Azure File Share

Azure File Share with zone-redundant storage (ZRS) is a feature that stores multiple copies of your data across different Availability Zones in an Azure region, providing increased durability and availability. This data can then be shared as a CIFS file share, and the cluster connects to it using the SMB 3 protocol.

Azure Shared Disk

Azure Shared Disk with Zone Redundant Storage (ZRS) is a shared disk that can store SQL Server data for use in a cluster. SCSI persistent reservations ensure that only the active cluster node can access the data. If the primary Availability Zone fails, the data in the standby Availability Zone becomes active. Shared Disk with ZRS is only available in the West US 2, West Europe, North Europe, and France Central regions.

SIOS DataKeeper Cluster Edition

SIOS DataKeeper Cluster Edition is a storage HA solution that supports SQL Server Failover Clusters in Azure. It is available in all regions and is the only FCI storage option that supports cross-Availability Zone failover and cross-Region failover. It also enables hybrid cloud configurations that span on-premises and cloud environments. DataKeeper is a software solution that keeps locally attached storage in sync across all the cluster nodes. It integrates with WSFC as a third-party storage-class cluster resource called a DataKeeper volume. Failover Cluster controls all the management of the DataKeeper volume, making the experience seamless for the end user. Learn more about SIOS DataKeeper.
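Whichever storage option underpins the FCI, clients see a single virtual network name and simply reconnect after a failover. Below is a minimal sketch, with a hypothetical network name and database, of the retry-with-backoff logic an application might use to ride through the brief outage a failover causes.

```python
import time
import pyodbc

# Hypothetical FCI virtual network name and database.
CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:sqlfci-vnn.example.com,1433;"
    "DATABASE=SalesDB;Trusted_Connection=yes;"
    "Encrypt=yes;TrustServerCertificate=yes;"
)

def connect_with_retry(conn_str, attempts=5, base_delay=2.0):
    """Keep retrying until the instance comes back up on another node."""
    for attempt in range(attempts):
        try:
            return pyodbc.connect(conn_str, timeout=10)
        except pyodbc.Error:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff

conn = connect_with_retry(CONN_STR)
```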
Summary

In conclusion, Azure provides various infrastructure options for achieving high availability for SQL Server deployments, such as Availability Zones, Regions, and Paired Regions. By leveraging these options in conjunction with high availability solutions like Availability Groups and Failover Cluster Instances, you can create a highly available, resilient SQL Server deployment that can withstand a wide range of failures and minimize downtime. Understanding the infrastructure required and the pros and cons of each option is essential before choosing the best solution for your specific needs. It’s advisable to consult with a SQL Server and Azure expert to guide you through the process, and also to review the Azure documentation and best practices. With proper planning and implementation, you can ensure that your SQL Server deployments on Azure are always available to support your business-critical applications. Contact us for more information about our high availability solutions.

Reproduced with permission from SIOS
February 24, 2023
Exploring High Availability Use Cases in Regulated Industries

While downtime in business-critical systems, databases, and applications imposes costs on every organization, different industries face different consequences from unplanned downtime. In this post, we explore high availability use cases and SIOS customer success stories in the financial services, healthcare, manufacturing, and education industries.

High Availability for Financial Services

Ranging from small credit unions to regional banks to global investment firms, financial services is a highly regulated, fast-paced industry in which billions of dollars in electronic transactions occur every second. Thus, the average cost of downtime ($300,000 for a single hour of downtime, according to the ITIC 2021 Hourly Cost of Downtime Survey) can be significantly higher for a financial services firm than for other industries.

Large Financial Services Firm Adds High Availability/Disaster Recovery for Critical Securities Applications on Oracle Databases

One of the oldest financial services firms in China provides securities and futures brokerage as well as investment banking, asset management, private equity, alternative investments, and financial leasing services. It is listed on both the Shanghai and the Hong Kong Stock Exchanges. The firm has 343 branches spanning 30 provinces, municipalities, and autonomous regions in the People’s Republic of China. It also operates across 14 countries and major international cities, serving approximately 18 million customers.

High Availability for Healthcare

Downtime for applications and storage in the healthcare industry can literally be a matter of life and death. It’s imperative to assure reliable access to critical systems used in hospitals and surgery centers, as well as electronic health records (EHR) and medical imaging technology such as picture archiving and communication systems (PACS). The healthcare industry has also increasingly been targeted in ransomware attacks, leading to significant downtime.

Lifehouse Hospital Ensures High Availability in Amazon Web Services with SIOS DataKeeper

Chris O’Brien Lifehouse Hospital (www.mylifehouse.org.au) specializes in state-of-the-art research and treatment of rare and complex cancer cases. The not-for-profit hospital sees more than 40,000 patients annually for screening, diagnosis, and treatment.

THE CHALLENGE

Lifehouse chose Amazon Web Services (AWS) and had hoped to “lift and shift” its environment directly to the AWS cloud. To simulate its on-premises configuration, Singer chose a “cloud volumes” service available in the AWS Marketplace. Failover clusters were configured using Amazon FSx software-defined storage volumes to share data between active and standby instances. However, the software-defined cloud volumes had a substantial adverse impact on throughput performance. With the “No Protection” option, the cloud volumes performed well, but “no protection” wasn’t really an option for the mission-critical MEDITECH application and its database.

High Availability for Manufacturing

There’s a great deal of attention on the supply chain today. Although much of the recent focus has been on logistics issues (such as congestion in the Port of Los Angeles) and cyberattacks (such as the Colonial Pipeline ransomware attack), many critical challenges extend further up the supply chain.
For more than 50 years, lean manufacturing (also known as just-in-time inventory management) has been a hallmark of efficiency in the manufacturing industry. However, “just-in-time” means exactly that: there’s no room for system or application downtime in manufacturing.

SIOS DataKeeper Cluster Edition Protects Van de Lande Data Systems

Van de Lande BV (VDL) specializes in the manufacture of PVC-U and PE pressure fittings and valves for plastic piping systems, made both from tube and by injection molding. Its products are used all over the world in industrial and technical installations. What sets VDL apart is its impressive range of product types and sizes, and its continuous commitment to product improvement and enhancement. As a result, VDL has been the brand of choice for builders of systems and installations for more than 50 years.

THE CHALLENGE

Before implementing the SIOS DataKeeper solution, VDL relied on shared storage (SAN) for its main storage. To improve performance, the company decided to move to local storage based on solid-state disks (SSDs) instead of traditional spinning disks. However, VDL relies heavily on the availability of its ERP database. With only one primary data processing system, VDL needed a reliable, comprehensive disaster recovery solution to ensure the availability of its systems in the event of a sitewide disaster. To prevent downtime, the company needed its servers to replicate data to a backup server for disaster protection: if one server fails, the other server takes over operation. This failover process sustains operations, maximizes uptime, and enables user productivity.

THE SOLUTION

VDL uses SIOS DataKeeper Cluster Edition software to ensure continuous availability of applications, databases, and web services. SIOS DataKeeper software integrates with WSFC to create a “mirrored” server system between two Windows cluster nodes. If the primary node fails, WSFC transfers all operations to the other node while enabling continuous access to applications and data (which is protected at the volume level). SIOS DataKeeper software enables disaster recovery without the long downtime and recovery time associated with traditional backup-and-restore technology. SIOS DataKeeper works with Microsoft WSFC to monitor system and application health, maintain client connectivity, and provide uninterrupted data access, giving VDL the reliable, fault-resilient system the company needed. SIOS DataKeeper Cluster Edition further extends the capabilities of Microsoft Cluster Services and Windows Server Failover Clustering, and also supports real-time replication of Hyper-V virtual machines between physical servers across either LAN or WAN connections. For companies like VDL, SIOS DataKeeper Cluster Edition software reduces the cost of deploying clusters by enabling them to create a SANless cluster that eliminates the cost, complexity, and single-point-of-failure risk of a SAN in a traditional shared storage cluster. The cluster implementation ran smoothly and took less than a day. Following a thorough evaluation of the VDL server configuration and testing, the installation team found that the SANless cluster with SIOS DataKeeper Cluster Edition software met all of their criteria for disaster recovery, performance, and high availability. During the system failover test, the network services team failed over and failed back the system quickly and easily.
THE RESULTS

One two-node cluster works as a file server and iSCSI server, while the other supports a SQL Server cluster and Dynamics NAV web services. The IT infrastructure consists of three Hyper-V hosts with 60 VMs installed on them, one Backup Exec server, 50 desktop users, and 25 mobile barcode scanners, which are connected via web services to the ERP system. Every host contains 240GB SSDs in a RAID 60 configuration with a total of 3TB of local storage. The systems are connected through 10 Gigabit interfaces.

High Availability for Education

In the wake of the global pandemic, distance learning has become a key teaching format in postsecondary education, as well as in primary and secondary education. In postsecondary education, distance learning enables global outreach for colleges and universities to attract a diverse student body. Thus, uptime has become increasingly important in education, with students and professors requiring access to various systems including library databases, student records, and high-performance computing (HPC), for example to support medical research, testing applications, and more. Downtime can also be costly as students (potentially from around the world) rush to register online, vying for limited class space.

Major University Gives SIOS LifeKeeper for Linux Top Marks

When a leading university in New York decided to revamp its enterprise resource planning (ERP) system, it hoped to improve performance, especially during peak registration periods, and reduce overall total cost of ownership (TCO). The university serves more than 10,000 students and uses an Oracle database to maintain all the information for student registration on an HP-UX SAN-based storage environment, with replication of its full SAN architecture between two fabrics in a cluster.

THE SOLUTION

Employing servers configured with SanDisk Fusion ioMemory-based IO accelerators, the university could replace its bulky and expensive SAN-based setup with a streamlined server set running Linux. The integration of ioMemory-based IO accelerators with both the servers and SIOS LifeKeeper for Linux offers better performance and availability than traditional legacy solutions, making the SIOS solution a perfect fit.

Learn more about SIOS high availability solutions.

Reproduced with permission from SIOS