January 24, 2024 |
Ensuring Access To Critical Educational Applications

Education and information technology (IT) are increasingly inextricable. Whether the IT in question is an application supporting a classroom whiteboard, the database supporting a university registration system, the learning management system (LMS), or the building maintenance system controlling student access to the labs, dorms, and dining halls, if key components of your IT infrastructure suddenly go dark, neither teachers, administrators, nor students can accomplish what they are there to accomplish. The mission of the institution is interrupted. If the interruptions are too frequent and the experiences of students, teachers, and administrators suffer, the reputation of the institution itself can suffer as well.

An IT infrastructure designed to ensure the high availability (HA) of applications crucial to the educational experience can minimize the risk of disruption and reputational loss if these systems become unresponsive for any reason. Here, an HA infrastructure is defined as one capable of keeping key applications available at least 99.99% of the time. Put another way, your critical applications won't be unexpectedly offline for more than about 4.3 minutes per month.

How do you achieve HA? That question is readily answered, but it is not the only question you need to ask. Just as important is this: Which applications are so critical that they warrant an HA configuration?

At its heart, an IT infrastructure configured for HA has one or more sets of secondary servers and storage subsystems housed in a geographically distinct location: a remote data center if your primary server resides on-premises, or a separate availability zone (AZ) if your servers reside in the cloud.
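The monthly downtime allowance that follows from a 99.99% availability target is simple arithmetic. This is a minimal sketch that assumes a 30-day month. The function name is hypothetical and is not part of any SIOS tool.

```python
# Downtime budget implied by an availability target.
# Assumption: a 30-day month by default; pass days=30.44 for an average calendar month.

def downtime_minutes(availability: float, days: float = 30.0) -> float:
    """Return the maximum unplanned downtime, in minutes, allowed over `days`."""
    if not 0.0 < availability <= 1.0:
        raise ValueError("availability must be a fraction in (0, 1]")
    total_minutes = days * 24 * 60
    return total_minutes * (1.0 - availability)


if __name__ == "__main__":
    for target in (0.999, 0.9999, 0.99999):
        print(f"{target:.3%}: {downtime_minutes(target):.2f} min per 30-day month")
```

With a 30-day month, 99.99% leaves 4.32 minutes. With an average calendar month of about 30.44 days, the allowance is roughly 4.38 minutes.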
If something causes the applications running on the primary server to stop responding, the HA software managing them immediately fails them over to the secondary server. There, your critical applications start up again from the point at which the primary server stopped responding. Depending on the size and performance characteristics of the primary server you plan to replicate, that secondary server may be costly, so you are unlikely to configure all your academic applications for HA. Once you determine which applications warrant the investment, you'll know where you need to build out an HA environment.

Choices for Achieving High Availability

Once you've chosen the applications you intend to protect, your options for achieving HA become clearer. Are they running on Windows or Linux? Does your database management system (DBMS) have built-in support for an HA configuration? If so, what are its limitations?

If your critical applications are running on Windows and SQL Server, for example, you could enable HA using the Availability Group (AG) feature of SQL Server itself. Alternatively, you could configure HA using a third-party SANless clustering tool, which offers options that the AG services in SQL Server do not.

You may be trying to protect database servers from multiple vendors, or some of your critical applications may run on Windows while others run on Linux. In either case, HA will be easier to manage with a solution that supports multiple DBMS and OS platforms. A single clustering solution that spans those platforms is simpler to operate than several database-native HA services running side by side, each with its own tooling and behavior.
Ensuring High Availability via Database-Native HA Solutions

If you're using a database-native HA solution, such as the AG feature of SQL Server, the software synchronously replicates all the data in your primary SQL Server database to an identical instance of that database on the secondary server. If the primary server stops responding, the monitoring features in the AG component automatically cause the secondary server to take over. Because the AG feature has replicated all the data in real time, the secondary server can take over immediately, with virtually no interruption of service or loss of data. Many database-native HA tools operate in a similar manner.

There are a few caveats, though, when considering a database-native approach. If the HA services are bundled into the DBMS itself, they may replicate only the data associated with that DBMS. If other critical data resides on your primary server, it will not be replicated to the secondary server in a database-native HA scenario.

There may be other limitations on what the database-native services will replicate as well. If you use the Basic AG functionality bundled into SQL Server Standard Edition, for example, each AG can replicate only a single SQL database to a single secondary location. You could create multiple Basic AGs if your applications involve multiple SQL databases. However, you cannot control whether all the AGs fail over at the same time, and problems may arise if they do not. One way around this limitation is the Always On AG functionality bundled into SQL Server Enterprise Edition, which can replicate multiple SQL databases to multiple secondary servers. That can get very expensive from a licensing perspective, however, if your applications don't otherwise use any Enterprise Edition features.
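The Basic AG caveat is easier to see with a concrete check. When each database sits in its own Basic AG, what matters after a failover is whether every AG's primary replica ended up on the same node. The sketch below shows that check. The AG and node names are hypothetical. In practice, the current primary of each AG would come from your own monitoring, for example the primary_replica column of SQL Server's sys.dm_hadr_availability_group_states view.

```python
# Illustrative check for the Basic AG caveat: an application whose databases
# live in several Basic AGs is only healthy if every AG's primary replica is
# on the same node. All AG and node names are hypothetical.
from typing import Dict, Set


def primary_nodes(ag_primaries: Dict[str, str]) -> Set[str]:
    """Return the distinct nodes currently hosting a primary replica."""
    return set(ag_primaries.values())


def primaries_split(ag_primaries: Dict[str, str]) -> bool:
    """True if the application's AGs have ended up primary on different nodes."""
    return len(primary_nodes(ag_primaries)) > 1


if __name__ == "__main__":
    # One Basic AG per database, as Standard Edition requires.
    after_failover = {
        "ag_registration": "sql-node-b",
        "ag_lms": "sql-node-a",  # this AG did not follow the others
    }
    if primaries_split(after_failover):
        print("Warning: primaries split across", sorted(primary_nodes(after_failover)))
```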
Other database-native HA solutions may have similar constraints, so be sure to understand them before investing in such an approach.

Ensuring High Availability via SANless Clustering

As an alternative to the database-native approach, you could use a third-party tool to create a SANless cluster. As in the AG configuration described above, the SANless clustering software synchronously replicates data from the primary to the secondary server. It also orchestrates immediate failover to the secondary server if the primary server becomes unresponsive. Because failover takes only seconds, administrator, faculty, and student access to your critical applications remains virtually uninterrupted.

The critical differences between SANless clustering and a database-native approach lie in the practical details. The SANless clustering approach is database agnostic: it replicates any data on a designated storage volume. That could include multiple databases from multiple vendors, text files, video files, or any other educational asset whose availability is important. This can save an institution a considerable amount of money if a database-native approach to HA would otherwise require an upgrade to a more expensive edition of the database.

Finally, as noted earlier, you may be trying to protect applications and data running in multiple operating environments. In that case, a SANless clustering approach may be more manageable than individual database-native approaches. You can use SANless clustering to ensure HA in either Windows or Linux environments. This avoids the complexity of deploying database-native approaches that differ among operating environments.

Reproduced with permission from SIOS
|
January 19, 2024 |
Webinar: Disaster Recovery in the Cloud: Understanding Challenges and Strategies for SQL Server

Ensuring high availability (HA) and disaster recovery (DR) in the cloud can be a challenge for many organizations. Challenges include the intricacies of using different tools across different cloud vendors, data sovereignty considerations, compliance requirements, and ongoing cost management. This webinar discusses ways to address those challenges and emphasizes the importance of redundancy and failover for uninterrupted service and data protection. It also explores common misconceptions about cloud resilience and the need for a robust backup and DR strategy. Reproduced with permission from SIOS
|
January 14, 2024 |
Build High Availability with a HANA 3-Node HSR Cluster in AWS Using SIOS LifeKeeper

Introduction: How to Ensure HA and DR in Your Database
Creating a highly available SAP HANA environment in AWS is a critical task for many businesses. This guide provides a detailed walkthrough for setting up a 3-node HANA System Replication (HSR) cluster using SIOS LifeKeeper in AWS, ensuring database resilience and high availability.

Prerequisites
Step 1: Preparing Your AWS Environment

EC2 Instance Deployment
Deploy three EC2 instances in AWS. These instances will act as your HANA cluster's primary, secondary, and tertiary nodes. Ensure they meet the hardware and software requirements for SAP HANA and SIOS LifeKeeper, and follow the SAP HANA sizing guidelines when building your instances.

Network Configuration
Configure your VPC, subnets, and security groups to allow communication between the nodes and access to necessary services. When HANA nodes are in different regions, you can protect the DNS name using the SIOS LifeKeeper for Linux Route53 Application Recovery Kit (ARK).

The architecture for a 3-node HANA database in AWS uses two VPCs, one for each region. When setting up storage, use separate EBS volumes for /usr/sap, /hana/data, /hana/log, and /hana/shared. Set up peering between the two VPCs and add routes to the route tables so the servers can talk to each other. Then modify the security group to allow traffic between the servers. Finally, create a private hosted zone associated with both VPCs. Add records to it for the domain and hostname that will be used to communicate with the active HANA node.
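VPC peering has a precondition worth checking before you start: AWS will not peer two VPCs whose IPv4 CIDR blocks overlap. This is a minimal sketch using Python's standard ipaddress module. The first CIDR is an assumption based on the 172.31.0.25 address this guide uses for sapdemohana1. The second CIDR is hypothetical.

```python
# Pre-flight check before requesting VPC peering: AWS does not allow peering
# between VPCs whose IPv4 CIDR blocks overlap.
import ipaddress


def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two IPv4 networks share any addresses."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


if __name__ == "__main__":
    region1_vpc = "172.31.0.0/16"  # assumed: contains sapdemohana1 (172.31.0.25)
    region2_vpc = "10.20.0.0/16"   # hypothetical CIDR for the second region's VPC
    if cidrs_overlap(region1_vpc, region2_vpc):
        print("CIDRs overlap: re-address one VPC before peering")
    else:
        print("CIDRs are disjoint: peering and cross-region routes are possible")
```

Note that 172.31.0.0/16 is the CIDR of the default VPC in every AWS region, so two default VPCs would fail this check.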
Step 2: Installing and Configuring SAP HANA

Installation on Each Node
Install SAP HANA on each EC2 instance, keeping the versions consistent across all nodes to avoid compatibility issues. This is by far the most challenging part of the process. Start by determining your installation settings. For mine, I am using the following:
SID: D11
Local Instance Storage*
*For this installation, this is storage that is not shared between the HANA database servers. If you try to use shared storage, you will not be able to create an identical server, because hdblcm will stop the installation with an error saying the SID and instance already exist. Install the HANA server software on each node independently, as if it were a standalone system.

Make sure all required libraries are installed; for RHEL 8, they are listed in SAP Note 2772999. After installing compat-sap-c++-9-9.1.1-2.3.el7_6.x86_64.rpm, you will need to create the symbolic link by running: ln -s /opt/rh/SAP/lib64/compat-sap-c++-10.so /usr/sap/lib/libstdc++.so.6
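A quick way to confirm that the compatibility link is in place on each node is to check it from a script. This is a minimal sketch. The target path is taken from the ln -s command above and is an assumption: adjust it to match the compat-sap-c++ package you actually installed.

```python
# Verify that the libstdc++ compatibility symlink exists and points where expected.
import os


def symlink_points_to(link_path: str, expected_target: str) -> bool:
    """True if link_path is a symlink to expected_target and that target exists."""
    return (
        os.path.islink(link_path)
        and os.readlink(link_path) == expected_target
        and os.path.exists(link_path)  # follows the link, so a dangling link fails
    )


if __name__ == "__main__":
    link = "/usr/sap/lib/libstdc++.so.6"
    # Assumed target, from the ln -s command in the text; match your installed package.
    target = "/opt/rh/SAP/lib64/compat-sap-c++-10.so"
    if symlink_points_to(link, target):
        print("libstdc++ compatibility link OK")
    else:
        print(f"Missing or wrong link; expected: ln -s {target} {link}")
```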
Attach the storage volumes, create partitions, and format them. Create your swap file. I create RSA keys on all my hosts and then allow root SSH logins between the HANA nodes by adding each public key to the .ssh/authorized_keys file; this makes installation much easier. Mount your HANA installation media volume.
Run hdblcm from the correct HANA installation media directory. Once you have successfully installed HANA on all nodes, you are ready for the next step.

System Replication Setup
You will need to take a backup before enabling HSR.
Repeat the backup process above on all nodes. Then configure HANA System Replication on each node as follows.

Start the HDB instance on the primary HANA system if it isn't already running:
sapcontrol -nr <instance number> -function StartSystem HDB [e.g. sapcontrol -nr 11 -function StartSystem HDB]

Enable HSR at the primary site:
hdbnsutil -sr_enable --name=<primary site name> [e.g. hdbnsutil -sr_enable --name=sapdemohana1]

Stop the HDB instance on each secondary HANA system:
sapcontrol -nr <instance number> -function StopSystem HDB [e.g. sapcontrol -nr 11 -function StopSystem HDB]

On the additional HANA systems, back up the KEY and DAT files, then copy the primary's KEY and DAT files to the required locations.

Make sure the KEY and DAT files are owned by user <sid>adm and group sapsys.

Register the additional HANA systems with the primary HANA system. This must be done as the <sid>adm user:
[e.g. hdbnsutil -sr_register --name=sapdemohana2 --remoteHost=sapdemohana1 --remoteInstance=11 --operationMode=logreplay --replicationMode=sync]

Check the HSR status on all systems by running the following command as the <sid>adm user:
d11adm@sapdemohana4:/usr/sap/D11/HDB11> hdbnsutil -sr_state

Once all systems are online, you can move on to the next step.
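Each additional node needs the same hdbnsutil -sr_register call with only the site name changing, so the commands can be generated rather than typed by hand. This is a minimal sketch using this walkthrough's values: instance 11 and primary sapdemohana1. The helper name is hypothetical. Registering sapdemohana3 directly against sapdemohana1, and using sync mode for it, are assumptions. A chained topology would register it against sapdemohana2, and a node in a distant region may warrant async mode.

```python
# Generate the hdbnsutil -sr_register command for each additional HANA node.
# register_command is a hypothetical helper; it only builds strings.
import shlex
from typing import List


def register_command(site_name: str, primary_host: str, instance: str,
                     replication_mode: str = "sync",
                     operation_mode: str = "logreplay") -> List[str]:
    """Build the hdbnsutil -sr_register argument list for one additional site."""
    return [
        "hdbnsutil", "-sr_register",
        f"--name={site_name}",
        f"--remoteHost={primary_host}",
        f"--remoteInstance={instance}",
        f"--operationMode={operation_mode}",
        f"--replicationMode={replication_mode}",
    ]


if __name__ == "__main__":
    # Values from this walkthrough; run each printed command as d11adm on that node.
    for site in ("sapdemohana2", "sapdemohana3"):
        print(shlex.join(register_command(site, "sapdemohana1", "11")))
```

The sketch prints the commands rather than executing them, which keeps it harmless. Run each line on the corresponding node, with HANA stopped there, as in the manual steps.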
Step 3: Installing SIOS LifeKeeper

AWS CLI Installation
Install the AWS CLI and configure it with a key that has the following permissions: Route Table (backend) configuration:
LifeKeeper Installation
Install SIOS LifeKeeper on each node. This involves running the installation script and following the setup wizard, which guides you through the necessary steps. For this installation, I am using the Route53 ARK (networking) and the SAP HANA ARK (database), along with the witness functions.

Edit the /etc/selinux/config file and disable SELinux. I also changed my hostname and edited the /etc/hosts file. Next, edit the /etc/default/LifeKeeper file: add /usr/local/bin to the PATH and set NOBCASTPING=1. I also changed QUORUM_LOSS_ACTION to osu.

Make sure you have X Windows working. I remove the cp alias from .bashrc and add /opt/LifeKeeper/bin and /usr/local/bin to my .bash_profile. I also copy the ec2-user's .Xauthority file to the root and <SID>adm home directories so that X Windows will work. I then change the root password and reboot. Before launching the LifeKeeper GUI, make sure that HSR is online on all nodes and that all nodes are registered.

Configuration
Launch the LifeKeeper GUI with lkGUIapp and log in with the root user and password. Click the Connect button to log in to the additional nodes in the cluster. Once you are logged in to all the nodes, click the Create Comm Path button. Hit Next when it asks for the Local Server, then hold Shift and select all the nodes. Hit Accept Defaults, and hit Done when it is complete.

Click the Create Comm Path button again, and this time change to the second node. Hit Next and select the third node. Hit Next until you can hit the Accept Defaults button, and hit Done when it is complete.

Now click the Create Resource Hierarchy button. Select the IP kit and hit Next. Hit Next until you get to the IP resource page, enter 0.0.0.0, and hit Next. Hit Next until you get to the Create button.
Hit the Create button. When it is complete, hit Next. Hit Accept Defaults with the Target Server showing the second node. When that completes, hit Next Server. Hit Accept Defaults with the third node showing, and hit Finish when it completes. Hit Done.

Now that we have an IP resource, we can add our Route53 resource. This resource changes the DNS entry so that the FQDN resolves to the active node's IP address. In this case, saphana.sapdemo will resolve to the IP address of sapdemohana1 (172.31.0.25).

Hit the Create Resource Hierarchy button to start the process. Select Route53 and hit Next. Keep hitting Next until you get to the Domain Name, which should be prepopulated with the active hosted zone name, and hit Next. Enter the Host Name that everything will use to connect to the HANA database and hit Next. Hit Next until you get to the Create button, and click it. When it is done, hit Next. At the Pre-Extend Wizard, hit Accept Defaults. When it is done, hit Next Server. The Target Server will show the third node; hit Accept Defaults. Hit Finish when it is done, then hit Done. You can then expand the tree.

To test the Route53 resource, open a terminal session to the second node and ping the FQDN for the HANA database [e.g. ping -c3 saphana.sapdemo]. Right-click the top standby resource under sapdemohana3 and select In Service. Hit In Service on the next screen, then hit Done when it is complete. Go back to the terminal window and repeat the ping test: the hostname now resolves to sapdemohana3. Put sapdemohana1 back into service before moving on to the next step.

Step 4: Integrating SAP HANA with SIOS LifeKeeper

Resource Hierarchy Creation
Using the LifeKeeper GUI, create a resource hierarchy for SAP HANA on each node. This setup is crucial for managing failover and recovery processes.
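Before building the SAP HANA hierarchy, you may want to confirm from a script, rather than by eye with ping, that saphana.sapdemo resolves back to the primary, sapdemohana1 (172.31.0.25), after the Route53 test. This is a minimal sketch using Python's standard socket module. It assumes it runs on a node inside the VPCs that share the hosted zone. The helper name is hypothetical.

```python
# Check which address the HANA virtual hostname currently resolves to.
import socket


def resolves_to(hostname: str, expected_ip: str) -> bool:
    """True if hostname currently resolves (IPv4) to expected_ip."""
    try:
        return socket.gethostbyname(hostname) == expected_ip
    except OSError:  # includes socket.gaierror for names that do not resolve
        return False


if __name__ == "__main__":
    # Values from this walkthrough; run on a node that can see the hosted zone.
    fqdn, active_ip = "saphana.sapdemo", "172.31.0.25"
    print(f"{fqdn} points at the active node:", resolves_to(fqdn, active_ip))
```

Resolver caching and the record's TTL can delay the change. If the check fails right after a switchover, repeat it before concluding anything.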
Make sure that HSR is active on node 1 and the additional nodes are registered. Click the Create Resource Hierarchy button. Select the SAP HANA recovery kit and hit Next until you get to the IP Address screen. Select none and hit Next. Hit Next until you get to the Create screen, then hit Create. After creation, hit Next and then Accept Defaults for node 2. When node 2 is complete, hit Next Server and Accept Defaults again. When that completes, hit Finish, then hit Done.

Right-click the HANA hierarchy and select Create Dependency. For the Child Resource Tag, select the Route53 resource from the pull-down and hit Next. Click Create Dependency, then click Done. Then select View > Expand Tree. If everything is green, we are ready to test.

Step 5: Testing and Validation

Failover/Switchover Testing
Conduct thorough failover tests to ensure that the system correctly switches over to the secondary or tertiary node if the primary node fails. Testing should include scenarios such as network failures, hardware issues, and software crashes.

The first test is a switchover, which you would use for maintenance activities or a scheduled outage. Right-click the second node, select In Service – Takeover with Handshake…, and hit Perform Takeover. This switches to the second node with minimal downtime for users. When the second node is up and running, hit Finish. After some time, node 1 will come back as Standby – In Sync.

Now we can perform a failover test. Open a terminal to node 2 and type echo c > /proc/sysrq-trigger to simulate a system crash. Node 1 takes over because it has the highest priority (1). Eventually, everything returns to normal. There are a number of additional failure scenarios you may wish to test; just make sure your standby nodes are in sync before starting.

Data Synchronization Verification
Verify that data is correctly replicating across all nodes.
Consistent data across nodes is crucial for the integrity of the HSR setup.

Performance Monitoring
Regularly monitor the performance of the SAP HANA instances and the LifeKeeper setup, and check for any anomalies that could indicate potential problems. Check the /var/log/lifekeeper.log file to make sure everything is performing as expected. Depending on network performance, you may need to adjust the heartbeat timer and the number of missed heartbeats allowed. These tunables, LCMHBEATTIME and LCMNUMHBEATS, are set in the /etc/default/LifeKeeper file. You can also check the status of LifeKeeper from the command line with lcdstatus -q.

Conclusion
Setting up a 3-node HANA HSR cluster in AWS with SIOS LifeKeeper involves detailed planning and execution. By carefully following these steps, you can establish a robust, resilient, and highly available SAP HANA environment in the cloud, ensuring your critical data remains accessible and secure. SIOS LifeKeeper for Linux makes the administration, monitoring, and maintenance of SAP HANA quick and easy. SIOS provides resources and training for all our products. Reproduced with permission from SIOS |
January 9, 2024 |
US National Capital Region Protects Critical Emergency Dispatch Services with SIOS DataKeeper

SIOS DataKeeper Protects Critical CAD-EX CAD2CAD Software on SQL Server in Azure

Until recently, dispatchers used computer-aided dispatch (CAD) systems that presumed the likely whereabouts and availability of units in neighboring jurisdictions. To request a dispatch, however, they had to call those neighboring agencies on a dedicated phone line to validate actual unit location and availability. This process slowed response times and did not give first responders access to critical incident details that may have been available from the dispatching agency.

To improve efficiency, National Capital Region (NCR) agency leadership and Emerging Digital Concepts (EDC) collaborated to create the Data Exchange Hub (DEH). The DEH is a data exchange and functional interoperability platform designed to give member NCR emergency services agencies secure access to data and applications. They used the National Information Exchange Model (NIEM) to ensure the interoperability of systems, communications, and procedures. The DEH has become a national model of efficient regional cooperation in emergency response.

The primary DEH information exchange is the CAD-to-CAD (C2C) Exchange. It enables dispatchers in all NCR DEH agencies to immediately understand front-line resource locations and availability. It also lets them share up-to-date information on related incidents across all C2C Exchange-connected CAD systems in member jurisdictions. The NCR DEH C2C Exchange uses a Microsoft SQL Server database running on Windows Server and is deployed in the Azure Commercial Cloud.
Given the critical nature of these systems, the DEH required high-availability data protection for the C2C Exchange platform.

Failover Clustering without Added Complexity
In a traditional Windows high-availability (HA) environment involving a database, two or more database instance nodes are configured in a Windows Server failover cluster with shared storage (typically a SAN). The database runs on the primary node, with HA failover software monitoring its operation. If an issue is detected, the HA software orchestrates failover of the database to the standby secondary node(s) in the cluster. The cloud and some other environments offer no shared storage, or none that is cost-effective. There, replication is used to create a SANless cluster by synchronizing local storage on each cluster node, so that in the event of a failover the secondary node can continue to operate with current data.

In the earlier stages of the project, the NCR IT team deployed the C2C Exchange platform in a number of environments. The first was an on-premises Fairfax County data center, followed by multiple third-party, national, and local hosting providers. In these environments, the C2C Exchange database architecture used Microsoft SQL Server Enterprise Edition and Always On Availability Groups. As the project expanded, the NCR IT team moved to take advantage of advancing cloud technologies and deploy the C2C platform to the Azure Commercial Cloud. The cloud offered the flexibility and service levels needed to manage the C2C platform in a more virtual environment. Azure also allowed the NCR to deploy a more cost-effective, high-availability database clustering solution. That solution delivers C2C Exchange application data reliably while reducing the higher licensing costs associated with SQL Server Enterprise Edition.
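The SANless principle described above keeps each node's local storage identical by replicating writes. A toy model of synchronous block-level replication can illustrate it. In the model, a write is acknowledged only after both the primary and secondary copies hold it, and the replicator neither knows nor cares whether a block belongs to a database or a plain file. This is a simplified sketch for illustration only, not how SIOS DataKeeper is implemented. The class names and the 4096-byte block size are arbitrary choices.

```python
# Toy model of synchronous, block-level volume replication (illustrative only).
from typing import Dict

BLOCK_SIZE = 4096  # bytes; an arbitrary block size for the model


class Volume:
    """A storage volume modeled as a map from block number to block contents."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}

    def write_block(self, block_no: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        self.blocks[block_no] = data


class MirroredVolume:
    """Synchronous mirror: a write is acknowledged only after both copies hold it."""

    def __init__(self, primary: Volume, secondary: Volume) -> None:
        self.primary = primary
        self.secondary = secondary

    def write_block(self, block_no: int, data: bytes) -> bool:
        self.primary.write_block(block_no, data)
        # Synchronous replication: do not acknowledge until the secondary has the block.
        self.secondary.write_block(block_no, data)
        return True  # acknowledgement returned to the application


def pad(payload: bytes) -> bytes:
    """Pad a payload with zero bytes to exactly one block."""
    return payload.ljust(BLOCK_SIZE, b"\0")


if __name__ == "__main__":
    mirror = MirroredVolume(Volume(), Volume())
    # To the replicator, database pages and ordinary files are just blocks.
    mirror.write_block(0, pad(b"SQL Server data page"))
    mirror.write_block(1, pad(b"dispatch log entry"))
    # After a failover, the secondary's copy is identical to the primary's.
    print(mirror.primary.blocks == mirror.secondary.blocks)  # True
```

In a real cluster the secondary write crosses the network, so synchronous mirroring adds write latency. An asynchronous variant would acknowledge after the primary write and ship the block afterward.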
The Solution
The NCR DEH C2C Exchange began using SIOS DataKeeper Cluster Edition software to create SANless clusters that protect C2C Exchange data availability in the Azure Commercial Cloud. The software runs on a two-node, active-passive Windows Server Failover Cluster configuration using SQL Server Standard Edition with Always On Failover Clustering. SIOS DataKeeper uses bandwidth-efficient, host-based, block-level replication to synchronize local storage on all database cluster nodes. If an application availability issue is detected, operation automatically moves to the secondary node with no manual intervention required. This matters because the service levels guaranteed by cloud vendors cover hardware operability, but not software- and networking-related causes of application downtime.

The Results
The NCR DEH C2C Exchange has been using SIOS DataKeeper Cluster Edition in the Azure Commercial Cloud for more than two years. Participation in the interoperability program has grown. In addition to the initial members, the program now includes the Metropolitan Washington Airports Authority (MWAA), the Virginia counties of Loudoun and Prince William, and the Maryland counties of Montgomery and Prince George's. The C2C Exchange manages a few thousand shared units and shares data on hundreds of thousands of incidents per year among these participants.

Establishing the high-availability database cluster in the cloud was fast and straightforward using SIOS DataKeeper Cluster Edition. “We simply installed SIOS DataKeeper to our Windows Server Failover Cluster nodes, configured the local node storage as SIOS managed storage for replication, and it operated seamlessly,” said Greg Crider, EDC Chief Technical Officer and Co-Founder.
“An added benefit of the SIOS DataKeeper clustering software is that it enables us to perform regular, rolling software maintenance on the database by transitioning cluster nodes, on-demand, without the need for planned downtime or interruption of service.” Since implementing SIOS DataKeeper in the NCR C2C Exchange, there have been no downtime issues involving the database and no data loss between nodes. Chris Wiseman, EDC President, CEO, and Co-Founder, adds, “There have been a few unexpected, uncontrollable networking issues, however, the database failed over quickly and C2C Exchange operation has continued without end users being impacted by prolonged reduction in service. The SIOS DataKeeper software enables us to deliver higher levels of data protection and delivery without needing the more costly SQL Server Enterprise Edition licensing. That adds up to significant, continuing annual savings for our stakeholders.”

The NCR C2C Exchange with SIOS DataKeeper protection is featured by DHS/SAFECOM in a video. It was recently put to the test when three large fires occurred simultaneously at a bank, an apartment complex, and a nursing home. The interoperability between CAD systems played a critical role in ensuring a fast, efficient response to these incidents.

EDC is broadening C2C Exchange adoption in other markets around the United States with its commercially available NG-CAD-X C2C Exchange product. This functionally advanced C2C offering is being implemented by the City of Denver North Central All-Hazards Region and by greater Southeast Florida. NG-CAD-X is message-compatible with the NCR C2C Exchange. It is deployed in the Azure Government Cloud for CJIS compliance and law enforcement adoption, in addition to fire and EMS ESF. It also incorporates SIOS DataKeeper Cluster Edition into its database architecture for all of the operational and cost-saving reasons highlighted above.
“Strategic partnerships play an important role in providing our customers with the best technology solutions in the marketplace. SIOS DataKeeper is an integral part of our system and is a valuable EDC partner,” said Kevin Konczal, Vice President, EDC. Reproduced with permission from SIOS |
January 5, 2024 |
Webinar: Secure your SAP and SAP S/4HANA on Azure: Disaster Recovery Best Practices

In today's digital landscape, securing critical business applications such as SAP and SAP S/4HANA is paramount to protect against potential disasters that could impact business continuity. Leveraging the power of cloud computing, Azure provides robust disaster recovery solutions for SAP and SAP S/4HANA environments. This on-demand symposium session discusses best practices for securing your SAP and SAP S/4HANA systems on Azure, including strategies for data replication, backup and restore, high availability, and failover. Harikrishna Madathala, Microsoft Senior Customer Engineer for Fast Track at SAP on Azure cloud, shares insights, practical tips, and real-world examples to help implement disaster recovery best practices to safeguard SAP and SAP S/4HANA deployments on Azure, ensuring the highest level of security, resilience, and availability for critical business applications. Reproduced with permission from SIOS |