January 5, 2023
Creating a HA Oracle Database server cluster in AWS

Introduction

As a developer tasked with creating a proof of concept (POC) for a business-critical application that requires a highly available (HA) instance of Oracle, I need to set up an Oracle HA cluster in AWS EC2. Where do you start? If you are like most of us, you will spend endless hours googling your next task, reading articles, installation guides, documentation, and questions on Stack Overflow. You will find lots of almost-right answers, but they never quite fit your version or environment. Worse, you go down a rabbit hole and end up wasting days building an environment that will not work.

I am going to structure a series of blogs that focus on setting up HA environments for developing proofs of concept using the various SIOS HA solutions: DataKeeper, LifeKeeper, and SIOS Protection Suite. If you have an immediate need that I have not yet covered, let me know and I will move your configuration up in my backlog. Thank you for reading this; I hope it makes your life easier.

Below is a list of tasks you can simply run through if you are already familiar with how to accomplish them, followed by a step-by-step guide for performing each task.
1. Launch 2 instances of Oracle on Linux

In this first blog we are going to set up an HA environment in AWS for an Oracle cluster using SIOS LifeKeeper for Linux. This means getting all the prerequisites out of the way. I will be using the aws-marketplace/Oracle Database 19.8.0 Enterprise Edition on Oracle Linux 8 AMI. Marketplace AMIs change frequently and it can be difficult to find one that fits your needs. This AMI was my third attempt, because installing anything in the cloud, especially something like Oracle, is difficult due to repository, licensing, enrollment and security issues. This AMI actually works because Oracle is already installed on the image. Make sure the OS version and the Oracle DB version are supported by SIOS; that can be checked here.
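As an aside, if you would rather locate the Marketplace image from a terminal than by browsing the console (the walkthrough itself uses the console), a query along the following lines can help. This is only a sketch: the name filter is an assumption, and the AWS CLI must already be installed on the machine you run it from; it is not installed on the cluster nodes until step 4.

# List Oracle Database 19c Marketplace AMIs visible in the current region
aws ec2 describe-images \
  --owners aws-marketplace \
  --filters "Name=name,Values=*Oracle*Database*19*" \
  --query 'Images[].{Id:ImageId,Name:Name,Created:CreationDate}' \
  --output table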
My instance has:

I am attaching an additional disk to the instance for the database and an additional NIC for redundant communication paths. Make sure the two NICs are on different subnets. This also means you will have to manually create and assign Elastic IP addresses in order to connect to the instances.

Connect to the instance and mount the additional disk. I am using Putty and Xming to connect to my instance. If using Xming, make sure to run Xlaunch before trying to make a connection. After launching the instance you will need to partition the new disk. It is easiest to find with ls /dev/disk/by-path: Now partition the disk with fdisk: Next, create the file system on the new partition with mkfs.xfs: We will now mount the file system with mount: Finally, add the entry to /etc/fstab so the disk is mounted automatically:

It is important to note that you do not need to run the Oracle installer. The AMI has done that and created a database for you. I deleted the database that is pre-configured with this AMI and created a new one on the /data disk using DBCA. I started up the database, created a schema, and added data using SQLPLUS. This all requires that you get X Windows working.

2. Get Xwindows working

An X display over Putty can be set up using Xming for Windows. Install Xming first. Then make sure you enable X11 forwarding, enter localhost:0.0 in the X display location, and enter the path to the xming.exe executable in the "X authority file for local display" field:

That takes care of the Windows side, but you still need to configure the Linux side. First edit /etc/ssh/sshd_config and uncomment "X11Forwarding yes". Next, add the correct key to the Xauthority file; you may have to start a new session if you have done any user switching. After logging in as ec2-user, run xauth list, which gives you the hex key you need to add to your Xauthority file. Switch to the oracle user with su - oracle, then run xauth add $DISPLAY . <hexkey copied from xauth list>. This stores the information in the /home/oracle/.Xauthority file. Exit back to ec2-user.

3. Connect to instance and mount additional disk

I am using Putty and Xming to connect to my instance. If using Xming, make sure to run Xlaunch before trying to make a connection. After launching the instance you will need to partition the new disk. It is easiest to find with ls /dev/disk/by-path: Now partition the disk with fdisk: Next, create the file system on the new partition with mkfs.xfs:

At this point we rename /u01 to /oracle so that we can mount the new filesystem on /u01, which is where Oracle resides on a server built from this AMI. Create the mount point with mkdir /u01 and mount the volume with mount. Move the files to the new disk with mv /oracle /u01; this will take some time because it is approximately 11GB of data. Finally, add the entry to /etc/fstab so the disk is mounted automatically (a consolidated sketch of these disk-preparation commands follows this step):

It is important to note that you do not need to run the Oracle installer. The AMI has done that and created a database for you. I started up the database, created a schema, and added data using SQLPLUS.
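Because the original command screenshots are not reproduced here, the following is a minimal shell sketch of the disk preparation described in steps 1-3. The device name (/dev/nvme1n1), the single-partition layout, and the choice to move the contents of /oracle (mv /oracle/* /u01/, so the original /u01 paths are preserved) are my assumptions; adjust them for your instance.

# Identify the newly attached EBS volume (device names vary by instance and volume type)
ls /dev/disk/by-path
lsblk

# Partition the new disk and create an XFS file system on it
fdisk /dev/nvme1n1              # interactively create one primary partition, e.g. /dev/nvme1n1p1
mkfs.xfs /dev/nvme1n1p1

# Step 3 variant: relocate the existing /u01 onto the new volume
mv /u01 /oracle                 # set the original Oracle files aside
mkdir /u01
mount /dev/nvme1n1p1 /u01
mv /oracle/* /u01/              # roughly 11GB of data, so this takes a while
rmdir /oracle

# Mount the file system automatically at boot (UUIDs are more stable than device names)
echo "UUID=$(blkid -s UUID -o value /dev/nvme1n1p1)  /u01  xfs  defaults,nofail  0 0" >> /etc/fstab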
4. Install the AWS CLI kit

We need the AWS CLI, so while we are root, download it with curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip". Unzip the file with unzip awscliv2.zip and install the application with sudo ./aws/install.

Next, set up an access key in AWS by clicking on your account at the top right of the console and selecting Security Credentials. Click on Create access key: Then click on Download .csv file: Transfer this file onto your servers and configure the AWS CLI with the Key ID and access key from your csv file using the aws configure command: Test that it is working with something like aws --no-paginate --no-cli-pager ec2 describe-instances. As long as you don't get an authentication error, the CLI is installed and configured correctly.

5. Configure Security/Access

First, I added the oracle user to the root and wheel groups, giving it sudo privileges (usermod -aG wheel oracle). This makes life easier by letting the oracle account act as the lkadmin account. I downloaded the sps.img and license files onto both servers. Before installing the software there are a few more prerequisite steps. Configure the security group for the servers so that they can communicate with each other by opening TCP ports 5900-5910, and open TCP ports 81 and 82 as well. Also make sure that the ports are open for the Virtual IP.

6. Create route entry for the virtual IP

The route table needs to be updated in order for the cluster's Virtual IP to work. In this multi-subnet cluster configuration, the Virtual IP needs to live outside the CIDR range allocated to your VPC. Define a new route that directs traffic for the cluster's Virtual IP (172.30.0.101) to the primary cluster node (Oracle1). From the VPC Dashboard, select Route Tables and click Edit. Add a route for "172.30.0.101/32" with a destination of the Elastic Network Interface (ENI) on the primary server (a command-line sketch of steps 6 through 9 appears after step 10):

7. Disable Source/Destination Check for ENIs

Under Network Interfaces, select each interface one at a time, and under Actions select Change source/dest. check. Uncheck the Enable box: Repeat for all interfaces.

8. Edit /etc/hosts

Unless you already have a DNS server set up, you will want to create host file entries on both servers so that they can properly resolve each other by name.

9. Configure the Listener with the VIP hostname

Edit or create the $ORACLE_HOME/network/admin/listener.ora file to point to the oracle-vip:

10. Disable SELinux

Edit the /etc/sysconfig/selinux file, set "SELINUX=disabled", and reboot the server(s). If at this point a server does not come back up, it is possible that you left the SELINUX setting at permissive and set SELINUXTYPE to disabled, which will brick the instance. Just detach the volume in AWS from your instance, attach it to a new or existing working instance, and mount it with mount -o rw,nouuid {device} {mount directory}. Edit the /{mount directory}/etc/sysconfig/selinux file and correct the error. Save the file, unmount and detach the volume from this instance, and re-attach it to the original instance.
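For reference, here is a minimal command-line sketch of steps 6 through 9. The walkthrough performs the route and source/destination changes in the AWS console; the route table ID, ENI ID, private IP addresses and host names below are placeholders, and the listener.ora fragment assumes the default listener name and port 1521.

# Step 6: route the Virtual IP (which lives outside the VPC CIDR) to the primary node's ENI
aws ec2 create-route \
  --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 172.30.0.101/32 \
  --network-interface-id eni-0aaaabbbbccccdddd

# Step 7: disable the source/destination check on every cluster ENI
aws ec2 modify-network-interface-attribute \
  --network-interface-id eni-0aaaabbbbccccdddd \
  --no-source-dest-check

# Step 8: host entries on both nodes (example private IPs and names)
cat >> /etc/hosts <<'EOF'
172.30.0.11   oracle1
172.30.1.11   oracle2
172.30.0.101  oracle-vip
EOF

# Step 9: listener.ora pointing at the VIP hostname (run as oracle with ORACLE_HOME set)
cat > $ORACLE_HOME/network/admin/listener.ora <<'EOF'
LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = oracle-vip)(PORT = 1521))
    )
  )
EOF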
11. Install SIOS Protection Suite for Linux

Next, as root, I installed SIOS Protection Suite by mounting the image file with mount /home/ec2-user/sps.img /mnt/ -t iso9660 -o loop and running the setup with /mnt/setup: Under LifeKeeper Authentication, scroll down to the lkadmin group, hit enter, and add oracle to the 'lkadmin' group: Select OK, then tab to Done and hit enter.

Next, scroll to Install License Key File and hit enter: From here, type in the location and name of your license file: Next, select the Recovery Kit Selection Menu and hit enter: Select Networking: Hit the spacebar to select the LifeKeeper Recovery Kit for EC2, then tab to Done and hit enter. Next, select the Database menu, scroll down, and hit the spacebar on LifeKeeper Oracle RDBMS Recovery Kit: Tab to Done (or hit D), scroll down to Storage, and hit enter. Hit the spacebar to select DataKeeper for Linux: Tab to Done and hit enter (or hit D) to back out to the Recovery Kit Selection menu, then tab to Done (or hit D) to back out to the Main Configuration menu: Make sure LifeKeeper Startup After Install is selected; one last tab to Done (or D) brings up the install confirmation screen: Hit enter (or y) and the install will start.

12. Start LifeKeeper

Start the LifeKeeper GUI with /opt/LifeKeeper/bin/lkGUIapp. If it fails, it is likely because the .Xauthority file for the account you are running as does not contain the magic cookie for your display. I logged in as oracle and then did a sudo -i to get to root; so, if my GUI doesn't load, I copy the /home/oracle/.Xauthority file to /root: Here I log in as oracle:

13. Connect to second server

Click on the Cluster Connect button and log in as oracle:

14. Build communication paths

Click on the Create Comm Path button: If there is a failure, make sure the firewall and iptables are disabled. Hit next: Hit next: Pick your first IP address and hit next: Select the remote IP: Hit next: Hit Create: Hit next: Now hit Done: Next, create the second comm path by repeating step 14 with the secondary addresses. Once both paths have been successfully established the servers should go green.

15. Create a DataKeeper resource

Click on the Create Resource Hierarchy button: Select Data Replication and hit Next: Hit next (Intelligent means that after a failover you need to manually fail back): Hit next: Select your primary server and hit next: Select Replicate Existing Filesystem and hit next: Select the existing mount point and hit next: Create a Data Replication Resource Tag and hit next: Select a File System Resource Tag and hit next: For optimal performance the bitmap file should be placed on an ephemeral volume; for testing purposes the bitmap can be placed on the OS disk. Select the bitmap file location and hit next: Select No for Enable Asynchronous Replication and hit next: Select the Target Server and hit next: Select the Switchback Type and hit next: Select the Template Priority and hit next: Select the Target Priority and hit next: Hit next: Select the Target Disk and hit Next: Hit next: Hit next: Select which network endpoints you want to use for replication and hit next: Select the mount point and hit next: Select the resource tag and hit next: Hit Finish: Hit Done: If you click on /u01 you will see the volume syncing (a short command-line status check is sketched below):
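If you like to confirm the cluster state from a shell as you go, LifeKeeper's command-line tools show the same information as the GUI. This is a small sketch: lcdstatus ships in /opt/LifeKeeper/bin with LifeKeeper for Linux (the -q short-output flag is available on recent releases), and copying oracle's .Xauthority is only needed because the GUI is being launched as root after switching users.

# Let root reuse oracle's X11 magic cookie so lkGUIapp can open a window
cp /home/oracle/.Xauthority /root/.Xauthority
/opt/LifeKeeper/bin/lkGUIapp &

# Text view of comm paths and resources (healthy comm paths show ALIVE; in-service resources show ISP)
/opt/LifeKeeper/bin/lcdstatus -q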
16. Create Hierarchy with Virtual IP resource

Click on the Create Resource Hierarchy button: Select IP and hit next: Select the Switchback Type and hit next: Select the primary server and hit next: Enter the Virtual IP address from step 6 and hit next: Enter the subnet mask for the VIP and hit Next: Enter the network interface and hit next: Enter the resource tag and hit next: After successful creation, hit next: Select the Target Server and hit next: Select the switchback type and hit next: Select the template priority and hit next: Select the target priority and hit next: Upon completion, hit next: Hit next: Select the appropriate netmask and hit next: Select the interface and hit next: Select the resource tag and hit Extend: Hit Finish upon successful completion: Hit Done after verification.

17. Create an Oracle Listener resource

Make sure the database and listener are running before attempting to configure these resources in LifeKeeper. Click on the Create Resource Hierarchy button: Select Oracle Database Listener and hit next: Select the primary server and hit next: Enter the Listener configuration file path and filename and hit next: Hit next: Enter the path for the Listener executables and hit next: Select the protection level and hit next: Select the recovery level and hit next: Select the IP address associated with the Listener, if required, and hit next: Enter the listener tag name and hit Create: Hit next: Hit Accept Defaults to build the resource on your second server: Click on Finish: Click on Done and expand the LSNR and /u01 hierarchies:

18. Create Hierarchy with Oracle Database

Click on the Create Resource Hierarchy button: Select Oracle Database and hit Next: Select the Switchback Type and hit next: Select the server and hit next: Select the database name and hit next (if you get an "unable to find home directory" error, make sure the database is running): Enter the sysdba username and hit next: Enter the password for the account and hit next: Select the Oracle Listener and hit next: Hit Create: Upon successful creation, select Next: Select Accept Defaults: Select Finish: Hit Done: Expand the trees to see all resources:

19. Create Hierarchy with EC2

Click on the Create Resource Hierarchy button: Select Amazon EC2 and hit Next: Select Intelligent and hit Next: Select your primary server and hit Next: Select the EC2 resource type (we are using Backend Cluster for this example) and hit Next: Select the IP resource and select Next: Select the EC2 Resource Tag name and hit Create. Upon successful creation of the resource, hit Next; after a few seconds the pre-extend wizard will pop up. Hit Accept Defaults: Once the checks are completed successfully, hit Accept Defaults again: Hit Finish and, after verification, hit Done: The configuration is complete. Now we can test the failover.

20. Change Shutdown Behavior

By default, LifeKeeper will not fail over resources if you simply shut down or reboot the server. If you want to move a workload prior to shutting down the server, you should manually move the resources to the standby server before shutting down the active node. However, you may wish to change the default behavior to facilitate testing. That is controlled by the Shutdown Strategy, as shown below. Right-click on your primary server and select Properties: Under the General tab, change the Shutdown Strategy to Switchover Resources and hit Apply: Next, select the secondary server from the server pull-down and verify the setting change: Hit OK. Before testing failover, it is worth confirming that the protected stack is answering on the Virtual IP, as sketched below.
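The following minimal check assumes the VIP from step 6, the default listener port, and placeholder credentials and service name (<password> and <service_name> are not from the walkthrough; substitute your own).

# On the active node, the VIP should be plumbed on the protected interface
ip addr show | grep 172.30.0.101

# The listener and the database should answer through the VIP
su - oracle -c "lsnrctl status"
su - oracle -c "echo 'select 1 from dual;' | sqlplus -L system/<password>@//oracle-vip:1521/<service_name>"

If either check fails, fix it now; failover testing is much easier to interpret when the healthy state is known.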
21. Test Failover

I am running lkGUIapp from the secondary server. If you are on the primary server, exit the LifeKeeper GUI and run it from the secondary server. Expand all the resource hierarchies and open an SSH session to your primary server. I am also running a ping -i 5 to the oracle-vip: Shut down the primary server: You can see that in my case the IP stopped responding for less than 25 seconds; I missed 4 pings (sequence 20-23) at 5-second intervals. Everything is now active on the backup server. Because the primary is still down, we get warnings on the hierarchy. Once you bring the primary server back up, if you left the switchback type set to Intelligent you will have to manually bring the service back into service on the primary. Make sure that the primary server is In Sync before trying to bring it into service: Right-click on the Standby button for cdb1 and select In Service… Click In Service and hit Done. It will take a few minutes for the disk to resync, but eventually it will. With everything restored, we now have an HA Oracle database in AWS that is ready for development.

Reproduced with permission from SIOS
December 30, 2022
Leading Beverage Manufacturer Protects Critical SAP ERP in AWS EC2 Cloud

SIOS Chosen Based on Certifications and Validations for SAP, Amazon Web Services and Red Hat Linux

A leading Hong Kong-based beverage manufacturer produces 61 beverage brands, including the number one soft drink brand in the world, and distributes them to more than 728 million customers throughout Hong Kong, mainland China, Taiwan and the western USA.

The Environment

The company relies on an SAP ERP (enterprise resource planning) system running in a Red Hat Linux environment to manage a variety of critical business operations. The SAP environment comprises a variety of services including ABAP (Advanced Business Application Programming), SAP Central Services (ASCS), Evaluated Receipt Settlement, Web Dispatcher and the DB2 database. They used a large Storage Area Network (SAN) for data storage. The core SAP applications handle all business operations across the company's beverage division. In their on-premises data center, the company provided uptime protection for this system using data replication and backups of the SAN.

The Challenge

The company's IT department determined that they could achieve true high availability (99.99% uptime), disaster recovery, scalability and cost savings by migrating to the cloud and using failover clustering to protect their critical SAP system. However, they realized that the SAN and other shared storage required for traditional failover clustering is not practical in some clouds and not available in others.

The Evaluation

After extensive evaluation, the company chose to move their SAP environment to Amazon EC2. They established four key criteria for evaluating their choices for an HA/DR solution. Their solution needed to:
The company's cloud account manager recommended that they consider SIOS Protection Suite, offered through AWS China. The SIOS software is certified by SAP for both NetWeaver and DB2, and is fully tested and supported on Red Hat Enterprise Linux and other distributions of Linux. The company tested the SIOS clustering software extensively under a variety of challenging failure scenarios, and also evaluated throughput performance during periods of peak demand. The IT team's confidence in SIOS Protection Suite increased as it passed each of their rigorous tests and proved to be remarkably easy to use.

The Solution

SIOS Protection Suite for Linux enables SANless failover clustering to provide full HA and DR for SAP and its critical services. The SIOS software uniquely includes modules called Application Recovery Kits (ARKs) that provide application-specific functionality, simplifying configuration and ensuring that failover orchestration maintains application best practices. The SAP and HANA ARKs automate configuration steps, validate configuration inputs, and manage IP failover and boot order to minimize human error. Unlike other clustering software that only validates server operability, the SIOS clustering software verifies that SAP and its critical services are running, that databases are mounted and available, that any file shares or exports are available, and that clients are able to connect. To ensure these services are all functioning properly, SIOS software continuously monitors the servers, virtual machines, operating system and all major components of the SAP software. For DR protection, the company located the active and standby cluster nodes in different AWS Availability Zones for geographic separation.

The Results

SIOS Protection Suite has made it possible for this leading beverage manufacturer to meet the stringent recovery time and recovery point objectives established for its SAP/DB2 environment. To date, the configuration has experienced no perceptible downtime, including during planned maintenance. These results have been realized with minimal effort, making it possible for the IT staff to focus on projects that enhance employee productivity or otherwise improve business operations.

Reproduced with permission from SIOS
December 26, 2022
Video: How SIOS Ensures High Availability For The Financial Services Industry

In this ongoing series about high availability (HA) and disaster recovery (DR) for various industries, Greg Tucker, Senior Product (Windows) Support Engineer at SIOS Technology, joins us to share his insights about how the company protects the financial industry from downtime and failures. SIOS has a global presence in the financial industry, with customers ranging from commercial banking and brokerage firms to wealth management and CPA firms. No other industry is more mission-critical and sensitive to downtime and failure than the financial industry, with customers relying on critical applications for their online banking systems, ATMs, and payment systems. "We provide failover clustering software that will protect their critical applications and data from downtime and/or a catastrophic event," says Tucker. Tucker explains that, in essence, the critical applications are deployed on a primary server, whether on-prem or in the cloud, which is clustered with a secondary server or multiple servers. "In the event the clustering software detects a failure, it will move all the resources over to the secondary node and restore services to the end users automatically; no data loss, no disruption," he adds. Check out the whole interview above to learn more. Highlights of the discussion:
Solutions
Connect with Greg Tucker (LinkedIn)

Reproduced with permission from SIOS
December 18, 2022
Video: High Availability for Building Management and Security

This video covers high availability for building management and security, featuring Harry Aujla, technical director at SIOS. Building Management System (BMS) solutions are software-based solutions running on hardware, designed and built with varying degrees of autonomy and intelligence. A BMS can be hosted either on-site or off-site at a geographically distant control center. The BMS sector is at the cusp of another technical evolution as its customers look at how the cloud is changing the operating landscape. The market is now sufficiently mature that many cloud vendors offer secure and redundant connections to their platforms, and there is an implicit trust that BMS-related data is being securely transmitted to and from the cloud. A lot of BMS companies are running in the cloud as well. It is important to define your service levels before embarking on a high availability project. If the cloud instance running your BMS solution happens to fail, the cloud vendor will take the necessary actions to recover the instance. But what happens if you suffer an application software issue within the cloud instance? You need a way of monitoring application-level failures and orchestrating their recovery. It is important to consider adding a high availability clustering solution like SIOS that can address application-level high availability needs, which in turn helps maintain application performance.

Reproduced with permission from SIOS
December 14, 2022
SIOS LifeKeeper vs SUSE HAE

In many enterprises, SAP systems are so essential to core business operations that even brief periods of downtime can have devastating consequences. However, Linux-based solutions can be complicated and error-prone. Even SUSE HAE and other open source clustering options are highly manual and only protect individual components. Read the white paper to learn the differences between SUSE HAE and SIOS Protection Suite and find out the fastest, most accurate way to manage and optimize Linux environments.

Download the white paper here