November 4, 2024 |
Webinar: Ensuring High Availability in a Multi-Cloud Environment: Lessons from the CrowdStrike Outage

Register for the On-Demand Webinar

Businesses increasingly use multiple cloud service providers to maintain flexibility and scalability; however, recent incidents like the CrowdStrike outage highlight that even top systems can encounter issues, particularly with updates and security patches. This webinar discusses best practices for implementing multi-cloud High Availability (HA) solutions to keep your mission-critical applications operational during unexpected disruptions. It also covers strategies to prevent downtime from system misconfigurations or problematic patches, ensuring you can effectively manage your cloud infrastructure. Watch the on-demand webinar to discover how to achieve HA in your environment and minimize preventable downtime.

Reproduced with permission from SIOS |
November 1, 2024 |
Storage Considerations for Resizing Your Highly Available Cluster

When I was a Marine serving with a Tank Battalion, I remember that we all prepared ourselves to hear “FIRE IN THE HOLE” just before we shot a projectile. Even if you did not hear others yell this, we had radios/comms, hand/arm signals, flags, flares, etc. indicating that all things were “a go” and the projectile was headed downrange. We all knew that communication was essential.

The Importance of Communication in Cluster Storage Resizing

If you are a Database Administrator, Server Engineer, or IT generalist responsible for the health of the application resources on your cluster (DataKeeper storage), communication is essential for you too. For example, how do you notify others about your efforts to scale your storage? To be successful, you will likely need to communicate with several other members of your team about a wide range of topics related to your Source and Target Volumes, including:
Who on your team will yell “FIRE IN THE HOLE” when it’s time to provision your existing DataKeeper Mirror(s)? Don’t you want to be notified before and after?

Key Steps for Coordinating DataKeeper Storage Resizing

Your DataKeeper Storage requires a few things that need to be communicated to all stakeholders, internal or external (hosted):
Marine: “Are you ready?”
Other Marines: “Yes!” (There is some swearing of course, WE ARE MARINES! LOL)
Marine: “FIRE IN THE HOLE”
DataKeeper Administrator: “Pause and Unlock Mirror” aka “FIRE IN THE HOLE”
Ready to optimize your storage for high availability? Connect with SIOS experts today to ensure your cluster resizing is smooth, efficient, and built to scale. Reproduced with permission from SIOS |
October 28, 2024 |
Top 5 Preventable Support Calls (And How To Avoid Them)

As a Customer Support organization, we hear from our customers all over the world every day. Customers call or email to open cases with us when they have questions or problems they need help with. Some cases turn out to be new problems, and many turn out not to be new at all: customers run into the same issues over and over again. After 20 years of working in customer support and thousands of cases, we still see new problems that have never been reported before, which keeps our work very interesting! One thing we have noticed is that customer-reported problems, new or not, fall into common categories. Here are the top 5 reasons (root causes) that our customers reach out to us for help:

1. Network Problems: How to Plan Ahead and Avoid Downtime

Many times customers need to change the IP addresses in the cluster. Sometimes the ramifications of changing the network configuration are not recognized or planned for ahead of time, and when the network changes are made, unexpected issues occur with the cluster. If the IP address that changed is used in the DataKeeper and LifeKeeper configurations, such as a mirror endpoint or a communication path, then you need to update the DataKeeper and LifeKeeper configurations so that the products are aware of the change.

Plan Ahead

Update Mirror IP Address

2. Configuration Issues: Common Mistakes and How to Fix Them

Often, the root cause of the reported problem ends up being a configuration issue. Customers report that their configuration is not working correctly, or that the product appears not to be working properly based on what they see in the product GUI.
Typically, configuration issues result from something that changed in the cluster environment since the original cluster configuration, or from something that was not set up correctly when the product was first installed. Examples of common configuration issues reported:
Many times customers need to expand/grow their volumes. One of the key product requirements is the source volume must be equal to or smaller than the target volume, otherwise the product will not be able to resync the data from the source to the target volume. While this may seem logical, it is often overlooked. Sometimes the target volume ends up smaller than the source and this leads to the volume not being able to reach a mirroring state. The following documentation and videos explain the procedure for expanding your DataKeeper volumes.
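As a quick pre-check before expanding volumes, the size rule described above can be written down in a few lines. This is a minimal sketch in Python and is not part of DataKeeper: the function names `target_can_hold_source` and `describe_resize_plan` are hypothetical, and the byte counts would have to come from your own tooling.

```python
def target_can_hold_source(source_bytes: int, target_bytes: int) -> bool:
    """Return True when the source volume is equal to or smaller than the target."""
    if source_bytes < 0 or target_bytes < 0:
        raise ValueError("volume sizes must be non-negative")
    return source_bytes <= target_bytes


def describe_resize_plan(source_bytes: int, target_bytes: int) -> str:
    """Summarize whether a planned source/target size pair satisfies the rule."""
    if target_can_hold_source(source_bytes, target_bytes):
        return "OK: target is at least as large as source"
    shortfall = source_bytes - target_bytes
    return f"PROBLEM: target is {shortfall} bytes smaller than source"
```

Running a check like this against the planned sizes of both volumes, before anything is resized, is one way to catch a too-small target early.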
When installing DataKeeper, the user is prompted to enter the login credentials to be used by the DataKeeper service. A domain account with administrator privileges is recommended, and most customers create an account specifically for DataKeeper to use. The domain account used must be added to the Local System Administrators Group, and it must have administrator privileges on each server that DataKeeper is installed on. Many times the account is not added to the Local System Administrators Group, and this prevents DataKeeper from being able to connect to itself and to other DataKeeper servers in the cluster. Refer to the documentation for more detailed information located here.

The majority of the time, configuration issues require changes to be made to the cluster to get the DataKeeper or LifeKeeper products back to a working environment. We recommend reaching out to support before changes are made to the cluster environment so that we can help ensure that you are headed in the right direction and point you to the documentation and videos that we have on the subject.

3. Upgrade Planning: Avoiding Disruptions in Your Systems

Upgrades are a common part of a system administrator’s tasks. There is always a need to upgrade something on your systems as new versions are released: the operating system, the application software, the system firmware, the database software, security software, etc. This can be overwhelming if multiple upgrades need to be done on your systems. Many customers reach out to Support when planning to upgrade DataKeeper or LifeKeeper and ask questions to make sure they understand the upgrade process before actually implementing the upgrade. This is what we like to see. We do see cases where customers don’t reach out prior to performing upgrades and unexpected problems occur. Many believe that upgrades are routine; however, some upgrades create incompatibilities and can cause issues.

Upgrade Planning

4. External or OS Related Issues: Troubleshooting Beyond the Software

What are external or OS related issues? We refer to root causes as external or OS related when the reported problem turns out to be something outside of the DataKeeper and LifeKeeper area. DataKeeper and LifeKeeper use many of the server’s components, such as disks/volumes and the network. If the operating system cannot “see” the disk or volume, then DataKeeper and LifeKeeper cannot “see” the disk or volume either. At first glance, a reported problem may appear to be DataKeeper or LifeKeeper related; however, analysis shows that the cause is an operating system component that DataKeeper or LifeKeeper depends upon.

For example, for a DataKeeper mirror to function properly, DataKeeper requires that the volume is visible to the operating system, on-line, healthy, and has a valid file system. If these requirements are not met, the DataKeeper mirror will not be able to mirror the data from one system to the other, and DataKeeper will show that the mirror is in the Paused state. When debugging this problem, the Windows Disk Management tool shows that the volume is off-line, not in a healthy state, or a raw device. Once this is corrected, DataKeeper can mirror the data again from one system to the other. For more details refer to the video, Preparing Storage for DataKeeper Usage, located here.

Another example of an external or OS related issue occurs when the DataKeeper volume fails to lock on the target system. DataKeeper purposely locks the volume on the target system to prevent writes from occurring there. In order for DataKeeper to lock a target volume, there cannot be an OS page file on the volume. Many times, systems are configured at the OS level to “Automatically Manage Paging Files” and sometimes the OS ends up placing page files on the DataKeeper volumes. To overcome this, we recommend that this OS setting be changed.
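The page file condition lends itself to a simple cross-check. The sketch below, in Python, is illustrative only and is not a SIOS tool: `pagefile_drive` and `volumes_with_pagefile` are hypothetical names, the entry format `C:\pagefile.sys 0 0` is an assumption about how paging file settings are listed, and how you collect the entries and the mirrored drive letters is left to you.

```python
def pagefile_drive(entry: str) -> str:
    """Extract the upper-case drive letter from a paging file entry.

    Entries are assumed (not guaranteed) to look like 'C:\\pagefile.sys 0 0'.
    """
    path = entry.strip().split(" ")[0]
    if len(path) < 2 or path[1] != ":":
        raise ValueError(f"unrecognized paging file entry: {entry!r}")
    return path[0].upper()


def volumes_with_pagefile(paging_entries, mirrored_volumes):
    """Return the mirrored drive letters that also hold a page file, sorted."""
    # Normalize 'e', 'E:' and 'E:\\' to the bare letter 'E'.
    mirrored = {v.strip().rstrip(":\\").upper() for v in mirrored_volumes}
    used = {pagefile_drive(e) for e in paging_entries}
    return sorted(mirrored & used)
```

An empty result means none of the listed mirrored volumes carries a page file; any letter returned is a volume worth reviewing before expecting the target to lock.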
Refer to this link for further details.

5. Performance: Improving System and Mirror Efficiency

Customers also contact us to improve mirror performance and system performance with mirroring, either because the mirrors are not going into a mirroring state or because the product is slowing down the system. The first issue (mirror not reaching a mirroring state) is simply a matter of tuning registry keys in DataKeeper to match your system configuration. WriteQueueHighWater, WriteQueueHighWaterSynchronous, and BlockWritesonLimitReached are several commonly changed tunables. Refer to the documentation for these tunables located here. The second issue (performance of the system) is simply a matter of moving the location of the DataKeeper bitmap. By default the bitmap is located on the C drive and may need to be relocated to a faster drive. Refer to the documentation and video for information on relocating the bitmap here.

System and product tuning is often done to maximize performance, for example by changing the product tunables to more closely match the customer’s environment. Many things can affect DataKeeper and LifeKeeper, including the operating system, network, storage devices, etc., and the default settings that DataKeeper and LifeKeeper use may need to be tuned to the customer’s specific environment. We do offer Validation and Health Check Services to help customers ensure that HA best practices are implemented. Visit this link for details on our offerings.

A key strategy that we recommend is to complete testing prior to going into production so that problems, including performance issues, are found and resolved earlier in the process. Testing is often done in a test or QA environment prior to going into a production environment.
It is always best to simulate the production load in the test / QA environment to ensure that the production environment will perform sufficiently. We recommend reading several of the posts on performance on our blog, specifically here.

Ensure your systems run smoothly by staying ahead of these common issues. Need expert guidance? Contact our support team today to help you prevent future support calls!

Reproduced with permission from SIOS |
October 19, 2024 |
Help Us Help You: How to Provide Essential Info for Faster SIOS Support

Anyone who has used software in their business knows that eventually you will face issues requiring help from a customer support center. All too often, we receive an email asking something like “Why did my system fail?” I would compare this to calling a mechanic and asking, “Why is my car making a knocking noise?” Obviously, the mechanic will need more information to diagnose the issue, and will need to examine the engine in person. Much like the mechanic, the software engineer assigned to your issue will need as much relevant information as you can provide. The sooner you can provide that information, the sooner we can start working towards a resolution!

Speed Up Troubleshooting with the Right Diagnostic Information

When calling SIOS, here are some ways that you can significantly speed up the troubleshooting process by providing the right information…

Gather Necessary Diagnostic Tools for Linux and Windows

Gather the ‘lksupport’ logs: You may either email those to us or attach them directly to the case. In instances where they are too large, you can request a drop box in which to upload these files.

For LifeKeeper for Linux, you can generate logs by running:

sudo su (administrator terminal)
/opt/LifeKeeper/bin/lksupport

This will create a .tar file for each node under the directory: /tmp/lksupport

EXAMPLE: /tmp/lksupport/.1907251056.tar.gz

For Windows products, use the following procedure:

1) open an Administrative command prompt
2) cd %extmirrbase%
3) cd support

From that directory, run the commands:

DKSUPPORT.CMD
lksupport

These will gather up system and DataKeeper information as well as the Windows Event logs and create a zip file. These commands should be run on each node in the cluster.

Provide Screenshots and Error Details

Screenshots can be very valuable in diagnosing issues. Capture the error message, the problematic screen, or any unusual behavior to help the support team understand the issue better.

Critical Information to Include in Every Support Case

Here is some additional information that should be provided with every new case:

1. Which node was active, and which was the target at the time this occurred?
2. Describe (in detail) every step, including the date and time, leading up to this issue.
3. Describe (in detail) every step taken in trying to resolve this issue.
4. What is the current status (is it up and replicating)?
5. Is this a production, QA, or test cluster?

Provide Relevant System Information for Faster Support
Be Patient and Cooperative During the Troubleshooting Process
Keep a Record of Communications by Using the Customer Support Portal
Collaborate with SIOS Support for a Swift Resolution

By following these tips, you’ll become a more effective partner in troubleshooting and increase your chances of a swift resolution. Remember, the goal is to work together to find a solution that gets you back on track as soon as possible!

Need assistance right now? Contact SIOS Support to open a case or speak with an expert. We’re here to help you resolve issues quickly and efficiently!

Reproduced with permission from SIOS |
October 13, 2024 |
Importance of determining the right level of protection |