A Tester's Role in AIOps

Summary:
“AIOps” stands for “artificial intelligence for IT operations”: the use of machine learning and data science to solve IT problems. AI can help with many IT functions, including detecting and remediating outages, monitoring availability and performance, and IT service management. As with DevOps, testers play an important part in AIOps; they just have to determine what that part is.

Big data and machine learning (ML) are the two primary components of AIOps. An AIOps platform consolidates data from monitoring, the service desk, and automation, and turns it into useful insights that increase business value.

AIOps is growing in popularity as a way to keep up with the fast-paced delivery of complex applications and the huge amounts of data we all deal with in this digital age. In fact, Gartner predicts that the exclusive use of AIOps and digital experience monitoring tools by large enterprises to monitor applications and infrastructure will rise to 30% in 2023.

How should a team get started with AIOps? As with DevOps, AIOps requires a cultural shift, and testers again play an important part; they just have to determine what that part is. Let’s look at the role of a tester in today’s AIOps-powered world.

Getting Started with AIOps

The general process by which AIOps platforms and solutions operate involves observing, engaging, and acting.

When we began our AIOps implementation, my team familiarized ourselves with AI and ML vocabulary through training. With this new knowledge, we assessed whether it was feasible for our project to make use of AIOps.

Next, we started selecting the test cases for this project. We held brainstorming sessions with the other teams involved and heard each team’s viewpoint. The business team gave input on their core workflow scenarios, the development team on nonfunctional scenarios, and the infrastructure team on application monitoring. After that, we chose and finalized the test cases.

With AIOps, we can immediately test the code for performance and regression, automatically analyze the test traffic, and detect issues early. We still practice continuous testing in DevOps, testing throughout the whole lifecycle from beginning to end, but to make this process more powerful, we decided to integrate the DevOps pipelines with a complete AI-based application performance management (APM) solution. The result was a powerful AIOps tool.

AI-based APM solutions analyze traffic, logs, and resource utilization to detect inconsistencies. When an inconsistency is observed, an alert is triggered. Based on these alerts, we built automated scripts for known issues that can be executed as soon as the issue occurs.
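
To make the idea of "detecting an inconsistency" more concrete, here is a minimal sketch of one simple approach: flagging a resource-utilization sample that deviates sharply from the samples just before it. This is only an illustration and not how any particular APM product works. The function name, the window size, the threshold, and the sample CPU readings are all assumptions made for the example.

```python
from statistics import mean, stdev


def detect_anomalies(samples, window=5, threshold=3.0, min_spread=1.0):
    """Return the indices of samples that deviate strongly from the preceding window.

    All parameters are illustrative assumptions:
    window     -- how many earlier samples form the baseline
    threshold  -- how many standard deviations count as an inconsistency
    min_spread -- floor for the spread, so a perfectly flat baseline
                  does not cause a division by zero
    """
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        baseline = mean(history)
        spread = max(stdev(history), min_spread)
        if abs(samples[i] - baseline) / spread > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical CPU utilization readings (percent), one per monitoring interval.
cpu_percent = [40, 42, 41, 39, 40, 41, 95, 42]

for index in detect_anomalies(cpu_percent):
    print(f"ALERT: CPU at {cpu_percent[index]}% in interval {index}")
```

In a real pipeline, the alert would be handed to whatever notification or remediation mechanism the team has in place rather than printed.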

These are just some of the automated responses we’ve put in place:

  • When the disk usage is near capacity, add additional disk space automatically
  • Execute remedial scripts when a sudden peak in traffic causes database tables to grow rapidly
  • Increase or decrease CPUs based on memory usage
  • Roll back to the previous build if the new build failed
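
One way to picture how alerts can drive automated responses like the ones above is a lookup table that maps each kind of alert to a handler. The sketch below is hypothetical: the alert names, the handler functions, and the 80%/20% scaling thresholds are invented for illustration, and the handlers only return a description of the action instead of touching real infrastructure.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    kind: str      # hypothetical alert name, e.g. "disk_near_capacity"
    value: float   # the measurement that triggered the alert


def add_disk_space(alert):
    return f"add disk space (usage {alert.value:.0f}%)"


def run_remedial_script(alert):
    return "run remedial script for database table growth"


def scale_cpus(alert):
    # Thresholds are assumptions chosen only for this example.
    if alert.value > 80:
        return "increase CPUs"
    if alert.value < 20:
        return "decrease CPUs"
    return "leave CPUs unchanged"


def roll_back_build(alert):
    return "roll back to the previous build"


# Known issues and the automated response for each one.
REMEDIATIONS = {
    "disk_near_capacity": add_disk_space,
    "traffic_peak": run_remedial_script,
    "memory_usage": scale_cpus,
    "build_failed": roll_back_build,
}


def remediate(alert):
    """Pick the automated response for a known issue, or escalate to a person."""
    handler = REMEDIATIONS.get(alert.kind)
    if handler is None:
        return "escalate to IT support"
    return handler(alert)


print(remediate(Alert("disk_near_capacity", 92)))
print(remediate(Alert("unknown_issue", 0)))
```

Keeping the mapping in one table makes it easy to see which issues are handled automatically, and anything not in the table still goes to a person.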

How Testers Can Participate in AIOps

One major area where AIOps has helped my team is performance testing. Here is my testing team’s experience working with AIOps, both before and after its implementation.

Before Implementation of AIOps

Performance testing is usually conducted in the staging environment, a standalone, simulated environment that resembles production but contains no real production data.

Before our implementation of AIOps, my testing team wrote our performance testing scenarios and reviewed them with the business and development teams. We were not well aware of how the server behaved at peak usage under heavy load. We overlooked this data and did not take it into account for our application performance management.

When we started executing the load tests, memory usage on the server peaked and utilization of system resources was very high. This led to the scripts failing and the server going down. Afterward, we analyzed the script execution results and application log files and reached out to the IT support team for help.

In the meantime, the IT team analyzed the APM monitoring system’s alerts and found the root cause of the issue. Based on our application log analysis and their system log analysis, they increased the system resources manually.

From this experience, we learned that we had failed to correlate the execution of the performance test scenarios with the APM monitoring alerts.
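
As an illustration of what such a correlation could look like, the sketch below matches APM alerts to the performance test scenario that was running when each alert was raised. It is a simplified example under assumed data: the scenario names, alert messages, timestamps, and the 30-second grace period are hypothetical and not taken from our project.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ScenarioRun:
    name: str
    start: datetime
    end: datetime


@dataclass
class ApmAlert:
    message: str
    raised_at: datetime


def correlate(runs, alerts, grace=timedelta(seconds=30)):
    """Map each scenario name to the alerts raised while it ran.

    The grace period (an assumption) also catches alerts that fire
    shortly after a scenario finishes.
    """
    result = {run.name: [] for run in runs}
    for alert in alerts:
        for run in runs:
            if run.start <= alert.raised_at <= run.end + grace:
                result[run.name].append(alert.message)
    return result


# Hypothetical test execution windows and alerts.
t0 = datetime(2021, 1, 1, 10, 0, 0)
runs = [
    ScenarioRun("login_load", t0, t0 + timedelta(minutes=5)),
    ScenarioRun("checkout_load", t0 + timedelta(minutes=10), t0 + timedelta(minutes=15)),
]
alerts = [
    ApmAlert("memory above 90%", t0 + timedelta(minutes=4)),
    ApmAlert("disk latency high", t0 + timedelta(minutes=30)),
]

for scenario, messages in correlate(runs, alerts).items():
    print(scenario, messages)
```

Alerts that fall outside every test window, like the second one here, are left unmatched, which helps separate problems caused by the load test from unrelated noise.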

After Implementation of AIOps

Now, we choose performance testing scenarios more carefully by involving the IT and infrastructure teams in the decision-making, along with the business and development teams. We also updated our test scenarios to handle the APM alert notifications. If a script fails during execution, the whole team receives a notification.

The IT support team also prepared automated remedial scripts to handle peak usage of system resources. If we encounter an error during performance test execution, the notification triggers the scripts the IT team added to the new pipeline, and they take care of it.

Now that we don’t have to worry about system resources, our testing team can focus solely on performance testing activities and delivering high-quality software. Since implementing AIOps, we use both the team and the system resources more efficiently.

The picture below shows the flow of AIOps with our performance testing scenario:

I hope the information from this experience helps you to enhance your testing!

CMCrossroads is a TechWell community.