Today’s IT operations face major challenges. Modern IT infrastructures are changing at an unprecedented pace, environment complexity is greater than ever, and IT still operates in silos. The volume, velocity, and variety of the data that IT operations is up against deserve to be called a big data problem.
Looking to overcome this problem, a new category of IT management tools recently emerged: IT operations analytics (ITOA), as it was coined by Gartner analysts, or simply IT analytics.
More and more senior IT operations managers are starting to leverage ITOA tools as a source of operational data for making key decisions. Leading vendors and startups have made significant progress in leveraging analytics for providing better IT operational insights. These ITOA solutions are the natural first step for elevating the capabilities of the current IT management toolset.
Yet, they are still limited. ITOA tools are constrained by operating in narrow silos (application performance management, log, network, etc.), by concentrating on just the symptoms that surround issues, and by their own limited analytics capabilities.
Now is the time for the next step: to realize the benefits and promise that IT operations analytics offers. IT stakeholders responsible for stability, performance, and security of business systems need to break out of their silos and apply a blended analytics approach that combines major sources of information for comprehensive analysis.
ITOA Today: More of the Same
Today's ITOA solutions are restricted by silos, a focus on symptoms, and weak analytics.
Stuck in Silos
Current ITOA solutions perpetuate silo operations. In the typical silo analytics approach, a number of essential data feeds still go unprocessed and are not correlated together, including deployment and release automation information, service requests, environment configuration, software configuration, application data and many others. One person may see performance while another sees logs, but no one sees the whole picture.
IT operations data is much more diverse than what any single silo captures, so while these individual insights are important, they fall short of pinpointing the underlying issues. Mapping the data to the environment components that exhibit abnormal behavior would help improve root cause analysis.
Focused on Symptoms
ITOA solutions focus on “symptoms” that represent a manifestation of a problem but not the true root cause of a problem. The root cause can be an undesired change, yet the concept of monitoring and analyzing changes is still missing from today’s ITOA tools.
Heavily relying on symptoms alone leaves IT with some critical drawbacks. It is very difficult and time-consuming to identify the true root cause of a problem just from symptoms. Reverse engineering from the symptom takes too much investigation time to be practical.
Moreover, by relying only on symptoms, ITOA tools are left waiting for abnormal behavior to appear and only then pursue potential issues. Once a symptom is observed, it could be too late: users may already be experiencing the impact of the abnormal system behavior.
Limited Analytics Capabilities
In recent years, pressures bearing down on IT have pushed the ITOA space forward, driving the need for effective analytics. This has led various IT management vendors, particularly established ones, to announce the availability of new ITOA components that enhance their existing product portfolios.
In many cases, the so-called analytics has amounted to little more than a new dashboard or a KPI aggregation. Many of the solutions still rely on users to define data analysis algorithms, providing them with libraries of predefined statistical functions. Most existing ITOA solutions do not yet fully apply machine learning, advanced statistical analysis, or even domain-specific heuristics to offer truly intelligent analytics.
Introducing Blended Analytics
Analytics must provide actionable insights to drive operational decisions and activities, both manual and automated. Reaching this level of insight entails analyzing all relevant IT data. Leveraging existing investments in various tools and data sources, and adding more information to them, helps lead to root causes.
By combining information from across silos, correlating symptoms with root causes, and mapping them into a context that users can easily comprehend, blended analytics takes ITOA a step closer to realizing its true benefits.
What Blended Analytics Is Not
It’s important to mention that blended analytics is not a dashboard with different widgets for each data silo. That’s a unified interface view—really, an elevated dashboard—and doesn’t actually blend the data together at the analysis level.
Blended analytics is also not a group of key performance indicators from different silos meshed together. While knowing that a number of performance alerts and log errors occurred in the same system over the last sixty minutes is useful, it is neither insightful nor actionable, and it isn’t detecting complex patterns. The data remains unrelated, uncorrelated, and, at the end of the day, unanalyzed.
What Blended Analytics Is
Blended analytics is about gathering, correlating, and analyzing data from multiple sources using a combination of methods to produce actionable insights. By analyzing this blend of data sources (e.g., performance, logs, automation, and change data), behavioral patterns, unusual event occurrences, and anomalies can be identified.
Analysis methods can include machine learning, statistical algorithms, and domain-specific heuristics.
Blended analytics proactively detects potential performance, availability, and security issues and provides automated root cause analysis for quick and effective prevention and resolution of issues. To make analytics insights comprehensible and useful, they are mapped to the context of IT environments and activities familiar to IT operations, such as application topology, releases, service requests, etc.
Sources of Data for Blended Analytics
Most operational issues are caused by changes in the IT environment. It’s no wonder that the first question everybody asks when a performance issue is detected is, “What changed?” These “undesired changes” can be in environment configuration, application code, application data, application workload, or environment capacity. The only other cause that could impact operations would be hardware failure.
While change is a crucial element of blended analytics, this approach does not rely on change alone. Changes are correlated and analyzed with the various symptoms of abnormal system behavior and IT activity context, generating actionable insights.
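The correlation of changes with symptoms described above can be sketched in a few lines. This is a minimal illustration, not a description of any particular ITOA product: the `Change` and `Symptom` records, the `candidate_causes` function, the 24-hour window, and the sample events are all hypothetical. It simply lists the changes made to the same component shortly before a symptom appeared, most recent first.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Change:
    component: str
    description: str
    at: datetime


@dataclass(frozen=True)
class Symptom:
    component: str
    description: str
    at: datetime


def candidate_causes(symptom, changes, window=timedelta(hours=24)):
    """Return changes on the symptom's component that happened within
    `window` before the symptom, most recent first."""
    earliest = symptom.at - window
    matches = [
        c for c in changes
        if c.component == symptom.component and earliest <= c.at <= symptom.at
    ]
    return sorted(matches, key=lambda c: c.at, reverse=True)


# Hypothetical sample data.
changes = [
    Change("orders-db", "max_connections lowered from 500 to 100",
           datetime(2024, 1, 10, 9, 0)),
    Change("orders-db", "index rebuilt", datetime(2024, 1, 8, 2, 0)),
    Change("web-frontend", "release 4.2 deployed", datetime(2024, 1, 10, 8, 30)),
]
symptom = Symptom("orders-db", "lock wait timeouts in log",
                  datetime(2024, 1, 10, 11, 15))

suspects = candidate_causes(symptom, changes)
for c in suspects:
    print(c.at, c.component, c.description)
```

A real implementation would also need to follow dependencies between components rather than matching on component name alone; the sketch leaves that out to keep the idea visible.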
Symptoms: Symptoms are identified via indicators that focus on system behavior and health, such as application performance management, system and network management, or log events. Observing different types of indicators increases the probability of early symptom detection. For example, errors in a log that show database locking can be the harbinger of an oncoming performance problem. The application may initially overcome the locking by retrying queries, so performance and system metrics will not yet show any meaningful transaction delays or other indicators of abnormal behavior. If the locking problem worsens, the number of application retries will grow, and only then will performance indicators reveal a problem. The reverse also happens: transactions can slow down without anything appearing in the logs.
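The database locking example can be made concrete with a small sketch. It assumes two per-minute series for the same system, a count of lock errors taken from logs and a 95th-percentile latency figure, and applies the same simple statistical check to both. The function name `unusual_intervals`, the baseline size, the threshold, and all the numbers are invented for illustration; the point is only that the log indicator can cross its threshold while the performance indicator still looks normal.

```python
from statistics import mean, pstdev


def unusual_intervals(values, baseline_size=5, threshold=3.0):
    """Return indexes of intervals whose value exceeds the baseline mean
    by more than `threshold` baseline standard deviations."""
    baseline = values[:baseline_size]
    mu = mean(baseline)
    sigma = pstdev(baseline) or 1.0  # fall back to 1.0 for a flat baseline
    return [
        i for i, v in enumerate(values[baseline_size:], start=baseline_size)
        if (v - mu) / sigma > threshold
    ]


# Hypothetical per-minute readings for the same system.
lock_errors_per_minute = [1, 0, 2, 1, 1, 2, 9, 14]
p95_latency_ms = [210, 205, 215, 208, 212, 214, 216, 218]

log_alerts = unusual_intervals(lock_errors_per_minute)
latency_alerts = unusual_intervals(p95_latency_ms)

print("log alerts at minutes:", log_alerts)
print("latency alerts at minutes:", latency_alerts)
```

With these sample readings, the log series is flagged in its last two minutes while the latency series is not flagged at all, which is the early-warning gap the example in the text describes.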
IT context: To make ITOA insights easier to understand, contextual data highlights the IT activities that drive change, reveals significant properties of detected changes, and correlates changes with related data items. IT context comprises data from the various IT process management and automation solutions whose activities lead to changes. For example:
- Deployment automation provides context around automated actions that change applications and infrastructure. Using data from release deployments, individual outlying changes, and patches helps differentiate risky manual changes from automated ones. This way, change risk can be attached to the corresponding deployment event, presenting risks in a language common to IT users.
- The service desk enables a number of key IT processes, including incident management, change management, and configuration management. The data collected by the service desk includes change requests, configuration inventory, incident records, and other key data items produced and managed by IT processes. Using this data as a context for granular environment changes enables unauthorized changes to be identified, that is, actual changes that do not correlate with an approved change request.
- The configuration management database contains high-level information on the IT landscape topology and dependencies between environments and environment components. The impact of changes across environments can be traced using this information. Mapping technical insights onto the topology also allows for leveraging the criticality of business services as an input factor for risk calculation, which simplifies the understanding and management of the impact of change.
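Two of the ideas in this list, matching detected changes against approved change requests and weighting risk by business criticality, can be sketched together. Everything here is an assumption made for illustration: the record types, the rule that a change is authorized when it falls inside an approved window for the same component, the criticality values, and the doubling of the score for unauthorized changes. An actual risk model would be considerably richer.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChangeRequest:
    component: str
    start: datetime  # start of the approved change window
    end: datetime    # end of the approved change window


@dataclass(frozen=True)
class DetectedChange:
    component: str
    description: str
    at: datetime


def is_authorized(change, requests):
    """A change counts as authorized if an approved request covers the
    same component at the time the change was detected."""
    return any(
        r.component == change.component and r.start <= change.at <= r.end
        for r in requests
    )


def risk_score(change, requests, criticality):
    """Hypothetical score: service criticality, doubled when unauthorized."""
    base = criticality.get(change.component, 1)
    return base if is_authorized(change, requests) else base * 2


# Hypothetical service desk and CMDB data.
requests = [
    ChangeRequest("payments-api", datetime(2024, 3, 1, 22, 0),
                  datetime(2024, 3, 2, 2, 0)),
]
criticality = {"payments-api": 5, "reporting": 2}
detected = [
    DetectedChange("payments-api", "TLS setting changed",
                   datetime(2024, 3, 1, 23, 30)),
    DetectedChange("payments-api", "heap size changed",
                   datetime(2024, 3, 3, 14, 0)),
    DetectedChange("reporting", "cron schedule changed",
                   datetime(2024, 3, 1, 23, 0)),
]

unauthorized = [c.description for c in detected
                if not is_authorized(c, requests)]
scores = {c.description: risk_score(c, requests, criticality)
          for c in detected}
print(unauthorized)
print(scores)
```

In this sample, the heap size change on the critical payments service falls outside any approved window and therefore receives the highest score.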
Changes: Changes are a critical yet surprisingly often overlooked source of data. Application updates can introduce application defects; configuration changes can lead to misconfigurations; changes in a data set can impact performance or even cause a failure if the system was not designed to handle certain types, volumes, or parameters of data; and capacity changes can impact the ability of a system to support current and historical levels of workloads.
Collecting information about changes in the environment state is not easy, particularly for configuration, data, and code at the granular level and in near-real time. Information is stored in different formats across different sources (files, databases, the Windows registry, APIs, system utilities, etc.). With environment state data coming in huge volumes, business systems, especially in production, need to remain unaffected by this massive data collection.
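Once environment state has been collected, detecting granular changes comes down to comparing snapshots. The sketch below assumes each snapshot has already been flattened into a dictionary of setting names and values, whatever the original source was; the function `diff_snapshots` and the sample keys are hypothetical. The hard part described above, gathering the data from many formats without disturbing production, is outside its scope.

```python
def diff_snapshots(before, after):
    """Compare two flat {setting: value} snapshots and return a list of
    (kind, key, old_value, new_value) tuples, sorted by key."""
    changes = []
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if key not in before:
            changes.append(("added", key, None, new))
        elif key not in after:
            changes.append(("removed", key, old, None))
        elif old != new:
            changes.append(("modified", key, old, new))
    return changes


# Hypothetical snapshots of the same environment at two points in time.
before = {"db.pool_size": "50", "db.timeout_s": "30", "feature.beta": "off"}
after = {"db.pool_size": "20", "db.timeout_s": "30", "cache.ttl_s": "600"}

detected_changes = diff_snapshots(before, after)
for change in detected_changes:
    print(change)
```

Each tuple produced this way is a change record that can then be correlated with symptoms and IT context.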
The Future of ITOA
The success of the business more than ever depends on IT operations being able to provide reliable support for critical business systems and their underlying infrastructure. This means ensuring that the business side has stable services, supporting the introduction of new applications and infrastructure, and analyzing ever-growing amounts of operational data.
To finally realize the benefits of IT operations analytics, IT decision-makers need to break out of their silo-focused approach and apply blended analytics. By looking at change as the cornerstone of everything that happens in IT environments and applying a blended approach to data from application performance management, logs, service management, configuration, and other sources, the next generation of IT operations analytics tools will be better positioned to sift through terabytes of operational data in close to real time. The result will be continuously improving, effective IT operations analytics: descriptive, predictive, and, eventually, prescriptive.