Date: 2026-02-19

When a company’s IT system goes down, the challenge is often the flood of alerts that follow. One failure can trigger dozens or even hundreds of notifications across servers, applications and cloud systems. Since many organisations use multiple monitoring tools, the same issue may show up several times in different dashboards, forcing IT teams to sort through the noise before fixing the actual problem.

ManageEngine, the IT management division of Zoho Corporation, is attempting to address this challenge with new artificial intelligence (AI) capabilities recently added to its Site24x7 monitoring platform.

Speaking to CiOL, Srinivasa Raghavan, Director of Product Management at ManageEngine, said the bigger challenge during outages is not spotting the failure but dealing with the flood of alerts that follow. When one part of a connected IT system fails, it can trigger dozens of related notifications, even if fixing it requires just one action.

“What Site24x7 does is group all alerts from a single trigger into one problem using contextual topology information, so the IT team sees one problem instead of a hundred alerts,” Raghavan said.

The platform combines related alerts into a single incident view and uses a noise-scoring system to suppress repetitive notifications. A feature to remove duplicate alerts coming from third-party monitoring tools is also being developed and is expected to roll out soon.
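To illustrate the grouping idea Raghavan describes, here is a minimal sketch in Python. It is not Site24x7's implementation: the dependency map, component names and the rule of attributing each alert to the furthest upstream component that is also alerting are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical topology: each component maps to the component it depends on
# (None means nothing upstream). These names are illustrative only.
DEPENDS_ON = {
    "web-1": "app-1",
    "web-2": "app-1",
    "app-1": "db-1",
    "db-1": None,
    "cache-1": None,
}


def root_cause_candidate(component, alerting):
    """Return the furthest upstream component that is also alerting.

    Walks up the dependency chain from `component`; if nothing upstream
    is alerting, the component itself is the candidate.
    """
    root = component
    seen = {component}
    current = DEPENDS_ON.get(component)
    while current is not None and current not in seen:
        if current in alerting:
            root = current
        seen.add(current)  # guards against cycles in a malformed map
        current = DEPENDS_ON.get(current)
    return root


def group_alerts(alerts):
    """Group alerts into problems keyed by their root-cause candidate."""
    alerting = {alert["component"] for alert in alerts}
    problems = defaultdict(list)
    for alert in alerts:
        root = root_cause_candidate(alert["component"], alerting)
        problems[root].append(alert)
    return dict(problems)


alerts = [
    {"component": "web-1", "message": "HTTP 502"},
    {"component": "web-2", "message": "HTTP 502"},
    {"component": "app-1", "message": "connection pool exhausted"},
    {"component": "db-1", "message": "disk full"},
    {"component": "cache-1", "message": "high eviction rate"},
]
problems = group_alerts(alerts)
for root, grouped in problems.items():
    print(f"{root}: {len(grouped)} alert(s)")
```

In this toy case the five alerts collapse into two problems: one rooted at the database, which absorbs the web and application alerts that depend on it, and one for the unrelated cache.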

Raghavan told CiOL that early results from beta trials with 40 to 50 customers have been positive. While he did not commit to a fixed number, he said enterprises have seen alert reductions of 50% to 60% or more. In one case, a global IT services firm filtered out nearly 90% of alert noise after using the new features, the company said.

Deduction And Recovery From Hours To Minutes

The main benefit of the AI upgrade, Raghavan said, is faster identification of the real cause of a problem. In one internal case, diagnosis time dropped from nearly an hour to just five to ten minutes.

In one customer’s Kubernetes-based environment, a cloud-based software system that runs many applications together, the issue was identified within minutes, compared with the 30 to 40 minutes typically required.

Raghavan said that simple hardware issues, such as servers exceeding memory or CPU thresholds, are already being handled effectively through traditional automation. He said AI is most useful when problems are complicated and involve multiple systems, making it harder for teams to manually connect the dots and find the real cause.

Letting AI Take Action — With Guardrails

Beyond identifying the problem, the system can also fix certain routine issues on its own, but with strict safeguards. Raghavan said the AI follows pre-approved step-by-step guides used by engineers and does not go beyond those instructions. It can automatically restart applications or stop faulty processes, but any major changes to databases, networks or core software still require human approval.

To avoid mistakes, the system uses a confidence score and only acts automatically when it is at least 90% sure of its diagnosis, Raghavan said. Every action taken by the AI is recorded and can be reviewed later if needed.
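The guardrails described above can be sketched as a simple decision rule. This is an illustration under stated assumptions, not the product's logic: the action names are hypothetical, and the article does not say what happens when a routine action falls below the 90% threshold, so the sketch assumes it is escalated to a human.

```python
from dataclasses import dataclass, field

AUTO_THRESHOLD = 0.90  # the article's "at least 90% sure" cutoff

# Hypothetical action names, chosen to mirror the examples in the article.
RUNBOOK_ACTIONS = {"restart_application", "stop_process"}
APPROVAL_REQUIRED = {"modify_database", "change_network", "update_core_software"}


@dataclass
class Decision:
    action: str
    confidence: float
    outcome: str


@dataclass
class Remediator:
    # Every decision is appended here so it can be reviewed later.
    audit_log: list = field(default_factory=list)

    def decide(self, action, confidence):
        if action in RUNBOOK_ACTIONS and confidence >= AUTO_THRESHOLD:
            outcome = "auto_execute"
        elif action in RUNBOOK_ACTIONS or action in APPROVAL_REQUIRED:
            # Assumption: low-confidence routine actions are escalated,
            # alongside major changes that always need sign-off.
            outcome = "needs_human_approval"
        else:
            # Anything outside the pre-approved guides is not attempted.
            outcome = "rejected"
        decision = Decision(action, confidence, outcome)
        self.audit_log.append(decision)
        return decision


remediator = Remediator()
for action, confidence in [
    ("restart_application", 0.95),
    ("restart_application", 0.70),
    ("modify_database", 0.99),
    ("delete_volume", 0.99),
]:
    decision = remediator.decide(action, confidence)
    print(f"{decision.action} ({decision.confidence:.2f}): {decision.outcome}")
```

The point of the design is that confidence alone never unlocks a major change: a database modification goes to a human even at 99%, and every decision, including rejections, lands in the audit log.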

The approval and tracking process runs through Qntrl, Zoho’s workflow platform, which keeps a clear audit trail. Raghavan added that companies can also connect third-party workflow tools instead of being limited to in-house systems.

Raghavan acknowledged competition from players such as Dynatrace, Datadog and ServiceNow, but said the company’s strength lies in covering a wider range of systems. He said most enterprises run a mix of older infrastructure and modern cloud setups, and the platform is designed to monitor both traditional systems and newer cloud and Kubernetes environments in one place.

The AI-driven capabilities are now being rolled out to enterprise customers on professional and enterprise plans, initially as an advisory intelligence layer alongside existing monitoring systems, allowing IT teams to evaluate the technology before adopting deeper automation.