Auto Resolution is a new capability added to the OnBot Suite. It aims to give IT operations teams an intelligent, ChatOps-based ecosystem for monitoring and remediating incidents and events across applications and infrastructure. The system helps teams make informed decisions and increases productivity by providing data-driven recommendations for incident and event notifications. It helps organizations adopt ChatOps in stages: from manual or Standard Operating Procedure (SOP) based handling, through a transitional rule-based phase, to AI-based auto-remediation.
Overview:
How it works:
The conceptual architecture below illustrates a rule-based remediation system. The rules are written as a simple JSON-based issue-action mapping. For every event or incident, the recommendation engine provides a list of actions based on the actions mapped to it. These actions are posted back to the Slack channel (using a Slack app), and the operations person can choose one of them or enter a new action, which is then triggered by a bot or a downstream remediation tool.
- The Monitoring Tool sends alerts to the chat application based on an event (e.g., an application has crashed or a build has failed).
- The EventHandler bot parses the message posted by the Monitoring Tool and sends the notification to the Neo4jDBHandler service to be written to Neo4j.
- The Recommendation Engine service loads the static recommendations from the JSON mapping.
- The EventHandler bot requests recommendations from the Recommendation Engine service.
- The EventHandler bot posts the recommendations to the channel. The user may choose from the recommendations posted or enter a new command and submit it to the chat application.
- The command is executed by the Remediation Tool.
- The EventHandler bot parses the action taken and sends it to the Neo4jDBHandler service to be written to Neo4j.
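The rule-based part of the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule schema, the issue names, the action strings, and the function names (`load_rules`, `recommend`, `handle_event`) are hypothetical and are not taken from OnBot, and a plain list stands in for the Neo4j knowledge base.

```python
import json

# Hypothetical issue-action mapping. The real rule schema used by
# Auto Resolution is not shown in this document.
RULES_JSON = """
{
  "application_crashed": ["restart application", "collect crash logs"],
  "build_failed": ["retry build", "notify build owner"]
}
"""


def load_rules(raw):
    """Parse the JSON issue-action mapping into a dict."""
    return json.loads(raw)


def recommend(rules, issue):
    """Return the actions mapped to an issue, or an empty list if none."""
    return list(rules.get(issue, []))


def handle_event(rules, issue, chosen_action, history):
    """Record the notification, look up recommendations, record the action."""
    history.append(("notification", issue))
    recommendations = recommend(rules, issue)
    # The operator may pick one of the recommendations or type a new command;
    # either way, the action taken is stored for later use.
    history.append(("action", issue, chosen_action))
    return recommendations


rules = load_rules(RULES_JSON)
history = []  # stand-in for the Neo4j knowledge base (assumption)
print(handle_event(rules, "build_failed", "retry build", history))
```

An issue with no entry in the mapping simply yields an empty list, which leaves the operator free to enter a new command.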
The illustration above can be extended from a rule-based recommendation engine to a dynamic, score-based recommendation engine by using third-party software such as the GraphAware Neo4j recommendation engine. The incident-action information stored in Neo4j would be processed by the GraphAware recommendation engine and scored. Based on the scores, the top recommendations for new incidents would be provided to the operations person via Slack. Neo4j and GraphAware components are used here as an illustration; any equivalent software components can be used to build the knowledge base and recommendations, depending on your implementation.
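To make the idea of score-based recommendations concrete, here is a minimal sketch that ranks past actions for an incident type by how often they were taken. It is not the GraphAware algorithm and does not use Neo4j; the frequency count is a deliberately simple stand-in for a scoring function, and the function name `top_recommendations` and the sample data are hypothetical.

```python
from collections import Counter


def top_recommendations(history, issue, limit=3):
    """Rank past actions for an issue by frequency (a simple stand-in score)."""
    counts = Counter(action for past_issue, action in history if past_issue == issue)
    # Sort by descending count, then alphabetically for a stable order on ties.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [action for action, _ in ranked[:limit]]


# Hypothetical (issue, action) pairs, as they might be read from a knowledge base.
past_actions = [
    ("application_crashed", "restart application"),
    ("application_crashed", "restart application"),
    ("application_crashed", "collect crash logs"),
    ("build_failed", "retry build"),
]

print(top_recommendations(past_actions, "application_crashed"))
# prints ['restart application', 'collect crash logs']
```

A real scoring engine would likely weigh more signals than frequency, such as recency or whether the action resolved the incident.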