Alerting Interface Analysis and Design
Alerting
Created Monday 13 April 2015
Analysis
UCS documentation specifies 2 interfaces related to Alerting:
- Alerting Interface: alert clients can update the state of an alert.
- UCS Alerting Interface: client side interface to receive alert-related events from UCS server.
Note
The documentation is not clear on how the methods of the 2 interfaces are related. Alerting defines a single updateAlert() method but UCSAlerting defines 3: receiveAlertMessage(), updateAlertMessage() and cancelAlertMessage().
JH - the Alerting interface is provided by UCOM to allow alertable recipients to acknowledge the alert. The UCS Alerting interface must be provided by the alertable recipient to receive alerts (and updates/cancellations) - these are the methods UCOM will call in response to the sender issuing send/cancel/updateMessage requests (from 5.3) with an alert message type.
EA - How does the Alerting interface allow alertable recipients to acknowledge the alert? The updateAlert() method in the Alerting interface receives only an AlertMessage as a parameter.
After discussing this with the team, we came to the conclusion that Alerting.updateAlert() is too broad for this task. A new method in this interface needs to be introduced.
According to the documentation, the following is a usual "Send Alert with ACK" scenario:
JH - the Alerting and UCS Alerting labels are wrong (note made to fix in spec)
The "send message" path is clear. A Sender uses Client.sendMessage() to send a new AlertMessage to UCS. UCS uses a specific adapter to send the message to one or more Receivers. (Note that our NiFi implementation will not have any adapter for the alerting part. All the messages will be kept inside UCS).
When the Receiver wants to ACK the message, it uses UCSAlerting.updateAlertMessage() (Note: it is not clear which parameters the Receiver uses for the invocation. JH - just the alert message itself) to modify the status of the AlertMessage in UCS. According to the documentation, UCS then notifies the clients using the UCSClient.handleResponse() method. This is where I think the documentation is wrong. The method that UCS uses to notify a client about a modification in an AlertMessage should be UCSAlerting.updateAlertMessage(). JH - no, this is not a method in 5.3, see above (EA - Then how is a client notified about the ACK? UCSClient.handleResponse()?? -> UCSClient.updateAlertMessage()). This analysis document will use UCSAlerting.updateAlertMessage() and not UCSClient.handleResponse() to notify clients about a modification in an AlertMessage.
UCSClient.receiveMessage vs UCSAlerting.receiveAlertMessage
The documentation doesn't specify the sequence of actions when a new AlertMessage is introduced in UCS via Client.sendMessage(). The peculiarity of this situation is that there are 2 "UCS" interfaces that have to be invoked here: UCSClient.receiveMessage() and UCSAlerting.receiveAlertMessage(). Given that the documentation is not specific, our NiFi implementation will notify both interfaces whenever a new AlertMessage is introduced in UCS.
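The "notify both interfaces" decision can be illustrated with a minimal Java sketch. The Message and AlertMessage classes, the single-method versions of UCSClient and UCSAlerting, and NewMessageDispatcher are simplified stand-ins assumed for illustration only; the real signatures in the UCS specification and in the NiFi workflows may differ.

```java
import java.util.List;

// Simplified stand-ins for the UCS message types (assumption).
class Message {
    final String id;
    Message(String id) { this.id = id; }
}

class AlertMessage extends Message {
    AlertMessage(String id) { super(id); }
}

// Only the methods relevant to this scenario are shown; parameter types are assumed.
interface UCSClient {
    void receiveMessage(Message message);
}

interface UCSAlerting {
    void receiveAlertMessage(AlertMessage message);
}

// Hypothetical dispatcher: every new message goes to UCSClient callbacks,
// and AlertMessages are additionally sent to UCSAlerting callbacks.
final class NewMessageDispatcher {
    private final List<UCSClient> clients;
    private final List<UCSAlerting> alertingClients;

    NewMessageDispatcher(List<UCSClient> clients, List<UCSAlerting> alertingClients) {
        this.clients = clients;
        this.alertingClients = alertingClients;
    }

    void onNewMessage(Message message) {
        for (UCSClient client : clients) {
            client.receiveMessage(message);
        }
        if (message instanceof AlertMessage) {
            for (UCSAlerting alerting : alertingClients) {
                alerting.receiveAlertMessage((AlertMessage) message);
            }
        }
    }
}
```

With this shape, a non-alert message never reaches the UCSAlerting callbacks, while an AlertMessage reaches both sets of callbacks.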
AlertStatus vs AlertStatusByReceiver.
Another gray area in the documentation is the correct usage of the alertStatus and statusByReciever properties of the AlertMessageHeader class. Apparently, an AlertMessage has 2 levels of statuses: a global status and a specific status for each of the recipients. It is not clear how, or if, the global status is related to the individual statuses of the recipients. Both levels are defined by the same enum: AlertStatus.
public enum AlertStatus {
Acknowledged,
New,
Expired,
Retracted,
Pending
}
Our NiFi implementation will use these 2 levels of statuses in the following way:
- alertStatus: it will only allow the following values: New, Pending, Acknowledged and Retracted. Any incoming message will start with a 'New' status. When a Receiver acknowledges the message, its status will be modified to Pending (if there are more recipients that haven't yet acknowledged the message) or Acknowledged (if the recipient is the only recipient of the message or if all the recipients of the message have already acknowledged it). When an AlertMessage is canceled, its status is changed to Retracted.
- statusByReciever: it will only allow the following values: Pending and Acknowledged. When a new AlertMessage arrives at UCS, it must contain one entry in this Map with a status of Pending for each of the message's recipients. When a recipient acknowledges an AlertMessage, it changes its own status to Acknowledged. UCS will analyze and update the global status whenever a statusByReceiver entry is modified.
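As a rough illustration of how the global status could be derived from the per-recipient map, here is a minimal Java sketch. It assumes that an alert acknowledged by all of its recipients ends up as Acknowledged, that recipients are identified by a String, and that a Retracted alert is never changed again. AlertStatusRules and its methods are hypothetical names, not part of the UCS specification.

```java
import java.util.LinkedHashMap;
import java.util.Map;

enum AlertStatus { Acknowledged, New, Expired, Retracted, Pending }

// Hypothetical helper; names and signatures are assumptions for illustration.
final class AlertStatusRules {
    private AlertStatusRules() {}

    /** Builds the initial per-recipient map: every recipient starts as Pending. */
    static Map<String, AlertStatus> initialStatuses(Iterable<String> recipientIds) {
        Map<String, AlertStatus> statuses = new LinkedHashMap<>();
        for (String id : recipientIds) {
            statuses.put(id, AlertStatus.Pending);
        }
        return statuses;
    }

    /** Recomputes the global alertStatus after the per-recipient map changed. */
    static AlertStatus recompute(AlertStatus current, Map<String, AlertStatus> statusByReceiver) {
        if (current == AlertStatus.Retracted) {
            return AlertStatus.Retracted; // assumption: a canceled alert stays canceled
        }
        int acknowledged = 0;
        for (AlertStatus status : statusByReceiver.values()) {
            if (status == AlertStatus.Acknowledged) {
                acknowledged++;
            }
        }
        if (acknowledged == 0) {
            return current; // nobody acknowledged yet: keep the current status (e.g. New)
        }
        return acknowledged == statusByReceiver.size()
                ? AlertStatus.Acknowledged
                : AlertStatus.Pending;
    }
}
```

For an alert with two recipients, the first acknowledgement moves the global status to Pending and the second one moves it to Acknowledged.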
UCSAlerting methods
This section explains when each of the methods of UCSAlerting is invoked by UCS.
receiveAlertMessage
This method is invoked whenever a new AlertMessage arrives at UCS. Usually, an AlertMessage is introduced into UCS using Client.sendMessage().
updateAlertMessage
This method is invoked whenever an AlertMessage is updated in UCS. An AlertMessage is modified using Alerting.updateAlertMessage().
Note
The Client interface also defines a method called updateMessage, but the intention of this method is to update a message that is being prepared BEFORE it is actually sent to UCS (JH - not strictly. We allow updates even after a message has been sent by UCOM, however for it to be meaningful, it has to make sense in the modality used to transmit the message). In the current NiFi implementation, Client.updateMessage is not related in any way to UCSAlerting.updateAlertMessage.
EA - (From 5.3: "[Client.updateMessage()]This operation is used to modify a message. Unsent messages are considered fully mutable except for the message ID and creation information. Once a message is sent, the message is immutable.").
cancelAlertMessage
It is not clear in the documentation which action in UCS triggers this method. The Client interface defines a method called cancelMessage, but this method is used to cancel a message that was being prepared BEFORE it was actually sent to UCS.
A way this method could be triggered is when a client uses Alerting.updateAlertMessage() specifying a status of Retracted. We need to investigate further.
Sequence Diagrams
Send and receive an Alert Message
Send an Alert Message and ACK
Send an Alert Message and update it
Important
This operation is vaguely described in the specification. The implications of the modifications of a message are too complex to be included in this implementation. We are not going to support this operation.
Send an Alert Message and cancel it
Alerts timeout and escalation
Even if there is no technical limitation, Alert Messages are not designed to support responses. The purpose of an alert is to notify a set of recipients and, most of the time, to expect an ACK from the recipients. In the context of an Alert, then, what determines the timeout/escalation is not the absence of a response, but the alertStatus of the Alert Message.
If, when the timeout period is reached, the status of the Alert Message is not "Acknowledged", the escalation mechanism should be triggered.
Design
Alerting Interface Design
The Alerting interface is similar to the Client interface already implemented in UCS. This interface is a one-way mechanism to execute a command on UCS. The available command in our NiFi implementation will be: updateAlertMessage.
An option for the implementation of this interface is to implement it as a specific command inside Client workflow.
A specific processor must be implemented to hold the logic related to this operation: UCSAlertingUpdateAlertMessage.
UCSAlerting Interface Design
The UCSAlerting interface is similar to the UCS Client Interface already implemented in UCS. Before this interface is enabled for a client, it must be registered in UCS (like UCSRegisterUCSClientCallback in the Client Interface workflow is currently doing).
A new UCSRegisterUCSAlertingCallback processor must be implemented to allow clients to register themselves as a UCSAlerting client. UCSControllerService must be modified to hold these references.
Whenever an AlertMessage is created/modified/canceled in UCS, all the UCSAlerting clients that are currently registered must be notified. This is something similar to what UCS Client Interface workflow is currently doing. A new processor called UCSGetUCSAlertingCallbacks must be created to retrieve any registered UCSAlerting client from UCSControllerService (similar to what UCSGetUCSClientCallbacks is currently doing).
New Workflow/s
Depending on whether we decide to re-use the Client Interface workflow or not, one or two new workflows must be implemented.
Re-use Client Interface Workflow
If we decide to re-use the Client Interface workflow, then we need to introduce a new processor into this workflow: UCSAlertingUpdateAlertMessage. This processor will implement the required logic to update an Alert Message in UCS. Another processor that needs to be implemented in this workflow (no matter if we decide to re-use it for the Alerting interface or not) is UCSCancelMessage. This new processor will cancel an existing AlertMessage.
Two new output ports must be added to this workflow so other parts of UCS can be notified about the modification and cancellation of an Alert Message (i.e. UCS Alerting Interface workflow).
In order to allow clients to register UCSAlerting callback interfaces into UCS, a new processor must be implemented and added into Client Interface workflow: UCSRegisterAlertingCallback.
The previous image shows how Client Interface workflow will look after the modifications for the Alerting Interface are implemented.
Implement new Alerting Interface
In this approach, the two previously mentioned processors - UCSAlertingUpdateAlertMessage and UCSRegisterAlertingCallback - are implemented into a specific workflow called "Alerting Interface". The structure of this workflow is similar to the Client Interface.
As you can see in the previous image, the new workflow has a similar structure to Client Interface: a message is received via HTTP, parsed and processed according to its content. The 2 available "commands" in this workflow are: registerUCSAlertingInterface (or registerUCSAlertingCallback) and updateAlertMessage.
UCS Alerting Interface
This workflow is in charge of notifying any previously registered UCSAlerting interface about alert-related events happening inside UCS. The supported events are: a new Alert Message is present in UCS, an Alert Message was modified inside UCS and an Alert Message was canceled inside UCS. Each of these events is represented in the new workflow as an input port.
Just like UCS Client Interface workflow, this new workflow notifies any previously registered UCSAlerting callback about an alert-related event that happened inside UCS.
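The fan-out from the three events to the three UCSAlerting methods might look like the following Java sketch. The AlertEvent enum, the AlertEventNotifier class, the minimal AlertMessage class and the parameter types of the UCSAlerting methods are assumptions made for illustration; in the actual workflow each event is a NiFi input port and each callback is a registered URL.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for the UCS AlertMessage (assumption).
class AlertMessage {
    final String id;
    AlertMessage(String id) { this.id = id; }
}

// Method names come from the UCSAlerting interface; parameter types are assumed.
interface UCSAlerting {
    void receiveAlertMessage(AlertMessage message);
    void updateAlertMessage(AlertMessage message);
    void cancelAlertMessage(AlertMessage message);
}

// One value per input port of the workflow (hypothetical names).
enum AlertEvent { NEW, MODIFIED, CANCELED }

// Hypothetical notifier: forwards an event to every registered callback.
final class AlertEventNotifier {
    private final List<UCSAlerting> callbacks;

    AlertEventNotifier(List<UCSAlerting> callbacks) {
        this.callbacks = new ArrayList<>(callbacks);
    }

    void publish(AlertEvent event, AlertMessage message) {
        for (UCSAlerting callback : callbacks) {
            switch (event) {
                case NEW:
                    callback.receiveAlertMessage(message);
                    break;
                case MODIFIED:
                    callback.updateAlertMessage(message);
                    break;
                case CANCELED:
                    callback.cancelAlertMessage(message);
                    break;
            }
        }
    }
}
```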
New Processors
UCSAlertingUpdateAlertMessage
This processor is the most important processor in the entire Alerting Interface implementation. This processor receives an AlertMessage as its input, and generates a FlowFile containing 2 AlertMessages.
The incoming AlertMessage MUST have a MessageId that is known inside UCS. The original Alert Message is then retrieved and a diff is performed between the old and the new version of the message.
Important
UPDATE: The NiFi UCSAlertingUpdateAlertMessage processor will only take as input parameters the id and the new status of the message to be updated. Receiving an entire message as input made the mechanism we currently have in place to process commands complicated and error-prone. This implementation also makes the intention of the processor clearer. The mismatch between what the UCS specification defines for Alerting.updateMessage() and the way NiFi implements it will be hidden behind the concrete implementation of ucs-nifi-api.
The way the status of an Alert Message is processed is this:
1.- If the alertStatus property of the original alert message in UCS is other than "Pending" or "Acknowledged", the processor will fail. The incoming flowfile is redirected to REL_STATUS_MISSMATCH (using UCSCreateException.routeFlowFileToException()).
2.- If the alertStatus property of the new alert message is other than "Acknowledged", the processor will fail. The incoming flowfile is redirected to REL_STATUS_MISSMATCH (using UCSCreateException.routeFlowFileToException()).
3.- If the alertStatus property of the new message is different from the alertStatus property of the original message, the property in the original message is updated. The message is updated in UCSControllerService by invoking updateMessage() and the flowfile is directed to REL_SUCCESS.
4.- If the alertStatus property of the new message is equal to the alertStatus property of the original message, the flowfile will be directed to a REL_NO_UPDATE relationship.
If there is no error during the execution of this processor, its output is a flowfile containing a serialized version of both the original and the updated message.
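The four status rules above can be sketched as a pure decision function in Java. The UpdateRoute enum and the UpdateAlertDecision class are hypothetical names used only for illustration; the real processor would route the FlowFile to NiFi relationships (REL_STATUS_MISSMATCH, REL_SUCCESS, REL_NO_UPDATE) and update the message in UCSControllerService.

```java
enum AlertStatus { Acknowledged, New, Expired, Retracted, Pending }

// Stand-ins for the processor relationships (hypothetical names).
enum UpdateRoute { STATUS_MISMATCH, SUCCESS, NO_UPDATE }

final class UpdateAlertDecision {
    private UpdateAlertDecision() {}

    /**
     * Decides where the incoming FlowFile should go, given the status currently
     * stored in UCS and the status requested by the caller.
     */
    static UpdateRoute decide(AlertStatus original, AlertStatus requested) {
        // Rule 1: only Pending or Acknowledged alerts can be updated.
        if (original != AlertStatus.Pending && original != AlertStatus.Acknowledged) {
            return UpdateRoute.STATUS_MISMATCH;
        }
        // Rule 2: the only status a caller may request is Acknowledged.
        if (requested != AlertStatus.Acknowledged) {
            return UpdateRoute.STATUS_MISMATCH;
        }
        // Rules 3 and 4: update only when the status actually changes.
        return original != requested ? UpdateRoute.SUCCESS : UpdateRoute.NO_UPDATE;
    }
}
```

Under these rules, the only transition that reaches SUCCESS is Pending to Acknowledged; acknowledging an already Acknowledged alert ends in NO_UPDATE.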
UCSRegisterUCSAlertingCallback
This processor is used to register a callback URL that is used to notify a client about UCSAlerting-related events. The implementation of this processor is very similar to UCSRegisterUCSClientCallback. UCSControllerService must be enhanced to keep track of these callbacks.
UCSGetUCSAlertingCallbacks
Similar to UCSGetUCSClientCallbacks, this processor retrieves any previously registered UCSAlerting callback URL from UCSControllerService and generates a flowfile for each of them.
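The bookkeeping that UCSControllerService needs for these two processors could be as small as the following Java sketch. AlertingCallbackRegistry and its method names are hypothetical, and the way the existing UCSClient callbacks are actually stored in UCSControllerService is not shown in this document; this is only one possible shape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of UCSAlerting callback URLs (thread-safe, no duplicates).
final class AlertingCallbackRegistry {
    private final Set<String> callbackUrls = ConcurrentHashMap.newKeySet();

    /** Used by the register processor. Returns false if the URL was already known. */
    boolean register(String callbackUrl) {
        if (callbackUrl == null || callbackUrl.trim().isEmpty()) {
            throw new IllegalArgumentException("callback URL must not be empty");
        }
        return callbackUrls.add(callbackUrl);
    }

    /** Used by the get processor: a snapshot with one entry per future FlowFile. */
    List<String> registeredCallbacks() {
        return new ArrayList<>(callbackUrls);
    }
}
```

Returning a copy keeps the notification workflow from observing registrations that happen while it is iterating over the callbacks.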
UCSCancelMessage
This processor implements the expected behavior of Client.cancelMessage() operation. This processor retrieves the messageId from the incoming FlowFile, retrieves the related message from UCSControllerService and performs the following operations:
1.- If there is no message in UCSControllerService with the specified id, the processor will route the incoming FlowFile to a REL_UNKNOWN_MESSAGE relationship.
2.- If the message in UCSControllerService with the specified id is not an AlertMessage, the processor will route the incoming FlowFile to a BAD_MESSAGE relationship.
3.- If the alertStatus property of the message in UCSControllerService with the specified id is already "Retracted", a new FlowFile containing the message will be directed to a REL_NO_UPDATE relationship.
4.- If the alertStatus property of the message in UCSControllerService with the specified id is "Pending", the alertStatus value is changed to "Retracted" and a new FlowFile with the serialized message will be directed to a REL_CANCELLED relationship.
5.- If the alertStatus property of the message in UCSControllerService with the specified id is other than "Pending" or "Retracted", the original FlowFile will be directed to a REL_INVALID_STATE relationship.
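The five cases above can be sketched in Java as follows. The in-memory Map stands in for UCSControllerService, and Message, AlertMessage, CancelRoute and CancelMessageDecision are simplified, hypothetical types; the real processor works on FlowFiles and NiFi relationships.

```java
import java.util.Map;

enum AlertStatus { Acknowledged, New, Expired, Retracted, Pending }

// Stand-ins for the processor relationships (hypothetical names).
enum CancelRoute { UNKNOWN_MESSAGE, BAD_MESSAGE, NO_UPDATE, CANCELLED, INVALID_STATE }

// Simplified message model (assumption).
class Message {
    final String id;
    Message(String id) { this.id = id; }
}

class AlertMessage extends Message {
    AlertStatus alertStatus;
    AlertMessage(String id, AlertStatus alertStatus) {
        super(id);
        this.alertStatus = alertStatus;
    }
}

final class CancelMessageDecision {
    private CancelMessageDecision() {}

    /** Applies the cancel rules to the message stored under messageId. */
    static CancelRoute cancel(Map<String, Message> store, String messageId) {
        Message message = store.get(messageId);
        if (message == null) {
            return CancelRoute.UNKNOWN_MESSAGE;      // case 1
        }
        if (!(message instanceof AlertMessage)) {
            return CancelRoute.BAD_MESSAGE;          // case 2
        }
        AlertMessage alert = (AlertMessage) message;
        switch (alert.alertStatus) {
            case Retracted:
                return CancelRoute.NO_UPDATE;        // case 3: already canceled
            case Pending:
                alert.alertStatus = AlertStatus.Retracted;
                return CancelRoute.CANCELLED;        // case 4
            default:
                return CancelRoute.INVALID_STATE;    // case 5
        }
    }
}
```

Calling cancel twice on the same Pending alert yields CANCELLED and then NO_UPDATE, which makes the operation safe to retry.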
Alerts timeout and escalation
Given that the current implementation ACKs an Alert Message as soon as the first recipient ACKs it, there is only one escalation possibility: onNoResponseAll.
When an Alert Message is persisted - UCSPersistMessage processor - a new cron job needs to be scheduled if the message fulfils the following requirements:
1.- The Message is an AlertMessage instance
2.- The respondBy property of the message's header is > 0
3.- The receiptNotification property of the message's header is true
The cron job that gets scheduled implements the following logic:
1.- If the alertStatus property of the message is "Acknowledged", "Retracted" or "Expired", the cron job ends silently.
2.- If the alertStatus property of the message is "Pending", any Message present in onNoResponseAll list is notified to UCSControllerService.notifyAboutMessageWithResponseTimeout(). The timeOutReason of the generated TimedOutMessage is NO_RESPONSES.
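The scheduling conditions and the cron job logic can be summarized in a small Java sketch. AlertTimeoutRules and TimeoutAction are hypothetical names, respondBy is assumed to be a numeric value, and the "New" status, which the rules above do not mention, is assumed here to end silently. The actual notification would go through UCSControllerService.notifyAboutMessageWithResponseTimeout().

```java
enum AlertStatus { Acknowledged, New, Expired, Retracted, Pending }

// What the cron job does when it fires (hypothetical names).
enum TimeoutAction { END_SILENTLY, NOTIFY_NO_RESPONSES }

final class AlertTimeoutRules {
    private AlertTimeoutRules() {}

    /** A cron job is scheduled only when all three requirements hold. */
    static boolean shouldSchedule(boolean isAlertMessage, long respondBy, boolean receiptNotification) {
        return isAlertMessage && respondBy > 0 && receiptNotification;
    }

    /** Decides what the cron job does once the timeout period is reached. */
    static TimeoutAction onTimeout(AlertStatus status) {
        if (status == AlertStatus.Pending) {
            // Messages in onNoResponseAll are notified with reason NO_RESPONSES.
            return TimeoutAction.NOTIFY_NO_RESPONSES;
        }
        // Acknowledged, Retracted and Expired end silently.
        // Assumption: New is not covered by the rules and is treated the same way.
        return TimeoutAction.END_SILENTLY;
    }
}
```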