...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
Fundamentals
Refer to the section below for workflow framework basics.
Overview
Insights has a growing need for a customizable framework that runs offline tasks in a defined order. The Workflow framework is fully configurable, extensible, and scalable. It is designed independently of other components: any component can use it by registering itself with Workflow and scheduling jobs. The framework comprises the entities below, which together manage workflow execution.
Entities
Workflow Tasks: In a workflow, work is divided into smaller dependent tasks. In other words, a task is a small unit of the given work.
Rabbit MQ Channels: Each workflow task has its own Rabbit MQ channel, and tasks use these channels for inter-task communication.
Sample MQ Message Format: {"executionId":1602843384824,"workflowId":"Report_1602842568","currentTaskId":373,"nextTaskId":-1,"sequence":1,"isWorkflowTaskRetry":false}
Workflow Task Sequence: Each task has a unique sequence number, which ensures that all tasks are executed sequentially. When a task completes its execution, it notifies the next task over the MQ channel.
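The hand-off between sequenced tasks can be sketched as follows. This is a minimal illustration, not the framework's own code: the class `WorkflowTaskMessage` and its `forTask` helper are hypothetical names, and the sketch assumes that `sequence` starts at 1 and that a `nextTaskId` of -1 marks the last task, as the sample message suggests.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical value class mirroring the fields of the sample MQ message.
final class WorkflowTaskMessage {
    final long executionId;
    final String workflowId;
    final int currentTaskId;
    final int nextTaskId;              // assumption: -1 means "no next task"
    final int sequence;                // assumption: sequence numbers start at 1
    final boolean isWorkflowTaskRetry;

    WorkflowTaskMessage(long executionId, String workflowId, int currentTaskId,
                        int nextTaskId, int sequence, boolean isWorkflowTaskRetry) {
        this.executionId = executionId;
        this.workflowId = workflowId;
        this.currentTaskId = currentTaskId;
        this.nextTaskId = nextTaskId;
        this.sequence = sequence;
        this.isWorkflowTaskRetry = isWorkflowTaskRetry;
    }

    // Builds the message for the task at position `index` (0-based) of an ordered task list.
    static WorkflowTaskMessage forTask(long executionId, String workflowId,
                                       List<Integer> orderedTaskIds, int index) {
        int current = orderedTaskIds.get(index);
        boolean isLast = index == orderedTaskIds.size() - 1;
        int next = isLast ? -1 : orderedTaskIds.get(index + 1);
        return new WorkflowTaskMessage(executionId, workflowId, current, next, index + 1, false);
    }

    // Serializes to the sample format. No escaping is done because workflow ids
    // contain only alphanumeric characters and underscores.
    String toJson() {
        return String.format(Locale.ROOT,
                "{\"executionId\":%d,\"workflowId\":\"%s\",\"currentTaskId\":%d,"
                        + "\"nextTaskId\":%d,\"sequence\":%d,\"isWorkflowTaskRetry\":%b}",
                executionId, workflowId, currentTaskId, nextTaskId, sequence, isWorkflowTaskRetry);
    }
}
```

For a workflow with the single task 373, `forTask(1602843384824L, "Report_1602842568", tasks, 0).toJson()` reproduces the sample message above.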
Workflow Lifecycle Stages:
NOT_STARTED: Default status of a newly created workflow.
IN_PROGRESS: Set while a task of the workflow is being executed.
COMPLETED: Set when all tasks of the workflow have completed their execution without any error or exception.
ERROR: Set when any workflow task fails to complete its execution because of an error or exception.
ABORTED: Set when a workflow that is being retried exceeds maxretrycount.
RESTART: A workflow in the ABORTED state can be set to RESTART, once the underlying errors are fixed, so that it is executed again.
TASK_INITIALIZE_ERROR: Set when a task completes successfully but fails to publish the message to its next task because of an MQ issue.
Workflow Schedulers: A workflow scheduler is responsible for executing active workflow tasks. Currently there are four types of workflow schedulers.
Workflow Executor: Handles the normal execution of workflows. It picks up all workflows in the COMPLETED, NOT_STARTED, or RESTART state. Its execution frequency is configurable in the server-config.json file.
Workflow Retry Executor: Retries workflows that are in the ERROR or TASK_INITIALIZE_ERROR state. Its execution frequency is configurable in the server-config.json file.
WorkflowImmediateJob Executor: Executes workflows immediately. It picks up any workflow that is in the NOT_STARTED, RESTART, or TASK_INITIALIZE_ERROR state and is flagged for immediate execution. Its frequency is set at system level and is fixed at 5-minute intervals.
Workflow AutoCorrection Executor: Auto-corrects workflows that have missed their timeline. Its frequency is configurable in server-config.json; the recommended frequency is 4 hours.
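The relationship between lifecycle stages and the first three executors can be summarized in a small sketch. The enum values come from the list of lifecycle stages; the class `ExecutorEligibility` and its method names are hypothetical and only restate the pick-up rules described above. The AutoCorrection executor is left out because its selection rule is not status-based.

```java
import java.util.EnumSet;
import java.util.Set;

// Lifecycle stages as listed in this section.
enum WorkflowStatus {
    NOT_STARTED, IN_PROGRESS, COMPLETED, ERROR, ABORTED, RESTART, TASK_INITIALIZE_ERROR
}

// Hypothetical helper that restates which executor picks up which stage.
final class ExecutorEligibility {
    static final Set<WorkflowStatus> NORMAL = EnumSet.of(
            WorkflowStatus.COMPLETED, WorkflowStatus.NOT_STARTED, WorkflowStatus.RESTART);
    static final Set<WorkflowStatus> RETRY = EnumSet.of(
            WorkflowStatus.ERROR, WorkflowStatus.TASK_INITIALIZE_ERROR);
    static final Set<WorkflowStatus> IMMEDIATE = EnumSet.of(
            WorkflowStatus.NOT_STARTED, WorkflowStatus.RESTART, WorkflowStatus.TASK_INITIALIZE_ERROR);

    // Workflow Executor: active workflows in COMPLETED, NOT_STARTED or RESTART.
    static boolean forExecutor(WorkflowStatus status, boolean isActive) {
        return isActive && NORMAL.contains(status);
    }

    // Workflow Retry Executor: active workflows in ERROR or TASK_INITIALIZE_ERROR.
    static boolean forRetryExecutor(WorkflowStatus status, boolean isActive) {
        return isActive && RETRY.contains(status);
    }

    // WorkflowImmediateJob Executor: eligible stage plus the immediate-execution flag.
    static boolean forImmediateExecutor(WorkflowStatus status, boolean isActive, boolean runImmediate) {
        return isActive && runImmediate && IMMEDIATE.contains(status);
    }
}
```

Note that ABORTED and IN_PROGRESS appear in none of the sets: an aborted workflow has to be moved to RESTART before any executor picks it up again.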
Workflow Data Storage: Workflow uses the database tables below for data storage.
INSIGHTS_WORKFLOW_CONFIG: Stores the details of each workflow, such as its id, scheduling information, nextruntime, and workflow status.
INSIGHTS_WORKFLOW_TYPE: Stores the type of the workflow task.
INSIGHTS_WORKFLOW_TASK: Stores all the workflow tasks.
INSIGHTS_WORKFLOW_TASK_SEQUENCE: Maintains the sequence of each workflow's tasks.
INSIGHTS_WORKFLOW_EXECUTION_HISTORY: Stores the execution history of each workflow task.
Workflow Life Cycle Hooks: Each workflow usually goes through the following phases.
Initialization Phase: The workflow registers and initializes all its tasks. It also initializes the workflow thread pool and starts the workflow scheduler.
Pre-Processor Phase: The workflow performs prerequisite actions, such as writing necessary information to the database.
Execution Phase: The actual workflow task is executed.
Post-Processor Phase: Once the task finishes its execution, the workflow performs post actions, such as updating the execution history.
Error Phase: If any workflow task fails its execution because of an error or exception, the workflow enters this phase.
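The per-task phases above follow a pattern that can be sketched with a template method. This is an illustration under assumptions, not the framework's implementation: `PhasedTask`, its hook names, and the `phaseLog` list are hypothetical, and the one-time Initialization Phase is omitted because it happens at framework level.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical template for a task that passes through the phases described above.
abstract class PhasedTask {
    // Records the phases entered, only so that the flow is observable in this sketch.
    final List<String> phaseLog = new ArrayList<>();

    // Pre-Processor Phase: e.g. write necessary information to the database.
    protected void preProcess() {
        phaseLog.add("PRE_PROCESS");
    }

    // Execution Phase: the actual work of the task.
    protected abstract void execute() throws Exception;

    // Post-Processor Phase: e.g. update the execution history.
    protected void postProcess() {
        phaseLog.add("POST_PROCESS");
    }

    // Error Phase: entered when any earlier phase throws.
    protected void onError(Exception cause) {
        phaseLog.add("ERROR");
    }

    // Runs the phases in order and returns true if the task finished without error.
    final boolean run() {
        try {
            preProcess();
            phaseLog.add("EXECUTE");
            execute();
            postProcess();
            return true;
        } catch (Exception e) {
            onError(e);
            return false;
        }
    }
}
```

A task whose `execute()` throws skips the post-processor and ends in the error phase, which is where a status such as ERROR would be recorded.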
Workflow Nextruntime and its Scheduling: Each workflow is scheduled based on its frequency. For each workflow frequency, a nextruntime is calculated, which the workflow schedulers use to decide when to execute the workflow. Refer to the table below.
...
Workflow Error Handling Phases: A workflow is considered failed when it is unable to execute its task or goal, which may be due to infrastructure failures such as MQ, DB, or Neo4j being down. For such scenarios, Workflow comes with a built-in error handling mechanism. Workflow can retry in the cases below.
The workflow fails while publishing its first message to MQ. In this scenario the workflow has no execution history, because it failed in the initial phase.
The workflow fails in a middle task. Execution history is available, and at least one of the tasks is in the ERROR state.
The workflow fails with at least one task completed but none of the tasks in the ERROR state.
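The three retry cases can be told apart from the execution history alone, as the sketch below illustrates. The class `RetrySketch`, the `RetryCase` names, and the representation of history as a list of status strings are all hypothetical. The abort check assumes that a workflow is aborted once its retry count is strictly greater than maxretrycount; the exact boundary is not stated in this document.

```java
import java.util.List;

// Hypothetical classifier for the retry cases described above.
final class RetrySketch {
    enum RetryCase { NO_EXECUTION_HISTORY, TASK_IN_ERROR, COMPLETED_WITHOUT_ERROR, UNCLASSIFIED }

    // taskStatuses: status of each task recorded in the execution history so far.
    static RetryCase classify(List<String> taskStatuses) {
        if (taskStatuses.isEmpty()) {
            // Case 1: the first publish to MQ failed, so nothing was recorded.
            return RetryCase.NO_EXECUTION_HISTORY;
        }
        if (taskStatuses.contains("ERROR")) {
            // Case 2: a middle task failed.
            return RetryCase.TASK_IN_ERROR;
        }
        if (taskStatuses.contains("COMPLETED")) {
            // Case 3: at least one task completed and none is in ERROR.
            return RetryCase.COMPLETED_WITHOUT_ERROR;
        }
        // History exists but matches none of the documented cases.
        return RetryCase.UNCLASSIFIED;
    }

    // Assumption: "exceeds" means strictly greater than maxRetryCount.
    static boolean shouldAbort(int retryCount, int maxRetryCount) {
        return retryCount > maxRetryCount;
    }
}
```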
Workflow Thread-Pool: The Workflow framework has a multi-threaded architecture in which each workflow task is further broken into small chunks, and each chunk is executed on its own thread. Workflow uses a custom thread pool whose parameters are configurable. Users can tune the thread pool based on load and other factors such as CPU and memory. The configurable parameters are listed below.
corePoolSize: Number of threads the pool creates before it starts queuing tasks. For example, if corePoolSize is 5 and 6 tasks arrive, 5 threads are created and start their tasks immediately, and the 6th task is placed in the waiting queue.
maximumPoolSize: Maximum number of threads that can ever be created.
keepAliveTime: When the number of threads is greater than corePoolSize, this is the maximum time that excess idle threads wait for new tasks before terminating.
waitingQueueSize: Capacity of the queue that holds tasks before they are executed.
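These four parameters map directly onto the constructor of the JDK's `java.util.concurrent.ThreadPoolExecutor`. The sketch below shows that mapping and reproduces the corePoolSize example; it is not the framework's own pool. The class `WorkflowThreadPoolSketch`, the use of seconds for keepAliveTime, and the choice of a bounded `ArrayBlockingQueue` are assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper showing how the configurable parameters fit a JDK thread pool.
final class WorkflowThreadPoolSketch {

    // Assumption: keepAliveTime is given in seconds and the waiting queue is bounded.
    static ThreadPoolExecutor create(int corePoolSize, int maximumPoolSize,
                                     long keepAliveSeconds, int waitingQueueSize) {
        return new ThreadPoolExecutor(corePoolSize, maximumPoolSize,
                keepAliveSeconds, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(waitingQueueSize));
    }

    // Submits taskCount tasks that block until `release` is counted down,
    // then reports {threads created, tasks waiting in the queue}.
    static int[] submitBlockedTasks(ThreadPoolExecutor pool, int taskCount, CountDownLatch release) {
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        return new int[] { pool.getPoolSize(), pool.getQueue().size() };
    }
}
```

With `create(5, 10, 60, 20)` and six blocked tasks, the pool holds 5 threads and 1 queued task. Threads beyond corePoolSize, up to maximumPoolSize, are created only once the waiting queue is full.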
Components that use Workflow:
Audit Reporting
Upshift
Prerequisites:
Overview:
The Workflow framework comes with a built-in reporting tool. To configure reports, refer to this page: https://onedevops.atlassian.net/wiki/pages/resumedraft.action?draftId=2409660492
Pre-Setup:
Server-Config: Refer to the example properties below in server-config.json; users can change them as needed.
...
Workflow Engine Configuration:
Create a new folder “workflowjar” in INSIGHTS_HOME.
Place the latest PlatformReport and PlatformWorkflow artifacts in the workflowjar folder.
Run the OS-specific workflow service script, available in the Insights docroot/nexus repository, to register the service.
Use the OS-specific start/stop scripts to start or stop the workflow engine.
Directory Structure For Report PDF:
Create the folders exactly as follows and add the required files to them.
$INSIGHTS_HOME\assessmentReportPdfTemplate.
Grant the required permissions on the above folder.
Workflow Frequencies:
Frequency | Reoccurrence | Start Date | End Date | Info |
---|---|---|---|---|
One time | Not Available | Mandatory | Mandatory | A one-time report is configured when the user needs a report on data for a specific range in the past. |
Daily | Available | ---- | ---- | A daily report is configured when the user needs a report every day. The report is executed based on the next runtime (refer to the Fundamentals section for more info). |
Weekly | Available | ---- | ---- | A weekly report is configured when the user needs a report on weekly data. The report is executed based on the next runtime (refer to the Fundamentals section for more info). |
BI-Weekly Sprint | Available | Mandatory | ---- | A bi-weekly sprint report is configured when a sprint report is needed. The start date is mandatory when configuring the report. The report is executed based on the next runtime (refer to the Fundamentals section for more info). |
TRI-Weekly Sprint | Available | Mandatory | ---- | A tri-weekly sprint report is configured when a sprint report is needed. The start date is mandatory when configuring the report. The report is executed based on the next runtime (refer to the Fundamentals section for more info). |
Monthly | Available | ---- | ---- | A monthly report is configured when a report is needed on monthly data. |
Quarterly | Available | ---- | ---- | A quarterly report is configured when a report is needed on quarterly data. |
Yearly | Available | ---- | ---- | A yearly report is configured when a report is needed on yearly data. |
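To illustrate how a recurring frequency could translate into a next runtime, here is a rough sketch. It is not the framework's calculation, which is described in the Fundamentals section. The sketch simply assumes plain calendar offsets from the previous run (for example, bi-weekly means 14 days later). The `ScheduleFrequency` enum and the `NextRunSketch` class are hypothetical names, and the one-time frequency is excluded because it does not recur.

```java
import java.time.LocalDateTime;

// Hypothetical names for the recurring frequencies in the table above.
enum ScheduleFrequency {
    DAILY, WEEKLY, BI_WEEKLY_SPRINT, TRI_WEEKLY_SPRINT, MONTHLY, QUARTERLY, YEARLY
}

final class NextRunSketch {
    // Assumption: the next run is a fixed calendar offset from the previous run.
    static LocalDateTime nextRun(ScheduleFrequency frequency, LocalDateTime previousRun) {
        switch (frequency) {
            case DAILY:
                return previousRun.plusDays(1);
            case WEEKLY:
                return previousRun.plusWeeks(1);
            case BI_WEEKLY_SPRINT:
                return previousRun.plusWeeks(2);
            case TRI_WEEKLY_SPRINT:
                return previousRun.plusWeeks(3);
            case MONTHLY:
                return previousRun.plusMonths(1);
            case QUARTERLY:
                return previousRun.plusMonths(3);
            case YEARLY:
                return previousRun.plusYears(1);
            default:
                throw new IllegalArgumentException("Unsupported frequency: " + frequency);
        }
    }
}
```

A real scheduler may instead align runs to calendar boundaries such as the start of a week or month, so treat the offsets here as placeholders.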
Steps to use the Workflow framework for a new component:
Define the task and create a component class that extends WorkflowTaskSubscriberHandler.java, calls the super constructor, and implements the handleTaskExecution() method.
The task must have a workflow type.
Insert the workflow type into the INSIGHTS_WORKFLOW_TYPE table and the task details into the INSIGHTS_WORKFLOW_TASK table.
Register with Workflow by making an entry in the INSIGHTS_WORKFLOW_CONFIG table with the required workflow details:
workflowId: It is not auto-generated and must be unique. It can contain alphanumeric characters and underscores, for example in the format <workflow type>_<current time in sec>.
workflowType: Provide a suitable workflow type name.
isActive: Only active records are picked up for execution, so this flag should be TRUE.
scheduleType: Provide the frequency/schedule on which the component tasks must be executed. (For frequency info refer to https://onedevops.atlassian.net/wiki/spaces/OI/pages/1476460548/Workflow+Configuration#Workflow-Frequencies%3A )
reoccurence: A boolean flag that keeps the workflow executing at the defined frequency.
runImmediate: If true, the workflow is picked up by the workflow immediate job scheduler.
status: Initially it should be NOT_STARTED; Workflow updates it later according to task execution.
Depending on the frequency, a start date or end date is also required to calculate the next run.
Add the task sequence details to the INSIGHTS_WORKFLOW_TASK_SEQUENCE table.
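The first and fourth steps can be sketched in code as follows. This document does not show the real constructor arguments or the signature of handleTaskExecution(), so the abstract class below is only a stand-in with assumed signatures. `SampleReportTask`, `routingKey`, and `WorkflowIds` are hypothetical names. The workflowId helper follows the <workflow type>_<current time in sec> format given above.

```java
import java.util.regex.Pattern;

// Stand-in for the framework base class. The real constructor arguments and
// method signature are not shown in this document and may differ.
abstract class WorkflowTaskSubscriberHandler {
    protected final String routingKey; // assumption: identifies the task's MQ channel

    protected WorkflowTaskSubscriberHandler(String routingKey) {
        this.routingKey = routingKey;
    }

    public abstract void handleTaskExecution(String message) throws Exception;
}

// Hypothetical component task: calls the super constructor and implements the handler.
class SampleReportTask extends WorkflowTaskSubscriberHandler {
    String lastMessage;

    SampleReportTask(String routingKey) {
        super(routingKey);
    }

    @Override
    public void handleTaskExecution(String message) {
        // The component's real work would go here; this sketch only records the message.
        lastMessage = message;
    }
}

// Hypothetical helper for building and checking workflowId values.
final class WorkflowIds {
    // Only alphanumeric characters and underscores are allowed.
    private static final Pattern VALID = Pattern.compile("[A-Za-z0-9_]+");

    static String generate(String workflowType, long epochSeconds) {
        return workflowType + "_" + epochSeconds;
    }

    static boolean isValid(String workflowId) {
        return workflowId != null && VALID.matcher(workflowId).matches();
    }
}
```

For example, `WorkflowIds.generate("Report", 1602842568L)` yields `Report_1602842568`, the id used in the sample MQ message. Uniqueness still has to be ensured against the INSIGHTS_WORKFLOW_CONFIG table, since two workflows of the same type created in the same second would collide.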
Miscellaneous:
Workflow State / Flow Diagram:
...