Workflow Configuration
Fundamentals
Refer to the section below for the basics of the workflow framework.
Overview
Insights has a growing need for a customizable framework that runs offline tasks in a defined order. The Workflow framework is fully configurable, extensible, and scalable. It has an independent design: any component can use it by registering itself with Workflow and scheduling jobs. The entities described below together make up the workflow framework.
Components that use Workflow:
Audit Reporting
Upshift
Prerequisites:
Pre-Setup:
Server-Config: refer to the example properties below in server-config.json; users can change them as needed.
"assessmentReport":{
"outputDatasource":"NEO4J",
"maxWorkflowRetries":2,
"chartVendor":"Fusion",
"fusionExportAPIUrl":"http://localhost:1337/api/v2.0/export"
},
"workflowDetails":{
"corePoolSize":8,
"maximumPoolSize":20,
"keepAliveTime":20,
"waitingQueueSize":10,
"workflowExecutorCron":"0/30 0 0 ? * * *",
"workflowRetryExecutorCron":"0/1 0 0 ? * * *",
"workflowAutoCorrectionSchedular":"0 */1 * ? * *"
}
In the assessmentReport object above:
outputDatasource: the output datasource where report data is stored, for example NEO4J or ES.
maxWorkflowRetries: an integer that sets how many times a workflow is retried when a report fails. The recommended value is 3.
chartVendor: the chart vendor used for report visualization. Currently only the Fusion vendor is supported, so Fusion is the recommended value (only applicable for Report Management).
fusionExportAPIUrl: the CDN URL where the Fusion libraries are placed (only applicable for Report Management).
In the workflowDetails object above:
corePoolSize, maximumPoolSize, keepAliveTime, waitingQueueSize: refer to the Workflow Thread-Pool entry in the fundamentals section for details on choosing optimal values.
workflowExecutorCron: cron expression that defines how often the workflow executor runs.
workflowRetryExecutorCron: cron expression that defines how often the workflow retry executor runs.
workflowAutoCorrectionSchedular: cron expression that defines how often the workflow auto-correction executor runs.
Workflow Engine Configuration:
Create a new folder “workflowjar” in INSIGHTS_HOME.
Place the latest PlatformReport and PlatformWorkflow artifacts into the workflowjar folder.
Run the OS-specific workflow service script, available in the Insights docroot/Nexus repository, to register the service.
Use the OS-specific start/stop scripts to start or stop the workflow engine.
Directory Structure For Report PDF:
Create the following folder exactly as shown and add the required files to it:
$INSIGHTS_HOME\assessmentReportPdfTemplate
Grant the required permissions on the above folder.
Entities
Workflow Tasks: In a workflow, work is divided into smaller dependent tasks. In other words, a task is a small unit of the given work.
Rabbit MQ Channels: Each workflow task has its own RabbitMQ channel, and tasks use these channels to communicate with each other.
Sample MQ Message Format : {"executionId":1602843384824,"workflowId":"Report_1602842568","currentTaskId":373,"nextTaskId":-1,"sequence":1,"isWorkflowTaskRetry":false}
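The sample message above can be modeled as a small value class. The following is a minimal sketch, not the framework's actual message class: the class name WorkflowTaskMessage and the hand-written toJson() method are assumptions made for illustration, and only the field names and sample values come from the message shown above.

```java
// Hypothetical value class mirroring the sample MQ message; not part of the framework API.
final class WorkflowTaskMessage {
    final long executionId;
    final String workflowId;
    final int currentTaskId;
    final int nextTaskId;            // -1 in the sample message
    final int sequence;
    final boolean isWorkflowTaskRetry;

    WorkflowTaskMessage(long executionId, String workflowId, int currentTaskId,
                        int nextTaskId, int sequence, boolean isWorkflowTaskRetry) {
        this.executionId = executionId;
        this.workflowId = workflowId;
        this.currentTaskId = currentTaskId;
        this.nextTaskId = nextTaskId;
        this.sequence = sequence;
        this.isWorkflowTaskRetry = isWorkflowTaskRetry;
    }

    // Serializes the fields in the same order as the sample message.
    // Assumes workflowId contains only alphanumerics and underscores, so no escaping is done.
    String toJson() {
        return "{\"executionId\":" + executionId
                + ",\"workflowId\":\"" + workflowId + "\""
                + ",\"currentTaskId\":" + currentTaskId
                + ",\"nextTaskId\":" + nextTaskId
                + ",\"sequence\":" + sequence
                + ",\"isWorkflowTaskRetry\":" + isWorkflowTaskRetry + "}";
    }

    public static void main(String[] args) {
        WorkflowTaskMessage message = new WorkflowTaskMessage(
                1602843384824L, "Report_1602842568", 373, -1, 1, false);
        System.out.println(message.toJson());
    }
}
```

In a real component a JSON library would normally handle serialization; the string concatenation here only keeps the sketch free of dependencies.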
Workflow Task Sequence: Each task has a unique sequence number, which ensures that all tasks are executed sequentially. When a task completes its execution, it notifies the next task through the MQ channel.
Workflow Lifecycle Stages:
NOT_STARTED: Default status when a new workflow is created.
IN_PROGRESS: Set while a task is being executed.
COMPLETED: Set when all tasks of the workflow have completed without any error or exception.
ERROR: Set when any workflow task fails to complete because of an error or exception.
ABORTED: Set when a workflow that is being retried exceeds the maximum retry count.
RESTART: A workflow in the ABORTED state can be restarted after the possible errors are fixed; its status is then RESTART.
TASK_INITIALIZE_ERROR: Set when a workflow task completes successfully but fails to publish the message to its next task because of an MQ issue.
Workflow Schedulers: Workflow schedulers are responsible for executing active workflow tasks. Currently there are four types of workflow schedulers.
Workflow Executor: Responsible for the normal execution of tasks. It picks up all workflows in the COMPLETED, NOT_STARTED, or RESTART state. Its execution frequency is configurable in the server-config.json file.
Workflow Retry Executor: Responsible for retrying workflows in the ERROR or TASK_INITIALIZE_ERROR state. Its execution frequency is configurable in the server-config.json file.
WorkflowImmediateJob Executor: Responsible for the immediate execution of workflows. It picks up any workflow in the NOT_STARTED, RESTART, or TASK_INITIALIZE_ERROR state that is flagged for immediate execution. Its frequency is set at system level and is fixed at 5-minute intervals.
Workflow AutoCorrection Executor: Responsible for auto-correcting workflows that have missed their scheduled run. Its frequency is configurable in server-config.json; the recommended frequency is every 4 hours.
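The relationship between lifecycle stages and the first three schedulers can be summarized in code. This is a minimal sketch based only on the states listed above; the class, enum, and method names are hypothetical and do not come from the framework.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical summary of which workflow states each executor picks up.
final class ExecutorEligibilitySketch {
    enum Status { NOT_STARTED, IN_PROGRESS, COMPLETED, ERROR, ABORTED, RESTART, TASK_INITIALIZE_ERROR }

    // States picked up by the Workflow Executor.
    static final Set<Status> WORKFLOW_EXECUTOR =
            EnumSet.of(Status.COMPLETED, Status.NOT_STARTED, Status.RESTART);

    // States picked up by the Workflow Retry Executor.
    static final Set<Status> RETRY_EXECUTOR =
            EnumSet.of(Status.ERROR, Status.TASK_INITIALIZE_ERROR);

    // States picked up by the WorkflowImmediateJob Executor, provided the workflow is flagged.
    static final Set<Status> IMMEDIATE_EXECUTOR =
            EnumSet.of(Status.NOT_STARTED, Status.RESTART, Status.TASK_INITIALIZE_ERROR);

    // A workflow is picked up for immediate execution only if it is flagged and in an eligible state.
    static boolean pickedUpImmediately(Status status, boolean runImmediate) {
        return runImmediate && IMMEDIATE_EXECUTOR.contains(status);
    }

    public static void main(String[] args) {
        for (Status s : Status.values()) {
            System.out.println(s
                    + " executor=" + WORKFLOW_EXECUTOR.contains(s)
                    + " retry=" + RETRY_EXECUTOR.contains(s)
                    + " immediate(flagged)=" + pickedUpImmediately(s, true));
        }
    }
}
```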
Workflow Data Storage: Workflow uses the following database tables for data storage.
INSIGHTS_WORKFLOW_CONFIG: Stores the details of each workflow, such as its id, scheduling information, nextruntime, and workflow status.
INSIGHTS_WORKFLOW_TYPE: Stores the type of the workflow task.
INSIGHTS_WORKFLOW_TASK: Stores all the workflow tasks.
INSIGHTS_WORKFLOW_TASK_SEQUENCE: Maintains the sequence of each workflow task.
INSIGHTS_WORKFLOW_EXECUTION_HISTORY: Stores the execution history of each workflow task.
Workflow Life Cycle Hooks: Each workflow usually goes through the following phases.
Initialization Phase: The workflow registers and initializes all its tasks. It also initializes the workflow thread pool and starts the workflow scheduler.
Pre-Processor Phase: The workflow performs prerequisite actions, such as writing necessary information to the database.
Execution Phase: The actual workflow task is executed.
Post-Processor Phase: Once a task finishes its execution, the workflow performs post actions, such as updating the execution history.
Error Phase: The workflow enters this phase if any workflow task fails because of an error or exception.
Workflow Nextruntime and its Scheduling: Each workflow is scheduled based on its frequency. For each workflow frequency a nextruntime is calculated, which the workflow schedulers then use to execute workflows. Refer to the table below.
NextRuntime
** Signifies the start of the day of the given date, with time 00:00:01 (UTC).
Frequency | Report Creation Date / Month (assumed for illustration) | Logic | Calculation | Next Runtime | Note |
---|---|---|---|---|---|
Daily | 25/09/2020 ** | Add 24 hrs. to the report date | 25/09/2020 + 24 hrs. | 26/09/2020 | Always executes on the last day's data |
Weekly | 25/09/2020 ** | Go to the start of the week in which the report is created, then add seven days (1 week) | 25/09/2020 (start of the week) + 7 days | 02/10/2020 | Always executes on last week's data |
BI-Weekly Sprint | 25/09/2020 ** | Report creation date + 14 days | 25/09/2020 + 14 days | 09/10/2020 | Always executes on last sprint's data |
TRI-Weekly Sprint | 25/09/2020 ** | Report creation date + 21 days | 25/09/2020 + 21 days | 16/10/2020 | Always executes on last sprint's data |
Monthly | 25/09/2020 ** | Go to the start of the report creation month | Start of the month | 01/09/2020 | Always executes on last month's data |
Quarterly | 25/09/2020 ** | Determine the quarter that contains the report creation month and set the date to the start of that quarter | Start of the quarter | 01/07/2020 | Always executes on the last quarter's data |
Yearly | 25/09/2020 ** | Go to the start of the report creation year and set that date | Start of the year | 01/01/2020 | Always executes on the last year's data |
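The calculations in the table can be sketched with java.time. This is an illustrative sketch, not the framework's implementation: the class, enum, and method names are hypothetical, it works on calendar dates only and ignores the 00:00:01 (UTC) time component, and it leaves out the Weekly frequency because that result depends on which day is treated as the start of the week.

```java
import java.time.LocalDate;

// Hypothetical sketch of the date arithmetic from the NextRuntime table (dates only, no time zone handling).
final class NextRuntimeSketch {
    enum Frequency { DAILY, BI_WEEKLY_SPRINT, TRI_WEEKLY_SPRINT, MONTHLY, QUARTERLY, YEARLY }

    static LocalDate nextRuntime(Frequency frequency, LocalDate reportDate) {
        switch (frequency) {
            case DAILY:
                return reportDate.plusDays(1);            // report date + 24 hrs
            case BI_WEEKLY_SPRINT:
                return reportDate.plusDays(14);           // report date + 14 days
            case TRI_WEEKLY_SPRINT:
                return reportDate.plusDays(21);           // report date + 21 days
            case MONTHLY:
                return reportDate.withDayOfMonth(1);      // start of the report creation month
            case QUARTERLY:
                // First day of the quarter that contains the report creation month.
                return LocalDate.of(reportDate.getYear(),
                        reportDate.getMonth().firstMonthOfQuarter(), 1);
            case YEARLY:
                return reportDate.withDayOfYear(1);       // start of the report creation year
            default:
                throw new IllegalArgumentException("Unsupported frequency: " + frequency);
        }
    }

    public static void main(String[] args) {
        LocalDate reportDate = LocalDate.of(2020, 9, 25);
        for (Frequency f : Frequency.values()) {
            System.out.println(f + " -> " + nextRuntime(f, reportDate));
        }
    }
}
```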
Workflow Error Handling Phases: A workflow is considered failed when it is unable to execute its task or goal, for example because of infrastructure failures such as MQ, DB, or NEO4J being down. For such scenarios Workflow comes with a built-in error handling mechanism. A workflow can be retried in the following cases.
The workflow fails while publishing its first message to MQ. In this scenario the workflow has no execution history, because it failed in the initial phase.
The workflow fails in a middle task. Execution history is available, and at least one of the tasks is in the ERROR state.
The workflow fails after at least one task has completed, but none of the tasks is in the ERROR state.
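The three retry cases above differ only in what the execution history contains, which can be expressed as a small classifier. This is a sketch under stated assumptions: the class, enum, and method names are hypothetical, and the execution history is modeled as a plain list of task statuses rather than rows from INSIGHTS_WORKFLOW_EXECUTION_HISTORY.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical classifier for the three retry cases described above.
final class RetryCaseSketch {
    enum TaskStatus { IN_PROGRESS, COMPLETED, ERROR }
    enum RetryCase { NO_EXECUTION_HISTORY, TASK_IN_ERROR, COMPLETED_WITHOUT_ERROR }

    static Optional<RetryCase> classify(List<TaskStatus> executionHistory) {
        if (executionHistory.isEmpty()) {
            // Case 1: failed while publishing the first message, so nothing was recorded.
            return Optional.of(RetryCase.NO_EXECUTION_HISTORY);
        }
        if (executionHistory.contains(TaskStatus.ERROR)) {
            // Case 2: history exists and at least one task is in ERROR.
            return Optional.of(RetryCase.TASK_IN_ERROR);
        }
        if (executionHistory.contains(TaskStatus.COMPLETED)) {
            // Case 3: at least one task completed and no task is in ERROR.
            return Optional.of(RetryCase.COMPLETED_WITHOUT_ERROR);
        }
        // History that matches none of the three documented cases.
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(classify(Collections.<TaskStatus>emptyList()));
        System.out.println(classify(Arrays.asList(TaskStatus.COMPLETED, TaskStatus.ERROR)));
        System.out.println(classify(Arrays.asList(TaskStatus.COMPLETED)));
    }
}
```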
Workflow Thread-Pool: The workflow framework has a multi-threaded architecture in which each workflow task is broken into small chunks and each chunk is executed on its own thread. Workflow uses a custom thread pool whose parameters are configurable. Users can tune the thread pool based on load and other factors such as CPU and memory. The configurable parameters are listed below.
corePoolSize: Number of threads the JVM initially creates for tasks. For example, if corePoolSize is 5, the JVM creates up to 5 threads; if 6 tasks arrive, 5 threads are created and started immediately and the 6th task is placed in the waiting queue.
maximumPoolSize: Maximum number of threads that can ever be created.
keepAliveTime: When the number of threads is greater than corePoolSize, this is the maximum time that idle threads wait for new tasks before terminating.
waitingQueueSize: Size of the queue that holds tasks before they are executed.
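These four parameters map directly onto the constructor of java.util.concurrent.ThreadPoolExecutor. The sketch below shows that mapping using the example values from server-config.json. It assumes the custom pool behaves like a standard ThreadPoolExecutor with a bounded queue and that keepAliveTime is expressed in seconds; neither is confirmed by this page, and the class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical mapping of the workflowDetails settings onto a standard JDK thread pool.
final class WorkflowThreadPoolSketch {
    static ThreadPoolExecutor create(int corePoolSize, int maximumPoolSize,
                                     long keepAliveTime, int waitingQueueSize) {
        return new ThreadPoolExecutor(
                corePoolSize,
                maximumPoolSize,
                keepAliveTime, TimeUnit.SECONDS,                 // unit is an assumption
                new ArrayBlockingQueue<>(waitingQueueSize));     // bounded waiting queue
    }

    public static void main(String[] args) throws InterruptedException {
        // Values taken from the example workflowDetails block.
        ThreadPoolExecutor pool = create(8, 20, 20, 10);
        for (int i = 0; i < 5; i++) {
            final int chunk = i;
            pool.execute(() -> System.out.println(
                    "chunk " + chunk + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With a standard ThreadPoolExecutor, threads beyond corePoolSize are started only once the waiting queue is full, and tasks are rejected when both the queue and maximumPoolSize are exhausted, so waitingQueueSize and maximumPoolSize should be sized together.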
Workflow Frequencies:
Frequency | Reoccurrence | Start Date | End Date | Info |
---|---|---|---|---|
One time | Not Available | Mandatory | Mandatory | A one-time report is configured when the user needs a report on a specific date range in the past. |
Daily | Available | ---- | ---- | A daily report is configured when the user needs a report on a daily basis. The report is executed based on the next runtime (refer to the fundamentals section for more information). |
Weekly | Available | ---- | ---- | A weekly report is configured when the user needs a report on weekly data. The report is executed based on the next runtime (refer to the fundamentals section for more information). |
BI-Weekly Sprint | Available | Mandatory | ---- | A BI-Weekly sprint report is configured when a sprint report is needed; the start date is mandatory when configuring the report. The report is executed based on the next runtime (refer to the fundamentals section for more information). |
TRI-Weekly Sprint | Available | Mandatory | ---- | A TRI-Weekly sprint report is configured when a sprint report is needed; the start date is mandatory when configuring the report. The report is executed based on the next runtime (refer to the fundamentals section for more information). |
Monthly | Available | ---- | ---- | Monthly report is configured when report is needed on monthly data. |
Quarterly | Available | ---- | ---- | Quarterly report is configured when report is needed on quarterly data. |
Yearly | Available | ---- | ---- | Yearly report is configured when report is needed on yearly data. |
Steps to use the Workflow framework for a new component:
Define a task and create a component class that extends WorkflowTaskSubscriberHandler.java, calls the super constructor, and implements the handleTaskExecution() method.
The task should have a workflow type.
Insert the workflow type into INSIGHTS_WORKFLOW_TYPE and the task details into the INSIGHTS_WORKFLOW_TASK table.
Register with Workflow by adding an entry to the INSIGHTS_WORKFLOW_CONFIG table with the required workflow details, such as:
workflowId: Not auto-generated; it must be unique. It can contain alphanumeric characters and underscores, for example in the format <workflow type>_<current time in sec>.
workflowType: Provide a suitable workflow type name.
isActive: Only active records are picked up for execution. So, this flag should be TRUE.
scheduleType: Provide the frequency/schedule on which the component's tasks must be executed. (For frequency details, refer to Workflow Configuration | Workflow Frequencies.)
reoccurence: A boolean flag that keeps the workflow executing at the defined frequency.
runImmediate: If true, the workflow is picked up by the workflow immediate job scheduler.
status: Initially it should be NOT_STARTED; Workflow updates it later as tasks execute.
Depending on the frequency, a start date or end date is also needed to calculate the next runtime.
Add the task sequence details to the INSIGHTS_WORKFLOW_TASK_SEQUENCE table.
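Because workflowId is not auto-generated, the registering component has to build it. The following is a minimal sketch of the suggested <workflow type>_<current time in sec> format together with a check for the allowed characters. The class and method names are hypothetical, and the sketch does not guarantee uniqueness, which still has to be ensured against the INSIGHTS_WORKFLOW_CONFIG table.

```java
import java.time.Instant;
import java.util.regex.Pattern;

// Hypothetical helper for building and checking workflowId values.
final class WorkflowIdSketch {
    // Allowed characters according to the workflowId description: alphanumerics and underscore.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9_]+");

    // Builds an id in the format <workflow type>_<current time in sec>.
    static String generate(String workflowType, Instant now) {
        return workflowType + "_" + now.getEpochSecond();
    }

    static boolean isValid(String workflowId) {
        return workflowId != null && ALLOWED.matcher(workflowId).matches();
    }

    public static void main(String[] args) {
        String id = generate("Report", Instant.now());
        System.out.println(id + " valid=" + isValid(id));
    }
}
```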
Miscellaneous:
Workflow State / Flow Diagram:
©2021 Cognizant, all rights reserved. US Patent 10,410,152