I am working on a project where we use BPMN to author and control the processing of analysis. Any single analysis task may yield several descendants, much as a .ZIP file contains many child files. Additionally, many analyzers produce further analysis for both the input artifact and additional artifacts. Originally we were treating every child artifact, and every child workflow, as a completely separate entity.
This hurt us for a number of reasons. Artifacts with a very large number of children, such as large APK files, would clog up our system and prevent other users from using it until that processing was complete. Additionally, it was never evident when the total analysis of the initial artifact had finished. As a result, the analysis reported for a top-level artifact could be inaccurate and misleading, because results from a descendant that affect the final analysis might not have been included yet.
I was tasked with fixing this, both to relieve the system utilization problem and to be able to determine accurately when a top-level workflow is indeed complete. I theorized that we were in fact underutilizing jBPM and that it was the proper tool for this task. Initially I used the forEach block, which would iterate through all the new work orders. Instead of invoking the worker directly, I recursively called a new handler I had created. As each process completed, it returned its child workflows, allowing jBPM to invoke them in turn. This worked really well: all child workflows finished before their parent was considered complete.
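To make the shape of that first design concrete, here is a minimal plain-Java analogy. It is not jBPM code and not the project's handler: `SequentialWorkflowRunner` and the `worker` function (which processes one work order and returns its child work orders) are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only: a plain-Java stand-in for the recursive handler.
class SequentialWorkflowRunner {

    /**
     * Processes one work order, then recurses into each child in turn.
     * Returns the number of work orders processed in the whole subtree.
     */
    static <T> int run(T order, Function<T, List<T>> worker) {
        int processed = 1;
        // worker.apply(order) stands in for "run the workflow and collect its children".
        for (T child : worker.apply(order)) {
            // This call does not return until the child's entire subtree is done,
            // so the parent is only complete after all of its descendants.
            processed += run(child, worker);
        }
        return processed;
    }
}
```

The parent cannot finish before its descendants, which is the property we wanted. Each sibling also has to wait for the previous one, which matters for what follows.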
Unfortunately, after some testing this proved to be a disappointment with respect to performance. The forEach loop is blocking and single-threaded, which means each child workflow had to wait for its sibling to complete. This was a tremendous under-utilization of resources and really slowed things down. I had attempted to optimize in other areas, but this was the bottleneck. I quickly redesigned it and got rid of the forEach loop; instead, I handle this by submitting the Runnable tasks directly to my thread-pool. I did have to track the completion of those tasks, which added complexity, but it was well worth it. The end result was even faster than the initial non-nested version. I strongly recommend this approach for large-scale workflows that use a jBPM engine. This project was using a slightly older version of jBPM (5.5.0.Final), but I think this design would still be useful even with 6.X.X. I hope to post some sample code soon to better illustrate how to leverage this technique.
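In the meantime, here is a rough sketch of the completion-tracking idea in plain Java. It is not the project's code and uses no jBPM API: `WorkflowTreeRunner` and `worker` are hypothetical names, and the sketch assumes a pool with an unbounded queue so that submissions are never rejected. A shared counter of outstanding tasks is incremented before each task is submitted and decremented when it finishes, so the counter can only reach zero once the whole tree is done.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical illustration only: runs a root work order and all descendants on a shared pool.
class WorkflowTreeRunner<T> {
    private final ExecutorService pool;
    private final Function<T, List<T>> worker; // processes one order, returns its child orders
    private final AtomicInteger pending = new AtomicInteger();   // submitted but not yet finished
    private final AtomicInteger processed = new AtomicInteger(); // finished successfully
    private final CountDownLatch done = new CountDownLatch(1);

    WorkflowTreeRunner(ExecutorService pool, Function<T, List<T>> worker) {
        this.pool = pool;
        this.worker = worker;
    }

    private void submit(T order) {
        pending.incrementAndGet(); // count the task before it can possibly finish
        pool.execute(() -> {
            try {
                // Children are registered while the parent is still counted as pending,
                // so the counter cannot hit zero between parent and children.
                for (T child : worker.apply(order)) {
                    submit(child);
                }
                processed.incrementAndGet();
            } finally {
                if (pending.decrementAndGet() == 0) {
                    done.countDown(); // the last task in the tree has finished
                }
            }
        });
    }

    /** Submits the root and blocks the caller (not a pool thread) until the whole tree is done. */
    int runAndWait(T root) {
        submit(root);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workflows", e);
        }
        return processed.get();
    }
}
```

Siblings now run in parallel, and no pool thread ever blocks waiting on a child; only the original caller waits.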
As for the other issue of clogging the system, we can now manually adjust how many “child workflows” consume the thread-pool. In fact, I configured it so that once the thread-pool became full, the next child workflow was run serially instead of being queued. This was necessary because the child workflows determined when their parents were deemed complete. If child workflows were left sitting in a queue, the parent workflow could deadlock and never complete. Forcing them to run serially is slower but ensures eventual completion.
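One standard way to get this "run it serially when the pool is full" behavior in Java is a ThreadPoolExecutor with no queue capacity and the built-in CallerRunsPolicy. This is a sketch under that assumption, not necessarily how the project configured its pool; `ChildWorkflowPool` and its methods are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only.
class ChildWorkflowPool {

    /** Pool with no queue: when every thread is busy, the submitting thread runs the task itself. */
    static ThreadPoolExecutor create(int maxThreads) {
        return new ThreadPoolExecutor(
                maxThreads, maxThreads, 60L, TimeUnit.SECONDS,
                new SynchronousQueue<>(),                   // hands off directly, never stores tasks
                new ThreadPoolExecutor.CallerRunsPolicy()); // on saturation, run in the caller
    }

    /** Demo: returns the name of the thread that ran a task submitted while the pool was full. */
    static String threadUsedWhenSaturated() {
        ThreadPoolExecutor pool = create(1);
        CountDownLatch release = new CountDownLatch(1);
        String[] ranOn = new String[1];
        // Occupy the only pool thread until we release it.
        pool.execute(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // The pool is saturated, so this task runs synchronously on the submitting thread.
        pool.execute(() -> ranOn[0] = Thread.currentThread().getName());
        release.countDown();
        pool.shutdown();
        return ranOn[0];
    }
}
```

Because a saturated pool pushes the work back onto whichever thread submitted it, a child workflow is never parked in a queue behind the parent that is waiting for it. `maxThreads` then becomes the knob for how much of the system child workflows may consume.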
To all those who understand and appreciate this, enjoy!