CLONE - Performance bottleneck in Disposition Lifecycle job caused by large database transaction
The disposition lifecycle job processes record actions in batches of 1000 in ACS, but the underlying DB actions are all processed in a single transaction. This causes scalability problems when a single execution of the job has to process a large number of records. The single transaction holds an increasing number of exclusive locks and steadily consumes DB resources, which leads to an overall slowdown and reduces concurrency for other DB processes that may be blocked by any of the locks being held.
DEBUG analysis from a recent customer example.
This job run processed 125k records
The log extract covers the first 22k records being processed
Log extract started at 04:10, last entry at 04:57
Timestamps show the job processing the first few thousand records at approximately 18 records per second
By the time it is past 20k records, processing is down to 4 records per second
Logging from another job run showed records taking more than 1 second each to process
disable the scheduled execution for the scheduledDispositionLifecyceleJobDetail cron
create a new file plan
create a disposition/retention schedule to cut off immediately
create a large number of records in the plan (5k-10k)
Enable the following debug logging in admin console - org.alfresco.module.org_alfresco_module_rm.job.DispositionLifecycleJobExecuter
trigger the scheduled job manually from the admin console or JMX
The DB will show a single open transaction for the duration of the job run. Checking the locks held by this transaction will show their number steadily increasing.
The disposition job should process nodes in smaller, more frequent database transactions.
With a single large transaction and a lot of records to process, performance on the database and ACS declines steadily: memory, heap usage and lock counts keep increasing because there is no commit in the database to free up resources and allow GC to clean up the ACS heap.
Before RM-1413, the job was processing each record in an individual transaction.
Following RM-1413, it changed to a single transaction for all records
Implement database transaction batching with a configurable batch size. The configured value must not be allowed to go below the disposition batch processing size of 1000.
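The proposal above can be sketched roughly as follows. This is only an illustration of the batching and minimum-size logic, not the actual job code: the class name, method names and the DISPOSITION_BATCH_SIZE constant are hypothetical, and the callback stands in for whatever unit of work ACS would run in its own transaction (presumably through something like RetryingTransactionHelper with a new transaction per batch).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: none of these names come from the RM code base.
class TransactionBatchingSketch {
    // Disposition batch processing size quoted in this ticket.
    static final int DISPOSITION_BATCH_SIZE = 1000;

    // The configured transaction batch size is never allowed below the floor.
    static int effectiveBatchSize(int configured) {
        return Math.max(configured, DISPOSITION_BATCH_SIZE);
    }

    // Splits the records into consecutive chunks and hands each chunk to
    // txnWork. In ACS each call to txnWork would be one committed transaction.
    // Returns the number of chunks (transactions) that were processed.
    static <T> int processInBatches(List<T> records, int configuredSize,
                                    Consumer<List<T>> txnWork) {
        int size = effectiveBatchSize(configuredSize);
        int txnCount = 0;
        for (int from = 0; from < records.size(); from += size) {
            int to = Math.min(from + size, records.size());
            txnWork.accept(new ArrayList<>(records.subList(from, to)));
            txnCount++;
        }
        return txnCount;
    }

    public static void main(String[] args) {
        // 125k records, as in the customer example above.
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 125_000; i++) {
            records.add(i);
        }
        // Placeholder work: a real job would execute and commit here.
        int txns = processInBatches(records, 5000, batch -> { });
        System.out.println(txns + " transactions"); // prints "25 transactions"
    }
}
```

With a configured size below 1000 (for example 200), the sketch falls back to 1000, so the same 125k records would be committed in 125 transactions instead of one.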
Thank you, will do. It looks like it's an issue for PaaS too, but when I challenged further, it's not as urgent, so it could potentially be slipped into a Service Pack. I'll address this with the Product team and let you know the outcome. Thank you!
Per the case, version Cassie needs HF for is: 184.108.40.206. I will verify AGS version with Customer.
Let me verify which version the HF is needed on. The HF is primarily for one of the two customers' cases.