AGS DB query runs multiple times when invoking the documentlibrary from share
A customer has upgraded from ACS 5.2.7 to ACS 22.214.171.124 with RM 3.4.1, previously the customer used RM 3.3.1. After the upgrade the customer is seeing a DB query run multiple times, users report that navigating a site's document library and the repository library is slower than when they had 5.2.7 deployed.
[Steps to reproduce]
This issue is only visible in the customer's PROD instance. Their DBA has reported that the query below get executed multiple times maxing out the DB CPU.
2021-03-15 11:55:33 UTC:10.66.74.253(53624):alfscoowner@alfscop::DETAIL: parameters: $1 = '-9223372036854775808', $2 = '9223372036854775807', $3 = '636'
2021-03-15 11:55:33 UTC:10.66.74.253(60156):alfscoowner@alfscop::LOG: duration: 15269.996 ms execute S_10: select
node.id as id,
store.protocol as protocol,
store.identifier as identifier,
node.uuid as uuid
join alf_node node on (na.node_id = node.id)
join alf_store store on (store.id = node.store_id)
na.node_id >= $1
and na.node_id < $2
and na.qname_id in
( $3 )
The thread dumps show that this is getting called from the RM code trying to retrieve info from the fileplan
Navigating the sites should not slow down due to RM
After upgrading to 6.2 the share navigation slows down
[Analysis to date]
It appears that the query in question should only get called if the storeRef is not cached (https://github.com/Alfresco/governance-services/blob/V3.4.1/rm-community/rm-community-repo/source/java/org/alfresco/module/org_alfresco_module_rm/fileplan/FilePlanServiceImpl.java#L202) not sure if the cache gets full or if it is never caching the value.
The customer is not very confident about this fix since they think that eventually the cache will get cleared again and they’ll run into this issue with the bad query. They have overriden our out of the box query and i have provided their template , they would like to know if it wouldn’t be better to have this query template as well as the cache changes since if the out of the box query runs then they will run into major performance issues again.
Let me know your thoughts. Thank you.
Thank you !!!!!
This allows them to see when the cache misses which happens often but sometimes it does hit the cache, something is clearing it and we don’t know what it is. Also this does not happen often in any of their lower environments, which are also in a clustered environment, so they think this might either be an issue with concurrency, load or size of the repo but we can’t seem to pin point to the root cause.
Some of the values printed in their script
The customer has updated the DB template select_NodesWithAspectIds. Their current template is listed below:
The customer would like to know if this is a suitable replacement as for now it is preventing their CPU on the DB from maxing out. They would also like to understand why the query is being executed when it’s supposed to go to the RM container cache
<!-- RM Container Cache Manager -->
<bean id="rmContainerCache" class="org.alfresco.repo.cache.DefaultSimpleCache" />
They have seen this cache gets missed many time randomly so they would like to know why this cache gets invalidated.