AGS DB query runs multiple times when invoking the documentlibrary from share

Description

[Description]

A customer has upgraded from ACS 5.2.7 to ACS 6.2.2.7 with RM 3.4.1, previously the customer used RM 3.3.1. After the upgrade the customer is seeing a DB query run multiple times, users report that navigating a site's document library and the repository library is slower than when they had 5.2.7 deployed.

[Steps to reproduce]

This issue is only visible in the customer's PROD instance. Their DBA has reported that the query below get executed multiple times maxing out the DB CPU.

2021-03-15 11:55:33 UTC:10.66.74.253(53624):alfscoowner@alfscop:[31009]:DETAIL: parameters: $1 = '-9223372036854775808', $2 = '9223372036854775807', $3 = '636'
2021-03-15 11:55:33 UTC:10.66.74.253(60156):alfscoowner@alfscop:[23995]:LOG: duration: 15269.996 ms execute S_10: select
node.id as id,
store.protocol as protocol,
store.identifier as identifier,
node.uuid as uuid
from
alf_node_aspects na
join alf_node node on (na.node_id = node.id)
join alf_store store on (store.id = node.store_id)
where
na.node_id >= $1
and na.node_id < $2
and na.qname_id in
( $3 )

The thread dumps show that this is getting called from the RM code trying to retrieve info from the fileplan

at org.alfresco.repo.domain.node.ibatis.NodeDAOImpl.selectNodesWithAspects(NodeDAOImpl.java:799)
at org.alfresco.repo.domain.node.AbstractNodeDAOImpl.getNodesWithAspects(AbstractNodeDAOImpl.java:2816)
at org.alfresco.module.org_alfresco_module_rm.fileplan.FilePlanServiceImpl.getFilePlans(FilePlanServiceImpl.java:204)
at org.alfresco.module.org_alfresco_module_rm.fileplan.FilePlanServiceImpl.getFilePlans(FilePlanServiceImpl.java:170)

[Expected Behaviour]

Navigating the sites should not slow down due to RM

[Observed Behaviour]

After upgrading to 6.2 the share navigation slows down

[Analysis to date]

It appears that the query in question should only get called if the storeRef is not cached (https://github.com/Alfresco/governance-services/blob/V3.4.1/rm-community/rm-community-repo/source/java/org/alfresco/module/org_alfresco_module_rm/fileplan/FilePlanServiceImpl.java#L202) not sure if the cache gets full or if it is never caching the value.

Environment

None

Testcase ID

None

Activity

Show:
Jose Portillo
April 5, 2021, 4:50 PM

Hi ,

The customer is not very confident about this fix since they think that eventually the cache will get cleared again and they’ll run into this issue with the bad query. They have overriden our out of the box query and i have provided their template , they would like to know if it wouldn’t be better to have this query template as well as the cache changes since if the out of the box query runs then they will run into major performance issues again.

 

Let me know your thoughts. Thank you.

CC:

Cassandra Panayiotou
April 2, 2021, 1:01 PM

Thank you !!!!!

Jose Portillo
March 30, 2021, 1:13 PM

Hello ,

 

What the customer is seeing is that the query runs multiple times in a matter of minutes, more times than the number of cluster nodes they have. This is why they created the javascript script provided to check how ofter the cache misses and it is very frequent even when we execute it in the same node. We don’t know why this cache gets reset.

Jose Portillo
March 23, 2021, 1:15 PM
Edited

, the customer is using a clustered environment of about 6 nodes. They also have deployed the javascript console and they are using it to keep track of the RM container cache

This allows them to see when the cache misses which happens often but sometimes it does hit the cache, something is clearing it and we don’t know what it is. Also this does not happen often in any of their lower environments, which are also in a clustered environment, so they think this might either be an issue with concurrency, load or size of the repo but we can’t seem to pin point to the root cause.

 

Some of the values printed in their script

ttl=0
maxItems=2147483647
maxIdleSecs=0
useMax=false

Jose Portillo
March 16, 2021, 2:01 PM

The customer has updated the DB template select_NodesWithAspectIds. Their current template is listed below:

The customer would like to know if this is a suitable replacement as for now it is preventing their CPU on the DB from maxing out. They would also like to understand why the query is being executed when it’s supposed to go to the RM container cache

 

  <!-- RM Container Cache Manager -->

   <bean id="rmContainerCache" class="org.alfresco.repo.cache.DefaultSimpleCache" />

They have seen this cache gets missed many time randomly so they would like to know why this cache gets invalidated.

Fixed

Assignee

Tiago Salvado

Reporter

Jose Portillo

Hot Fix Version

AGS 3.4.1.2

ACT Numbers

00371254

Premier Customer

Yes

Delivery Team

Customer Excellence

Bug Priority

Category 1