propTablesCleanupJobDetail v2 can cause Out of Memory errors (CleanAlfPropTablesV2.sql )

Description

When running the alternative method for scheduling cleanup of database tables on larger databases, we can trigger OoM (out of memory) errors.

Steps to reproduce:

1. Setup ACS to use MySQL (this has not been tested on other RDBMS)
2. Create a large amount of audit data in secondary/ referencer table alf_prop_link (200 million rows)
3. Run propTablesCleanupJobDetail (OOTB in ACS 6.2.1+ this will be the v2 algorithm)

Observed Behaviour:

The system hits OoM when processing alf_prop_link

Expected Behaviour:

Process secondary references in batches or another method that does not trigger an OoM error

Additional Information:

This analysis was performed on a environment where the issue is present

Logging from catalina.out

  • With the trace set as follows:

    We see the following just before the OoM error:

Captured YourKit telemetry

  • We see memory utilisation fluctuating as you would expect with GC before the OoM issue

  • As we approach the OoM condition, memory consumption increases continually

  • Thread dump samples from the period of continual memory increase show the following call is continually reading data from the database:

Captured p6spy SQL telemetry:

  • This shows the last statement sent was:

    A few minutes later the OoM error killed the JVM

  • The p6spy log does not contain a result set for the above request, which means that the complete set of results was not received. This select statement returns a very large set of data as per the below.

SQL Database:

  • From the query gathered from p6spy we can see the following with regards to the database:
    The number of rows in this table are over 200 million:

    The size of the alf_prop_link table is around 40GB:

Environment

None

Testcase ID

None

Activity

Show:
Denis Ungureanu
5 days ago
Edited

Hi , we’ve ran the propTablesCleaner job on a Postgres database with over 200M rows (per table) on multiple tables, including alf_prop_link (). Besides the fact that it took ~9.5 hours to complete, the job works fine.

Denis Ungureanu
14 minutes ago

Hi - the JVM options I used on the Alfresco container -Xms2g -Xmx4g

Assignee

Unassigned

Reporter

Sandeep Reehall

Labels

None

ACT Numbers

00340210

Security Issue

None

Patch Attached

None

Premier Customer

None

Prioritization Score

None

Delivery Team

None

Build Location

None

Cloud or Enterprise

None

Bug Priority

Category 2

Work Funnel End

None

Escalated By

None

Dependent Version/s

None

Regression Since

None

Code Branch

None

Components

Fix versions

Affects versions