(Investigation) Improve the contentStoreCleaner mechanism

Description

The contentStoreCleaner is responsible for cleaning up orphaned content from the content store. Cleanup currently runs in batches of 1k, and on a large database (150 million nodes, with 500k or 1 million orphan nodes) each batch takes roughly 2.5 to 3 minutes to complete. Please view this comment for more details on the environment used for tests.
At that rate, deleting a large amount of data takes weeks (A/C for ).
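
For reference, the "weeks" figure can be reproduced with a quick back-of-the-envelope calculation. The sketch below assumes that batches run back to back with no pause between them and that the per-batch time stays constant at the measured 2.5 to 3 minutes; the function name is only illustrative, and the 10 million figure is the target from the acceptance criteria.

```python
def estimate_purge_days(total_orphans, batch_size, minutes_per_batch):
    """Estimate wall-clock days needed to purge total_orphans in fixed-size batches."""
    batches = -(-total_orphans // batch_size)  # ceiling division
    return batches * minutes_per_batch / (60 * 24)


# 10 million orphans, 1k per batch, 2.5 to 3 minutes per batch (figures from this ticket)
low = estimate_purge_days(10_000_000, 1_000, 2.5)
high = estimate_purge_days(10_000_000, 1_000, 3.0)
print(f"{low:.1f} to {high:.1f} days")  # prints "17.4 to 20.8 days"
```

So under these assumptions, 10,000 batches are needed and the purge takes roughly two and a half to three weeks of continuous running.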

Acceptance criteria
Check whether the time taken to purge orphan nodes can be reduced so that 10 million documents can be deleted in a reasonable time. The current batch size also appears to be hard-coded, so making it configurable may bring an improvement.
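
As a starting point for the investigation, a configurable batch size could look roughly like the sketch below. This is not the contentStoreCleaner code: the property key `contentStoreCleaner.batchSize`, the helper names, and the in-memory list standing in for the orphan query are all hypothetical and only illustrate reading the batch size from configuration with the current 1k as the default.

```python
DEFAULT_BATCH_SIZE = 1000  # the 1k batch size described in this ticket


def resolve_batch_size(props):
    """Read the batch size from a properties mapping (hypothetical key), defaulting to 1k."""
    size = int(props.get("contentStoreCleaner.batchSize", DEFAULT_BATCH_SIZE))
    if size <= 0:
        raise ValueError("batch size must be a positive integer")
    return size


def purge_orphans(fetch_batch, delete_batch, batch_size):
    """Delete orphans batch by batch until fetch_batch returns nothing; return the count."""
    total = 0
    while True:
        batch = fetch_batch(batch_size)
        if not batch:
            return total
        delete_batch(batch)
        total += len(batch)


# In-memory stand-in for the orphan query and the delete call (illustration only).
pending = [f"orphan-{i}" for i in range(2500)]
fetch_sizes = []


def fetch_batch(limit):
    batch = pending[:limit]
    fetch_sizes.append(len(batch))
    return batch


def delete_batch(batch):
    del pending[:len(batch)]


deleted = purge_orphans(fetch_batch, delete_batch, resolve_batch_size({}))
print(deleted, fetch_sizes)
```

Whether a larger batch actually shortens the total run depends on where the 2.5 to 3 minutes per batch is spent (query, content store delete, or transaction overhead), which is what the investigation would need to measure.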

Notes:

  • Depending on the change, testing scenarios could include: only fileContentStore, file and s3contentstore together, only s3contentstore, s3 with a deleted content store, and azurecontentstore

Assignee

Unassigned

Reporter

Andrei Forascu

Labels

Release Train

None

Delivery Team

Team 7

Components

Sprint

Team 7 - Backlog

Fix versions

Priority

Unprioritized