(Investigation) Improve the contentStoreCleaner mechanism


The contentStoreCleaner is responsible for cleaning up orphaned content from the contentstore. Currently this is done in batches of 1k, and when the database is large (150 mil nodes, 500k or 1 mil orphan nodes), each batch takes ~2.5-3 min to complete. Please view this comment for more details on the environment used for tests.
At that rate we need weeks to delete a large amount of data (A/C for ).
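
To make the "weeks" figure concrete, here is a rough estimate built only from the numbers quoted above (batches of 1k, 2.5 to 3 min per batch). It assumes batches run strictly one after another and that the per-batch time stays constant as orphans are removed. Neither assumption has been verified. The class and method names are made up for illustration.

```java
// Back-of-the-envelope estimate; not part of the cleaner itself.
final class CleanupEstimate {

    /** Number of batches needed, rounding up for a final partial batch. */
    static long batches(long orphans, long batchSize) {
        return (orphans + batchSize - 1) / batchSize;
    }

    /** Total duration in days, assuming sequential batches of constant duration. */
    static double days(long orphans, long batchSize, double minutesPerBatch) {
        return batches(orphans, batchSize) * minutesPerBatch / (60.0 * 24.0);
    }

    public static void main(String[] args) {
        long batchSize = 1_000; // batch size quoted in this ticket
        for (long orphans : new long[] {500_000L, 1_000_000L, 10_000_000L}) {
            System.out.printf("%d orphans: %.1f to %.1f days%n",
                    orphans,
                    days(orphans, batchSize, 2.5),
                    days(orphans, batchSize, 3.0));
        }
    }
}
```

Under these assumptions, 10 mil orphans need 10,000 batches, which is roughly 17 to 21 days of continuous running.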

Acceptance criteria
Check whether there is a way to reduce the time taken to purge orphan nodes, so that 10 mil documents can be deleted in a more timely manner. It also looks like the current batch size is hard-coded:

so making the batch size configurable might bring an improvement.
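
One possible shape for a configurable batch size is sketched below. It is not the existing cleaner code. The property key `cleaner.batchSize`, the class `BatchedCleaner` and both methods are hypothetical names chosen for illustration, and the default of 1,000 mirrors the batch size mentioned in this ticket.

```java
import java.util.List;
import java.util.Properties;
import java.util.function.Consumer;

// Hypothetical sketch: reads a batch size from configuration instead of hard-coding it.
final class BatchedCleaner {

    static final String BATCH_SIZE_KEY = "cleaner.batchSize"; // assumed key, not a real property
    static final int DEFAULT_BATCH_SIZE = 1_000;               // current batch size per this ticket

    /** Returns the configured batch size, falling back to the default when unset. */
    static int resolveBatchSize(Properties props) {
        String raw = props.getProperty(BATCH_SIZE_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_BATCH_SIZE;
        }
        int value = Integer.parseInt(raw.trim());
        if (value <= 0) {
            throw new IllegalArgumentException(BATCH_SIZE_KEY + " must be positive: " + value);
        }
        return value;
    }

    /** Hands the items to the handler in chunks of batchSize; returns the number of batches. */
    static <T> int processInBatches(List<T> items, int batchSize, Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive: " + batchSize);
        }
        int batches = 0;
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            handler.accept(items.subList(from, to)); // the last batch may be smaller
            batches++;
        }
        return batches;
    }
}
```

Whether a larger batch actually shortens the total run depends on where the 2.5 to 3 min per batch is spent (query, delete, or store I/O), so any change would need to be measured in the test environment.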


  • Depending on the change, testing scenarios could include: only fileContentStore, file and s3contentstore, only s3contentstore, s3 with deleted content store, azurecontentstore



Bruno Bossola


Andrei Forascu


Release Train


Delivery Team: Team 5