Search Services 2.0.1 killed by Solr OOM killer script during indexing (reindex)

Description

DESCRIPTION
During a reindex using Search Services 2.0.1, Solr is repeatedly killed by the Solr OOM killer script (oom_solr.sh). After each restart, indexing makes progress for a while before Solr is killed again. Indexing eventually completes after several restarts.

REPRODUCTION

  1. Bootstrap the supplied test database

  2. Disable content indexing (content is not available)

  3. Trigger the index creation

EXPECTED
Indexing completes without incident.

OBSERVED

  • Towards the end of indexing, Solr starts being killed, with approximately 120k of the 4.5 million transactions remaining.

  • About 60 of the remaining transactions are large, ranging from 1000 to 59544 nodes each.

  • YourKit snapshots suggest that the problem threads are ForkJoinPool worker threads for the cascadeTracker (SolrInformationServer.cascadeUpdateV2).

  • My test memory settings were SOLR_JAVA_MEM="-Xms3g -Xmx6g". In both cases, the total heap flat-lined close to the limit for several minutes before Solr was killed.

  • The Solr logs also show tens of thousands of the following exceptions from the time indexing starts.

Environment

None

Testcase ID

None

Activity

Alex Mukha
March 10, 2021, 11:18 AM

I will close the ticket, please reopen if further work is required.

Alex Mukha
January 29, 2021, 11:25 AM

Assigned to validate whether increased memory settings are acceptable as a workaround.
It also looks like there is an issue with the design of the solution that requires that many groups.

Angel Borroy
January 22, 2021, 12:40 PM

A PR has been raised to provide a development feature that skips indexing the initial transactions of a repository:

Angel Borroy
January 22, 2021, 12:38 PM

The repository has been indexed using the following JVM settings for Solr

Deployment environment is available in

Angel Borroy
January 22, 2021, 10:07 AM

Node 16848799 is a user associated with 91,353 groups.

Every time the node is modified, all of these GROUP parent paths need to be updated.
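
To illustrate the scale of that fan-out, here is a minimal toy model in Python. It is not the Search Services implementation: the function name count_parent_paths, the parents_of mapping, and the GROUP_<n> names are all hypothetical, and it assumes the user is a direct member of 91,353 top-level groups. The real group hierarchy is not described in this ticket, and nested groups would multiply the number of paths further.

```python
def count_parent_paths(node, parents_of, memo=None):
    """Count the distinct root-to-node paths in a parent graph (assumed acyclic)."""
    if memo is None:
        memo = {}
    if node in memo:
        return memo[node]
    parents = parents_of.get(node, [])
    if not parents:
        # A node without parents is a root: exactly one path reaches it.
        total = 1
    else:
        # Each parent contributes every path that reaches that parent.
        total = sum(count_parent_paths(p, parents_of, memo) for p in parents)
    memo[node] = total
    return total


# Hypothetical data: one user that is a direct member of 91,353 top-level groups.
GROUP_COUNT = 91_353
parents_of = {"user": [f"GROUP_{i}" for i in range(GROUP_COUNT)]}

# Number of parent paths that would need refreshing on each modification of the user.
paths_to_refresh = count_parent_paths("user", parents_of)
print(paths_to_refresh)
```

Under these assumptions, a single change to the user node touches one parent path per group, so the work and memory needed for one update grow with the number of group memberships rather than with the size of the change itself.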

Fixed

Assignee

Mark Tunmer

Reporter

Mark Tunmer

Labels

Escalated By

None

Security Issue

None

ACT Numbers

01019913

Premier Customer

None

Code Branch

None

Build Location

None

Regression Since

None

Work Funnel End

None

Patch Attached

None

Dependent Version/s

None

Cloud or Enterprise

None

Prioritization Score

None

Delivery Team

Search

Bug Priority

Category 2

Story Points

13

Components

Sprint

None

Fix versions

Affects versions