Active shards not displayed in ACS and search queries fail with "No available shards for solr query of store workspace://SpacesStore - trying non-dynamic configuration"

Description

Summary

The customer is upgrading to ACS 6.2.2.2 and Search & Insight Engine 1.4.2.1. As part of the upgrade they are also configuring DB_ID_RANGE dynamic sharding for search and indexing. However, in their UAT environment, which has multiple ACS nodes and multiple shards (4 ACS and 4 shards), the active shards are not displayed in the ACS admin console, and all search queries fail with the error "No available shards for solr query of store workspace://SpacesStore - trying non-dynamic configuration".

Actions/Investigation So Far

1) As per the fixes made in MNT-21591, the relevant properties have been defined in the customer environment in the ACS and Search Services config files. Although the configuration looks correct, the issue persists.

In alfresco-global:

search.solrShardRegistry.dbidRangeRefreshTimeoutInSeconds=30
search.solrShardRegistry.shardInstanceTimeoutInSeconds=30

In solrcore.properties:

alfresco.nodestate.tracker.cron=0/10 * * * * ? *
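
With 4 ACS nodes and 4 shards, it can help to confirm mechanically that every node carries the same values. The following is a minimal sketch, not an Alfresco tool: the function names are hypothetical, the expected values are simply the ones quoted in this ticket, and the parser only handles plain key=value lines.

```python
# Sketch only: the expected values mirror the ones quoted in this ticket.
EXPECTED_REPO = {
    "search.solrShardRegistry.dbidRangeRefreshTimeoutInSeconds": "30",
    "search.solrShardRegistry.shardInstanceTimeoutInSeconds": "30",
}
EXPECTED_SOLR = {"alfresco.nodestate.tracker.cron": "0/10 * * * * ? *"}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def mismatches(props, expected):
    """Return {key: (expected, actual)} for each key that differs or is missing."""
    return {
        key: (want, props.get(key))
        for key, want in expected.items()
        if props.get(key) != want
    }
```

To use it, read the text of each node's properties file, pass it through parse_properties, and compare against EXPECTED_REPO (ACS nodes) or EXPECTED_SOLR (shard cores). An empty result means the node matches the values listed above.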

2) Tried PURGE and CLEAN multiple times on the ACS sharding page; however, the active shards are still not displayed and search queries still fail.

3) Added debug logging in ACS and checked the logs. ShardRegistry entries are logged and indicate that the shards are registered; however, the information is not displayed in the console.

2021-03-23 10:55:01,763 DEBUG [index.shard.ShardRegistry] [hz._hzInstance_2_MainRepository-39c1406a-e4d7-11dc-9f9d-6dbfb43ec547.event-8] **** INDEXING SHARDS SUBSCRIPTIONS MAP (RESTORED FROM PERSISTENCE STORAGE) ******* Core: shards0 **Shard #0 subscribed on Mon Mar 22 11:59:26 AEDT 2021 (timestamp = 1616374766592)*************************** Core: shards3 **Shard #3 subscribed on Mon Mar 22 11:59:26 AEDT 2021 (timestamp = 1616374766735)************************
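
To compare the registry contents across ACS nodes, the subscription entries can be extracted from such a line. This is a rough sketch that assumes the "Core: ... Shard #N subscribed on ... (timestamp = ...)" layout shown in the log line above; the function names are hypothetical and the regular expression has only been fitted to that one sample.

```python
import re
from datetime import datetime, timezone

# Fitted to the single DEBUG line quoted in this ticket; other layouts may not match.
SUBSCRIPTION = re.compile(
    r"Core:\s*(?P<core>\S+)\s*\*+\s*Shard #(?P<shard>\d+) subscribed on .*?"
    r"\(timestamp = (?P<millis>\d+)\)"
)


def parse_subscriptions(log_line):
    """Return one dict per shard subscription found in a ShardRegistry log line."""
    entries = []
    for match in SUBSCRIPTION.finditer(log_line):
        millis = int(match.group("millis"))
        entries.append({
            "core": match.group("core"),
            "shard": int(match.group("shard")),
            # The logged timestamp is in epoch milliseconds.
            "subscribed_utc": datetime.fromtimestamp(millis / 1000, tz=timezone.utc),
        })
    return entries


def missing_shards(entries, expected_count):
    """List shard numbers in 0..expected_count-1 with no subscription entry."""
    seen = {entry["shard"] for entry in entries}
    return sorted(set(range(expected_count)) - seen)
```

Calling missing_shards(parse_subscriptions(line), 4) against the line from each ACS node shows which of the 4 shards that node has no subscription for. The excerpt above only lists shards 0 and 3, but it may be truncated, so the full log line should be used.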

4) The shards themselves appear to be fine: metadata is indexed correctly and there are no errors or warnings in the Search Services layer. Search queries work when run directly against the index.

5) As part of the troubleshooting we suggested that the customer try Search Services 1.4.3 to see whether it resolves the issue. With 1.4.3, the shards are visible in the ACS admin console, but search queries still fail with the same message mentioned above (captured in the catalina logs). Each shard instance has its own ACS tracker instance, in which solr.host is set to localhost. Searches made from the node browser of an ACS tracker instance work, but searches made from the main ACS/Alfresco Share fail.

6) When ACS and Search & Insight Engine are on the same server/host, the issue is not seen. It only occurs when ACS and Search & Insight Engine are on different hosts. The customer does not have any firewall rules between the hosts, and all the ACS and Search Services instances are in the same subnet. We also checked access between the servers and it appears to be fine.

7) I tried to reproduce the issue locally using a Docker project that spins up multiple ACS instances and multiple shards with the same versions the customer uses (ACS 6.2.2.2 with Search Services 1.4.2.1 and 1.4.3), but could not reproduce it. However, once ACS starts up, the shard information is not displayed straight away; I had to do a PURGE, after which the information was displayed and searches worked. Is there any difference between Docker deployments and distribution zip deployments in terms of sharding?
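
Since the issue only shows up when ACS and the shards are on different hosts (point 6), a repeatable reachability check run from each main ACS node may be useful alongside the manual access checks. This is a minimal sketch using plain TCP connects. The host names in the usage note are placeholders and port 8983 is assumed as the Solr port; neither is taken from the customer configuration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_shards(endpoints, timeout=3.0):
    """Map 'host:port' to a reachability flag for each (host, port) pair."""
    return {
        f"{host}:{port}": can_connect(host, port, timeout)
        for host, port in endpoints
    }
```

For example, check_shards([("shard0.example.internal", 8983), ("shard1.example.internal", 8983)]) would be run on each main ACS node, using the host and port each shard is actually registered with. A TCP connect only confirms that the port is reachable; it does not confirm that the Solr endpoint answers queries.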

Questions

  • This issue appears to be the same as, or similar to, the one already reported in https://alfresco.atlassian.net/browse/MNT-21591, which is fixed in ACS 6.2.1.1. Could this be a regression, or have all those changes already been ported to the latest versions of ACS?

  • Is there any additional logging or debugging that can be done to find the root cause of the issue?

It would be great to get some advice from Engineering to determine the solution for this issue.

NOTE: All the config files and logs are attached to the JIRA.

Environment

ACS 6.2.2.2
Search & Insight Engine 1.4.2.1
Database - MS SQL Server

Testcase ID

None

Activity

Karthick Mani
2 days ago

Thank you so much for all your efforts in delivering this hot fix. Much appreciated.

Martin Stanford
2 days ago

Hot fixes are cumulative, so each 6.2.2.x HF includes all fixes already released for 6.2.2.x.

Karthick Mani
3 days ago

Thanks for the update. Just to confirm: the customer is on 6.2.2.2, so when we get the hot fix for 6.2.2.x, I presume the fix in 6.2.2.2 will be included in the release as well? (6.2.2.2 is itself a hot fix version.)

Karthick Mani
5 days ago

Thought so, Thanks for the confirmation.

Davide Cerbo
5 days ago

Sorry, I wasn't clear: I tried it locally with an image built locally, not a release. The fix will be in the new hotfix version. Thanks.

Fixed

Assignee

Keerat Lalia

Reporter

Karthick Mani

Labels

Security Issue

None

Escalated By

None

Hot Fix Version

ACT Numbers

00372166

Build Location

None

Regression Since

None

Premier Customer

Yes

Work Funnel End

None

Patch Attached

None

Dependent Version/s

None

Prioritization Score

None

Delivery Team

None

Bug Priority

Category 1

Story Points

5

Sprint

None

Fix versions