Active shards not displayed in ACS and search queries fail with "No available shards for solr query of store workspace://SpacesStore - trying non-dynamic configuration"
The customer is upgrading to ACS 126.96.36.199 and Search & Insight Engine 188.8.131.52. As part of the upgrade they are also configuring DB_ID_RANGE dynamic sharding for search and indexing. However, in their UAT environment, which has multiple ACS nodes and multiple shards (4 ACS nodes and 4 shards), the active shards are not displayed in the ACS admin console, and all search queries fail with the error "No available shards for solr query of store workspace://SpacesStore - trying non-dynamic configuration".
Actions/Investigation So Far
1) As per the fixes made in MNT-21591, the relevant properties have been defined in the customer environment in the ACS and Search Services config files. Although the configuration looks correct, the issue still persists:
search.solrShardRegistry.shardInstanceTimeoutInSeconds=30 (in alfresco-global)
alfresco.nodestate.tracker.cron=0/10 * * * * ? * (in solrcore.properties)
2) Tried PURGE and CLEAN multiple times on the ACS Sharding page; however, the active shards are still not displayed and search queries still fail.
3) Added debug logging in ACS and checked the logs. The ShardRegistry entries are captured and show that the shards are registered, but the information is not displayed in the console:
2021-03-23 10:55:01,763 DEBUG [index.shard.ShardRegistry] [hz._hzInstance_2_MainRepository-39c1406a-e4d7-11dc-9f9d-6dbfb43ec547.event-8] **** INDEXING SHARDS SUBSCRIPTIONS MAP (RESTORED FROM PERSISTENCE STORAGE) ******* Core: shards0 **Shard #0 subscribed on Mon Mar 22 11:59:26 AEDT 2021 (timestamp = 1616374766592)*************************** Core: shards3 **Shard #3 subscribed on Mon Mar 22 11:59:26 AEDT 2021 (timestamp = 1616374766735)************************
4) The shards themselves appear to be fine: metadata and content are indexed, and there are no errors or warnings in the Search Services layer. Search queries work when run directly against the index.
5) As part of the troubleshooting we suggested that the customer try Search Services 1.4.3 to see whether it resolves the issue. With 1.4.3 the shards are visible in the ACS admin console, but search queries still fail with the same message (captured in the catalina logs). Each shard instance has its own ACS tracker instance, in which solr.host is set to localhost. Searches made from the node browser of an ACS tracker instance work fine, but searches made in the main ACS/Alfresco Share fail.
6) When ACS and Search & Insight Engine are on the same server/host, the issue is not seen. It only occurs when ACS and Search & Insight Engine are on different hosts. The customer does not have any firewall rules between the hosts, and all the ACS and Search Services instances are in the same subnet. We also checked access between the servers and it appears to be fine.
7) I tried to reproduce the issue locally using a Docker project that spins up multiple ACS instances and multiple shards with the same versions used by the customer (ACS 184.108.40.206 & Search Services 220.127.116.11, Search Services 1.4.3), but could not reproduce it. However, once ACS starts up, the shard information is not displayed straight away. I had to do a PURGE, after which the information was displayed and searches worked fine. Is there any difference between Docker deployments and distribution zip deployments in terms of sharding?
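To make the access check in step 6 repeatable from each ACS node, a small script along the following lines could be used. This is only a sketch: the helper names `can_connect` and `unreachable` are made up for illustration, and the hosts and ports to test are assumptions that must be taken from the customer's own configuration (in particular the host and port each shard registers with).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def unreachable(endpoints, timeout=3.0):
    """Return the (host, port) pairs from endpoints that cannot be reached."""
    return [(h, p) for h, p in endpoints if not can_connect(h, p, timeout)]
```

For example, running `unreachable([("solr-shard-0.example", 8983), ("solr-shard-1.example", 8983)])` on an ACS node (hypothetical host names; 8983 is assumed here as the Search Services port) returns the endpoints that node cannot open a TCP connection to. An empty list only confirms TCP reachability, not that the Solr cores respond correctly.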
This issue appears to be the same as, or similar to, the one already reported in https://alfresco.atlassian.net/browse/MNT-21591, which is fixed in ACS 18.104.22.168. Could this be a regression, or have all the changes already been ported to the latest versions of ACS?
Is there any additional logging or debugging that can be done to find the root cause of the issue?
It would be great to get some advice from Engineering to determine the solution for this issue.
NOTE: All the config files and logs are attached to the JIRA.
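As one debugging aid, the ShardRegistry debug line from step 3 can be parsed to list which shard numbers ACS believes are subscribed. The sketch below assumes the message format matches the sample captured above; the function names and the expected shard count passed in are illustrative, not part of any Alfresco tooling.

```python
import re

# Matches one "Core: <name> **Shard #<n> subscribed on ... (timestamp = <ms>)" entry.
SUBSCRIPTION = re.compile(
    r"Core:\s*(\S+)\s*\*+\s*Shard #(\d+) subscribed on .*?\(timestamp = (\d+)\)"
)


def parse_subscriptions(log_line):
    """Return {shard_number: (core_name, timestamp_ms)} found in one log line."""
    return {
        int(shard): (core, int(ts))
        for core, shard, ts in SUBSCRIPTION.findall(log_line)
    }


def missing_shards(subscriptions, expected_count):
    """Return shard numbers in range(expected_count) with no subscription."""
    return [n for n in range(expected_count) if n not in subscriptions]


sample = (
    "**** INDEXING SHARDS SUBSCRIPTIONS MAP (RESTORED FROM PERSISTENCE STORAGE) ******* "
    "Core: shards0 **Shard #0 subscribed on Mon Mar 22 11:59:26 AEDT 2021 "
    "(timestamp = 1616374766592)*************************** "
    "Core: shards3 **Shard #3 subscribed on Mon Mar 22 11:59:26 AEDT 2021 "
    "(timestamp = 1616374766735)************************"
)

subs = parse_subscriptions(sample)
print(subs)
print(missing_shards(subs, 4))
```

Run against the excerpt quoted in step 3, this lists only shards 0 and 3. That may simply be because the quoted excerpt is partial, so the full log line should be checked before drawing any conclusion.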
Search & Insight Engine 22.214.171.124
Database - MS SQL Server
Thank you so much for all your efforts in delivering this hot fix. Much appreciated.
Hot fixes are cumulative, so each 6.2.2.x HF includes all fixes already released for 6.2.2.x.
Thanks for the update. Just to confirm: the customer is on 126.96.36.199, so when we get the hot fix for 6.2.2.x, I presume the fix in 188.8.131.52 will be included in the release as well? (184.108.40.206 is a hot fix version as well.)
Thought so. Thanks for the confirmation.
Sorry, I wasn’t clear: I tried it with an image built locally, not a release. The fix will be in the new hotfix version. Thanks.