Office documents with EMF images embedded fail metadata extraction

Description

Steps to Reproduce
1. Set org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter to debug
2. Upload attached file to Share
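
Step 1 can be done with a Log4j override. This is a minimal sketch, assuming the repository uses Log4j 1.x properties-style configuration; the file name and location shown in the comment are assumptions and will vary by installation.

```properties
# Assumed location: a custom *-log4j.properties file on the extension classpath,
# e.g. shared/classes/alfresco/extension/custom-log4j.properties (hypothetical file name).
# Enables debug output for the metadata extracter named in step 1.
log4j.logger.org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter=debug
```

A repository restart is typically needed for the override to take effect. With debug enabled, extraction failures that are otherwise swallowed should be written to the repository log.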

Expected Results
Metadata is successfully extracted.

Observed Results
Metadata extraction fails with the following exception:

Notes

  • The customer has found that the issue occurs because our patched tika-parsers-1.21-20190624-alfresco-patched.jar has not kept up with Apache's newer poi-scratchpad-4.1.1.jar

  • Also reproduced with ATS AIO 2.3.7

Environment

None

Testcase ID

None

Activity

Scott Ashcraft
February 19, 2021, 6:12 AM

Note: while this also reproduces in ATS, the customer is using legacy transforms, and that is where the hotfix is needed.

Alex Strachan
6 days ago

On request from , I've quickly re-tested this locally with 6.2.2.9 (with ATS disabled and legacy transforms turned on).

As far as I can tell, it all looks good - the output to follow shows the metadata extraction happening, followed by the thumbnail creation and the to-text transform for SOLR.

Duplicate

Assignee

Alexandru Epure

Reporter

Scott Ashcraft

Labels

None

ACT Numbers

00359716

Security Issue

None

Patch Attached

None

Premier Customer

None

Prioritization Score

None

Delivery Team

Customer Excellence

Hot Fix Version

Build Location

None

Bug Priority

Category 2

Work Funnel End

None

Escalated By

None

Dependent Version/s

None

Regression Since

None

Affects versions