Search Services - Error PDF Type1 Font

Description

Certain PDF files fail to index with error:

2021-02-10 22:16:57,425 ERROR [org.apache.pdfbox.pdmodel.font.PDType1Font] [pool-4-thread-2] Can't read the embedded Type1 font FDFBJU+NewsGothic
java.io.IOException: Expected INTEGER or REAL but got NAME
at org.apache.fontbox.type1.Type1Parser.arrayToNumbers(Type1Parser.java:256)
at org.apache.fontbox.type1.Type1Parser.readSimpleValue(Type1Parser.java:168)
at org.apache.fontbox.type1.Type1Parser.parseASCII(Type1Parser.java:139)
at org.apache.fontbox.type1.Type1Parser.parse(Type1Parser.java:61)

Steps to reproduce:

1. Install and configure ACS 6.2.2 and Search Services 2.0.1
2. Upload the sample PDF document to Share

Observed Behaviour:

The file is indexed, but only the filename is searchable; the content inside the file is not searchable.
The following error appears in the log:

2021-02-10 22:16:57,425 ERROR [org.apache.pdfbox.pdmodel.font.PDType1Font] [pool-4-thread-2] Can't read the embedded Type1 font FDFBJU+NewsGothic
java.io.IOException: Expected INTEGER or REAL but got NAME
at org.apache.fontbox.type1.Type1Parser.arrayToNumbers(Type1Parser.java:256)
at org.apache.fontbox.type1.Type1Parser.readSimpleValue(Type1Parser.java:168)
at org.apache.fontbox.type1.Type1Parser.parseASCII(Type1Parser.java:139)
at org.apache.fontbox.type1.Type1Parser.parse(Type1Parser.java:61)
at org.apache.fontbox.type1.Type1Font.createWithSegments(Type1Font.java:85)
at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:262)
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:76)
at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146)
at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:66)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:875)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:509)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:483)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:156)
at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:153)
at org.apache.tika.parser.pdf.AbstractPDF2XHTML.processPages(AbstractPDF2XHTML.java:835)
at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:124)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:172)
at org.alfresco.repo.content.metadata.TikaPoweredMetadataExtracter.extractRaw(TikaPoweredMetadataExtracter.java:399)
at org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter$ExtractRawCallable.call(AbstractMappingMetadataExtracter.java:2005)
at org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter$ExtractRawCallable.call(AbstractMappingMetadataExtracter.java:1)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
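
To see which embedded fonts are affected across a larger log, entries like the one above can be picked out programmatically. The following is a minimal sketch that assumes the message text is exactly as quoted in this report. The class `Type1FontErrorScanner` and its method `findUnreadableFonts` are hypothetical helper names and are not part of ACS, Search Services, PDFBox, or Tika.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical triage helper (assumption): not part of ACS, Search Services, PDFBox or Tika.
class Type1FontErrorScanner {

    // Matches the PDFBox message quoted in this report and captures the font name.
    private static final Pattern FONT_ERROR =
            Pattern.compile("Can't read the embedded Type1 font (\\S+)");

    // Returns the font names found in matching log lines, in order of appearance.
    static List<String> findUnreadableFonts(List<String> logLines) {
        List<String> fonts = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = FONT_ERROR.matcher(line);
            if (m.find()) {
                fonts.add(m.group(1));
            }
        }
        return fonts;
    }

    public static void main(String[] args) {
        // Sample lines copied from the log excerpt in this report.
        List<String> sample = List.of(
                "2021-02-10 22:16:57,425 ERROR [org.apache.pdfbox.pdmodel.font.PDType1Font] "
                        + "[pool-4-thread-2] Can't read the embedded Type1 font FDFBJU+NewsGothic",
                "java.io.IOException: Expected INTEGER or REAL but got NAME");
        System.out.println(findUnreadableFonts(sample));
    }
}
```

The sketch only identifies which font names are reported; it does not diagnose why the Type1 parser rejects them.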

Expected Behaviour:

The content is indexed and searchable without any error.

Environment

None

Testcase ID

None

Activity

Shilpa Tupe
April 9, 2021, 6:10 AM

Thank you. It is a duplicate of MNT-22194.

Flagged
Duplicate

Assignee

Unassigned

Reporter

Shilpa Tupe

Labels

ACT Numbers

00335598

Delivery Team

Team 6

Bug Priority

Category 2