Transformation of PDF created with ilovepdf continues indefinitely
The attached PDF continues to be transformed until the hard disk runs out of space
Steps to reproduce:
Expected result: The transformation completes successfully.
Actual result: The transformation does not complete and the hard disk fills up.
I tested on Windows with the new T-Engine. The customer used Linux with the Legacy transformers.
Note (astrachan) - attachments (thread dumps and test PDF) are located in ftp.alfresco.com/support/Jira_Related/MNT-22082 and removed from this ticket.
Confirmed - Fixed.
Tested with docker image built from most recent transform-core master.
Command used: docker run -p 8090:8090 -e PDFBOX_NOTEXTRACTBOOKMARKS_DEFAULT='true' <AIO docker container id>
Note: Bug still reproduces if above flag is not set when the app/docker is deployed.
Exposes a new variable to the Tika and AIO T-engines to control the default behaviour of the notExtractBookmarksText request parameter, similar to the previous repo workaround.
This variable can be set in 1 of 2 ways:
Through the application-default.yaml file of the T-engine. Update/add the following variable:
Through an environment variable (this can be passed through to Helm or docker-compose):
docker-compose example snippet ("" quote marks are required here):
The default value for this variable is false, so that previous functionality is maintained: if notExtractBookmarksText is not passed, the transformation will, as it always has, attempt to extract the bookmarks text.
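The fallback described above can be sketched as follows. This is only an illustration in Python, not the T-engine's actual (Java) code: the function name, the dictionary-based request parameters, and the string parsing are assumptions made for the example. Only the variable name PDFBOX_NOTEXTRACTBOOKMARKS_DEFAULT, the notExtractBookmarksText parameter, and the false default come from this ticket.

```python
import os

# Environment variable named in this ticket.
ENV_VAR = "PDFBOX_NOTEXTRACTBOOKMARKS_DEFAULT"


def _parse_bool(value):
    """Treat 'true' (any case, surrounding spaces ignored) as True."""
    return value.strip().lower() == "true"


def resolve_not_extract_bookmarks_text(request_params, environ=None):
    """Hypothetical helper: pick the effective notExtractBookmarksText value.

    An explicit request parameter wins. If it is absent, fall back to the
    configured default, which is itself false when the variable is unset.
    """
    environ = os.environ if environ is None else environ
    explicit = request_params.get("notExtractBookmarksText")
    if explicit is not None:
        return _parse_bool(explicit)
    return _parse_bool(environ.get(ENV_VAR, "false"))
```

With the variable unset and no request parameter, the helper returns False, so bookmarks text is still extracted as before. Setting the variable to 'true' changes the outcome only for requests that omit the parameter.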
Can you suggest potential options and, if my help is needed, specify how I can help?
ftp.alfresco.com seems to have been broken for the last couple of months. Issue is now with but I don't know the current status.
It looks like the content.transformer.PdfBox.extractBookmarksText=false property has been removed in ACS 7.0.0, and I can confirm that there is currently no way to set extractBookmarksText to false by default. I’m currently looking into updating the Tika T-engine to accept such a parameter, which will also update the AIO engine.