Uploading an MSG document creates an unnecessary duplicate version2Store bin file that uses disk space

Description

When an email (MSG file) with an attachment is uploaded, Alfresco normally creates one bin file for each workspace node: the message body node, the attachment node and the rendition node.
Usually these are all workspace nodes, but we found an additional bin file in alf_data/contentstore with the same size as the attachment. For example, the contentstore listing below shows two bin files with the same size of 178176 bytes:
    rw-r---- 1 alfresco alfresco 327 Jan 11 15:15 66c7f741-a91f-4dfd-9d7d-d72606741837.bin
    rw-r---- 1 alfresco alfresco 64211 Jan 11 15:15 993e1952-18f3-477b-baf6-4f8eb914b0c7.bin
    rw-r---- 1 alfresco alfresco 178176 Jan 11 15:15 e0743265-aaa8-4a5c-8f7f-43f6a97c7399.bin
    rw-r---- 1 alfresco alfresco 178176 Jan 11 15:15 f17593f1-64e9-460b-8497-5a47ef096988.bin
    rw-r---- 1 alfresco alfresco 10089 Jan 11 15:15 f376638b-7c9f-4fc0-b67a-ee1ce2e9318d.bin
    rw-r---- 1 alfresco alfresco 169598 Jan 11 15:15 faa28850-6886-46fc-b2e0-d848852e2f3b.bin

Looking into this, we found that e0743265-aaa8-4a5c-8f7f-43f6a97c7399.bin, one of the two 178176-byte files, belongs to a node in the version store.
We used this query to find the node that references that bin file:

    SELECT s.identifier, n.uuid, cu.content_url, cu.content_size, *
    FROM alf_node AS n
    JOIN alf_node_properties AS np ON np.node_id = n.id
    JOIN alf_content_data AS cd ON cd.id = np.long_value
    JOIN alf_content_url AS cu ON cu.id = cd.content_url_id
    JOIN alf_store AS s ON s.id = n.store_id
    WHERE cu.content_url LIKE '%e0743265-aaa8-4a5c-8f7f-43f6a97c7399.bin%';

The customer has reported this as an issue because the extra bin file uses unnecessary disk space.

We believe this is a bug because, when the Outlook Integration plug-in is used to save the email to Alfresco, this version2Store node does not appear in the content store.
Both the normal Share upload and the upload via Outlook Integration (OI) use the same transformers, because we enabled the following option in Share > Outlook Integration settings:
Automatically convert emails (EML, MSG) uploaded using Share, CIFS, WebDAV, FTP, NFS: Enabled

Replication steps:

1) Install ACS 6.2.2 with OI 2.7.0.

2) In Share > Admin Tools > Outlook > Integration Settings, enable the setting above so that MSG files uploaded to Share are automatically converted.

3) Upload the attached MSG file using the normal Upload button in Share, then open it in Share to view it (see image1).

4) Go to alf_data > contentstore, navigate through the time-based folder tree and find all bin files created by this upload, as in the listing in the Description section.
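Step 4 can be partly automated. The script below is a minimal sketch, assuming Python 3 is available on the server and that the content store lives under alf_data/contentstore (adjust the path to your installation). It walks the store, groups bin files by size, and hashes only the same-size files with SHA-256, so it reports groups whose contents are byte-identical. The report above only establishes that the two files have the same size; hashing checks whether their contents also match. The helper names find_duplicate_bins and sha256_of are hypothetical, not part of Alfresco.

```python
import hashlib
import os
import sys
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_bins(root):
    """Return sorted lists of .bin paths under root whose contents are identical.

    Files are grouped by size first (cheap); only same-size files are hashed.
    """
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".bin"):
                path = os.path.join(dirpath, name)
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for _size, paths in sorted(by_size.items()):
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        for same in by_hash.values():
            if len(same) > 1:
                groups.append(sorted(same))
    return groups


if __name__ == "__main__":
    # Assumed default location; pass the real contentstore directory as an argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "alf_data/contentstore"
    for group in find_duplicate_bins(root):
        print(os.path.getsize(group[0]), *group)
```

Each output line shows the shared size followed by the paths of the identical files. On a large store it is cheaper to point the script at the time-tree folder of the upload rather than at the whole store, because every same-size file gets hashed.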

Observed behavior:

Two bin files with the same size are created, both using disk space.

Expected behavior:

No duplicate bin files should be created.

Upload via Outlook Integration works as expected, and the preview of the uploaded file looks exactly like image1.

Investigation notes:

Running the database query above for each of the two bin files with the same size, we found that a node from version2Store references this bin file: e0743265-aaa8-4a5c-8f7f-43f6a97c7399.bin

We found the node in the Node Browser using its UUID.

Environment

ACS 6.2.2
Outlook integration 2.7.0

Testcase ID

None

Activity

Scott Ashcraft
January 13, 2021, 2:37 PM

It is odd that a version is being created but no version is listed anywhere in Share. I believe this is being done by the Outlook Integration.

Assignee

Unassigned

Reporter

Shima Matoorian

Labels

None

Escalated By

None

Security Issue

None

ACT Numbers

01019941

Premier Customer

None

Code Branch

None

Build Location

None

Regression Since

None

Work Funnel End

None

Patch Attached

None

Dependent Version/s

None

Cloud or Enterprise

None

Prioritization Score

None

Delivery Team

None

Bug Priority

Category 2

Components

Fix versions

Affects versions