Preferred GoldFynch File Production Format : GoldFynch Support

When receiving data from other parties, there is sometimes an opportunity to specify or negotiate the production format. In these instances, we recommend requesting data in as near-to-native a format as possible, to preserve as much of the original file metadata and forensic information as possible. This document outlines GoldFynch's ideal incoming production format (including detailed load file fields where necessary) and may be sent to other parties as a production format specification.

The following production formats are listed in decreasing order of preference for import:

1. Native load file production

2. Loose collection of native files

3. PDF load file production

4. Loose collection of document-based PDFs

5. TIFF load file production

6. Bulk PDF production

7. Loose collection of TIFF images

8. Paper

If opposing counsel will not produce documents in native format, a PDF production is the next best option, followed by a TIFF production. PDFs have much better text rendering quality. TIFF productions are typically in the highly compressed CCITT Group 4 format, which was designed to minimize file size for fax transmission. These TIFF files can be hard to read and can give low-quality OCR results.

NOTE: GoldFynch provides additional services including sourcing data directly from your clients. You can learn more about them here.

File naming

Each produced file should be assigned a single Bates number and named with that number (including any prefix).

Attachment and child files

Attachment and child files will be referred to using the parent file’s Bates name
The filename or location of the contained child file will be tracked relative to the parent file
Attachment or child files of a native file should not be produced individually or assigned their own Bates numbers

NOTE: This does not apply when the parent file is redacted; see below for details.

Redactions and non-native files

If redactions have been applied to a file in GoldFynch, the file will be in a non-native format. In this case, and in any other case where it is not possible or practical to produce a file in native format, the file should be rasterized or rendered and produced as a single PDF file with a searchable text layer.

Any of these generated, non-native PDF files should still be placed in the “NATIVES” folder and have the “NATIVE_PATH” column populated in the load file
The “TRUE_NATIVE” column in the load file should be set to “F” to indicate that the PDF is a derived representation of the true native file
In the case that a file is produced in a non-native PDF format, other files in the family should then be assigned their own Bates numbers and produced in native format. These files should then have the “PARENT_ID” column populated to track the file family hierarchy

Example: Consider an MSG email that has a single ZIP attachment. Ideally, the single MSG file would get assigned a single Bates number, e.g. FILE_0001, renamed to FILE_0001.msg, and the ZIP attachment would not be produced separately. In the case that the MSG file needs redactions, it would be rendered to a PDF file, named FILE_0001.pdf, and the ZIP attachment would then be assigned Bates FILE_0002 and have its “PARENT_ID” column set to “FILE_0001.”
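The example above can be sketched as load file rows. This is a minimal illustration, not GoldFynch's own tooling: the function name family_rows is hypothetical, and it assumes that child Bates numbers follow the parent sequentially and that files sit directly under the “NATIVES” folder.

```python
def family_rows(parent_bates, parent_ext, child_exts, redacted):
    """Return load file rows for one parent file and its attachments."""
    prefix, number = parent_bates.rsplit("_", 1)
    if not redacted:
        # Native parent: attachments stay inside it and get no Bates numbers.
        return [{
            "DOC_ID": parent_bates,
            "PARENT_ID": "",
            "NATIVE_PATH": f"NATIVES/{parent_bates}{parent_ext}",
            "TRUE_NATIVE": "T",
        }]
    # Redacted parent: rendered to PDF and flagged as a derived file.
    rows = [{
        "DOC_ID": parent_bates,
        "PARENT_ID": "",
        "NATIVE_PATH": f"NATIVES/{parent_bates}.pdf",
        "TRUE_NATIVE": "F",
        "ORIG_EXT": parent_ext,
    }]
    # Assumption: children take the next Bates numbers in sequence.
    for offset, ext in enumerate(child_exts, start=1):
        child = f"{prefix}_{int(number) + offset:0{len(number)}d}"
        rows.append({
            "DOC_ID": child,
            "PARENT_ID": parent_bates,
            "NATIVE_PATH": f"NATIVES/{child}{ext}",
            "TRUE_NATIVE": "T",
        })
    return rows


native = family_rows("FILE_0001", ".msg", [".zip"], redacted=False)
redacted = family_rows("FILE_0001", ".msg", [".zip"], redacted=True)
```

In the native case a single row is produced for the MSG file; in the redacted case the ZIP attachment gets its own row with “PARENT_ID” pointing back to the rendered PDF.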

Email

The first preference for email productions is bulk export/archive files like PST or OST for Outlook/Exchange systems, and MBOX for Gmail or other mail services.

One email archive file should be used per user or mailbox
Each archive file should be assigned and named with a single Bates identifier
In the case of such bulk email production, individual emails will not have an assigned identifier or Bates number. In these situations, emails will be tracked and identified using a combination of the container file’s identifier, and the email’s subject & message-id
Attachments will be tracked and identified by MD5 hash value or by name and parent email
The load file “CUSTODIAN” column should be populated with the mailbox owner’s name and email address
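As a rough sketch of the tracking scheme described in the list above, the following uses Python's standard mailbox module to derive an identifier for each email in an MBOX archive and an MD5 hash for each attachment. The function name email_identifiers and the tuple layout are illustrative assumptions; how GoldFynch combines these values internally is not specified here.

```python
import hashlib
import mailbox


def email_identifiers(mbox_path, container_id):
    """Return [(email_key, attachments)] for every message in an MBOX file.

    email_key   = (container Bates identifier, subject, message-id)
    attachments = [(filename, MD5 hex digest of the decoded content)]
    """
    box = mailbox.mbox(mbox_path, create=False)
    results = []
    try:
        for msg in box:
            key = (container_id, msg.get("Subject", ""), msg.get("Message-ID", ""))
            attachments = []
            for part in msg.walk():
                name = part.get_filename()
                if name is None:
                    continue  # not an attachment part
                payload = part.get_payload(decode=True) or b""
                attachments.append((name, hashlib.md5(payload).hexdigest()))
            results.append((key, attachments))
    finally:
        box.close()
    return results
```

PST and OST archives are not covered by this sketch, since the Python standard library has no reader for them.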

If bulk email production in PST or MBOX format is not possible, the next preference is for near-native individual MSG or EML (MIME) files.

Each email should be assigned and named with a single Bates identifier
The load file “CUSTODIAN” column should be populated with the mailbox owner’s name and email address
For MSG or EML files that do not have an “X-Gmail-Label” header set, the “MAILBOX_FOLDER” column in the load file should be populated with the folder location of the email within the user’s mailbox (example: “Inbox/Invoices/2018”)
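The header check in the last item can be sketched for EML files with Python's standard email module. The function name needs_mailbox_folder is hypothetical, and the header name is passed as a parameter defaulting to the one named above. MSG files are not handled here, since reading them requires a third-party parser.

```python
from email import message_from_bytes


def needs_mailbox_folder(raw_eml, label_header="X-Gmail-Label"):
    """Return True when the email has no label header, meaning the load file
    should supply the MAILBOX_FOLDER column instead."""
    msg = message_from_bytes(raw_eml)
    # Header lookup in the email module is case-insensitive.
    return msg.get(label_header) is None


plain = b"Subject: Invoice\r\n\r\nbody"
labeled = b"X-Gmail-Label: Inbox\r\nSubject: Invoice\r\n\r\nbody"
```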

Paper documents converted into electronic documents

Paper documents should be:

scanned with a resolution of at least 300 PPI
produced as document-level PDF files, with searchable text layers

PDFs generated from scanned documents should:

be placed in the “NATIVES” folder and have the “NATIVE_PATH” column in the load file set
have the “TRUE_NATIVE” column in the load file set to “F” to indicate that the PDF is a derived representation of the original paper document

Load file and additional production formatting

The production should:

consist of native files and generated PDF files
name each file according to its assigned Bates number
place all files in a folder named “NATIVES,” which may contain numbered subdirectories

The load file itself should:

be in DAT, CSV, or JSON format and be UTF-8 or UTF-16 encoded
contain a leading Byte Order Mark (BOM) indicating the proper UTF text encoding
reference the native location of files using a path relative to the top folder of the production
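A minimal sketch of a CSV load file that meets these requirements, using only the Python standard library. The function names write_load_file and detect_encoding are illustrative. The "utf-8-sig" codec writes the UTF-8 BOM automatically, and paths are rejected if they are absolute rather than relative to the top folder of the production.

```python
import codecs
import csv
from pathlib import PurePosixPath, PureWindowsPath


def write_load_file(path, rows, fieldnames):
    """Write a CSV load file as UTF-8 with a leading BOM."""
    for row in rows:
        native = row["NATIVE_PATH"]
        if PurePosixPath(native).is_absolute() or PureWindowsPath(native).is_absolute():
            raise ValueError(f"NATIVE_PATH must be relative: {native!r}")
    # "utf-8-sig" emits the BOM before the first character.
    with open(path, "w", encoding="utf-8-sig", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def detect_encoding(path):
    """Return a Python codec name based on the file's BOM, or None if absent."""
    with open(path, "rb") as fh:
        head = fh.read(3)
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the BOM to pick the byte order
    return None
```

A reader can call detect_encoding first and pass the result to open(), which is the reason a leading BOM is useful: the encoding does not have to be guessed.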

NOTE: In the case of JSON, the load file should be structured as an array, with one JSON object / key-value-map per produced file
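The JSON structure described in the note might look like the sample below, together with a small check that the top level is an array of per-file objects. The function name parse_json_load_file is hypothetical, and treating “DOC_ID” as mandatory in every entry is an assumption made for this sketch.

```python
import json


def parse_json_load_file(text):
    """Parse JSON load file text and check it is an array of per-file objects."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("load file must be a JSON array")
    for index, entry in enumerate(data):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index} is not a JSON object")
        if "DOC_ID" not in entry:  # assumption: every entry carries DOC_ID
            raise ValueError(f"entry {index} has no DOC_ID")
    return data


sample = json.dumps(
    [
        {"DOC_ID": "FILE_0001", "NATIVE_PATH": "NATIVES/FILE_0001.pdf", "TRUE_NATIVE": "F"},
        {"DOC_ID": "FILE_0002", "PARENT_ID": "FILE_0001",
         "NATIVE_PATH": "NATIVES/FILE_0002.zip", "TRUE_NATIVE": "T"},
    ],
    indent=2,
)
entries = parse_json_load_file(sample)
```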

Load file fields

Refer to the following table for load file fields, descriptions and examples:

Column Name	Description	Example
DOC_ID	The Bates number (with prefix) of the file.	FILE_0001
PARENT_ID	The Bates number of the parent file, in the case that individual files of a family are produced individually due to redactions.	FILE_0001
NATIVE_PATH	The path to the native file, or to the derived/rendered PDF file. It should be relative to the top folder of the production.	NATIVES/0001/FILE_0001.msg
TRUE_NATIVE	Indicates whether the file is truly a native file or is a derived PDF.	T
CUSTODIAN	Description of who/where the file originated.	John Doe
MAILBOX_FOLDER	Mailbox folder for individual email files	Inbox/Invoices/2018
OS	Name of the operating system where the file originated.	Windows 10
FILESYSTEM	Name of the disk filesystem where the file originated.	NTFS
FS_CREATED	Created date & time from the original filesystem.	2017-02-22T16:24:36Z
FS_MODIFIED	Modified date & time from the original filesystem.	2017-02-22T16:24:36Z
FS_ACCESSED	Accessed date & time from the original filesystem.	2017-02-22T16:24:36Z
APPLE_WHEREFROM	“com.apple.metadata:kMDItemWhereFroms” field populated from Apple Finder metadata.	https://dl-web.dropbox.com/get/file.pdf, https://www.dropbox.com/
APPLE_QUARANTINE	“com.apple.quarantine” field populated from Apple Finder metadata.	0001;55555555;Google Chrome;
ORIG_EXT	For redacted files, the extension of the original, native file.	.msg
ORIG_TYPE	For redacted files, the MIME filetype of the original, native file.	application/vnd.ms-outlook
CREATED	For redacted files, the created date from the internal metadata of the original, native file.	2017-02-22T16:24:36Z
MODIFIED	For redacted files, the modified date from the internal metadata of the original, native file.	2017-02-22T16:24:36Z
AUTHOR	For redacted files, the author from the internal metadata of the original, native file.	Jane Doe
SUBJECT	For redacted emails, the subject from the original, native file.	Fwd: Some Subject
FROM	For redacted emails, the “from” field from the original, native file.	Jane Doe <jane@doe.com>
TO	For redacted emails, the “to” field from the original, native file.	John Doe <john@doe.com>; Jane Doe <jane@doe.com>
CC	For redacted emails, the “cc” field from the original, native file.	John Doe <john@doe.com>; Jane Doe <jane@doe.com>
BCC	For redacted emails, the “bcc” field from the original, native file.	John Doe <john@doe.com>; Jane Doe <jane@doe.com>
SENT	For redacted emails, the “date” header or PidTagClientSubmitTime from the original, native file.	2017-02-22T16:24:36Z
RECEIVED	For redacted emails, the latest “received-by” date or PidTagMessageDeliveryTime from the original, native file.	2017-02-22T16:24:36Z
MESSAGE-ID	For redacted emails, the “message-id” header or PidTagInternetMessageId from the original, native file.	<email1@sample.com>
REFERENCES	For redacted emails, the “references” header or PidTagInternetReferences from the original, native file.	<email1@sample.com>, <email2@sample.com>
HEADERS	For redacted emails, the entire header section or PidTagTransportMessageHeaders from the original, native file.	Received-By: … From: … …
MSG_CLASS	For redacted MSG files, the PidTagMessageClass from the original, native file.	IPM.Note
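A producing party may want to sanity-check rows against the table before delivery. The sketch below is one possible check, not a GoldFynch validator: the function name row_problems is hypothetical, and it assumes that “TRUE_NATIVE” takes only “T” or “F” and that date fields use the UTC timestamp form shown in the examples.

```python
from datetime import datetime

# Date and time columns from the table above.
TIMESTAMP_FIELDS = (
    "FS_CREATED", "FS_MODIFIED", "FS_ACCESSED",
    "CREATED", "MODIFIED", "SENT", "RECEIVED",
)


def row_problems(row):
    """Return a list of human-readable problems found in one load file row."""
    problems = []
    if not row.get("DOC_ID"):
        problems.append("DOC_ID is empty")
    if not row.get("NATIVE_PATH"):
        problems.append("NATIVE_PATH is empty")
    # Assumption: only "T" and "F" are valid, as in the examples.
    if row.get("TRUE_NATIVE") not in ("T", "F"):
        problems.append("TRUE_NATIVE must be T or F")
    for field in TIMESTAMP_FIELDS:
        value = row.get(field)
        if not value:
            continue  # optional field left blank
        try:
            datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        except ValueError:
            problems.append(f"{field} is not in the example timestamp form: {value!r}")
    return problems


good = {"DOC_ID": "FILE_0001", "NATIVE_PATH": "NATIVES/0001/FILE_0001.msg",
        "TRUE_NATIVE": "T", "FS_CREATED": "2017-02-22T16:24:36Z"}
bad = {"DOC_ID": "FILE_0002", "NATIVE_PATH": "NATIVES/0001/FILE_0002.pdf",
       "TRUE_NATIVE": "yes", "SENT": "02/22/2017"}
```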
