When receiving data from other parties, there is sometimes an opportunity to specify or negotiate the production format. In these instances, we recommend requesting data in as near-to-native a format as possible, to preserve as much of the original file metadata and forensic information as possible. This document outlines GoldFynch's ideal incoming production format (including detailed load file fields where necessary) and may be sent to other parties as a production format specification.

The production formats you may receive for import are listed below in decreasing order of preference:


1. Native load file production

2. Loose collection of native files

3. PDF load file production

4. Loose collection of document-based PDFs

5. TIFF load file production

6. Bulk PDF production

7. Loose collection of TIFF images

8. Paper


If opposing counsel will not produce documents in native format, a PDF production is the next best option, followed by a TIFF production. PDFs render text at much better quality than TIFFs. TIFF productions are typically in the highly compressed Group 4 format, which was designed to minimize file size for fax machines. These TIFF files can be hard to read and can give low-quality OCR results.


NOTE: GoldFynch provides additional services including sourcing data directly from your clients. You can learn more about them here.


File naming

Each produced file should be assigned a single Bates number and named with that number (including any prefix).

Attachment and child files

  • Attachment and child files will be referred to using the parent file’s Bates name
  • The filename or location of the contained child file will be tracked relative to the parent file
  • Attachments or child files of a native file should not be produced individually, nor assigned their own Bates numbers

NOTE: This does not apply in cases of redaction; see below for more on this.

 

Redactions and non-native files

If a file has been redacted in GoldFynch, it will be in a non-native format. In this case, and in other cases where it is not possible or practical to produce a file in native format, the file should be rasterized or rendered and produced as a single PDF file with a searchable text layer.

  • Any of these generated, non-native PDF files should still be placed in the “NATIVES” folder and have the “NATIVE_PATH” column populated in the load file
  • The “TRUE_NATIVE” column in the load file should be set to “F” to indicate that the PDF is a derived representation of the true native file
  • If a file is produced in a non-native PDF format, the other files in its family should be assigned their own Bates numbers and produced in native format. These files should then have the “PARENT_ID” column populated to track the file family hierarchy


Example: Consider an MSG email that has a single ZIP attachment. Ideally, the MSG file would be assigned a single Bates number, e.g. FILE_0001, and renamed to FILE_0001.msg, and the ZIP attachment would not be produced separately. If the MSG file needs redactions, it would instead be rendered to a PDF file named FILE_0001.pdf, and the ZIP attachment would then be assigned Bates number FILE_0002 and have its “PARENT_ID” column set to “FILE_0001”.
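
The two outcomes in this example can be sketched as load file records. This is only an illustration: the column names come from the load file fields described in this document, while the function name, the “0001” subfolder, and the “.zip” filename for the attachment are assumptions made for the sketch.

```python
from typing import Dict, List


def family_records(redacted: bool) -> List[Dict[str, str]]:
    """Return hypothetical load file records for an MSG email with one ZIP attachment."""
    if not redacted:
        # Ideal case: one native MSG file; the ZIP stays inside it
        # and is not assigned its own Bates number.
        return [{
            "DOC_ID": "FILE_0001",
            "NATIVE_PATH": "NATIVES/0001/FILE_0001.msg",
            "TRUE_NATIVE": "T",
        }]
    # Redacted case: the MSG is rendered to PDF, so the ZIP attachment
    # is produced separately and linked back through PARENT_ID.
    return [
        {
            "DOC_ID": "FILE_0001",
            "NATIVE_PATH": "NATIVES/0001/FILE_0001.pdf",
            "TRUE_NATIVE": "F",
            "ORIG_EXT": ".msg",
            "ORIG_TYPE": "application/vnd.ms-outlook",
        },
        {
            "DOC_ID": "FILE_0002",
            "PARENT_ID": "FILE_0001",
            "NATIVE_PATH": "NATIVES/0001/FILE_0002.zip",
            "TRUE_NATIVE": "T",
        },
    ]
```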

 

 

Email

The first preference for email productions is bulk export/archive files like PST or OST for Outlook/Exchange systems, and MBOX for Gmail or other mail services.

  • One email archive file should be used per user or mailbox

  •  Each archive file should be assigned and named with a single Bates identifier

  •  In the case of such bulk email production, individual emails will not have an assigned identifier or Bates number. In these situations, emails will be tracked and identified using a combination of the container file’s identifier, and the email’s subject & message-id

  • Attachments will be tracked and identified by MD5 hash value or by name and parent email

  • The load file “CUSTODIAN” column should be populated with the mailbox owner’s name and email address
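
As a rough sketch of the tracking scheme described in the list above, an email inside a bulk archive could be identified by a composite key, and an attachment by its MD5 hash. The function names and the example Bates identifier in the usage note are hypothetical; how GoldFynch builds these identifiers internally is not stated here.

```python
import hashlib
from typing import Tuple


def email_key(container_id: str, subject: str, message_id: str) -> Tuple[str, str, str]:
    """Combine the archive's Bates identifier with the email's subject and message-id."""
    return (container_id, subject.strip(), message_id.strip())


def attachment_key(attachment_bytes: bytes) -> str:
    """Identify an attachment by the MD5 hash of its content."""
    return hashlib.md5(attachment_bytes).hexdigest()
```

For instance, `email_key("MAIL_0001", "Fwd: Some Subject", "<email1@sample.com>")` gives a tuple that stays stable even though the email itself has no Bates number of its own.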


If bulk email production in PST or MBOX format is not possible, the next preference is for near-native individual MSG or EML (MIME) files.

  •  Each email should be assigned and named with a single Bates identifier

  •  The load file “CUSTODIAN” column should be populated with the mailbox owner’s name and email address

  •  MSG files or EML files that do not have an “X-Gmail-Label” header set should populate the “MAILBOX_FOLDER” column in the load file with the folder location of the email within the user’s mailbox. (Example: “Inbox/Invoices/2018”)
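
The folder rule in the last bullet could be applied along these lines. This is a minimal sketch that assumes the raw email text is available as a string; the header name is used exactly as written above, and the function name and the `folder_in_mailbox` parameter are hypothetical.

```python
from email import message_from_string
from typing import Optional

# Header name as given in the text above.
GMAIL_LABEL_HEADER = "X-Gmail-Label"


def mailbox_folder_value(raw_email: str, folder_in_mailbox: str) -> Optional[str]:
    """Return the MAILBOX_FOLDER value, or None when the label header is present."""
    message = message_from_string(raw_email)
    # Header lookup is case-insensitive and returns None when the header is absent.
    if message[GMAIL_LABEL_HEADER] is not None:
        return None
    return folder_in_mailbox
```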


Other electronic documents and files

Other digital documents and files should be produced in native format, as found on the originating filesystem where possible, especially any files that represent: 

  • Audio
  • Video
  • CAD drawings
  • Spreadsheets
  • Documents with tracked changes

Where possible, electronic files should populate:

  • an “OS” column in the load file, indicating the operating system of the originating electronic device. (Example: “Windows 10”)
  • a “FILESYSTEM” column in the load file, indicating the disk filesystem of the originating electronic device. (Example: “NTFS”)
  • the “FS_CREATED”, “FS_MODIFIED”, and “FS_ACCESSED” columns in the load file with ISO 8601 datetime strings, indicating the various timestamps as stored on the originating filesystem
  • the “CUSTODIAN” field in the load file with a description of the owner of the originating electronic device
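
One way to produce the ISO 8601 timestamp strings mentioned above is sketched below. It assumes timestamps are read with Python's `os.stat` and reported in UTC; the function names are hypothetical. Creation time is platform-dependent, so this sketch only fills “FS_CREATED” when the platform exposes `st_birthtime`.

```python
import os
from datetime import datetime, timezone
from typing import Dict


def to_iso8601_utc(epoch_seconds: float) -> str:
    """Format a POSIX timestamp as an ISO 8601 UTC string, e.g. 2017-02-22T16:24:36Z."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def fs_timestamp_columns(path: str) -> Dict[str, str]:
    """Collect the filesystem timestamp columns for one file."""
    stat_result = os.stat(path)
    columns = {
        "FS_MODIFIED": to_iso8601_utc(stat_result.st_mtime),
        "FS_ACCESSED": to_iso8601_utc(stat_result.st_atime),
    }
    # st_birthtime is not available on every platform; when it is missing,
    # FS_CREATED is left out rather than guessed from another field.
    birth_time = getattr(stat_result, "st_birthtime", None)
    if birth_time is not None:
        columns["FS_CREATED"] = to_iso8601_utc(birth_time)
    return columns
```

These values should be captured on the originating filesystem, since copying a file usually resets some of its timestamps.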


Additionally, when available, files originating from Apple operating systems should populate:

  • the “APPLE_WHEREFROM” load file column with the file’s “com.apple.metadata:kMDItemWhereFroms” attribute, in a semicolon-delimited list
  • the “APPLE_QUARANTINE” load file column with the file’s “com.apple.quarantine” attribute, in a semicolon-delimited list
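
The “kMDItemWhereFroms” attribute is stored as a property list containing a list of strings, so it needs decoding before it can be written as a semicolon-delimited value. The sketch below assumes the raw attribute bytes have already been read from the file (how they are read is platform-specific and not shown), and the function names are hypothetical.

```python
import plistlib


def where_froms_column(raw_attribute: bytes) -> str:
    """Decode the kMDItemWhereFroms property list and join its entries with semicolons."""
    entries = plistlib.loads(raw_attribute)
    return ";".join(str(entry) for entry in entries)


def quarantine_column(raw_attribute: bytes) -> str:
    """The quarantine attribute is already semicolon-delimited text; just decode it."""
    return raw_attribute.decode("utf-8").strip()
```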


Paper documents converted into electronic documents

Paper documents should be:

  • scanned with a resolution of at least 300 PPI
  • produced as document-level PDF files, with searchable text layers

PDFs generated from scanned documents should:

  • be placed in the “NATIVES” folder and have the “NATIVE_PATH” column in the load file set
  • have the “TRUE_NATIVE” column in the load file set to “F” to indicate that the PDF is a derived representation of the original paper document


Load file and additional production formatting

The production should:

  1. consist of native files and generated PDF files
  2. name each file according to its assigned Bates number
  3. place all files in a folder named “NATIVES”, which may contain numbered subdirectories

The load file itself should:

  • be in DAT, CSV, or JSON format and be UTF-8 or UTF-16 encoded
  • contain a leading Byte Order Mark (BOM) indicating the proper UTF text encoding
  • reference the native location of files using a path relative to the top folder of the production

NOTE: In the case of JSON, the load file should be structured as an array, with one JSON object / key-value-map per produced file
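
A load file meeting these requirements could be written as sketched below. This assumes Python's standard `json` and `csv` modules and UTF-8 output; the function names are hypothetical. The “utf-8-sig” encoding makes Python write the leading Byte Order Mark automatically.

```python
import csv
import json
from typing import Dict, List


def write_json_load_file(path: str, records: List[Dict[str, str]]) -> None:
    """Write a JSON load file: one array, one object per produced file, with a UTF-8 BOM."""
    with open(path, "w", encoding="utf-8-sig") as handle:
        json.dump(records, handle, ensure_ascii=False, indent=2)


def write_csv_load_file(path: str, records: List[Dict[str, str]], columns: List[str]) -> None:
    """Write a CSV load file with a header row and a UTF-8 BOM."""
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        # restval="" leaves a column empty when a record has no value for it.
        writer = csv.DictWriter(handle, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(records)
```

In both cases “NATIVE_PATH” values should be written relative to the top folder of the production, for example “NATIVES/0001/FILE_0001.msg”.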


Load file fields

Refer to the following table for load file fields, descriptions and examples:


| Column Name | Description | Example |
| --- | --- | --- |
| DOC_ID | The Bates number (with prefix) of the file. | FILE_0001 |
| PARENT_ID | The Bates number of the parent file, in the case that files of a family are produced individually due to redactions. | FILE_0001 |
| NATIVE_PATH | The path to the native file, or to the derived/rendered PDF file. It should be relative to the top folder of the production. | NATIVES/0001/FILE_0001.msg |
| TRUE_NATIVE | Indicates whether the file is truly a native file (T) or is a derived PDF (F). | T |
| CUSTODIAN | Description of who or where the file originated from. | John Doe |
| MAILBOX_FOLDER | Mailbox folder for individual email files. | Inbox/Invoices/2018 |
| OS | Name of the operating system where the file originated. | Windows 10 |
| FILESYSTEM | Name of the disk filesystem where the file originated. | NTFS |
| FS_CREATED | Created date & time from the original filesystem. | 2017-02-22T16:24:36Z |
| FS_MODIFIED | Modified date & time from the original filesystem. | 2017-02-22T16:24:36Z |
| FS_ACCESSED | Accessed date & time from the original filesystem. | 2017-02-22T16:24:36Z |
| APPLE_WHEREFROM | “com.apple.metadata:kMDItemWhereFroms” field populated from Apple Finder metadata. | https://dl-web.dropbox.com/get/file.pdf; https://www.dropbox.com/ |
| APPLE_QUARANTINE | “com.apple.quarantine” field populated from Apple Finder metadata. | 0001;55555555;Google Chrome; |
| ORIG_EXT | For redacted files, the extension of the original, native file. | .msg |
| ORIG_TYPE | For redacted files, the MIME filetype of the original, native file. | application/vnd.ms-outlook |
| CREATED | For redacted files, the internal created date metadata from the original, native file. | 2017-02-22T16:24:36Z |
| MODIFIED | For redacted files, the internal modified date metadata from the original, native file. | 2017-02-22T16:24:36Z |
| AUTHOR | For redacted files, the internal author metadata from the original, native file. | Jane Doe |
| SUBJECT | For redacted emails, the subject from the original, native file. | Fwd: Some Subject |
| FROM | For redacted emails, the “from” field from the original, native file. | Jane Doe <jane@doe.com> |
| TO | For redacted emails, the “to” field from the original, native file. | John Doe <john@doe.com>; Jane Doe <jane@doe.com> |
| CC | For redacted emails, the “cc” field from the original, native file. | John Doe <john@doe.com>; Jane Doe <jane@doe.com> |
| BCC | For redacted emails, the “bcc” field from the original, native file. | John Doe <john@doe.com>; Jane Doe <jane@doe.com> |
| SENT | For redacted emails, the “date” header or PidTagClientSubmitTime from the original, native file. | 2017-02-22T16:24:36Z |
| RECEIVED | For redacted emails, the latest “received-by” date or PidTagMessageDeliveryTime from the original, native file. | 2017-02-22T16:24:36Z |
| MESSAGE-ID | For redacted emails, the “message-id” header or PidTagInternetMessageId from the original, native file. | <email1@sample.com> |
| REFERENCES | For redacted emails, the “references” header or PidTagInternetReferences from the original, native file. | <email1@sample.com>, <email2@sample.com> |
| HEADERS | For redacted emails, the entire header section or PidTagTransportMessageHeaders from the original, native file. | Received-By: … From: … |
| MSG_CLASS | For redacted MSG files, the PidTagMessageClass from the original, native file. | IPM.Note |
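
Before sending or importing a production, it can help to sanity-check each load file record against the fields in the table above. The checks below are a minimal sketch, not GoldFynch's own validation: the function name is hypothetical, and treating “DOC_ID”, “NATIVE_PATH”, and “TRUE_NATIVE” as required, and accepting only the timestamp form used in the examples, are assumptions made for illustration.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, List

# Matches only the timestamp form shown in the examples (UTC with a trailing Z).
ISO_8601_UTC = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")
DATE_COLUMNS = ("FS_CREATED", "FS_MODIFIED", "FS_ACCESSED",
                "CREATED", "MODIFIED", "SENT", "RECEIVED")


def check_record(record: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one load file record (empty if none)."""
    problems = []
    if not record.get("DOC_ID"):
        problems.append("DOC_ID is missing")
    native_path = record.get("NATIVE_PATH", "")
    path = PurePosixPath(native_path)
    # The path must be relative to the production's top folder and start in NATIVES.
    if not native_path or path.is_absolute() or path.parts[0] != "NATIVES":
        problems.append("NATIVE_PATH must be a relative path under NATIVES")
    if record.get("TRUE_NATIVE") not in ("T", "F"):
        problems.append("TRUE_NATIVE must be T or F")
    for column in DATE_COLUMNS:
        value = record.get(column)
        if value and not ISO_8601_UTC.match(value):
            problems.append(column + " is not in the expected ISO 8601 form")
    return problems
```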