File Store

Read objects from a local file system directory, treating it as an object store bucket.

| Field | Type | Required | Description |
|---|---|---|---|
| trigger | trigger | | How often to run the command. |
| path | path (string) | | File system path to use as the root of the store. |
| object-names | string | | Object names. When listing, these are treated as prefixes. |
| mode | Mode | | One of list-and-download-objects, list-objects, or download-objects. |
| ignore-linebreaks | boolean (bool) | | Treat each object as a single event. |
| timestamp-mode | Timestamp Mode | | Strategy for deriving a timestamp for each blob, used for filtering. |
| maximum-age | string | | Remove any blobs older than this many seconds from the candidate list. |
| fingerprinting | boolean (bool) | | Enable object fingerprinting, so that each object is downloaded only once. |
| fingerprinting-db-path | path (string) | | Custom path for the fingerprinting database. |
| maximum-fingerprint-age | duration (string) | | Remove object fingerprints older than this from the tracker. |
| preprocessors | Preprocessors | | Preprocessors that process downloaded data before it is made available to the job. They run in the order specified. |
| include-regex | string | | Include objects matching the specified regular expressions. |
| exclude-regex | string | | Exclude objects matching the specified regular expressions. |
| retry | Retry | | Timeout and retry settings. |
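
Below is a minimal Python sketch showing how `path`, `object-names`, `include-regex`, `exclude-regex`, and `maximum-age` could work together to narrow the candidate list. It does not reproduce the connector's internals. Several pieces are assumptions: the function name `select_candidates`, using paths relative to `path` as object names, matching with `re.search`, and taking an object's age from its file modification time.

```python
import os
import re
import time
from typing import Iterable, List, Optional


def select_candidates(
    root: str,
    prefixes: Iterable[str],
    include_regex: Optional[str] = None,
    exclude_regex: Optional[str] = None,
    maximum_age: Optional[float] = None,
    now: Optional[float] = None,
) -> List[str]:
    """Hypothetical helper: return object names under root that pass every filter."""
    now = time.time() if now is None else now
    inc = re.compile(include_regex) if include_regex else None
    exc = re.compile(exclude_regex) if exclude_regex else None
    prefixes = list(prefixes) or [""]  # no prefixes given: list everything
    selected = []
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            full = os.path.join(dirpath, fname)
            # Assumption: object names are root-relative paths with "/" separators.
            name = os.path.relpath(full, root).replace(os.sep, "/")
            if not any(name.startswith(p) for p in prefixes):
                continue
            if inc and not inc.search(name):
                continue
            if exc and exc.search(name):
                continue
            # Assumption: age is measured from the file's modification time, in seconds.
            if maximum_age is not None and now - os.path.getmtime(full) > maximum_age:
                continue
            selected.append(name)
    return sorted(selected)
```

The filters are applied in a fixed order here (prefix, include, exclude, age). Because each one can only remove candidates, the order has no effect on the result.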

Object Properties

| Field | Type | Required | Description |
|---|---|---|---|
| object-name-field | event-field (string) | | Field in which to store the blob name from an operation. |
| creation-time-field | event-field (string) | | Field in which to store the blob creation time. |
| last-modified-field | event-field (string) | | Field in which to store the blob last-modified time. |
| content-length-field | event-field (string) | | Field in which to store the blob content length. |
| content-type-field | event-field (string) | | Field in which to store the blob content type. |
| etag-field | event-field (string) | | Field in which to store the object ETag. |
| data-field | event-field (string) | | Field in which to nest the blob data. |
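
The following Python sketch illustrates how the object property fields could be attached to an event built from a local file. The default field names (`object_name`, `last_modified`, and so on) and the `build_event` helper are hypothetical. The sketch treats the whole object as one event, as with `ignore-linebreaks`, and guesses the content type from the file extension. It omits the creation time and the ETag because a plain file system does not report either of them portably.

```python
import mimetypes
import os
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_event(
    root: str,
    object_name: str,
    object_name_field: Optional[str] = "object_name",      # hypothetical default
    last_modified_field: Optional[str] = "last_modified",  # hypothetical default
    content_length_field: Optional[str] = "content_length",
    content_type_field: Optional[str] = "content_type",
    data_field: Optional[str] = "data",
) -> Dict[str, Any]:
    """Hypothetical helper: build one event; a field set to None is left out."""
    path = os.path.join(root, *object_name.split("/"))
    st = os.stat(path)
    event: Dict[str, Any] = {}
    if object_name_field:
        event[object_name_field] = object_name
    if last_modified_field:
        event[last_modified_field] = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat()
    if content_length_field:
        event[content_length_field] = st.st_size
    if content_type_field:
        # Guessed from the extension; the file system itself stores no content type.
        event[content_type_field] = mimetypes.guess_type(object_name)[0]
    if data_field:
        # The whole object becomes one event, as with ignore-linebreaks.
        with open(path, "r", encoding="utf-8") as f:
            event[data_field] = f.read()
    return event
```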

Retry

| Field | Type | Required | Description |
|---|---|---|---|
| count | integer | | How many times to retry: either forever or a limited number of times. |
| pause | string | | How long to pause before retrying. |
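
This minimal Python sketch shows the retry behaviour that `count` and `pause` describe. Several details are assumptions. `count=None` stands for "retry forever", `pause` is a number of seconds instead of the documented string, and any exception counts as retryable. The name `with_retry` is hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retry(
    op: Callable[[], T],
    count: Optional[int] = 3,
    pause: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op, retrying on failure.

    count=None retries forever; otherwise at most `count` retries follow the
    first attempt. `pause` seconds are slept between attempts.
    """
    attempt = 0
    while True:
        try:
            return op()
        except Exception:
            if count is not None and attempt >= count:
                raise  # retries exhausted: surface the last error
            attempt += 1
            sleep(pause)
```

The `sleep` parameter is injectable only so that the pause can be observed or skipped in tests.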

Mode

| Value | Name | Description |
|---|---|---|
| list-and-download-objects | list-and-download-objects | List objects and download them. |
| list-objects | list-objects | List objects. |
| download-objects | download-objects | Download the given objects. |
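
The three modes can be sketched in Python as shown below. The sketch also includes a simplified form of `fingerprinting`. It is only an illustration under assumptions. In `list-objects` and `list-and-download-objects` modes, `object-names` act as prefixes, while in `download-objects` mode they are exact names. An object's fingerprint is taken to be its name, size, and modification time. An in-memory set stands in for the fingerprinting database, and `maximum-fingerprint-age` pruning is not shown. All function names are hypothetical.

```python
import os
from typing import Iterator, List, Optional, Set, Tuple

Fingerprint = Tuple[str, int, int]  # hypothetical key: (name, size, mtime in ns)


def list_objects(root: str, prefixes: List[str]) -> List[str]:
    prefixes = prefixes or [""]
    names = []
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            name = os.path.relpath(os.path.join(dirpath, fname), root).replace(os.sep, "/")
            if any(name.startswith(p) for p in prefixes):
                names.append(name)
    return sorted(names)


def download(root: str, name: str) -> bytes:
    with open(os.path.join(root, *name.split("/")), "rb") as f:
        return f.read()


def run(
    mode: str,
    root: str,
    object_names: List[str],
    seen: Optional[Set[Fingerprint]] = None,
) -> Iterator[Tuple[str, Optional[bytes]]]:
    """Yield (name, data) pairs; data is None when only listing."""
    if mode == "list-objects":
        for name in list_objects(root, object_names):
            yield name, None
        return
    if mode == "list-and-download-objects":
        names = list_objects(root, object_names)
    elif mode == "download-objects":
        names = list(object_names)  # exact names, not prefixes
    else:
        raise ValueError(f"unknown mode: {mode}")
    for name in names:
        if seen is not None:  # fingerprinting enabled
            st = os.stat(os.path.join(root, *name.split("/")))
            key = (name, st.st_size, st.st_mtime_ns)
            if key in seen:
                continue  # already downloaded once
            seen.add(key)
        yield name, download(root, name)
```

When the same `seen` set is passed to a second run, unchanged objects are skipped. This matches the documented effect of fingerprinting, which is that an object is downloaded only once.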

Timestamp Mode

| Value | Name | Description |
|---|---|---|
| none | none | The default mode: do not filter based on timestamps. |
| last-modified | last-modified | Filter objects on the last-modified timestamp reported by the service. |
| blob-name-pattern | blob-name-pattern | Filter blobs on a timestamp derived from the object name, for example: `relevant-name-pattern: =(?P<Y>[\\d]{4,4})-(?P<m>[\\d]{2,2})-(?P<d>[\\d]{2,2})/` |
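
The Python sketch below shows how a `blob-name-pattern` timestamp could be derived and then checked against `maximum-age`. It reuses the documented example pattern and assumes that the doubled backslashes (`\\d`) are configuration-level escaping for `\d`. It also assumes that the named groups `Y`, `m`, `d` (and optionally `H`, `M`, `S`) follow `strftime` field names, that times are UTC, and that blobs without a match are excluded. The helper names are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import Optional, Pattern

# The documented example, with \\d written as \d in a Python raw string (assumption).
DATE_PATTERN = re.compile(r"=(?P<Y>[\d]{4,4})-(?P<m>[\d]{2,2})-(?P<d>[\d]{2,2})/")


def timestamp_from_name(name: str, pattern: Pattern[str] = DATE_PATTERN) -> Optional[datetime]:
    """Build a UTC timestamp from the named groups in the blob name, if any."""
    match = pattern.search(name)
    if match is None:
        return None
    parts = match.groupdict()
    return datetime(
        int(parts["Y"]), int(parts["m"]), int(parts["d"]),
        # Missing or non-participating time groups default to 0.
        int(parts.get("H") or 0), int(parts.get("M") or 0), int(parts.get("S") or 0),
        tzinfo=timezone.utc,
    )


def is_recent(name: str, now: datetime, maximum_age_seconds: float) -> bool:
    """Keep a blob only if its name-derived timestamp is within maximum_age_seconds."""
    ts = timestamp_from_name(name)
    return ts is not None and (now - ts).total_seconds() <= maximum_age_seconds
```

For example, `timestamp_from_name("logs/date=2024-03-05/part-0.json")` yields midnight UTC on 2024-03-05.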

Preprocessors

| Value | Name | Description |
|---|---|---|
| extension | extension | Preprocess the object or blob based on the extension of its name (.gz, .parquet). |
| gzip | gzip | Decompress gzip-compressed data. |
| parquet | parquet | Extract rows from a Parquet file as JSON. |
| base64 | base64 | Encode the binary data as base64. |
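
Preprocessors are applied in the order listed. The Python sketch below illustrates that chaining with the standard library only. The registry and function names are hypothetical. The `parquet` preprocessor and the `.parquet` branch of `extension` are left out because they would need a Parquet reader such as pyarrow.

```python
import base64
import gzip
from typing import Callable, Dict, List


def _extension(name: str, data: bytes) -> bytes:
    # Only .gz is handled in this sketch; .parquet would need a Parquet library.
    if name.endswith(".gz"):
        return gzip.decompress(data)
    return data


def _gzip(name: str, data: bytes) -> bytes:
    return gzip.decompress(data)


def _base64(name: str, data: bytes) -> bytes:
    return base64.b64encode(data)


# Hypothetical registry keyed by the documented preprocessor values.
PREPROCESSORS: Dict[str, Callable[[str, bytes], bytes]] = {
    "extension": _extension,
    "gzip": _gzip,
    "base64": _base64,
}


def preprocess(name: str, data: bytes, steps: List[str]) -> bytes:
    """Run the named preprocessors over the downloaded bytes, in order."""
    for step in steps:
        data = PREPROCESSORS[step](name, data)
    return data
```

Order matters. For example, `["gzip", "base64"]` decompresses first and then encodes the plain bytes, whereas `["base64", "gzip"]` would fail because base64 text is not gzip data.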