
S3

Stream data from an S3 object.

Field | Type | Required | Description
trigger | trigger | | How often to run the command.
mode | Mode | | The operating mode for this input.
ignore-linebreaks | boolean (bool) | | Treat the whole object as a single event instead of splitting it on line breaks.
preprocessors | Preprocessors | | Preprocessors that process the downloaded data before it is made available to the job. They run in the order they are specified.
timestamp-mode | Timestamp Mode | | The strategy used to derive a timestamp for each object, for filtering purposes.
maximum-age | string | | Remove any objects older than this many seconds from the candidate list.
fingerprinting | boolean (bool) | | Enable object fingerprinting, so that each object is downloaded only once.
maximum-fingerprint-age | duration (string) | | Remove any object fingerprints older than this from the tracker.
include-regex | string | | Include objects matching the specified regular expressions.
exclude-regex | string | | Exclude objects matching the specified regular expressions.
retry | Retry | | Timeout and retry settings.
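The candidate filtering controlled by include-regex, exclude-regex, maximum-age and fingerprinting can be pictured with the minimal Python sketch below. It is not this input's implementation: it assumes the regular expressions are matched against object names, that age is measured from the last-modified time, and that a fingerprint is the pair (object name, ETag). The names ObjectInfo and filter_candidates are hypothetical.

```python
import re
import time
from dataclasses import dataclass


@dataclass
class ObjectInfo:
    # Hypothetical listing entry; real S3 listings carry more metadata.
    name: str
    last_modified: float  # Unix timestamp, seconds
    etag: str


def filter_candidates(objects, include_regex=None, exclude_regex=None,
                      maximum_age=None, seen=None, now=None):
    """Keep objects that pass the regex, age, and fingerprint checks.

    `seen` is a set of fingerprints already downloaded; it is updated in place.
    """
    now = time.time() if now is None else now
    include = re.compile(include_regex) if include_regex else None
    exclude = re.compile(exclude_regex) if exclude_regex else None
    kept = []
    for obj in objects:
        if include is not None and not include.search(obj.name):
            continue  # name does not match the include pattern
        if exclude is not None and exclude.search(obj.name):
            continue  # name matches the exclude pattern
        if maximum_age is not None and now - obj.last_modified > maximum_age:
            continue  # older than maximum-age seconds
        if seen is not None:
            fingerprint = (obj.name, obj.etag)  # assumed fingerprint definition
            if fingerprint in seen:
                continue  # already downloaded once
            seen.add(fingerprint)
        kept.append(obj)
    return kept


listing = [
    ObjectInfo("logs/a.json", 1000.0, "e1"),
    ObjectInfo("logs/b.tmp", 1000.0, "e2"),
    ObjectInfo("logs/old.json", 0.0, "e3"),
]
seen = set()
first = filter_candidates(listing, exclude_regex=r"\.tmp$", maximum_age=500,
                          seen=seen, now=1200.0)
print([o.name for o in first])  # ['logs/a.json']
second = filter_candidates(listing, exclude_regex=r"\.tmp$", maximum_age=500,
                           seen=seen, now=1200.0)
print(second)  # []
```

With a shared seen set, a second pass over the same listing returns nothing, which is the download-once behavior that fingerprinting describes. Pruning old fingerprints (maximum-fingerprint-age) is left out of the sketch for brevity.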
Authentication
Field | Type | Required | Description
access-key | string | | Access Key ID.
secret-key | string | | Secret Access Key.
security-token | string | | Security Token.
session-token | string | | Session Token.
role-arn | string | | The ARN of a role to assume using the credentials above.
Location
Field | Type | Required | Description
bucket-name | string | | The bucket that contains the objects.
object-names | string | | The object names. In the list modes (list-objects and list-and-download-objects), these are treated as search prefixes.
region | string | | S3 region.
endpoint | string | | S3 endpoint.
Object Properties
Field | Type | Required | Description
object-name-field | event-field (string) | | The field in which to store the object name.
creation-time-field | event-field (string) | | The field in which to store the object creation time.
last-modified-field | event-field (string) | | The field in which to store the object's last-modified time.
content-length-field | event-field (string) | | The field in which to store the object content length.
content-type-field | event-field (string) | | The field in which to store the object content type.
etag-field | event-field (string) | | The field in which to store the object ETag.
data-field | event-field (string) | | The field under which to nest the object data.
Retry
Field | Type | Required | Description
count | integer | | How many times to retry: either forever or a limited number of times.
pause | string | | How long to pause before retrying.
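The count and pause settings describe a conventional retry loop. The Python sketch below shows one way such a loop behaves; it is not this input's code. It assumes pause is given in seconds (the string format the configuration actually accepts is not described here) and uses None to stand for retrying forever. with_retry and flaky are hypothetical names.

```python
import time


def with_retry(operation, count=3, pause=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or the retries run out.

    count: retries allowed after the first attempt; None stands in for
    "forever" (an assumption about how that option is expressed).
    pause: seconds to wait before each retry.
    """
    retries = 0
    while True:
        try:
            return operation()
        except Exception:
            if count is not None and retries >= count:
                raise  # out of retries: surface the last error
            retries += 1
            sleep(pause)


attempts = []


def flaky():
    # Hypothetical operation that fails twice, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise OSError("transient failure")
    return "ok"


print(with_retry(flaky, count=5, pause=0.0))  # ok
```

Passing sleep as a parameter keeps the sketch testable: a test can record the pauses instead of actually waiting.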
Mode
Value | Name | Description
list-and-download-objects | list-and-download-objects | List Objects and Download
list-objects | list-objects | List Objects
download-objects | download-objects | Download Given Objects
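The difference between the three modes comes down to whether object-names are used as listing prefixes or as exact keys, and whether the matching objects are downloaded. The Python sketch below models only that distinction, with an in-memory list of keys standing in for a real bucket listing; it assumes download-objects uses the names as exact keys, does not call S3, and select_objects is a hypothetical name.

```python
def select_objects(mode, object_names, bucket_keys):
    """Return (keys to handle, whether to download them) for a given mode.

    bucket_keys stands in for the result of listing the bucket.
    """
    if mode == "download-objects":
        # Names are used as exact keys; no listing is needed.
        return list(object_names), True
    if mode in ("list-objects", "list-and-download-objects"):
        # In list modes, object-names act as search prefixes.
        matches = sorted(
            key for key in bucket_keys
            if any(key.startswith(prefix) for prefix in object_names)
        )
        return matches, mode == "list-and-download-objects"
    raise ValueError(f"unknown mode: {mode}")


keys = ["logs/2024/a.gz", "logs/2023/b.gz", "other/c.txt"]
print(select_objects("list-objects", ["logs/2024/"], keys))  # (['logs/2024/a.gz'], False)
print(select_objects("download-objects", ["other/c.txt"], keys))  # (['other/c.txt'], True)
```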
Preprocessors
Value | Name | Description
extension | extension | Preprocess the object or blob based on the extension of its name (.gz, .parquet).
gzip | gzip | Decompress (gunzip) the received data.
parquet | parquet | Extract the received data from a Parquet file as JSON rows.
base64 | base64 | Encode the binary data as base64.
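Because preprocessors run in the order listed, a list such as gzip followed by base64 first decompresses and then encodes. The Python sketch below illustrates that ordering with the standard-library gzip and base64 modules. It is an illustration, not this input's implementation: Parquet handling is left out because it needs a third-party library, and the sketch assumes that extension leaves objects with unrecognized extensions unchanged. run_preprocessors is a hypothetical name.

```python
import base64
import gzip


def run_preprocessors(data: bytes, object_name: str, preprocessors) -> bytes:
    """Apply each preprocessor to the downloaded bytes, in the order given."""
    for step in preprocessors:
        if step == "gzip":
            data = gzip.decompress(data)
        elif step == "base64":
            data = base64.b64encode(data)
        elif step == "extension":
            # Choose the decoder from the object name's extension;
            # other extensions pass through unchanged (an assumption).
            if object_name.endswith(".gz"):
                data = gzip.decompress(data)
            elif object_name.endswith(".parquet"):
                raise NotImplementedError("Parquet needs a library such as pyarrow")
        elif step == "parquet":
            raise NotImplementedError("Parquet needs a library such as pyarrow")
        else:
            raise ValueError(f"unknown preprocessor: {step}")
    return data


raw = b'{"level": "info"}\n'
packed = gzip.compress(raw)
print(run_preprocessors(packed, "app.json.gz", ["extension"]))  # b'{"level": "info"}\n'
print(run_preprocessors(packed, "app.bin", ["gzip", "base64"]))
```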
Timestamp Mode
Value | Name | Description
none | none | The default mode; do not filter based on timestamps.
last-modified | last-modified | Filter objects on the last-modified timestamp reported by the service.
blob-name-pattern | blob-name-pattern | Filter objects on a timestamp derived from the object name, for example: relevant-name-pattern: =(?P<Y>[\\d]{4,4})-(?P<m>[\\d]{2,2})-(?P<d>[\\d]{2,2})/
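The blob-name-pattern example uses named groups Y, m and d, which read like strftime year, month and day codes. Assuming that is what they mean, the Python sketch below shows how such a pattern can turn an object name into a timestamp for filtering. It treats the doubled backslashes in the example as configuration-string escaping (so the Python raw string uses single ones), assumes UTC, and drops names with no recognizable date; timestamp_from_name and newer_than are hypothetical names.

```python
import re
from datetime import datetime, timezone

# The example pattern, written as a Python raw string.
NAME_PATTERN = re.compile(r"=(?P<Y>[\d]{4,4})-(?P<m>[\d]{2,2})-(?P<d>[\d]{2,2})/")


def timestamp_from_name(name, pattern=NAME_PATTERN):
    """Return a UTC datetime built from the Y, m, d groups, or None."""
    match = pattern.search(name)
    if match is None:
        return None
    return datetime(int(match["Y"]), int(match["m"]), int(match["d"]),
                    tzinfo=timezone.utc)


def newer_than(names, cutoff):
    """Keep names whose derived timestamp is at or after cutoff.

    Names without a recognizable timestamp are dropped (an assumption).
    """
    kept = []
    for name in names:
        ts = timestamp_from_name(name)
        if ts is not None and ts >= cutoff:
            kept.append(name)
    return kept


names = [
    "logs/date=2024-03-15/part-0.json",
    "logs/date=2024-01-02/part-0.json",
    "logs/undated.json",
]
print(newer_than(names, datetime(2024, 3, 1, tzinfo=timezone.utc)))  # ['logs/date=2024-03-15/part-0.json']
```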