Job inputs
At a high level, these are the ways to get data into the system:
- files - read files from a directory as they appear. Useful for consuming logs and the output of other programs.
- echo - specify events for a job directly. [S]
- exec - execute a command in the current shell and capture its output. [S]
- object stores: s3, azure, gcp, filesystem. The unifying idea is that remote objects are contained in a bucket, and objects are selected by matching their names. Objects can be decompressed, and Parquet files can be streamed as JSON events. [S]
- http-poll - perform an HTTP request. Specify headers, query parameters and body. [S]
- http-server - start listening on a port and accept incoming HTTP requests.
- internal-messages - listen to Edge IQ messages.
- window-event-log - (Windows Only) listen to system events.
- worker-channel - listen to events generated by other jobs.
The inputs marked with [S] are scheduled: they are triggered by messages or at a regular time interval. These all have a Trigger option.
The other inputs wait for data as it becomes available; for instance, files waits for new files to be created.
Controlling Data Format
Most inputs share two options that control the data format: Ignore Linebreaks and JSON.
Set JSON if the data is known to be JSON. Otherwise the data is treated as arbitrary text, which is quoted into an event of the form {"_raw": "arbitrary text"}.
Normally, we process input line by line, with each line becoming an event, but setting Ignore Linebreaks captures all of the output as a single event. For instance, some HTTP APIs return JSON documents that contain line feeds, so both of these options need to be enabled.
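To make the interaction between the two options concrete, here is a minimal Python sketch of how one chunk of input text could be split into events. It is an illustrative model, not the system's implementation: the function name chunk_to_events is hypothetical, and skipping blank lines is an assumption.

```python
import json


def chunk_to_events(chunk: str, is_json: bool, ignore_linebreaks: bool) -> list[dict]:
    """Hypothetical model of how JSON and Ignore Linebreaks shape events."""
    # Ignore Linebreaks: the whole chunk is one record; otherwise one record per line.
    records = [chunk] if ignore_linebreaks else chunk.splitlines()
    events = []
    for record in records:
        if not record.strip():
            continue  # assumption: blank records produce no event
        if is_json:
            # JSON: the record is parsed as a JSON document.
            events.append(json.loads(record))
        else:
            # Arbitrary text: quoted into a _raw field.
            events.append({"_raw": record})
    return events


# A pretty-printed JSON body, as some HTTP APIs return.
body = '{\n  "status": "ok",\n  "count": 2\n}\n'
print(chunk_to_events(body, is_json=True, ignore_linebreaks=True))
```

With JSON set but Ignore Linebreaks off, the same body would be parsed one line at a time, and the first line, "{", is not valid JSON on its own. This is why both options are needed for such APIs.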
The set of events that arrive together in one chunk (the result of a single run of an executable, or of a single HTTP request) is called a document. Documents can be batched together at the output (see Output Batching).
exec and http-poll also have a Batch option, which serves a different purpose from the output Batch: it labels the events of a document. If enabled, by default it adds line-num and line-count to each event.
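As a rough illustration of what this labelling means, the following Python sketch adds line-num and line-count to every event of one document. The field names come from the text above. The function name label_document is hypothetical, and 1-based numbering is an assumption; the actual numbering may differ.

```python
def label_document(events: list[dict]) -> list[dict]:
    """Hypothetical sketch: label each event with its position in the document."""
    count = len(events)
    labelled = []
    # assumption: line-num counts from 1
    for num, event in enumerate(events, start=1):
        # Copy the event so the caller's dicts are not modified.
        labelled.append({**event, "line-num": num, "line-count": count})
    return labelled


doc = [{"_raw": "first"}, {"_raw": "second"}]
print(label_document(doc))
```

Because every event carries line-count, a downstream step can tell when it has seen the last event of a document without needing any other signal.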
Retrying
Many operations can fail intermittently but succeed when retried; HTTP requests are the typical example. Retrying involves specifying two things: (a) the number of times to retry, and (b) an optional pause between retries.
After each retry, the pause is doubled, up to a maximum of 15 seconds.
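The following Python sketch models the retry behaviour described above: a fixed number of retries, an optional initial pause, and a pause that doubles after each retry but never exceeds 15 seconds. The names pause_schedule and call_with_retries are hypothetical. Retrying on any exception is a simplification for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

MAX_PAUSE = 15.0  # seconds; the cap stated in the text


def pause_schedule(retries: int, initial_pause: float) -> list[float]:
    """Pauses before each retry: doubled each time, capped at MAX_PAUSE."""
    pauses = []
    pause = min(initial_pause, MAX_PAUSE)
    for _ in range(retries):
        pauses.append(pause)
        pause = min(pause * 2, MAX_PAUSE)
    return pauses


def call_with_retries(op: Callable[[], T], retries: int, initial_pause: float = 0.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run op, retrying up to `retries` times on failure (hypothetical helper)."""
    if retries < 0:
        raise ValueError("retries must be non-negative")
    pauses = pause_schedule(retries, initial_pause)
    for attempt in range(retries + 1):
        try:
            return op()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(pauses[attempt])
    raise AssertionError("unreachable")


print(pause_schedule(6, 1.0))  # [1.0, 2.0, 4.0, 8.0, 15.0, 15.0]
```

A pause of 0 stays 0 when doubled, so leaving the pause unset simply retries immediately. Passing sleep as a parameter is only there so the schedule can be checked without actually waiting.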