mcat - emit objects by reference


mcat FILE ...


mcat emits the contents of a Manta object as an output of the current task, but without actually fetching the data. For example:

mcat ~~/stor/scores.csv

emits the object ~~/stor/scores.csv as an input to the next phase (or as a final job output), but without actually downloading it as part of the current phase.

The shortcut ~~ is equivalent to /:login where :login is the account login name.
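The expansion described above can be pictured with a small shell function. This is only an illustrative sketch: expand_manta_path is a hypothetical helper, not part of the Manta tools, and the login "alice" is made up.

```shell
# Hypothetical helper (not a Manta command): expand a leading "~~"
# to "/<login>", the way the shortcut is described above.
expand_manta_path() {
    login=$1
    path=$2
    case $path in
        '~~'|'~~'/*)
            # ${path#??} drops the first two characters (the "~~").
            printf '/%s%s\n' "$login" "${path#??}"
            ;;
        *)
            # Anything else is passed through unchanged.
            printf '%s\n' "$path"
            ;;
    esac
}

expanded=$(expand_manta_path alice '~~/stor/scores.csv')
echo "$expanded"
```

With the made-up login "alice", this prints /alice/stor/scores.csv.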

As with mpipe, when you use mcat, the task's stdout is not captured and saved as it would be by default.

mcat is particularly useful when you run many jobs on the same large set of input objects. You can store the list of objects in a separate "manifest" object and have the first phase of your job process it with mcat. So instead of this:

$ mfind ~~/public | mjob create -m wc

which may take a long time if mfind returns a lot of objects, you could do this once:

$ mfind ~~/public > /var/tmp/inputs
$ mput -f /var/tmp/inputs ~~/public/inputs

And then for subsequent jobs, just do this:

$ echo ~~/public/inputs | mjob create -m "xargs mcat" -m wc

This is much quicker to kick off, since you're just uploading one object name. The first phase invokes "mcat" on lines from ~~/public/inputs. Each of these lines is treated as a Manta path, and the corresponding object becomes an input to the second phase.
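To see the command line that the first phase builds, the xargs step can be reproduced locally with echo standing in for the real command. This is a sketch only: the two manifest paths are made up, and nothing here contacts Manta.

```shell
# Stand-in manifest; the object paths are invented for illustration.
manifest=$(mktemp)
printf '%s\n' /alice/public/a.txt /alice/public/b.txt > "$manifest"

# Prefixing the command with echo prints the command line xargs
# would run instead of running it.
phase_cmd=$(xargs echo mcat < "$manifest")
echo "$phase_cmd"

rm -f "$manifest"
```

This prints "mcat /alice/public/a.txt /alice/public/b.txt", showing that each manifest line becomes one FILE argument. Note that xargs splits its input on whitespace by default, so a path containing spaces would be broken into several arguments.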

The object path is not resolved until it is processed as an input to the next phase. So if you specify an object that does not exist, the phase after the mcat reports a ResourceNotFoundError. Similarly, if you specify an object that you don't have access to, you'll get an error in the next phase when it tries to use the object.
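Because such errors only surface in the following phase, it can help to catch obviously malformed manifest lines before uploading. The sketch below checks form only; it cannot tell whether an object exists or is accessible. It assumes the manifest holds absolute paths, as mfind output does, and check_manifest is a hypothetical helper, not a Manta command.

```shell
# Hypothetical helper: list manifest lines that are blank or do not
# start with "/". It checks syntax only, not existence or access.
check_manifest() {
    # -v selects lines that do NOT match; -n prefixes the line number.
    # grep exits 0 when at least one suspect line was found.
    grep -n -v '^/' "$1"
}

# Made-up sample manifest: line 2 is blank, line 3 is a relative path.
sample=$(mktemp)
printf '%s\n' /alice/public/a.txt '' public/b.txt > "$sample"

suspect=$(check_manifest "$sample")
echo "$suspect"

rm -f "$sample"
```

For the sample above, this reports lines 2 and 3 as suspect.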


mcat ~~/stor/scores.csv


-r Specify the reducer the Manta object should be directed to.


Report bugs at GitHub