manta-compute-bin is a collection of utilities that are on the $PATH in a compute job.
Each of these utilities aids in processing and moving data around within a
compute job. Recall that each phase of a job is expressed in terms of a
Unix command. These utilities are invoked as part of that command.
For example, if you had the following as your phase command:
grep foo | cut -f 4 | sort | uniq -c
And needed to preserve the grep foo output, you could use the mtee utility
to capture that part of the pipeline to an object:
grep foo | mtee ~~/stor/grep_foo.txt | cut -f 4 | sort | uniq -c
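The same pipeline shape can be tried locally before running it in a job. The sketch below is only an analogy: it uses the standard tee command, which writes to a local file, in place of mtee, which writes to a Manta object. The sample input and the temporary file names are made up for illustration.

```shell
# Local analogy only: tee captures to a local file, mtee captures to an object.
tmpdir=$(mktemp -d)

# Hypothetical tab-separated sample input; three of the four lines contain "foo".
printf 'a\tb\tc\tfoo1\nx\ty\tz\tbar\na\tb\tc\tfoo1\nfoo\tb\tc\td2\n' > "$tmpdir/input.txt"

# Capture the intermediate grep output while the rest of the pipeline runs.
grep foo "$tmpdir/input.txt" \
    | tee "$tmpdir/grep_foo.txt" \
    | cut -f 4 | sort | uniq -c > "$tmpdir/counts.txt"

# grep_foo.txt holds the matching lines; counts.txt holds the final result.
captured_lines=$(wc -l < "$tmpdir/grep_foo.txt" | tr -d ' ')
```

The captured file holds the unmodified grep output, so later stages can change without re-running the filter.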
The current set of utilities:
maggr - Performs key-wise aggregation on plain text files.
mcat - Emits the named object as an output for the current task.
mpipe - Output pipe for the current task.
msplit - Splits the output stream for the current task to many reducers.
mtee - Captures stdin and writes to both stdout and an object.
Detailed documentation can be found by clicking one of the command names above.
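To make "key-wise aggregation" concrete, here is a minimal sketch of the idea using awk. It does not use maggr and makes no claim about maggr's options or output format; the two-column input and the choice of summing are assumptions for illustration.

```shell
# Illustration of key-wise aggregation with awk (not maggr itself).
agg_out=$(mktemp)

# Hypothetical input: one "key value" pair per line.
printf 'apple 3\nbanana 5\napple 4\nbanana 1\ncherry 2\n' \
    | awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' \
    | sort > "$agg_out"

# Each key now appears once, paired with the sum of its values.
cat "$agg_out"
```

The sort at the end is only there to make the output order deterministic, since awk does not guarantee the iteration order of an array.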
Testing in Compute
If you are testing changes or forked this repository, you can upload and run your changes in Compute with something like:
    $ make bundle
    $ mput -f manta-compute-bin.tar.gz ~~/stor/manta-compute-bin.tar.gz
    $ echo ... | mjob create \
        -s ~~/stor/manta-compute-bin.tar.gz \
        -m "cd /assets/ && gtar -xzf ~~/stor/manta-compute-bin.tar.gz &&\
            cd manta-compute-bin && ./bin/msplit -n 3" \
        -r "cat" --count 3
Manta documentation can be found at http://apidocs.joyent.com/manta/