Joyent Manta

Line count by file extension

This job takes an arbitrary number of text files as input and produces a report of the 15 file types (based on file extension, such as ".txt" or ".js") with the most lines, showing how many files and lines were found for each type. This example runs over the files in the Node.js v0.10.17 source tree.

This example demonstrates using the $MANTA_INPUT_OBJECT environment variable to refer to the name of the object being processed, and uses bash parameter expansion (${MANTA_INPUT_OBJECT##*.}) to extract just the file extension.
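The extension extraction can be tried locally without Manta. The following is a minimal sketch; ext_of is a hypothetical helper introduced only for illustration, and it applies the same ${name##*.} expansion that the map phase applies to $MANTA_INPUT_OBJECT. The expansion strips the longest prefix ending in ".", so it keeps whatever follows the last dot anywhere in the path.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Manta): print the text after the last "."
# in a name, using the same expansion as the job's map phase.
ext_of() {
    local name="$1"
    echo "${name##*.}"
}

ext_of /manta/public/examples/node-v0.10.17/lib/http.js    # prints: js
ext_of /manta/public/examples/node-v0.10.17/README.md      # prints: md

# A name with no "." in its last component still contains the dots in
# "node-v0.10.17", so the result is not a real extension.
# LICENSE is an assumed file name used only for illustration.
ext_of /manta/public/examples/node-v0.10.17/LICENSE        # prints: 17/LICENSE
```

The last case suggests why object names without an extension need to be filtered out before they reach the job.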

Run it yourself

Once you've set up the Manta CLI tools, you can run this job yourself on the publicly accessible dataset using the following command:

$ mfind -t o /manta/public/examples/node-v0.10.17 | grep '[^/]\.[^/]*$' |
    mjob create -n "Line count by file extension" -o \
        -m 'echo "${MANTA_INPUT_OBJECT##*.}" "$(wc -l)"' \
        -r "awk '{ l[\$1] += \$2; f[\$1]++; } \
            END { for (i in l) { printf(\"%10s %4d %8d\n\", i, f[i], l[i]); } }' | \
            sort -rn -k3,3 | head -15"

which outputs:

added 1000 inputs to 61aec44b-dd91-4ca8-c6da-841b7ff7b055
added 1000 inputs to 61aec44b-dd91-4ca8-c6da-841b7ff7b055
added 1000 inputs to 61aec44b-dd91-4ca8-c6da-841b7ff7b055
added 1000 inputs to 61aec44b-dd91-4ca8-c6da-841b7ff7b055
added 1000 inputs to 61aec44b-dd91-4ca8-c6da-841b7ff7b055
added 950 inputs to 61aec44b-dd91-4ca8-c6da-841b7ff7b055
         c 1182   440132
        cc  254   351897
         h  620   216657
        js 1510   203378
         s   65    88721
        pl  164    85431
        py  188    79270
       asm   32    45631
      html  163    43539
       pod  337    38653
      json  194    31637
        md  343    27893
       txt   43    12749
  markdown   44    12569
       com   39    12395
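The grep stage in the command above decides which objects become job inputs. As a rough sketch, the same pattern can be run over a few sample paths; has_extension is a hypothetical wrapper name, and the LICENSE and .gitignore paths are assumed for illustration rather than taken from the input list.

```shell
# Keep only paths whose last component contains a "." that is preceded by
# some character other than "/". Same pattern as in the mfind pipeline.
has_extension() {
    grep '[^/]\.[^/]*$'
}

printf '%s\n' \
    /manta/public/examples/node-v0.10.17/lib/http.js \
    /manta/public/examples/node-v0.10.17/LICENSE \
    /manta/public/examples/node-v0.10.17/.gitignore \
    /manta/public/examples/node-v0.10.17/.travis.yml |
    has_extension
# Prints the http.js and .travis.yml paths only.
```

Names with no dot in the last path component, and dotfiles whose only dot is the leading one, are dropped, so they never reach the map phase.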

Job body

[
        {
                "exec": "echo \"${MANTA_INPUT_OBJECT##*.}\" \"$(wc -l)\"",
                "type": "map"
        },
        {
                "exec": "awk '{ l[$1] += $2; f[$1]++; } \t END { for (i in l) { printf(\"%10s %4d %8d\\n\", i, f[i], l[i]); } }' | \t    sort -rn -k3,3 | head -15",
                "type": "reduce"
        }
]
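The two phases can be approximated on a local machine to see how the records flow. This is a sketch under stated assumptions: map_phase and reduce_phase are hypothetical function names wrapping the two "exec" strings above, the /demo/ object names and their contents are made up, and a real job runs the map phase once per object inside Manta rather than in a local pipeline.

```shell
# Map phase: stdin is one object's contents; MANTA_INPUT_OBJECT holds its name.
# Emits one "<extension> <line count>" record.
map_phase() {
    echo "${MANTA_INPUT_OBJECT##*.}" "$(wc -l)"
}

# Reduce phase: stdin is every map record. l[] sums lines per extension,
# f[] counts files per extension; the result is sorted by total lines.
reduce_phase() {
    awk '{ l[$1] += $2; f[$1]++; }
         END { for (i in l) { printf("%10s %4d %8d\n", i, f[i], l[i]); } }' |
        sort -rn -k3,3 | head -15
}

# Made-up inputs standing in for Manta objects.
{
    printf 'a\nb\nc\n' | MANTA_INPUT_OBJECT=/demo/one.js map_phase
    printf 'x\n'       | MANTA_INPUT_OBJECT=/demo/two.js map_phase
    printf 'p\nq\n'    | MANTA_INPUT_OBJECT=/demo/notes.md map_phase
} | reduce_phase
# Reports js with 2 files and 4 lines, then md with 1 file and 2 lines.
```

The columns are extension, file count, and total line count, matching the report shown earlier.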

Input summary

Showing first 50 input objects
/manta/public/examples/node-v0.10.17/CONTRIBUTING.md
/manta/public/examples/node-v0.10.17/common.gypi
/manta/public/examples/node-v0.10.17/.travis.yml
/manta/public/examples/node-v0.10.17/README.md
/manta/public/examples/node-v0.10.17/node.gyp
/manta/public/examples/node-v0.10.17/vcbuild.bat
/manta/public/examples/node-v0.10.17/benchmark/common.js
/manta/public/examples/node-v0.10.17/benchmark/fs-write-stream-throughput.js
/manta/public/examples/node-v0.10.17/benchmark/http-flamegraph.sh
/manta/public/examples/node-v0.10.17/benchmark/compare.js
/manta/public/examples/node-v0.10.17/benchmark/http.sh
/manta/public/examples/node-v0.10.17/benchmark/http_bench.js
/manta/public/examples/node-v0.10.17/benchmark/http_simple.js
/manta/public/examples/node-v0.10.17/benchmark/http_simple_bench.sh
/manta/public/examples/node-v0.10.17/benchmark/http_simple.rb
/manta/public/examples/node-v0.10.17/benchmark/http_simple_auto.js
/manta/public/examples/node-v0.10.17/benchmark/http_server_lag.js
/manta/public/examples/node-v0.10.17/benchmark/http_simple_cluster.js
/manta/public/examples/node-v0.10.17/benchmark/report-startup-memory.js
/manta/public/examples/node-v0.10.17/benchmark/idle_clients.js
/manta/public/examples/node-v0.10.17/benchmark/plot.R
/manta/public/examples/node-v0.10.17/benchmark/io.c
/manta/public/examples/node-v0.10.17/benchmark/idle_server.js
/manta/public/examples/node-v0.10.17/benchmark/static_http_server.js
/manta/public/examples/node-v0.10.17/benchmark/buffers/buffer-creation.js
/manta/public/examples/node-v0.10.17/benchmark/buffers/dataview-set.js
/manta/public/examples/node-v0.10.17/benchmark/buffers/buffer-base64-encode.js
/manta/public/examples/node-v0.10.17/benchmark/buffers/buffer-read.js
/manta/public/examples/node-v0.10.17/benchmark/buffers/buffer-write.js
/manta/public/examples/node-v0.10.17/lib/_debugger.js
/manta/public/examples/node-v0.10.17/lib/_linklist.js
/manta/public/examples/node-v0.10.17/lib/_stream_transform.js
/manta/public/examples/node-v0.10.17/lib/_stream_readable.js
/manta/public/examples/node-v0.10.17/lib/_stream_passthrough.js
/manta/public/examples/node-v0.10.17/lib/_stream_duplex.js
/manta/public/examples/node-v0.10.17/lib/_stream_writable.js
/manta/public/examples/node-v0.10.17/lib/assert.js
/manta/public/examples/node-v0.10.17/lib/child_process.js
/manta/public/examples/node-v0.10.17/lib/console.js
/manta/public/examples/node-v0.10.17/lib/buffer.js
/manta/public/examples/node-v0.10.17/lib/cluster.js
/manta/public/examples/node-v0.10.17/lib/constants.js
/manta/public/examples/node-v0.10.17/lib/dgram.js
/manta/public/examples/node-v0.10.17/lib/crypto.js
/manta/public/examples/node-v0.10.17/lib/events.js
/manta/public/examples/node-v0.10.17/lib/domain.js
/manta/public/examples/node-v0.10.17/lib/dns.js
/manta/public/examples/node-v0.10.17/lib/freelist.js
/manta/public/examples/node-v0.10.17/lib/http.js
/manta/public/examples/node-v0.10.17/lib/https.js

Output summary

1 total outputs
/manta/jobs/61aec44b-dd91-4ca8-c6da-841b7ff7b055/stor/reduce.1.7b9b59de-f207-44ed-8634-4e27aac6b4fc

Error summary

0 total errors