summaryrefslogtreecommitdiff
path: root/system/splitjob/README
blob: 985797496be8a6e81e4f4474f1a662b231c2a658 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
This program is used to split up data from stdin in blocks which are
sent as input to parallel invocations of commands. The output from
those are then concatenated in the right order and sent to stdout.

Splitting up and parallelizing jobs like this might be useful to speed
up compression using multiple CPU cores or even multiple computers.

For this approach to be useful, the compressed format needs to allow
multiple compressed files to be concatenated. This is the case for
gzip, bzip2, lzip and xz.

Example 1, use multiple logical cores:
splitjob -j 4 bzip2 < bigfile > bigfile.bz2

Example 2, use remote machines:
splitjob "ssh host1 gzip" "ssh host2 gzip" < f > f.gz

The above example assumes that ssh is configured to allow logins
without asking for password. See the manpage for ssh-keygen or do
a google search for examples on how to accomplish this.

Example 3, Use bigger blocks to reduce overhead:
splitjob -j 2 -b 10M gzip < file > file.gz

For "xz -9" a block size of 384 MB gives best compression.

Example 4, parallel decompression:
splitjob -X -r 10 -j 10 -b 384M "xz -d -" < file.xz > file