Bad. For you, UNIX means GNU/Linux and GNU in particular. There is no stdbuf on UNIX, nor does traditional xargs support -P. The rest are GNU tools, which aren't part of a real UNIX consolidation.
GNU stands for "GNU's Not Unix".
Sit down, you get an F until you learn the subject matter.
One can install these tools on any Unix system, and the general concepts still apply.
What is this ADD command you use?
--pipe is slow, especially when you keep the block size at the default of 1M.
Try this instead (requires version 20161222):
$ time parallel -a 5gb_file --pipepart --block -1 wc -l
2554914
2517740
2539712
2515308
2556364
2524006
2487755
2554523
real 0m0.971s
user 0m1.272s
sys 0m3.404s
Compared to:
$ time parallel --pipe wc -l >/dev/null < 5gb_file
real 0m25.342s
user 0m23.440s
sys 0m30.500s
And:
$ time wc -l < 5gb_file
20250322
real 0m1.356s
user 0m0.596s
sys 0m0.752s
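As a sanity check on the numbers above, the eight per-chunk counts from the --pipepart run should add up to the single total reported by plain wc -l. A minimal sketch of that check follows; sum_counts is a hypothetical helper name, not part of GNU parallel, and it assumes one integer per input line.

```sh
# Hypothetical helper: add up one integer per line read from stdin.
sum_counts() {
    total=0
    while read -r n; do
        [ -n "$n" ] || continue      # skip blank lines
        total=$((total + n))
    done
    echo "$total"
}

# The eight partial counts printed by the --pipepart run.
printf '%s\n' 2554914 2517740 2539712 2515308 \
              2556364 2524006 2487755 2554523 | sum_counts
# prints 20250322, the same total as "wc -l < 5gb_file"
```

In practice the same helper could be appended to the parallel command as a final pipeline stage, so that only the grand total is printed.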
Excellent, thanks Ole.
I'll update the article after testing out the --pipepart option.
#!/bin/zsh
# for 4 CPU
setopt extendedglob
add() { paste -d+ -s | bc; }
para() {
  lock=/tmp/para_$$_$((paracnt++))
  # sleep as long as the 4th lock file exists
  until [[ -z /tmp/para_$$_*(#q[4]N) ]] { sleep 0.1 }
  # Launch the job in a subshell
  ( touch $lock ; eval $* ; rm $lock ) &
  # Wait for subshell start and lock creation
  until [[ -f $lock ]] { sleep 0.001 }
}
for n in 0 1 2 3 ; { para "wc -l l.0$n | cut -f1 -d\ " } | add
@Klak, interesting shell-only solution, thanks.
The sleeps wouldn't be too bad for long-running subcommands.
I would probably pass 4 (or $(nproc) etc.) to para() rather than hardcoding it.
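One way the job limit might be passed in rather than hardcoded is sketched below. This is not Klak's lock-file method: para_n is a hypothetical name, and for simplicity it runs the commands in batches (start up to N jobs, wait for all of them, then start the next batch), so a slow job holds up its whole batch. It is plain POSIX sh and needs no polling sleeps.

```sh
# Hypothetical helper: para_n MAX CMD...
# Runs each CMD string in a background subshell, at most MAX at a time.
# Batch-based: after MAX jobs are started, wait for all of them to exit.
para_n() {
    max=$1
    shift
    running=0
    for cmd in "$@"; do
        ( eval "$cmd" ) &
        running=$((running + 1))
        if [ "$running" -ge "$max" ]; then
            wait                # block until the whole batch has exited
            running=0
        fi
    done
    wait                        # collect the final, possibly partial, batch
}
```

Usage would look something like para_n "$(nproc)" "wc -l < l.00" "wc -l < l.01" "wc -l < l.02" "wc -l < l.03" | add, assuming nproc (GNU coreutils) is available and the add function from the script above is defined. Output order is not guaranteed, which does not matter when the results are only summed.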