Unix tools - xargs

I rarely get the chance to use xargs. Occasionally I will grep a directory of files and use xargs to mv every file that matches, but that is about all I need it for. Because of this, I get a little giddy (nerdy…but whatever) when I find another real-world scenario for it!

This week I was working on a feature branch that required near-identical copies of some directories and the files within them, with just a minor name change and similar changes to some of the JSON values in each file.

At the start the directory contained the following sub-directories:

~/big_query$ ls -1
json_a_access_log_table/
json_b_access_log_table/
json_c_access_log_table/
json_e_access_log_table/
json_f_access_log_table/
json_g_access_log_table/
json_i_access_log_table/
json_j_access_log_table/
json_k_access_log_table/
json_m_access_log_table/
json_n_access_log_table/
json_q_access_log_table/
json_r_access_log_table/
json_u_access_log_table/
json_v_access_log_table/
json_y_access_log_table/
json_z_access_log_table/
native_a_access_log_table/
native_b_access_log_table/
native_c_access_log_table/
native_e_access_log_table/
native_f_access_log_table/
native_g_access_log_table/
native_i_access_log_table/
native_j_access_log_table/
native_k_access_log_table/
native_m_access_log_table/
native_n_access_log_table/
native_q_access_log_table/
native_r_access_log_table/
native_u_access_log_table/
native_v_access_log_table/
native_y_access_log_table/
native_z_access_log_table/

I needed to make a *_error_log_table/ directory for every existing sub-directory whose name starts with json_. I approached the problem in steps. Chaining commands with Unix pipes (|) lends itself to this type of thinking.

1) Filter out the non-json directories. First ls the directory contents, then use grep to keep only the directory names that contain json.

ls | grep -e 'json.*'

~/big_query$ ls | grep -e 'json.*'
json_a_access_log_table/
json_b_access_log_table/
json_c_access_log_table/
json_d_access_log_table/
json_e_access_log_table/
json_f_access_log_table/
json_g_access_log_table/
json_h_access_log_table/
json_i_access_log_table/
json_j_access_log_table/
json_k_access_log_table/
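
A note on the grep pattern: if a pattern like json.* is left unquoted, the shell sees it first and may expand it as a glob when a matching file (say, json.txt) exists in the current directory. The pattern also matches json anywhere in the name, not only at the start. Here is a minimal sketch of a more defensive filter, assuming a throwaway scratch directory and made-up directory names (none of them come from the real project):

```shell
# Scratch directory with hypothetical names, so nothing real is touched.
demo=$(mktemp -d)
mkdir "$demo/json_a_access_log_table" \
      "$demo/native_a_access_log_table" \
      "$demo/old_json_backup"

# Quoted and anchored: only names that START with json_ pass the filter.
filtered=$(ls "$demo" | grep -e '^json_')
echo "$filtered"

# Clean up the scratch directory.
rm -r "$demo"
```

Only json_a_access_log_table survives the filter here; old_json_backup contains json but does not start with it.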

2) Replace “access” with “error”. We can simply pipe grep’s stdout to sed to do the substitution.

ls | grep -e 'json.*' | sed 's/access/error/g'

~/big_query$ ls | grep -e 'json.*' | sed 's/access/error/g'
json_a_error_log_table/
json_b_error_log_table/
json_c_error_log_table/
json_d_error_log_table/
json_e_error_log_table/
json_f_error_log_table/
json_g_error_log_table/
json_h_error_log_table/
json_i_error_log_table/
json_j_error_log_table/
json_k_error_log_table/
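
To see what the trailing g in the sed expression does, here is a small sketch using a made-up name in which “access” appears twice. The name is hypothetical and chosen only to make the difference visible; for the directory names in this post, where “access” appears once per line, both forms give the same result.

```shell
# Hypothetical name in which "access" appears twice.
name='json_access_access_log_table'

# With g, every occurrence on the line is replaced.
all=$(printf '%s\n' "$name" | sed 's/access/error/g')

# Without g, only the first occurrence on the line is replaced.
first=$(printf '%s\n' "$name" | sed 's/access/error/')

echo "$all"    # json_error_error_log_table
echo "$first"  # json_error_access_log_table
```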

3) Pass sed’s output to xargs, which feeds it to the mkdir command. Finally we get to use XARGS! I’ll show you the complete command, and then we can take a quick dive into the xargs man page for more on the options used.

ls | grep -e 'json.*' | sed 's/access/error/g' | xargs -n1 mkdir

~/big_query$ ls -1
json_a_access_log_table/
json_a_error_log_table/
json_b_access_log_table/
json_b_error_log_table/
json_c_access_log_table/
json_c_error_log_table/
json_d_access_log_table/
json_d_error_log_table/
json_e_access_log_table/
json_e_error_log_table/
json_f_access_log_table/
json_f_error_log_table/
json_g_access_log_table/
json_g_error_log_table/
json_h_access_log_table/
json_h_error_log_table/
json_i_access_log_table/
json_i_error_log_table/
json_j_access_log_table/
json_j_error_log_table/
json_k_access_log_table/
json_k_error_log_table/
native_a_access_log_table/
native_b_access_log_table/
native_c_access_log_table/
native_e_access_log_table/
native_f_access_log_table/
native_g_access_log_table/
native_i_access_log_table/
native_j_access_log_table/
native_k_access_log_table/
native_m_access_log_table/
native_n_access_log_table/
native_q_access_log_table/
native_r_access_log_table/
native_u_access_log_table/
native_v_access_log_table/
native_y_access_log_table/
native_z_access_log_table/
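
If you want to try the whole pipeline without touching a real project, the sketch below replays it inside a temporary directory. The three directory names are stand-ins I made up for the demo, and the grep pattern is quoted and anchored as a precaution; treat it as an illustration rather than the exact command from my branch.

```shell
# Build a small scratch layout with hypothetical directory names.
demo=$(mktemp -d)
mkdir "$demo/json_a_access_log_table" \
      "$demo/json_b_access_log_table" \
      "$demo/native_a_access_log_table"

# Run the pipeline in a subshell so the caller's working directory is unchanged.
(cd "$demo" && ls | grep -e '^json_' | sed 's/access/error/g' | xargs -n1 mkdir)

# Each json_* directory should now have an *_error_log_table sibling.
result=$(ls "$demo")
echo "$result"

# Clean up the scratch directory.
rm -r "$demo"
```

The native_* directory is left alone, because it never makes it past the grep step.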

The xargs man page outlines the anatomy of the command as follows:

SYNOPSIS
     xargs [-0oprt] [-E eofstr] [-I replstr [-R replacements] [-S replsize]] [-J replstr] [-L number] [-n number [-x]] [-P maxprocs] [-s size] [utility [argument ...]]

My use of the command employed the -n option.

-n number
             Set the maximum number of arguments taken from standard input for each invocation of utility.  An invocation of utility will use less than number standard input arguments
             if the number of bytes accumulated (see the -s option) exceeds the specified size or there are fewer than number arguments remaining for the last invocation of utility.
             The current default value for number is 5000.
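
A quick way to get a feel for -n is to swap mkdir for echo and watch how the arguments are grouped. This is just a sketch with placeholder arguments (a, b, c, d):

```shell
# Four arguments on stdin, grouped two per invocation of echo.
pairs=$(printf '%s\n' a b c d | xargs -n2 echo)
echo "$pairs"
# a b
# c d

# Without -n, all four fit into a single invocation.
single=$(printf '%s\n' a b c d | xargs echo)
echo "$single"
# a b c d
```

Since mkdir accepts several directory names in one call, -n1 is a choice rather than a requirement here; it simply makes each invocation handle exactly one directory.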

With my use of xargs, I take each item from sed’s stdout and call mkdir once per item (-n1). If you would like to confirm that xargs is going to run the command you expect, you can use the -p option. It prompts for “y” or “n” confirmation before executing each command. Think of it as an interactive dry run.

-p      Echo each command to be executed and ask the user whether it should be executed.  An affirmative response, ‘y’ in the POSIX locale, causes the command to be executed, any
        other response causes it to be skipped.  No commands are executed if the process is not attached to a terminal.

~/big_query$ ls | grep -e 'json.*' | sed 's/access/error/g' | xargs -n1 -p mkdir
mkdir json_a_error_log_table/?...y
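
If answering a prompt per directory gets tedious, another way to preview is to put echo in front of the command, so xargs prints what it would run instead of running it. A minimal sketch, with two directory names hard-coded as sample input:

```shell
# echo stands in for the real command, so nothing is created.
preview=$(printf '%s\n' json_a_error_log_table json_b_error_log_table \
  | xargs -n1 echo mkdir)
echo "$preview"
# mkdir json_a_error_log_table
# mkdir json_b_error_log_table
```

Once the printed commands look right, drop the echo and run the pipeline for real.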

Great. All the necessary new directories have been created! Next week I’ll cover copying the table_def.json files from each original access directory to its new error counterpart, and the minor edits those files then need, leveraging good ole vim.