Unpacking Argument Lists
I was recently working on a Python wrapper around grep. I often find myself having to search for random patterns through tons of CSVs. At work I have used some applications that helped, but inevitably I found each of them lacking in some way or another. Maybe the application did not easily allow any kind of recursive searching, did not offer a way to exclude certain directories from the search, or (God forbid!) did not allow any kind of regular expression use. I thought it’d be fun to try to whip up a little script that could meet my needs: fun & practical. The code is available on my GitHub (although it may have changed some since the time of this writing).
In a nutshell, the Python script uses the subprocess module to call grep to look for a given regular expression. In the example below I was simply looking for the pattern “smith” (case insensitive) in a directory appropriately named “test_data”.
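A minimal sketch of what such a grep call might look like is shown here. It is not the script from my GitHub: the function names build_grep_command and run_grep are made up for illustration, and the choice of the -r, -i and -E flags is an assumption about what a recursive, case-insensitive regex search needs.

```python
import subprocess


def build_grep_command(pattern, directory):
    """Return the argument list for a recursive, case-insensitive grep."""
    # -r: recurse into subdirectories
    # -i: ignore case
    # -E: treat the pattern as an extended regular expression
    return ["grep", "-r", "-i", "-E", pattern, directory]


def run_grep(pattern, directory):
    """Run grep and return its matching output lines as a list of strings."""
    result = subprocess.run(
        build_grep_command(pattern, directory),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=False,          # grep exits with status 1 when nothing matches
    )
    return result.stdout.splitlines()
```

Calling run_grep("smith", "test_data") would then return one string per matching line, each prefixed with the path of the file it came from.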
The results are below:
“Great!”, I thought. Now I can make a head call to gather each matching file’s column headers. Seems simple enough. In this case, my test data was all in the one “test_data” directory, so I did not think much about how to accomplish this task. We can make a head -1 call using the same subprocess module to collect the first row from the files we are working with (assuming that the files have column headers in the first row).
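As a rough sketch, the header lookup could be wrapped like this. The helper names are hypothetical, the code uses head -n 1 (the portable spelling of head -1), and it assumes a simple comma-delimited header row with no quoted commas.

```python
import subprocess


def build_head_command(path):
    """Return the argument list that prints only the first line of a file."""
    # "head -n 1" is equivalent to "head -1"
    return ["head", "-n", "1", path]


def read_header(path):
    """Return the column names from the first row of a CSV file."""
    result = subprocess.run(
        build_head_command(path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if head fails (e.g. missing file)
    )
    # Assumption: plain comma-separated header without quoted fields
    return result.stdout.rstrip("\n").split(",")
```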
Before I can make the head -1 call, I first need to parse the path to the file from the first column of the grep output (as shown above). Then I can pass the individual variables client & file_name to the os.path.join() function to get an intelligently joined path.
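The parsing step might look something like the sketch below. The sample grep line is invented for illustration, and it assumes every match sits exactly one directory deep (client/file_name) under “test_data”, with no colons in the file path.

```python
import os

# Made-up example of one line of recursive grep output: "path:matching text"
grep_line = "acme/customers.csv:42,John Smith,Boston"

# Everything before the first colon is the path to the matching file
relative_path, _, matched_text = grep_line.partition(":")

# Assumption: the path always has exactly two pieces, client and file name
client, file_name = relative_path.split("/")

# os.path.join inserts the right separator for the current platform
header_path = os.path.join("test_data", client, file_name)
```

This works only as long as every hit has the same shape, which is exactly the assumption that breaks next.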
COMPLICATING STUFF
Well, what if your grep search finds hits at different depths of recursion? For instance, what if the directory containing all the data I wished to grep looked something like this:
Unfortunately, os.path.join() is a function that requires its path components as separate positional arguments. This means one cannot simply build a list of strings and hand that list to os.path.join() to create the path to a file to pull headers for.
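To make the problem concrete, here is a small sketch (the path pieces are made up) showing that handing os.path.join() a list in Python 3 raises a TypeError instead of joining the pieces.

```python
import os

# Made-up path components of varying depth collected into a list
parts = ["test_data", "acme", "2019", "customers.csv"]

try:
    os.path.join(parts)  # passes ONE argument (a list), not four strings
    joined_ok = True
except TypeError:
    # os.path.join expects str, bytes or os.PathLike arguments, not a list
    joined_ok = False
```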
This puzzled me for a while, but rather than immediately start coding a solution I made myself take a quick break and grab a coffee. I had a feeling I had stumbled across something similar when I was reading through every Python textbook I could find (back when I was reading a lot more than actually writing).
Sure enough, I remembered reading about how to define functions that can be called with arbitrary argument lists using the “*” character. The same character can also unpack arguments that are already in a list or tuple for a function call that needs separate positional arguments, like os.path.join()!
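Both uses of * can be sketched in a few lines. The function count_args and the sample path pieces are hypothetical, purely for illustration.

```python
import os


def count_args(*args):
    """In a definition, *args collects any positional arguments into a tuple."""
    return len(args)


parts = ["test_data", "acme", "customers.csv"]

# In a call, *parts unpacks the list into separate positional arguments,
# so this is the same as count_args("test_data", "acme", "customers.csv")
count = count_args(*parts)

# The same unpacking gives os.path.join the separate arguments it needs
joined = os.path.join(*parts)
```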
Utilizing the almighty *, I was able to grep files for a regex pattern & pull header rows for any files reported as having matches by that grep call, all in one Python script.
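Put together, the path handling might be sketched as follows. The helper name path_from_grep_line and the sample lines are invented, and the sketch assumes grep reports relative, forward-slash paths that contain no colons.

```python
import os


def path_from_grep_line(grep_line):
    """Rebuild the file path from one 'path:matching text' line of grep output."""
    relative_path, _, _ = grep_line.partition(":")
    parts = relative_path.split("/")  # any number of pieces, any depth
    return os.path.join(*parts)       # unpack the list into separate arguments


# Made-up hits at different depths of recursion
shallow = path_from_grep_line("test_data/customers.csv:7,Jane Smith")
deep = path_from_grep_line("test_data/acme/2019/customers.csv:42,John Smith")
```

Because the list is unpacked at call time, the same function handles a hit one directory deep and a hit three directories deep without any special cases.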