February 20, 2015: Shell script file manipulation

Last Edited: April 12, 2017

This was originally an old post that was confusing and had some really bad code from early 2015. I've decided to update it because I have learned a thing or two since then and can't bear to let the few posts I have be garbage, so here goes.

In the past I have been confronted with log files, particularly ones taken from Windows platforms, that come in very strange forms. In this example I'm taking a number of server log files that (1) have horrific file names, and (2) contain no distinguishing identifiers inside the files as to which server the data belongs to.

(sample names)

	$ ls
	[8787&( ges )]{\data\server1}.txt
	[8787&( ges )]{\data\server2}.txt
	[8787&( ges )]{\data\server3}.txt
	[8787&( ges )]{\data\server4}.txt

(sample content)

	$ cat \[8787\&\(\ ges\ \)\]\{\\data\\server1}.txt

Each file contained the same type of data about the given servers; however, the data within the files did not specify which server it came from, so dumping all the data into one spreadsheet for analysis would have been useless.

I turned to shell scripting to solve my problem, figuring there must be a way to prepend each data string with the file name so I could tell which data belonged to which server.

My first problem was the file names. They were crazy since they came from a Windows platform and were not friendly at all to my Linux file system when I tried to read them...errors like "No such file or directory" and whatnot...so I decided to first clean up the names without corrupting the file contents, then move on to prepending the file name and combining everything.
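
That cleanup step can be sketched with bash parameter expansion alone (the `serverN` piece and the `.txt` extension are assumptions based on the sample names above):

```shell
# Sketch: copy each file to a friendlier name derived from the "serverN"
# portion between the last backslash and the closing brace.
for f in ./*.txt; do
    clean=${f##*\\}       # drop everything through the last backslash
    clean=${clean%\}*}    # drop the closing brace and extension
    cp -- "$f" "$clean.txt"
done
```

With friendlier names like server1.txt in place, none of the later steps need any escaping at all.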

Trying to cat the files directly yielded errors like this unless I quoted everything...

	$ ls | while read -r FILE; do cat $FILE; done
	cat: [8787&(: No such file or directory
	cat: ges: No such file or directory
	cat: )]{\data\server1}.txt: No such file or directory
	cat: [8787&(: No such file or directory
	cat: ges: No such file or directory
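
The word splitting behind those errors goes away once the variable is quoted; here's a sketch that also globs instead of piping ls, since parsing ls output is fragile with names like these:

```shell
# Quoting "$FILE" keeps each odd name intact; globbing instead of
# piping ls avoids mangling names with spaces in the first place.
for FILE in ./*.txt; do
    cat -- "$FILE"
done
```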

So I needed to get creative with the file names using some awk commands. Here's the one-liner I came up with...which significantly cleaned up some earlier attempts at this same problem:

	# Two loops are needed: one to enumerate the files in the directory and
	# a second to enumerate the contents of each file. The server name is
	# parsed from the file name with awk: field 3 between the backslashes,
	# then everything before the closing brace. Listing only *.txt keeps
	# master.file itself out of the input on a re-run.
	$ ls *.txt | while read -r FILE; do for j in $(cat "$FILE");
	     do echo "$(echo "$FILE" | awk -F '\\' '{print $3}' | awk -F '}' '{print $1}')" "$j" >> master.file;
          done; done
	$ cat master.file
	server1 data1
	server1 data2
	server1 data3
	server1 data4
	server1 data5
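
For what it's worth, the same result can come from a single awk pass with no nested loops (a sketch, assuming one data value per line as in the sample output): awk's built-in FILENAME variable holds the current input file, so the server name can be derived right inside the program.

```shell
# Prefix every line with the server name parsed out of awk's FILENAME.
awk '{
    name = FILENAME
    sub(/^.*\\/, "", name)   # drop everything through the last backslash
    sub(/}.*$/, "", name)    # drop the closing brace and extension
    print name, $0
}' ./*.txt > master.file
```

Since the output is redirected once rather than appended inside a loop, master.file is also rebuilt cleanly on every run.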

With that, my logs can now be shown in one clean file with all the data associated with its appropriate server. While other people's files will vary, my hope is that this kind of scripting helps others faced with similar situations who don't want to spend three days in a spreadsheet trying to parse out content. As a former co-worker used to tell me: "Let the technology work for you."