Jan 22 2011

Using the Linux/Unix shell more Effectively (Part III)

I mentioned in Part II that I was going to dive into a few more useful CLI commands.  Here are some more tools that let you do more with the shell.  I use these less often, but they are extremely helpful when manipulating text files, particularly those with any form of field (or column) concept in them.  Obvious examples are .csv files, where columns are delimited by commas, or mysql query output redirected to a file, where, depending on how it was sent to the file, the delimiter is whitespace, or whitespace and |’s.  Less obvious examples are log files (columns for parts of the date, service or pid etc…) and config files.

grep

Now, most people have used grep here and there.  This one alone is not such an advanced tool.  However, there are various switches I use regularly that help make life “easier.”  Here are a few useful flags, and examples of when they’re useful.  These are just some of the ones I thought of, from my use of it just this week.  There are more, and you can find them on the man page.

  • -C N (where N is a number):  This outputs N lines before and after the line that matches your search.  If you only want one side, -B N gives you the lines before and -A N the lines after.  (Lowercase -a is a different flag: it treats binary files as text.)  This is useful when you are looking for a particular entry in /var/log/syslog for instance, but what interests you is what happens before said entry, or if you know that something you’re looking for in a text is “somewhere around” the term you’ve searched for.
  • --color=auto: This will color your match on the line using the value in the GREP_COLOR variable for your shell (newer versions of GNU grep prefer GREP_COLORS).  Usually this is bold red.  Ubuntu started using this by default in their shell.  When you’re searching for a particular word in a line which is super long (like what some lines in /var/log/maillog typically look like on an active mailserver), this is useful, because it helps you see it better.
  • -E:  This allows you to use extended regex.  If you know regex, you’re probably using it.  If you don’t know regex… don’t worry, it’s planned for a future post.   Regex gives you flexibility that you wouldn’t believe when hunting down specific stuff in a file.
  • -n:  This one’s huge for when you’re hunting something in a huge file of code, and then you want to go change it.  When you use -n in your grep command, the matching lines are prepended by the line number in the file you’re searching through.
  • -v:  This is “inverse” match.  This is useful, especially piped to another command on the CLI, where you have some output but you want to filter out stuff you don’t want to see.
  • -i:  This one’s useful for case-insensitive matching.  What if you’re looking for particular text, but you don’t know if it’s all caps, all lowercase, or has only the first letter capitalized?
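Here is a minimal sketch of those flags in action.  The log contents and the variable names are made up for illustration; swap in your own file and search terms.

```shell
#!/bin/sh
# Hypothetical sample log, written to a temp file so the example is self-contained.
log=$(mktemp)
cat > "$log" <<'EOF'
10:00:01 sshd: session opened
10:00:02 cron: job started
10:00:03 kernel: ERROR disk failure
10:00:04 cron: job finished
10:00:05 sshd: session closed
EOF

# -C1: show one line of context on each side of the match
context=$(grep -C1 'ERROR' "$log")
# -n: prefix each matching line with its line number in the file
numbered=$(grep -n 'cron' "$log")
# -i: ignore case, so "error" also matches "ERROR"
nocase=$(grep -i 'error' "$log")
# -v: keep only the lines that do NOT match
inverted=$(grep -v 'cron' "$log")
# -E: extended regex, here a simple alternation
either=$(grep -E 'opened|closed' "$log")

printf '%s\n' "$context"
printf '%s\n' "$numbered"

rm -f "$log"
```

The flags combine freely, so something like grep -inE 'warn|error' is a perfectly normal thing to type.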

cut

I use cut often, since I often need to look at performance data that’s pumped into .csv format for the GUI people to look at.  It’s horribly slow in Excel, not to mention these files often have more than 65k lines.  But hey… that’s THEIR problem, I’ve got my shell.  There are only a couple of options with cut that I commonly use: -d and -f.  The -d defines the delimiter, so in the case of .csv files, it’s the comma.  The -f tells cut which resulting columns I want to see, and outputs them.  Say you have a file.csv that has three columns, date, productID, and inventory, but you only want date and inventory.  You can run cut -d',' -f'1,3' file.csv and you’ll have two columns, one for the date, and one for the inventory.
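As a quick sketch of that three-column case, assuming a small made-up file.csv (the rows below are invented for illustration):

```shell
#!/bin/sh
# Hypothetical CSV with the three columns described above.
csv=$(mktemp)
cat > "$csv" <<'EOF'
date,productID,inventory
2011-01-20,A100,42
2011-01-21,B200,17
EOF

# -d sets the delimiter, -f picks the fields to keep (1 and 3 here).
picked=$(cut -d',' -f1,3 "$csv")
printf '%s\n' "$picked"

rm -f "$csv"
```

Note that cut keeps the delimiter between the fields it prints, so the result is still valid comma-separated data.  It also knows nothing about quoting, so a comma inside a quoted CSV value will be counted as a delimiter.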

tr

This one’s rarely something I need to use, but when I do use it, it saves me some trouble.  What it does is translate the characters you choose into the ones you specify, one character at a time.  It has many uses really: swapping delimiters, changing case, or squeezing and deleting characters.  Keep in mind that it works on single characters, not whole words, so renaming a variable that’s called 982724320983 times in a perl script is still a job for sed.  You can check out the man page for details.  I don’t use it often enough to know it all by heart; I often just look at the man page when I need it.  For me, the following is useful to convert a line of comma-separated values into a column of text.  I find it handy when I am looking for specific columns out of a .csv file that has 60+ columns.  Counting commas is tedious and prone to error.  An example of how I use it:  head -n1 file.csv | tr "," "\n" > columns.txt
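A small sketch of the header trick, with a made-up header line standing in for the first line of a real .csv.  Piping the result through grep -n is my own addition here, as one way to get the column number without counting by hand:

```shell
#!/bin/sh
# Hypothetical header line; in practice this would come from: head -n1 file.csv
header='date,productID,inventory,warehouse'

# Turn every comma into a newline, giving one column name per line.
columns=$(printf '%s\n' "$header" | tr ',' '\n')

# grep -n then reports which line (and so which column) a name sits on.
position=$(printf '%s\n' "$columns" | grep -n 'inventory')

# tr can also map character classes, e.g. lowercase to uppercase.
upper=$(printf '%s\n' "$header" | tr '[:lower:]' '[:upper:]')

printf '%s\n' "$columns"
printf '%s\n' "$position"
printf '%s\n' "$upper"
```

The number in front of the colon is exactly what you would hand to cut -f.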

sort

The command sort is a simple one: it sorts lines in numerical or alphabetical order.  I use it when I have output from a series of files that are rotated by periodic, and I want to put them together in order of epoch time (usually the first column, to ease stuff like sorting) so I can take the output and graph it.  Its greatest use, like many of the other tools I’ve covered so far in this series, is when combined with the others via a pipe.  Here’s a more complex example of where it comes in handy, where filename is just the newest file from filename, filename.0, filename.1 etc…:

head -n1 filename > combined.file && for i in `ls -tr filename*` ; do tail --lines=+2 $i ; done | sort -un >> combined.file

The result is a single file I can now fiddle with, where all entries are in chronological order, thanks to it sorting by epoch time.  In the example above, I also added -u so that duplicates are only included once.  Be aware that combined with -n, -u treats lines whose leading number is equal as duplicates, even if the rest of the line differs.  Just FYI, this for loop syntax works in any Bourne-style shell (sh, bash, ksh, zsh); csh and tcsh use foreach instead.
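To make that pipeline easier to try out, here is a self-contained sketch of the same idea.  The file names (stats, stats.0, stats.1) and their contents are invented, and I loop over a glob rather than parsing ls output, which is a swap on my part and not what the one-liner above does:

```shell
#!/bin/sh
# Hypothetical rotated files: a header line, then "epoch value" rows.
dir=$(mktemp -d)
printf 'epoch value\n1295650800 5\n1295654400 7\n' > "$dir/stats"
printf 'epoch value\n1295643600 3\n1295647200 4\n' > "$dir/stats.0"
printf 'epoch value\n1295636400 1\n1295643600 3\n' > "$dir/stats.1"

# Keep the header once, taken from the newest file.
head -n 1 "$dir/stats" > "$dir/combined"

# Strip the header from every file, then sort numerically on the leading
# epoch. With -n, -u keeps one line per distinct leading number.
for f in "$dir"/stats "$dir"/stats.*; do
    tail -n +2 "$f"
done | sort -n -u >> "$dir/combined"

result=$(cat "$dir/combined")
printf '%s\n' "$result"

rm -rf "$dir"
```

The row for 1295643600 appears in two of the input files but only once in the combined output.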

uniq

Since I just mentioned the -u flag on sort, I may as well add the uniq command to this list.  This one is a useful tool when you want to pay attention to the duplicates.  For instance, you can use the -c flag to count how many times the same exact line appears in a log file.  Note that uniq only compares adjacent lines, so you generally want to sort the input first.  You also have some basic field skipping, so that you can treat lines as identical even though the timestamp would obviously change: -f N skips the first N whitespace-separated fields (such as date/time) before comparing, and -D (in GNU uniq) prints every duplicated line instead of collapsing them.  You should look at the man page if ever you find yourself needing this.  The options for fields and such are rudimentary within uniq.  I find myself often taking the contents of a file, piping it through grep and cut, then finally through uniq -c to get a count of all similar lines, because I’ll often have multiple repeated lines:

cat example.file* | cut -d' ' -f'3,9,12,21,36' | grep -Ev "null$" | sort | uniq -c | sed "s/^[ \t]*//" | cut -d' ' -f'1,6'

It looks complex, but the output tells me within seconds, for all the example.file iterations (they are rotated into example.file.1, example.file.2 etc… by periodic), how many times column 36, which holds a boolean that matters to me, was true or false.  It looks like this:

28029 True

23 False
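A stripped-down sketch of the counting part, using a handful of made-up True/False values instead of real log columns:

```shell
#!/bin/sh
# Hypothetical input: one boolean per line, in no particular order.
values=$(printf '%s\n' True False True True False True)

# sort groups identical lines together, uniq -c counts each group,
# sort -rn ranks by count, and sed strips the padding uniq adds.
counts=$(printf '%s\n' "$values" | sort | uniq -c | sort -rn | sed 's/^ *//')

printf '%s\n' "$counts"
```

Leaving out the first sort would give one count per run of adjacent identical lines, which is rarely what you want.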

wc

Finally, we have wc.  This one’s pretty simple… it gives you a word count.  I only really ever use it to give me a line count, using wc -l.  Where I used to work, I often used it with mailq to get an indication of how big the mail queue was on the SMTP server, especially when one of the clients who forwarded their outbound mail through our server would start sending out their mass emails.  Those weren’t fun days.  Specifically, I ran mailq | wc -l.
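A tiny sketch of the same pattern, with three invented lines standing in for mailq output so it runs anywhere:

```shell
#!/bin/sh
# Hypothetical stand-in for the output of a command such as mailq.
queue=$(printf 'msg1\nmsg2\nmsg3\n')

# wc -l counts lines; some systems pad the number with spaces,
# so tr -d ' ' strips them to leave a bare number.
total=$(printf '%s\n' "$queue" | wc -l | tr -d ' ')

printf 'lines in queue output: %s\n' "$total"
```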

So that’s it for now.  I’m only scratching the surface of just some of the many useful commands available on the linux/unix shell.  My hope is that I can share this knowledge with everyone so more and more people can discover that the boring, old, scary, painful, “limiting” shell is still the most powerful toolbelt you’ll find.  Anywhere.  Stay tuned for further examples of different commands you can use on the shell.  I’ll probably just have one post for each moving forward, instead of these longer posts.
