Wednesday, 22 June 2016

How to Parse Log file using awk command



This post is about "How to parse log files" using the awk command. Below is the format of the log file we are considering.


example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:01 -0400] "GET /status.html HTTP/1.1" 200 404 "-" "curl/7.19.7 (i486-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15"

The very first thing to know before parsing a log file is its structure. In our example, the first column specifies the domain with port, the second column is the IP address, and, similarly, the 10th column is the response code.
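If you are unsure of the field numbers, awk itself can reveal them. This is a sketch that echoes the sample line from this post and prints every whitespace-separated field with its index; with a real file you would pipe in `head -n 1 sample.log` instead:

```shell
# Print each field of a log line together with its index, so you can
# confirm which column holds the status code before parsing.
echo 'example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:01 -0400] "GET /status.html HTTP/1.1" 200 404 "-" "curl/7.19.7 (i486-pc-linux-gnu)"' |
awk '{for (i = 1; i <= NF; i++) print i, $i}'
# the line "10 200" in the output confirms the status code is field 10
```

Note that awk splits on whitespace by default, so a quoted request like "GET /status.html HTTP/1.1" spans fields 7 through 9.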


Once you know the structure, you can extract the required fields by running a few commands.
Let's parse the logs now.




"awk" command

AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

First, let's print out all the response codes. As we know, the response code is in the 10th column, and in awk a column is referenced as $10, where 10 is the column/field number.


awk '{print $10}' sample.log

The output would be


200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
404
404
200
200
502
502
200
502
200
200
304
304
404
200
302
200
200
200
200
200
200
200
200
200
200
200
200
404
200
200
200
200
404
200

In the above command we have displayed the 10th column.
Note that this result is not sorted.

Now let's apply the sort filter to sort these response codes.
awk '{print $10}' sample.log | sort

The output would be sorted.


200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
302
304
304
404
404
404
404
404
502
502
502

Now we have the sorted list of response codes: all 200 responses at the top, then 302, and so on.
You are probably expecting to count the number of each type of response code next. Let's count them by applying the uniq filter with the -c option (uniq only collapses adjacent duplicates, which is why we sort first).


awk '{print $10}' sample.log | sort | uniq -c

The output will have the count of all the unique items in the 10th column.


  39 200
   1 302
   2 304
   5 404
   3 502

This shows that there are 39 responses with 200 status, 1 with 302 status and so on.
As you can see, the output is sorted by response code. What if we want to sort it by the count instead of the response code?
Let's do it by applying the sort filter again.


awk '{print $10}' sample.log | sort | uniq -c | sort -n

This will sort the output by the count (the -n option makes sort compare the counts numerically rather than lexically).


   1 302
   2 304
   3 502
   5 404
  39 200

This is sorted in increasing order of count.
Let's sort it in decreasing order now. Just add the -r option to sort, keeping -n for a numeric comparison:


awk '{print $10}' sample.log | sort | uniq -c | sort -rn

Here is the sorted output


  39 200
   5 404
   3 502
   2 304
   1 302
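As a side note, the whole sort | uniq -c | sort pipeline can be replaced by awk alone, using an associative array to accumulate the counts. This is a sketch with a few inline demo lines standing in for sample.log:

```shell
# count[] is an associative array keyed by the status code in field 10.
# The END block runs after all input has been read and prints each
# total; key order is unspecified, so sort -rn gives a stable,
# descending listing.
printf '%s\n' \
  'h - - - - - "GET /a H" 200 0 -' \
  'h - - - - - "GET /b H" 200 0 -' \
  'h - - - - - "GET /a H" 404 0 -' |
awk '{count[$10]++} END {for (code in count) print count[code], code}' | sort -rn
# prints:
# 2 200
# 1 404
```

On the real log this would simply be awk '{count[$10]++} END {for (code in count) print count[code], code}' sample.log | sort -rn.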

So far we have got the desired result in all the required ways.
Let's get into more detail, like which requests produced which response. Observe the fields in the log: the 8th field is the request path. So let's include both the 8th and 10th fields in our command and see the result.


awk '{print $8 " " $10}' sample.log | sort

Here is the output

* 200
* 200
* 200
* 200
* 200
* 200
* 200
* 200
* 200
* 404
/call-for-participation-at-conf-kde-in-and-fossmeet-2011/ 200
/call-for-participation-at-conf-kde-in-and-fossmeet-2011/ 200
/category/FSC/feed/ 304
/category/bit-lug/ 200
/category/fedora/feed/ 304
/feed/ 200
/my-scripts/ 200
/status.html 200
/status.html 200
/status.html 200
/status.html 404
/wp-comments-post.php 302
/wp-content/themes/cordobo-green-park-2/img/logo-cgp2.png 200
/wp-content/uploads/2010/11/523433.png 200
/wp-content/uploads/2010/11/facebook.png 200
/wp-content/uploads/2010/11/facebook.png 200
/wp-content/uploads/2010/11/facebook.png 200
/wp-content/uploads/2010/11/facebook.png 404
/wp-content/uploads/2010/11/facebook.png 502
/wp-content/uploads/2010/11/flicker1.png 200
/wp-content/uploads/2010/11/linkedin.png 200
/wp-content/uploads/2010/11/linkedin.png 200
/wp-content/uploads/2010/11/linkedin.png 200
/wp-content/uploads/2010/11/linkedin.png 200
/wp-content/uploads/2010/11/linkedin.png 200
/wp-content/uploads/2010/11/linkedin.png 502
/wp-content/uploads/2010/11/rss.png 200
/wp-content/uploads/2010/11/rss.png 200
/wp-content/uploads/2010/11/rss.png 200
/wp-content/uploads/2010/11/rss.png 200
/wp-content/uploads/2010/11/rss.png 200
/wp-content/uploads/2010/11/rss.png 404
/wp-content/uploads/2010/11/twitter.png 200
/wp-content/uploads/2010/11/twitter.png 200
/wp-content/uploads/2010/11/twitter.png 200
/wp-content/uploads/2010/11/twitter.png 200
/wp-content/uploads/2010/11/twitter.png 404
/wp-content/uploads/2010/11/twitter.png 502
/wp-login.php 200
/wp-login.php 200

Now let's again count the number of each request with its corresponding response code. You already know the trick: just apply the uniq filter.

awk '{print $8 " " $10}' sample.log | sort | uniq -c

The output will have the count of each kind of request with its corresponding response code. The count is based on the combination of the two fields, request and response code.


   9 * 200
   1 * 404
   2 /call-for-participation-at-conf-kde-in-and-fossmeet-2011/ 200
   1 /category/FSC/feed/ 304
   1 /category/bit-lug/ 200
   1 /category/fedora/feed/ 304
   1 /feed/ 200
   1 /my-scripts/ 200
   3 /status.html 200
   1 /status.html 404
   1 /wp-comments-post.php 302
   1 /wp-content/themes/cordobo-green-park-2/img/logo-cgp2.png 200
   1 /wp-content/uploads/2010/11/523433.png 200
   3 /wp-content/uploads/2010/11/facebook.png 200
   1 /wp-content/uploads/2010/11/facebook.png 404
   1 /wp-content/uploads/2010/11/facebook.png 502
   1 /wp-content/uploads/2010/11/flicker1.png 200
   5 /wp-content/uploads/2010/11/linkedin.png 200
   1 /wp-content/uploads/2010/11/linkedin.png 502
   5 /wp-content/uploads/2010/11/rss.png 200
   1 /wp-content/uploads/2010/11/rss.png 404
   4 /wp-content/uploads/2010/11/twitter.png 200
   1 /wp-content/uploads/2010/11/twitter.png 404
   1 /wp-content/uploads/2010/11/twitter.png 502
   2 /wp-login.php 200
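
A condition in front of the awk action lets you filter while you print: the action runs only for lines matching the pattern. This sketch prints only the requests whose status code is not 200, again with inline demo lines standing in for sample.log:

```shell
# '$10 != 200' is a pattern: the {print ...} action runs only for
# lines whose 10th field (the status code) differs from 200.
printf '%s\n' \
  'h - - - - - "GET /ok H" 200 0 -' \
  'h - - - - - "GET /missing H" 404 0 -' \
  'h - - - - - "GET /bad H" 502 0 -' |
awk '$10 != 200 {print $8, $10}'
# prints:
# /missing 404
# /bad 502
```

On the real log, awk '$10 != 200 {print $8, $10}' sample.log | sort | uniq -c would give a count of the failing requests only.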

So now you have learned enough to parse any log file using the awk command. If you have an example or suggestion that should be included in this post, do let me know in the comments.

2 comments:

  1. Can you use awk to find stats by minute showing response codes logged, total counts per code found in the log, and breakdown of number of requests in each response time range?

  2. From the output line "/wp-content/uploads/2010/11/twitter.png 502", I want only "twitter" to be printed, i.e. with the /wp-content/uploads/2010/11/ prefix removed.


 

Copyright @ 2013 Appychip.

Designed by Appychip & YouTube Channel