How to Make a Histogram in Linux

Dated : April 8, 2009

A histogram is a set of (count, value) pairs indicating how often each value occurs. The basic operation is to sort the values, count how many times each value occurs in a row, and then reverse sort numerically so that the value with the highest count is at the top of the report.

$ ... | sort | uniq -c | sort -r -n

Note that sort compares whole lines, but the first column carries the most weight, just as the first letter of someone’s last name largely determines where the name lands in a sorted list.
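To make the three stages concrete, here is a minimal sketch that runs the same pipeline on a few made-up values. The function name histogram and the sample input are assumptions for illustration only; they are not part of the original log analysis.

```shell
# Minimal sketch: "histogram" is a hypothetical helper name.
histogram() {
  # sort groups identical lines together, uniq -c counts each group,
  # and sort -r -n puts the largest count first.
  sort | uniq -c | sort -r -n
}

# Made-up sample input, one value per line.
printf '%s\n' b a c a b a | histogram
# The counts are 3 for a, 2 for b and 1 for c; the amount of
# left padding in front of each count varies by uniq implementation.
```

Wrapping the pipeline in a function is only a convenience; typing the three commands directly behaves the same way.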

uniq -c collapses each run of identical adjacent lines into a single line and prints the number of occurrences in front of the value. Recall the previous sorting:

$ awk '{print $3;}' < /home/public/cs601/unix/pageview-20021022.log | \
  sort | \
  uniq
/article/index.jsp
/article/index.jsp?page=1
/article/index.jsp?page=10
/article/index.jsp?page=2
...

Now add -c to uniq:

$ awk '{print $3;}' < /home/public/cs601/unix/pageview-20021022.log | \
  sort | \
  uniq -c
 623 /article/index.jsp
   6 /article/index.jsp?page=1
  10 /article/index.jsp?page=10
 109 /article/index.jsp?page=2
...
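The sort step matters because uniq only merges identical lines that sit next to each other. The following sketch, using made-up input, compares the number of output lines with and without the sort; the variable names are hypothetical.

```shell
# Without sort, the two "a" lines are not adjacent, so uniq keeps both.
unsorted_lines=$(printf '%s\n' a b a | uniq -c | wc -l | tr -d ' ')

# With sort, the two "a" lines become adjacent and collapse into one.
sorted_lines=$(printf '%s\n' a b a | sort | uniq -c | wc -l | tr -d ' ')

echo "without sort: $unsorted_lines lines, with sort: $sorted_lines lines"
# prints: without sort: 3 lines, with sort: 2 lines
```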

Now all you have to do is reverse sort the lines according to the first column numerically.

$ awk '{print $3;}' < /home/public/cs601/unix/pageview-20021022.log | \
  sort | \
  uniq -c | \
  sort -r -n
6170 /index.jsp
2916 /search/results.jsp
1397 /faq/index.jsp
1018 /forums/index.jsp
 884 /faq/home.jsp?topic=Tomcat
...
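The -n flag is what makes the final sort compare counts as numbers instead of as text. Here is a small sketch of the difference, using made-up and deliberately unpadded count lines (real uniq -c output is left-padded, which can hide the problem for small counts). LC_ALL=C is set on the text sort only to make the comparison byte-wise and repeatable.

```shell
# Hypothetical "count value" lines, written without padding.
counts='9 /a
10 /b
100 /c'

# Text comparison: "9" sorts after "1", so the smallest count lands on top.
lexical_top=$(printf '%s\n' "$counts" | LC_ALL=C sort -r | head -n 1)

# Numeric comparison: 100 is the largest count, so it lands on top.
numeric_top=$(printf '%s\n' "$counts" | sort -r -n | head -n 1)

echo "sort -r:    $lexical_top"
echo "sort -r -n: $numeric_top"
```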

In practice, you might want a histogram that has been “despidered” and contains only FAQ-related views. You can filter out all page view lines associated with spider IPs and keep only the faq lines:

$ grep -v -f /tmp/spider.IPs /home/public/cs601/unix/pageview-20021022.log | \
  awk '{print $3;}'| \
  grep '/faq' | \
  sort | \
  uniq -c | \
  sort -r -n
1397 /faq/index.jsp
 884 /faq/home.jsp?topic=Tomcat
 525 /faq/home.jsp?topic=Struts
 501 /faq/home.jsp?topic=JSP
 423 /faq/home.jsp?topic=EJB
...
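The same filtering can be tried on a small scale without the real log. The sketch below builds a hypothetical spider list and a hypothetical log in a temporary directory; the IP addresses, the log layout (date, IP, URL) and the file names are all assumptions, chosen only so that the URL is in the third field as in the commands above. It also adds -F, which is not in the original command, so that the dots in the IP addresses are matched literally rather than as regular-expression wildcards.

```shell
tmpdir=$(mktemp -d)

# Hypothetical spider IPs, one per line.
printf '%s\n' 10.0.0.1 10.0.0.2 > "$tmpdir/spider.IPs"

# Hypothetical log lines: date, client IP, requested URL.
cat > "$tmpdir/pageview.log" <<'EOF'
2002-10-22 10.0.0.1 /faq/index.jsp
2002-10-22 192.168.1.5 /faq/index.jsp
2002-10-22 192.168.1.6 /faq/home.jsp?topic=JSP
2002-10-22 10.0.0.2 /index.jsp
2002-10-22 192.168.1.7 /faq/index.jsp
2002-10-22 192.168.1.5 /index.jsp
EOF

# Drop spider lines, keep the URL column, keep faq URLs, then count.
faq_hist=$(grep -v -F -f "$tmpdir/spider.IPs" "$tmpdir/pageview.log" |
  awk '{print $3}' |
  grep '/faq' |
  sort |
  uniq -c |
  sort -r -n)

printf '%s\n' "$faq_hist"
rm -rf "$tmpdir"
```

With this sample data, /faq/index.jsp is counted twice and /faq/home.jsp?topic=JSP once; the spider request for /faq/index.jsp is not counted. Note that a blank line in the pattern file matches every input line, so with -v it would filter out the whole log.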

If you want to see only the despidered faq pages that were referenced more than 500 times, add an awk command to the end.

$ grep -v -f /tmp/spider.IPs /home/public/cs601/unix/pageview-20021022.log | \
  awk '{print $3;}'| \
  grep '/faq' | \
  sort | \
  uniq -c | \
  sort -r -n | \
  awk '{if ($1>500) print $0;}'
1397 /faq/index.jsp
 884 /faq/home.jsp?topic=Tomcat
 525 /faq/home.jsp?topic=Struts
 501 /faq/home.jsp?topic=JSP
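If the cutoff changes often, one possible variation is to pass it into awk as a variable instead of hard-coding 500. This is a sketch, not part of the original pipeline: the function name over_threshold and the variable name min are hypothetical, and the input lines are the counts shown earlier in this article.

```shell
# Hypothetical helper: keep "count value" lines whose count exceeds $1.
over_threshold() {
  # -v passes the shell argument into awk as the variable "min";
  # a pattern with no action prints every matching line unchanged.
  awk -v min="$1" '$1 > min'
}

printf '%s\n' \
  '1397 /faq/index.jsp' \
  ' 884 /faq/home.jsp?topic=Tomcat' \
  ' 525 /faq/home.jsp?topic=Struts' \
  ' 501 /faq/home.jsp?topic=JSP' \
  ' 423 /faq/home.jsp?topic=EJB' | over_threshold 500
```

With a threshold of 500, the first four lines pass and the 423 line is dropped, matching the output above.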

Written by Editor

