A histogram is a set of (count, value) pairs indicating how often each value occurs. The basic recipe is to sort the values so that identical ones are adjacent, count how many times each value occurs in a row, and then sort in reverse numeric order so that the value with the highest count is at the top of the report.
$ ... | sort | uniq -c | sort -r -n
Note that sort compares whole lines, but because uniq -c puts the count in the first column, that column dominates the ordering, just as the first letter of someone’s last name largely determines where their name falls in a sorted list.
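If you build histograms often, you can wrap the three-stage pipeline in a shell function. This is a minimal sketch; the function name histogram is hypothetical, and it assumes the input is one value per line on stdin.

```shell
#!/bin/sh
# histogram (hypothetical name): read one value per line on stdin and
# print "count value" pairs, most frequent value first.
histogram() {
    sort |        # group identical values so they are adjacent
    uniq -c |     # collapse each run into "count value"
    sort -r -n    # order by the leading count, largest first
}

# Example: three "a", two "b", one "c" in scrambled order.
# Prints 3 a, 2 b, 1 c (uniq left-pads the counts with spaces).
printf '%s\n' b a c a b a | histogram
```

Because the function just reads stdin, you can drop it in place of the ... | sort | uniq -c | sort -r -n tail of any of the pipelines that follow.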
uniq -c collapses each run of adjacent, identical lines into a single line and prints the number of occurrences in front of the value. Recall the previous sorting:
$ awk '{print $3;}' < /home/public/cs601/unix/pageview-20021022.log | \
sort | \
uniq
/article/index.jsp
/article/index.jsp?page=1
/article/index.jsp?page=10
/article/index.jsp?page=2
...
Now add -c to uniq:
$ awk '{print $3;}' < /home/public/cs601/unix/pageview-20021022.log | \
sort | \
uniq -c
623 /article/index.jsp
6 /article/index.jsp?page=1
10 /article/index.jsp?page=10
109 /article/index.jsp?page=2
...
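It is worth seeing why the sort has to come before uniq -c: uniq only merges identical lines that are adjacent, so an unsorted stream gets one count per run rather than one count per value. The three sample paths below are made up for illustration.

```shell
#!/bin/sh
# uniq -c only merges *adjacent* identical lines.
# The sample values are invented for this demonstration.
sample='/faq
/index
/faq'

echo "without sort:"
printf '%s\n' "$sample" | uniq -c          # /faq shows up twice, each with count 1

echo "with sort:"
printf '%s\n' "$sample" | sort | uniq -c   # /faq shows up once, with count 2
```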
Now all you have to do is sort the lines on the first column numerically (-n) and in reverse, descending order (-r).
$ awk '{print $3;}' < /home/public/cs601/unix/pageview-20021022.log | \
sort | \
uniq -c | \
sort -r -n
6170 /index.jsp
2916 /search/results.jsp
1397 /faq/index.jsp
1018 /forums/index.jsp
884 /faq/home.jsp?topic=Tomcat
...
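The -n flag matters here. Without it, sort compares the counts as text, character by character, so a count of 9 would outrank 109. A small sketch with invented counts; it pins the collation with LC_ALL=C so the text ordering is predictable across locales.

```shell
#!/bin/sh
# Why -n matters: without it, sort compares the counts as text, so a
# line starting with "9" outranks one starting with "109".
# The counts and paths below are made-up sample data.
LC_ALL=C   # fix the collation order so the text comparison is predictable
export LC_ALL

counts='9 /a
10 /b
109 /c'

echo "text order (sort -r):"
printf '%s\n' "$counts" | sort -r      # 9 /a, 109 /c, 10 /b

echo "numeric order (sort -r -n):"
printf '%s\n' "$counts" | sort -r -n   # 109 /c, 10 /b, 9 /a
```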
In practice, you might want a histogram that has been “despidered” and contains only faq-related views. You can filter out all page-view lines associated with spider IPs (grep -v -f reads the patterns to exclude from /tmp/spider.IPs, one per line) and then keep only the faq lines:
$ grep -v -f /tmp/spider.IPs /home/public/cs601/unix/pageview-20021022.log | \
awk '{print $3;}' | \
grep '/faq' | \
sort | \
uniq -c | \
sort -r -n
1397 /faq/index.jsp
884 /faq/home.jsp?topic=Tomcat
525 /faq/home.jsp?topic=Struts
501 /faq/home.jsp?topic=JSP
423 /faq/home.jsp?topic=EJB
...
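One caveat with grep -v -f: each IP in the spider file is treated as a regular expression, so the dots match any character, and an address such as 10.0.0.1 also matches inside 10.0.0.12. A sketch of a stricter variant adds -F (fixed strings) and -w (whole-word matches only; supported by GNU and BSD grep, though not part of POSIX). The helper name despidered_faq_histogram, the "IP timestamp URL" log layout, and the addresses are all hypothetical; only the URL being field 3 matches the log used above.

```shell
#!/bin/sh
# despidered_faq_histogram LOG SPIDERS (hypothetical helper)
# faq page-view histogram for LOG, ignoring lines that contain any IP
# listed, one per line, in SPIDERS.
despidered_faq_histogram() {
    # -F: each IP is a fixed string, so "." is not a regex wildcard.
    # -w: whole-word matches only, so 10.0.0.1 does not also hide 10.0.0.12.
    grep -F -w -v -f "$2" "$1" |
        awk '{print $3}' |   # field 3 is the URL, as in the pageview log
        grep '/faq' |
        sort |
        uniq -c |
        sort -r -n
}

# Invented sample data: "IP timestamp URL" lines and one spider IP.
log=$(mktemp)
spiders=$(mktemp)
printf '%s\n' \
    '10.0.0.1 t1 /faq/index.jsp' \
    '10.0.0.12 t2 /faq/index.jsp' \
    '10.0.0.12 t3 /faq/index.jsp' \
    '192.168.1.5 t4 /index.jsp' > "$log"
echo '10.0.0.1' > "$spiders"

despidered_faq_histogram "$log" "$spiders"   # one line: count 2, /faq/index.jsp
rm -f "$log" "$spiders"
```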
If you only want to see despidered faq pages that were viewed more than 500 times, add an awk filter to the end of the pipeline:
$ grep -v -f /tmp/spider.IPs /home/public/cs601/unix/pageview-20021022.log | \
awk '{print $3;}' | \
grep '/faq' | \
sort | \
uniq -c | \
sort -r -n | \
awk '{if ($1>500) print $0;}'
1397 /faq/index.jsp
884 /faq/home.jsp?topic=Tomcat
525 /faq/home.jsp?topic=Struts
501 /faq/home.jsp?topic=JSP
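The final awk step can also be written more compactly and with the threshold passed in as a variable, which saves editing the script each time you change the cutoff. In this sketch the function name more_than is hypothetical; awk’s -v option sets the variable min, and a bare pattern such as $1 > min + 0 prints every matching line unchanged, just like print $0 in the original.

```shell
#!/bin/sh
# more_than MIN (hypothetical name): keep histogram lines whose leading
# count is strictly greater than MIN. awk's default field splitting skips
# the leading spaces that uniq -c adds, so $1 is the count.
more_than() {
    # "+ 0" forces a numeric comparison against the threshold.
    awk -v min="$1" '$1 > min + 0'
}

# Sample histogram lines; 500 itself is dropped because the test is ">".
printf '%s\n' '1397 /faq/index.jsp' '525 /faq/a' '500 /faq/b' '423 /faq/c' | more_than 500
```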