Tuesday, October 16, 2012

Basic Pig usage to process Argus data

Some quick notes on testing out Pig in local mode to process some basic Argus data.

Argus
  • Capture a sampling of network traffic with Argus
    • argus -w capture.arg -i eth0
  • Pre-process the Argus data
    • ra -r capture.arg -nn -c, -s proto saddr sport daddr dport bytes - tcp or udp > capture.csv
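
Before handing capture.csv to Pig, it can be worth confirming that every row really has the six fields requested from ra, since the Pig scripts below refer to columns by position. The following is a minimal sketch, not part of the original notes: check_columns is a hypothetical helper name, and it assumes awk is available and that no field contains an embedded comma.

```shell
# Hypothetical helper: print every line of the CSV that does not have exactly
# six comma-separated fields, prefixed with its line number.
check_columns() {
    awk -F, 'NF != 6 { printf "%d: %s\n", NR, $0 }' "$1"
}

# Usage: check_columns capture.csv
```

If the function prints nothing, every row has the expected shape.
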
Install Hadoop and Pig
  • cd /usr/local
  • tar -xvzf hadoop-0.20.2.tar.gz 
  • ln -s /usr/local/hadoop-0.20.2 /usr/local/hadoop
  • tar -xvzf pig-0.10.0.tar.gz 
  • ln -s /usr/local/pig-0.10.0 /usr/local/pig
  • Add to your .bash_profile
    • export JAVA_HOME=/usr
    • export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/pig/bin
Run your Pig scripts like
  • pig -x local <blah>.pig
Sample Pig scripts

sum_dport.pig


argfile = load 'capture.csv' using PigStorage(',') as (Proto,SrcAddr,Sport,DstAddr,Dport,TotBytes:int);
grouped = group argfile by Dport;
mysum   = foreach grouped generate group, SUM(argfile.TotBytes);
store mysum into 'arg_pig_out';
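
For a small capture, the per-port totals from sum_dport.pig can be cross-checked outside Pig. This is only a sketch under assumptions: sum_by_dport is a hypothetical helper name, awk and sort are assumed to be installed, and the column positions (5 for Dport, 6 for TotBytes) follow the ra field order used above.

```shell
# Hypothetical helper: sum column 6 (bytes) per value of column 5 (dport)
# and print "dport<TAB>total", sorted numerically by port.
sum_by_dport() {
    awk -F, '{ total[$5] += $6 }
             END { for (p in total) printf "%s\t%d\n", p, total[p] }' "$1" | sort -n
}

# Usage: sum_by_dport capture.csv
```

The totals should agree, up to number formatting and row order, with the part files Pig writes under the arg_pig_out directory.
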

sum_dport_by_saddr.pig

argfile = load 'capture.csv' using PigStorage(',') as (Proto,SrcAddr,Sport,DstAddr,Dport,TotBytes:int);
grouped = group argfile by (SrcAddr,Dport);
mysum   = foreach grouped generate group, SUM(argfile.TotBytes);
bysum   = order mysum by $1 desc;
top10   = limit bysum 10;
dump top10;
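
The same group, sum, order, limit pipeline can be approximated in the shell to sanity-check the top talkers on a small file. Treat this as a rough sketch: top_talkers is a hypothetical helper name, it assumes awk, sort and head are available, and the order of rows with equal totals may differ from what Pig prints.

```shell
# Hypothetical helper: sum bytes (column 6) per "SrcAddr,Dport" pair
# (columns 2 and 5), sort by total descending, and keep the first ten rows.
top_talkers() {
    awk -F, '{ total[$2 "," $5] += $6 }
             END { for (k in total) printf "%s\t%d\n", k, total[k] }' "$1" \
        | sort -t "$(printf '\t')" -k2,2nr \
        | head -n 10
}

# Usage: top_talkers capture.csv
```

Each output row is "SrcAddr,Dport<TAB>total", which corresponds to the ((SrcAddr,Dport),total) tuples that dump prints.
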

sum_dport_by_saddr_filter22.pig

argfile = load 'capture.csv' using PigStorage(',') as (Proto,SrcAddr,Sport,DstAddr,Dport:chararray,TotBytes:int);
onlyssh = filter argfile by Dport == '22';
grouped = group onlyssh by (SrcAddr,Dport);
mysum   = foreach grouped generate group, SUM(onlyssh.TotBytes);
bysum   = order mysum by $1 desc;
top10   = limit bysum 10;
dump top10;

unique_srcip_dstip_dstport.pig

argfile = load 'capture.csv' using PigStorage(',') as (Proto,SrcAddr,Sport,DstAddr,Dport,TotBytes);
triples = foreach argfile generate SrcAddr, DstAddr, Dport;
uniq    = distinct triples;
top10   = limit uniq 10;
dump top10;
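
If the goal, as the script name suggests, is the set of unique source IP, destination IP and destination port combinations, the result can be compared against a plain shell pipeline on a small capture. This is a sketch under assumptions: unique_flows is a hypothetical helper name, and the column positions (2, 4 and 5) follow the ra field order used above.

```shell
# Hypothetical helper: project columns 2, 4 and 5 (SrcAddr, DstAddr, Dport)
# and print each distinct combination once, in sorted order.
unique_flows() {
    cut -d, -f2,4,5 "$1" | sort -u
}

# Usage: unique_flows capture.csv | head -n 10
```
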
