Thursday, November 1, 2007

Learning with Honeypots

Honeypot Layout

I've recently rented some dedicated server and public IP resources to run some honeypot/honeynet/whatever setups, more or less to learn, and I figured I would post my game plan here.

My basic idea is not earth-shattering or anything new. I just hope to gain new insight, see what works and what doesn't, and find ways of using honeypots for intrusion detection or as an early warning system, as one piece of the overall security monitoring puzzle.

As we all know, any traffic hitting a honeypot system is suspicious, or unwarranted at best, which whittles down the amount of traffic we have to look at compared to a production host. However, if you have ever looked at the logs or traffic of a publicly accessible, non-production machine, this "whittled down" traffic can still be quite large, with everything from propagating worms to your annoying SSH brute force scans. So how do we look for the unknown nasties without wasting time on the redundant, now passé, routine malicious scans and such? One way is by filtering and tiering our honeypot architecture.
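To make the filtering idea concrete, here is a rough Python sketch that separates routine noise from everything else. The port list, the threshold, and the (source IP, destination port) event format are all assumptions made up for illustration, not something taken from a particular tool.

```python
from collections import Counter

# Hypothetical values; tune them to what your own logs show.
ROUTINE_PORTS = {22}      # e.g. SSH brute force scans
ROUTINE_THRESHOLD = 5     # hits from one source before we call it routine


def split_routine(events):
    """Split (src_ip, dst_port) events into (routine, interesting) lists."""
    # Count repeated hits per (source, port), but only on the noisy ports.
    counts = Counter(e for e in events if e[1] in ROUTINE_PORTS)
    routine, interesting = [], []
    for event in events:
        if counts[event] >= ROUTINE_THRESHOLD:
            routine.append(event)
        else:
            interesting.append(event)
    return routine, interesting
```

Anything that lands in the "interesting" list is what would be worth passing deeper into the tiers; the routine list can simply be counted and logged.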

Filtering, Tiering and Multiple Tools

Fortunately, there are many great tools out there for honeypots and analysis:

honeyd: http://www.honeyd.org/
nepenthes: http://nepenthes.mwcollect.org/
honeyc: https://www.client-honeynet.org/honeyc.html
Capture-HPC: https://www.client-honeynet.org/creleases.html
Honeywall: http://www.honeynet.org/tools/cdrom/

Combine these with your standard monitoring and access control tools such as snort, tshark and iptables, and you come away with many ways to watch, contain and direct how things happen.

I plan to use VMware heavily for virtualizing both the high interaction honeypots and some of the low interaction honeypots. Tiering from filters to low interaction honeypots, and then to high interaction honeypots, reduces the load and matches known misuse early on, with the least amount of resources squandered.

The Plan

So, here's what I intend to do as a starting point.

An initial box will run VMware, iptables, and monitoring software (such as tshark/argus/snort or possibly sguil). This box will filter incoming traffic and pass the pre-defined traffic that remains to a set of IP addresses exposed to an instance of honeyd.

This honeyd machine controls a set number of public IP addresses that I intend to bind to various templates at various times, floating between Linux, Windows and dynamic emulations based on honeyd's passive fingerprinting capabilities (provided by p0f signatures) and other criteria (blacklisted source IPs, for instance).

At this point, honeyd will offer some custom service emulation scripts, watch for probes and pokes on the various TCP and UDP ports defined, and then, with the help of some perl glue, make a determination about what to do with them. The "what to do with it" part will be either to drop the traffic on the floor, pass it to nepenthes, or send it to a high interaction honeypot (a Windows one if it is most likely a Windows exploit, a Linux one if it is most likely a Linux exploit, etc.).
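The decision step could look something like the sketch below. It is written in Python rather than perl, and every name in it (the Probe fields, the port sets, the blacklist entry, the destination labels) is a hypothetical stand-in; guessing the target OS from the destination port alone is a deliberate simplification.

```python
from dataclasses import dataclass

# Illustrative assumptions only, not real configuration.
BLACKLIST = {"203.0.113.7"}          # source IPs we never want to engage
WINDOWS_PORTS = {135, 139, 445}      # ports that suggest a Windows exploit
LINUX_PORTS = {22, 111}              # ports that suggest a Linux exploit


@dataclass
class Probe:
    src_ip: str
    dst_port: int
    looks_known: bool  # True if the payload resembles already-known malware


def route(probe):
    """Return a label for where this probe should go next."""
    if probe.src_ip in BLACKLIST:
        return "drop"
    if probe.looks_known:
        # Known worms and malware go to the low interaction tier.
        return "nepenthes"
    if probe.dst_port in WINDOWS_PORTS:
        return "high-interaction-windows"
    if probe.dst_port in LINUX_PORTS:
        return "high-interaction-linux"
    # Nothing matched, so do not spend resources on it.
    return "drop"
```

For example, `route(Probe("198.51.100.2", 445, False))` picks the Windows high interaction honeypot, while the same probe with `looks_known=True` is handed to nepenthes.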

The virtual machines running nepenthes and the high interaction honeypots will be on a NAT'ed network, funneled through the public IP space offered up by the front end of this setup. Nepenthes will provide a second line of defense, noticing worms and malware that are already known. If nepenthes does not recognize the traffic, or if the initial honeyd setup determines that it should go elsewhere, the traffic will be sent to an appropriate virtual machine running the OS most likely to match the intended target, or potentially to an emulated service.

In addition, custom perl scripts will handle SMTP service emulation, to capture and analyze spam along with the links and attachments it contains. Tarpitting, and using client honeypot tools to visit the linked websites, are on the agenda as well.
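As a rough idea of the spam analysis step, the sketch below pulls links and attachment names out of a captured message using Python's standard email module. The function name and the deliberately simple URL pattern are assumptions for illustration; real spam will need more forgiving parsing.

```python
import re
from email import message_from_string

# Deliberately simple pattern; it will miss obfuscated links.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_links_and_attachments(raw_message):
    """Return (urls, attachment_filenames) found in a raw message."""
    msg = message_from_string(raw_message)
    urls, attachments = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no content of their own
        filename = part.get_filename()
        if filename:
            attachments.append(filename)
            continue
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                text = payload.decode(charset, errors="replace")
            except LookupError:
                # Spam often declares bogus charsets; fall back to UTF-8.
                text = payload.decode("utf-8", errors="replace")
            urls.extend(URL_RE.findall(text))
    return urls, attachments
```

The resulting URL list is what could be fed to a client honeypot, and the attachment names tell you which parts are worth saving for later analysis.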

Things to Watch For

So many things come to mind as needing that extra care and attention, or that will just be plain fun to mess around with. Here's my list:

* Routing the traffic. Both the honeyd aspect, and the perl glue that will be used to make other determinations, etc.

* Automation. How to maintain my sanity while still providing a valuable learning environment.

* Control. As with any honeypot setup, maintaining control of the various aspects as things are exploited and probed.

* Keeping the various parts of this setup from being fingerprinted and identified as "not real".

* Building a database of everything learned, and providing a usable interface to this data.
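For the database item, a starting point might be as small as the SQLite sketch below. The table layout, column names and helper functions are hypothetical and would surely grow once real data shows up.

```python
import sqlite3

# Hypothetical schema: one row per observed event.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id       INTEGER PRIMARY KEY,
    seen_at  TEXT    NOT NULL,
    src_ip   TEXT    NOT NULL,
    dst_port INTEGER NOT NULL,
    tier     TEXT    NOT NULL,
    note     TEXT
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_event(conn, seen_at, src_ip, dst_port, tier, note=None):
    """Store one event; the 'with' block commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO events (seen_at, src_ip, dst_port, tier, note) "
            "VALUES (?, ?, ?, ?, ?)",
            (seen_at, src_ip, dst_port, tier, note),
        )


def top_sources(conn, limit=10):
    """Return (src_ip, hits) pairs, busiest sources first."""
    return conn.execute(
        "SELECT src_ip, COUNT(*) AS hits FROM events "
        "GROUP BY src_ip ORDER BY hits DESC, src_ip LIMIT ?",
        (limit,),
    ).fetchall()
```

A query helper like `top_sources` is the kind of thing a usable interface could be built on top of later.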

Final Thoughts

I intend for this post to be a starting point for recording what I learn works or doesn't work, interesting tidbits found, etc., both to document things I'd like to keep tabs on and to share with other interested parties. As always, comments and thoughts are welcome.

Many of the ideas and much of the technical know-how came from the recent, and excellent, book on Virtual Honeypots. I highly recommend you check it out.
