Friday, November 9, 2007

Syslog-NG Performance Tuning


I figured I would post some general tuning options that really improve performance on busy central syslog-ng servers. The following settings apply to 2.x, although most will work in some earlier versions as well. They work well for me in a tiered environment where client servers send over both TCP and UDP, from standard syslog and syslog-ng, to one or more central servers running syslog-ng 2.0.5. I use them both in high-volume situations (25+ GB/day) and in environments with plenty of hosts (900+).

On to the configuration choices for your central log servers...

Name Resolution

You will most likely want to resolve the IP addresses of client hosts to their hostnames, so name lookups via use_dns(yes) are probably turned on. If so, make sure you are using the DNS cache properly. With dns_cache(yes), adding dns_cache_size(1500) and dns_cache_expire(86400) allows a cache of 1500 entries and expires entries after 24 hours. Keep in mind that the cache has to hold enough entries, and that the expiry should account for how often your hosts change IP addresses, such as in dynamic DNS environments. These numbers are just an example; tailor them to your situation.

If you would rather use the hosts file instead, look into use_dns(persist_only) and dns_cache_hosts(/etc/hosts).
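To make the two cache knobs a little more concrete, here is a toy Python model of a size-bounded cache with expiry. It is only a sketch of the idea, not syslog-ng's code: the class name, the "evict the oldest entry" policy and the injectable clock are all my own assumptions for illustration.

```python
import time
from collections import OrderedDict


class ExpiringCache:
    """Toy size-bounded cache with per-entry expiry (NOT syslog-ng's code)."""

    def __init__(self, max_entries, expire_seconds, clock=time.monotonic):
        self.max_entries = max_entries        # plays the role of the cache size
        self.expire_seconds = expire_seconds  # plays the role of the expiry
        self.clock = clock
        self._entries = OrderedDict()         # ip -> (hostname, stored_at)

    def get(self, ip):
        entry = self._entries.get(ip)
        if entry is None:
            return None                       # miss: a real lookup is needed
        hostname, stored_at = entry
        if self.clock() - stored_at >= self.expire_seconds:
            del self._entries[ip]             # stale: force a fresh lookup
            return None
        return hostname

    def put(self, ip, hostname):
        if ip in self._entries:
            del self._entries[ip]
        elif len(self._entries) >= self.max_entries:
            # Assumed policy: evict the oldest inserted entry when full.
            self._entries.popitem(last=False)
        self._entries[ip] = (hostname, self.clock())
```

With more active hosts than entries, lookups keep evicting each other and you are back to hitting DNS constantly; with an expiry that is too long, a host that changed address keeps its old name until the entry ages out.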

Message Size

This is not so much a performance tuning option, but it needs addressing anyhow. If you are only collecting system logs, the default setting of 8192 bytes is probably enough - but if you collect application logs, you will need to plan accordingly with your log_msg_size(#) option. If you have messages going beyond this length, you will see indications in your logs that they are being split.

Output Buffers

Here is an extremely important setting - log_fifo_size(#). The log_fifo_size(#) setting sizes the output buffer, which every destination has. The output buffer must be large enough to store the incoming messages of every source. This setting can be set globally or per destination.

For log_fifo_size(#), the number is the count of lines/entries/messages the buffer can hold. The global default is extremely conservative, and if you handle any real amount of traffic you will end up seeing dropped messages at some point. The statistics that include dropped messages are printed to syslog every 10 minutes unless you have altered this. The statistics line tells you which destination is dropping messages and how many. From that you can decide whether to increase the size globally or per destination, and get an idea of how much larger you need to make it.
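As a rough sketch of pulling the drop counters out of that statistics line, the snippet below assumes the line carries dropped='destination=count' pairs, which is how I remember 2.x formatting it. The sample string is made up for illustration, so check one of your own statistics entries first and adjust the pattern if yours differs.

```python
import re

# Assumed shape of one counter: dropped='<destination>=<count>'
DROPPED = re.compile(r"dropped='([^']*)=(\d+)'")


def dropped_counts(stats_line):
    """Return {destination: count} for destinations that dropped messages."""
    return {
        name: int(count)
        for name, count in DROPPED.findall(stats_line)
        if int(count) > 0
    }


# Hypothetical sample line, not copied from a real server.
sample = (
    "Log statistics; dropped='pipe(/dev/xconsole)=0', "
    "dropped='tcp(AF_INET(10.0.0.5:514))=1234', "
    "processed='center(queued)=98765'"
)

if __name__ == "__main__":
    for destination, count in dropped_counts(sample).items():
        print(f"{destination} dropped {count} messages")
```

Any destination that shows up here is a candidate for a larger per-destination log_fifo_size(); if nearly all of them show up, raising the global value is probably the simpler fix.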

Flushing Buffers with sync

From the syslog-ng documentation: "The syslog-ng application buffers the log messages to be sent in an output queue. The sync() parameter specifies the number of messages held in this buffer."

By default, sync(#) is set to 0, which flushes each message immediately - and, depending on your logging volume, that can be fairly taxing. Increasing this number gently, say to 10 or 20, holds that many messages in the buffer before they are written to their destination.
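To illustrate why a small sync() value cuts down on work, here is a toy Python model of the batching idea. It is a sketch under my own assumptions (the class and its "flush once N lines are pending" rule are hypothetical), not a description of syslog-ng internals.

```python
class BatchingWriter:
    """Toy model of sync(n): hold lines until n are pending, then write once."""

    def __init__(self, sink, sync=0):
        self.sink = sink      # any callable that accepts a list of lines
        self.sync = sync      # 0 means "write every line immediately"
        self.pending = []
        self.writes = 0       # how many times we actually hit the sink

    def log(self, line):
        self.pending.append(line)
        # Assumed rule: flush as soon as sync lines are waiting (at least 1).
        if len(self.pending) >= max(self.sync, 1):
            self.flush()

    def flush(self):
        if self.pending:
            self.sink(self.pending)
            self.writes += 1
            self.pending = []
```

Feeding 25 lines through a writer with sync=10 results in two writes with five lines still waiting, versus 25 writes with sync=0. The flip side is that whatever is still waiting in the buffer has not reached its destination yet, which is why I would raise the number gently.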

Other Important Considerations

If you are still having trouble with dropped messages, look into using flow control within syslog-ng. Flow control allows you to finely tune the number of messages received from a source. There are other potential issues you must account for, though, such as slowing down the source application if it cannot hand off its log messages.

Users with traditional syslog clients sending their logs via UDP should have a look at this page on UDP Buffer Sizing.

Also, sync() and log_fifo_size() should be tweaked on your client servers as necessary if they are using syslog-ng, and handle heavy loads, sporadic sources, etc. Remember to use your statistics log entries to help you identify problems and load effectively.

Thursday, November 1, 2007

Learning with Honeypots

Honeypot Layout

I've recently rented some dedicated server and public IP resources for running some honeypot/honeynet/whatever setups, more or less to learn, and figured I would post my game plan here.

My basic idea is not earth-shattering or anything new. I just hope to gain new insight, see what works or doesn't work, and find ways of using honeypots for intrusion detection or as an early warning system for a piece of the overall security monitoring puzzle.

As we all know, any traffic hitting a honeypot system is suspicious, or unwarranted at best, which whittles down the amount of traffic we have to look at compared to a production host. However, if you have ever looked at the logs or traffic of a publicly accessible, non-production machine, this "whittled down" traffic can still be quite large, with everything from propagating worms to your annoying SSH brute force scans. So how do we look for the unknown nasties while not wasting time on the redundant, now passe, routine malicious scans and such? One way is by filtering and tiering our honeypot architecture.

Filtering, Tiering and Multiple Tools

Fortunately, there are many great tools out there for honeypots and analysis:

honeyd: http://www.honeyd.org/
nepenthes: http://nepenthes.mwcollect.org/
honeyc: https://www.client-honeynet.org/honeyc.html
Capture-HPC: https://www.client-honeynet.org/creleases.html
Honeywall: http://www.honeynet.org/tools/cdrom/

Combine these with your standard monitoring and access control tools such as snort, tshark and iptables, and you come away with many ways to watch, contain and direct how things happen.

I plan to use VMware heavily for the virtualization aspects of both the high interaction honeypots and some of the low interaction honeypots. Tiering from filters to low interaction honeypots, and then to high interaction honeypots, reduces the load and matches known misuse early on with the least amount of resources squandered.

The Plan

So, here's what I intend to do as a starting point.

An initial box will run VMware, iptables, and monitoring software (such as tshark/argus/snort or possibly sguil). After filtering, this box will pass pre-defined traffic to a set of IP addresses exposed to an instance of honeyd.

This honeyd machine controls a set number of public IP addresses that I intend to bind to various templates at various times - floating between Linux, Windows and dynamic emulations based on honeyd's passive fingerprinting capabilities provided by p0f signatures and other abilities (how about blacklisted source IPs for instance).

At this point, honeyd will offer some custom service emulation scripts, watch for probes and pokes on the various tcp and udp ports defined, and then, with the help of some perl glue, make a determination about what to do with the traffic. The "what to do with it" part will be either to drop it on the floor, pass it to nepenthes, or send it to a high interaction honeypot (a Windows one if it is most likely a Windows exploit, a Linux one if it is most likely a Linux exploit, etc.).

The virtual machines running nepenthes and the high interaction honeypots will be on a NAT'ed network, funneled through the public IP space offered up by the front-end of this setup. Nepenthes will provide a second line of defense, noticing worms and malware that are already known. If nepenthes does not recognize the traffic, or if the initial honeyd setup determines that it should go elsewhere, the traffic will be destined for an appropriate virtual machine running an OS most likely to match the intended target, or potentially to an emulated service.
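The routing decision described above could look roughly like the Python sketch below. The real glue will be perl and nothing here is built yet, so everything in it is an assumption for illustration: the blacklist entry, the port-to-OS guesses, the known_malware flag and the destination names are all hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical inputs; real lists would come from the honeyd side.
BLACKLIST = {"203.0.113.7"}        # sources we have already seen enough of
WINDOWS_PORTS = {135, 139, 445}    # assumed "most likely a Windows exploit"
LINUX_PORTS = {22, 111}            # assumed "most likely a Linux exploit"


@dataclass
class Probe:
    src_ip: str
    dst_port: int
    known_malware: bool = False    # e.g. it matched an already-known worm


def route(probe):
    """Decide where a probe goes: drop, nepenthes, or a high interaction VM."""
    if probe.src_ip in BLACKLIST:
        return "drop"
    if probe.known_malware:
        return "nepenthes"
    if probe.dst_port in WINDOWS_PORTS:
        return "high-interaction-windows"
    if probe.dst_port in LINUX_PORTS:
        return "high-interaction-linux"
    return "drop"


if __name__ == "__main__":
    print(route(Probe("198.51.100.20", 445)))
```

The ordering is the point of the tiering: the cheap checks run first, so only traffic that nothing earlier claimed ever reaches a full virtual machine.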

In addition, custom perl scripts will handle SMTP service emulation, to both capture and analyze spam and the links and attachments it contains. Tarpitting and using client honeypot tools to visit the linked websites are on the agenda as well.

Things to Watch For

So many things come to mind as needing that extra care and attention, or that will just be plain fun to mess around with. Here's my list:

* Routing the traffic. Both the honeyd aspect, and the perl glue that will be used to make other determinations, etc.

* Automation. How to maintain my sanity while still providing a valuable learning environment.

* Control. As with any honeypot setup, maintaining control of the various aspects as things are exploited and probed.

* Keeping the various parts of this setup from being fingerprinted and identified as "not real".

* Building a database of everything learned, and providing a usable interface to this data.

Final Thoughts

I intend for this post to be a starting point for recording what works or doesn't work, interesting tidbits found, etc., both to document things I'd like to keep tabs on and to share with other interested parties. As always, comments and thoughts are welcome.

Many of the ideas and much of the technical know-how came from the recent, and excellent, book on Virtual Honeypots. I highly recommend you check it out.

Tuesday, September 11, 2007

Capturing flow data from your Linksys at home


As a big believer in flow/session data collection in all NIDS locations, it is only right that there be an easy way to do so at home without putting a full-time IDS in place. So with a trusty Linksys router re-flashed with DD-WRT, an extra package installed on the router, and a suite of flow collection/analysis tools on your primary Linux desktop, we can easily achieve this.

On your Linksys:

  1. First things first. In this scenario we re-flashed a Linksys router with DD-WRT, following these instructions.
  2. Next, via the DD-WRT web interface, we enabled JFFS2 support and SSH, both located in subsections of the Administration tab.
  3. Moving on, update your ipkg configuration with: ipkg update. Then install fprobe via ipkg: ipkg install fprobe.
  4. Finally, add a shell script at /jffs/etc/config/fprobe.startup, change its permissions with chmod 700 fprobe.startup, and reboot your router. The file should contain the following command: fprobe -i br0 -f ip 192.168.1.100:9801

A brief discussion of the fprobe command is needed:

  • -i specifies the interface you are interested in watching flows on. I chose my internal interface.
  • -f specifies a bpf filter. In this scenario, I chose to only create flow records for IP traffic.
  • IP:Port is the remote IP address and UDP port that your flow collector listens on - later on, this will be your desktop Linux box.

On your Linux box:

  1. Install flow-tools from here. All that is needed is a standard: configure; make; make install. There is one caveat to watch out for: if you use gcc 4.x, you need a patch, which is available where you downloaded the tarball.
  2. Create a directory to store your flow data: mkdir -p /data/flows/internal
  3. If you run IPTables or some other host-based firewall, make sure to allow UDP 9801 connections from your router.
  4. Finally, both run the following command and add it somehow to your system startup (via /etc/rc.local, for example): /usr/local/netflow/bin/flow-capture 192.168.1.100/192.168.1.1/9801 -w /data/flows/internal

A brief discussion of the flow-capture command is needed:

  • You specify the local IP address you want your collector to listen on, then the address of the flow probe, followed by the UDP port to use - all in a local/remote/port format.
  • -w specifies the directory to write flow files to. By default, flow-capture starts a new file for every 15 minute chunk of time.

So now that we have some flow data being collected on your machine, what are some cool things we can do with it? Looking in the flow-tools default install directory for binaries, /usr/local/netflow/bin, we see numerous flow-* tools. We'll look at a few briefly below.

Using flow-print:

flow-print < ft-v05.2007-09-11.080001-0400

The above command will print out the records contained in that particular flow file. The columns will contain srcIP/dstIP/protocol/srcPort/dstPort/octets/packets. The octets column is the equivalent of bytes. This is your standard session/flow data.

Adding a "-f 1" flag will produce timestamps among other things. The -f flag allows for numerous types of formatting and additional columns, etc.

On a sidenote, standard *nix tools - such as awk and grep can be very useful in pulling data from plain old dumps of the flow records.

Using flow-cat and flow-stat:

Much like Argus, with flow-tools you chain several of the utilities together to get the output you want.

flow-cat ft-v05.2007-09-11.0* | flow-stat -f9 -S2

In the above set of commands, flow-cat concatenates all the files whose names match that pattern. The resulting output is passed to flow-stat for crunching and displaying. The flow-stat command generates reports, taking the report format via the -f flag and the sort field via -S (descending) or -s (ascending). Our example specified a report keyed on the source IP address (-f9), sorted on the octets (i.e. bytes) field (-S2); have a look at the man page for flow-stat to see all the various options. Thus, we now have detailed output from all those files, showing the *noisiest* source hosts listed by most bytes transferred.
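If you would rather crunch plain flow-print dumps yourself, in the same spirit as the awk and grep suggestion earlier, here is a minimal Python sketch of the "noisiest sources" report. It assumes the seven whitespace-separated columns described for flow-print above (srcIP dstIP protocol srcPort dstPort octets packets); the function name is my own, and other -f formats would need different column positions.

```python
from collections import Counter


def top_sources(flow_print_text, limit=5):
    """Sum octets per source IP from default flow-print output."""
    totals = Counter()
    for line in flow_print_text.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not a 7-column record.
        if len(fields) != 7 or not fields[5].isdigit():
            continue
        src_ip, octets = fields[0], int(fields[5])
        totals[src_ip] += octets
    return totals.most_common(limit)


if __name__ == "__main__":
    import sys

    # Usage idea: flow-print < some-flow-file | python top_sources.py
    for ip, octets in top_sources(sys.stdin.read()):
        print(ip, octets)
```

This is nowhere near as fast as flow-stat on large captures, but it is handy when you want to bolt on your own filtering before the totals are computed.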

Utilizing your desktop and a router, things you probably already have at home, you too can watch/collect/analyze flow data to keep a watchful eye on your network - without deploying a dedicated NIDS or NSM sensor.

Monday, September 3, 2007

Writing Prelude LML rules

In this post, we will cover the basics of adding your own rules to Prelude-LML, which is Prelude's own log monitoring and analysis engine. Written in highly optimized C, Prelude-LML comes with numerous built-in rules for everything from SSH authentication to Netscreen firewalls.

To start we will navigate to the default LML ruleset directory, which is located in /usr/local/etc/prelude-lml/ruleset - unless you specified otherwise.

Our Example

For example purposes, we'll use the made-up syslog entry below to follow along with. It shows the date, a host named some_hostname, and a service called mylogger_service that printed some message.

Aug 22 05:22:05 some_hostname mylogger_service: We have an important message here.

Setting Up

The first thing you want to do is have a look at pcre.rules. The pcre.rules file provides a way to match on criteria that every rule in a given ruleset shares. Many of these criteria are based on services, such as ssh, which lets LML limit which log messages are processed against which rules.

Since our service, mylogger_service, is new and no other specific rulesets apply to it, we'll add a line to pcre.rules to only apply our new rule (which we will add to its own file later, called local.rules). Adding the following to pcre.rules will do just this:

regex=mylogger_service; include = local.rules;

What this does, is only process rules in the local.rules file, if the log entry contains "mylogger_service" in it.


Adding a Rule to its own Ruleset

So now we have prepped our new rule file (local.rules) to receive any log entries that contain the string mylogger_service. We next need to add rules to our local.rules file for further processing and alerting on any matches. Here is what we will add to local.rules for our example:

# Detect important messages from the mylogger_service.
# LOG:Aug 22 05:22:05 some_hostname mylogger_service: We have an important message here.
regex=important message; \
classification.text=Important Message Detected.; \
id=32001; \
revision=1; \
assessment.impact.type=other; \
assessment.impact.severity=low; \
assessment.impact.description=An important message was detected with the mylogger_service; \
last;


Stepping through this example, we see the following:

- A comment line that describes what this rule is about.
- A LOG line that shows an actual example syslog entry for what we are looking for.
- regex, which is what we are matching on. Any valid regex can be used here, including character classes (\w,\d) or wildcards (.,*,+).
- classification.text, which is the main alert text for this rule
- id, which uniquely identifies this particular rule
- revision, bump this up by one as you make production edits
- impact type, which can be things such as admin, user, other, etc.
- impact severity, such as low, medium, high
- impact description, a longer description of what most likely is referenced in classification.text
- last, which basically tells LML to stop further processing if this rule matches

Many more IDMEF fields may be used, such as references or process names.

When mylogger_service is seen in a syslog entry that LML processes, it will process this entry against all the rules in the local.rules file (which is how we set this up in pcre.rules). Furthermore, if the entry matches our regex of "important message", we will get an alert with severity of low, a message text of "Important Message Detected", and the various other settings we have set.
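The two-stage matching can be mimicked in a few lines of Python, which I find handy for sanity-checking a regex before touching the sensor. This is only a sketch of the flow described above, not how LML is implemented: the data structures and the process() function are my own invention, and only the two regexes, the id, the alert text and the severity come from the example.

```python
import re

# Stage 1: pcre.rules decides which ruleset a log entry is handed to.
PCRE_RULES = [(re.compile(r"mylogger_service"), "local.rules")]

# Stage 2: the rules inside that ruleset.
RULESETS = {
    "local.rules": [
        {
            "regex": re.compile(r"important message"),
            "id": 32001,
            "classification": "Important Message Detected.",
            "severity": "low",
            "last": True,   # stop processing once this rule matches
        },
    ],
}


def process(log_line):
    """Return (id, classification, severity) tuples for every rule that fires."""
    alerts = []
    for gate, ruleset in PCRE_RULES:
        if not gate.search(log_line):
            continue                      # entry never reaches this ruleset
        for rule in RULESETS[ruleset]:
            if rule["regex"].search(log_line):
                alerts.append(
                    (rule["id"], rule["classification"], rule["severity"])
                )
                if rule["last"]:
                    return alerts
    return alerts


if __name__ == "__main__":
    entry = ("Aug 22 05:22:05 some_hostname mylogger_service: "
             "We have an important message here.")
    print(process(entry))
```

Note that an entry from some other service never reaches local.rules at all, even if it contains the words "important message", which is exactly the filtering pcre.rules is there to provide.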

Conclusion

This example showed a simple way of adding rules to your LML engines. You may have noticed when looking in pcre.rules, at the top is a "best practices" section on creating/adding LML rules. For much more extensive information, look here.

Friday, August 17, 2007

Threat Assessments with Argus


A useful practice for both incident response and general discovery is conducting threat assessments using session/flow data. My tool of choice for this is Argus, but any session/flow tool such as NetFlow or SANCP will do. For further information beyond this post, see the book Extrusion Detection, which covers traffic threat assessments with both Argus and SANCP in extensive detail. I'll assume you are already familiar with collecting Argus data; if not, have a look at the Argus labels on this blog for articles pertaining to it.

What I'll describe here for conducting a threat assessment, is what I call a blind threat assessment. What I mean by "blind", is that I am not looking for particular traffic like you would when responding to an incident - where you know a victim address, and possibly a source address and protocols. In the past during any downtime that I had, I would pick an Argus data file (which I generally rotate either daily or every X number of hours, depending on how busy the sensor collecting the data is), and pick it apart.

Let's move on to an example, reading in your Argus file of choice.

ra -nn -r /data/argus_data.arg

This pulls in and displays all the data in the Argus file, including src/dst IPs & ports, data transferred, etc. But let's apply some BPFs to it - let's say your mail server is at address 192.168.1.25, and for this assessment you don't care about traffic to/from it.

ra -nn -r /data/argus_data.arg - not host 192.168.1.25

So now on the screen scrolls by gobs of data that does not contain anything related to your mail server at that address. Next, we may decide that any web traffic is of no interest to us today - so we append more BPFs to our current one and continue to whittle down the amount of traffic displayed by the Argus client.

ra -nn -r /data/argus_data.arg - not host 192.168.1.25 and not port 80 and not port 443

Next, you realize you are seeing a bunch of ARP traffic that is of little use to you currently - so let's get rid of it too.

ra -nn -r /data/argus_data.arg - not host 192.168.1.25 and not port 80 and not port 443 and not arp
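Since the filter only ever grows by another "and not ..." term, a tiny Python helper can build the command line for you as the exclusion list gets longer. This is just a convenience sketch: the function name is hypothetical, and it produces exactly the ra invocations shown above without adding any options of its own.

```python
def ra_command(argus_file, exclusions):
    """Build the ra argument list, excluding each BPF primitive in turn."""
    cmd = ["ra", "-nn", "-r", argus_file]
    if exclusions:
        bpf = " and ".join("not " + term for term in exclusions)
        # Split into words, the same way the shell would pass them to ra.
        cmd += ["-"] + bpf.split()
    return cmd


if __name__ == "__main__":
    ignore = ["host 192.168.1.25", "port 80", "port 443", "arp"]
    print(" ".join(ra_command("/data/argus_data.arg", ignore)))
```

The returned list can be handed straight to subprocess.run() if you want to script the review, and keeping the exclusions in a list makes it easy to take one back out when you have whittled away too much.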

The basic premise of this blind assessment is to narrow down your view of the data until you get to various things you may never notice, such as a user running a new peer-to-peer client or a rogue MP3 server on your corporate network. You can continue to limit with BPFs, adding them on to the end of your list, or start utilizing rasort to find larger bandwidth sessions (maybe you like the noisy stuff). The whole principle of this blind threat assessment, is that there is no wrong way of doing it - stumbling randomly across some weird connection then applying a human's logic to it, is something your traditional signature-based NIDS can't do.

You won't always be able to catch everything this way, as depending on how much traffic you look at and what you decide to globally eliminate, huge chunks of traffic will never be reviewed. Nonetheless, I feel that the occasional, manual review, adds value as you usually turn up something interesting that you did not know about. So take fifteen minutes of your day or week, and notice something new.

Tuesday, July 31, 2007

Managing Snort Rules

I know many people like to use Oinkmaster to pull in rule updates, manage rules, etc. - and I am no different. But one thing I tend not to do is maintain numerous Oinkmaster configurations for various hosts. Let me elaborate on a rule management scheme for Snort signatures that I find easier to manage.

Let's say we have five Snort sensors deployed, watching various different types of networks and hosts - and thus requiring vastly different rule tweaks and tuning. Each of these sensors is going to require both some of the same and different rulesets and individual rules to be loaded. You could let Oinkmaster handle this or you could use thresholding within Snort itself.

Example Architecture:

  • Download new Snort signatures daily from snort.org and BleedingThreats via Oinkmaster, disabling rules that you globally do NOT use (with disablesid lines).
  • Run these new rules through a loaded Snort configuration (basically a config with most everything turned on), in testing mode ( -T ). Potentially quarantine or notify a rule maintainer that new rules are available.
  • If the new rules pass the Snort testing phase, make these rules centrally available - possibly via a revision control repository, scp, or packaged (rpm, deb, etc.).
  • Have your sensors check for new signatures at various intervals (using revision, timestamp, or version to determine whether a new ruleset is available). Let Snort restart with the new rules.

Here's the per-sensor tweak:

  • Utilize thresholding in Snort to suppress or threshold rules that affect less-than-all of your sensors. Have a look at threshold.conf that comes along with the Snort tarball.

So essentially, this approach relies on one server to centrally update and maintain new Snort signatures. A rule that can be removed from all sensor configurations is removed at the Oinkmaster level, while a rule that affects only some of the sensors is handled at the sensor level in threshold.conf.
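Here is a small Python sketch of that split between the two levels. The sensor names and SIDs are made up, and the helper itself is hypothetical; the only real syntax in it is the disablesid line for Oinkmaster and the suppress line for threshold.conf, and I am assuming gen_id 1 (ordinary text rules) throughout.

```python
def place_tweaks(sensors, unwanted):
    """Split rule tweaks between Oinkmaster and per-sensor threshold.conf.

    unwanted maps a SID to the set of sensors that should not alert on it.
    """
    all_sensors = set(sensors)
    disablesid = []                             # lines for the Oinkmaster config
    suppress = {name: [] for name in sensors}   # lines per sensor threshold.conf
    for sid in sorted(unwanted):
        affected = set(unwanted[sid]) & all_sensors
        if affected == all_sensors:
            # Nobody wants it: never load the rule anywhere.
            disablesid.append("disablesid %d" % sid)
        else:
            # Only some sensors: suppress locally (assumes gen_id 1).
            for name in sorted(affected):
                suppress[name].append("suppress gen_id 1, sig_id %d" % sid)
    return disablesid, suppress


if __name__ == "__main__":
    sensors = ["dmz", "lan"]                       # hypothetical sensor names
    unwanted = {2001: {"dmz", "lan"}, 2002: {"lan"}}
    print(place_tweaks(sensors, unwanted))
```

Keeping the decision in one place like this also makes it obvious when a per-sensor suppression has quietly spread to every sensor and ought to be promoted to a global disablesid.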

Here is the caveat, and where many people take exception to this process: it isn't as efficient on the Snort detection engine itself. When rules are removed at the Oinkmaster level, they are never loaded into the Snort configuration. However, suppression via Snort thresholding happens after detection (i.e. Snort has already detected a hit on the rule, and only then decides it should be suppressed).

So the moral of the story is, if you have the spare cycles in beefy sensors that are not bogged down with the traffic they are watching - then you can benefit from the administration ease and having rule modifications stay close to the sensor for view, etc. If your sensors are already taxed, you should really use Oinkmaster all the way through.

Monday, July 23, 2007

High Availability Prelude Central Services

I recently posted to the Prelude wiki, a sample configuration for providing high availability for your central Prelude services.

Basically, what the configuration provides is a setup across two servers to host these services: Manager, Correlator, Apache, MySQL, and Prewikka. It is purely a fault tolerant scheme, as opposed to a performance booster. You could, however, spread the load to increase performance - for example, by serving the web interface from one host while the other handles database interaction for incoming events, or by offloading things such as reporting and backups to the secondary.

MySQL v5 is required to avoid potential auto increment collisions when doing multi-master replication. Other than that, there really should be minimal changes needed for using various versions of the other software pieces. Either host in the pair is capable of taking over as the primary. The only caveat is that heartbeat covers machine failure, not service failure, so you will still need your application-level monitoring (e.g. Nagios or another snmp-based solution) in place to be notified of service issues.
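The auto increment point is easier to see with numbers. MySQL 5 has the auto_increment_increment and auto_increment_offset server variables for this, and the Python sketch below simply models the arithmetic. I am assuming the usual two-master layout (increment 2, offsets 1 and 2); the configuration on the wiki may use different values, and real MySQL also takes the current maximum of the column into account, which this toy ignores.

```python
def generated_ids(offset, increment, count):
    """IDs a master hands out: offset, offset + increment, ..."""
    return [offset + i * increment for i in range(count)]


# Assumed settings: both masters step by 2, starting at 1 and 2.
master_a = generated_ids(offset=1, increment=2, count=5)
master_b = generated_ids(offset=2, increment=2, count=5)

if __name__ == "__main__":
    print("master A:", master_a)
    print("master B:", master_b)
    print("collisions:", sorted(set(master_a) & set(master_b)))
```

One master only ever produces odd IDs and the other only even ones, so rows inserted on both sides at the same moment can replicate to each other without fighting over the same key.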