Friday, November 9, 2007

Syslog-NG Performance Tuning


I figured I would post some general tuning options that really improve performance on busy central syslog-ng servers. The following settings are used in 2.x, although most will work in some earlier versions as well. These settings work well for me in a tiered environment where client servers send over both TCP and UDP, from standard syslog and syslog-ng, to one or more central servers running syslog-ng 2.0.5. I use them both in high-volume situations (25+ GB / day) and in environments with plenty of hosts (900+).

On to the configuration choices for your central log servers...

Name Resolution

You will most likely want to resolve the IP addresses of client hosts to their hostnames, so name lookups via use_dns(yes) are probably turned on. If so, make sure the DNS cache is doing its job. Setting dns_cache(yes), dns_cache_size(1500) and dns_cache_expire(86400) gives you a cache of 1500 entries whose entries expire after 24 hours. Make the cache large enough to hold all of your hosts, and choose an expiration that accounts for how often your hosts change IP addresses, such as in dynamic DNS environments. These numbers are just an example; tailor them to your situation.

If you would rather use the hosts file instead, look into use_dns(persist_only) and dns_cache_hosts(/etc/hosts).
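
As a rough sketch, the name resolution settings above could sit in the global options block like this. The values are only the example numbers from this post, and the option names reflect my reading of the 2.x reference (dns_cache() as a yes/no switch, dns_cache_size() for the entry count), so check them against the documentation for your version.

```
options {
    use_dns(yes);             # resolve client IPs to hostnames
    dns_cache(yes);           # keep resolved names in memory
    dns_cache_size(1500);     # example: room for 1500 hosts
    dns_cache_expire(86400);  # example: entries expire after 24 hours

    # Alternative: resolve only from the hosts file, with no DNS queries.
    # use_dns(persist_only);
    # dns_cache_hosts("/etc/hosts");
};
```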

Message Size

This is not so much a performance tuning option as one that needs addressing anyhow. If you are only collecting system logs, the default setting of 8192 bytes is probably enough, but if you collect application logs, you will need to plan accordingly with your log_msg_size(#) option. If messages go beyond this length, you will see indications in your logs that they are being split because they are too long.
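
A minimal sketch of raising the limit globally. The 65536 figure is an arbitrary illustration, not a recommendation; size it to the longest application message you expect.

```
options {
    # Maximum accepted message length in bytes (illustrative value).
    log_msg_size(65536);
};
```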

Output Buffers

Here is an extremely important setting - log_fifo_size(#). The log_fifo_size(#) setting sizes the output buffer, which every destination has. The output buffer must be large enough to store the incoming messages of every source. This setting can be set globally or per destination.

The number given to log_fifo_size(#) is the number of messages (lines/entries) the buffer can hold. The global default is extremely conservative, and if you handle any real amount of traffic, you will end up seeing dropped messages at some point. The statistics that include dropped messages are printed to syslog every 10 minutes unless you have altered this. The statistics line tells you which destination is dropping messages and how many. From that you can decide whether to increase the setting globally or per destination, and get an idea of how much larger it needs to be.
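
Here is a hedged sketch of setting the buffer both globally and for a single busy destination. The destination name d_central, the file path, and all of the numbers are hypothetical; stats(600) is how I understand the 10 minute statistics interval to be expressed in 2.x.

```
options {
    log_fifo_size(10000);  # global output buffer, in messages (example value)
    stats(600);            # write the statistics line every 600 seconds
};

# Hypothetical destination that takes most of the traffic,
# given a larger buffer than the global setting.
destination d_central {
    file("/var/log/central/messages" log_fifo_size(50000));
};
```

Start from the drop counts in the statistics line and raise only the destinations that are actually dropping, rather than inflating the global value for everything.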

Flushing Buffers with sync

From the syslog-ng documentation: "The syslog-ng application buffers the log messages to be sent in an output queue. The sync() parameter specifies the number of messages held in this buffer."

By default, sync(#) is set to 0, which flushes messages immediately. Depending on your logging volume, that can be fairly taxing. Increasing this number gently, say to 10 or 20, makes syslog-ng hold that number of messages in its buffer before writing them to their destination.
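
For illustration, sync() can be raised globally and then overridden on one destination. The name d_app and the path are made up for the example. Note that later syslog-ng releases renamed this option to flush_lines(), so the sketch applies to 2.0.x as described in this post.

```
options {
    sync(10);  # buffer 10 messages before writing (example value)
};

# Hypothetical high-volume destination with a larger batch size.
destination d_app {
    file("/var/log/central/app.log" sync(20));
};
```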

Other Important Considerations

If you are still having trouble with dropped messages, look into using flow control within syslog-ng. Flow control lets you finely tune the number of messages read from a source. There are other potential issues you must account for, however, such as slowing down the source application if it cannot hand off its log messages.
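
A rough sketch of what a flow-controlled log path might look like, assuming the flags(flow-control) log flag and the log_iw_size() and log_fetch_limit() source options as I recall them from the 2.x reference. The source and destination names, the path, and the numbers are all hypothetical. Flow control can only hold back sources that are able to wait, such as TCP connections; a UDP sender will keep sending regardless.

```
source s_net {
    # log_iw_size: messages allowed in flight from this source
    # log_fetch_limit: messages read from the source per poll
    tcp(ip(0.0.0.0) port(514) log_iw_size(1000) log_fetch_limit(10));
};

destination d_central {
    # Keep the output buffer larger than the source window above.
    file("/var/log/central/messages" log_fifo_size(10000));
};

log {
    source(s_net);
    destination(d_central);
    flags(flow-control);  # stop reading the source when the buffer fills
};
```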

Users with traditional syslog clients sending their logs via UDP should have a look at this page on UDP Buffer Sizing.

Also, sync() and log_fifo_size() should be tweaked on your client servers as necessary if they are using syslog-ng, and handle heavy loads, sporadic sources, etc. Remember to use your statistics log entries to help you identify problems and load effectively.
