Re: syslogd hangs the system

From: guy keren <choo_at_actcom.co.il>
Date: Tue, 16 Nov 2010 04:53:14 +0200

as a workaround, you may:

1. disable name lookups completely using the '-x' flag of syslogd.

or

2. use a local caching name server (for example: nscd).

--guy

Yinglin Sun wrote:
> Hi folks,
>
> I've recently been struggling with a problem where syslogd hangs the
> system. syslogd is running on our system, and we configured a couple of
> remote log servers in /etc/syslogd.conf. We found that syslogd gets stuck
> in gethostbyname when our DNS servers are unreachable, which blocks all
> processes writing logs to syslogd.
>
> After digging a little, I found that for each remote log host,
> gethostbyname takes 20 seconds to time out when one DNS server is
> unreachable, and 40 seconds when two DNS servers are unreachable.
> Since many lines in /etc/syslogd.conf do remote logging, syslogd
> spends a lot of time in gethostbyname and hangs the system.
>
> From searching the Internet, this problem looks very common; many
> people have run into it. However, I cannot find a solution. Is there
> already a fix that addresses this problem?
>
> Looking at the 1.5 code, I found two problems.
> 1. f->f_time is not updated in the case F_FORW_UNKN at line 1820.
> This makes syslogd call gethostbyname 10 times in a row if log
> messages arrive at a high rate. Say 3 DNS servers are unreachable and
> 5 lines in syslogd.conf use a remote server. Then syslogd spends
> 3 * 20 * 5 * 10 = 3000 seconds (50 minutes) in gethostbyname. It gets
> even worse with more remote servers and more DNS servers.
>
> If f->f_time is updated every time we hit case F_FORW_UNKN, the
> lookups are spread out, one every INET_SUSPEND_TIME (3 minutes).
> Although the system still hangs for a while each time, that is much
> better than hanging for 50 minutes straight.
>
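
The estimate above can be checked with a few lines of Python. This is
only a sketch of the arithmetic: the 20-second timeout, the 10 retries,
and the 3 DNS servers / 5 forwarding lines come from the message, while
the function names are made up for illustration.

```python
# Worst-case stall estimate, using the figures quoted in the message.
DNS_TIMEOUT_S = 20  # seconds lost per unreachable DNS server (as reported)


def burst_stall_seconds(dns_servers, forward_lines, retries=10):
    """Total blocking time when all retries happen back to back."""
    return dns_servers * DNS_TIMEOUT_S * forward_lines * retries


def per_round_stall_seconds(dns_servers, forward_lines):
    """Blocking time for one round: one lookup per forwarding line."""
    return dns_servers * DNS_TIMEOUT_S * forward_lines


total = burst_stall_seconds(dns_servers=3, forward_lines=5)
per_round = per_round_stall_seconds(dns_servers=3, forward_lines=5)

print(total, total / 60)          # 3000 50.0
print(per_round, per_round / 60)  # 300 5.0
```

Under these assumptions, spacing the retries does not reduce the
3000-second total; it splits it into ten stalls of 300 seconds each,
separated by the suspend interval.
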
> 2. The same remote host is resolved every time.
> When the same remote log servers appear in multiple lines of
> syslogd.conf, syslogd resolves the same servers again and again, since
> it treats every line separately. If we resolve each server only once
> per period such as INET_SUSPEND_TIME and reuse the result in
> subsequent attempts, that will save a lot of time in gethostbyname.
>
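
The caching idea in point 2 could look roughly like the sketch below. It
is written in Python for brevity and is not syslogd code: the class name
LookupCache and its parameters are hypothetical, and only the 3-minute
period is taken from the message. Failed lookups are cached as well,
since the failing case is the one that blocks.

```python
import socket
import time

INET_SUSPEND_TIME_S = 180  # 3 minutes, as stated in the message


class LookupCache:
    """Remember each lookup result, success or failure, for a fixed period."""

    def __init__(self, resolver=socket.gethostbyname,
                 ttl=INET_SUSPEND_TIME_S, clock=time.monotonic):
        self._resolver = resolver  # callable: hostname -> address string
        self._ttl = ttl            # seconds a cached result stays valid
        self._clock = clock        # injectable so the period can be tested
        self._entries = {}         # hostname -> (timestamp, address or None)

    def resolve(self, hostname):
        """Return the address, or None if the host could not be resolved."""
        entry = self._entries.get(hostname)
        if entry is not None and self._clock() - entry[0] < self._ttl:
            return entry[1]  # still fresh: no resolver call, no blocking
        try:
            address = self._resolver(hostname)
        except OSError:  # socket.gaierror and socket.herror are subclasses
            address = None
        # Stamp after the lookup, so a slow failure still starts a full
        # quiet period before the next attempt.
        self._entries[hostname] = (self._clock(), address)
        return address
```

In the attached syslogd.conf, ten forwarding lines name only two hosts,
so a shared cache of this kind would do two lookups per period instead
of ten.
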
> I don't know whether there is already a fix for this critical problem.
> Any information would be helpful.
>
> I attach our syslogd.conf.
>
> Thanks!
>
> Yinglin
>
> ----------------------------------------------------------------------
> *.notice;auth.info,mail.none,news.none,authpriv.none,cron.none,kern.none,local1.none,local3.none,local4.none /ddr/var/log/messages
> *.notice;auth.info,mail.none,news.none,authpriv.none,cron.none,kern.none,local1.none,local3.none,local4.none @abcdefgsldkjf.com
> *.notice;auth.info,mail.none,news.none,authpriv.none,cron.none,kern.none,local1.none,local3.none,local4.none @kdsljfdjff.com
>
> *.notice;auth.info,mail.none,news.none,authpriv.none,cron.none,kern.none,local1.none,local3.none,local4.none /ddr/var/log/debug/messages.support
>
> *.notice;auth.info,mail.none,news.none,authpriv.none,cron.none,kern.none,local1.none,local3.none,local4.none /ddr/var/log/debug/messages.engineering
>
> local1.notice /ddr/var/log/messages
> local1.notice @abcdefgsldkjf.com
> local1.notice @kdsljfdjff.com
>
> local1.notice;local3.notice /ddr/var/log/debug/messages.support
>
> local1.notice;local3.notice;local4.* /ddr/var/log/debug/messages.engineering
>
> authpriv.* /ddr/var/log/debug/secure.log
>
> mail.* /var/log/maillog
>
> cron.* /var/log/cron
>
> *.alert;local3.none;local4.none *
> *.alert;local3.none;local4.none @abcdefgsldkjf.com
> *.alert;local3.none;local4.none @kdsljfdjff.com
>
> uucp,news.crit /var/log/spooler
>
> *.alert;local3.none;local4.none |/ddr/dev/ems_pipe
>
> kern.alert |/ddr/dev/kmsg_pipe
>
> local2.notice /dev/console
>
> kern.* /ddr/var/log/debug/platform/kern.info
> kern.* @abcdefgsldkjf.com
> kern.* @kdsljfdjff.com
>
> kern.error /ddr/var/log/debug/platform/kern.error
> kern.error @abcdefgsldkjf.com
> kern.error @kdsljfdjff.com
>
> local6.* /ddr/var/log/debug/cifs/cifs.log
>
>
Received on Tue Nov 16 2010 - 03:53:14 CET
