Re: syslogd hangs the system

From: Martin Schulze <joey_at_infodrom.org>
Date: Tue, 16 Nov 2010 20:51:16 +0100

Yinglin Sun wrote:
> Hi folks,
>
> Recently I've been struggling with a problem where syslogd hangs the
> system. syslogd is running on our system, and we configured a couple of
> remote log servers in /etc/syslogd.conf. We found that syslogd gets
> stuck in gethostbyname when our DNS servers are not reachable, which
> blocks all processes writing logs to syslogd.

If your DNS servers are unreliable, you may be able to circumvent
this problem by using IP addresses in syslog.conf instead of
host names.
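
A minimal sketch of why that helps, assuming IPv4 and a made-up helper
name (parse_numeric_host is not a sysklogd function): a numeric address
can be converted locally with inet_pton(), so no DNS query is needed.
The address 192.0.2.10 below is only a documentation placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Hypothetical helper, not taken from sysklogd.
 * Returns 1 and fills *out when host is a dotted-quad IPv4 literal,
 * 0 when it is a name that would need a resolver lookup.
 *
 * In syslog.conf terms, a forwarding line such as
 *     *.*    @192.0.2.10
 * falls into the first case, while "@loghost.example.org" falls into
 * the second.
 */
int parse_numeric_host(const char *host, struct in_addr *out)
{
    assert(host != NULL && out != NULL);
    return inet_pton(AF_INET, host, out) == 1;
}
```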

> After digging a little bit, I found that for each remote log host,
> gethostbyname takes 20 seconds to time out when one DNS server is
> unreachable, and 40 seconds if two DNS servers cannot be reached.
> Since we have many lines doing remote logging in /etc/syslogd.conf,
> syslogd spends a lot of time in gethostbyname and hangs the system.
>
> Searching the Internet, this problem looks very common. Many people
> ran into it. However, I cannot find a solution for it. I'm wondering
> if there is already some fix to address this problem?
>
> By looking at the 1.5 code, I found two problems.
> 1. f->f_time is not updated in the case F_FORW_UNKN at line 1820.
> This makes it call gethostbyname 10 times consecutively if log
> messages come in at a high rate. Let's say 3 DNS servers are not
> reachable and 5 lines in syslogd.conf use a remote server. Then
> syslogd will spend 3 * 20 * 5 * 10 = 3000 seconds (50 minutes) in
> gethostbyname. It gets even worse if more remote servers and DNS
> servers are used.

That code apparently needs some reworking...
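
For reference, the worst-case estimate quoted above can be written out
as a small function. The function and parameter names are only
illustrative; the numbers are the ones from the report.

```c
#include <assert.h>

/*
 * Worst-case seconds spent blocked in gethostbyname(), following the
 * estimate quoted above: each unreachable DNS server costs one timeout,
 * for every forwarding line, for every consecutive retry.
 */
long worst_case_block_seconds(int dns_servers, int timeout_seconds,
                              int forward_lines, int retries)
{
    assert(dns_servers >= 0 && timeout_seconds >= 0);
    assert(forward_lines >= 0 && retries >= 0);
    return (long)dns_servers * timeout_seconds * forward_lines * retries;
}
```

With 3 servers, a 20 second timeout, 5 forwarding lines and 10 retries
this gives 3000 seconds, i.e. the 50 minutes mentioned in the report.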

> If f->f_time is updated every time we hit case F_FORW_UNKN, we can
> distribute these lookups every INET_SUSPEND_TIME (3 minutes). Although
> the system still hangs for a while, it's much better than hanging for
> 50 minutes consecutively.

Yes, a lot better.
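
A rough sketch of that idea, not the actual syslogd code: the struct
below is a hypothetical stand-in for the forwarding entry (last_attempt
plays the role of f->f_time), the 180 second value reflects the
3 minutes quoted above, and the resolver is passed in as a callback so
the logic can be tried without real DNS.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define INET_SUSPEND_TIME 180   /* seconds; the 3 minutes quoted above */

/* Hypothetical stand-in for the relevant fields of a forwarding entry. */
struct fwd_target {
    const char *hostname;
    time_t last_attempt;        /* plays the role of f->f_time */
    int retries_left;
    int resolved;
};

/* Resolver callback: returns 0 on success, non-zero on failure. */
typedef int (*resolve_fn)(const char *hostname);

/* Stub that simulates unreachable DNS servers. */
int resolve_always_fails(const char *hostname)
{
    (void)hostname;
    return -1;
}

/*
 * Attempt a lookup at most once per INET_SUSPEND_TIME.
 * Returns 1 if a lookup was attempted, 0 if it was skipped.
 */
int maybe_resolve(struct fwd_target *t, time_t now, resolve_fn resolve)
{
    assert(t != NULL && resolve != NULL);

    if (t->resolved || t->retries_left <= 0)
        return 0;
    if (now - t->last_attempt < INET_SUSPEND_TIME)
        return 0;               /* still suspended, do not block */

    t->last_attempt = now;      /* the update reported as missing */
    if (resolve(t->hostname) == 0)
        t->resolved = 1;
    else
        t->retries_left--;
    return 1;
}
```

Because last_attempt is refreshed on every attempt, a burst of messages
triggers one blocking lookup per suspend interval instead of using up
all retries back to back.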

> 2. resolve the same remote host every time
> When we use the same remote log servers in multiple lines of
> syslogd.conf, syslogd always resolves the same servers again, since it
> treats every line separately. If we can resolve the same servers only
> once in a period like INET_SUSPEND_TIME, and reuse the result in the
> following attempts, that will save a lot of time for gethostbyname.
>
> I don't know if we already have the fix for this critical problem. Any
> information will be helpful for me.

Not yet as far as I know. Your assertions sound valid, so I'd be glad
to apply a patch from you.
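
For the second point, one possible shape is a small per-host cache that
remembers the last result, including failures, for INET_SUSPEND_TIME.
This is only a sketch under assumptions: all names, the table size and
the callback-based lookup are made up for illustration, and the
180 second value reflects the 3 minutes quoted above.

```c
#include <assert.h>
#include <string.h>
#include <time.h>
#include <netinet/in.h>

#define INET_SUSPEND_TIME 180   /* seconds; the 3 minutes quoted above */
#define CACHE_SLOTS 16          /* arbitrary size for this sketch */

struct host_cache_entry {
    char name[256];
    struct in_addr addr;
    int ok;                     /* 1 if addr is valid, 0 if lookup failed */
    time_t stamp;               /* when the lookup was done */
    int used;
};

static struct host_cache_entry cache[CACHE_SLOTS];

/* Lookup callback: returns 0 and fills *out on success. */
typedef int (*lookup_fn)(const char *name, struct in_addr *out);

/* Stub that counts calls and simulates unreachable DNS servers. */
int stub_calls = 0;
int stub_lookup_fails(const char *name, struct in_addr *out)
{
    (void)name;
    (void)out;
    stub_calls++;
    return -1;
}

/* Returns 0 and fills *out on success, -1 on (possibly cached) failure. */
int cached_lookup(const char *name, time_t now, lookup_fn lookup,
                  struct in_addr *out)
{
    struct host_cache_entry *hit = NULL, *free_slot = NULL;
    struct in_addr addr;
    int rc;

    assert(name != NULL && lookup != NULL && out != NULL);

    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].used && strcmp(cache[i].name, name) == 0) {
            hit = &cache[i];
            break;
        }
        if (!cache[i].used && free_slot == NULL)
            free_slot = &cache[i];
    }

    /* Fresh entry: reuse the result without calling the resolver. */
    if (hit != NULL && now - hit->stamp < INET_SUSPEND_TIME) {
        if (hit->ok)
            *out = hit->addr;
        return hit->ok ? 0 : -1;
    }

    memset(&addr, 0, sizeof addr);
    rc = lookup(name, &addr);

    /* Remember the outcome if there is room and the name fits. */
    struct host_cache_entry *e = hit != NULL ? hit : free_slot;
    if (e != NULL && strlen(name) < sizeof e->name) {
        strcpy(e->name, name);
        e->addr = addr;
        e->ok = (rc == 0);
        e->stamp = now;
        e->used = 1;
    }

    if (rc == 0)
        *out = addr;
    return rc == 0 ? 0 : -1;
}
```

Several syslog.conf lines pointing at the same host would then share
one blocking lookup per interval rather than one per line.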

Regards,

        Joey

-- 
Unix is user friendly ...  It's just picky about its friends.