Hi folks,
Recently I've been struggling with a problem where syslogd hangs the
system. syslogd is running on our system, and we have configured a
couple of remote log servers in /etc/syslogd.conf. We found that
syslogd gets stuck in gethostbyname when our DNS servers are
unreachable, which blocks every process that writes logs to syslogd.
After digging a little, I found that for each remote log host,
gethostbyname takes 20 seconds to time out when one DNS server is
unreachable, and 40 seconds when two DNS servers are unreachable.
Since many lines in /etc/syslogd.conf do remote logging, syslogd
spends a long time in gethostbyname and hangs the system.
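To make the timing above concrete, here is a rough estimate written as
a small C helper. It is only a sketch: it assumes the timeouts simply
add up (20 seconds per unreachable DNS server per lookup, as measured
above), and the function and parameter names are made up for
illustration.

```c
#include <assert.h>

/*
 * Rough estimate of how long syslogd stays blocked in gethostbyname.
 * Assumption: each lookup waits timeout_per_server_s for every
 * unreachable DNS server, and lookups run one after another.
 */
static long blocked_seconds(int unreachable_dns,
                            int timeout_per_server_s,
                            int lookups)
{
    assert(unreachable_dns >= 0);
    assert(timeout_per_server_s >= 0);
    assert(lookups >= 0);
    return (long)unreachable_dns * timeout_per_server_s * lookups;
}
```

With one unreachable server and one lookup this gives 20 seconds, and
with two unreachable servers it gives 40 seconds, matching the numbers
I measured.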
Searching the Internet, this problem looks very common; many people
have run into it. However, I cannot find a solution for it. Is there
already a fix that addresses this problem?
Looking at the 1.5 code, I found two problems.
1. f->f_time is not updated in the case F_FORW_UNKN at line 1820.
This makes syslogd call gethostbyname 10 times in a row if log
messages arrive at a high rate. Say 3 DNS servers are unreachable and
5 lines in syslogd.conf use a remote server. Then syslogd spends
3 * 20 * 5 * 10 = 3000 seconds = 50 minutes in gethostbyname. It gets
even worse when more remote servers and DNS servers are used.
If f->f_time is updated every time we hit case F_FORW_UNKN, the
lookups are spread out, one per INET_SUSPEND_TIME (3 minutes).
Although the system still hangs for a while on each lookup, that is
much better than hanging for 50 minutes in a row.
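The idea for problem 1 can be sketched as follows. This is a
simplified model, not the actual syslogd code: struct fwd_entry,
retries_left, failing_lookup and the other helper names are
hypothetical, the retry budget of 10 is taken from the "10 times"
above, and the lookup is a stand-in that always fails without touching
the network.

```c
#include <assert.h>
#include <time.h>

#define INET_SUSPEND_TIME 180  /* 3 minutes, as described above */
#define RETRY_MAX 10           /* assumed retry budget ("10 times") */

/* Hypothetical, trimmed-down stand-in for one forwarding entry. */
struct fwd_entry {
    time_t f_time;        /* time of the last lookup attempt */
    int    retries_left;  /* lookups still allowed before giving up */
    int    lookups_done;  /* demo counter: lookups actually issued */
};

/* Stand-in for a gethostbyname() call that times out. */
static int failing_lookup(struct fwd_entry *f)
{
    f->lookups_done++;
    return -1;
}

/*
 * Handle one message for an entry whose host is still unresolved.
 * update_f_time = 0 models the behaviour described in problem 1,
 * update_f_time = 1 models the proposed change.
 */
static void handle_unknown_host(struct fwd_entry *f, time_t now,
                                int update_f_time)
{
    assert(f != NULL);
    if (f->retries_left <= 0)
        return;                          /* already gave up */
    if (now - f->f_time < INET_SUSPEND_TIME)
        return;                          /* still suspended */
    if (failing_lookup(f) != 0) {
        f->retries_left--;
        if (update_f_time)
            f->f_time = now;             /* restart the suspension */
    }
}

/* Feed a burst of messages that all arrive at the same instant. */
static int lookups_in_burst(int messages, int update_f_time)
{
    struct fwd_entry f = { 0, RETRY_MAX, 0 };
    time_t now = 1000;                   /* suspension already over */

    for (int i = 0; i < messages; i++)
        handle_unknown_host(&f, now, update_f_time);
    return f.lookups_done;
}
```

In this model a burst of 50 messages triggers 10 back-to-back lookups
when f_time is left alone, and a single lookup when f_time is updated
after each failed attempt.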
2. The same remote host is resolved every time.
When the same remote log server appears in multiple lines of
syslogd.conf, syslogd resolves it again for each line, since it treats
every line separately. If each server were resolved only once per
period such as INET_SUSPEND_TIME, and the result reused for the
following attempts, a lot of time in gethostbyname would be saved.
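One possible shape for problem 2 is a small per-hostname cache that
remembers the last lookup result, success or failure, for
INET_SUSPEND_TIME seconds. The sketch below is an assumption about how
this could look, not existing syslogd code: cached_resolve, the
resolver_fn callback, the fixed-size table and the 32-bit address
placeholder are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define INET_SUSPEND_TIME 180  /* seconds a cached result stays valid */
#define CACHE_SLOTS 16         /* hypothetical table size */
#define HOST_MAX 256           /* hypothetical hostname buffer size */

/* Resolver callback: returns 0 and fills *addr on success, else -1. */
typedef int (*resolver_fn)(const char *host, uint32_t *addr);

struct cache_entry {
    int      used;
    char     host[HOST_MAX];
    time_t   when;   /* time of the last real lookup */
    int      ok;     /* 1 if that lookup succeeded */
    uint32_t addr;   /* placeholder for the resolved address */
};

static struct cache_entry cache[CACHE_SLOTS];

/* Resolve host, reusing a result younger than INET_SUSPEND_TIME. */
static int cached_resolve(const char *host, time_t now,
                          resolver_fn resolve, uint32_t *addr)
{
    struct cache_entry *e = NULL, *free_slot = NULL;

    assert(host != NULL && resolve != NULL && addr != NULL);
    assert(strlen(host) < HOST_MAX);

    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].used && strcmp(cache[i].host, host) == 0) {
            e = &cache[i];
            break;
        }
        if (!cache[i].used && free_slot == NULL)
            free_slot = &cache[i];
    }

    if (e != NULL && now - e->when < INET_SUSPEND_TIME) {
        if (e->ok)
            *addr = e->addr;
        return e->ok ? 0 : -1;           /* reuse, no DNS traffic */
    }

    if (e == NULL)
        e = free_slot;
    if (e == NULL)
        return resolve(host, addr);      /* table full: no caching */

    strcpy(e->host, host);               /* length checked above */
    e->used = 1;
    e->when = now;
    e->ok = (resolve(host, &e->addr) == 0);
    if (e->ok)
        *addr = e->addr;
    return e->ok ? 0 : -1;
}

/* Demo resolver that always fails and counts how often it is called. */
static int demo_calls;
static int demo_failing_resolver(const char *host, uint32_t *addr)
{
    (void)host;
    (void)addr;
    demo_calls++;
    return -1;
}
```

With this in place, several config lines that point at the same host
would cost one real lookup per INET_SUSPEND_TIME instead of one per
line. Caching failures as well as successes is the important part
here, since the failures are what block for 20 seconds or more.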
I don't know whether a fix for this critical problem already exists.
Any information would be helpful.
Our syslogd.conf is attached below.
Thanks!
Yinglin
----------------------------------------------------------------------------------------------------------------------------------------------------------------
*.notice;auth.info,mail.none,news.none,authpriv.none,cron.none,kern.none,local1.none,local3.none,local4.none    /ddr/var/log/messages
*.notice;auth.info,mail.none,news.none,authpriv.none,cron.none,kern.none,local1.none,local3.none,local4.none    @abcdefgsldkjf.com
*.notice;auth.info,mail.none,news.none,authpriv.none,cron.none,kern.none,local1.none,local3.none,local4.none    @kdsljfdjff.com
*.notice;auth.info,mail.none,news.none,authpriv.none,cron.none,kern.none,local1.none,local3.none,local4.none    /ddr/var/log/debug/messages.support
*.notice;auth.info,mail.none,news.none,authpriv.none,cron.none,kern.none,local1.none,local3.none,local4.none    /ddr/var/log/debug/messages.engineering
local1.notice    /ddr/var/log/messages
local1.notice    @abcdefgsldkjf.com
local1.notice    @kdsljfdjff.com
local1.notice;local3.notice    /ddr/var/log/debug/messages.support
local1.notice;local3.notice;local4.*    /ddr/var/log/debug/messages.engineering
authpriv.*    /ddr/var/log/debug/secure.log
mail.*    /var/log/maillog
cron.*    /var/log/cron
*.alert;local3.none;local4.none    *
*.alert;local3.none;local4.none    @abcdefgsldkjf.com
*.alert;local3.none;local4.none    @kdsljfdjff.com
uucp,news.crit    /var/log/spooler
*.alert;local3.none;local4.none    |/ddr/dev/ems_pipe
kern.alert    |/ddr/dev/kmsg_pipe
local2.notice    /dev/console
kern.*    /ddr/var/log/debug/platform/kern.info
kern.*    @abcdefgsldkjf.com
kern.*    @kdsljfdjff.com
kern.error    /ddr/var/log/debug/platform/kern.error
kern.error    @abcdefgsldkjf.com
kern.error    @kdsljfdjff.com
local6.*    /ddr/var/log/debug/cifs/cifs.log
Received on Tue Nov 16 2010 - 03:27:24 CET