[PATCH 0/2] Fixes for logging remote host failures

From: John Haxby <john.haxby_at_oracle.com>
Date: Fri, 23 Jul 2010 10:42:13 +0100

Hello All,

I had a problem where logging to a remote host was failing silently. The
problem turned out to be a mis-typed hostname in an @hostname forwarding
entry but nothing reported this.

Fixing this uncovered a different problem. I added a call to logerror()
when getaddrinfo() failed and this appeared to work, except that the
F_FORW_UNKN retry timeout caused a spectacular failure: after three minutes
syslogd went into a recursive loop reporting an unknown host error. It
turned out that this was because the broken forwarding line started with
"*.notice", and the F_FORW_UNKN retry waits three minutes and then starts
using the duplicate log count (f_prevcount) for the number of retries. I
was quite surprised how quickly syslogd managed to blow the stack limit
logging duplicate unknown host failures :-)

The first of the following two patches splits off the retry count into a
separate field in struct filed and resets the timeout interval each time we
retry the getaddrinfo() for the unknown host. The effect now is what I
believe was originally intended: a forwarding hostname is tried every three
minutes for half an hour before giving up on the host.
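
The retry behaviour described above can be sketched roughly as follows.
This is a minimal illustration, not the patch itself: the struct, the
constants and the function names are all hypothetical, and the limit of ten
retries is simply half an hour divided by three minutes. The point is only
that the retry counter lives in its own field, apart from the
duplicate-message count, and that the interval restarts on every attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical names and values; not the actual sysklogd definitions. */
#define RETRY_INTERVAL 180  /* seconds between lookups: three minutes */
#define MAX_RETRIES    10   /* 10 attempts * 3 minutes = half an hour */

enum fwd_state { FWD_UNKNOWN, FWD_READY, FWD_DISABLED };

struct fwd_target {
    enum fwd_state state;
    time_t last_try;   /* time of the most recent lookup attempt */
    int retry_count;   /* failed lookups so far; used for nothing else */
    int prevcount;     /* duplicate-message count; never touched by retries */
};

/* Return 1 when the host is still unresolved and the interval has elapsed. */
int fwd_retry_due(const struct fwd_target *f, time_t now)
{
    assert(f != NULL);
    return f->state == FWD_UNKNOWN && now - f->last_try >= RETRY_INTERVAL;
}

/* Record the outcome of a lookup; lookup_rc is getaddrinfo()'s return value. */
void fwd_retry_result(struct fwd_target *f, time_t now, int lookup_rc)
{
    assert(f != NULL);
    f->last_try = now;              /* restart the interval on every attempt */
    if (lookup_rc == 0) {
        f->state = FWD_READY;       /* host resolved, forwarding can resume */
        f->retry_count = 0;
    } else if (++f->retry_count >= MAX_RETRIES) {
        f->state = FWD_DISABLED;    /* give up on this host */
    }
}
```

A caller would check fwd_retry_due() before each forward attempt, run the
lookup only when it returns 1, and pass the result to fwd_retry_result().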

The second patch does what I wanted originally: failures for getaddrinfo()
are logged. Obviously if no messages are logged locally then you will
never find out what is going wrong. However, it seems to be more common to
have a local copy of a log and that is where you will see the getaddrinfo()
lookup failures.
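
As a rough illustration of reporting a lookup failure, the sketch below
turns a getaddrinfo() error code into a readable message with
gai_strerror(). The helper names and the message wording are hypothetical
and are not taken from the patch; in syslogd the resulting text would be
passed to logerror() rather than left in a buffer.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose getaddrinfo() under strict -std modes */
#endif
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper: build the text a caller would hand to its error logger. */
void format_lookup_error(char *buf, size_t len, const char *host, int gai_rc)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "cannot resolve forwarding host %s: %s",
             host, gai_strerror(gai_rc));
}

/*
 * Hypothetical wrapper: resolve a forwarding host for UDP.
 * Returns 0 and sets *res on success (caller frees it with freeaddrinfo()).
 * On failure returns getaddrinfo()'s error code, sets *res to NULL and
 * fills errbuf with a message instead of failing silently.
 */
int resolve_forward_host(const char *host, const char *service,
                         struct addrinfo **res, char *errbuf, size_t errlen)
{
    struct addrinfo hints;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_DGRAM;  /* remote syslog over UDP */

    rc = getaddrinfo(host, service, &hints, res);
    if (rc != 0) {
        *res = NULL;
        format_lookup_error(errbuf, errlen, host, rc);
    }
    return rc;
}
```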

jch
Received on Fri Jul 23 2010 - 11:42:13 CEST
