syslog collection on a 1500+ node cluster

From: Jon Stearley (jrstear@sandia.gov)
Date: Tue Sep 18 2001 - 21:13:12 CEST


i've got 1500+ disk-less linux nodes i must collect syslog output from
(http://www.cs.sandia.gov/cplant/). their ethernet interfaces are
roughly arranged like this:

  compute_node_0  ---\
  ...             ---->--- admin_node_0  ---\
  compute_node_31 ---/     ...           ---->--- syslog_node (has a disk)
                           admin_node_48 ---/

a while back someone here tried pointing the compute_node syslogs at
their admin_node, and admin_nodes then pointed at the syslog_node
(using -h), which receives it all and sticks it on disk. he told me
that this completely overwhelmed syslogd (on both admin and syslog
nodes), although i've not personally tried it (yet).

before i try a config (from which i'll provide "barf" details if they
occur), i wanted to query for advice. i'd like to not only collect
the syslogs onto a single host, but save them into separate files (i.e.
per-admin_node, if not per-compute_node).
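
To make the "separate files" goal concrete, here is a minimal sketch of a
stand-alone UDP collector that writes one file per sending host. It is not
syslogd and not a patch to it. The function names, the /var/log/cluster
directory, and the 8192-byte read size are all assumptions chosen for
illustration. It keys files on the datagram's source address, so if the
admin_nodes relay for their compute_nodes, the syslog_node would see the
admin_node's address and the split would come out per-admin_node.

```python
import socket
from pathlib import Path


def log_path_for(sender_ip: str, log_dir: Path) -> Path:
    """Map a datagram's source address to its own log file (hypothetical layout)."""
    return log_dir / f"{sender_ip}.log"


def append_message(sender_ip: str, payload: bytes, log_dir: Path) -> Path:
    """Append one received message, newline-terminated, to the sender's file."""
    path = log_path_for(sender_ip, log_dir)
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    # Re-opening the file per message keeps the sketch short; a real
    # collector at this scale would probably cache open file handles.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(text + "\n")
    return path


def serve(bind_addr=("0.0.0.0", 514), log_dir=Path("/var/log/cluster")):
    """Receive UDP datagrams forever and file each one by source address."""
    log_dir.mkdir(parents=True, exist_ok=True)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(bind_addr)  # port 514 normally requires root
    while True:
        payload, (sender_ip, _sender_port) = sock.recvfrom(8192)
        append_message(sender_ip, payload, log_dir)
```

Calling serve() on the syslog_node would start the loop. Whether a single
process like this keeps up with 1500+ nodes is exactly the open question;
the sketch only shows the per-sender file split.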

1. any collection strategy advice?

   e.g.: if i could have each admin_node log to a different port on the
   syslog_node, perhaps i could just spawn multiple syslogds there,
   each listening to the appropriate port (or bump MAXFUNIX and try it
   with just one syslogd). i'm not sure if this is feasible; in
   particular, i don't know how to aim syslogd at different remote ports.

2. how can the bottlenecks be eased?

   e.g.: if syslogd would buffer messages up to some limit before
   transmission, and then send them all in a batch, perhaps this would
   alleviate the bottlenecks? ideally, the buffer_flush_size would be
   a runtime option, so it could be modified as needed (e.g. if a host
   dies and the logs in its unflushed buffer are lost, decrease the
   buffer size). i'm not sure if the code, or the transmission protocol,
   would allow this. i'm willing and able to hack, i'm just not sure
   of the feasibility of this buffering idea.
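
The buffering idea in item 2 can be sketched outside syslogd as follows.
This is a hypothetical illustration, not existing syslogd behaviour: the
class name BatchingForwarder, the send callback, and the choice to join
messages with newlines into a single payload are all assumptions, and a
stock receiver should not be assumed to understand such a batched payload.
The flush_size attribute plays the role of buffer_flush_size and can be
changed while running.

```python
from typing import Callable, List


class BatchingForwarder:
    """Hold messages until flush_size is reached, then send them as one batch."""

    def __init__(self, send: Callable[[bytes], None], flush_size: int = 32):
        self._send = send             # e.g. a wrapper around socket.sendto
        self.flush_size = flush_size  # adjustable at runtime
        self._pending: List[str] = []

    def log(self, message: str) -> None:
        """Queue one message; transmit the batch once the limit is reached."""
        self._pending.append(message)
        if len(self._pending) >= self.flush_size:
            self.flush()

    def flush(self) -> None:
        """Send everything queued so far as one newline-joined payload."""
        if not self._pending:
            return
        batch = "\n".join(self._pending).encode("utf-8")
        self._pending.clear()
        self._send(batch)
```

Anything still queued when a host dies is lost, which is the trade-off
described above: a smaller flush_size loses less but sends more packets.
Setting flush_size to 1 degenerates to sending every message immediately.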

some here think we should buy something like Consoleworks
(http://www.tditx.com/products_consoleworks.html, experiences anyone?)
for the syslog_node, thinking it'll scale better (?).

i'm wide open for advice. Thanks!

-- 
+--------------------------------------------------------------+
| Jon Stearley			(505) 845-7571  (FAX 844-2067) |
| Compaq Federal LLC		High Performance Solutions     |
| Sandia National Laboratories	Scalable Systems Integration   |
+--------------------------------------------------------------+

PS-

> A monthly archive of this list is available at
> ftp://ftp.infodrom.north.de/pub/usenet/mailing-lists/sysklogd.yymm

the archives are unavailable.



This archive was generated by hypermail 2.1.2 : Tue Sep 18 2001 - 21:14:42 CEST