pidgin.im

Fri Jan 18 11:09:21 EST 2008

On Fri, Jan 18, 2008 at 10:58:58AM -0500, Ethan Blanton wrote:
> Luke Schierer spake unto us the following wisdom:
> > When I noticed this on reaching a computer here at my office, the load
> > average was up over 100, and trac.fcgi was using a steady 99% of the
> > CPU.  Memory did not seem to be an issue, though munin tells me that the
> > amount committed was growing again during that 3 hour block this morning
> > while it could run.
> > 
> > I halted both lighttpd and usher, and killed all mtn jobs to let the
> > system recover.  Once the load was again below 10, which happened
> > relatively rapidly once I was able to kill trac.fcgi and the mtn jobs, I
> > restarted both usher and lighttpd.
> 
> I also stopped lighttpd at about 0930, because the load average was
> well over 100.  I allowed things to settle for about a minute, and
> didn't have to kill anything else -- the pending cron jobs (including
> mtn updates) all finished rapidly once trac was down.  A restart
> seemed to go OK.
> 
> > This is the second time since the 14th that the committed memory has
> > spiked, something that used to happen regularly but other than one spike
> > about a week ago, had not happened since this past summer.  Does anyone
> > have any idea what has changed recently that might be leaking?
> 
> Was memory the problem, this time?  At 0930, it didn't *appear* to be,
> but I can't swear ... it was really hard to find anything out, because
> the shell was so sluggish.  I got impatient and killed lighttpd,
> because trac was sitting on 100% CPU.
> 
> Ethan

As best I can determine, memory was not a cause this time.  I noted that
it was growing only in the interest of completeness, and because I know
that very high memory loads have, in the past, sometimes caused our
overall load to go up.  In this case, though, I think it was CPU that
was the problem.

luke