Discussion:
[psad-discuss] Zombie Processes
3Turtles
2014-06-09 00:51:34 UTC
Permalink
My Ubuntu servers are all currently suffering from zombie processes. I
narrowed down the culprit to PSAD (sh <defunct>'s parent is psad).

In my psad.conf file I have noemail configured, but emails are still
trying to send out. They fail (I did this on purpose so my email doesn't
get spammed to death) and are being sent to my root mail instead.

Any idea how I can solve this? After a few hours I have around 35
zombie processes.
3Turtles
2014-06-10 22:18:19 UTC
Permalink
Here's what ps is showing me:

UID PID PPID C STIME TTY TIME CMD
root 1167 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 4689 26489 0 13:59 ? 00:00:00 [sh] <defunct>
root 6781 26489 0 14:38 ? 00:00:00 [sh] <defunct>
root 7072 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 7390 26489 0 14:51 ? 00:00:00 [sh] <defunct>
root 7989 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 8715 26489 0 15:14 ? 00:00:00 [sh] <defunct>
root 10157 26489 0 15:46 ? 00:00:00 [sh] <defunct>
root 10249 26489 0 15:48 ? 00:00:00 [sh] <defunct>
root 13369 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 13709 26489 0 16:53 ? 00:00:00 [sh] <defunct>
root 15342 26489 0 17:23 ? 00:00:00 [sh] <defunct>
root 15999 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 17398 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 19833 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 23286 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 25189 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 25546 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 26489 1 0 Jun09 ? 00:00:18 /usr/bin/perl -w /usr/sbin/psad
root 26868 26489 0 00:00 ? 00:00:00 [sh] <defunct>
root 28371 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 35755 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 36124 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 36214 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 36484 26489 0 03:07 ? 00:00:00 [sh] <defunct>
root 41507 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 41513 26489 0 04:52 ? 00:00:00 [sh] <defunct>
root 42148 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 44183 26489 0 05:45 ? 00:00:00 [sh] <defunct>
root 44235 26489 0 05:46 ? 00:00:00 [sh] <defunct>
root 44280 26489 0 05:47 ? 00:00:00 [sh] <defunct>
root 44898 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 45006 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 47485 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 49095 26489 0 07:17 ? 00:00:00 [sh] <defunct>
root 49538 26489 0 07:27 ? 00:00:00 [sh] <defunct>
root 50873 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 51348 26489 0 08:03 ? 00:00:00 [sh] <defunct>
root 51767 26489 0 08:10 ? 00:00:00 [sh] <defunct>
root 52446 26489 0 08:25 ? 00:00:00 [sh] <defunct>
root 53859 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 55522 26489 0 09:27 ? 00:00:00 [sh] <defunct>
root 56889 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 57510 26489 0 10:05 ? 00:00:00 [sh] <defunct>
root 58433 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 59599 26489 0 10:51 ? 00:00:00 [sh] <defunct>
root 60515 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 60786 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 62869 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 63332 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 63646 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 63774 26489 0 12:11 ? 00:00:00 [sh] <defunct>
root 65493 26489 0 12:49 ? 00:00:00 [sh] <defunct>

How do I fix this?
------------------------------------------------------------------------------
HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
Find What Matters Most in Your Big Data with HPCC Systems
Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
Leverages Graph Analysis for Fast Processing & Easy Data Exploration
http://www.hpccsystems.com
_______________________________________________
psad-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/psad-discuss
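For anyone wanting to confirm the same diagnosis on their own box, here is a minimal sketch in Python that counts <defunct> entries per parent PID. It assumes the eight-column `ps -ef` layout shown in the listing above (UID PID PPID C STIME TTY TIME CMD); the function name count_zombies_by_parent is made up for illustration and is not part of psad.

```python
from collections import Counter


def count_zombies_by_parent(ps_output):
    """Count <defunct> entries per parent PID in `ps -ef` style text."""
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        # Expected columns: UID PID PPID C STIME TTY TIME CMD...
        # Skip the header and anything that does not look like a process row.
        if len(fields) < 8 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        # The command is everything from the eighth column onward.
        if "<defunct>" in fields[7:]:
            counts[int(fields[2])] += 1
    return counts
```

Feeding it the text of `ps -ef` gives a mapping from parent PID to zombie count, so a single parent with a large count (26489 in the listing above) stands out immediately.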
Michael Rash
2014-06-11 12:52:43 UTC
Permalink
This is most likely an artifact of how psad gathers whois information for
IPs that it has flagged. The whois client sometimes takes a while to return
data because it has to query upstream whois databases over the network.
psad makes the tradeoff that if whois is taking too long to respond, it
doesn't wait around before moving on, so the process becomes a zombie.
There is likely a better way to do this, though. I may need to make this
more configurable, and I'm hoping that the whois client itself either
already has a 'timeout' parameter or that one can be added. There is a
WHOIS_TIMEOUT variable in the psad.conf file, which is set to 60 seconds by
default; that seems pretty long. One thing you could try is disabling whois
lookups just to confirm that this is the problem: use the --no-whois option.

Thanks,

--Mike
--
Michael Rash | Founder
http://www.cipherdyne.org/
Key fingerprint = 53EA 13EA 472E 3771 894F AC69 95D8 5D6B A742 839F
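As an illustration of the tradeoff described above (give up on a slow lookup, but without leaving a dead child behind), here is a minimal Python sketch. It is not how psad does it (psad is Perl); the helper name lookup_with_timeout is hypothetical, and any command can stand in for the whois client. It relies on the documented behaviour of subprocess.run(), which kills the child and waits for it when the timeout expires.

```python
import subprocess


def lookup_with_timeout(argv, timeout_seconds):
    """Run an external command, giving up after timeout_seconds.

    Returns the command's stdout as text, or None if it timed out.
    """
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and waited for the child
        # by the time this exception reaches us, so no zombie is left.
        return None
    return result.stdout.decode(errors="replace")
```

Because the child is always waited for, either on normal exit or after being killed, the parent can move on after the timeout without accumulating <defunct> entries.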
Dan Dickey
2014-06-11 16:35:23 UTC
Permalink
Mike -
The defunct processes have all called exit and are done done done.
They are still hanging around because the parent process (psad?) hasn't
done a wait() call on them to collect the exit information.
I haven't looked at the psad code in some time, but it may be worthwhile
to call waitpid(-1, &status, WNOHANG) periodically in the loop logic.
It would then clean up child processes that have exited.

Just trying to be helpful... I've been using psad on my systems for some time.
Thanks for a quality product and the support you've given it over the years!
-Dan
--
Dan A. Dickey
***@icecoldsoftware.com
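The periodic waitpid(-1, ..., WNOHANG) idea can be sketched as follows. This is a Python illustration of the technique, not psad code; reap_children and spawn_short_lived are hypothetical names, and the second one exists only to create some children that exit right away.

```python
import os


def spawn_short_lived(count):
    """Fork `count` children that exit immediately; return their PIDs."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit at once, becoming a zombie until reaped
        pids.append(pid)
    return pids


def reap_children():
    """Collect every child that has already exited, without blocking.

    Returns a list of (pid, exit_code) tuples; a child killed by a signal
    is reported with the negated signal number.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        if os.WIFEXITED(status):
            code = os.WEXITSTATUS(status)
        else:
            code = -os.WTERMSIG(status)
        reaped.append((pid, code))
    return reaped
```

Calling reap_children() once per pass of a daemon's main loop is cheap, since WNOHANG makes waitpid return immediately when nothing has exited.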
3Turtles
2014-06-11 18:00:16 UTC
Permalink
I did notice that whois lookups are failing completely; maybe that's why
this is happening.

Where do I add the --no-whois switch? Do I add it to the init.d script?

P.S. Thanks for all the help.
Michael Rash
2014-06-12 03:49:49 UTC
Permalink
Post by Dan Dickey
Mike -
The defunct processes have all called exit and are done done done.
They are still hanging around because the parent process (psad?) hasn't
done a wait() call on them to collect the exit information.
I haven't looked at the psad code in some time, but it may be worthwhile
in the loop logic to call waitpid(-1, &status, WNOHANG) periodically.
It would then clean up children processes who have exited.
Thanks for thinking of this, but should this be required given that psad
just (currently anyway) uses system() to execute the whois client?

https://github.com/mrash/psad/blob/master/psad#L7283

I'll do some more digging - clearly zombies are getting created, and that
implies exactly what you said about psad not doing a wait() against child
processes.
Post by Dan Dickey
Just trying to be helpful... I've been using psad on my systems for some time.
Thanks for a quality product and the support you've given it over the years!
Glad you like psad, and thanks for the feedback.

--Mike
Michael Rash
2014-06-12 12:48:33 UTC
Permalink
Seems like what might be happening is that even though system() is being
used, psad is also wrapping system() with an alarm without also calling
waitpid().

Thanks,

--Mike
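The suspected failure mode can be reproduced outside psad. The sketch below is a Python illustration, not psad's Perl code: it assumes a plain fork/exec plus waitpid() in place of system(), and the names Timeout and run_with_alarm are hypothetical. With reap_on_timeout=False the alarm aborts the wait and nobody ever collects the child, which is the zombie case; with reap_on_timeout=True the child is killed and waited for before moving on.

```python
import os
import signal


class Timeout(Exception):
    """Raised by the SIGALRM handler when the child takes too long."""


def _on_alarm(signum, frame):
    raise Timeout()


def run_with_alarm(argv, timeout, reap_on_timeout):
    """Fork/exec argv and wait at most `timeout` whole seconds for it.

    Returns (child_pid, timed_out). Must be called from the main thread,
    because Python only allows signal handlers to be installed there.
    """
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout)
    try:
        os.waitpid(pid, 0)  # interrupted by Timeout if the alarm fires
        return pid, False
    except Timeout:
        if reap_on_timeout:
            os.kill(pid, signal.SIGKILL)
            os.waitpid(pid, 0)  # collect the exit status: no zombie
        # Otherwise the child is abandoned; once it exits it stays
        # <defunct> until this process waits for it or terminates.
        return pid, True
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)
```

If this is indeed what is happening, either reaping in the timeout path or a periodic waitpid(-1, ..., WNOHANG) sweep would stop the zombies from piling up.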
Post by Michael Rash
Post by Dan Dickey
Just trying to be helpful... I've been using psad on my systems for some time.
Thanks for a quality product and the support you've given it over the years!
Glad you like psad, and thanks for the feedback.
--Mike
Post by Dan Dickey
-Dan
Post by Michael Rash
Post by 3Turtles
UID PID PPID C STIME TTY TIME CMD
root 1167 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 4689 26489 0 13:59 ? 00:00:00 [sh] <defunct>
root 6781 26489 0 14:38 ? 00:00:00 [sh] <defunct>
root 7072 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 7390 26489 0 14:51 ? 00:00:00 [sh] <defunct>
root 7989 26489 0 Jun09 ? 00:00:00 [sh] <defunct>
root 8715 26489 0 15:14 ? 00:00:00 [sh] <defunct>
root 10157 26489 0 15:46 ? 00:00:00 [sh] <defunct>
root 10249 26489 0 15:48 ? 00:00:00 [sh] <defunct>
This is most likely an artifact of how psad gathers whois information
for
Post by Michael Rash
IP's that is has flagged. The problem is that the whois client
sometimes
Post by Michael Rash
takes a while to return data because it has to query upstream whois
databases over the network. psad makes the tradeoff that if whois is
taking too long to respond, then it doesn't wait around before moving
on so
Post by Michael Rash
the process becomes a zombie. There is likely a better way to do this
though. I may need to make this more configurable, and I'm hoping that
the
Post by Michael Rash
whois client itself either already has a 'timeout' parameter (or one
can be
Post by Michael Rash
added). There is a variable in the psad.conf file WHOIS_TIMEOUT which
is
Post by Michael Rash
set to 60 seconds by default which seems pretty long. One thing you
could
Post by Michael Rash
try is disabling whois lookups just to confirm that this is the problem
-
Post by Michael Rash
use the --no-whois option.
Thanks,
--Mike
--
Michael Rash | Founder
http://www.cipherdyne.org/
Key fingerprint = 53EA 13EA 472E 3771 894F AC69 95D8 5D6B A742 839F
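The tradeoff described in this message (stop waiting for a slow whois lookup, but still collect the child process) can be sketched as follows. This is a minimal Python illustration, not psad's Perl code; the helper name run_with_timeout is hypothetical, and the point is only that the child is killed and then waited on when the timeout fires, so it does not linger as <defunct>.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd, give up after timeout_s seconds, and always reap the child.

    Hypothetical helper for illustration; it is not part of psad.
    Returns (exit_code, stdout_bytes), or (None, b"") on timeout.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    try:
        out, _ = proc.communicate(timeout=timeout_s)
        return proc.returncode, out
    except subprocess.TimeoutExpired:
        proc.kill()         # stop the slow child
        proc.communicate()  # wait on it so its exit status is collected (no zombie)
        return None, b""
```

A call such as run_with_timeout(["whois", "192.0.2.1"], 60) would mirror the 60 second WHOIS_TIMEOUT default mentioned above; the address is only a placeholder.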
Dan Dickey
2014-06-12 17:17:02 UTC
Permalink
Post by Dan Dickey
Mike -
The defunct processes have all called exit and are done done done.
They are still hanging around because the parent process (psad?) hasn't
done a wait() call on them to collect the exit information.
I haven't looked at the psad code in some time, but it may be worthwhile
in the loop logic to call waitpid(-1, &status, WNOHANG) periodically.
It would then clean up children processes who have exited.
Post by Michael Rash
Thanks for thinking of this, but should this be required given that psad
just (currently anyway) uses system() to execute the whois client?

If the way you are calling system() guarantees it will do a waitpid(),
then you should not need to call it yourself. However...

Post by Michael Rash
https://github.com/mrash/psad/blob/master/psad#L7283
I'll do some more digging - clearly zombies are getting created, and that
implies exactly what you said about psad not doing a wait() against child
processes.
Post by Michael Rash
Seems like what might be happening is that even though system() is being
used, psad is also wrapping system() with an alarm without also calling
waitpid().

Yes, an alarm can probably cause system() to not wait any further for
the child process, hence the zombies. I haven't looked at the system() code
lately, but that is most likely what is happening.
And in any case, the evidence shows that perl (psad) has zombie children,
so a waitpid() needs to be done to take care of them.

Post by Michael Rash
Thanks,

You're welcome.
-Dan

Post by Michael Rash
--Mike
Post by Dan Dickey
Just trying to be helpful... I've been using psad on my systems for some time.
Thanks for a quality product and the support you've given it over the years!
Post by Michael Rash
Glad you like psad, and thanks for the feedback.
--Mike
Post by Dan Dickey
-Dan
--
Dan A. Dickey
***@icecoldsoftware.com
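The periodic waitpid(-1, &status, WNOHANG) call suggested in this message can be sketched as below. This is a Python illustration under stated assumptions, not psad's actual Perl code: reap_children is a hypothetical name, and the idea is that a daemon's main loop would call it on each pass to collect any children that have already exited, without ever blocking.

```python
import os


def reap_children():
    """Collect every child that has already exited, without blocking.

    Hypothetical helper for illustration; it is not part of psad.
    Returns a list of (pid, raw_wait_status) tuples.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG means "return at once if none has exited"
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, status))
    return reaped
```

Once a child has been collected this way it disappears from the ps listing; a child that has exited but has not been collected is what shows up as [sh] <defunct>.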
3Turtles
2014-06-12 19:37:27 UTC
Permalink
Michael, how long do you think it will take to get this patched?
Michael Rash
2014-06-14 12:49:39 UTC
Permalink
Post by 3Turtles
Michael, how long do you think it will take to get this patched?
Should have a candidate patch by Monday.

Mike
Michael Rash
2014-08-21 03:13:47 UTC
Permalink
Any news on this? And will the release be stable (no crashes/freezing
server)?
It took me a while, but I believe I have fixed the zombie processes
problem. Here is a -pre release of psad-2.2.4 that contains the fix:

https://www.cipherdyne.org/psad/download/psad-2.2.4-pre1.tar.gz

sha256: d734553fa80dfa92125fdd43781d997a84c1dc059ce2e032eafae3e4b0e93afe

Thanks,

--Mike
[off list]
Still working on this, hoping to have it finished by tomorrow.
Thanks,
--Mike
--
Michael Rash | Founder
http://www.cipherdyne.org/
Key fingerprint = 53EA 13EA 472E 3771 894F AC69 95D8 5D6B A742 839F
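For anyone who wants to check the download against the sha256 value posted above before installing, here is one way to do it in Python. It is only a sketch: the function name sha256_of_file and the local file name in the usage note are assumptions, and the expected digest is the one quoted in this message.

```python
import hashlib

# Digest quoted in the message above for psad-2.2.4-pre1.tar.gz
EXPECTED = "d734553fa80dfa92125fdd43781d997a84c1dc059ce2e032eafae3e4b0e93afe"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Assuming the tarball was saved under its original name, sha256_of_file("psad-2.2.4-pre1.tar.gz") == EXPECTED should be True for an intact download.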