Exchange 2003 - cluster node regularly runs out of NPP memory and fails over...
Hello,
We have an Exchange 2003 cluster made up of 4 active nodes and two passive nodes. We have about 1,000 mailboxes per (active) node split over several databases.
On one of the nodes, OWA/HTTP regularly (say twice a month) runs out of non-paged pool memory and fails over to another node. The average is about 90 MB and it peaks at about 120 MB before failing over. Since the spread of mailboxes over all nodes is fairly equal, the usage should be similar, and we don't understand why only that node is affected.
The server has the /3GB and /USERVA=3030 switches.
Can anyone please suggest what to look for?
Thanks,
- Alan.
May 11th, 2011 10:01am
We once had a server that was running out of non-paged pool memory. It turned out to be the Symantec product.
We ran Poolmon.exe to see which processes were consuming the memory.
http://support.microsoft.com/kb/177415
May 11th, 2011 2:58pm
Additionally, please refer to
http://blogs.technet.com/b/dblanch/archive/2009/04/18/paged-and-non-paged-pool-issues-on-exchange-2000-2003.aspx
Dhruv
May 11th, 2011 4:16pm
How are things going? Were you able to locate the problematic application with the poolmon.exe utility?
If there is any progress or you have further questions, please feel free to post here.
Regards,
Novak Wu
TechNet Subscriber Support in forum
May 13th, 2011 3:26am
Well, the thing is I already know the problem application is the OWA (HTTP) component of Exchange. So poolmon won't help.
May 13th, 2011 3:29am
IIS will not accept any more web connections (OWA) if available non-paged pool memory falls below a certain threshold. This is exactly what happened to us. I found it through the error logs, which recorded the refused connections under c:\windows\system32\logfiles\httperr
Try to run poolmon and post a screen shot.
http://blogs.msdn.com/b/david.wang/archive/2005/09/21/howto-diagnose-iis6-failing-to-accept-connections-due-to-connections-refused.aspx
http://technet.microsoft.com/en-us/library/aa996269(EXCHG.80).aspx
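The HTTPERR check described in this post can be sketched in a few lines of Python. This is only a rough sketch under assumptions: that the log files match `httperr*.log`, that each entry is a space-delimited line starting with the date, and that refused connections carry a reason phrase containing `Connections_Refused`. Adjust these to whatever your own logs actually show.

```python
from collections import Counter
from pathlib import Path

# Assumption: the reason phrase for refused connections contains this text.
REASON = "Connections_Refused"


def count_refusals(lines):
    """Count log entries per date whose fields mention REASON."""
    per_day = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.split()
        if any(REASON in field for field in fields):
            per_day[fields[0]] += 1  # assumption: first field is the date
    return per_day


def scan_directory(log_dir):
    """Sum refusal counts over every httperr*.log file in log_dir."""
    total = Counter()
    for path in sorted(Path(log_dir).glob("httperr*.log")):
        with path.open(errors="replace") as handle:
            total.update(count_refusals(handle))
    return total
```

Calling `scan_directory(r"c:\windows\system32\logfiles\httperr")` on the affected node and on a healthy node would show whether refusals cluster around the failover dates. An empty result suggests the failover is triggered some other way.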
May 13th, 2011 9:08am
We had a similar problem which turned out to be the SAN HBA drivers.
Neill
May 13th, 2011 9:15am
Thanks, but we don't get any refused connections. What happens is that it fails over to another node in the cluster, and that node happily starts to accept connections.
May 13th, 2011 9:15am
I can be a little stubborn from time to time but I do think you need to look at poolmon.
Have you run the Exchange performance analyzer?
http://blogs.technet.com/b/exchange/archive/2005/12/07/415733.aspx
What happens when kernel memory resources are exhausted?
Symptoms of kernel memory exhaustion include:
Slow performance
Server crashes or cluster failovers
Errors that report complete exhaustion of system page table entries (PTEs) or kernel pool memory
May 13th, 2011 9:35am
Paul is correct. OWA may be using a lot more memory than it should but that might be 'real' ram rather than non-paged pool memory.
And the problem is that drivers (such as the HBA drivers mentioned above) won't show up in Task Manager as using any resources; that's why you have to dig deeper.
By the way what does the Best Practice Analyzer say about your system?
Neill
May 14th, 2011 11:18am
Running out of pool memory is often caused by drivers.
Each driver consumes kernel memory, and when usage goes over a certain limit, strange things happen.
Watch pool memory carefully and uninstall unnecessary drivers. Every KB counts if you're running on the edge.
lasse at humandata dot se, http://anewmessagehasarrived.blogspot.com
May 15th, 2011 8:23am
Has this been resolved?
I am curious.
Paul
May 23rd, 2011 2:36pm
Device drivers, filter drivers, excessive working set trimming; all of these can cause issues with NPP. Have you considered updating to a version of Exchange that runs on a 64 bit OS?
May 23rd, 2011 5:43pm
Running Poolmon on a production system isn't straightforward what with the registry edits. So we'll live with the occasional failovers and wait for our migration project.
We have dual network cards for both the normal and heartbeat networks, as well as SAN drivers so all these will gobble NPP memory.
Thanks anyway.
May 24th, 2011 5:03am
Enabling Tag Mode
Before running PoolMon, you must enable pool tagging and then restart your computer. The pool tagging feature collects and calculates statistics about pool memory sorted by the tag value of the memory allocation.
Note: It is not necessary to enable pool tagging in Windows Server 2003, as it is enabled by default.
You do not need to modify the registry for Windows 2003. All you have to do is run the tool from the folder you extracted the files to. It really doesn't change anything. I just tried it again on one of my 2003 servers.
Paul
P - Sorts tag list by Paged, Non-Paged, or mixed. Note that P cycles through each one.
B - Sorts tags by max byte usage.
M - Sorts tags by max byte allocation.
T - Sort tags alphabetically by tag name.
E - Display Paged, Non-paged total across bottom. Cycles through.
A - Sorts tags by allocation size.
F - Sorts tags by "frees".
S - Sorts tags by the differences of allocs and frees.
Q - Quit.
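To show what the P and B sort keys are getting at, here is a minimal Python sketch that filters pool tags to non-paged entries and ranks them by bytes. It does not read PoolMon output; the `PoolTag` type, the `SAMPLE` rows, and every number in them are invented purely for illustration, so type in the figures from your own PoolMon screen.

```python
from typing import NamedTuple


class PoolTag(NamedTuple):
    tag: str         # pool tag name
    pool: str        # "Nonp" or "Paged"
    allocs: int      # number of allocations
    frees: int       # number of frees
    bytes_used: int  # bytes currently held


def top_nonpaged(tags, limit=5):
    """Return the largest non-paged consumers, biggest first."""
    nonpaged = [t for t in tags if t.pool == "Nonp"]
    return sorted(nonpaged, key=lambda t: t.bytes_used, reverse=True)[:limit]


def outstanding(tag):
    """Allocations not yet freed (the difference of allocs and frees)."""
    return tag.allocs - tag.frees


# Hypothetical rows with made-up numbers, not measurements from any server.
SAMPLE = [
    PoolTag("MmCm", "Nonp", 120, 20, 40_000_000),
    PoolTag("File", "Nonp", 500_000, 480_000, 3_000_000),
    PoolTag("Thre", "Nonp", 9_000, 8_000, 6_000_000),
    PoolTag("MmSt", "Paged", 70_000, 60_000, 50_000_000),
]

for entry in top_nonpaged(SAMPLE, limit=3):
    print(entry.tag, entry.bytes_used, outstanding(entry))
```

A tag whose bytes and outstanding allocations keep climbing between snapshots is the usual sign of a leak, whereas a large but flat tag points to a driver that simply reserves a lot at startup.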
May 24th, 2011 8:45am
Any update on this?
lasse at humandata dot se, http://anewmessagehasarrived.blogspot.com
July 2nd, 2011 5:57am
Hi,
We've been experiencing this with a number of our customers; while the problem is experienced with the http virt serv, that is really a symptom of a more general lack of npp memory - the virtserv will shut down when npp is close to exhaustion, but without a poolmon it's difficult to prove that the virtserv is responsible for that exhaustion - we've had one customer who wasn't using owa at all, but the virtserv shutting down was triggering a failover, as exres pinged it every 5 minutes or so.
if you're in exchange 2003 with the 3gb switch running, then your npp memory is limited to 128MB... nice.
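To put that limit next to the figures from the original post (about 90 MB on average, about 120 MB at peak), here is a small arithmetic sketch in Python. The 128 MB ceiling is the one quoted above; the function names are just illustrative.

```python
NPP_LIMIT_MB = 128  # limit quoted in this thread for Exchange 2003 with /3GB


def headroom_mb(usage_mb, limit_mb=NPP_LIMIT_MB):
    """Non-paged pool left before the limit is reached."""
    return limit_mb - usage_mb


def percent_used(usage_mb, limit_mb=NPP_LIMIT_MB):
    """Usage as a percentage of the limit."""
    return 100.0 * usage_mb / limit_mb


# Figures reported in the original post.
for label, usage in (("average", 90), ("peak", 120)):
    print(f"{label}: {usage} MB used, {headroom_mb(usage)} MB free, "
          f"{percent_used(usage):.1f}% of limit")
```

At the reported peak only 8 MB is left, so a few extra megabytes taken by any driver can be enough to tip the node over.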
a poolmon is needed here to see what is using that npp. you shouldn't need to do anything in the registry at all if you're on win2k3 - we've never had to:
http://support.microsoft.com/kb/177415
without poolmon output everything else is guesswork, but a common problem i've seen is a huuuge mmcm usage - this is a contiguous memory block grabbed by drivers on startup. removing redundant network cards or storage devices can help reduce this. the ntdebugging blog has some good stuff on this: http://blogs.msdn.com/b/ntdebugging/archive/2009/10/27/mmcm-a-non-paged-pool-accounting-adventure.aspx
depending on which antivirus product you use, that might be grabbing a big bunch of it too; we've had issues where as the av drivers grow over time (each release is bigger than the last, it seems) eventually it squishes the amount of npp down to the point where the system is unstable. change your av.
one thing you could try without doing a poolmon is to enable aggressive memory recycling, with the registry key in this article:
Start Registry Editor (Regedt32.exe). Locate and then click the following key in the registry:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Memory Management
On the Edit menu, click Add Value, and then add the following registry value:
Value name: PoolUsageMaximum
Data type: REG_DWORD
Radix: Decimal
Value data: 60
http://support.microsoft.com/kb/312362
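Before adding the value, it may be worth checking whether it is already set. Below is a read-only Python sketch using the standard `winreg` module. It assumes Python is available on the server, which is not a given on a Windows 2003 box; the function name is made up for illustration, and nothing here writes to the registry.

```python
try:
    import winreg  # standard library, available on Windows only
except ImportError:
    winreg = None

# Key and value name as given in the steps above.
KEY_PATH = r"System\CurrentControlSet\Control\Session Manager\Memory Management"
VALUE_NAME = "PoolUsageMaximum"


def read_pool_usage_maximum():
    """Return the configured DWORD, or None if absent or not on Windows."""
    if winreg is None:
        return None
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, value_type = winreg.QueryValueEx(key, VALUE_NAME)
    except FileNotFoundError:
        return None  # key or value does not exist
    return value if value_type == winreg.REG_DWORD else None


if __name__ == "__main__":
    print(VALUE_NAME, "=", read_pool_usage_maximum())
```

A result of `None` means the value has not been created, so the system is running with its default behaviour.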
July 4th, 2011 5:48am
Thanks Ishmael, we have redundant network cards for both the public and private/cluster LAN so I think that's the first thing to try changing. Those drivers take a lot of npp memory.
I'll try the poolmon and look at the aggressive memory recycling key asap.
Much appreciated!
July 4th, 2011 6:05am
It is far easier to run poolmon than to switch out network card drivers!! Run poolmon like I said months ago. It is all guesswork until you run the tool. Without it, you are wasting your time.
Also, if you find the driver, program or what have you, you will most likely not need to modify your registry. "Hacking the reg" should be one of your last attempts.
July 4th, 2011 8:41pm