DPM 2012 Agent fails on Windows 2003 Server

Hello,

We have a Windows 2003 Server file server that our old DOS machines write out tests to. We would like to back up this data with System Center 2012 Service Pack 1 DPM, yet almost every day (it sometimes works) I get a critical error alert.

When I click the "View Detailed Errors" link I get the following:

  • Error details: The DPM service was unable to communicate with the protection agent on "SERVER" (ID 52 Details: The semaphore timeout period has expired (0x80070079))
  • The recommended actions are to restart the DPMRA on the client and to start DPM CPWrapper if the server is configured using certificates. It is not, so this doesn't help.
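
The code in the error details above can be unpacked to see what Windows is reporting underneath DPM. A minimal sketch in Python, assuming the standard HRESULT layout (severity bit, 11-bit facility, 16-bit code); `decode_hresult` and `describe` are hypothetical helper names, and the lookup table covers only the two codes that appear in this thread:

```python
FACILITY_WIN32 = 7  # HRESULTs that wrap a plain Win32 error use facility 7

# Only the two Win32 codes seen in this thread; not a complete table.
WIN32_NAMES = {
    2: "ERROR_FILE_NOT_FOUND",
    121: "ERROR_SEM_TIMEOUT",
}


def decode_hresult(hr):
    """Split a 32-bit HRESULT into its severity, facility and code fields."""
    hr &= 0xFFFFFFFF
    return {
        "failure": bool(hr >> 31),       # bit 31: 1 means failure
        "facility": (hr >> 16) & 0x7FF,  # bits 16-26: facility
        "code": hr & 0xFFFF,             # bits 0-15: error code
    }


def describe(hr):
    """Return a symbolic name for the HRESULT if this sketch knows it."""
    fields = decode_hresult(hr)
    if fields["facility"] != FACILITY_WIN32:
        return "non-Win32 facility %d" % fields["facility"]
    return WIN32_NAMES.get(fields["code"], "unknown Win32 error %d" % fields["code"])


if __name__ == "__main__":
    print(describe(0x80070079))
```

Read this way, 0x80070079 is Win32 error 121 (the semaphore timeout named in the alert text), which is a generic I/O or network timeout rather than something specific to DPM.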

I don't want to have to start the DPMRA every morning so that doesn't help either.

I checked the event logs this morning under "Applications and Services Logs --> DPM Alerts" and, at the same timestamp, found the following:

  • The back up to tape job failed for the following reasons: (ID: 3311)
  • The DPM service was unable to communicate with the protection agent on "SERVER NAME". (ID: 52)

There was however a "Warning" just prior to the above error that details the following:

  • The DPM protection agent on "SERVER" could not be contacted. Subsequent protection activities for this computer may fail if the connection is not established. The attempted contact failed for the following reason: (ID: 3122)
  • The protection agent operation on "SERVER" failed because the service did not respond. (ID: 316)

I have gone through all the error IDs, and ID 316 is reportedly a firewall issue. When I uninstalled and reinstalled the protection agent on the Windows 2003 server a little while ago (in the hope that it would fix this issue), the DPM admin console informed me that this could only be done by installing it manually, so I followed some instructions on how to do that. I'm not sure whether that is why I'm having these issues, but everything seems to check out on the client in terms of the firewall.
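
One way to double-check the firewall side is to test TCP reachability from the DPM server to the protected server. A rough sketch in Python, assuming the commonly documented DPM ports (135 for the DCOM endpoint mapper, 5718 and 5719 for the agent); the port list and the helper names `port_open` and `check_ports` are assumptions, not something confirmed in this thread:

```python
import socket

# Assumed ports: 135 (DCOM endpoint mapper), 5718 and 5719 (DPM agent channels).
ASSUMED_DPM_PORTS = (135, 5718, 5719)


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_ports(host, ports=ASSUMED_DPM_PORTS, timeout=3.0):
    """Map each port to whether it accepted a connection."""
    return {port: port_open(host, port, timeout) for port in ports}
```

Calling `check_ports("SERVER")` from the DPM server, with the real name in place of the placeholder, returns a dictionary of port to reachability. Because DPMRA starts on demand, a closed result on the agent ports while no job is running is inconclusive; the DCOM port is the more telling one when the agent is idle.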

Any help on this issue will be appreciated, because I would rather it worked seamlessly than have to go into both the client server and the DPM server and restart the backups myself.

January 28th, 2015 1:37am

Hi,

Have you tried changing any networking components (NIC, cable, port, etc.)? This looks like a packet loss issue.

Try this workaround: increase TcpMaxDataRetransmissions to 10 or more on both the DPM server and the protected server.

How to modify the TCP/IP maximum retransmission timeout
http://support.microsoft.com/kb/170359
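
To apply the same value consistently on both machines, the change can be captured in a .reg file. A minimal sketch in Python that only generates the file text, assuming the value is a REG_DWORD under the usual Tcpip\Parameters key; `build_reg_file` is a hypothetical helper name, and the output should be reviewed before it is imported anywhere:

```python
# Assumed location of the TCP/IP parameters on Windows Server 2003.
TCPIP_PARAMETERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
)


def build_reg_file(retransmissions=10):
    """Return .reg file text that sets TcpMaxDataRetransmissions."""
    if not 0 <= retransmissions <= 0xFFFFFFFF:
        raise ValueError("value must fit in a 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[%s]" % TCPIP_PARAMETERS_KEY,
        # .reg files store DWORD values as eight hexadecimal digits.
        '"TcpMaxDataRetransmissions"=dword:%08x' % retransmissions,
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file(10))
```

Changes to TCP/IP parameters typically need a restart before they take effect, so a test run straight after the edit may not reflect the new value.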

January 28th, 2015 3:11am

Hello Mike,

Thanks for your reply. Not long after I posted my question I came across another post about the same issue that you had previously replied to. I have followed the link you provided and increased TcpMaxDataRetransmissions to 10 on both servers.

I will monitor this over the coming days since it doesn't always fail and will post the results.

January 28th, 2015 3:16am

Hi Mike,

Unfortunately the fix you provided hasn't helped. I'm still receiving the same issues as before.

When I restart DPMRA like it suggests I'm able to run my backups fine again. 

I have noticed that the event log on the affected server shows a DCOM issue.

Do you think this may be contributing to the problems I'm having? I have allowed the communication through the firewall.

If you have other fixes, I'm happy to try them. For the moment I will monitor it to see whether the changes I have made fix the issue.

Thank you.


February 1st, 2015 4:13pm

Hi,

The DPMRA service on the protected server should stop when there are no active jobs running and start on demand when the DPM server initiates a new backup. Can you verify that the startup type is Manual for the DPMRA service, then check whether it properly shuts down between backup jobs?
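
Both checks can be scripted by reading the text that `sc query DPMRA` and `sc qc DPMRA` print on the protected server. A small sketch in Python that parses that text, assuming the usual `STATE : n NAME` and `START_TYPE : n NAME` layout; `parse_sc_output` and `idle_as_expected` are hypothetical helper names, and the exact layout should be compared against real output before relying on it:

```python
import re

# Matches lines such as "        STATE              : 4  RUNNING".
FIELD_RE = re.compile(
    r"^\s*(STATE|START_TYPE)\s*:\s*(\d+)\s+([A-Z_]+)", re.MULTILINE
)


def parse_sc_output(text):
    """Extract the STATE and START_TYPE labels found in sc.exe output."""
    return {name: label for name, _code, label in FIELD_RE.findall(text)}


def idle_as_expected(query_text, config_text):
    """True if the service is demand-start and stopped, as it should be between jobs."""
    state = parse_sc_output(query_text).get("STATE")
    start_type = parse_sc_output(config_text).get("START_TYPE")
    return start_type == "DEMAND_START" and state == "STOPPED"
```

If `idle_as_expected` returns False well after the last job finished, the service did not shut itself down.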

February 2nd, 2015 9:44am

Hello,

I can confirm that DPMRA is set to Manual. This morning I had another failed backup and noticed that the DPMRA service was still running. I stopped it manually, then went back to the DPM server and was able to re-run the backup job. There seems to be an issue with it shutting down between jobs.

What are the steps involved in fixing DPMRA to ensure it shuts down properly between backup jobs?

February 2nd, 2015 2:52pm

Hi,

Perhaps something is wrong with the Service Control Manager. In the DPMRA*.Errlog on the protected server, you should see log entries like the following when it's time to shut down the service.

<snip>
1E24 21CC 02/01 15:05:07.956 05 genericagent.cpp(266) [00000000011F0820]  NORMAL Agent Can Shutdown if there is only default wokitem active[1]
1E24 21CC 02/01 15:05:07.956 29 dpmra.cpp(354) [00000000011F0820]  NORMAL CDPMRA::Shutting down dpmra, force-shutdown :yes
1E24 21CC 02/01 15:05:07.956 03 workitem.cpp(415)   NORMAL Timing out WI [00000000012D61E0], WI GUID = {B71B4544-7067-4A30-B5FB-BA320B10D82A}, ..last DM activity happened 332017500msec back, WI Idle Timeout = 390000msec
1E24 21CC 02/01 15:05:07.956 22 genericthreadpool.cpp(684) [00000000012D9AE0]  NORMAL CGenericThreadPool: Waiting for threads to exit
1E24 21CC 02/01 15:05:09.981 22 genericthreadpool.cpp(684) [00000000011F3AC0]  NORMAL CGenericThreadPool: Waiting for threads to exit
1E24 20C4 02/01 15:05:11.983 03 timer.cpp(513) [00000000012D1508]  ACTIVITY Shutting down timer thread.
1E24 21CC 02/01 15:05:11.983 03 service.cpp(81)   ACTIVITY CService::StopThisService
1E24 21CC 02/01 15:05:11.983 03 service.cpp(281) [0000000000A5F840]  ACTIVITY CService::StopService()
1E24 1710 02/01 15:05:11.983 03 service.cpp(298) [0000000000A5F840]  ACTIVITY CService::AnnounceServiceStatus
1E24 1710 02/01 15:05:11.984 03 runtime.cpp(603) [00000000011F3AC0]  NORMAL CDLSRuntime::Uninitialize, bForce: 1
1E24 1710 02/01 15:05:11.984 03 service.cpp(298) [0000000000A5F840]  ACTIVITY CService::AnnounceServiceStatus
>snip<
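
To find out whether this shutdown sequence happens after each job, the log can be scanned for the same markers. A minimal sketch in Python, assuming every entry follows the layout of the excerpt above (process ID, thread ID, date, time, a numeric field, source file and line, an optional bracketed pointer, level, message); `parse_line` and `shutdown_events` are hypothetical helper names:

```python
import re

LINE_RE = re.compile(
    r"^(?P<pid>[0-9A-F]+) (?P<tid>[0-9A-F]+) "
    r"(?P<date>\d\d/\d\d) (?P<time>\d\d:\d\d:\d\d\.\d+) "
    r"(?P<task>\d+) (?P<source>\S+\(\d+\))\s+"
    r"(?:\[(?P<ctx>[0-9A-F]+)\]\s+)?"
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)

# Message fragments taken from the excerpt that mark a clean shutdown.
SHUTDOWN_MARKERS = ("Shutting down dpmra", "CService::StopService")


def parse_line(line):
    """Return the fields of one log entry, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def shutdown_events(lines):
    """Collect (date, time, message) for every entry that contains a shutdown marker."""
    events = []
    for line in lines:
        entry = parse_line(line)
        if entry and any(marker in entry["message"] for marker in SHUTDOWN_MARKERS):
            events.append((entry["date"], entry["time"], entry["message"]))
    return events
```

Feeding the lines of a DPMRA*.Errlog file to `shutdown_events` gives one pair of entries per clean shutdown. A failed backup with no matching shutdown after the previous job would point at the service staying up.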

 

February 2nd, 2015 3:07pm

Yes, I do have entries the same as in your post. However, I did notice this repeating throughout:

WARNING Failed: Hr: = [0x80070002] : Error trying to open RegKey [HKLM\Software\Microsoft\Microsoft Data Protection Manager\Agent\2.0\Certificates\PROTECTED SERVER]


February 2nd, 2015 3:33pm

Hi,

Right, if you are not using certificate-based authentication, then those errors are normal.

So the issue seems to be that the agent intermittently gets into a nonresponsive state and has to be restarted manually before backups succeed?

Please make sure you are running the latest update rollup #8 for DPM 2012 SP1 and update the

February 2nd, 2015 7:50pm

Hello,

Yes, that seems to be the problem. Sometimes the service is running fine and I have no problems. Other times I have to restart the service and kick off the backup again. It tends to fail more often than it works, though.

February 2nd, 2015 7:56pm

I can confirm that all the updates have been applied including the "update rollup #8". I have looked at the Protection agent version and it is: 4.1.3465.0. I'm still experiencing the same issue.
February 4th, 2015 3:01pm

Hi,

Is this problem isolated to this single server? Do you have other Windows 2003 servers that work OK?

To me it sounds like the DPM agent is a victim of some Windows infrastructure (COM/DCOM) issue. Check whether you have the latest COM fixes installed.

Availability of Windows Server 2003 Post-Service Pack 2 COM+ 1.5 Hotfix Rollup Package 12

FIX: The COM+ Event System does not deliver timely or reliable statistics to subscribers of the IComTrackingInfoEvents event interface in Windows Server 2003

The COM+ Event System stops processing the query for matching subscriptions when it detects a corrupted subscription on a Windows Server 2003-based computer

February 4th, 2015 4:02pm

Sorry for the late reply. Long weekend in my country among other things.

I have tried to install those hotfixes with no luck. I continually get the message "Not enough storage is available to process this command". I have looked into fixes for that and applied them, but I still get the same message, even though I have set "IRPStackSize" to 40 (decimal) and rebooted the server.

We don't have another 2003 server being backed up but I think I may set up a new one over the week and test it. Will be back with news shortly. 

Thanks for all your help to date. Hopefully will have it sorted soon.

 
February 9th, 2015 8:20pm

Hello,

Really sorry for the late reply, but there has been some development and I thought I should let you know for your own peace of mind.

Our SAN and some of our server blades were only using one out of the 2 fiber paths available to them. Once these second paths were opened we haven't had any failures of this kind.

We think the issue may have been a bottleneck of sorts that was somehow restricting DPMRA communication. It really is a strange case, since it just started working out of nowhere.

Nevertheless, I'm thankful for your help with this issue.

  • Marked as answer by i6Shot Wednesday, April 22, 2015 4:48 AM
April 22nd, 2015 12:48am


This topic is archived. No further replies will be accepted.
