BSOD on DPM server 2012

Our DPM server went down this morning with a blue screen error. I don't have physical access to it, but it came back up after 3 hours. This server has a Microsoft patching issue that I haven't fixed yet: when these patches fail to install, DPM takes 3-4 hours to roll back the changes, so I had planned to fix the patches over the weekend.

But today the server crashed with a BSOD. I analyzed the dump file, which points to an issue in volsnap.sys. I searched for it and found that this happens during a volume snapshot. The symptom described in the KB article is: "you may experience this issue after you enable shadow copy functionality on a volume that is configured with a secondary Data Protection Manager (DPM) server for backup."

This is our primary DPM server, and I am backing up its databases, C: drive, and system state to another DPM server. I just want to confirm: did this happen while the secondary DPM was backing it up? I can see that the recovery point volume for the primary DPM's system state was full on the secondary DPM.

BSOD analysis:


Microsoft (R) Windows Debugger Version 6.3.9600.17298 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Users\vijakuma\Desktop\011115-114910-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available


************* Symbol Path validation summary **************
Response                         Time (ms)     Location
Deferred                                       SRV*C:\Windows\symbol_cache*http://msdl.microsoft.com/download/symbols
Symbol search path is: SRV*C:\Windows\symbol_cache*http://msdl.microsoft.com/download/symbols
Executable search path is: 
Windows 7 Kernel Version 7601 (Service Pack 1) MP (24 procs) Free x64
Product: Server, suite: TerminalServer SingleUserTS
Built by: 7601.18247.amd64fre.win7sp1_gdr.130828-1532
Machine Name:
Kernel base = 0xfffff800`0b613000 PsLoadedModuleList = 0xfffff800`0b8566d0
Debug session time: Mon Jan 12 11:34:13.020 2015 (UTC + 5:30)
System Uptime: 34 days 1:14:03.275
Loading Kernel Symbols
...............................................................
................................................................
........................
Loading User Symbols
Loading unloaded module list
................
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 19, {21, fffffa802cf22000, 44b0, 0}

Probably caused by : volsnap.sys ( volsnap!VspFreeBitMap+3d )

Followup: MachineOwner
---------

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

BAD_POOL_HEADER (19)
The pool is already corrupt at the time of the current request.
This may or may not be due to the caller.
The internal pool links must be walked to figure out a possible cause of
the problem, and then special pool applied to the suspect tags or the driver
verifier to a suspect driver.
Arguments:
Arg1: 0000000000000021, the data following the pool block being freed is corrupt.  Typically this means the consumer (call stack ) has overrun the block.
Arg2: fffffa802cf22000, The pool pointer being freed.
Arg3: 00000000000044b0, The number of bytes allocated for the pool block.
Arg4: 0000000000000000, The corrupted value found following the pool block.

Debugging Details:
------------------


BUGCHECK_STR:  0x19_21

POOL_ADDRESS: GetPointerFromAddress: unable to read from fffff8000b8c0100
GetUlongFromAddress: unable to read from fffff8000b8c01c0
 fffffa802cf22000 Nonpaged pool

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  WIN7_DRIVER_FAULT_SERVER

PROCESS_NAME:  System

CURRENT_IRQL:  0

ANALYSIS_VERSION: 6.3.9600.17298 (debuggers(dbg).141024-1500) amd64fre

LAST_CONTROL_TRANSFER:  from fffff8000b7bb9b2 to fffff8000b688bc0

STACK_TEXT:  
fffff880`029f0088 fffff800`0b7bb9b2 : 00000000`00000019 00000000`00000021 fffffa80`2cf22000 00000000`000044b0 : nt!KeBugCheckEx
fffff880`029f0090 fffff880`00db2dfd : 00000000`00000000 00000000`00000287 fffff880`6d536f56 00000000`00000000 : nt!ExDeferredFreePool+0xfaa
fffff880`029f0140 fffff880`00dcb117 : 00000000`0000007b 00000000`00000000 00000000`0007ccd2 00000000`0f4f0b00 : volsnap!VspFreeBitMap+0x3d
fffff880`029f0170 fffff880`00dcbcd6 : fffffa80`2279eff0 fffffa80`190a1060 fffffa80`2279eff0 00000000`00000000 : volsnap!VspMarkFreeSpaceInBitmap+0x1e7
fffff880`029f0360 fffff880`00dcde11 : fffffa80`1933a180 ffffffff`80003978 00000000`00000000 00000000`00000000 : volsnap!VspOptimizeDiffAreaFileLocation+0x2a6
fffff880`029f06e0 fffff880`00ddd68d : 00000000`00000001 fffffa80`214af901 fffff880`029f08b0 00000000`00004001 : volsnap!VspOpenDiffAreaFile+0x481
fffff880`029f0860 fffff880`00de45e7 : fffff880`029f0bc0 00000000`00000000 00000000`00000000 00000000`00003fe7 : volsnap!VspCreateInitialDiffAreaFile+0x1ed
fffff880`029f08b0 fffff880`00de55d6 : fffffa80`19226180 fffff880`00000000 fffffa80`19226180 fffffa80`19226180 : volsnap!VspTryPrepareForSnapshot+0x737
fffff880`029f0b90 fffff880`00dc30fc : fffffa80`4190b170 00000000`00000000 fffffa80`46a993b0 fffffa80`3d2a4290 : volsnap!VspPrepareForSnapshot+0x116
fffff880`029f0c50 fffff800`0b97f1d3 : fffffa80`19226030 fffff800`0b82e2d8 fffffa80`188adb50 fffffa80`188adb50 : volsnap!VspPostWorker+0x2c
fffff880`029f0c80 fffff800`0b692261 : fffff800`0b82e200 fffff800`0bc09a01 fffff800`0b82e200 fffffa80`188adb50 : nt!IopProcessWorkItem+0x23
fffff880`029f0cb0 fffff800`0b9252ea : 00000000`00000000 fffffa80`188adb50 00000000`00000080 fffffa80`1887a890 : nt!ExpWorkerThread+0x111
fffff880`029f0d40 fffff800`0b6798e6 : fffff880`02271180 fffffa80`188adb50 fffff880`0227c4c0 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
fffff880`029f0d80 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16


STACK_COMMAND:  kb

FOLLOWUP_IP: 
volsnap!VspFreeBitMap+3d
fffff880`00db2dfd 4c8b5b08        mov     r11,qword ptr [rbx+8]

SYMBOL_STACK_INDEX:  2

SYMBOL_NAME:  volsnap!VspFreeBitMap+3d

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: volsnap

IMAGE_NAME:  volsnap.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4ce792c8

IMAGE_VERSION:  6.1.7601.17514

FAILURE_BUCKET_ID:  X64_0x19_21_volsnap!VspFreeBitMap+3d

BUCKET_ID:  X64_0x19_21_volsnap!VspFreeBitMap+3d

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:x64_0x19_21_volsnap!vspfreebitmap+3d

FAILURE_ID_HASH:  {189b9ea3-05ac-4460-8e69-9d1f7014d79c}

Followup: MachineOwner
---------
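The four bugcheck parameters in the summary line are hexadecimal, so the allocation size in Arg3 is easier to read once converted. A minimal sketch that decodes the line shown above; the function name `parse_bugcheck` is hypothetical and only the values already present in this log are used:

```python
import re

# Summary line copied verbatim from the debugger output above.
SUMMARY = "BugCheck 19, {21, fffffa802cf22000, 44b0, 0}"


def parse_bugcheck(line):
    """Return (code, [arg1..arg4]) as integers; every field is hexadecimal."""
    match = re.fullmatch(r"BugCheck ([0-9A-Fa-f]+), \{(.*)\}", line.strip())
    if match is None:
        raise ValueError("not a BugCheck summary line: %r" % line)
    code = int(match.group(1), 16)
    args = [int(part.strip(), 16) for part in match.group(2).split(",")]
    return code, args


code, args = parse_bugcheck(SUMMARY)
print("bugcheck code  : 0x%X" % code)
print("Arg1 (subtype) : 0x%X" % args[0])
print("Arg2 (pool ptr): 0x%016X" % args[1])
print("Arg3 (bytes)   : %d" % args[2])
print("Arg4 (value)   : 0x%X" % args[3])
```

Arg3 works out to 17584 bytes, which is the size of the nonpaged pool block that volsnap was freeing when the corruption was detected.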

0: kd> lmvm volsnap
start             end                 module name
fffff880`00dae000 fffff880`00dfa000   volsnap    (pdb symbols)          c:\windows\symbol_cache\volsnap.pdb\6C762AD2B37146AA848E4A16BD846B852\volsnap.pdb
    Loaded symbol image file: volsnap.sys
    Mapped memory image file: c:\windows\symbol_cache\volsnap.sys\4CE792C84c000\volsnap.sys
    Image path: \SystemRoot\system32\drivers\volsnap.sys
    Image name: volsnap.sys
    Timestamp:        Sat Nov 20 14:50:08 2010 (4CE792C8)
    CheckSum:         000527ED
    ImageSize:        0004C000
    File version:     6.1.7601.17514
    Product version:  6.1.7601.17514
    File flags:       0 (Mask 3F)
    File OS:          40004 NT Win32
    File type:        3.7 Driver
    File date:        00000000.00000000
    Translations:     0409.04b0
    CompanyName:      Microsoft Corporation
    ProductName:      Microsoft Windows Operating System
    InternalName:     volsnap.sys
    OriginalFilename: volsnap.sys
    ProductVersion:   6.1.7601.17514
    FileVersion:      6.1.7601.17514 (win7sp1_rtm.101119-1850)
    FileDescription:  Volume Shadow Copy Driver
    LegalCopyright:   Microsoft Corporation. All rights reserved.
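The lmvm output reports volsnap.sys as 6.1.7601.17514 (win7sp1_rtm.101119-1850), while the kernel line earlier in the log shows a later build (7601.18247, win7sp1_gdr.130828-1532). The image timestamp can be decoded to confirm the driver's build date. A minimal sketch, assuming the debugger printed times in UTC+5:30 as the "Debug session time" line suggests; the function name is hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Value copied from the "Timestamp" / DEBUG_FLR_IMAGE_TIMESTAMP lines above.
IMAGE_TIMESTAMP_HEX = "4CE792C8"

# Assumption: the debugger's display time zone was UTC+5:30.
LOCAL_TZ = timezone(timedelta(hours=5, minutes=30))


def decode_pe_timestamp(hex_value):
    """Interpret a PE header timestamp as seconds since 1970-01-01 UTC."""
    return datetime.fromtimestamp(int(hex_value, 16), tz=timezone.utc)


built_utc = decode_pe_timestamp(IMAGE_TIMESTAMP_HEX)
built_local = built_utc.astimezone(LOCAL_TZ)
print("built (UTC)  :", built_utc.isoformat())
print("built (local):", built_local.isoformat())
```

The decoded value agrees with the "Sat Nov 20 14:50:08 2010" shown by lmvm, so the loaded driver dates from the original SP1 release. That fits the pending patching issue on this server, but it does not by itself prove that a newer volsnap.sys would fix the crash.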

January 12th, 2015 8:25pm

I've seen the Bad Pool Header 19 on my DPM servers, mostly those running DPM 2010. My BSODs would usually occur at the same time each night while the protection group was getting backed up. I've never been able to find the root cause, but changing the protection group backup time and then finding that the BSODs followed that time led me to believe that there was an issue with one of the servers in that protection group. On the protected servers, I've found things like DPM could not communicate with the server, or disk space issues either on the protected server or on the DPM storage pool itself. I've also thought that SQL Server 2008 was running out of resources. One other thing, balancing the workload helps. For example, having multiple protection groups with fewer protected servers and setting up separate protection groups for system state and SQL database backups really helped.
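One way to test whether crashes follow a backup window is to compare each crash time against the job history. A minimal sketch: the crash time is the "Debug session time" from the dump posted in this thread, while the job names and windows are hypothetical placeholders to be replaced with the real job history from the DPM console.

```python
from datetime import datetime, timedelta, timezone

# The dump in this thread reports its time as UTC + 5:30.
IST = timezone(timedelta(hours=5, minutes=30))

# Crash time taken from the "Debug session time" line of the dump.
crash = datetime(2015, 1, 12, 11, 34, 13, tzinfo=IST)

# Hypothetical job windows (name, start, end); not taken from the thread.
jobs = [
    ("PG-SystemState (hypothetical)",
     datetime(2015, 1, 12, 11, 0, tzinfo=IST),
     datetime(2015, 1, 12, 12, 0, tzinfo=IST)),
    ("PG-SQL (hypothetical)",
     datetime(2015, 1, 12, 2, 0, tzinfo=IST),
     datetime(2015, 1, 12, 3, 30, tzinfo=IST)),
]


def jobs_running_at(moment, job_windows):
    """Return the names of jobs whose window contains the given moment."""
    return [name for name, start, end in job_windows if start <= moment <= end]


running = jobs_running_at(crash, jobs)
print("jobs active at crash time:", running)
```

If the same job shows up across several crashes, that protection group is the first place to look.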

To mitigate the problem, I configured the Dell OpenManage Server Administrator utility to restart the computer when it senses a blue screen condition. This works well because the server gets restarted and backups continue throughout the night. Good luck with your issue, and be sure to report back if you find a solution.



  • Edited by Jarens Monday, January 12, 2015 9:49 PM
January 12th, 2015 9:42pm


I don't understand whether this is an issue with a volume of a data source that this DPM is protecting, or with the secondary DPM that is backing up this affected DPM.

How can I find the particular volume/data source that is causing the issue? I see a hotfix for DPM 2010 but not for DPM 2012.

  • Edited by Vijay MC Tuesday, January 13, 2015 8:36 PM
January 13th, 2015 6:43pm


I know it's a while back now, but did you ever get to the bottom of this issue? I have an identical issue with our Windows Server 2008 R2 server running DPM 2012 R2: same error messages, same exact version of volsnap.sys.
August 5th, 2015 6:30am

Since then I've rebuilt the servers with DPM 2012 R2 with the latest updates and have not seen this problem in the last six months.
August 5th, 2015 9:11am

This topic is archived. No further replies will be accepted.
