Exchange Average Disk Queue Length
My Microsoft Exchange Server 2003 will "randomly" slow down to the point where Outlook clients intermittently receive the message "Outlook is trying to retrieve data from the Microsoft Exchange server". When I run Performance Monitor on the server at these times, Average Disk Queue Length is pegged at 100%. We had an Exchange health check performed and were told everything was fine.
We have about 800 users with 3 storage groups and 11 databases of 80 GB each. We use RAID 5 for the databases and RAID 10 for the logs on the SAN. Does anyone know of anything that would cause this spike of activity and slow down Outlook clients?
Thank you for your help.
February 12th, 2011 1:55pm
You need to get the databases off RAID 5 and onto RAID 10, like you're doing for your logs. RAID 5 carries too high a write I/O penalty and is not recommended for database partitions. Focus on PhysicalDisk\Average Disk sec/Read and PhysicalDisk\Average Disk sec/Write rather than the disk queue length. Both should be under 10 ms. Once they are consistently over 20 ms, you will start to see noticeable performance issues on the clients.
If you want to do the math to determine whether it's a disk bottleneck (which it likely is), you will have to calculate the IOPS per user and then determine how many IOPS your disks can support in your RAID 5 configuration. The article below goes over how to calculate these values.
A few basic concepts in disk sizing
http://msexchangeteam.com/archive/2004/10/11/240868.aspx
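To make the RAID 5 versus RAID 10 comparison concrete, here is a minimal Python sketch of the usual write-penalty arithmetic. The penalties used (4 back-end I/Os per write for RAID 5, 2 for RAID 10) are the commonly quoted figures. The disk count, per-disk IOPS, and read fraction are illustrative assumptions, not measurements from this server, and `usable_iops` is a hypothetical helper name.

```python
# Commonly quoted number of back-end I/Os generated per front-end write.
WRITE_PENALTY = {"RAID5": 4, "RAID10": 2}


def usable_iops(disks, iops_per_disk, read_fraction, raid_level):
    """Front-end IOPS an array can sustain for a given read/write mix."""
    raw = disks * iops_per_disk
    write_fraction = 1.0 - read_fraction
    # Each read costs 1 back-end I/O; each write costs the RAID write penalty.
    cost_per_io = read_fraction + write_fraction * WRITE_PENALTY[raid_level]
    return raw / cost_per_io


# Assumed values for illustration only: 8 disks at roughly 180 IOPS each
# and a workload that is 75% reads.
for level in ("RAID5", "RAID10"):
    print(level, round(usable_iops(8, 180, 0.75, level)))
```

The result can then be compared with the number of users multiplied by the IOPS per user measured at peak load.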
James Chong MCITP | EA | EMA; MCSE | M+, S+ Security+, Project+, ITIL msexchangetips.blogspot.com
February 12th, 2011 2:20pm
I just added PhysicalDisk\Average Disk sec/Read and PhysicalDisk\Average Disk sec/Write for each disk in Performance Monitor:
some disks show Average Disk sec/Read as 38 on average with a max of 312, and some show 6 on average;
Average Disk sec/Write is under 10 for all disks.
So, what is the unit of the average value on these counters? ms?
Also, MSExchangeIS\RPC Average Latency is about 11.
Thanks.
February 12th, 2011 3:34pm
38 is high, and a max of 312 is very high. Keep in mind it's also the weekend; I'm not sure if that means less user concurrency where you are. You want to take these measurements during peak-load business hours.
PhysicalDisk\Average Disk sec/Read < 10ms with spikes <50ms
PhysicalDisk\Average Disk sec/Write <10ms with spikes <50ms
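As a rough illustration of how these thresholds could be applied to logged counter values, here is a small Python sketch. It assumes the samples are in seconds, which is how Performance Monitor reports these counters, and it uses the 10 ms average, 50 ms spike, and 20 ms figures mentioned in this thread. The function name and the "ok" / "watch" / "problem" labels are hypothetical.

```python
def classify_latency(avg_seconds, max_seconds):
    """Classify Average Disk sec/Read or sec/Write values.

    Inputs are in seconds, so 0.010 means 10 ms.
    """
    avg_ms = avg_seconds * 1000.0
    max_ms = max_seconds * 1000.0
    # Healthy: average under 10 ms with spikes under 50 ms.
    if avg_ms < 10 and max_ms < 50:
        return "ok"
    # Borderline: average still at or below 20 ms, or spikes too high.
    if avg_ms <= 20:
        return "watch"
    # Average consistently above 20 ms: clients will notice.
    return "problem"


print(classify_latency(0.006, 0.030))
print(classify_latency(0.038, 0.312))
```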
The whitepaper below has all the perf metrics.
Troubleshooting Microsoft Exchange Server Performance
http://www.microsoft.com/downloads/en/details.aspx?familyid=8679f6bd-7ff0-41f5-bdd0-c09019409fc0&displaylang=en
James Chong MCITP | EA | EMA; MCSE | M+, S+ Security+, Project+, ITIL msexchangetips.blogspot.com
February 12th, 2011 4:49pm
On Sat, 12 Feb 2011 19:20:12 +0000, Jamestechman wrote:
>You need to get off RAID5 and use RAID10 like you're doing for your logs.
I was going to say "switch the databases to RAID10 and the logs to
RAID1". RAID10 seems like overkill for log files.
---
Rich Matheisen
MCSE+I, Exchange MVP
February 12th, 2011 4:55pm
Good point.
James Chong MCITP | EA | EMA; MCSE | M+, S+ Security+, Project+, ITIL msexchangetips.blogspot.com
February 12th, 2011 4:57pm
>PhysicalDisk\Average Disk sec/Read < 10ms
That should show as 0.01 on average in the performance graph, right? Thanks.
February 12th, 2011 10:16pm
Can we use RAID 5 for the logs?
BTW, we only have 9 disks on the SAN for Exchange, and they are also shared with other apps.
Will more disks help too?
Thank you.
February 12th, 2011 10:19pm
James,
Thanks for the link.
I enabled write caching on the disk. Do you know how I should enable the disk cache on the disk controllers?
Or is that what I already did?
Thank you.
February 12th, 2011 11:05pm
On Sun, 13 Feb 2011 03:19:46 +0000, John JY wrote:
>
>
>Can we use the RAID 5 for logs?
>
>BTW, we only have 9 disks on the SAN for exchange and also shared with Apps too.
>
>will more disks help too?
Maybe. If your workload needs, say, 2000 IOPS and you only have
enough disks to deliver 1,500 then you'll still have a deficit and a
problem.
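A minimal Python sketch of that deficit check follows. The 2000 and 1,500 figures are the example above; the 180 IOPS per disk is an assumed value for illustration, and the function name is hypothetical. It treats both numbers as back-end IOPS, so any RAID write penalty has to be factored in before calling it.

```python
import math


def disks_short(required_iops, deliverable_iops, iops_per_disk):
    """Return (IOPS deficit, extra disks needed to cover that deficit)."""
    deficit = max(0, required_iops - deliverable_iops)
    return deficit, math.ceil(deficit / iops_per_disk)


# 180 IOPS per disk is an assumption, not a measured value.
deficit, extra = disks_short(2000, 1500, 180)
print(deficit, extra)
```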
---
Rich Matheisen
MCSE+I, Exchange MVP
February 13th, 2011 1:17am
Yes, 0.01 on the graph is 10 ms. And yes, write caching should be enabled, which is why you typically see lower latency times, well below 10 ms.
James Chong MCITP | EA | EMA; MCSE | M+, S+ Security+, Project+, ITIL msexchangetips.blogspot.com
February 13th, 2011 12:17pm
Because this is Exchange 2003, we are limited to 4 storage groups. It is also recommended to keep each database under 100 GB.
We have 3 SGs and 11 DBs, each under 100 GB, and one SG with PUB.
Would fewer DBs of over 100 GB each help?
Thank you.
February 13th, 2011 4:22pm
>Yes write caching should be enabled which is why you typically see lower latency times well below 10ms.
Do you know whether I enable it through Disk Management or the BIOS?
Thank you.
February 13th, 2011 4:36pm
I think it is:
PhysicalDisk\Average Disk sec/Read > 20 ms (0.02 in perfmon) is a problem
PhysicalDisk\Average Disk sec/Read > 50 ms (0.05 in perfmon) is a big problem
mcse 200x + messaging 2000 2003 2007 2010
February 13th, 2011 5:03pm
Typically through the array controller's management software. Fewer DBs of over 100 GB doesn't equate to fewer IOPS. IOPS is based on I/O activity, not the size of the DB, so it won't help.
James Chong MCITP | EA | EMA; MCSE | M+, S+ Security+, Project+, ITIL msexchangetips.blogspot.com
February 13th, 2011 6:39pm
On Sun, 13 Feb 2011 21:22:22 +0000, John JY wrote:
>
>
>due to exchange 2003, only 4 storage groups. also, recommended to use less than 100 GB for each database.
>
>We have 3 SG and 11 DBs with 100 GB less and on SG with PUB.
>
>Will less DBs with 0ver 100 GB help?
IOPS isn't a function of database size. I/O activity is a result of
the activity in the database.
A smaller database will typically have fewer mailboxes in it so it's
reasonable to assume that fewer mailboxes would result in less
activity. But you'd still need to have an adequate number of disks to
support the IOPS.
---
Rich Matheisen
MCSE+I, Exchange MVP
February 13th, 2011 6:55pm
I have been monitoring, and we do see disk latency higher than 20 ms.
Another observation: if a user sends a large attachment (>100 MB), the server queues up, the total (paged + nonpaged pool) reaches over 200 MB, and the server becomes very slow and cannot process any messages. We end up rebooting the server to allow the messages in the queue to process. We have 4 GB RAM with /3GB /USERVA=3030 enabled.
Is this normal behavior?
Thank you.
February 17th, 2011 10:25am
Yes, that would be expected. Sending a message that big can cause log buffer stalls, and your version store memory gets backed up, leading to the typical event ID 623.
James Chong MCITP | EA | EMA; MCSE | M+, S+ Security+, Project+, ITIL msexchangetips.blogspot.com
February 17th, 2011 11:47am
What speed are the hard drives? You will want to be using 15k RPM drives, not 7.2k...
February 17th, 2011 11:54am
On Thu, 17 Feb 2011 15:25:25 +0000, John JY wrote:
>I monitored so far and we do have high disk latency than 20ms.
>
>Another observation: If users send large attachment (>100MB), the server just queued up and the total (paged + nonpaged pool) reaches over 200MB and server could not process any message and very slow. We end up to reboot the server to allow the messages in the queue to process. We have 4 GB RAM and with /3GB /USERVA=3030 enabled.
>
>Is this normal behavior?
Were you monitoring the performance at the time that 100MB message was
sent? Looking at an "all-day average" might show you that everything
looks okay, but if you narrow the window and frame the time the
message was sent and the time the message was received you may see
something different.
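To illustrate why the window matters, here is a short Python sketch that compares an overall average with the worst average over a short run of consecutive samples. The latency values are made up for the example, and `worst_window` is a hypothetical helper, not part of any monitoring tool.

```python
from statistics import mean


def worst_window(samples, window):
    """Return the highest mean over any run of `window` consecutive samples."""
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    return max(mean(samples[i:i + window])
               for i in range(len(samples) - window + 1))


# Hypothetical latency samples in ms: mostly quiet, with one short burst.
samples = [5] * 20 + [300, 320, 310] + [5] * 20
print(round(mean(samples), 1), worst_window(samples, 3))
```

The overall mean stays modest while the three-sample window around the burst shows latency far above any of the thresholds discussed earlier.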
Is this "normal"? Well, I don't think 100MB messages are normal. :-)
If you're seeing paged/non-paged memory pools increasing during that
time I'd suspect I/O problems. I/O operations consume those pools
while the buffers are being written. Only when the I/O completes can
the control structures for the I/O be removed. Having lots of
outstanding I/O operations can be a real problem if you run out of
paged/non-paged memory pool space.
Also, have a look at whatever Exchange-aware anti-virus you're
running. Those messages have to be scanned by the A-V software, too.
---
Rich Matheisen
MCSE+I, Exchange MVP
February 17th, 2011 11:00pm
It's just very difficult to set a size limit here. Can you share what attachment size limit you enforce for sending? (By default, it's 10 MB.)
We are trying to set a 50 MB limit. Is that still too big?
Thank you.
February 22nd, 2011 3:04pm
The average limit between organizations or ISPs is 10-20 MB. 50 MB is pretty big; email should not really be used for file transfers. If you need to send files that big, opt for another solution such as FTP.
James Chong MCITP | EA | EMA; MCSE | M+, S+ Security+, Project+, ITIL msexchangetips.blogspot.com
February 22nd, 2011 3:13pm
Do you know whether there is a way to limit the size of messages that internal users send each other, separately from the outbound and inbound message size limits?
Thank you.
February 22nd, 2011 5:42pm
Yes, you would have to set each recipient's send restriction higher than the global limit.
James Chong MCITP | EA | EMA; MCSE | M+, S+ Security+, Project+, ITIL msexchangetips.blogspot.com
February 22nd, 2011 5:53pm
If I set the global limit to 10 MB, can individual users be set to send messages larger than 10 MB?
What's the difference between the size limit set on the connector and the one set on the SMTP virtual server?
Can the size limit set on the connector override the global limit?
Thank you.
February 22nd, 2011 9:22pm
Messages can traverse an SMTP virtual server but not necessarily a connector.
No, message limits on the connector cannot override the global limit. If a message larger than the global limit leaves a connector and tries to go out to the internet, the global limit will stop it.
How to set size limits for messages in Exchange Server
http://support.microsoft.com/kb/322679
James Chong MCITP | EA | EMA; MCSE | M+, S+ Security+, Project+, ITIL msexchangetips.blogspot.com
February 23rd, 2011 2:29pm
What if you put the base OS on its own drive (RAID 1), the logs on their own (RAID 1), and the DBs on their own (RAID 5)?
February 28th, 2011 9:34pm
Other good reads:
Ruling Out Disk-Bound Problems (Exchange 2003)
Monitoring Mailbox Servers (Exchange 2007)
Mike Crowley | MVP
My Blog --
Planet Technologies
June 19th, 2012 11:35am