2012 SP 2 Issue

I have a warehouse that was running perfectly under 2012 SP1, it uses CDC with net enabled on the source tables to do incremental loads into the warehouse.

We migrated the databases over to a new SQL 2012 SP2 build (also later applied the hot fix for index rebuild), we are using SSIS and the CDC components for our integration.

So once we migrated a few of our tables were hanging on the CDC Source component beyond our timeout (around 10 minutes), typical CDC source time is 15 seconds.

I then tested the query output in the sis output in management studio and boom it works in under 15 seconds as expected,

I then did a Perlman on it and captured the real queries from SSIS, and the get_net_changes TVF call was wrapped in an sp_execute call, I copied it out and ran it in a SSMS window and it hangs, so I've eliminated everything but the query engine and how it behaves with sp_executesql, if I modify the sp_Executesql to include option (recompile) the sp_executesql for the TVF works in under 15 seconds as well.

Running a dbcc freeproccache just prior to running sp_executesql has no effect either.

What I'm believing is that either plan cache utilization or the plan being created is so horrid that the query never finishes, I've let ssms run for over 30 minutes and still not got a plan back.

Since I can't change the SSIS component to use option (recompile) nor is option recompile valid in a TVF (inline) as they are currently written, I'm more concerned that the behavior is the sign of a larger problem in the optimizer with SP2.

Any guidance in resolving this would be appreciated, at the moment we are considering downgrading the new box to SP1.

Additional information:

Tables giving us trouble are fairly wide, have fairly large PK (from a JDE system), and are fairly high volume change wise, if there are no records or very low records in the CT tables then the functions still perform fast, they also perform as expected when using the ALL instead of Net option.

Thanks

July 19th, 2014 7:21am

As with all performance we would need to see the query and the query plans. Preferrably, the actual ones, but it appears that your patience does not permit you get the actual plans for the problematic query. Any chance you can let it run overnight? But at least you can share the actual plan when you use OPTION (RECOMPILE).

It is not practical to post the plans here (we want the XML, not the graphics), but maybe you can find some place on Skydrive, Dropbox or similar?

Free Windows Admin Tool Kit Click here and download it now
July 19th, 2014 8:32am

By the way, this is issue may or may not be related to SP2. It may be as simple that it is because you restarted SQL Server the plan cache was flushed, and your statistics was not up to date. So try UPDATE STATISTICS, preferably WITH FULLSCAN, on all the involved tables.

July 19th, 2014 8:34am

Here are a few other things I've tried:

1) different database same table layouts

2) different table name, i.e. a fresh table

3) updating statitstics, rebuilding indexes etc.

4) I let it run all night last night and after 15 hours of running (I had thought the same thing just let it run long enough), it still hadn't completed, I cancelled the query in SSMS and it had returned 3 rows.

5) ran the query with the optimize recompile option took under 2 minutes (the longer its not run properly the more transactions build up in the ct table) and it returned just over 20k rows

6) did the update statistics again just for giggles as asked, and kicked it off its now been 15 minutes and not a single row returned yet.

It's worth noting we have somewhere around 60 table sin this warehouse, and only 2 or 3 are giving bad performance, but all the other tables work fine since they return a low row count every time, when the customer does a restatement of their GL it generates a large volume of balance transactions, even with optimizing out the ones where balances don't change.

Also worth noting, wiping stats on the SP1 etc. and it still performs very snappy.

Thanks for your time.

File:

https://onedrive.live.com/redir?resid=3BCF8EAFC423E576!29506&authkey=!ABIizCmyztyakqk&ithint=file%2c.xml

Free Windows Admin Tool Kit Click here and download it now
July 19th, 2014 2:25pm

So if I get this right, the full batch is:

  EXEC sp_excutesql N'SELECT ...',
  N'@cs varchar(varchar(22)), @ce varchar(22), @mode tinyint',
  '0x000C09D5000001CE0016', '0x000C2625000001F80007', @mode

Since the estimates are good for this plan, statistics are likely to be good. And if this is the batch, the optimizer can sniff these values to make the estimates. But there is one thing it can do when OPTION(RECOMPILE) is in effect, which it can't do without: and that is to shortcut any parts of the query that depends on @mode (for instance CASE @mode THEN...), as it most produce a plan that works for all values of @mode.

Well, there is also the possibility that it does not perform computation of those convert() expressions, leading to crazy estimates.

So for kicks, you could try:

* Pass @mode as a constant.
* Change the query, so that it uses binary(10) parameters and pass the binary values.

I am sure that it will not help you, and since you have a working situation in SP1 to compare with, it seems opening a case with Microsoft may be called for.

July 19th, 2014 7:49pm

Mode is a nvarchar, this is the standard CDC functions MS generates when you enable CDC for a table.

I did some parameter testing and here is the results:

1) adding an additional fake parameter (and including it in the column list so it doesn't get optimized out, was no effect.

2) changing mode to be static and removing it as a parameter was no effect

3) changing CE (the top bound of the query) to static was no effect

4) changing CS (the start of the bound for the query) to static fixed the query.

I have also included the SP1 query plan here for the same table, just different LSN's for CS and CE

https://onedrive.live.com/redir?resid=3BCF8EAFC423E576!29507&authkey=!AD9wEHkxR3eNBnk&ithint=file%2c.txt

Guess we will open a ticket with MS on Monday assuming they have a support contract going at the moment.

Everything were doing here is basic out of the box CDC with SQL enterprise and SSIS components, if I had written it all custom without the SP_ExecuteSQL as the CDC Source uses then we would of never seen this issue, lol.

John

Free Windows Admin Tool Kit Click here and download it now
July 19th, 2014 10:51pm

The two plans are entirely different. And it is interesting to note that the SP1 plan suffers from a misestimate. I thinks the @ce/@cs condition will hit one row, but it produces 6000 and the sort of these rows (which appears multiple times in the plan) spills to disk.

There is also another difference: the execution of the SP1 query produces 1000 rows, whereas the SP2 query produces 20227 rows.

So I am wondering if this cannot be a statistics thing after all. Have you updated the statistics on the table cdc.dbo_F092_MIN_CT?

Can you upload the estimated plan for the query on SP2? That is, the query that does not complete? This may not work with sp_executesql, but you could put the code in a temporary stored procedure and get the estimated plan for the procedure.

July 20th, 2014 7:57am

QP = Query Plan

QP SP2 performed a INDEX SCAN + KEY LOOKUP instead of CLUSTERED INDEX SEEK (QP SP1).

I'll start a research to check it out, but is a good idea to open a MS Case.

Free Windows Admin Tool Kit Click here and download it now
July 20th, 2014 7:37pm

I'll try to get one open tomorrow assuming the client has a support contract and all.  In SP1 given the nature of CDC we have had cycles of incremental loads take longer than expected due to unusual load level, but with SP2 the query never returns, 15+ hours and it has resolved 3 rows, their is no other user on the system than me and this one query, and the entire table should fit into memory, that's just amazingly bad.

The QP for sp2 where I did the option recompile was kicked off just after a statistics update fullscan,  I will later try to capture the query plan while it is running if the dmv will give it up with sp_whoisactive.

The reason the SP1 has only 1000 rows is that I am not feeding SP1 changes anymore, so I manually modified 1000 rows in the CDC source table to give it data, in SP2 anything over somewhere between 500-1000 rows the query won't return anymore.

But I agree if I can get a MSFT support person in a gotomeeting they can gather more info, I have full SQL DIag traces from while it was running until the query was cancelled, and perfmon trace as well, oddly enough the server doesn't go crazy for 15 hours, it never maxxes CPU but is frequently giving up its time slice and flopping between that and completion IO waits.

John

July 20th, 2014 9:48pm

and the entire table should fit into memory, that's just amazingly bad.

Yes, but if the optimizer thinks that it will only get one row to sort, it will set the memory grant according, and if the sort does not fit, it will spill to disk.

Free Windows Admin Tool Kit Click here and download it now
July 20th, 2014 9:56pm

We have opened a support incident with MS, I will let you know how it resolves as long as we get it resolved.

John

July 21st, 2014 3:27pm

We are now in the RFC stage for the development team, for us we had to downgrade to SP1 CU11 to avoid the issue until an official resolution can be determined, the problem was fully reproducible on MS servers so now its a waiting game.

Thanks again for your help,

Free Windows Admin Tool Kit Click here and download it now
July 24th, 2014 2:59pm

I'm following up the case .......898.

Let's wait for the sol

July 24th, 2014 3:02pm

We are now in the RFC stage for the development team, for us we had to downgrade to SP1 CU11 to avoid the issue until an official resolution can be determined, the problem was fully reproducible on MS servers so now its a waiting game.

Thanks for keeping us updated!

Free Windows Admin Tool Kit Click here and download it now
July 24th, 2014 10:19pm

Hello all,

As per the discussion we have had with the Product Group, the issue is When executing this SP_EXECUTESQL statement on a table valued function that can return a varying amount of rows the compiler uses a bad plan causing the performance of the query to almost never return.

The fix will be made avilable in SQL 2012 SP2 Cumulative update 2. Most likely the CU 2 will be made available in second week of September 2014.

For now the work around available are:

1. Downgrade SQL Server to SP1 latest Cumulative update
2. Use the Planguide with "Option (Recompile)" query hint for the query which is timing out.

Disclaimer: This content is provided "as-is" and without warranties of any kind, either express or implied. You should therefore verify any information contained in the content before acting on it.

August 20th, 2014 1:55pm

Hi all,

I am running into the same issue in a production environment that is currently running SQL 2012 SP2 11.2.5058.0. I took a look at the CU 2 Update and didn't see any reported fixes for this issue. Has this issue been resolved yet?

Thanks!

Free Windows Admin Tool Kit Click here and download it now
April 29th, 2015 9:02pm

I now notice that the issue was fixed in CU 3.

Thank you.

April 30th, 2015 5:20pm

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics