Help needed in Performance Tuning

Version : 2008 R2

Hi,
I am trying to build a scalable proc that does the computation logic. Currently I have 50k users, and with the future in mind I am targeting this logic at 500k users.

Sample table schema and test data

create table Comp_Detail(IDcompDetail int identity(1,1) primary key,IDcompAFR int,CompID int default 1050,UserId bigint,
TransferAmount money, processStatus bit default 0,AmtTransferDate datetime);


--===== Populate the Comp_Detail table on-the-fly
;WITH
cteRowSource1 AS
(
 SELECT TOP 500000
        N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
   FROM      master.sys.all_columns ac1
  CROSS JOIN master.sys.all_columns ac2
)
INSERT INTO Comp_Detail (IDcompAFR, CompID, UserId, TransferAmount)
 SELECT  IDcompAFR      = 1000
        ,CompID         = 1050
        ,UserId         = ISNULL(N, 0)
        ,TransferAmount = N * 10
    FROM cteRowSource1;

--	 select * from Comp_Detail

--===== Create and populate the balance table on-the-fly

Create table User_bank(IDUserBank int identity(1,1) primary key, UserId bigint,Amount_Pend money,Amount_Available money,
LastModDt  datetime);

;WITH
cteRowSource AS
(
 SELECT TOP 500000
        N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
   FROM      master.sys.all_columns ac1
  CROSS JOIN master.sys.all_columns ac2
)
Insert into User_bank(UserId,Amount_Pend,Amount_Available)

 SELECT UserId           = ISNULL(N,0)
        ,PendingAmount   = N*10
        ,AvailableAmount = N*2
   FROM cteRowSource

-- select * from User_bank;

update Comp_Detail set IDcompAFR = 1001 where IDcompDetail > 10000 and IDcompDetail < 20000 ;

update Comp_Detail set IDcompAFR = 1002 where IDcompDetail > 20000 and IDcompDetail < 30000;

update Comp_Detail set IDcompAFR = 1003 where IDcompDetail > 30000 and IDcompDetail < 40000;

update Comp_Detail set IDcompAFR = 1004 where IDcompDetail > 40000 and IDcompDetail < 50000;

update Comp_Detail set IDcompAFR = 1005 where IDcompDetail > 50000 and IDcompDetail < 60000;

My logic below,

Declare @CompID int = 1050;

BEGIN
	-- Check if any data available to be processed
	IF EXISTS (
			SELECT TOP 1 IDcompAFR
			FROM Comp_Detail 
			
			WHERE CompID = @CompID
				AND coalesce(processStatus, 0) = 0
			)
	BEGIN
		BEGIN TRY
			-- Set it so if the first UPDATE fails,  we won't even start the second update.This really says "If we're in a transaction
			-- and something fails, stop processing the transaction and do a rollback if we can".
			SET XACT_ABORT ON;

			-- temp variable to hold the actual data. this will be used to get IDcompAFR once the balance updated
			DECLARE @ActualData TABLE (
				UserId BIGINT
				,IDcompAFR BIGINT
				,ProcessingAmount MONEY
				);
			-- table variable to capture the Affected UserId's
			DECLARE @AffecedRecords TABLE (UserId BIGINT);

			BEGIN TRANSACTION;

			-- Get the whole data to be processed. 
			INSERT INTO @ActualData (
				UserId
				,IDcompAFR
				,ProcessingAmount
				)
			SELECT UserId
				,IDcompAFR
				,ProcessingAmount = COALESCE(TransferAmount, 0)
			FROM  Comp_Detail 
			WHERE CompID = @CompID
				AND coalesce(processStatus, 0) = 0
				;
			
			-- Aggregate the ProcessingAmount based on UserId
			WITH AggregateData
			AS (
				SELECT UserId
					,ProcessingAmount = SUM(COALESCE(ProcessingAmount, 0))
				FROM @ActualData
				GROUP BY UserId
				)
			--Do the Amount update and capture the UserId that are affected.
			UPDATE UB
			SET UB.Amount_Available = COALESCE(UB.Amount_Available, 0) + AD.ProcessingAmount
				,UB.Amount_Pend = COALESCE(UB.Amount_Pend, 0) - AD.ProcessingAmount
				,LastModDt = getdate()
			OUTPUT deleted.UserId
			INTO @AffecedRecords(UserId)
			FROM User_bank UB
			INNER JOIN AggregateData AD ON UB.UserId = AD.UserId;

			--===== Using the captured UserId get the IDcompAFR from @ActualData temp variable
			-- and then update the processStatus = 1 
			---  means OFR processed for the trip . 
			UPDATE Comp_Detail
			SET processStatus = 1
				,AmtTransferDate = getdate()
			WHERE IDcompAFR IN (
					SELECT DISTINCT AD.IDcompAFR
					FROM @ActualData AD
					INNER JOIN @AffecedRecords AR ON (AD.UserId = AR.UserId)
					)
				AND processStatus = 0;

			COMMIT TRANSACTION;
		END TRY

		BEGIN CATCH
			DECLARE @ErrorMessage NVARCHAR(4000);
			DECLARE @ErrorSeverity INT;
			DECLARE @ErrorState INT;

			SELECT @ErrorMessage = ERROR_MESSAGE()
				,@ErrorSeverity = ERROR_SEVERITY()
				,@ErrorState = ERROR_STATE();

			ROLLBACK TRANSACTION;

			RAISERROR (
					@ErrorMessage
					,@ErrorSeverity
					,@ErrorState
					);
		END CATCH;
	END
END
GO
The query logic takes 20+ minutes and keeps on running. Not sure what mistake I made. Any suggestions to improve the speed, please?

January 25th, 2015 12:30pm

The first step is to see what the Estimated Execution Plan is for this (I would convert it to a query with variables). Which operation is taking the greatest amount of resources and time?
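
One way to measure where the time actually goes is the statistics output alongside the plan (an illustrative sketch; run the batch from SSMS with the actual plan enabled):

```sql
-- Turn on per-statement I/O and timing output before running the batch
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- ... run the update logic here ...

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The logical-reads count per table in the Messages tab usually points at the operator that needs an index.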
January 25th, 2015 1:03pm

You need to take some care when you design your tables.

1) For each column, make a conscious decision whether you permit NULL or not. If you are considering permitting NULL, ask yourself whether you have a good understanding of what NULL means for that column. For instance, you work with processStatus, which is a bit column. Is there a difference between 0 and NULL? If there is, what is it? If there is not, don't permit NULL.

2) You need to find which columns to index. I don't know what all this means, but CompID and UserId both appear to be foreign keys, and so does IDcompAFR. Not least, you should decide what the clustered index should be. Currently it is on the IDENTITY column, which is good for concurrent inserts, but it is not an equally good choice for this operation. Maybe the clustered index should have CompID as the leading column. On the other hand, if processStatus = 0 will be a rare condition (because only new rows have this status), maybe the index on CompID should be filtered.
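
For illustration only (index names and column choices are assumptions based on the queries shown, not a tested recommendation), a filtered index on the unprocessed rows might look like this:

```sql
-- Covers the "unprocessed rows for a company" lookup without touching
-- rows that have already been processed (assumes processStatus is NOT NULL)
CREATE NONCLUSTERED INDEX IX_CompDetail_Unprocessed
    ON Comp_Detail (CompID, UserId)
    INCLUDE (IDcompAFR, TransferAmount)
    WHERE processStatus = 0;

-- Supports the join from the aggregated data to the balance table
CREATE NONCLUSTERED INDEX IX_UserBank_UserId
    ON User_bank (UserId);
```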

3) The story of when to use table variables and when to use temp tables is a long and confusing one. But a rule of thumb is that if you expect high volumes, temp tables usually work out better, because they have statistics. For table variables, SQL Server may estimate a single row, which can be very wrong if they hold lots of rows.
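
As a sketch of the difference (the table name and columns are taken from the posted script; the clustered index is an assumption):

```sql
-- A temp table gets statistics and can be indexed, so the optimizer
-- sees realistic row counts for the later aggregate and join
CREATE TABLE #ActualData
    (UserId           BIGINT NOT NULL,
     IDcompAFR        BIGINT NOT NULL,
     ProcessingAmount MONEY  NOT NULL);

CREATE CLUSTERED INDEX IX_ActualData_UserId
    ON #ActualData (UserId);
```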

4) The first two points do not apply only to your permanent tables, but just as well to the work tables.

5) I don't see much point with the table variable @ActualData. Possibly it would be a good idea to persist the aggregation you have in the CTE AggregateData.

6) The table variable @AffectedRecords could be created in this way:

   DECLARE @AffecedRecords TABLE
     (UserId BIGINT PRIMARY KEY WITH (IGNORE_DUP_KEY = ON));

IGNORE_DUP_KEY is a funny option, but this is one of the few places where it makes sense. It will reduce the size of the table variable.

January 25th, 2015 1:25pm

Hi Russ,

Thanks for your reply. 

As a first point, I would like to know: do you have any comments on the way I wrote the script? Is it scalable?

I tried with 100k users and it ran in 6 seconds. 200k records ran in 13 seconds.

I couldn't show the execution plan as XML as it is big. Here is the breakdown:

Nested Loops inner join: 33%

Index scan (nonclustered) on Comp_Detail.processStatus: 66%

Everything else: 0%.

Any suggestions on how to improve this, please.

 

January 25th, 2015 1:45pm

As a first point, I would like to know: do you have any comments on the way I wrote the script? Is it scalable?

Again, having the indexes right is the most important.

Also, use temp tables rather than table variables.

The script is likely to be scalable up to some point where you exhaust the memory in the server and operations need to spill to disk. A common approach to avoid this is to divide the operation into batches.
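
A minimal sketch of the batching idea (the batch size and the simplified UPDATE are assumptions for illustration; the real logic would carry the balance update along inside each iteration):

```sql
DECLARE @BatchSize int = 50000,
        @RowCnt    int = 1;

WHILE @RowCnt > 0
BEGIN
    -- Process at most @BatchSize unprocessed rows per iteration,
    -- keeping each transaction and memory grant small
    UPDATE TOP (@BatchSize) Comp_Detail
    SET    processStatus   = 1,
           AmtTransferDate = GETDATE()
    WHERE  CompID = 1050
      AND  processStatus = 0;

    SET @RowCnt = @@ROWCOUNT;
END;
```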

January 25th, 2015 2:43pm

Hi Erland,

Thanks for your reply. It would be great if you could post modified code that increases the speed.

Thanks for your time on this. Please help me.

January 25th, 2015 4:12pm

So this is the story: I started to modify the code you had posted, but then I decided that it was too much of a mockup of your real-world case. Particularly, I felt uncertain about the data distribution in a real-world situation. This is by no means unimportant when deciding on index strategies.

Also, I found the lack of NULL/NOT NULL constraints to be disturbing.

Thus, I decided that I would only post some general pieces of advice that you could work from.

January 25th, 2015 4:35pm

Hi Erland,

I understand your concern. Isprocessed is a NOT NULL column. I replaced the table variables with temp tables, and for 500k records it took 31 seconds. Here is my latest try:

Declare @CompID int = 1050;

BEGIN
	-- Check if any data available to be processed
	IF EXISTS (
			SELECT TOP 1 IDcompAFR
			FROM Comp_Detail 
			
			WHERE CompID = @CompID
				AND coalesce(processStatus, 0) = 0
			)
	BEGIN
		BEGIN TRY
			-- Set it so if the first UPDATE fails,  we won't even start the second update.This really says "If we're in a transaction
			-- and something fails, stop processing the transaction and do a rollback if we can".
			SET XACT_ABORT ON;

			--Create a table to remember the rows we updated.
				IF OBJECT_ID('tempdb..#ActualData') IS NOT NULL
				BEGIN
					DROP TABLE #ActualData;
				END
				IF OBJECT_ID('tempdb..#AffecedRecords') IS NOT NULL
				BEGIN
					DROP TABLE #AffecedRecords;
				END

				CREATE TABLE #ActualData(UserId BIGINT
				,IDcompAFR BIGINT
				,ProcessingAmount MONEY);

				Create table #AffecedRecords (UserId BIGINT);

			-- temp table to hold the actual data. This will be used to get IDcompAFR once the balance is updated
			--DECLARE @ActualData TABLE (
			--	UserId BIGINT
			--	,IDcompAFR BIGINT
			--	,ProcessingAmount MONEY
			--	);
			-- table variable to capture the Affected UserId's
			--DECLARE @AffecedRecords TABLE (UserId BIGINT);

			BEGIN TRANSACTION;

			-- Get the whole data to be processed. 
			INSERT INTO #ActualData (
				UserId
				,IDcompAFR
				,ProcessingAmount
				)
			SELECT UserId
				,IDcompAFR
				,ProcessingAmount = COALESCE(TransferAmount, 0)
			FROM  Comp_Detail 
			WHERE CompID = @CompID
				AND coalesce(processStatus, 0) = 0
				;
			
			-- Aggregate the ProcessingAmount based on UserId
			WITH AggregateData
			AS (
				SELECT UserId
					,ProcessingAmount = SUM(COALESCE(ProcessingAmount, 0))
				FROM #ActualData
				GROUP BY UserId
				)
			--Do the balance update and capture the UserId that are affected.
			UPDATE UB
			SET UB.Amount_Available = COALESCE(UB.Amount_Available, 0) + AD.ProcessingAmount
				,UB.Amount_Pend = COALESCE(UB.Amount_Pend, 0) - AD.ProcessingAmount
				,LastModDt = getdate()
			OUTPUT deleted.UserId
			INTO #AffecedRecords(UserId)
			FROM User_bank UB
			INNER JOIN AggregateData AD ON UB.UserId = AD.UserId;

			--===== Using the captured UserId get the IDcompAFR from @ActualData temp variable
			-- and then update the processStatus = 1 
			---  means OFR processed for the trip . 
			UPDATE Comp_Detail
			SET processStatus = 1
				,AmtTransferDate = getdate()
			WHERE IDcompAFR IN (
					SELECT DISTINCT AD.IDcompAFR
					FROM #ActualData AD
					INNER JOIN #AffecedRecords AR ON (AD.UserId = AR.UserId)
					)
				AND processStatus = 0;

			COMMIT TRANSACTION;
		END TRY

		BEGIN CATCH
			DECLARE @ErrorMessage NVARCHAR(4000);
			DECLARE @ErrorSeverity INT;
			DECLARE @ErrorState INT;

			DROP TABLE #ActualData;
			DROP TABLE #AffecedRecords;
			SELECT @ErrorMessage = ERROR_MESSAGE()
				,@ErrorSeverity = ERROR_SEVERITY()
				,@ErrorState = ERROR_STATE();

			ROLLBACK TRANSACTION;

			RAISERROR (
					@ErrorMessage
					,@ErrorSeverity
					,@ErrorState
					);
		END CATCH;
	END
END
GO




--select * from Comp_Detail

--select * from User_bank

Can you please help me speed up the query further? I would still request that you post modified code if you have any other way to tune this logic.

thanks

January 25th, 2015 4:50pm

I understand your concern. Isprocessed is a not null column.

In that case, you don't need the coalesce.

Also, if this is a stored procedure, you don't need to drop the temp tables. If this is a loose script, I would drop the temp tables at the end.

There are still a few suggestions from my first post that you don't seem to have considered. And again, I stress the importance of indexes. Playing with different indexes may be more important for speed than the code itself.

January 25th, 2015 5:45pm

Thanks Erland,

I removed the coalesce, but still no improvement. Other than indexes, do you have any other suggestions to improve the speed? If you get time, could you please play with the sample I provided, with the changes you prescribed? I would like to know the execution time on your side. I will be happy if we can reduce the execution time.

Sorry I am bothering you today. Thanks for your time.

January 25th, 2015 7:08pm

1.
Can you take out all statements related to the transaction and see if there is any change in performance?

2.
Also, change your query above to materialize the distinct list first:

... SELECT DISTINCT AD.IDcompAFR
INTO #temp1 ...
FROM #ActualData AD
INNER JOIN #AffecedRecords AR ON AD.UserId = AR.UserId;

and use it as:

UPDATE Comp_Detail
SET processStatus = 1
   ,AmtTransferDate = getdate()
WHERE IDcompAFR IN (
      SELECT IDcompAFR FROM #temp1
      )
  AND processStatus = 0;

Good luck


January 25th, 2015 11:57pm

I removed the coalesce, but still no improvement.

That's expected. Removing the coalesce as such has no effect, but the coalesce could prevent a good index from being used.
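
In other words, the issue is sargability; a sketch of the two predicate forms (assuming processStatus is NOT NULL, so they return the same rows):

```sql
-- Non-sargable: the function wrapped around the column hides it from
-- index seeks and from any filtered-index predicate
SELECT UserId FROM Comp_Detail
WHERE CompID = 1050 AND COALESCE(processStatus, 0) = 0;

-- Sargable: a bare column comparison lets the optimizer seek an index
SELECT UserId FROM Comp_Detail
WHERE CompID = 1050 AND processStatus = 0;
```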

Other than indexes, do you have any other suggestions to improve the speed?

Other than indexes? Not really. I have suggested some rewrites, but without proper indexing they will not have much effect.

January 26th, 2015 3:50am

This topic is archived. No further replies will be accepted.
