Wednesday, March 7, 2012

New idea to use the EXCEPT operator to paging

Last night, I have this idea, but can not know if it can have a good performance and efficiency when the @.CurrentPage is big.

--------------------

--DATABASE:AdventureWorks

DECLARE @.CurrentPage int
DECLARE @.PageSize int
DECLARE @.OrderExpression nvarchar(100)
DECLARE @.Sql nvarchar(500)

SET @.CurrentPage = 1
SET @.PageSize = 10
SET @.OrderExpression = N' employeeid '

DECLARE @.BigTop int
DECLARE @.SmallTop int
SET @.BigTop = @.CurrentPage * @.PageSize
SET @.SmallTop = (@.CurrentPage -1) * @.PageSize

DECLARE @.StartTime datetime
SET @.StartTime = GETDATE()

SET @.Sql = N' SELECT TOP (' + CAST(@.BigTop AS nvarchar(10)) + ') * FROM humanresources.Employee '
+ ' EXCEPT '
+ ' SELECT TOP (' + CAST(@.SmallTop AS nvarchar(10)) + ') * FROM humanresources.Employee ORDER BY ' + @.OrderExpression

EXEC sp_executesql @.Sql

DECLARE @.EndTime datetime
SET @.EndTime = GETDATE()

SELECT DATEPART(s,@.EndTime-@.StartTime)
SELECT DATEPART(ms,@.EndTime-@.StartTime)

GO

--------------------

It does get progressively slower as the number of records you are dealing with increases.

:(

|||

What's really needed in t-sql is this:

select ROWS 50 to 75 * from some_table order by some_column

It would remove the need for so many additional passes thru the data that other approaches require.

(And if someone knows a trick to do that, speak up now!)

|||

I have done some tests, comparee several different stored procedures, found this has a very good performance when the @.CurrentPage is less than 500 and @.PageSize = 10, in general use, this is enough.

dose anybody have interesting to do more test and post the result?

|||

Another way, by using the operator IN, can have an average speed when the @.CurrentPage is smaller and bigger, but when the @.CurrentPage is smaller, it is slower than not use "IN"
...

...

SELECT * FROM Employee WHERE employeeid IN (
SELECT TOP (@.BigTop) employeeid FROM Employee
EXCEPT
SELECT TOP (@.SmallTop) employeeid FROM Employee ORDER BY employeeid
)

...

...

|||

I made a new test,I copy the HumanResources.Employee table of AdventureWorks database to a new table, and add to about 200,000 rows,

test the performance of the following 3 different ways:

------1 start----------

DECLARE @.CurrentPage int
DECLARE @.PageSize int
DECLARE @.OrderExpression nvarchar(100)
DECLARE @.Sql nvarchar(500)

SET @.CurrentPage = 300
SET @.PageSize = 20
SET @.OrderExpression = N' employeeid '

DECLARE @.BigTop int
DECLARE @.SmallTop int
SET @.BigTop = @.CurrentPage * @.PageSize
SET @.SmallTop = (@.CurrentPage -1) * @.PageSize

/*
SET @.Sql = N' SELECT TOP (' + CAST(@.BigTop AS nvarchar(10)) + ') * FROM Employee '
+ ' EXCEPT '
+ ' SELECT TOP (' + CAST(@.SmallTop AS nvarchar(10)) + ') * FROM Employee ORDER BY ' + @.OrderExpression
*/

DECLARE @.StartTime datetime
SET @.StartTime = GETDATE()

SELECT TOP (@.BigTop) * FROM Employee
EXCEPT
SELECT TOP (@.SmallTop) * FROM Employee ORDER BY modifieddate

DECLARE @.EndTime datetime
SET @.EndTime = GETDATE()

SELECT DATEPART(s,@.EndTime-@.StartTime)
SELECT DATEPART(ms,@.EndTime-@.StartTime)

GO

------1 end----------

------2 start----------

DECLARE @.CurrentPage int
DECLARE @.PageSize int
DECLARE @.OrderExpression nvarchar(100)
DECLARE @.Sql nvarchar(500)

SET @.CurrentPage = 300
SET @.PageSize = 20
SET @.OrderExpression = N' employeeid '

DECLARE @.BigTop int
DECLARE @.SmallTop int
SET @.BigTop = @.CurrentPage * @.PageSize
SET @.SmallTop = (@.CurrentPage -1) * @.PageSize

/*
SET @.Sql = N' SELECT TOP (' + CAST(@.BigTop AS nvarchar(10)) + ') * FROM Employee '
+ ' EXCEPT '
+ ' SELECT TOP (' + CAST(@.SmallTop AS nvarchar(10)) + ') * FROM Employee ORDER BY ' + @.OrderExpression
*/

DECLARE @.StartTime datetime
SET @.StartTime = GETDATE()
SELECT * FROM Employee WHERE employeeid IN (
SELECT TOP (@.BigTop) employeeid FROM Employee
EXCEPT
SELECT TOP (@.SmallTop) employeeid FROM Employee ORDER BY employeeid
)

DECLARE @.EndTime datetime
SET @.EndTime = GETDATE()

SELECT DATEPART(s,@.EndTime-@.StartTime)
SELECT DATEPART(ms,@.EndTime-@.StartTime)

GO

------2 end----------

------3 start----------

DECLARE @.StartTime datetime
SET @.StartTime = GETDATE()
SELECT *
FROM (SELECT ROW_NUMBER() OVER (ORDER BY employeeid DESC)
AS Row, * FROM Employee)
AS LogWithRowNumbers
WHERE Row >= 6010 AND Row <= 6020

DECLARE @.EndTime datetime
SET @.EndTime = GETDATE()

SELECT DATEPART(s,@.EndTime-@.StartTime)
SELECT DATEPART(ms,@.EndTime-@.StartTime)

GO

------3 end----------

When the database has about 200,000 rows,

When @.CurrentPage*@.PageSize <=6000, the first way has good performance

When @.CurrentPage*@.PageSize >6000, the second way has good performance

The 3rd way had the worst performance.

ex. When @.CurrentPage = 100, @.PageSize = 10,

the 1st: 30ms,

the 2nd: 500ms

the 3rd: 4s and 400ms

-------------

But when I made another test that the database has about 30,000 rows,

the 3rd is the fastest, the 1st is the slowest


sorry to all for my bad english ;-p

|||

New test script

------Start--------

DECLARE @.CurrentPage int
DECLARE @.PageSize int

SET @.CurrentPage = 3000
SET @.PageSize = 10

DECLARE @.BigTop int
DECLARE @.SmallTop int
SET @.BigTop = @.CurrentPage * @.PageSize
SET @.SmallTop = (@.CurrentPage -1) * @.PageSize

DECLARE @.StartTime datetime
SET @.StartTime = GETDATE()

IF(SELECT COUNT(SalesOrderId) FROM Sales.SalesOrderHeader) < 40000 /* IF rows less than 40000 */
BEGIN
/*
WITH Rows AS
(
SELECT ROW_NUMBER() OVER (ORDER BY SalesOrderId DESC) AS Row, * FROM Sales.SalesOrderHeader
)
SELECT * FROM Rows
WHERE Row >= @.SmallTop AND Row <=@.BigTop
*/

SELECT *
FROM (SELECT ROW_NUMBER() OVER (ORDER BY SalesOrderId DESC)
AS Row, * FROM Sales.SalesOrderHeader)
AS Rows
WHERE Row BETWEEN @.SmallTop AND @.BigTop

END
ELSE
BEGIN
IF @.BigTop < 5000 /* IF @.CurrentPage*@.PageSize < 5000 Rows */
BEGIN
SELECT TOP (@.BigTop) * FROM Sales.SalesOrderHeader
EXCEPT
SELECT TOP (@.SmallTop) * FROM Sales.SalesOrderHeader ORDER BY SalesOrderId
END
ELSE
BEGIN
SELECT * FROM Sales.SalesOrderHeader WHERE SalesOrderId IN (
SELECT TOP (@.BigTop) SalesOrderId FROM Sales.SalesOrderHeader
EXCEPT
SELECT TOP (@.SmallTop) SalesOrderId FROM Sales.SalesOrderHeader ORDER BY SalesOrderId
)
END
END

DECLARE @.EndTime datetime
SET @.EndTime = GETDATE()

SELECT DATEPART(s,@.EndTime-@.StartTime)
SELECT DATEPART(ms,@.EndTime-@.StartTime)

GO

------End--------

No comments:

Post a Comment