Wednesday, March 28, 2012

new to cursors

Can anyone point me to a good resource for learning cursors in MSSQL?

Thanks

Dave

Dave,
Try www.TechnicalVideos.net. The videos on triggers and Stored
Procedures will help you a lot. There is also a video specifically dealing
with cursors.

Hope this helps,
Chuck Conover
www.TechnicalVideos.net

"Dave Anderson" <anderdw2@.cvn.net> wrote in message
news:ELORb.112$uM2.98@.newsread1.news.pas.earthlink .net...
> Can anyone point me to a good resource for learning cursors in MSSQL?
> Thanks
> Dave|||Remember that you should generally try to avoid using cursors at all. They
are not usually a good solution to a problem in SQL because of their
performance and resource constraints compared to the set-based alternatives.

My view is that the majority of cursors fall into one of three categories:
Procedural administrative tasks (a reasonable use of a cursor); Written by
procedural programmers who don't know SQL; Used to work around a poor data
design (e.g. lack of keys in tables). There are other cases but they are few
and far between in my experience.
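To make the contrast concrete, here is a minimal sketch (table and column names are invented for illustration) of a cursor doing row-by-row work next to the single set-based statement that does the same job:

```sql
-- Cursor version: row-by-row, holds resources for the whole loop
DECLARE @id int
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT id FROM Orders WHERE status = 'open'
OPEN c
FETCH NEXT FROM c INTO @id
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE Orders SET status = 'closed' WHERE id = @id
    FETCH NEXT FROM c INTO @id
END
CLOSE c
DEALLOCATE c

-- Set-based version: one statement does the same work
UPDATE Orders SET status = 'closed' WHERE status = 'open'
```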

--
David Portas
SQL Server MVP
--|||David,
I'm not disagreeing exactly, but I would be interested in your opinion.
Typically I use cursors to load data from an outside source. Since you
can't control what that outside data looks like, I will do the following:

- load data into a temp table
- cursor thru the temp table, verifying data types and generally massaging
the data, if needed
- insert into our production table(s) if the data passes examination (done
in the cursor)
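The steps above might be sketched roughly like this (all table, column, and procedure names are invented; ISNUMERIC is only a rough type check, e.g. '1e4' passes it but can still fail a decimal cast):

```sql
-- Hypothetical sketch of the load-verify-insert workflow described above
CREATE TABLE #staging (id int, amount varchar(50))
-- (bulk load into #staging happens here, e.g. via BULK INSERT or bcp)

DECLARE @id int, @amount varchar(50)
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT id, amount FROM #staging
OPEN c
FETCH NEXT FROM c INTO @id, @amount
WHILE @@FETCH_STATUS = 0
BEGIN
    -- verify the value converts cleanly before touching production
    IF ISNUMERIC(@amount) = 1
        INSERT INTO ProductionTable (id, amount)
        VALUES (@id, CAST(@amount AS decimal(10,2)))
    FETCH NEXT FROM c INTO @id, @amount
END
CLOSE c
DEALLOCATE c
```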

I'm not certain if you'd consider this an administrative task. Some of the
data checking we do would be hard or impossible to do using a set-based
structure, I think.

Appreciate your views.
Best regards,
Chuck

"David Portas" <REMOVE_BEFORE_REPLYING_dportas@.acm.org> wrote in message
news:JbWdnTdWGp4je4rd4p2dnA@.giganews.com...
> Remember that you should generally try to avoid using cursors at all. They
> are not usually a good solution to a problem in SQL because of their
> performance and resource constraints compared to the set-based
> alternatives.
> My view is that the majority of cursors fall into one of three categories:
> Procedural administrative tasks (a reasonable use of a cursor); Written by
> procedural programmers who don't know SQL; Used to work around a poor data
> design (e.g. lack of keys in tables). There are other cases but they are
> few
> and far between in my experience.
> --
> David Portas
> SQL Server MVP
> --|||"Chuck Conover" <cconover@.commspeed.net> wrote in
news:1075323791.146752@.news.commspeed.net:

> David,
> I'm not disagreeing exactly, but I would be interested in your
> opinion. Typically I use cursors to load data from an outside
> source. Since you can't control what that outside data looks like,
> I will do the following:
> - load data into a temp table
> - cursor thru the temp table, verifying data types and generally
> massaging the data, if needed
> - insert into our production table(s) if the data passes
> examination (done in the cursor)

Here's what I do:

1) bcp the data to load into a staging table
2) use set-based logic to validate against master tables, putting the
   'clean' results into a 'good' table and 'bad' data into another
   (for later reporting)
3) once all the validation is done, a single insert from the 'good'
   table is used to slam all the data into the target table.
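A hedged sketch of steps 2 and 3, with invented table and column names:

```sql
-- Step 1 (outside SQL): bcp loads the raw file into Staging, e.g.
--   bcp MyDb.dbo.Staging in datafile.txt -c -T

-- Step 2: set-based validation splits rows into GoodRows and BadRows
INSERT INTO GoodRows (id, custid, amount)
SELECT s.id, s.custid, s.amount
FROM Staging s
JOIN Customers c ON c.custid = s.custid   -- must match a master row
WHERE ISNUMERIC(s.amount) = 1             -- rough type check

INSERT INTO BadRows (id, custid, amount)  -- everything else, for reporting
SELECT s.id, s.custid, s.amount
FROM Staging s
WHERE s.id NOT IN (SELECT id FROM GoodRows)

-- Step 3: a single insert slams the clean data into the target
INSERT INTO TargetTable (id, custid, amount)
SELECT id, custid, CAST(amount AS decimal(10,2))
FROM GoodRows
```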
--
Pablo Sanchez - Blueoak Database Engineering, Inc
http://www.blueoakdb.com|||As Pablo says, many data cleansing and transformation operations can be
performed using set-based statements, even if you don't have keys in your
source data. For the rest I would use a dedicated ETL tool. DTS is the
component that ships with SQL Server, but there are also other packages
specifically designed to automate the type of data prep you are describing
and then load the cleansed data into your database. This approach has lots
of advantages over hand-coded conversions in an RDBMS that isn't really
designed and optimised for the job.

http://pervasive.datajunction.com/djcosmos/
http://www.embarcadero.com/products/dtstudio/index.html
http://www.informatica.com

--
David Portas
SQL Server MVP
--|||David Portas (REMOVE_BEFORE_REPLYING_dportas@.acm.org) writes:
> My view is that the majority of cursors fall into one of three
> categories: Procedural administrative tasks (a reasonable use of a
> cursor); Written by procedural programmers who don't know SQL; Used to
> work around a poor data design (e.g. lack of keys in tables). There are
> other cases but they are few and far between in my experience.

I'll add one more case: you already have a stored procedure that
performs some complex logic, and this procedure accepts its input
in scalar parameters, and you want to apply that logic to an entire
set of data.
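A minimal sketch of that pattern (the procedure and table names here are invented for illustration):

```sql
-- Hypothetical: drive an existing scalar-parameter procedure over a set.
-- RegisterDeposit and PendingDeposits are made-up names.
DECLARE @acct int, @amount money

DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT acctno, amount FROM PendingDeposits

OPEN c
FETCH NEXT FROM c INTO @acct, @amount
WHILE @@FETCH_STATUS = 0
BEGIN
    -- reuse the existing business logic instead of duplicating it
    EXEC RegisterDeposit @acctno = @acct, @amount = @amount
    FETCH NEXT FROM c INTO @acct, @amount
END
CLOSE c
DEALLOCATE c
```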

--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Erland Sommarskog <sommar@.algonet.se> wrote in
news:Xns947F2060989AYazorman@.127.0.0.1:

> David Portas (REMOVE_BEFORE_REPLYING_dportas@.acm.org) writes:
> I'll add one more case: you already have a stored procedures that
> performs some complex logic, and this procedure accepts its input
> in scalar parameters, and you want to apply that logic to entire
> set of data.

I'll counter the above with the fact that you're using the wrong
tool for the job. The SP was written to handle single row inserts
therefore if you plan on doing batch inserts, then the SQL should be
written and tuned accordingly. Unless you wish this side of sucky
performance. :)
--
Pablo Sanchez - Blueoak Database Engineering, Inc
http://www.blueoakdb.com|||>> My view is that the majority of cursors fall into one of three
categories: Procedural administrative tasks (a reasonable use of a cursor);
Written by procedural programmers who don't know SQL; Used to work
around a poor data design (e.g. lack of keys in tables). There are
other cases but they are few and far between in my experience. <<

The only other one that comes to mind is an NP-complete problem
(traveling salesman, etc.) where you can use the first near-optimal
answer. Being a set-oriented language, SQL tends to find the **entire
set** of solutions and that can take a **lot** of time. Thank god you
don't run into them very often.|||Pablo Sanchez (honeypot@.blueoakdb.com) writes:
> Erland Sommarskog <sommar@.algonet.se> wrote in
> news:Xns947F2060989AYazorman@.127.0.0.1:
>> I'll add one more case: you already have a stored procedures that
>> performs some complex logic, and this procedure accepts its input
>> in scalar parameters, and you want to apply that logic to entire
>> set of data.
> I'll counter the above with the fact that you're using the wrong
> tool for the job. The SP was written to handle single row inserts
> therefore if you plan on doing batch inserts, then the SQL should be
> written and tuned accordingly. Unless you wish this side of sucky
> performance. :)

I'm not talking of simple SPs that insert a single row into a table.
I'm talking about stored procedures with more than 1500 lines of code,
and which call plenty of other procedures, activating another 1500 lines
of code. Those lines of code include important business rules that you
don't want to have duplicated in a scalar version of the procedure and
a table-oriented one. And when many of the calls to the procedure for
business reasons are in fact one-off, there may be a performance penalty
if the procedure is rewritten to be table-oriented.

--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Most of our data comes from outside sources in which we have no
control over the quality or accuracy of the data. As a business, we
felt that it was more important to get 99% of the data into our
systems, and then deal with those rows that fail afterwards.

Since I was unable to figure out a way to do this as sets (since a
set either all works or all fails together), I have had to use cursor
processing all over the place. I also make use of stored procedures,
which (as far as I know) cannot be passed data sets, only scalar
values.

When you need to provide logging (auditing) of each transaction, the
easiest way is to run through each row: do the action, then log it, go
on to the next row.
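A minimal sketch of that row-at-a-time pattern (table and column names are invented for illustration):

```sql
-- Hypothetical row-at-a-time audit loop; names are made up
DECLARE @id int
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT id FROM SomeTable WHERE needs_action = 1
OPEN c
FETCH NEXT FROM c INTO @id
WHILE @@FETCH_STATUS = 0
BEGIN
    -- do the action ...
    UPDATE SomeTable SET needs_action = 0 WHERE id = @id
    -- ... then log it
    INSERT INTO AuditLog (id, action, logged_at)
    VALUES (@id, 'processed', GETDATE())
    FETCH NEXT FROM c INTO @id
END
CLOSE c
DEALLOCATE c
```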

Am I wrong? What are the alternatives?

I'll provide a real-life situation as an example. Derivatives (futures,
options, and futures options) are securities that expire at some date.
After their expiry date, I need to mark the record in our Security
Master table as being inactive. I want a log of each security that was
marked as inactive. If I do not use a cursor, how else can I do it?
The only other way I can think of is to perform the query twice (one
select, then one update) inside a begin / end transaction. The select
statement would give me info to output to a log file and the update
would mark it as inactive...|||Erland Sommarskog <sommar@.algonet.se> wrote in
news:Xns94803655A9EEYazorman@.127.0.0.1:

> I'm not talking of simple SPs that inserts a single row into a

Nope, I didn't assume that you were ... I assumed complicated beasts.

> Those lines of code include important business rules, that you don't
> want to have duplicated in a scalar version of the procedure and a
> table-oriented one.

If the batch processing is important, you do. If it's not something
that needs to be loaded within a time constraint, I completely
agree with you.

> And when many of the calls to the procedure for busieness reasons
> are in fact one off, there may be a performance penalty of the
> procedure is rewritten to be table-oriented.

For example?
--
Pablo Sanchez - Blueoak Database Engineering, Inc
http://www.blueoakdb.com|||> After their expiry date, I need to mark the record in our Security
> Master table as being inactive.

Why? If the expiry date is recorded in your system then you already *know*
whether a stock has expired or not based on the current date and time. An
active / inactive column would just be redundant data. Put the status in a
view if you like - not in a table.

> The only other way I can think our is do perform the query twice (one
> select, then one update)

Yes. But you would need two statements whether you do it in a cursor or not.
It should be quicker without a cursor.

> inside a begin / end transaction. The select

If you really wanted to do it you don't need a transaction but I don't see
the point of the AuditLog unless Stocks are going to change Inactive ->
Active as well as Active -> Inactive. On limited info here's a guess:

INSERT INTO AuditLog (status, col1, col2, ... )
SELECT 'inactive', col1, col2, ...
FROM Stocks
WHERE expirydate <= CURRENT_TIMESTAMP

UPDATE Stocks
SET status = 'inactive'
WHERE status = 'active'
  AND EXISTS
      (SELECT *
       FROM AuditLog
       WHERE status = 'inactive'
         AND pkcol = Stocks.pkcol)

--
David Portas
SQL Server MVP
--|||> Most of our data comes from outside sources in which we have no
> control over the quality or accuracy of the data. As a business, we
> felt that it was more important to get 99% of the data into our
> systems, then deal with those rows that fail afterwards.

See my reply earlier in this thread. There are tools specifically designed
to solve this problem.

--
David Portas
SQL Server MVP
--|||Pablo Sanchez (honeypot@.blueoakdb.com) writes:
> Erland Sommarskog <sommar@.algonet.se> wrote in
> news:Xns94803655A9EEYazorman@.127.0.0.1:
>> And when many of the calls to the procedure for busieness reasons
>> are in fact one off, there may be a performance penalty of the
>> procedure is rewritten to be table-oriented.
> For example?

We took on the task of rewriting one of our procedures to be
table-oriented. We did this because this one creates an account
transaction, updates positions, balances and a whole lot more things.
The scope of the database transaction may be a single account
transaction, for instance a simple deposit of money. The scope may also
be over 50000 account transactions, for instance capitalization of
interest, or a corporate action in a major company like Ericsson.

The outcome of this adventure is that we can now rewrite the multi-
transaction updates to be set-based and be a lot faster than before.
But anything that is still one-by-one due to legacy is now slower,
say one second instead of 200 ms. Instead of having single values
in variables, the data is now in 43 table variables, and that is of
course slower.

--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||David Portas (REMOVE_BEFORE_REPLYING_dportas@.acm.org) writes:
>> After their expiry date, I need to mark the record in our Security
>> Master table as being inactive.
> Why? If the expiry date is recorded in your system then you already *know*
> whether a stock has expired or not based on the current date and time. An
> active / inactive column would just be redundant data. Put the status in a
> view if you like - not in a table.

It isn't that easy. I don't know Jason's business, but I know my own.

When an instrument has expired, it should indeed be inactivated, or
deregistered to use the terminology in our system. But you cannot
deregister it if there are still positions or unsettled trades. Even
if the instrument has expired, you may still have to register transactions
in it. For instance, you may only now discover that you have
registered a trade for the wrong account, and have to cancel it and
create a replacement note. So even if the instrument is expired, it
should still be fully valid in transactions - but of course there
should be validation that you don't specify a trade date after
expiration.

Once everything has been cleared up, all trades resulting from expiration
have been registered, and all unused options have been booked out, you
can deregister the instrument.

But there is of course no reason to use a cursor just because of this.
A very simple-minded approach is:

INSERT #temp(id)
SELECT id
FROM instruments
WHERE -- should be deregistered

UPDATE instruments
SET deregdate = getdate()
FROM instruments ins
JOIN #temp t ON ins.id = t.id


INSERT auditlog (...)
SELECT ...
FROM #temp ...
--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Erland Sommarskog <sommar@.algonet.se> wrote in
news:Xns94814EAFE04FYazorman@.127.0.0.1:

> The outcome of this adventure is that we can now rewrite the
> multi- transaction updaets to be set-based and be a lot faster
> than before. But anything that is still one-by-one due to legacy
> is now slower, say one second instead of 200 ms. Instead of having
> single values in variables, it is now in 43 table variables, and
> that is of course slower.

[ I don't know if you want to pursue it further so if you don't
respond, I'll assume not. ]

The fact that there's some legacy suggests that the legacy code needs
to be refactored as well. It's only natural that that's what needs
to be done when taking row-at-a-time code and converting it to set-based.
--
Pablo Sanchez - Blueoak Database Engineering, Inc
http://www.blueoakdb.com|||Pablo Sanchez (honeypot@.blueoakdb.com) writes:
> The fact that there's some legacy sounds like that legacy code needs
> to be refactored as well. It's only natural that that's what needs
> to be done when taking row-at-a-time and converting to set-based.

"Needs to" is a relative term.

I don't know how much work it took to rewrite that particular
stored procedure, but I seem to recall that my time estimate was 100
hours. In a company like ours, 100 hours is not something you take out
of thin air.

Some of these jobs running one-by-one have been rewritten, rest assured.
But there is at least one process where a rewrite will take another
100 hours, maybe more. (This one is however not keeping all calls in one
database transaction.) And I would not expect this to happen unless
either someone coughs up money for it, or a customer yells loud
enough about the performance. (The latter is of course much more likely
than the former.)

Anyway, this particular procedure that we rewrote had to be rewritten
for our system to be able to scale. But I have another case, where most
calls are one at a time, and where only one function makes successive
calls, and this function is used by one customer only. I'd be cautious
before initiating a rewrite there.

--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Thanks for the imput guys. I thought about auditing everything as a
step and then performing the action as another step but I felt it was
somehow "wrong".

Erland summed it up nicely about the inactive status and unprocessed
trades. To add to that, there are other status's (like Halted) that
can be applied to a security. That is another reason for the Status
column.
