Wednesday, March 7, 2012

New indexing algorithm?

I just got done reading an article from Q4, 2004 DB2 Magazine about
CopperEye Indexing. The article states, "CopperEye indexing as
implemented as a DataBlade has shown a level of performance up to
twenty-five times faster than could be achieved with Informix's
standard B-tree indexing." It also states that using their algorithm
for indexing on databases that are considered large that thier
algorithm is 100 times faster than b-tree indexing.
While this new indexing algorithm has only been developed for
Informix, if it is as good as it sounds, I'm sure other DBMS's will
start using it. Has anyone else heard of this? And, does anyone know
if SQL Server will be coming up with a better indexing algorithm than
the tried and true b-tree? Just curious and hoping that Kalen Delaney
might weigh in on this.
--Lori
SQL Server 2005 doesn't have technology like this built in. I can't
speculate on the technology itself or comment on whether a future version of
SQL Server would incorporate anything like it (it would be my team that
would do it so no-one else is in a better position to answer :-)
Regards.
Paul Randal
Dev Lead, Microsoft SQL Server Storage Engine
This posting is provided "AS IS" with no warranties, and confers no rights.
"LoriB" <loribrown48@.hotmail.com> wrote in message
news:848869d7.0411101440.7e3e60f6@.posting.google.c om...
> I just got done reading an article from Q4, 2004 DB2 Magazine about
> CopperEye Indexing. The article states, "CopperEye indexing as
> implemented as a DataBlade has shown a level of performance up to
> twenty-five times faster than could be achieved with Informix's
> standard B-tree indexing." It also states that using their algorithm
> for indexing on databases that are considered large that thier
> algorithm is 100 times faster than b-tree indexing.
> While this new indexing algorithm has only been developed for
> Informix, if it is as good as it sounds, I'm sure other DBMS's will
> start using it. Has anyone else heard of this? And, does anyone know
> if SQL Server will be coming up with a better indexing algorithm than
> the tried and true b-tree? Just curious and hoping that Kalen Delaney
> might weigh in on this.
> --Lori
|||Hi Lori
I've never heard of it ... until now.
And since I write about the way SQL Server works, and SQL Server doesn't use
this type of indexing, I'm unlikely to be writing about it any time soon.
:-)
HTH
Kalen Delaney
SQL Server MVP
www.SolidQualityLearning.com
"LoriB" <loribrown48@.hotmail.com> wrote in message
news:848869d7.0411101440.7e3e60f6@.posting.google.c om...
>I just got done reading an article from Q4, 2004 DB2 Magazine about
> CopperEye Indexing. The article states, "CopperEye indexing as
> implemented as a DataBlade has shown a level of performance up to
> twenty-five times faster than could be achieved with Informix's
> standard B-tree indexing." It also states that using their algorithm
> for indexing on databases that are considered large that thier
> algorithm is 100 times faster than b-tree indexing.
> While this new indexing algorithm has only been developed for
> Informix, if it is as good as it sounds, I'm sure other DBMS's will
> start using it. Has anyone else heard of this? And, does anyone know
> if SQL Server will be coming up with a better indexing algorithm than
> the tried and true b-tree? Just curious and hoping that Kalen Delaney
> might weigh in on this.
> --Lori
|||While their website doesn't explain the specifics, they certianly make some
bold claims.
www.coppereye.com
-Paul Nielsen, SQL Server MVP
SQL Server 2000 Bible, Wiley Press
Enterprise Data Architect, www.Compassion.com
"LoriB" <loribrown48@.hotmail.com> wrote in message
news:848869d7.0411101440.7e3e60f6@.posting.google.c om...
>I just got done reading an article from Q4, 2004 DB2 Magazine about
> CopperEye Indexing. The article states, "CopperEye indexing as
> implemented as a DataBlade has shown a level of performance up to
> twenty-five times faster than could be achieved with Informix's
> standard B-tree indexing." It also states that using their algorithm
> for indexing on databases that are considered large that thier
> algorithm is 100 times faster than b-tree indexing.
> While this new indexing algorithm has only been developed for
> Informix, if it is as good as it sounds, I'm sure other DBMS's will
> start using it. Has anyone else heard of this? And, does anyone know
> if SQL Server will be coming up with a better indexing algorithm than
> the tried and true b-tree? Just curious and hoping that Kalen Delaney
> might weigh in on this.
> --Lori
|||Lori,
I took a look at a couple of the CopperEye whitepapers, and
while they don't give details, they make some valid points. A
SQL Server clustered index is optimized for range retrievals at
the expense of slower performance for data modification.
SQL Server's indexing flexibility, once index keys are decided, is
relatively limited - FILLFACTOR and clustered vs. nonclustered
is basically the only parameter that can trade retrieval performance
against modification performance. That's not to say that good
performance for both is impossible, but it's likely to involve a
combination of decisions from the database design, to indexing,
to frequency and details of index and file maintenance, and more.
My best guess at what underlies the CopperEye tables and
indexes is a good idea, but not a breakthrough. If I'm right,
what they are doing can also be done to a large extent in SQL
Server by indexing SQL Server tables over carefully designed
computed columns. CopperEye probably figured out some
good choices for such computed column formulas and packaged
them up in a way that is relatively transparent - where the
designer sees the same data as usual and has some additional
parameters to choose when indexing.
Below is a repro for something like this. It uses an indexed view
just because I found it easier to coax a decent query plan that
way, but you could probably do the same thing without one. The
strange indexing provides two benefits - one is sort of an automatic
partitioning by quarter, and the other is that the precision of the
index is reduced, making inserts into the clustered index easier,
but the precision isn't reduced to zero, so you still get good
performance for searches.
Since SQL Server doesn't provide this kind of functionality
directly, I have to write an explicit join against the view in
the query, but if you look at the estimated plans and the
actual statistics for the three queries, you'll see that you
don't lose as much as you might think from the relaxed
indexing. What I don't show here (but it should be true)
is the advantage this provides in avoiding "hot spots" and
major page splits if this table were frequently modified.
Some of CopperEye's claims are easier to believe than others.
Their design probably implements the kind of thing I do
below smoothly and transparently, and with some nice
sliders. But I don't see much evidence in the white paper
for 25-fold or better improvements for anything. They
they also claim their indexes are better for pattern
matches, and while that can be done in SQL Server, too,
to some extent, it might be hard to optimize everything
at once, and I'm not sure CopperEye can do that either.
It's worth noting that SQL Server 2005 has some new
features that will provide more options for this kind of
tweaking. These include the possibility of adding
non-indexed columns to indexes and new partitioning
features.
Here's the repro:
create table T (
LastName varchar(30) not null,
FirstName varchar(30) not null,
IssueDate datetime not null,
i int not null,
moredata varchar(700)
)
go
-- For data integrity:
alter table T
add constraint T_pk
primary key nonclustered (LastName, FirstName, IssueDate)
go
-- This allows "automatic partitioning"
-- and semi-precise indexing, both of
-- which can reduce the work needed for
-- data modification (inserts/updates/deletes).
-- Expect fewer page splits and hot spots,
-- without losing all the index selectivity.
alter table T
add QuarterInitial as
convert(char(6),IssueDate,112)/3*3000
+ascii(upper(substring(LastName,1,1)))
create clustered index T_uci
on T(QuarterInitial)
go
-- This basically helps force a good query
-- plan when a precise search is needed.
create view T_quarters with schemabinding as
select
QuarterInitial,
ascii(LastName) as LastAscii,
count_big(*) as nRows
from dbo.T
group by QuarterInitial, ascii(LastName)
go
create unique clustered index T_quarters_q
on T_quarters(QuarterInitial)
go
create index T_quarters_LastName
on T_quarters(LastAscii)
go
-- Insert sample data
insert into T
select
O.CustomerID,
rtrim(O.OrderID),
dateadd(day,binary_checksum(O.ShipAddress)%1200+E. EmployeeID,'20010101'),
O.EmployeeID,
replicate(O.CustomerID,O.OrderID%111)
from Northwind..Orders O
cross join Northwind..Employees E
go
set statistics io on
go
-- This is no longer as efficient as it would
-- be with a precise index
select * from T
where LastName = 'ERNSH'
go
-- This comes closer, and it may get a
-- good plan with Enterprise Edition
select T.*
from T join T_quarters
on T_quarters.QuarterInitial = T.QuarterInitial
where T_quarters.LastAscii = ascii('E')
and T.LastName = 'ERNSH'
go
-- The (noexpand) hint tells the optimizer
-- to use the materialized view (the view index).
select T.*
from T join T_quarters with (noexpand)
on T_quarters.QuarterInitial = T.QuarterInitial
where T_quarters.LastAscii = ascii('E')
and T.LastName = 'ERNSH'
go
set statistics io off
go
drop view T_quarters
drop table T
-- Steve Kass
-- Drew University
-- Ref: 112D6A0C-0D4C-4E06-B99D-8AE87928FA1F
LoriB wrote:

>I just got done reading an article from Q4, 2004 DB2 Magazine about
>CopperEye Indexing. The article states, "CopperEye indexing as
>implemented as a DataBlade has shown a level of performance up to
>twenty-five times faster than could be achieved with Informix's
>standard B-tree indexing." It also states that using their algorithm
>for indexing on databases that are considered large that thier
>algorithm is 100 times faster than b-tree indexing.
>While this new indexing algorithm has only been developed for
>Informix, if it is as good as it sounds, I'm sure other DBMS's will
>start using it. Has anyone else heard of this? And, does anyone know
>if SQL Server will be coming up with a better indexing algorithm than
>the tried and true b-tree? Just curious and hoping that Kalen Delaney
>might weigh in on this.
>--Lori
>
|||Lori,
I finally see where CopperEye comes up with some of their
biggest claims, and they simply aren't reasonable. Check out
this "calculator" on the CopperEye web site:
http://www.coppereye.com/roi_explain...lculator+--%3E
The values it pops up with show CopperEye to be many times better than
B-tree indexing, but if you look at the default values for the
calculation, one in
particular sticks out: they used an "Index memory cache" size of 10 MB.
It turns out that one matters a great deal. Ten MB might be reasonable
for a handheld running SQL Server CE, but I can't imagine anyone running
a business with only 10 MB available to cache critical indexes. If you
raise
the cache size to 512 Mbytes, suddenly CopperEye saves only 6 minutes a day
instead of 54 hours a day. Their defaults are unrealistic at best, and
certainly
misleading.
It's very easy to find reasonable values for which CopperEye is a
very bad choice. Try these:
Avg transactions/day: 20 Million
Total Days stored 15 Days
Number of Indexed keys: 4
Average key size: 300 Bytes
Index memory cache. 4192 Mbytes
Disk Bandwidth 5ms seek time, 200 Seeks/second
Parallelism: 1 stream
Index partitions: 30
Now B-trees are 6 times faster.
Now I feel like CopperEye is urging me to give up
my bicycle for a turtle, because a turtle is many times
faster. Yeah, the turtle is faster ... climbing a 45-degree
hill in the rain. I'll stick with my bike.
SK
LoriB wrote:

>I just got done reading an article from Q4, 2004 DB2 Magazine about
>CopperEye Indexing. The article states, "CopperEye indexing as
>implemented as a DataBlade has shown a level of performance up to
>twenty-five times faster than could be achieved with Informix's
>standard B-tree indexing." It also states that using their algorithm
>for indexing on databases that are considered large that thier
>algorithm is 100 times faster than b-tree indexing.
>While this new indexing algorithm has only been developed for
>Informix, if it is as good as it sounds, I'm sure other DBMS's will
>start using it. Has anyone else heard of this? And, does anyone know
>if SQL Server will be coming up with a better indexing algorithm than
>the tried and true b-tree? Just curious and hoping that Kalen Delaney
>might weigh in on this.
>--Lori
>

No comments:

Post a Comment