Thursday, August 28, 2008

Materialized Views, Logs and Context Indexes (Oracle 10.2.0.2).

I have been playing around with Materialized Views (shouldn't that be 'Materialised Views'? I'm an Aussie!) over the past couple of days. I've been building a rather complex join of several base tables, to work towards allowing 2 major tables of data to have fast search capabilities.
So, the resulting MV construction syntax looks a little bit like this:

CREATE MATERIALIZED VIEW CLIENTNAMEADDR
...<storage>...
BUILD IMMEDIATE
USING INDEX
REFRESH FAST ON COMMIT
WITH PRIMARY KEY USING DEFAULT LOCAL ROLLBACK SEGMENT
DISABLE QUERY REWRITE
AS
SELECT cli.clientId,
       cn.firstName || ' ' || cn.secondName || ' ' || cn.familyName || ' ' ||
       addr.streetNum || ' ' || addr.streetName || ' ' || addr.streetType || ' ' ||
       addr.suburb || ' ' || addr.state || ' ' || addr.postcode AS clientFullText,
       cli.rowid cliRowid, cn.rowid cnRowid, ca.rowid caRowid, addr.rowid addrRowid
FROM clients cli, client_Names cn, client_Addresses ca, addresses addr
WHERE cli.clientId = cn.clientId
AND cli.clientId = ca.clientId
AND ca.addressId = addr.addressId;


The theory is that this will allow me to create a Context index on the clientFullText field of the MV, letting end users search on any combination of name and address within the database. In reality, the query shown above doesn't strictly need the clients table in the join, but for my purposes I have an additional AND clause that filters some rows out.

There are a fair number of restrictions to keep in mind when creating a materialized view, most of which matter especially when your MV contains a complex query - and let's face it, why else would you create an MV if not to hold a query with a large number of joins and/or filters?

After searching around on the net I found a list of things to consider when creating a Materialized View:
* The defining query of the materialized view cannot contain any non-repeatable expressions (ROWNUM, SYSDATE, non-repeatable PL/SQL functions, and so on).
* The query cannot contain any references to RAW or LONG RAW datatypes or object REFs.
* If the defining query of the materialized view contains set operators (UNION, MINUS, and so on), rewrite will use them for full text match rewrite only.
* If the materialized view was registered as PREBUILT, the precision of the columns must match the precision of the corresponding SELECT expressions unless overridden by the WITH REDUCED PRECISION clause.
* If the materialized view contains the same table more than once, it is possible to do a general rewrite, provided the query has the same aliases for the duplicate tables as the materialized view.
* If a query has both local and remote tables, only local tables will be considered for potential rewrite.
* Neither the detail tables nor the materialized view can be owned by SYS.
* The SELECT and GROUP BY lists, if present, must be the same in the query as in the materialized view.
* Aggregate functions must occur only as the outermost part of the expression. That is, aggregates such as AVG(AVG(x)) or AVG(x)+ AVG(x) are not allowed.
* CONNECT BY clauses are not allowed.

If you specify REFRESH FAST, Oracle performs further verification of the query definition to ensure that fast refresh can be performed if any of the detail tables change. These additional checks are:
* A materialized view log must be present for each detail table.
* The rowids of all the detail tables must appear in the SELECT list of the materialized view query definition.
* If there are no outer joins, you may have arbitrary selections and joins in the WHERE clause. However, if there are outer joins, the WHERE clause cannot have any selections. Further, if there are outer joins, all the joins must be connected by ANDs and must use the equality (=) operator.
* If there are outer joins, unique constraints must exist on the join columns of the inner table. For example, if you are joining the fact table and a dimension table and the join is an outer join with the fact table being the outer table, there must exist unique constraints on the join columns of the dimension table.

If some of these restrictions are not met, you can create the materialized view as REFRESH FORCE to take advantage of fast refresh when it is possible. If one of the tables did not meet all of the criteria, but the other tables did, the materialized view would still be fast refreshable with respect to the other tables for which all the criteria are met.
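As a sketch of what that fallback looks like (abbreviated here - the defining query is the one shown earlier):

-- Sketch only: FORCE attempts a fast refresh and silently falls back
-- to a complete refresh when the fast-refresh criteria aren't met.
CREATE MATERIALIZED VIEW CLIENTNAMEADDR
BUILD IMMEDIATE
REFRESH FORCE ON DEMAND
DISABLE QUERY REWRITE
AS
SELECT ... ;  -- same defining query as earlier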

In order to support REFRESH FAST ON COMMIT on the materialized view (i.e. have the MV automatically refresh after each commit against the base tables), an MV log table has to be created for each of the base tables of the MV, including the columns that are used in the MV, so:

CREATE MATERIALIZED VIEW LOG ON client_Names
WITH ROWID, SEQUENCE (clientId, firstName, secondName, familyName)
INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON clients
WITH ROWID, SEQUENCE (clientId);

CREATE MATERIALIZED VIEW LOG ON client_Addresses
WITH ROWID, SEQUENCE (clientId, addressId)
INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON addresses
WITH ROWID, SEQUENCE (addressId, streetNum, streetType, streetName, suburb, state, postcode)
INCLUDING NEW VALUES;
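To double-check the logs were created with the options requested, the data dictionary can be queried - a sketch, assuming the 10.2 USER_MVIEW_LOGS view with its MASTER, ROWIDS, SEQUENCE and INCLUDE_NEW_VALUES columns:

-- Sketch: confirm each base table has a log with rowid + sequence capture
SELECT master, log_table, rowids, "SEQUENCE", include_new_values
FROM user_mview_logs
WHERE master IN ('CLIENTS', 'CLIENT_NAMES', 'CLIENT_ADDRESSES', 'ADDRESSES');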

During the creation of the MV, I made use of a number of utilities shipped with the Oracle RDBMS. As SYS, I created the REWRITE_TABLE table ($ORACLE_HOME/rdbms/admin/utlxrw.sql) - as well as a public synonym to the same - and granted permissions to public. As SYS, I also ensured the DBMS_ADVISOR package was made available to particular users as required. As the standard user creating the MV and log tables, I created the MV_CAPABILITIES_TABLE table ($ORACLE_HOME/rdbms/admin/utlxmv.sql). This is only really needed during the query tuning phase.
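MV_CAPABILITIES_TABLE is populated by DBMS_MVIEW.EXPLAIN_MVIEW, so a minimal tuning pass looks something like this (a sketch against my MV):

-- Sketch: record which refresh/rewrite capabilities the MV supports, and why not
BEGIN
  DBMS_MVIEW.EXPLAIN_MVIEW('CLIENTNAMEADDR');
END;
/

SELECT capability_name, possible, related_text, msgtxt
FROM mv_capabilities_table
ORDER BY seq;

The MSGTXT column is the useful part - it tells you which restriction is blocking fast refresh or rewrite for each capability.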

Ok, on to the Context index. Context indexes are an excellent way to provide fast text searches on large data tables. Think of them as a LIKE operator on steroids. You can search a VARCHAR2 column for portions of text, and the index can also be configured to be score-based, where your search criteria are allowed to be 'close to' the actual data.
First off, make sure your friendly DBA grants you EXECUTE permissions on the CTXSYS.CTX_DDL package. This may be needed if you want your index to sit in a non-default tablespace, or if you want to apply a 'stop-list' to the index (i.e. search words to ignore).
To create a stop-list, the syntax is as follows:
BEGIN
CTX_DDL.CREATE_STOPLIST('CLI_NAMES_IGNORE', 'BASIC_STOPLIST');
CTX_DDL.ADD_STOPWORD('CLI_NAMES_IGNORE', 'PTY');
CTX_DDL.ADD_STOPWORD('CLI_NAMES_IGNORE', 'PTY.');
CTX_DDL.ADD_STOPWORD('CLI_NAMES_IGNORE', 'LTD');
CTX_DDL.ADD_STOPWORD('CLI_NAMES_IGNORE', 'LTD.');
CTX_DDL.ADD_STOPWORD('CLI_NAMES_IGNORE', 'THE');
CTX_DDL.ADD_STOPWORD('CLI_NAMES_IGNORE', 'OTHERS');
END;
/
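To verify the words actually made it into the stop-list (a sketch - I believe the CTX_USER_STOPWORDS view exposes SPW_STOPLIST and SPW_WORD columns):

SELECT spw_word
FROM ctx_user_stopwords
WHERE spw_stoplist = 'CLI_NAMES_IGNORE';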

To create a set of index storage preferences:
BEGIN
-- CTX_DDL.DROP_PREFERENCE('CLINAMEADDR_STORE_PREFS');
CTX_DDL.CREATE_PREFERENCE('CLINAMEADDR_STORE_PREFS', 'BASIC_STORAGE');
CTX_DDL.SET_ATTRIBUTE('CLINAMEADDR_STORE_PREFS', 'I_INDEX_CLAUSE', 'tablespace large_idx compress 2');
CTX_DDL.SET_ATTRIBUTE('CLINAMEADDR_STORE_PREFS', 'I_TABLE_CLAUSE', 'tablespace large_idx');
CTX_DDL.SET_ATTRIBUTE('CLINAMEADDR_STORE_PREFS', 'K_TABLE_CLAUSE', 'tablespace large_idx');
CTX_DDL.SET_ATTRIBUTE('CLINAMEADDR_STORE_PREFS', 'R_TABLE_CLAUSE', 'tablespace large_idx LOB(DATA) STORE AS (CACHE)');
CTX_DDL.SET_ATTRIBUTE('CLINAMEADDR_STORE_PREFS', 'N_TABLE_CLAUSE', 'tablespace large_idx');
END;
/

Then, to create the Context index, using the stop-list and index storage preferences from the above two steps:
CREATE INDEX CLINAMEADDR_CONTEXT1 ON CLINAMEADDR(CLIENTFULLTEXT)
INDEXTYPE IS CTXSYS.CTXCAT
PARAMETERS('STOPLIST CLI_NAMES_IGNORE STORAGE CLINAMEADDR_STORE_PREFS')
;


Once this is done, you can use the new context index:

SELECT clientID, clientFullText
FROM cliNameAddr
WHERE CATSEARCH(clientFullText, REPLACE(:search_criteria, ' ', ' NEAR '), NULL) > 0;

Bear in mind that I have noticed errors when passing in :search_criteria containing double spaces. So, you might want to ensure that whitespace is kept to a minimum - maybe create a little pre-parser for the data contained in :search_criteria.
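One way to sketch that pre-parsing inside the query itself - using REGEXP_REPLACE, which is available from 10g - is to collapse runs of whitespace to a single space before substituting NEAR:

-- Sketch: normalise whitespace in the bind variable before the NEAR substitution
SELECT clientId, clientFullText
FROM cliNameAddr
WHERE CATSEARCH(clientFullText,
                REPLACE(TRIM(REGEXP_REPLACE(:search_criteria, '[[:space:]]+', ' ')),
                        ' ', ' NEAR '),
                NULL) > 0;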

After a few days of using this just to search for data, I eventually tried to perform a standard update on one of the base tables. That's when I noticed the performance problems. The database in which I am experimenting with the MV and Context concepts contains the following (active) row counts:
CLIENT_NAMES 7.5 million
CLIENT_ADDRESSES 6.7 million
ADDRESSES 8.4 million
CLIENTS 6.9 million

On updating a single row in the CLIENT_NAMES table, I noticed a 15 to 20 second wait time before response came back from the commit operation.
So I experimented a little - I tried an update on 4700 rows and noticed it took about 130 seconds to perform the update, plus another 15 MINUTES to perform the commit.
I used TOAD to investigate where it was getting stuck and it was definitely a combination of applying changes to both the MV and the Context index.
Without full knowledge of how to improve this ghastly performance problem, I may have to either retire the MV and Context solution, or stick with a refresh of the MV ON DEMAND instead of ON COMMIT.
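Switching to on-demand refresh would look something like this (a sketch; in DBMS_MVIEW.REFRESH, method 'C' requests a complete refresh and 'F' a fast one):

-- Sketch: stop refreshing on commit, refresh from an overnight job instead
ALTER MATERIALIZED VIEW CLIENTNAMEADDR REFRESH FORCE ON DEMAND;

BEGIN
  DBMS_MVIEW.REFRESH('CLIENTNAMEADDR', method => 'C');  -- complete refresh
END;
/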

A colleague of mine also tried a more substantial update on 300,000 rows and eventually got kicked out with an internal error - essentially a 'snapshot too old' type of error. Oracle has acknowledged a bug exists in this circumstance and has issued a one-off patch (5530043), but recommends upgrading to 10.2.0.4 or 11g.
I am waiting for our DBAs to upgrade one of our databases to 10.2.0.4 to see if the fix addresses some of the performance problems I noticed during my experiments, although I am doubtful of a successful outcome.

6 comments:

DomBrooks said...

Have you done any testing on how performance on an update is affected without the context index?

Marc Thompson said...

I haven't had enough time up my sleeve to do as much testing as I wanted. My colleague who also encountered the performance problems did not have context indexes on their Materialized Views.
However, they did have 7 other standard indexes on there.

So, it might pay for me to try out the scenario again with just 1 standard index on it, and see how it goes.

Will publish the results as soon as I am able.

Pete Scott said...

With on commit refresh you have to be careful about the rate of commit compared to the time to refresh... often on demand or interval refreshes are easier to manage. Also look at full refresh times - FAST is not the best name for the mechanism, it really is incremental; in some cases a full refresh may give better performance (and also removes the need for the mview logs).

Marc Thompson said...

Haha, I definitely agree - FAST seems to be a little misleading!

We will probably still stick with the MV and Context Index, but perform a full refresh overnight (on DEMAND).

Kubilay Tsil Kara said...

Great Post your extra MV REWRITE restriction:
"If the materialized view contains the same table more than once, it is possible to do a general rewrite, provided the query has the same aliases for the duplicate tables as the materialized view",
helped me a lot to REWRITE a query, which wouldn't rewrite after an upgrade from Oracle 11gR1 to 11gR2. Can I ask you where did you find out about this restriction? The Oracle Docs don't list it as a restriction.

Many Thanks!

Kubilay

Marc Thompson said...

I can't remember the exact website I found that gem, but a good resource for Query Rewrite can be found at: http://download-west.oracle.com/docs/cd/B12037_01/server.101/b10736/qr.htm#g1045551