It was bugging me all weekend and with the help of Mladen I came up with a solution.
The original SQL ( again part of an inline view in a much bigger piece of SQL ) looked like this:
SELECT
POC.PO_ID,
POC.LINE_ITEM_PO,
POC.COMMENT_PO,
POC.DATE_ADD_COMMENT
FROM SCHEMA_NAME.PO_COMMENT POC
WHERE (POC.PO_ID,
POC.LINE_ITEM_PO,
POC.ROW_TIMESTAMP)
IN
(SELECT
A.PO_ID,
A.LINE_ITEM_PO,
MAX(A.ROW_TIMESTAMP)
FROM SCHEMA_NAME.PO_COMMENT A /* The same table */
WHERE A.TYPE_PO_COMMENT='LE‘
GROUP BY
A.PO_ID,
A.LINE_ITEM_PO) )
WITH INNER AS
( SELECT PO_ID, LINE_ITEM_PO, COMMENT_PO,
DATE_ADD_COMMENT, ROW_TIMESTAMP,
MAX(ROW_TIMESTAMP) OVER (PARTITION BY PO_ID,LINE_ITEM_PO)
AS MAXROW
FROM
SCHEMA_NAME.PO_COMMENT
WHERE
TYPE_PO_COMMENT='LE' )
SELECT
PO_ID, LINE_ITEM_PO, COMMENT_PO, DATE_ADD_COMMENT FROM INNER WHERE ROW_TIMESTAMP = MAXROW
*** Still some weird things in the big SQL I do not like.
End result ... run time drops into 20 second range from 15 minute range.
After the rewrite optimizer able to push predicate into this inline view and use PO_ID ... an existing multi column index on PO_ID, LINE_ITEM_PO and TYPE_PO_COMMENT can now be used.
I am doing a powerpoint on this for NEOOUG will put up link to it eventually. Powerpoint will include SQL monitor output for bad run and fixed version showing stats and execution plan.
UPDATE: Kim posted an alternative analytic rewrite that looks promising also just a little different than mine in syntax mostly ...
(
SELECT PO_ID
, LINE_ITEM_PO
, COMMENT_PO
, DATE_ADD_COMMENT
FROM (
SELECT POC.PO_ID
, POC.LINE_ITEM_PO
, POC.COMMENT_PO
, POC.DATE_ADD_COMMENT
, ROW_NUMBER() OVER (
PARTITION BY POC.PO_ID
, POC.LINE_ITEM_PO
ORDER BY POC.ROW_TIMESTAMP DESC
) RN
FROM PO_COMMENT POC
WHERE POC.TYPE_PO_COMMENT='LE'
)
WHERE RN = 1
) T193
Hi John,
ReplyDeleteI tried to make the point, although possibly not very well, that the strategies to deal with the requirement to get the "latest/oldest/newest/largest/least" X for a set of data depend very much on how much data you expect.
Both the strategies above are likely to be successful for larger sets of data.
For smaller sets of data, their success is possibly more reliant on whether Oracle chooses to merge / unnest / push predicates.
And that decision can be a potential performance threat.
For smaller sets of data (and here I mean the data that is driving the query, the data for which we are getting this most recent comment) then a possible alternative is the scalar subselect.
For example:
SELECT ...
, (SELECT MAX(comment) KEEP (DENSE_RANK FIRST ORDER BY row_timestamp DESC)
FROM po_comment poc
WHERE poc.po_id = id_from_main_query
AND poc.line_item = line_item_from_main_query
AND poc.comment = 'LE') comment
, (SELECT MAX(date_add_comment) KEEP (DENSE_RANK FIRST ORDER BY row_timestamp DESC)
FROM po_comment poc
WHERE poc.po_id = id_from_main_query
AND poc.line_item = line_item_from_main_query
AND poc.comment = 'LE') comment
....
FROM ...
A couple of things to mention:
Firstly, this example is a little unfortunately in that it's getting two attributes from the latest comment. There are ways to do this in a single subselect but it's probably not worth complicating my comment here.
Secondly, a columnar/scalar subselect forces the code to do nested loop type lookup. There are volume-related downsides to this.
Hope this helps.
Cheers,
Dominic