Wednesday, December 19, 2012

fighting OLTP fires with analytic functions

In my prior post I showed a bad inline view that was causing a huge performance hit. It was running in an OLTP environment, no less, but that's somewhat beside the point of the solution.

It was bugging me all weekend, and with Mladen's help I came up with a solution.

The original SQL (again, part of an inline view in a much bigger piece of SQL) looked like this:


SELECT POC.PO_ID,
       POC.LINE_ITEM_PO,
       POC.COMMENT_PO,
       POC.DATE_ADD_COMMENT
FROM   SCHEMA_NAME.PO_COMMENT POC
WHERE  (POC.PO_ID,
        POC.LINE_ITEM_PO,
        POC.ROW_TIMESTAMP)
       IN
       (SELECT A.PO_ID,
               A.LINE_ITEM_PO,
               MAX(A.ROW_TIMESTAMP)
        FROM   SCHEMA_NAME.PO_COMMENT A /* the same table */
        WHERE  A.TYPE_PO_COMMENT = 'LE'
        GROUP BY A.PO_ID,
                 A.LINE_ITEM_PO)

After a rewrite with an analytic function, MAX(column) OVER (PARTITION BY the original inner GROUP BY columns), it now looks like this ...


WITH INNER AS
 (SELECT PO_ID, LINE_ITEM_PO, COMMENT_PO, DATE_ADD_COMMENT, ROW_TIMESTAMP,
         MAX(ROW_TIMESTAMP) OVER (PARTITION BY PO_ID, LINE_ITEM_PO) AS MAXROW
  FROM   SCHEMA_NAME.PO_COMMENT
  WHERE  TYPE_PO_COMMENT = 'LE')
SELECT PO_ID, LINE_ITEM_PO, COMMENT_PO, DATE_ADD_COMMENT
FROM   INNER
WHERE  ROW_TIMESTAMP = MAXROW
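The rewrite is easy to sanity-check outside the database. Below is a minimal sketch using Python's sqlite3 module (SQLite supports window functions from 3.25 and row-value IN from 3.15); the table and rows are invented stand-ins for PO_COMMENT, not the real data:

```python
# Minimal sketch: both query shapes against invented stand-in data.
# Requires SQLite >= 3.25 (window functions); not the real PO_COMMENT table.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE po_comment (
    po_id INTEGER, line_item_po INTEGER, comment_po TEXT,
    date_add_comment TEXT, row_timestamp TEXT, type_po_comment TEXT)""")
con.executemany("INSERT INTO po_comment VALUES (?,?,?,?,?,?)", [
    (1, 1, 'first',  '2012-12-01', '2012-12-01 09:00', 'LE'),
    (1, 1, 'latest', '2012-12-02', '2012-12-02 09:00', 'LE'),
    (1, 2, 'only',   '2012-12-03', '2012-12-03 09:00', 'LE'),
    (2, 1, 'other',  '2012-12-04', '2012-12-04 09:00', 'XX'),  # wrong type
])

# Original shape: IN against a GROUP BY on the same table (two passes).
original = con.execute("""
    SELECT poc.po_id, poc.line_item_po, poc.comment_po
    FROM   po_comment poc
    WHERE  (poc.po_id, poc.line_item_po, poc.row_timestamp) IN
           (SELECT a.po_id, a.line_item_po, MAX(a.row_timestamp)
            FROM   po_comment a
            WHERE  a.type_po_comment = 'LE'
            GROUP BY a.po_id, a.line_item_po)
    ORDER BY 1, 2""").fetchall()

# Rewritten shape: one pass, MAX() OVER the old GROUP BY columns, then filter.
# (SQLite will not parse INNER as a CTE name, so the CTE is called t here.)
rewritten = con.execute("""
    WITH t AS
     (SELECT po_id, line_item_po, comment_po, row_timestamp,
             MAX(row_timestamp) OVER (PARTITION BY po_id, line_item_po) AS maxrow
      FROM   po_comment
      WHERE  type_po_comment = 'LE')
    SELECT po_id, line_item_po, comment_po
    FROM   t
    WHERE  row_timestamp = maxrow
    ORDER BY 1, 2""").fetchall()

print(original == rewritten)  # both keep only the latest 'LE' comment per line item
```

Same rows either way; the win in Oracle comes from visiting the table once and letting the optimizer push predicates into the view.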

*** There are still some weird things in the big SQL that I do not like.

End result ... run time drops from the 15-minute range into the 20-second range.

After the rewrite the optimizer is able to push its predicate on PO_ID into this inline view, so an existing multi-column index on PO_ID, LINE_ITEM_PO, and TYPE_PO_COMMENT can now be used.

I am doing a PowerPoint on this for NEOOUG and will put up a link to it eventually. The PowerPoint will include SQL Monitor output for the bad run and the fixed version, showing stats and execution plans.

UPDATE: Kim posted an alternative analytic rewrite that also looks promising; it is just a little different from mine, mostly in syntax ...

(
  SELECT PO_ID
       , LINE_ITEM_PO
       , COMMENT_PO
       , DATE_ADD_COMMENT
  FROM (
         SELECT POC.PO_ID
              , POC.LINE_ITEM_PO
              , POC.COMMENT_PO
              , POC.DATE_ADD_COMMENT
              , ROW_NUMBER() OVER (
                  PARTITION BY POC.PO_ID
                             , POC.LINE_ITEM_PO
                  ORDER BY POC.ROW_TIMESTAMP DESC
                ) RN
         FROM PO_COMMENT POC
         WHERE POC.TYPE_PO_COMMENT = 'LE'
       )
  WHERE RN = 1
) T193
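Kim's ROW_NUMBER() shape can be sketched the same way in sqlite3 (again with invented stand-in data, not the real schema):

```python
# Sketch of the ROW_NUMBER() variant against invented stand-in data.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE po_comment (
    po_id INTEGER, line_item_po INTEGER, comment_po TEXT,
    row_timestamp TEXT, type_po_comment TEXT)""")
con.executemany("INSERT INTO po_comment VALUES (?,?,?,?,?)", [
    (1, 1, 'first',  '2012-12-01 09:00', 'LE'),
    (1, 1, 'latest', '2012-12-02 09:00', 'LE'),
    (1, 2, 'only',   '2012-12-03 09:00', 'LE'),
])

# Number rows newest-first within each (po_id, line_item_po); keep rank 1.
latest = con.execute("""
    SELECT po_id, line_item_po, comment_po
    FROM  (SELECT poc.*,
                  ROW_NUMBER() OVER (PARTITION BY poc.po_id, poc.line_item_po
                                     ORDER BY poc.row_timestamp DESC) AS rn
           FROM  po_comment poc
           WHERE poc.type_po_comment = 'LE') x
    WHERE x.rn = 1
    ORDER BY 1, 2""").fetchall()
print(latest)  # [(1, 1, 'latest'), (1, 2, 'only')]
```

One behavioral difference from the MAX() OVER version: ROW_NUMBER() returns exactly one row per group even when two comments tie on ROW_TIMESTAMP, whereas the MAX-equals filter returns every tied row.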

1 comment:

  1. Hi John,

    I tried to make the point, although possibly not very well, that the strategies to deal with the requirement to get the "latest/oldest/newest/largest/least" X for a set of data depend very much on how much data you expect.

    Both the strategies above are likely to be successful for larger sets of data.

    For smaller sets of data, their success is possibly more reliant on whether Oracle chooses to merge / unnest / push predicates.

    And that decision can be a potential performance threat.

    For smaller sets of data (and here I mean the data that is driving the query, the data for which we are getting this most recent comment), a possible alternative is the scalar subselect.

    For example:
    SELECT ...
    , (SELECT MAX(comment) KEEP (DENSE_RANK FIRST ORDER BY row_timestamp DESC)
       FROM po_comment poc
       WHERE poc.po_id = id_from_main_query
       AND poc.line_item = line_item_from_main_query
       AND poc.type_po_comment = 'LE') comment
    , (SELECT MAX(date_add_comment) KEEP (DENSE_RANK FIRST ORDER BY row_timestamp DESC)
       FROM po_comment poc
       WHERE poc.po_id = id_from_main_query
       AND poc.line_item = line_item_from_main_query
       AND poc.type_po_comment = 'LE') date_add_comment
    ....
    FROM ...

    A couple of things to mention:

    Firstly, this example is a little unfortunate in that it's getting two attributes from the latest comment. There are ways to do this in a single subselect, but it's probably not worth complicating my comment here.

    Secondly, a columnar/scalar subselect forces the code to do a nested-loop-style lookup. There are volume-related downsides to this.

    Hope this helps.

    Cheers,
    Dominic

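Dominic's scalar-subselect idea can also be sketched outside Oracle. sqlite3 has no KEEP (DENSE_RANK FIRST), so the sketch below substitutes ORDER BY ... LIMIT 1, and the driving PO table and all its data are invented for illustration:

```python
# Correlated scalar-subselect pattern, sketched in sqlite3. ORDER BY ... LIMIT 1
# stands in for Oracle's KEEP (DENSE_RANK FIRST); tables and rows are made up.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE po (po_id INTEGER, line_item_po INTEGER);
CREATE TABLE po_comment (po_id INTEGER, line_item_po INTEGER,
    comment_po TEXT, row_timestamp TEXT, type_po_comment TEXT);
INSERT INTO po VALUES (1, 1), (1, 2);
INSERT INTO po_comment VALUES
    (1, 1, 'first',  '2012-12-01 09:00', 'LE'),
    (1, 1, 'latest', '2012-12-02 09:00', 'LE'),
    (1, 2, 'only',   '2012-12-03 09:00', 'LE');
""")

# One correlated lookup per driving row: fine for small driving sets,
# a nested-loop cost problem for large ones (Dominic's caveat).
result = con.execute("""
    SELECT p.po_id, p.line_item_po,
           (SELECT poc.comment_po
            FROM   po_comment poc
            WHERE  poc.po_id = p.po_id
              AND  poc.line_item_po = p.line_item_po
              AND  poc.type_po_comment = 'LE'
            ORDER BY poc.row_timestamp DESC
            LIMIT 1) AS latest_comment
    FROM po p ORDER BY 1, 2""").fetchall()
print(result)  # [(1, 1, 'latest'), (1, 2, 'only')]
```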