Seemingly Silly SQL
Recently I had to fix a report that was broken from some changes we made while implementing a new financial tracking process for our customer. A previous development team had implemented work order related cost tracking without using Maximo’s out-of-box financial tracking functionality. They created a table which stored certain dollar amounts associated with their process. There is a row per dollar amount entered. The customer wanted to use more of the out-of-box financial tracking capabilites and expand their usage of the application. As a part of these changes, we stopped using this custom table, but preserved the capability to enter in a sequence of dollar amounts (pending a larger process overhaul) by using attributes on the work order table itself.
This change had the side effect of breaking a report that used this custom table. The top portion of the report was trivial to change to pull its values from the new places, but the bottom portion has a table which displayed rows from that custom table we no longer use. We did not want to change the structure or functionality of the report, so I now had to find a way to output attributes that were now a part of a single row/record, in multiple rows in order to populate the table on the report.
Take this list of attributes as an example
wonum, fundingnum, fund0, fund1, fund2, fund3
To output this one row, as four rows, I used UNION
1 | SELECT |
It sure looks silly, but it outputs the data in rows to populate the table on the report properly and Oracle’s explain plan said the cost was 40, great!
Then I got curious, how bad would the cost get if I put the WHERE
clause on the outer SELECT
instead of on the individual SELECT
statements.
1 | SELECT |
This increased the query cost dramatically. The cost according to the explain plan was somewhere in the millions, while the cost of the other version was just 40. The reason for this, is that in the expensive query each SELECT
is pulling in the entire workorder table, and each UNION
is de-duplicating the results of each SELECT
. This query has to do this four times before it can finally filter down the results to the one record needed. In the performant query, each SELECT
is only returning one row, and since the row returned by each SELECT
is different, the UNION
doesn’t have to de-duplicate. Plus the dataset is already much smaller. Imagine how bad it would be if I had used UNION ALL
instead of UNION
!
The lesson here is that it matters where you put your WHERE
clauses in queries. You have to keep in mind the size of the datasets you are pulling in, and how much work every operation is going to have to do.
Also, you aren’t always going to have ideal data to work with and….sometimes…you may have to do silly things like balloon one row into four…
I did put in a comment saying something to the effect of /* I know this looks dumb but...*/