Thursday, January 24, 2013

Not In vs Not Exists

My last post was about IN and EXISTS but what about the NOT versions of these?  Are they the same also?  Nothing bets a test so let's take a look.

This time around I want to find out if there are any departments without any employees assigned.  First let's give it a go with the NOT IN.

select * from dept where deptno not in (select deptno from emp ) ;

This seems like a good idea, basicly we are saying give me all the dapartments that are in the DEPT but not in a list of departmets from the EMP table.  We get this results:

no rows selected

Which looks fine, we think there aren't any.   But what if I change it to a NOT EXISTS.

select * from dept d where not exists (select deptno from emp where deptno = d.deptno);

now I get:

         DEPTNO DNAME          LOC
--------------- -------------- -------------
             40 OPERATIONS     BOSTON

1 row selected.


Well that's not very nice!  It looks like I got wrong results here somewhere.  And I did.  The first one was incorrect, why?  Let's take a look at the plans for each, the NOT IN gives me this plan:

--------------------------------------------------------
| Id  | Operation                     | Name           |
--------------------------------------------------------
|   0 | SELECT STATEMENT              |                |
|   1 |  MERGE JOIN ANTI NA           |                |
|   2 |   SORT JOIN                   |                |
|   3 |    TABLE ACCESS BY INDEX ROWID| DEPT           |
|   4 |     INDEX FULL SCAN           | DEPT_DEPTNO_PK |
|*  5 |   SORT UNIQUE                 |                |
|   6 |    TABLE ACCESS FULL          | EMP            |
--------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("DEPTNO"="DEPTNO")
       filter("DEPTNO"="DEPTNO")


The NOT EXISTS gives me this plan:

-------------------------------------------
| Id  | Operation          | Name         |
-------------------------------------------
|   0 | SELECT STATEMENT   |              |
|   1 |  NESTED LOOPS ANTI |              |
|   2 |   TABLE ACCESS FULL| DEPT         |
|*  3 |   INDEX RANGE SCAN | EMP_DEPT_IDX |
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("DEPTNO"="D"."DEPTNO")
       filter("DEPTNO" IS NOT NULL)




What gives?  Why these to different plans and what is that "ANTI NA" thing in the NOT IN plan.  It's a logic thing that is the culprit. With the NOT IN if there is a null in the inner set the entire set is nullified.  In the EMP table, the employee King (the president, what a great name for the company president don't ya think?) is not in a department so his DEPTNO is NULL that ends up nullifying the entire set and we get no results. 

The "NA" in the join step stands for "Null Aware", which kicks in this logical rule that once there is a null in there, it's over.


So what to do?  Well if you want to use NOT IN, then make sure you do something about the nulls. Like this:


select * from dept 
where deptno 
not in (select deptno from emp where deptno is not null) ;


Now you get the same results and teh exact same plan as NOT EXISTS.  Try and see for yourself.

So IN and EXISTS are really the same, and NOT IN and NOT EXISTS can be, so long as you do something about those nulls.


One question that this query is able to answer is a question that has plagued many Oracle professions for many years, why is it that is company has never made any money?  We now know, there is no one in the operations department!



1 comment: