Thursday, February 13, 2014

Does the LIMIT clause on a BULK COLLECT select really matter?

Mostly I teach SQL optimization because SQL has the most impact on performance, with performance usually meaning TIME.  But there is more to performance than time.  Scalability is really at the heart of performance, and your SQL might not scale well for reasons not directly related to time, like memory usage.  If your SQL consumes a lot of memory, it may run fast but scale horribly and hence not run well under pressure.

In PL/SQL land there is a great feature called BULK COLLECT for reading in a set of rows from a table (or tables).  It's a great way to read in a set of rows at a time and then process them, rather than fetching one row at a time.  And there is a LIMIT clause that can be used to cap the number of rows coming back per fetch.  If you're working with only a few rows (less than 100 or so) then you can likely skip it and be fine, but if you're working with a larger set of rows you really should set it to something realistic.
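For contrast, here is a minimal sketch (using the same BIG_TAB table from the demo below) of the unbounded form, where a single fetch pulls the entire result set into the collection at once, so the collection (and your PGA) has to hold every row:

DECLARE
   TYPE big_tab_t IS TABLE OF big_tab%ROWTYPE;
   bigtab big_tab_t;
BEGIN
   -- No LIMIT: all rows land in the collection in one shot,
   -- so memory use grows with the size of the result set.
   SELECT * BULK COLLECT INTO bigtab FROM big_tab;
END;
/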

Here is a simple example to show what happens with the setting of LIMIT.

First, I have a routine to get PGA space used:

CREATE OR REPLACE FUNCTION pga_memory_used
RETURN NUMBER
AS
    l_used_memory  NUMBER;
BEGIN
    -- Current session's PGA usage, in bytes, from V$MYSTAT joined to V$STATNAME.
    SELECT ms.value
      INTO l_used_memory
      FROM v$mystat ms,
           v$statname sn
     WHERE ms.statistic# = sn.statistic#
       AND sn.name = 'session pga memory';
    RETURN l_used_memory;
END;
/
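
A quick sanity check of the function from SQL looks something like this (it assumes direct SELECT grants on the underlying V$MYSTAT and V$STATNAME views, which the function needs to compile anyway); the value it returns is in bytes and will of course vary by session:

-- Current session PGA, shown in KB; output varies run to run.
SELECT pga_memory_used/1024 AS pga_kb FROM dual;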


Now I'll run this little code block to illustrate the effects of the LIMIT clause.  I'll run it twice, once with LIMIT set to 5000 and then again with it set to 1000.  The table BIG_TAB has about 2,868,512 rows and the average row length is 115 bytes.  (NOTE: The call to bigtab_process after the fetch is to a stub procedure in this case; it does nothing.)

DECLARE
   l_start_mem  number;
   l_start_time pls_integer;
   l_start_cpu  pls_integer;
   CURSOR bigtab_cur IS
   SELECT * FROM big_tab;
   TYPE big_tab_t IS TABLE OF big_tab%ROWTYPE;
   bigtab big_tab_t;
BEGIN
   l_start_mem  := pga_memory_used;
   l_start_time := DBMS_UTILITY.GET_TIME;
   l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
   OPEN bigtab_cur;
   LOOP
     FETCH bigtab_cur
     BULK COLLECT INTO bigtab LIMIT 5000;
     -- The last partial batch has COUNT > 0 and still gets processed;
     -- the empty fetch on the following pass ends the loop.
     EXIT WHEN bigtab.COUNT = 0;
     bigtab_process;
   END LOOP;
   CLOSE bigtab_cur;
   DBMS_OUTPUT.put_line ('----------------------------');
   DBMS_OUTPUT.put_line ('--- BULK COLLECTION DEMO ---');
   DBMS_OUTPUT.put_line ('The limit clause set to 5000');
   DBMS_OUTPUT.put_line ('Times in hundredths of a second');
   DBMS_OUTPUT.put_line ('**** TIME   - '||to_char(DBMS_UTILITY.get_time - l_start_time));
   DBMS_OUTPUT.put_line ('**** CPU    - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));
   DBMS_OUTPUT.put_line ('**** Memory '||to_char(((pga_memory_used - l_start_mem))/1024,'9,999,999')||' KB');
END;
/
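
The second run is the same block with just the LIMIT literal (and the banner text) changed:

     FETCH bigtab_cur
     BULK COLLECT INTO bigtab LIMIT 1000;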


And now the run results:


----------------------------
--- BULK COLLECTION DEMO ---
The limit clause set to 5000
Times in hundredths of a second
**** TIME   - 548
**** CPU    - 520
**** Memory      7,552 KB

----------------------------
--- BULK COLLECTION DEMO ---
The limit clause set to 1000
Times in hundredths of a second
**** TIME   - 537
**** CPU    - 487
**** Memory      2,112 KB



Notice that the time (wall clock time) doesn't really change much and the CPU is a bit lower in the second run, but look at the memory usage.  A rather significant drop!  And yes, this is consistent over multiple runs; the actual numbers vary, but the second run always uses something like half the memory or less, and less CPU.

You might say, hey, 7M vs 2M, what's the big deal?  I have gigabytes of PGA space!  Sure, on a one-run basis this isn't a big deal.  But multiply this by a few thousand users.  Now it might make a difference.
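
To put rough numbers on it: 7,552 KB minus 2,112 KB is about 5.3 MB of extra PGA per session just from the larger LIMIT.  Across 1,000 concurrent sessions running this same pattern, that's a bit over 5 GB of memory tied up with no gain in elapsed time.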

If you're using BULK COLLECT (which you should be!), then check the LIMIT clause; it could save some memory and help your application scale much better.

2 comments:

  1. But how do you decide what number to use in the LIMIT clause?

  2. Hi Prasanta, that is a bit of an art. For the most part it depends on two things: how much memory you have and how big the rows are that you're working with. If you're unsure and don't want to do testing like I show in the post, 1000 is a decent number to work with. Of course, if you're returning much larger rows you may want to reduce that to, say, 500. In this example the rows were only 115 bytes on average.

    My advice is to test-test-test. Use 1000 as a start point then increase or decrease from there.

    Hope that helps!
