Monday, May 21, 2018

The Data Block, the center of the cloud


From my point of view, the data block is really the magic of Oracle.  Without this basic part of the system, all the super cool stuff within Oracle just wouldn’t be possible, or would be much more complex.   An example of this is row locking.

The locking model in Oracle at its core is pretty simple.  You lock rows while you are changing them.  This could be an insert, update, or delete.  And when you lock a row, that is exactly what you lock: just that row.  If you lock every row in the table except one, you have a lock on every row except one.  Oracle never escalates a lock; there isn’t a “block lock” in Oracle.

There is block pinning, which is a latching of the block while you’re in the block.  That is held only for the very brief moment while you’re in the block doing something (even a select).  This is done to protect the block while changes happen to it, but the pin is released as soon as you’re done in the block.  Any rows locked while you were there stay locked after you release the pin.  This pinning has to be done even for a select because you might do some maintenance on the block during the select (delayed block cleanout, for example).

So how does Oracle do this?  The locking is built into the block and rows.  In the header of the block are ITL (Interested Transaction List) entries.   When we lock one or more rows in the block we take out an ITL entry.  Within each row is a lock byte; it’s the second byte of each row.  This lock byte points to the ITL entry that holds that row’s lock, or did hold it at some point.  It may point to an ITL entry for a completed (committed) transaction.

The ITL entry in turn points to the rollback (undo) segment header that holds the transactional information.   In this way each row can be locked by a different transaction, within the same block and across many blocks, with no additional overhead.   This diagram gives the basic overall view of what happens when locking data.
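As a quick illustration of the row-level model (the table and column names here are hypothetical), locking a single row creates exactly one transaction entry, and that same entry serves however many rows the transaction goes on to lock:

```sql
-- Session 1: lock exactly one row (no commit yet)
UPDATE emp SET sal = sal + 100 WHERE empno = 7788;

-- Session 2: one entry in v$transaction per active transaction,
-- regardless of how many rows it has locked;
-- XID = undo segment number / slot / sequence
SELECT xidusn, xidslot, xidsqn, used_ublk, used_urec
FROM   v$transaction;
```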


Of course there are limits.  A block can hold at most 255 ITL entries, and it’s highly unlikely you’ll ever get to that.  Each one is about 23 bytes and takes up space in the header of the block.   For a table there is 1 ITL entry in each block when it’s first created, and that can grow up to 255.  There are two parameters you can set, INITRANS and MAXTRANS.  INITRANS is how many entries to allocate in each block when it is first added to the table, and MAXTRANS is the max it will grow to.   Since version 10, MAXTRANS is ignored, so even if you set it to, say, 100, it will be set to the max of 255 anyway.
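A minimal sketch of setting INITRANS at create time (the table name is hypothetical); the data dictionary shows what actually took effect:

```sql
-- Hypothetical table; 10 ITL slots pre-allocated in every block.
-- The MAXTRANS clause is still accepted for compatibility but ignored.
CREATE TABLE demo_tab (
  id  NUMBER,
  val VARCHAR2(30)
) INITRANS 10 MAXTRANS 100;

-- Check what the dictionary recorded
SELECT ini_trans, max_trans
FROM   user_tables
WHERE  table_name = 'DEMO_TAB';
```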


Since you can only have so many ITL entries in a block, it’s possible that someone will want to lock rows in a particular block when there aren’t enough ITL entries and there isn’t enough free space to allocate a new one.  When this happens you’ll see the wait event “enq: TX - allocate ITL entry”.  If this is happening a lot for a given table, consider recreating the table with a higher INITRANS.  Altering the table to a higher INITRANS only causes new blocks in the table to have more; you need to rebuild the table after the change to have it applied to all blocks of the table.
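To see whether a segment is actually hitting ITL pressure, the segment-level statistics track it directly; and a rebuild is what pushes a new INITRANS into existing blocks (object names here are hypothetical):

```sql
-- Cumulative ITL waits per segment since instance startup
SELECT object_name, value AS itl_waits
FROM   v$segment_statistics
WHERE  statistic_name = 'ITL waits'
AND    owner = USER
ORDER  BY value DESC;

-- Raising INITRANS alone only affects blocks formatted afterwards;
-- MOVE rewrites every block with the new setting
ALTER TABLE demo_tab INITRANS 40;
ALTER TABLE demo_tab MOVE;
-- MOVE invalidates the table's indexes, so rebuild them
ALTER INDEX demo_tab_pk REBUILD;
```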


There is not an easy way to see how many ITL entries there are in a block.  You have to dump the block, then look at the block header information to see the ITL entries.  They aren’t hard to spot; they look something like this:
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0001.01f.00007821  0x01400236.1fdc.10  ----    3  fsc 0x0000.00000000
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000

The block dump command looks like this (this example just dumps block 12 of datafile 6):
ALTER SYSTEM DUMP DATAFILE 6 BLOCK MIN 12 BLOCK MAX 12;
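The dump is written to the session’s trace file, not to the screen; on recent versions you can find that file with v$diag_info:

```sql
-- Location of the trace file the block dump was written to
SELECT value
FROM   v$diag_info
WHERE  name = 'Default Trace File';
```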

Doing this across a table to get an idea of how many ITL entries there are is rather tedious, and in the end likely more work than it’s worth.    Better is to get an idea of how many independent transactions there are on a hot table at once, then raise INITRANS to that plus some wiggle room.   So if you know that about 30 transactions are likely going on in the table at once, a setting of 35–40 is likely good.
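One rough way to gauge that concurrency is to sample how many distinct transactions currently hold locks on the table (the table name is hypothetical); run this a number of times during peak load and take the high-water mark as a starting point:

```sql
-- Distinct active transactions holding locks on one table right now
SELECT COUNT(DISTINCT lo.xidusn || '.' || lo.xidslot || '.' || lo.xidsqn) AS active_txns
FROM   v$locked_object lo, dba_objects o
WHERE  o.object_id = lo.object_id
AND    o.object_name = 'DEMO_TAB';
```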

Just keep in mind that each entry is permanently there and never goes away, even when dynamically allocated.   Each is around 23 bytes of space, and that is space not available in the block for data; at the maximum of 255 entries that’s nearly 6K, most of an 8K block.  This means that raising it to a higher value could make the table bigger.  This space trade-off could well be worth it if this wait event is a serious performance problem.

To wrap up, this is the magic of Oracle.  This basic block and row structure was put in place way back at version 6, and it really hasn’t changed much since then.   I tip my hat to the folks who, all those years ago, made all this possible.   In this post I focused on the row locking inside the block, which is a huge thing but not the only thing that makes Oracle’s block structure so powerful.  In the future I plan to have posts on other parts.

Saturday, April 21, 2018

ANSI joins and QB_NAME in the cloud or not

I’m a huge fan of the QB_NAME hint.  It allows you to name your query blocks, which can help with debugging.  Your queries won’t run faster or slower with this hint, but it certainly helps when you have many subqueries and/or large UNION queries.  Also, I’m an old dog, so I still write my joins in the WHERE clause, not the new hip way of writing them in the FROM clause.  Because of this I didn’t know of a problem with QB_NAME until just this past week.

When using the QB_NAME hint with the newer FROM-clause join syntax (often called ANSI joins), the QB_NAME is not used.  Below I have an example that illustrates this issue.  The query is an outer join between two tables.  Oracle does a rewrite, making an inline view of the count on the big_tab table and then joining it to the allusers_tab table.

Of interest is that with the classic syntax, even though the QB_NAME is mauled by the rewrite, it does survive to be used as the table alias; look below the plans in the Query Block Name / Object Alias section.  However, when the join is in the FROM clause (the newer ANSI syntax), the query is also rewritten, but this time the QB_NAME is completely gone.

EDIT: The database used for this test: Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 

I did a 10053 trace (a trace of a hard parse) on the ANSI join query.  At the top of the trace the query block signature gets changed.  The QB_NAME starts as:
signature (): qb_name=MAIN nbfros=1 flg=0

Then right after is this line:
signature (): qb_name=SEL$1 nbfros=2 flg=0

I’m not sure what nbfros is, but it appears to count the FROM row sources.  The second signature, with both tables now in one query block named SEL$1, is the rewritten query, and that name takes precedence over MAIN.

Also from the 10053 trace, I found it interesting that the join was actually converted back into the classic syntax.  I saw this in the trace:
Stmt: ******* UNPARSED QUERY IS *******
SELECT /*+ QB_NAME ("MAIN") */ "A"."USERNAME" "USERNAME",COUNT("B"."OWNER") "CNT" FROM "OP"."ALLUSERS_TAB" "A","OP"."BIG_TAB" "B" WHERE "A"."USERNAME"="B"."OWNER"(+) GROUP BY "A"."USERNAME" ORDER BY "A"."USERNAME"

However, this is not the query that was parsed; the one below is.  Notice it still has the classic outer join syntax and it still has the QB_NAME hint, but the hint is effectively ignored.

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT /*+ QB_NAME ("MAIN") */ "A"."USERNAME" "USERNAME",TO_NUMBER(TO_CHAR(NVL("VW_GBC_5"."ITEM_2",0))) "CNT" FROM  (SELECT "B"."OWNER" "ITEM_1",COUNT("B"."OWNER") "ITEM_2" FROM "OP"."BIG_TAB" "B" GROUP BY "B"."OWNER") "VW_GBC_5","OP"."ALLUSERS_TAB" "A" WHERE "A"."USERNAME"="VW_GBC_5"."ITEM_1"(+) ORDER BY "A"."USERNAME"


OK, now here is the test.  First, the code I used; the file name is ansi_join_test1.sql.

set serveroutput off

select /*+ qb_name(MAIN) */ a.username, count(owner) cnt
from allusers_tab a, big_tab b
where a.username = b.owner(+)
group by a.username
order by a.username
/

SELECT PLAN_TABLE_OUTPUT
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'TYPICAL IOSTATS LAST ALIAS'))
/

select /*+ qb_name(MAIN) */ a.username, count(owner) cnt
from allusers_tab a left outer join big_tab b on a.username = b.owner
group by a.username
order by a.username
/

SELECT PLAN_TABLE_OUTPUT
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'TYPICAL IOSTATS LAST ALIAS'))
/


The sample run -------------------------------------------------------------------------------------------

SQL> @ansi_join_test1

USERNAME                                             CNT
---------------------------------------- ---------------
ANONYMOUS                                              0
APEX_040200                                        96672

There are 48 total rows returned; most are removed here to conserve space.

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------

SQL_ID  0yy8murg5phnw, child number 0
-------------------------------------
select /*+ qb_name(MAIN) */ a.username, count(owner) cnt from
allusers_tab a, big_tab b where a.username = b.owner(+) group by
a.username order by a.username

Plan hash value: 2416054041

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name        | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |      1 |        |       | 11755 (100)|          |     48 |00:00:00.34 |   42681 |
|   1 |  MERGE JOIN OUTER     |             |      1 |     48 |  1344 | 11755   (2)| 00:00:01 |     48 |00:00:00.34 |   42681 |
|   2 |   INDEX FULL SCAN     | USERNAME_PK |      1 |     48 |   432 |     1   (0)| 00:00:01 |     48 |00:00:00.01 |       5 |
|*  3 |   SORT JOIN           |             |     48 |     35 |   665 | 11754   (2)| 00:00:01 |     34 |00:00:00.34 |   42676 |
|   4 |    VIEW               | VW_GBC_5    |      1 |     35 |   665 | 11753   (2)| 00:00:01 |     35 |00:00:00.34 |   42676 |
|   5 |     HASH GROUP BY     |             |      1 |     35 |   210 | 11753   (2)| 00:00:01 |     35 |00:00:00.34 |   42676 |
|   6 |      TABLE ACCESS FULL| BIG_TAB     |      1 |   2422K|    13M| 11657   (1)| 00:00:01 |   2422K|00:00:00.16 |   42676 |
-------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$85E9EE9D
   2 - SEL$85E9EE9D / A@MAIN                         Notice MAIN is still here
   4 - SEL$E0E6E493 / VW_GBC_5@SEL$ED2A7381
   5 - SEL$E0E6E493
   6 - SEL$E0E6E493 / B@MAIN

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("A"."USERNAME"="ITEM_1")
       filter("A"."USERNAME"="ITEM_1")


USERNAME                                             CNT
---------------------------------------- ---------------
ANONYMOUS                                              0
APEX_040200                                        96672

There are 48 total rows returned; most are removed here to conserve space.

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------


SQL_ID  b9bvggwsyx6uy, child number 0
-------------------------------------
select /*+ qb_name(MAIN) */ a.username, count(owner) cnt from
allusers_tab a left outer join big_tab b on a.username = b.owner group
by a.username order by a.username

Plan hash value: 2416054041

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name        | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |      1 |        |       | 11755 (100)|          |     48 |00:00:00.33 |   42681 |
|   1 |  MERGE JOIN OUTER     |             |      1 |     48 |  1344 | 11755   (2)| 00:00:01 |     48 |00:00:00.33 |   42681 |
|   2 |   INDEX FULL SCAN     | USERNAME_PK |      1 |     48 |   432 |     1   (0)| 00:00:01 |     48 |00:00:00.01 |       5 |
|*  3 |   SORT JOIN           |             |     48 |     35 |   665 | 11754   (2)| 00:00:01 |     34 |00:00:00.33 |   42676 |
|   4 |    VIEW               | VW_GBC_5    |      1 |     35 |   665 | 11753   (2)| 00:00:01 |     35 |00:00:00.33 |   42676 |
|   5 |     HASH GROUP BY     |             |      1 |     35 |   210 | 11753   (2)| 00:00:01 |     35 |00:00:00.33 |   42676 |
|   6 |      TABLE ACCESS FULL| BIG_TAB     |      1 |   2422K|    13M| 11657   (1)| 00:00:01 |   2422K|00:00:00.15 |   42676 |
-------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$6A6A6CC9
   2 - SEL$6A6A6CC9 / A@SEL$1                         Notice that MAIN is gone.
   4 - SEL$8D772734 / VW_GBC_5@SEL$920000A1
   5 - SEL$8D772734
   6 - SEL$8D772734 / B@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("A"."USERNAME"="ITEM_1")
       filter("A"."USERNAME"="ITEM_1")

SQL>

To get the 10053 trace I used:
exec dbms_sqldiag.dump_trace(p_sql_id=>'b9bvggwsyx6uy',  p_child_number=>0, p_component=>'Optimizer');

To create the tables used in the test, I ran this:

drop table big_tab ;
drop table allusers_tab ;

create table big_tab as select * from all_objects;
insert /*+ append */ into big_tab select * from big_tab;
commit;
insert /*+ append */ into big_tab select * from big_tab;
commit;
insert /*+ append */ into big_tab select * from big_tab;
commit;
insert /*+ append */ into big_tab select * from big_tab;
commit;
insert /*+ append */ into big_tab select * from big_tab;

create index big_idx on big_tab(object_id);
create index big_objtype_idx on big_tab(object_type);

create table allusers_tab as select * from all_users ;

alter table allusers_tab
add constraint username_pk
primary key (username) ;

EXEC DBMS_STATS.SET_TABLE_PREFS('OP','BIG_TAB','METHOD_OPT','FOR ALL COLUMNS SIZE REPEAT');
EXEC DBMS_STATS.SET_TABLE_PREFS('OP','ALLUSERS_TAB','METHOD_OPT','FOR ALL COLUMNS SIZE REPEAT');

EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'BIG_TAB',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1')
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'ALLUSERS_TAB',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1')