count distinct

To check whether a single external_item_key can map to more than one subvendorid:

select external_item_key, count(distinct subvendorid) from dbo.ETL_BULKINSERT_TARGET_HISTORY_POWERSHELL group by external_item_key;

If the following query returns only the value 1, then every external_item_key has exactly one subvendorid:

select distinct count(distinct subvendorid) from dbo.ETL_BULKINSERT_TARGET_HISTORY_POWERSHELL group by external_item_key;
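
To list only the keys that actually have more than one subvendorid, a HAVING clause on the same grouping would also work (a minimal sketch against the same table):

select external_item_key, count(distinct subvendorid) as cnt
  from dbo.ETL_BULKINSERT_TARGET_HISTORY_POWERSHELL
 group by external_item_key
having count(distinct subvendorid) > 1;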

A PowerShell example: group strings into a hashtable of arrays, keyed by their first two characters.

$a = 'aa','aa1','aa2','bb','bb1','bb2'
$hashT = @{}

foreach ($i in $a)
{
    $null = $i -match '^(..)'      # capture the first two characters
    $type = $matches[1]
    if (!$hashT.ContainsKey($type)) {
        $hashT[$type] = @()        # initialize the value as an array, otherwise '+' below would do string concatenation
    }
    $hashT[$type] += $i            # append the element to the array for this key
}

foreach ($i in $hashT['aa']){
   $i
 }

 foreach ($i in $hashT['bb']){
   $i
 }


In the previous post I said that if SQL*Net is the bottleneck, the CPU usage of the Oracle server is low.

http://mysqloracle9.wordpress.com/2009/12/23/how-much-cpu-will-full-table-scan-use/

The test I did was on a Solaris T2000 (both server and client were T2000s, but on different machines). What I observed is that SQL*Net had low throughput, but the root cause was that the client-side application (sqlplus) used 100% of a CPU. If sqlplus is run on a PC server instead, the SQL*Net throughput is much higher and the CPU usage of the Oracle server also increases to about 60%. The lesson learned here is that the client side can also be the bottleneck. It's too easy to tune the Oracle server and the network but forget the client side completely.

And for batch jobs which can't run in parallel, the T2000 is a bad choice; a PC server is much better.


It depends, but we can analyse two extremes. Only user-mode CPU usage is discussed here; sys-mode CPU can be seen in the prstat output shown later.
1: If Oracle needs to scan lots of blocks for each fetch.

For example, a select count(*) using a full table scan, or a where clause that filters out 99% of the data Oracle reads; then the bottleneck is on the database side.
A: If most of the blocks are in the SGA and little PIO is required.

Then one session can use nearly 100% of a CPU.
B: If most of the blocks are not in the SGA and lots of PIO is required.

Then the session spends most of its time on IO. Depending on the storage performance and how much data Oracle reads in (controlled by DB_FILE_MULTIBLOCK_READ_COUNT), the CPU used varies. In my test with HDS USPV storage, the average CPU usage is about 30% if DB_FILE_MULTIBLOCK_READ_COUNT is 16 and about 35% if DB_FILE_MULTIBLOCK_READ_COUNT is 128.

2: If Oracle returns all the data in a block to the client side.

For example, select * from tab_name (of course the blocks should not be empty, as they would be after deleting all rows from a huge table); then the bottleneck is SQL*Net. Oracle spends a little time on CPU to read the data, then spends more time waiting on SQL*Net, then some time on CPU again, then waits on SQL*Net again, and so on. The average CPU usage is low. In my test, the CPU used was only about 5~6%. [updated: if SQL*Net is the bottleneck, storage capacity is not likely to be a problem]
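
To see from the database side where a session's time actually goes in each case, the session wait statistics can be checked while the query runs; a minimal sketch (the SID value 123 is hypothetical):

select event, total_waits, time_waited
  from v$session_event
 where sid = 123
 order by time_waited desc;

Roughly, case 1.A should show very little wait time, case 1.B should be dominated by multiblock read waits such as 'db file scattered read', and case 2 by the SQL*Net-related waits.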

Test 1.A


SQL> select count(*) from test_net1;
COUNT(*)
----------
100000
SQL>
SQL> select a.SPID
2    from v$process a
3   where a.ADDR =
4         (select b.PADDR
5            from v$session b
6           where b.SID = (select sid from v$mystat where rownum = 1));
SPID
------------
27956
SQL>
SQL> declare
2    a int;
3  begin
4    for i in 1..10000
5    loop
6      select count(*) into a from test_net1;
7    end loop;
8  end;
9  /
^Cdeclare
*
ERROR at line 1:
ORA-01013: user requested cancel of current operation
ORA-06512: at line 6

TEST-qihua$> prstat -mLc -p 27956 1 100 | grep 27956

27956 oracle    98 0.1 0.0 0.0 0.0 0.0 1.6 0.0   3   5 107   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   1   7  92   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   2   6 102   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   2   4  93   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   1   5  92   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   2   6  93   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   3   4  94   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   3   4  94   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   3   4  96   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   3   6  94   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   2   5  93   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   1   5  92   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   2   6  93   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   2   6 100   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   0   6  93   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   2   5  93   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   0   6  91   0 oracle/1
27956 oracle    80 0.0 0.0 0.0 0.0 0.0  20 0.0   4   6  91   1 oracle/1           ==> cancelled. 
27956 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 oracle/1

Test 1.B & DB_FILE_MULTIBLOCK_READ_COUNT=16


SQL>  alter session set DB_FILE_MULTIBLOCK_READ_COUNT=16;
Session altered.
SQL> select count(*) from huge_table;
^Cselect count(*) from huge_table
*
ERROR at line 1:
ORA-01013: user requested cancel of current operation
TEST-qihua$> !!

prstat -mLc -p 27956 1 100 | grep 27956

27956 oracle   6.0 0.2 0.0 0.0 0.0 0.0  94 0.0  11   0  16   0 oracle/1
27956 oracle    21 5.0 0.1 0.0 0.0 0.0  74 0.3 382   4 382   0 oracle/1
27956 oracle    25 5.9 0.1 0.0 0.0 0.0  69 0.4 450   3 450   0 oracle/1
27956 oracle    20 4.9 0.1 0.0 0.0 0.0  74 0.3 367   3 369   0 oracle/1
27956 oracle    24 5.8 0.1 0.0 0.0 0.0  70 0.4 435   5 435   0 oracle/1
27956 oracle    20 4.7 0.1 0.0 0.0 0.0  75 0.3 361   2 361   0 oracle/1
27956 oracle    22 5.1 0.1 0.0 0.0 0.0  73 0.3 387   2 387   0 oracle/1
27956 oracle    22 5.1 0.1 0.0 0.0 0.0  73 0.3 395   2 395   0 oracle/1
27956 oracle    23 5.4 0.1 0.0 0.0 0.0  71 0.3 420   3 420   0 oracle/1
27956 oracle    28 6.6 0.1 0.0 0.0 0.0  65 0.4 510   4 512   0 oracle/1
27956 oracle    25 5.9 0.1 0.0 0.0 0.0  68 0.4 454   4 454   0 oracle/1
27956 oracle    24 5.6 0.1 0.0 0.0 0.0  70 0.3 430   5 430   0 oracle/1
27956 oracle    26 6.1 0.1 0.0 0.0 0.0  67 0.4 473   5 473   0 oracle/1
27956 oracle    26 6.2 0.1 0.0 0.0 0.0  67 0.4 474   5 474   0 oracle/1
27956 oracle    30 6.7 0.1 0.0 0.0 0.0  63 0.4 524   5 524   0 oracle/1
27956 oracle    27 6.3 0.1 0.0 0.0 0.0  66 0.4 489   4 491   0 oracle/1
27956 oracle    24 5.6 0.1 0.0 0.0 0.0  70 0.4 432   5 432   0 oracle/1
27956 oracle    29 6.8 0.1 0.0 0.0 0.0  63 0.4 530   4 530   0 oracle/1
27956 oracle    28 6.3 0.1 0.0 0.0 0.0  65 0.4 477   2 477   0 oracle/1
27956 oracle    27 6.3 0.1 0.0 0.0 0.0  66 0.4 488   3 488   0 oracle/1
27956 oracle    28 6.5 0.1 0.0 0.0 0.0  65 0.4 508   4 508   0 oracle/1
27956 oracle    33 7.1 0.1 0.0 0.0 0.0  60 0.4 544   3 547   0 oracle/1
27956 oracle    29 6.0 0.1 0.0 0.0 0.0  64 0.4 457   6 456   0 oracle/1
27956 oracle    29 6.0 0.1 0.0 0.0 0.0  65 0.4 464   5 464   0 oracle/1
27956 oracle    29 6.3 0.1 0.0 0.0 0.0  64 0.4 476   7 476   0 oracle/1
27956 oracle    24 5.1 0.1 0.0 0.0 0.0  70 0.3 394   3 394   0 oracle/1
27956 oracle    27 5.9 0.1 0.0 0.0 0.0  67 0.4 452   5 452   0 oracle/1
27956 oracle    32 7.1 0.1 0.0 0.0 0.0  61 0.4 549   4 551   0 oracle/1
27956 oracle    28 6.5 0.1 0.0 0.0 0.0  64 0.4 499   3 499   0 oracle/1
27956 oracle    23 5.2 0.1 0.0 0.0 0.0  71 0.3 403   4 403   0 oracle/1
27956 oracle    28 5.6 0.1 0.0 0.0 0.0  66 0.4 431   3 431   0 oracle/1
27956 oracle    26 4.7 0.1 0.0 0.0 0.0  68 0.3 363   4 363   0 oracle/1

Test 1.B & DB_FILE_MULTIBLOCK_READ_COUNT=128


SQL> alter session set DB_FILE_MULTIBLOCK_READ_COUNT=128;
Session altered.
SQL> select count(*) from huge_table;
^Cselect count(*) from huge_table
*
ERROR at line 1:
ORA-01013: user requested cancel of current operation

prstat -mLc -p 27956 1 100 | grep 27956

27956 oracle   6.4 0.4 0.0 0.0 0.0 0.0  93 0.0  21   0  26   0 oracle/1
27956 oracle    28 4.3 0.0 0.0 0.0 0.0  68 0.1  73   2  73   0 oracle/1
27956 oracle    32 5.1 0.0 0.0 0.0 0.0  63 0.1  84   6  86   0 oracle/1
27956 oracle    33 5.0 0.0 0.0 0.0 0.0  62 0.1  85   5  86   0 oracle/1
27956 oracle    32 4.8 0.0 0.0 0.0 0.0  63 0.1  83   4  82   0 oracle/1
27956 oracle    29 4.5 0.0 0.0 0.0 0.0  66 0.1  75   4  75   0 oracle/1
27956 oracle    33 5.1 0.0 0.0 0.0 0.0  62 0.1  87   5  87   0 oracle/1
27956 oracle    34 5.2 0.0 0.0 0.0 0.0  61 0.1  89   3  89   0 oracle/1
27956 oracle    33 5.1 0.0 0.0 0.0 0.0  62 0.1  86   6  89   0 oracle/1
27956 oracle    32 4.8 0.0 0.0 0.0 0.0  63 0.1  83   2  83   0 oracle/1
27956 oracle    30 4.5 0.0 0.0 0.0 0.0  65 0.1  77   5  77   0 oracle/1
27956 oracle    28 4.4 0.0 0.0 0.0 0.0  67 0.1  74   3  73   0 oracle/1
27956 oracle    22 3.4 0.0 0.0 0.0 0.0  74 0.1  58   4  58   0 oracle/1
27956 oracle    29 4.4 0.0 0.0 0.0 0.0  66 0.1  72   9  72   0 oracle/1
27956 oracle    33 5.0 0.0 0.0 0.0 0.0  62 0.1  85   3  87   0 oracle/1
27956 oracle    26 4.0 0.0 0.0 0.0 0.0  70 0.1  68   5  68   0 oracle/1
27956 oracle    26 3.9 0.0 0.0 0.0 0.0  70 0.1  63   7  63   0 oracle/1
27956 oracle    34 4.6 0.0 0.0 0.0 0.0  61 0.1  78   5  78   0 oracle/1
27956 oracle    31 4.2 0.0 0.0 0.0 0.0  65 0.1  70   5  70   0 oracle/1
27956 oracle    34 5.0 0.0 0.0 0.0 0.0  61 0.1  83   4  84   0 oracle/1
27956 oracle    29 4.4 0.0 0.0 0.0 0.0  66 0.1  67   5  68   0 oracle/1
27956 oracle    27 4.1 0.0 0.0 0.0 0.0  68 0.1  68   3  68   0 oracle/1
27956 oracle    30 4.5 0.0 0.0 0.0 0.0  65 0.1  76   4  76   0 oracle/1
27956 oracle    30 4.5 0.0 0.0 0.0 0.0  65 0.1  76   5  76   0 oracle/1
27956 oracle    37 4.7 0.0 0.0 0.0 0.0  58 0.1  80   5  80   0 oracle/1
27956 oracle    39 5.5 0.0 0.0 0.0 0.0  55 0.1  87   5  90   0 oracle/1
27956 oracle    37 5.3 0.0 0.0 0.0 0.0  57 0.1  92   3  91   0 oracle/1
27956 oracle    33 4.8 0.0 0.0 0.0 0.0  62 0.1  81   4  81   0 oracle/1
27956 oracle    34 4.3 0.0 0.0 0.0 0.0  61 0.1  71   4  71   0 oracle/1
27956 oracle    41 4.9 0.0 0.0 0.0 0.0  54 0.1  80   4  81   0 oracle/1
27956 oracle    33 4.5 0.0 0.0 0.0 0.0  63 0.1  74   4  73   0 oracle/1
27956 oracle    34 4.7 0.0 0.0 0.0 0.0  61 0.1  78   4  80   0 oracle/1
27956 oracle    32 4.2 0.0 0.0 0.0 0.0  64 0.1  72   5  72   0 oracle/1
27956 oracle    33 4.2 0.0 0.0 0.0 0.0  62 0.1  67   6  67   0 oracle/1
27956 oracle    32 3.8 0.0 0.0 0.0 0.0  65 0.1  61   4  61   0 oracle/1
27956 oracle    35 4.2 0.0 0.0 0.0 0.0  61 0.1  72   5  72   0 oracle/1
27956 oracle    28 4.3 0.0 0.0 0.0 0.0  68 0.1  74   5  74   0 oracle/1
27956 oracle    23 3.9 0.0 0.0 0.0 0.0  73 0.1  65   4  67   0 oracle/1
27956 oracle    27 4.3 0.0 0.0 0.0 0.0  68 0.1  72   4  72   0 oracle/1
27956 oracle    21 3.4 0.0 0.0 0.0 0.0  75 0.1  58   2  58   0 oracle/1
27956 oracle    30 4.2 0.0 0.0 0.0 0.0  66 0.1  69   4  69   0 oracle/1
27956 oracle    29 4.2 0.0 0.0 0.0 0.0  67 0.1  72   3  73   0 oracle/1
27956 oracle    25 4.0 0.0 0.0 0.0 0.0  71 0.1  69   5  68   0 oracle/1
27956 oracle    24 3.9 0.0 0.0 0.0 0.0  72 0.1  62   2  64   0 oracle/1
27956 oracle    29 4.2 0.0 0.0 0.0 0.0  66 0.1  72   5  72   0 oracle/1
27956 oracle    31 4.0 0.0 0.0 0.0 0.0  65 0.1  66   3  66   0 oracle/1
27956 oracle    27 4.3 0.0 0.0 0.0 0.0  69 0.1  71   4  71   0 oracle/1

Test 2: Oracle returns all the data in a block to the client side; SQL*Net is the bottleneck


SQL> set autotrace traceonly stat
SQL> select * from test_net1;
100000 rows selected.
Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
11481  consistent gets
0  physical reads   ==> No PIO; if PIO were involved, the CPU usage would be even lower.
0  redo size
34895444  bytes sent via SQL*Net to client
73814  bytes received via SQL*Net from client
6668  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
100000  rows processed

prstat -mLc -p 27956 1 100 | grep 27956

27956 oracle   2.8 0.5 0.0 0.0 0.0 0.0  97 0.1  77   1 629   0 oracle/1
27956 oracle   5.4 0.9 0.1 0.0 0.0 0.0  94 0.2 180   2  1K   0 oracle/1
27956 oracle   5.4 1.0 0.1 0.0 0.0 0.0  93 0.1 172   0  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 172   1  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 172   2  1K   0 oracle/1
27956 oracle   5.5 1.2 0.1 0.0 0.0 0.0  93 0.1 172   2  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 172   2  1K   0 oracle/1
27956 oracle   5.5 1.0 0.1 0.0 0.0 0.0  93 0.1 173   2  1K   0 oracle/1
27956 oracle   5.7 1.1 0.1 0.0 0.0 0.0  93 0.2 173   0  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.2 174   2  1K   0 oracle/1
27956 oracle   5.4 1.1 0.1 0.0 0.0 0.0  93 0.1 170   2  1K   0 oracle/1
27956 oracle   5.4 1.2 0.1 0.0 0.0 0.0  93 0.1 170   2  1K   0 oracle/1
27956 oracle   5.5 1.2 0.1 0.0 0.0 0.0  93 0.1 171   2  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 171   1  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 171   2  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.2 174   1  1K   0 oracle/1
27956 oracle   5.6 1.2 0.1 0.0 0.0 0.0  93 0.1 170   3  1K   0 oracle/1
27956 oracle   5.5 1.2 0.1 0.0 0.0 0.0  93 0.1 171   3  1K   0 oracle/1
27956 oracle   5.4 1.0 0.1 0.0 0.0 0.0  93 0.1 172   1  1K   0 oracle/1
27956 oracle   5.4 1.0 0.1 0.0 0.0 0.0  93 0.1 172   2  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 171   3  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 172   1  1K   0 oracle/1
27956 oracle   5.7 1.1 0.1 0.0 0.0 0.0  93 0.2 171   2  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.2 172   1  1K   0 oracle/1
27956 oracle   5.5 1.0 0.1 0.0 0.0 0.0  93 0.1 171   1  1K   0 oracle/1
27956 oracle   5.5 1.2 0.1 0.0 0.0 0.0  93 0.2 172   2  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.2 173   2  1K   0 oracle/1
27956 oracle   5.5 1.2 0.1 0.0 0.0 0.0  93 0.1 173   1  1K   0 oracle/1
27956 oracle   5.4 1.3 0.1 0.0 0.0 0.0  93 0.1 170   1  1K   0 oracle/1
27956 oracle   5.5 1.2 0.1 0.0 0.0 0.0  93 0.1 171   3  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 173   1  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 171   1  1K   0 oracle/1
27956 oracle   5.5 1.3 0.1 0.0 0.0 0.0  93 0.2 170   4  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 172   1  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 172   1  1K   0 oracle/1
27956 oracle   5.5 1.4 0.1 0.0 0.0 0.0  93 0.1 170   1  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.2 172   1  1K   0 oracle/1
27956 oracle   5.6 1.1 0.1 0.0 0.0 0.0  93 0.2 173   0  1K   0 oracle/1
27956 oracle   5.7 1.1 0.1 0.0 0.0 0.0  93 0.2 172   1  1K   0 oracle/1
27956 oracle   2.4 0.4 0.0 0.0 0.0 0.0  97 0.1  65   0 517   0 oracle/1
27956 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 oracle/1
27956 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 oracle/1
27956 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 oracle/1
27956 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 oracle/1
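
One note on the roundtrip count above: 100,000 rows over 6,668 roundtrips is about 15 rows per roundtrip, which matches the default sqlplus arraysize. Raising the fetch array size reduces the roundtrips and may shift the bottleneck; a minimal sketch:

SQL> set arraysize 500
SQL> select * from test_net1;

Whether this actually helps depends on how much of the SQL*Net time is per-roundtrip overhead versus raw bandwidth, but it is a cheap knob to try before blaming the network.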


If a table is full and a new row is inserted, Oracle bumps up the HWM, formats the blocks in memory, and inserts the data into the formatted block. During the whole process, no disk IO is required. Oracle doesn't need to read the blocks above the HWM from disk because it knows they are logically empty; it would be a waste of resources to read a block from disk when its content (empty, in this case) is already known to Oracle.


We can also pre-allocate extents before inserting rows into a full table; the blocks in those extents are placed below the HWM. But Oracle has to go to disk for such a block when inserting a new row, because it doesn't know whether the block is empty or not (if Oracle were sure it was empty, it could simply create the block in memory without touching the disk, and flush it back to disk when necessary). So in this case, disk IO is required.

1: test 1 (without pre-allocated space)


SQL> create table t (a char(200)) pctfree 99 pctused 1;
Table created.
SQL> insert into t select rpad('x',100,'x') from dual connect by rownum<1000;
999 rows created.
SQL>

Checking the 10046 raw trace file, there are no disk IO waits.
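
As a side check, the HWM movement itself can be observed by gathering statistics and comparing the block counts before and after the insert; a minimal sketch (after dbms_stats, BLOCKS counts the blocks below the HWM and EMPTY_BLOCKS the allocated blocks above it):

SQL> exec dbms_stats.gather_table_stats(user, 'T');
SQL> select blocks, empty_blocks from user_tables where table_name = 'T';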
2: test 2 (pre-allocate)

SQL> alter table t allocate extent (size 400M instance 1);
Table altered.
SQL> alter system flush buffer_cache;    -- the blocks could be cached in the SGA
System altered.
SQL> alter session set events '10046 trace name context forever, level 8';
Session altered.
SQL> insert into t values(rpad('x',100,'x'));
1 row created.
SQL> /
1 row created.
SQL> /
1 row created.
SQL>

TEST$> grep "db file" TEST_ora_21015.trc


WAIT #1: nam='db file sequential read' ela= 2947 file#=1 block#=141527 blocks=1 obj#=489208 tim=8943453664111
WAIT #1: nam='db file sequential read' ela= 349 file#=201 block#=12133 blocks=1 obj#=489208 tim=8943453664744
WAIT #2: nam='db file sequential read' ela= 1032 file#=201 block#=12134 blocks=1 obj#=489208 tim=8943454366833

We can see there is disk IO in this case. Sometimes we pre-allocate extents to alleviate HWM enqueue contention (another option is to increase _bump_highwater_mark_count), but this increases the disk IO requests. If IO is the bottleneck, pre-allocation might not be a good choice.
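
Before paying the extra disk IO of pre-allocation, it may be worth confirming that HW enqueue contention is really present; a minimal sketch using the cumulative enqueue statistics:

select eq_type, total_req#, total_wait#, cum_wait_time
  from v$enqueue_stat
 where eq_type = 'HW';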


For OLTP, Oracle has proved to be the best choice for performance and scalability, and Exadata just extends its leadership in that direction. But for data warehousing, can Exadata compete and succeed against Teradata by offering more IOPS, a larger cache, and higher interconnect throughput?


Teradata has built-in parallelism, and running in parallel is cheaper there because its shared-nothing architecture has less coordination overhead. For Oracle to run in parallel, slaves in different slave sets have to exchange messages. The total memory for exchanging messages is X*degree*degree*parallel_execution_message_size, where X is 3 for non-RAC and 4 for RAC. Huge Teradata databases can have more than 2,000 nodes. To reach the same parallelism, how much memory does Oracle need for exchanging messages between slaves? If we set parallel_execution_message_size to 64K, the maximum we can set (for large queries, the larger the parallel_execution_message_size the better), then Oracle needs 3*2000*2000*64K = 768G of memory for non-RAC, or 4*2000*2000*64K = 1T for RAC. Typically Exadata is used as RAC, so it needs 1T of memory purely for message exchange. And what is the overhead of managing this 1T of memory? I have no box to test it on. Another problem is that when the producer generates messages faster than the consumer consumes them, the message buffer (parallel_execution_message_size) fills up and the producer has to wait until there is free space in the buffer. This is not a problem if the producer is always faster than the consumer, but it is a problem if sometimes the producer is faster and sometimes the consumer is faster.
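
The arithmetic above can be checked directly; a minimal sketch using the hypothetical 2,000-way degree from this post (64K = 65536 bytes):

select 4 * 2000 * 2000 * 65536 / 1024 / 1024 / 1024 as gb_for_rac from dual;

This evaluates to about 977, i.e. the roughly 1T of memory quoted above for RAC.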

At any time, at most two slave sets can work simultaneously, and all slaves in a set work on the same step of the execution plan; if a SQL statement runs at degree 1000 and one slave is slow, the other 999 slaves have to wait for it to complete before executing the next step. For a hash join of 10 tables, there are 9 joins in total, and each join only completes when its slowest slave has completed. The Oracle documentation suggests denormalizing for data warehouses, while the Teradata documentation says denormalization is not needed because joins in Teradata are cheap. Do you like denormalization? More often than not, we denormalize because of performance issues; if performance were not a problem, why denormalize?

Without partitioned tables, Oracle has to hash every row in the table to know which slave the row should go to. With partitioned tables AND the join column as the partition key, Oracle can use a partition-wise join or a partial partition-wise join, so it doesn't need to hash every row. Teradata also has to redistribute the rows on one node to other nodes if the join column is not the primary index (which determines which node a row lives on), but the mechanism behind it is simple and efficient: the rows on one node are distributed to all other nodes based on the join column and put into a spool, and there is no limit like parallel_execution_message_size in Oracle. The spool Teradata uses is much like a temporary tablespace, which needs physical IO to work against; you might say memory is faster than disk, but for a large query, who expects the data to be fully cached in memory?
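
To illustrate the partition-wise join point, a minimal sketch (the table names, columns and degree are hypothetical): when both tables are hash partitioned on the join key, each pair of matching partitions can be joined by one slave without redistributing rows between slave sets.

create table orders (cust_id number, amount number)
  partition by hash (cust_id) partitions 16;

create table customers (cust_id number, name varchar2(30))
  partition by hash (cust_id) partitions 16;

-- equipartitioned on cust_id, so a full partition-wise join is possible
select /*+ parallel(o,16) parallel(c,16) */ count(*)
  from orders o, customers c
 where o.cust_id = c.cust_id;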

We have a huge partitioned table (25 partitions) on db1 and need to copy it to another database, db2. The most straightforward way I could think of was to insert in parallel over the network, so I ran this command on db2:

insert /*+ append parallel(a,4) */ into tab1 a select /*+ parallel(b,4) */ * from tab1@db1;

On the remote DB side (db1), I could see the 4 slaves waiting on "PX Deq Credit: send blkd" and the coordinator waiting on "more data to client". That means the 4 slaves are waiting for the coordinator to consume their data, and the coordinator is waiting for the data to be transferred over the network. The bottleneck here is the network. To improve the transfer speed over the network, I need to kick off more non-parallel SQL statements instead of one parallel SQL statement.
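
The wait events of the coordinator and the slaves can be listed on db1 while the insert runs; a minimal sketch joining v$px_session to v$session:

select s.sid, p.server_set, s.event
  from v$px_session p, v$session s
 where p.sid = s.sid
 order by p.server_set, s.sid;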

For a partitioned table with 25 partitions, if I use 5 separate non-parallel SQL statements (each loading 5 partitions), then we get 5 inter-database connections instead of 1, and with that, better network throughput. (Update, 25 Dec 2009: I have seen many cases where the client side used 100% of a single CPU, so the network throughput couldn't increase even though there was still plenty of free bandwidth; increasing the number of sessions can help.)

So what needs to be done is:

1: Remove the parallel(b,4) hint. This hint is of no use, because reading the data from the table is quicker than sending it over the network; the FTS is not the bottleneck.

2: Use one SQL statement per partition, and kick off 5 sessions, each session running 5 of the following statements.

insert /*+ append  */ into tab1 a select * from tab1@db1 partition (part00);
insert /*+ append  */ into tab1 a select * from tab1@db1 partition (part01);
insert /*+ append  */ into tab1 a select * from tab1@db1 partition (part02);
insert /*+ append  */ into tab1 a select * from tab1@db1 partition (part03);
…………………
insert /*+ append  */ into tab1 a select * from tab1@db1 partition (part24);

After the change, the data transferred per second increased a lot.
