
Archive for December, 2009

Client could be the bottleneck.


In the previous post I said that if SQL*Net is the bottleneck, then the CPU usage of the Oracle server is low.

https://mysqloracle9.wordpress.com/2009/12/23/how-much-cpu-will-full-table-scan-use/

The test was done on Solaris T2000s (both server and client were T2000s, on different machines). What I observed was low SQL*Net throughput, but the root cause was that the client-side application (sqlplus) was using 100% of a CPU. If sqlplus is run on a PC server instead, the SQL*Net throughput is much higher and the CPU usage of the Oracle server also increases to about 60%. The lesson learned here is that the client side can also be the bottleneck. It's too easy to tune the Oracle server and the network but forget the client side completely.

And for a batch job that can't run in parallel, the T2000 is a bad choice; a PC server is much better.
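A minimal way to double-check this kind of client bottleneck (a sketch only; test_net1 is the test table from the next post, and arraysize 100 is just an illustrative setting): run the fetch from sqlplus on the client machine, and in another terminal on the same machine watch the sqlplus process with prstat or top while it runs.

-- run from sqlplus on the client machine; autotrace suppresses row output
-- so terminal formatting doesn't dominate the measurement
set autotrace traceonly statistics
set arraysize 100
select * from test_net1;

If sqlplus sits at 100% of a single CPU while the server process and the network still have headroom, the client is the limit.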




How much CPU will a full table scan use?

It depends, but we can analyse two extremes. Only user-mode CPU usage is quoted in the text below; sys-mode CPU can be seen in the prstat output shown later.
1: If Oracle needs to scan lots of blocks for each fetch.

For example, select count(*) using a full table scan, or a where clause that filters out 99% of the data Oracle reads; in that case the bottleneck is on the database side.

A: If most of the blocks are in the SGA and little PIO is required.

Then one session can use nearly 100% CPU.

B: If most of the blocks are not in the SGA and lots of PIO is required.

Then the session spends most of its time on IO. Depending on the storage performance and how much data Oracle reads per multiblock read (controlled by DB_FILE_MULTIBLOCK_READ_COUNT), the CPU usage varies. In my test with HDS USPV storage, the average CPU usage is about 30% when DB_FILE_MULTIBLOCK_READ_COUNT is 16 and about 35% when it is 128.

2: If Oracle returns all the data in a block to the client side.

For example, select * from tab_name (of course the blocks should not be empty, as they would be after deleting from a huge table); in that case the bottleneck is SQL*Net. Oracle spends a little CPU time reading the data, then spends more time waiting on SQL*Net, then some CPU time again, then waits on SQL*Net again, and so on. The average CPU usage is low; in my test the CPU used is only about 5-6%. [Updated: if SQL*Net is the bottleneck, storage capacity is not likely to be a problem.]

Test 1.A


SQL> select count(*) from test_net1;
COUNT(*)
----------
100000
SQL>
SQL> select a.SPID
2    from v$process a
3   where a.ADDR =
4         (select b.PADDR
5            from v$session b
6           where b.SID = (select sid from v$mystat where rownum = 1));
SPID
------------
27956
SQL>
SQL> declare
2    a int;
3  begin
4    for i in 1..10000
5    loop
6      select count(*) into a from test_net1;
7    end loop;
8  end;
9  /
^Cdeclare
*
ERROR at line 1:
ORA-01013: user requested cancel of current operation
ORA-06512: at line 6

TEST-qihua$> prstat -mLc  -p 27956 1 100|grep 27956

27956 oracle    98 0.1 0.0 0.0 0.0 0.0 1.6 0.0   3   5 107   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   1   7  92   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   2   6 102   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   2   4  93   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   1   5  92   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   2   6  93   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   3   4  94   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   3   4  94   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   3   4  96   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   3   6  94   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   2   5  93   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   1   5  92   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   2   6  93   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   2   6 100   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   0   6  93   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   2   5  93   0 oracle/1
27956 oracle   100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   0   6  91   0 oracle/1
27956 oracle    80 0.0 0.0 0.0 0.0 0.0  20 0.0   4   6  91   1 oracle/1           ==> cancelled. 
27956 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 oracle/1

Test 1.B & DB_FILE_MULTIBLOCK_READ_COUNT=16


SQL>  alter session set DB_FILE_MULTIBLOCK_READ_COUNT=16;
Session altered.
SQL> select count(*) from huge_table;
^Cselect count(*) from huge_table
*
ERROR at line 1:
ORA-01013: user requested cancel of current operation
TEST-qihua$> !!

prstat -mLc -p 27956 1 100 | grep 27956

27956 oracle   6.0 0.2 0.0 0.0 0.0 0.0  94 0.0  11   0  16   0 oracle/1
27956 oracle    21 5.0 0.1 0.0 0.0 0.0  74 0.3 382   4 382   0 oracle/1
27956 oracle    25 5.9 0.1 0.0 0.0 0.0  69 0.4 450   3 450   0 oracle/1
27956 oracle    20 4.9 0.1 0.0 0.0 0.0  74 0.3 367   3 369   0 oracle/1
27956 oracle    24 5.8 0.1 0.0 0.0 0.0  70 0.4 435   5 435   0 oracle/1
27956 oracle    20 4.7 0.1 0.0 0.0 0.0  75 0.3 361   2 361   0 oracle/1
27956 oracle    22 5.1 0.1 0.0 0.0 0.0  73 0.3 387   2 387   0 oracle/1
27956 oracle    22 5.1 0.1 0.0 0.0 0.0  73 0.3 395   2 395   0 oracle/1
27956 oracle    23 5.4 0.1 0.0 0.0 0.0  71 0.3 420   3 420   0 oracle/1
27956 oracle    28 6.6 0.1 0.0 0.0 0.0  65 0.4 510   4 512   0 oracle/1
27956 oracle    25 5.9 0.1 0.0 0.0 0.0  68 0.4 454   4 454   0 oracle/1
27956 oracle    24 5.6 0.1 0.0 0.0 0.0  70 0.3 430   5 430   0 oracle/1
27956 oracle    26 6.1 0.1 0.0 0.0 0.0  67 0.4 473   5 473   0 oracle/1
27956 oracle    26 6.2 0.1 0.0 0.0 0.0  67 0.4 474   5 474   0 oracle/1
27956 oracle    30 6.7 0.1 0.0 0.0 0.0  63 0.4 524   5 524   0 oracle/1
27956 oracle    27 6.3 0.1 0.0 0.0 0.0  66 0.4 489   4 491   0 oracle/1
27956 oracle    24 5.6 0.1 0.0 0.0 0.0  70 0.4 432   5 432   0 oracle/1
27956 oracle    29 6.8 0.1 0.0 0.0 0.0  63 0.4 530   4 530   0 oracle/1
27956 oracle    28 6.3 0.1 0.0 0.0 0.0  65 0.4 477   2 477   0 oracle/1
27956 oracle    27 6.3 0.1 0.0 0.0 0.0  66 0.4 488   3 488   0 oracle/1
27956 oracle    28 6.5 0.1 0.0 0.0 0.0  65 0.4 508   4 508   0 oracle/1
27956 oracle    33 7.1 0.1 0.0 0.0 0.0  60 0.4 544   3 547   0 oracle/1
27956 oracle    29 6.0 0.1 0.0 0.0 0.0  64 0.4 457   6 456   0 oracle/1
27956 oracle    29 6.0 0.1 0.0 0.0 0.0  65 0.4 464   5 464   0 oracle/1
27956 oracle    29 6.3 0.1 0.0 0.0 0.0  64 0.4 476   7 476   0 oracle/1
27956 oracle    24 5.1 0.1 0.0 0.0 0.0  70 0.3 394   3 394   0 oracle/1
27956 oracle    27 5.9 0.1 0.0 0.0 0.0  67 0.4 452   5 452   0 oracle/1
27956 oracle    32 7.1 0.1 0.0 0.0 0.0  61 0.4 549   4 551   0 oracle/1
27956 oracle    28 6.5 0.1 0.0 0.0 0.0  64 0.4 499   3 499   0 oracle/1
27956 oracle    23 5.2 0.1 0.0 0.0 0.0  71 0.3 403   4 403   0 oracle/1
27956 oracle    28 5.6 0.1 0.0 0.0 0.0  66 0.4 431   3 431   0 oracle/1
27956 oracle    26 4.7 0.1 0.0 0.0 0.0  68 0.3 363   4 363   0 oracle/1

Test 1.B & DB_FILE_MULTIBLOCK_READ_COUNT=128


SQL> alter session set DB_FILE_MULTIBLOCK_READ_COUNT=128;
Session altered.
SQL> select count(*) from huge_table;
^Cselect count(*) from huge_table
*
ERROR at line 1:
ORA-01013: user requested cancel of current operation

prstat -mLc -p 27956 1 100 | grep 27956

27956 oracle   6.4 0.4 0.0 0.0 0.0 0.0  93 0.0  21   0  26   0 oracle/1
27956 oracle    28 4.3 0.0 0.0 0.0 0.0  68 0.1  73   2  73   0 oracle/1
27956 oracle    32 5.1 0.0 0.0 0.0 0.0  63 0.1  84   6  86   0 oracle/1
27956 oracle    33 5.0 0.0 0.0 0.0 0.0  62 0.1  85   5  86   0 oracle/1
27956 oracle    32 4.8 0.0 0.0 0.0 0.0  63 0.1  83   4  82   0 oracle/1
27956 oracle    29 4.5 0.0 0.0 0.0 0.0  66 0.1  75   4  75   0 oracle/1
27956 oracle    33 5.1 0.0 0.0 0.0 0.0  62 0.1  87   5  87   0 oracle/1
27956 oracle    34 5.2 0.0 0.0 0.0 0.0  61 0.1  89   3  89   0 oracle/1
27956 oracle    33 5.1 0.0 0.0 0.0 0.0  62 0.1  86   6  89   0 oracle/1
27956 oracle    32 4.8 0.0 0.0 0.0 0.0  63 0.1  83   2  83   0 oracle/1
27956 oracle    30 4.5 0.0 0.0 0.0 0.0  65 0.1  77   5  77   0 oracle/1
27956 oracle    28 4.4 0.0 0.0 0.0 0.0  67 0.1  74   3  73   0 oracle/1
27956 oracle    22 3.4 0.0 0.0 0.0 0.0  74 0.1  58   4  58   0 oracle/1
27956 oracle    29 4.4 0.0 0.0 0.0 0.0  66 0.1  72   9  72   0 oracle/1
27956 oracle    33 5.0 0.0 0.0 0.0 0.0  62 0.1  85   3  87   0 oracle/1
27956 oracle    26 4.0 0.0 0.0 0.0 0.0  70 0.1  68   5  68   0 oracle/1
27956 oracle    26 3.9 0.0 0.0 0.0 0.0  70 0.1  63   7  63   0 oracle/1
27956 oracle    34 4.6 0.0 0.0 0.0 0.0  61 0.1  78   5  78   0 oracle/1
27956 oracle    31 4.2 0.0 0.0 0.0 0.0  65 0.1  70   5  70   0 oracle/1
27956 oracle    34 5.0 0.0 0.0 0.0 0.0  61 0.1  83   4  84   0 oracle/1
27956 oracle    29 4.4 0.0 0.0 0.0 0.0  66 0.1  67   5  68   0 oracle/1
27956 oracle    27 4.1 0.0 0.0 0.0 0.0  68 0.1  68   3  68   0 oracle/1
27956 oracle    30 4.5 0.0 0.0 0.0 0.0  65 0.1  76   4  76   0 oracle/1
27956 oracle    30 4.5 0.0 0.0 0.0 0.0  65 0.1  76   5  76   0 oracle/1
27956 oracle    37 4.7 0.0 0.0 0.0 0.0  58 0.1  80   5  80   0 oracle/1
27956 oracle    39 5.5 0.0 0.0 0.0 0.0  55 0.1  87   5  90   0 oracle/1
27956 oracle    37 5.3 0.0 0.0 0.0 0.0  57 0.1  92   3  91   0 oracle/1
27956 oracle    33 4.8 0.0 0.0 0.0 0.0  62 0.1  81   4  81   0 oracle/1
27956 oracle    34 4.3 0.0 0.0 0.0 0.0  61 0.1  71   4  71   0 oracle/1
27956 oracle    41 4.9 0.0 0.0 0.0 0.0  54 0.1  80   4  81   0 oracle/1
27956 oracle    33 4.5 0.0 0.0 0.0 0.0  63 0.1  74   4  73   0 oracle/1
27956 oracle    34 4.7 0.0 0.0 0.0 0.0  61 0.1  78   4  80   0 oracle/1
27956 oracle    32 4.2 0.0 0.0 0.0 0.0  64 0.1  72   5  72   0 oracle/1
27956 oracle    33 4.2 0.0 0.0 0.0 0.0  62 0.1  67   6  67   0 oracle/1
27956 oracle    32 3.8 0.0 0.0 0.0 0.0  65 0.1  61   4  61   0 oracle/1
27956 oracle    35 4.2 0.0 0.0 0.0 0.0  61 0.1  72   5  72   0 oracle/1
27956 oracle    28 4.3 0.0 0.0 0.0 0.0  68 0.1  74   5  74   0 oracle/1
27956 oracle    23 3.9 0.0 0.0 0.0 0.0  73 0.1  65   4  67   0 oracle/1
27956 oracle    27 4.3 0.0 0.0 0.0 0.0  68 0.1  72   4  72   0 oracle/1
27956 oracle    21 3.4 0.0 0.0 0.0 0.0  75 0.1  58   2  58   0 oracle/1
27956 oracle    30 4.2 0.0 0.0 0.0 0.0  66 0.1  69   4  69   0 oracle/1
27956 oracle    29 4.2 0.0 0.0 0.0 0.0  67 0.1  72   3  73   0 oracle/1
27956 oracle    25 4.0 0.0 0.0 0.0 0.0  71 0.1  69   5  68   0 oracle/1
27956 oracle    24 3.9 0.0 0.0 0.0 0.0  72 0.1  62   2  64   0 oracle/1
27956 oracle    29 4.2 0.0 0.0 0.0 0.0  66 0.1  72   5  72   0 oracle/1
27956 oracle    31 4.0 0.0 0.0 0.0 0.0  65 0.1  66   3  66   0 oracle/1
27956 oracle    27 4.3 0.0 0.0 0.0 0.0  69 0.1  71   4  71   0 oracle/1

Test 2: Oracle returns all the data in a block to the client side; SQL*Net is the bottleneck


SQL> set autotrace traceonly stat
SQL> select * from test_net1;
100000 rows selected.
Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
11481  consistent gets
0  physical reads   ==> No PIO; if PIO were involved, the CPU usage would be even lower.
0  redo size
34895444  bytes sent via SQL*Net to client
73814  bytes received via SQL*Net from client
6668  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
100000  rows processed

prstat -mLc -p 27956 1 100 | grep 27956

27956 oracle   2.8 0.5 0.0 0.0 0.0 0.0  97 0.1  77   1 629   0 oracle/1
27956 oracle   5.4 0.9 0.1 0.0 0.0 0.0  94 0.2 180   2  1K   0 oracle/1
27956 oracle   5.4 1.0 0.1 0.0 0.0 0.0  93 0.1 172   0  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 172   1  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 172   2  1K   0 oracle/1
27956 oracle   5.5 1.2 0.1 0.0 0.0 0.0  93 0.1 172   2  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 172   2  1K   0 oracle/1
27956 oracle   5.5 1.0 0.1 0.0 0.0 0.0  93 0.1 173   2  1K   0 oracle/1
27956 oracle   5.7 1.1 0.1 0.0 0.0 0.0  93 0.2 173   0  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.2 174   2  1K   0 oracle/1
27956 oracle   5.4 1.1 0.1 0.0 0.0 0.0  93 0.1 170   2  1K   0 oracle/1
27956 oracle   5.4 1.2 0.1 0.0 0.0 0.0  93 0.1 170   2  1K   0 oracle/1
27956 oracle   5.5 1.2 0.1 0.0 0.0 0.0  93 0.1 171   2  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 171   1  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 171   2  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.2 174   1  1K   0 oracle/1
27956 oracle   5.6 1.2 0.1 0.0 0.0 0.0  93 0.1 170   3  1K   0 oracle/1
27956 oracle   5.5 1.2 0.1 0.0 0.0 0.0  93 0.1 171   3  1K   0 oracle/1
27956 oracle   5.4 1.0 0.1 0.0 0.0 0.0  93 0.1 172   1  1K   0 oracle/1
27956 oracle   5.4 1.0 0.1 0.0 0.0 0.0  93 0.1 172   2  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 171   3  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 172   1  1K   0 oracle/1
27956 oracle   5.7 1.1 0.1 0.0 0.0 0.0  93 0.2 171   2  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.2 172   1  1K   0 oracle/1
27956 oracle   5.5 1.0 0.1 0.0 0.0 0.0  93 0.1 171   1  1K   0 oracle/1
27956 oracle   5.5 1.2 0.1 0.0 0.0 0.0  93 0.2 172   2  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.2 173   2  1K   0 oracle/1
27956 oracle   5.5 1.2 0.1 0.0 0.0 0.0  93 0.1 173   1  1K   0 oracle/1
27956 oracle   5.4 1.3 0.1 0.0 0.0 0.0  93 0.1 170   1  1K   0 oracle/1
27956 oracle   5.5 1.2 0.1 0.0 0.0 0.0  93 0.1 171   3  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 173   1  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 171   1  1K   0 oracle/1
27956 oracle   5.5 1.3 0.1 0.0 0.0 0.0  93 0.2 170   4  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 172   1  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.1 172   1  1K   0 oracle/1
27956 oracle   5.5 1.4 0.1 0.0 0.0 0.0  93 0.1 170   1  1K   0 oracle/1
27956 oracle   5.5 1.1 0.1 0.0 0.0 0.0  93 0.2 172   1  1K   0 oracle/1
27956 oracle   5.6 1.1 0.1 0.0 0.0 0.0  93 0.2 173   0  1K   0 oracle/1
27956 oracle   5.7 1.1 0.1 0.0 0.0 0.0  93 0.2 172   1  1K   0 oracle/1
27956 oracle   2.4 0.4 0.0 0.0 0.0 0.0  97 0.1  65   0 517   0 oracle/1
27956 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 oracle/1
27956 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 oracle/1
27956 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 oracle/1
27956 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 oracle/1



If a table is full and a new row is inserted, Oracle bumps up the HWM, formats the blocks in memory, and inserts the data into a newly formatted block. During the whole process, no disk IO is required: Oracle doesn't need to read the blocks above the HWM from disk, because it knows the blocks above the HWM are logically empty. It would be a waste of resources to read a block from disk when its content (empty, in this case) is already known to Oracle.


We can also pre-allocate extents before inserting rows into a full table; the blocks in those extents are placed below the HWM. But then Oracle has to go to disk for a block when inserting a new row, because it doesn't know whether the block is empty or not (if Oracle were sure it was empty, it could simply create the block in memory without touching the disk and flush it back to disk when necessary). So in this case, disk IO is required.

1: Test 1 (without pre-allocating space)


SQL> create table t (a char(200)) pctfree 99 pctused 1;
Table created.
SQL> insert into t select rpad('x',100,'x') from dual connect by rownum<1000;
999 rows created.
SQL>

Checking the 10046 raw trace file, there is no disk IO wait.
2: Test 2 (with a pre-allocated extent)

SQL> alter table t allocate extent (size 400M instance 1);
Table altered.
SQL> alter system flush buffer_cache;    -- the block could be cached in the SGA
System altered.
SQL> alter session set events '10046 trace name context forever, level 8';
Session altered.
SQL> insert into t values(rpad('x',100,'x'));
1 row created.
SQL> /
1 row created.
SQL> /
1 row created.
SQL>

TEST$> grep "db file" TEST_ora_21015.trc


WAIT #1: nam='db file sequential read' ela= 2947 file#=1 block#=141527 blocks=1 obj#=489208 tim=8943453664111
WAIT #1: nam='db file sequential read' ela= 349 file#=201 block#=12133 blocks=1 obj#=489208 tim=8943453664744
WAIT #2: nam='db file sequential read' ela= 1032 file#=201 block#=12134 blocks=1 obj#=489208 tim=8943454366833

We can see that there is disk IO in this case. Sometimes we pre-allocate extents to alleviate HWM enqueue contention (another option is to increase _bump_highwater_mark_count), but pre-allocation increases the number of disk IO requests. If IO is the bottleneck, pre-allocating extents might not be a good choice.
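For reference, a hedged sketch of the second option mentioned above; _bump_highwater_mark_count is an undocumented parameter, the value 10 is purely hypothetical, and such parameters should only be changed after verifying the impact.

-- let the HWM move by more blocks each time it has to be bumped
-- (hidden parameter; value 10 is only an illustration)
alter system set "_bump_highwater_mark_count" = 10 scope = spfile;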



For OLTP, Oracle has proven to be the best choice in terms of performance and scalability, and Exadata only strengthens its lead in that direction. But for data warehousing, can Exadata compete with Teradata and win by offering more IOPS, a larger cache and higher interconnect throughput?


Teradata has built-in parallelism, and running in parallel is cheaper there because its shared-nothing architecture has less coordination overhead. For Oracle to run in parallel, slaves in different slave sets have to exchange messages. The total memory for exchanging messages is X*degree*degree*parallel_execution_message_size, where X is 3 for non-RAC and 4 for RAC. Huge Teradata databases can have more than 2,000 nodes. For Oracle to reach the same parallelism, how much memory does it need just to exchange messages between slaves? If we set parallel_execution_message_size to 64K, the maximum we can set (for large queries, the bigger parallel_execution_message_size the better), then Oracle needs 3*2000*2000*64K = 768G of memory for non-RAC, or 4*2000*2000*64K = 1T for RAC. Exadata is typically run as RAC, so it would need 1T of memory purely for message exchange. And what is the overhead of managing that 1T of memory? I have no box to test it on. Another problem is that when the producer generates messages faster than the consumer consumes them, the message buffer (of parallel_execution_message_size bytes) fills up and the producer has to wait until there is free space in the buffer. This is not a problem if the producer is always faster than the consumer, but it is a problem if sometimes the producer is faster and sometimes the consumer is faster.
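As a quick sanity check of the arithmetic above (a throwaway query; the figures are in KB, matching the ~768G and ~1T quoted in the post):

-- message memory = X * degree * degree * parallel_execution_message_size
select 3 * 2000 * 2000 * 64 as non_rac_kb,   -- 768,000,000 KB, i.e. ~768G
       4 * 2000 * 2000 * 64 as rac_kb        -- 1,024,000,000 KB, i.e. ~1T
  from dual;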

Only two slave sets can be active at any time, and every slave in a set works on the same step of the execution plan, so if a SQL runs at degree 1000 and one slave is slow, the other 999 slaves have to wait for it to finish before the next step can start. For a hash join of 10 tables there are 9 joins in total, and each join completes only when its slowest slave completes. Oracle documentation suggests denormalizing for data warehouses, while Teradata documentation says denormalization is not needed because joins in Teradata are cheap. Do you like denormalization? More often than not, we denormalize because of performance problems; if performance were not a problem, why denormalize?

If the table is not partitioned, Oracle has to hash every row to determine which slave the row should go to. If the table is partitioned AND the join column is the partition key, Oracle can use a partition-wise join or a partial partition-wise join and doesn't need to hash every row. Teradata also has to redistribute the rows of one node to the other nodes when the join column is not the primary index (which determines which node a row lives on), but the mechanism behind it is simple and efficient: the rows of each node are distributed to the other nodes based on the join column and placed in a pool (Teradata's spool), and there is no limit like parallel_execution_message_size in Oracle. The pool Teradata uses is just like a temporary tablespace and needs physical IO; you might say memory is faster than disk, but for a large query, who expects the data to be fully cached in memory?
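For completeness, a minimal sketch of what a full partition-wise join looks like on the Oracle side (table names, columns and the degree are hypothetical): both tables are hash-partitioned on the join key, so each pair of matching partitions can be joined by one slave without redistributing rows.

-- both tables hash-partitioned on the join key (cust_id)
create table cust_p   (cust_id number, name varchar2(30))
  partition by hash (cust_id) partitions 16;
create table orders_p (order_id number, cust_id number, amount number)
  partition by hash (cust_id) partitions 16;

-- a parallel hash join on cust_id can now be done partition to partition
select /*+ parallel(o,16) parallel(c,16) use_hash(o c) */ count(*)
  from orders_p o, cust_p c
 where o.cust_id = c.cust_id;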


We have a huge partitioned table (25 partitions) on db1 and need to copy it to another database, db2. The most straightforward way I could think of was to insert in parallel over the network, so I ran this command on db2:

insert /*+ append parallel(a,4) */ into tab1 a select /*+ parallel(b,4) */ * from tab1@db1;

On the remote DB side (db1), I could see 4 slaves waiting on "PX Deq Credit: send blkd" and the coordinator waiting on "more data to client". That means the 4 slaves were waiting for the coordinator to consume their data, and the coordinator was waiting for data to be sent over the network. The bottleneck here is the network. To improve the transfer speed over the network, I need to kick off more non-parallel SQL statements instead of 1 SQL statement running in parallel.
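A hedged sketch of how those waits can be observed on db1 (standard v$session columns; the filter simply picks out the two events mentioned above):

-- run on db1 while the remote insert is pulling data
select sid, event, state, seconds_in_wait
  from v$session
 where event like 'PX Deq Credit%'
    or event like '%more data to client%';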

For a partitioned table with 25 partitions, if I use 5 separate non-parallel SQL statements (each loading 5 partitions), then we get 5 inter-database connections instead of 1, and with that we get better network throughput. (Update 25 Dec 2009: I have seen many cases where the client side used 100% of a single CPU, so the network throughput can't increase even though there is still plenty of free bandwidth; increasing the number of sessions helps there too.)

So what we need to do is:

1: Remove the parallel(b,4) hint. The hint is of no use, because reading the data from the table is quicker than sending it over the network; the FTS is not the bottleneck.

2: Use one SQL statement per partition, kick off 5 sessions, and have each session run 5 of the following statements.

insert /*+ append  */ into tab1 a select * from tab1@db1 partition (part00);
insert /*+ append  */ into tab1 a select * from tab1@db1 partition (part01);
insert /*+ append  */ into tab1 a select * from tab1@db1 partition (part02);
insert /*+ append  */ into tab1 a select * from tab1@db1 partition (part03);
…………………
insert /*+ append  */ into tab1 a select * from tab1@db1 partition (part24);

After the change, the amount of data transferred per second increased a lot.


We are using QIO to store Oracle datafiles. The ways to access the data are:

1: from SGA

2: from SAN cache

3: from underlying disk

To find out how short the latency for 2 is (the time between issuing a read PIO and the data coming back from the SAN cache without touching the underlying disks), I forced Oracle to do a full table scan with db_file_multiblock_read_count set to 1. The 10046 raw trace file is shown below (I trimmed things like "obj#=324855 tim=4046251273329" to save space).
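A hedged reconstruction of that test setup (the table name is hypothetical; level 8 writes wait events into the 10046 trace file):

alter session set db_file_multiblock_read_count = 1;
alter session set events '10046 trace name context forever, level 8';
-- full table scan, now issued one block per read
select count(*) from test_big_table;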

We can see the latency is about 250us = 0.25ms on average. That's not bad, but not as good as I expected.

WAIT #1: nam='db file sequential read' ela= 253 file#=19 block#=934517 blocks=1
WAIT #1: nam='db file sequential read' ela= 258 file#=19 block#=934518 blocks=1
WAIT #1: nam='db file sequential read' ela= 252 file#=19 block#=934519 blocks=1
WAIT #1: nam='db file sequential read' ela= 253 file#=19 block#=934520 blocks=1
WAIT #1: nam='db file sequential read' ela= 1512 file#=19 block#=934521 blocks=1
WAIT #1: nam='db file sequential read' ela= 265 file#=19 block#=934522 blocks=1
WAIT #1: nam='db file sequential read' ela= 261 file#=19 block#=934523 blocks=1
WAIT #1: nam='db file sequential read' ela= 256 file#=19 block#=934524 blocks=1
WAIT #1: nam='db file sequential read' ela= 264 file#=19 block#=934525 blocks=1
WAIT #1: nam='db file sequential read' ela= 254 file#=19 block#=934526 blocks=1
WAIT #1: nam='db file sequential read' ela= 255 file#=19 block#=934527 blocks=1
WAIT #1: nam='db file sequential read' ela= 253 file#=19 block#=934528 blocks=1
WAIT #1: nam='db file sequential read' ela= 255 file#=19 block#=934529 blocks=1

Could the elapsed time get even shorter? Yes. For the sequential reads above, our SAN pre-fetches blocks after the first several reads (depending on the SAN algorithm, it may pre-fetch right after the first block is read, or only after the SAN controller notices that the fetched blocks are sequential). Although a pre-fetched read is faster, it still takes time.

To rule out the latency contributed by pre-fetch (pre-fetch reduces the latency but doesn't remove it), I tested a SQL that sorts tons of data and therefore needs the temporary tablespace. Once the temporary data is in the SAN cache, it is unlikely to be aged out, because the SQL doesn't run that long. Checking the raw trace file again, this time the elapsed time is only about 10us, which means the latency is about 10us; that's a pretty good value.
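A minimal sketch of that second test (table and column names are hypothetical): the sort is too big for the PGA, so it spills to temp, and the temp blocks are then read back from the SAN cache.

alter session set events '10046 trace name context forever, level 8';
-- large sort that spills to the temporary tablespace
select * from big_table order by col1, col2;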


WAIT #25: nam='direct path read temp' ela= 9 file number=2002 first dba=464112 block cnt=25
WAIT #25: nam='direct path read temp' ela= 15 file number=2002 first dba=464137 block cnt=1
WAIT #25: nam='direct path read temp' ela= 9 file number=2002 first dba=464138 block cnt=1
WAIT #25: nam='direct path read temp' ela= 8 file number=2002 first dba=464139 block cnt=1
WAIT #25: nam='direct path read temp' ela= 9 file number=2002 first dba=464140 block cnt=1
WAIT #25: nam='direct path read temp' ela= 9 file number=2002 first dba=464141 block cnt=1
WAIT #25: nam='direct path read temp' ela= 8 file number=2002 first dba=464142 block cnt=1
WAIT #25: nam='direct path read temp' ela= 8 file number=2002 first dba=464143 block cnt=1
WAIT #25: nam='direct path read temp' ela= 8 file number=2002 first dba=464144 block cnt=1
WAIT #25: nam='direct path read temp' ela= 8 file number=2002 first dba=464145 block cnt=1
WAIT #25: nam='direct path read temp' ela= 9 file number=2002 first dba=464146 block cnt=1


The Questions status variable in MySQL

mysqld processes about 611k Questions per second on one CPU. Is mysqld really that powerful?

mysql> delimiter $$
mysql> create procedure simple_loop(in var1 int)
-> begin
->  declare counter int default 0;
->  while (counter < var1) do
->   set counter=counter+1;
->  end while;
-> end$$
Query OK, 0 rows affected (0.00 sec)
mysql> delimiter ;
mysql>
mysql> show status like '%question%';
+---------------+---------+
| Variable_name | Value   |
+---------------+---------+
| Questions     | 4200256 |
+---------------+---------+
1 row in set (0.00 sec)
mysql> call  simple_loop(1000000);
Query OK, 0 rows affected (3.27 sec)
mysql> show status like '%question%';
+---------------+---------+
| Variable_name | Value   |
+---------------+---------+
| Questions     | 6200260 |
+---------------+---------+
1 row in set (0.00 sec)
mysql> select (6200260-4200256)/3.27;
+------------------------+
| (6200260-4200256)/3.27 |
+------------------------+
|            611622.0183 |
+------------------------+
1 row in set (0.00 sec)

The database I used is version 5.0.36. Per the following link, for MySQL versions before 5.0.72, Questions counts every statement executed inside a stored procedure (not only the SQL statements sent by the client), so if a statement loops 1M times, Questions also increases by 1M.

http://dev.mysql.com/doc/refman/5.0/en/server-status-variables.html#statvar_Questions

