We use QIO to store our Oracle datafiles. A read request can be satisfied from one of three places:

1: the SGA

2: the SAN cache

3: the underlying disk

To measure the latency of case 2 (the time between issuing a physical read and the data returning from the SAN cache, without touching the underlying disk), I forced Oracle to do a full table scan with db_file_multiblock_read_count set to 1. The 10046 raw trace is as follows (I trimmed trailing fields like “obj#=324855 tim=4046251273329” to save space).

We can see the latency is about 250us = 0.25ms on average; not bad, but not as good as I expected.

WAIT #1: nam='db file sequential read' ela= 253 file#=19 block#=934517 blocks=1
WAIT #1: nam='db file sequential read' ela= 258 file#=19 block#=934518 blocks=1
WAIT #1: nam='db file sequential read' ela= 252 file#=19 block#=934519 blocks=1
WAIT #1: nam='db file sequential read' ela= 253 file#=19 block#=934520 blocks=1
WAIT #1: nam='db file sequential read' ela= 1512 file#=19 block#=934521 blocks=1
WAIT #1: nam='db file sequential read' ela= 265 file#=19 block#=934522 blocks=1
WAIT #1: nam='db file sequential read' ela= 261 file#=19 block#=934523 blocks=1
WAIT #1: nam='db file sequential read' ela= 256 file#=19 block#=934524 blocks=1
WAIT #1: nam='db file sequential read' ela= 264 file#=19 block#=934525 blocks=1
WAIT #1: nam='db file sequential read' ela= 254 file#=19 block#=934526 blocks=1
WAIT #1: nam='db file sequential read' ela= 255 file#=19 block#=934527 blocks=1
WAIT #1: nam='db file sequential read' ela= 253 file#=19 block#=934528 blocks=1
WAIT #1: nam='db file sequential read' ela= 255 file#=19 block#=934529 blocks=1

Could the elapsed time get even shorter? Yes. For the sequential reads above, our SAN pre-fetches blocks after reading the first few (depending on the SAN algorithm, it might pre-fetch right after the first block, or only after the controller notices that the fetched blocks are sequential). Pre-fetch is faster than a disk read, but it still takes time.

To rule out the latency contributed by pre-fetch (pre-fetch reduces the latency, but doesn’t eliminate it), I ran a SQL statement that sorts a large amount of data and spills to the temporary tablespace. Once the temporary data is stored in the SAN cache, it is unlikely to be aged out, since the SQL doesn’t run that long. Checking the raw trace file again, this time the elapsed time is only about 10us, which means the cache-hit latency is about 10us; a pretty good value.


WAIT #25: nam='direct path read temp' ela= 9 file number=2002 first dba=464112 block cnt=25
WAIT #25: nam='direct path read temp' ela= 15 file number=2002 first dba=464137 block cnt=1
WAIT #25: nam='direct path read temp' ela= 9 file number=2002 first dba=464138 block cnt=1
WAIT #25: nam='direct path read temp' ela= 8 file number=2002 first dba=464139 block cnt=1
WAIT #25: nam='direct path read temp' ela= 9 file number=2002 first dba=464140 block cnt=1
WAIT #25: nam='direct path read temp' ela= 9 file number=2002 first dba=464141 block cnt=1
WAIT #25: nam='direct path read temp' ela= 8 file number=2002 first dba=464142 block cnt=1
WAIT #25: nam='direct path read temp' ela= 8 file number=2002 first dba=464143 block cnt=1
WAIT #25: nam='direct path read temp' ela= 8 file number=2002 first dba=464144 block cnt=1
WAIT #25: nam='direct path read temp' ela= 8 file number=2002 first dba=464145 block cnt=1
WAIT #25: nam='direct path read temp' ela= 9 file number=2002 first dba=464146 block cnt=1

mysqld processes 611k questions per second on a single CPU. Is mysqld really that powerful?

mysql> delimiter $$
mysql> create procedure simple_loop(in var1 int)
-> begin
->  declare counter int default 0;
->  while (counter < var1) do
->   set counter=counter+1;
->  end while;
-> end$$
Query OK, 0 rows affected (0.00 sec)
mysql> delimiter ;
mysql>
mysql> show status like '%question%';
+---------------+---------+
| Variable_name | Value   |
+---------------+---------+
| Questions     | 4200256 |
+---------------+---------+
1 row in set (0.00 sec)
mysql> call  simple_loop(1000000);
Query OK, 0 rows affected (3.27 sec)
mysql> show status like '%question%';
+---------------+---------+
| Variable_name | Value   |
+---------------+---------+
| Questions     | 6200260 |
+---------------+---------+
1 row in set (0.00 sec)
mysql> select (6200260-4200256)/3.27;
+------------------------+
| (6200260-4200256)/3.27 |
+------------------------+
|            611622.0183 |
+------------------------+
1 row in set (0.00 sec)

The database I used is version 5.0.36. Per the link below, for MySQL versions earlier than 5.0.72, Questions counts every statement executed inside a stored procedure (not only SQL statements sent by the client), so a statement that loops 1M times also increases Questions by 1M.

http://dev.mysql.com/doc/refman/5.0/en/server-status-variables.html#statvar_Questions

If you use the MySQL MEMORY engine with the default memory allocator, you might find that after a delete or a drop of a table, the allocated memory isn’t released back to the OS. Is it a memory leak? No.

mysqld uses the C functions malloc/free to allocate and de-allocate memory. Depending on the underlying memory allocator, the syscall sbrk or mmap is used to obtain memory from the OS. With mmap, munmap can be used to release memory back to the OS (the allocator chooses whether or not to return free memory). But with sbrk it’s hard to release memory, because the in-use allocations are scattered and the free space can sit in the middle of the heap (much like an Oracle table: once we bump up the high water mark, we can’t easily lower it again).

The Hoard memory allocator uses mmap, and it returns free memory to the OS once the amount of free memory crosses a threshold. The following test shows that after deleting a table, the allocated memory is released back to the OS. So with Hoard, if there are tons of deletes and inserts, there is extra overhead (repeatedly allocating from and returning memory to the OS) compared with allocators that don’t give memory back. When this happens we can see high sys CPU usage.

And I saw a case of a failed hardware upgrade for a MySQL database using only the MEMORY engine. The only software change was switching mysqld’s memory allocator from libc malloc to Hoard. After mysqld had been up for some time, a delete job kicked in to purge old data and make room for new inserts. The server then hung, with sys CPU usage jumping above 80%. After switching the allocator back, everything returned to normal.



TEST--server1$> ps -ef|grep mysqld
oracle 20128 20089   0 12:04:44 pts/2       0:00 /oracle/TEST/home/products/5036x64/bin/mysqld --basedir=/oracle/TEST/ho
oracle 20130 20054   0 12:04:51 pts/2       0:00 grep mysqld
oracle 20089     1   0 12:04:44 pts/2       0:00 /bin/sh /oracle/TEST/home/products/5036x64/bin/mysqld_safe --core-file
Total anonymous memory (in KB, per pmap -x) once mysqld is up:
TEST--server1$> pmap -x 20128 | grep anon | awk 'BEGIN {total=0}{total=total+$2}END {print total}'
39768
mysql> delimiter $$
mysql> CREATE PROCEDURE t_proc1(IN var1 INT)
-> BEGIN
->  DECLARE counter INT DEFAULT 0;
->  WHILE (counter < var1) DO
->   insert into t values (rand()*100000,rand()*100000,'a');
->   SET counter=counter+1;
->  END WHILE;
-> END$$
Query OK, 0 rows affected (0.00 sec)
mysql>
mysql> delimiter ;
Insert 200k rows:
mysql>  call t_proc1(200000);
Query OK, 1 row affected (3.04 sec)
mysql>  show table status like 't'\G;
*************************** 1. row ***************************
Name: t
Engine: MEMORY
Version: 10
Row_format: Fixed
Rows: 200000
Avg_row_length: 3011
Data_length: 608108064
Max_data_length: 12562051583
Index_length: 0
Data_free: 0
Auto_increment: NULL
Create_time: NULL
Update_time: NULL
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)
ERROR:
No query specified

Checking the memory used with pmap:


TEST--server1$>  pmap -x 20128 | grep anon | awk 'BEGIN {total=0}{total=total+$2}END {print total}'
635580
mysql> delete from t;   (I also tested deleting only 100,000 of the rows; memory usage dropped as well.)
Query OK, 200000 rows affected (0.29 sec)
mysql>  show table status like 't'\G;
*************************** 1. row ***************************
Name: t
Engine: MEMORY
Version: 10
Row_format: Fixed
Rows: 0
Avg_row_length: 3011
Data_length: 0                => size is 0 now
Max_data_length: 12562051583
Index_length: 0
Data_free: 0
Auto_increment: NULL
Create_time: NULL
Update_time: NULL
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)
ERROR:
No query specified

Checking the memory used again with pmap:


TEST--server1$> pmap -x 20128 | grep anon | awk 'BEGIN {total=0}{total=total+$2}END {print total}'
39836  ==> used memory has dropped back now.