Feeds:
Posts
Comments

Posts Tagged ‘teradata’


For OLTP, oracle has been proved to be the best choice regarding the performance and scalability, exadata just enhances its leadership in this direction. But for data warehouse, could exadata compete and succees against teradata by supporting more IOPS, larger cache and interconnect throughput?


Teradata has the built-in parallelism and it’s cheaper to run in parallel because of it’s share nothing architecture as there is less coordination overhead. For oracle to run in parallel, slaves in different slave sets have to exchange message. The total memory for exchanging message is X*degree*degree*parallel_execution_message_size. X is 3 for non-RAC and X is 4 for RAC. Huge teredata databases could have more than 2,000 nodes. To get oracle reach the same parallelism, how much memory does oracle need to exchange message between slaves? If set parallel_execution_message_size to 64K, the maximum size we could set (for large query,  the more parallel_execution_message_size the better), so oracle needs 3*2000*2000*64k=768G memory for non-RAC,  or 4*2000*2000*64k=1T memory for RAC.  Typically exadata is used as RAC, so it needs 1T memory purely for message exchange. And what’s the overhead of manage this 1T memory? I have no box to test it out.  Another problem is that when the producer generates message faster than the consumer consumes the message, then the buffer (parallel_execution_message_size) for message will be full, and the producer has to wait until there are free space in the buffer. This a not problem if producer is always faster than consumer, but it’s a problem if sometime producer is faster and sometime consumer is faster.

At any time at most two slave set can work at the same time, and each slave set works for the same step in an execution plan,  if a sql runs in degree of 1000 and one slave is slow,  all other 999 slaves have to wait for it to complete before execute the next step. For a hash join with 10 tables,  totally we have 9 joins, each join will complete until the slowest slave completed. Oracle doc suggests to use denormalized for data warehouse, and teradata doc said denormalization is not needed as join in teradata is cheap. Do you like denormalization? More often than not, we have to do the denormalization because of performance issue, if the performance is not a problem, why use denormalization?

If not use partition table, oracle has to hash every row in the table to know which slave the row should go to, if use partition table AND the join column is the partition key, then oracle could use partition wise join or partial partition wise join, in this way oracle doesn’t need to hash every row. Teradata also has to distribute the rows in one node to other nodes if the join column is not the primary index (which is used to determine which node the row should sit in), but the mechanism behind is simple and efficient: distribute the rows in one node to all others node based on the join column and put distributed rows in a pool, there is no limit as parallel_execution_message_size in oracle. The pool teradata used is just like temporary tablespace which needs physical IO to perform against it,  you might say memory is faster than disk, but for large query, who expects the data could be cached fully in memory?

Advertisements

Read Full Post »