Saturday, March 11, 2017

Divide and Conquer Very Big Table

An index is not the ultimate answer to fast table access

Most of the time, when people deal with a very large table of more than a billion rows, they tend to think that creating an index on this big table is all it takes to speed up data access. However, there are several other techniques a database designer can exploit to speed up access to a big table.

  1. First, we can use the table partitioning technique, which divides the table into smaller logical segments. However, the Oracle Partitioning feature costs extra money: the organization has to pay for a separate license for it (see the sketch after this list).
  2. If the license fee is a limitation, we can divide the big table into small ones ourselves.
  3. Combining the second approach with other Oracle capabilities, such as data compression and indexes, we gain the following benefits:
    • Administration: when the big table is split into several small tables, we can remove old data quickly by dropping the unneeded tables.
    • Oracle data compression helps reduce the amount of disk space consumed, and for table scans it reduces I/O as well.
    • A small table means a small data file and a small index, which speed up table access and reduce I/O.
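
To make the first option concrete, here is a minimal sketch of a month-partitioned table. Only the table name transaction_data and the tran_dt column come from this post; the other columns and the partition names are made up for illustration.

create table transaction_data (
    tran_id  number,   -- hypothetical column
    tran_dt  date,     -- the date column used in the queries below
    amount   number    -- hypothetical column
)
partition by range (tran_dt) (
    partition p_2016_01 values less than (to_date('01-FEB-2016','DD-MON-YYYY')),
    partition p_2016_02 values less than (to_date('01-MAR-2016','DD-MON-YYYY')),
    partition p_2016_03 values less than (to_date('01-APR-2016','DD-MON-YYYY'))
);

Keep in mind that this DDL only runs on Enterprise Edition with the Partitioning option licensed.
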
I have tested three approaches. The total number of records in the test was 88,054,450 rows:
  1. Partitioning the table by month - table: transaction_data
  2. Creating a sample table holding the transaction data for quarter one - table: transaction_data_q1
  3. Dividing the transaction data by month (a sketch of how these are built follows the list):
    • transaction_data_jan
    • transaction_data_feb 
    • transaction_data_mar
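
As a sketch, the monthly tables in the third approach could be built straight from the big table with create table ... as select; the compress clause applies the basic table compression mentioned earlier (only the January case is shown, the other months follow the same pattern):

create table transaction_data_jan compress as
select * from transaction_data
where tran_dt >= to_date('01-JAN-2016','DD-MON-YYYY')
and   tran_dt <  to_date('01-FEB-2016','DD-MON-YYYY');
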
When querying the tables for a row count on a given date:

select count(*) from transaction_data 
where tran_dt = to_date('15-JAN-2016','DD-MON-YYYY');

select count(*) from transaction_data_jan
where tran_dt = to_date('15-JAN-2016','DD-MON-YYYY');

The above two queries each take about 00:01:18; the two timings are very close. The other query:

select count(*) from transaction_data_q1
where tran_dt = to_date('15-JAN-2016','DD-MON-YYYY');

This one consumes about 00:03:27 of query time.

Then an index is created on the tran_dt column of table transaction_data_q1, and query time improves dramatically: the same query against transaction_data_q1 now takes only a fraction of a second to yield the result.
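
The index described above would look something like this; the index name is my own invention:

create index transaction_data_q1_ix on transaction_data_q1 (tran_dt);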

In a real scenario, we would not divide the transaction table down to the month level; the pieces would be too small and there would be too many tables. Dividing the transaction table at the quarter level seems reasonable, since that is after all only four tables per year.

When old data is no longer needed, a table can be dropped easily. Imagine the transaction table holding data from 2012 to 2016: that is one very big, fat table to manage. Building an index on such a large table is time and resource consuming, and when the 2012 data has to be deleted, the delete itself takes a long time, puts pressure on the undo log, and leaves the index fragmented, as the contrast below illustrates.
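
With per-period tables, removing a year is a quick dictionary operation; on the single big table it is a massive delete. The yearly table name here is hypothetical:

-- split tables: drop frees the space in one step
drop table transaction_data_2012 purge;

-- single big table: generates undo/redo for every row
-- and fragments the index
delete from transaction_data
where tran_dt < to_date('01-JAN-2013','DD-MON-YYYY');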

Therefore, dividing the big transaction table into several manageable ones helps in many aspects.
