Infosys’ blog on industry solutions, trends, business process transformation and global implementation in Oracle.

« Importance of CPQ for Industrial Manufacturing | Main | Cross Pollination of Solutions across Industry Verticals »

Incremental Loads on In Memory Tables

Preparing the Database for In-Memory Capability:

 

Changing Database Compatibility Level:

 

This is needed for parallel queries and auto-update of statistics in the database.

ALTER DATABASE FACILITIES_FARM_DB SET COMPATIBILITY_LEVEL = 130;

 

Enabling MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT Option:

 

Enables the database option MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT to avoid the need to use the WITH (SNAPSHOT) hint for ad hoc queries accessing memory-optimized tables.

ALTER DATABASE FACILITIES_FARM_DB

SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT = ON;

 

Add a FILEGROUP and CONTAINER:

 

  • Only one filegroup can be created for memory optimized tables in a database. All the table data and metadata is stored in this filegroup only.
  • Now, I am creating a filegroup named FACILITIES_FARM_DB_mod using statement:

ALTER DATABASE FACILITIES_FARM_DB

ADD FILEGROUP FACILITIES_FARM_DB_mod CONTAINS MEMORY_OPTIMIZED_DATA;

 

  • At least a container should be added to this filegroup. We are now creating a container to the FACILITIES_FARM_DB_mod filegroup using below statement:

ALTER DATABASE FACILITIES_FARM_DB

ADD FILE (name='FACILITIES_FARM_DB_mod2', filename='F:\data\FACILITIES_FARM_DB_mod2') TO FILEGROUP FACILITIES_FARM_DB_mod;

This is to add MEMORY_OPTIMIZED_DATA filegroup. This can also be done using UI in SSMS.

Create a schema:

 

CREATE SCHEMA INFY_SCHEMA;

GO

Creating a Linked Server:

 

To get the data from into the in-memory tables, I preferred my source as another db in which the tables are present. This is a transactional database (OLTP) where records come in every second. All the records in these tables capture the update_dt also using which we will create the incremental strategy. To use the tables in this db to fetch the data, we need to link this db to our FACILITIES_FARM_DB using Linked Server setup in SSMS.

Linked Server can be added from Server Objects in SSMS.

Select the provider as SQL Server and give-in name and give-in the server details of your source db.

Using these steps, I created a linked server named FACILITIES_MASTER_DB_SERVER which would connect me to FACILITIES_MASTER_DB and browse/refer the tables in the master db.

Creating an In-Memory Table:

 

The points to ponder before we create memory optimized tables:

  • Changing the database compatibility level to 130 (Microsoft's suggestion for optimized tables)
  • Enabling the database option MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT (prevents locks when using ad-hoc queries)
  • Enabling MEMORY_OPTIMIZED option on the table
  • Set DURABILITY to SCHEMA_AND_DATA (keeps both schema and data of table in secondary disk)

Let's create a new in-memory optimized table called INFY_PMU_ORDER in schema INFY_SCHEMA.

The CREATE table script for memory optimized table is a regular script but we need to add two attributes in the last. Also, a non-clustered primary key index is mandatory.

CREATE TABLE [INFY_SCHEMA].[INFY_PMU_ORDER](

                [ROW_WID] [int] NOT NULL PRIMARY KEY NONCLUSTERED,

                [INSERT_DT] [datetime] NULL,

                [UPDATE_DT] [datetime] NULL,

                [PROJECT_ID] [varchar](32) NULL,

                [ORDER_NUMBER] [varchar](4000) NULL,

                [TYPE] [varchar](100) NULL,

                [ORDER_TYPE] [varchar](30) NULL,

                [COMPLETION_DATE] [datetime] NULL,

                ) WITH 

        (MEMORY_OPTIMIZED = ON, 

        DURABILITY = SCHEMA_AND_DATA); 

GO

Creating a Control Table:

 

A control table is used to maintain and monitor the delta records. It will typically have the load start time, load end time, status of the load (running/success). In the control table we use, we will be having a record for each table everyday with the typical fields and the count of updated records for each UPDATE and each INSERT. The structure of our control table is as follows:

CREATE TABLE [INFY_SCHEMA].[CONTROL_TABLE](

                [SNO] [int] NULL,

                [BATCH_ID] [int] NULL,

                [CURR_ETL_RUN_DATETIME] [datetime] NULL,

                [LAST_ETL_RUN_DATETIME] [datetime] NULL,

                [STATUS] [nvarchar](50) NULL,

                [TABLE_NAME] [nvarchar](50) NULL,

                [UPDATE_COUNT] [int] NULL,

                [INSERT_COUNT] [int] NULL,

                [SOURCE] [nvarchar](20) NULL,

                [LOAD_TYPE] [varchar](30) NULL

) ON [PRIMARY]

GO

We will be updating the record of table with the insert and update counts for every incremental load along with batch id and sno.

Stored Procedure with Incremental Logic:

 

This stored procedure is to implement incremental logic for all the in-memory tables. The source for these would be Datahub db and a database link must be created between the Datahub DB and the replicator db. Once the linked server is established, using a set of INSERT and UPDATE statements, we can implement the incremental strategy. The steps to follow for the table would be:

1.     Insert the full load record in the control table

2.     Get the max start datetime from the control table into a variable say @UpdateDt

3.     Insert a default record into control table with status 'RUNNING'

4.     Update the existing records in respective table based on the value @UpdateDt

5.     Get the rows count using @@ROWCOUNT and update the UPDATE_COUNT field in control table

6.     Insert the new records in respective table based on the value @UpdateDt

7.     Get the rows count using @@ROWCOUNT and update the INSERT_COUNT field in control table

8.     Update the relevant record in control table with status 'SUCCESS'

The above steps are repeated in blocks for as many tables as you need and thus the procedure is created.

Here is the script for this procedure:

 

CREATE PROCEDURE [dbo].[Sp_facilities_farm_incremental] 
AS 
  
BEGIN 
      
SET nocount ON; 

      
DECLARE @UpdateDt DATETIME 
      
DECLARE @RowCount1 INTEGER 
      
DECLARE @RowCount2 INTEGER 

      
/************************ INFY_PMU_ORDER ***********************************/ 
      
SET @UpdateDt = (SELECT Max(curr_etl_run_datetime) 
                       
FROM   infy_schema.control_table 
                       
WHERE  status = 'SUCCESS' 
                              
AND table_name = 'INFY_PMU_ORDER') 

      
INSERT INTO infy_schema.control_table 
                  
([sno], 
                   
[batch_id], 
                   
[curr_etl_run_datetime], 
                   
[last_etl_run_datetime], 
                   
[status], 
                   
[table_name], 
                   
[update_count], 
                   
[insert_count], 
                   
[source], 
                   
[load_type]) 
      
VALUES      ((SELECT Max(sno) + 1 
                    
FROM   infy_schema.control_table 
                    
WHERE  ( status = 'SUCCESS' 
                              
OR status = 'RUNNING' )), 
                   
(SELECT Max(batch_id) + 1 
                    
FROM   infy_schema.control_table 
                    
WHERE  table_name = 'INFY_PMU_ORDER' 
                           
AND ( status = 'SUCCESS' 
                                  
OR status = 'RUNNING' )), 
                   
Getdate(), 
                   
(SELECT Max(curr_etl_run_datetime) 
                    
FROM   infy_schema.control_table 
                    
WHERE  table_name = 'INFY_PMU_ORDER' 
                           
AND status = 'SUCCESS'), 
                   
'RUNNING', 
                   
'INFY_PMU_ORDER', 
                   
NULL, 
                   
NULL, 
                   
'DATAHUB', 
                   
'INCREMENTAL'); 

      
UPDATE tab1 
      
SET    tab1.insert_dt = tab2.insert_dt, 
             
tab1.update_dt = tab2.update_dt, 
             
tab1.project_id = tab2.project_id, 
             
tab1.order_number = tab2.order_number, 
             
tab1.type = tab2.type, 
             
tab1.order_type = tab2.order_type, 
             
tab1.completion_date = tab2.completion_date 
      
FROM   [INFY_SCHEMA].[infy_pmu_order] AS tab1 
             
INNER JOIN (SELECT [row_wid], 
                                
[insert_dt], 
                                
[update_dt], 
                                
[project_id], 
                                
[order_number], 
                                
[type], 
                                
[order_type], 
                                
[completion_date] 
                         
FROM 
[FACILITIES_MASTER_DB_SERVER].[FACILITIES_MASTER_DB].[INFY_SCHEMA].[infy_pmu_order] 
 
WHERE  update_dt >= @UpdateDt) tab2 
        
ON tab1.row_wid = tab2.row_wid; 

    
SELECT @RowCount1 = @@ROWCOUNT; 

    
UPDATE infy_schema.control_table 
    
SET    update_count = @RowCount1 
    
WHERE  sno = (SELECT Max(sno) 
                  
FROM   infy_schema.control_table 
                  
WHERE  table_name = 'INFY_PMU_ORDER' 
                         
AND status = 'RUNNING'); 

    
INSERT INTO [INFY_SCHEMA].[infy_pmu_order] 
                
([row_wid], 
                 
[insert_dt], 
                 
[update_dt], 
                 
[project_id], 
                 
[order_number], 
                 
[type], 
                 
[order_type], 
                 
[completion_date]) 
    
SELECT [row_wid], 
           
[insert_dt], 
           
[update_dt], 
           
[project_id], 
           
[order_number], 
           
[type], 
           
[order_type], 
           
[completion_date] 
    
FROM 
[FACILITIES_MASTER_DB_SERVER].[FACILITIES_MASTER_DB].[INFY_SCHEMA].[infy_pmu_order] 
WHERE  update_dt >= @UpdateDt 
       
AND row_wid NOT IN (SELECT row_wid 
                           
FROM   [INFY_SCHEMA].[infy_pmu_order]); 

    
SELECT @RowCount2 = @@ROWCOUNT; 

    
UPDATE infy_schema.control_table 
    
SET    insert_count = @RowCount2 
    
WHERE  sno = (SELECT Max(sno) 
                  
FROM   infy_schema.control_table 
                  
WHERE  table_name = 'INFY_PMU_ORDER' 
                         
AND status = 'RUNNING'); 

    
UPDATE infy_schema.control_table 
    
SET    status = 'SUCCESS' 
    
WHERE  sno = (SELECT Max(sno) 
                  
FROM   infy_schema.control_table 
                  
WHERE  table_name = 'INFY_PMU_ORDER' 
                         
AND status = 'RUNNING'); 

    
RETURN 1 
END 

To execute the procedure, go to:

FACILITIES_FARM_DB -> Programmability -> Stored Procedures -> spTConnectIncremental (right click) -> Execute Stored Procedure


Scheduling an SQL Agent:

 

Now schedule the procedure we have created in the previous step using SQL Server Agent in SSMS. Once a name is given to this job in General tab, schedule the timing and repeat properties needed in Schedules tab as shown below:

 

Creating Indexes on Memory Optimized Tables:

 

The nominal (CREATE INDEX...) statement doesn't work on in-memory tables.

Instead, we need to use

(ALTER TABLE... ADD INDEX...) syntax.

ALTER TABLE INFY_PMU_ORDER ADD CONSTRAINT constraint_Unique_PI 

   UNIQUE NONCLUSTERED (UPDATE_DT DESC, PROJECT_ID); 

ALTER TABLE INFY_PMU_ORDER ADD INDEX hash_projectid 

                HASH (PROJECT_ID) WITH (BUCKET_COUNT = 64);  -- Nonunique


Post a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)

Please key in the two words you see in the box to validate your identity as an authentic user and reduce spam.

Subscribe to this blog's feed

Follow us on

Blogger Profiles