Tuesday 22 August 2017

Type 1 SCD for Hadoop (Free Script)

What are Slowly Changing Dimensions?

Slowly Changing Dimensions (SCD) - Dimensions that change slowly over time, rather than changing frequently or on regular schedule, time-basis. In Data Warehouse there is a need to track changes in dimension attributes in order to report historical data. In other words, implementing one of the SCD types should enable users assigning proper dimension's attribute value for given date. Example of such dimensions are: Customer, Address, Employee.

Below is a list of the Slowly Changing Dimensions in use today:

Type 0 - The passive method
Type 1 - Overwrites the old value, no changes are tracked
Type 2 - Creates a new additional record to track changes
Type 3 - Adds a new column to track changes
Type 4 - Uses a separate historical table to track changes
Type 6 - Combines Type 1,2,3 approaches (1+2+3=6)

In this article we will discuss how to ingest data into Hadoop Big Data environment using the Type 1 Slowly Changing Dimension approach. At the end of this article you will also find a link to download the free script (a complete framework) that implements this.

In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.

In this example, in the Customer Dimension, we have the following record:

Customer Key	Name	State
5001	Peter	Washington

After Peter moved from Washington to Maine, the new information replaces the original. In this case, Washington is replaced with Maine, and after the data ingestion Customer Dimension looks like this:

Customer Key	Name	State
5001	Peter	Maine

Advantages:

This approach is easy to implement, since there is no need to keep track of the old information.

Disadvantages:

All history is lost. By applying this approach, it is not possible to trace back in history. For example, in this case, you do not have the ability to know that Peter lived in Washington before.

Usage:

About 50% of the time.

When to use Type 1:

Type 1 slowly changing dimension should be used when it is not necessary to keep track of historical changes in your Data Warehouse.

How to Ingest Data using SCD Type 1 Framework

We at Proden Technologies have built a series of Scripts to ingest several data patterns such as slowly changing dimension Type 1, Type 2 into Hadoop Big Data environment. Follow the instructions below to download the SCD Type 1 Data Ingestion Framework and load data:

Download the Data Ingestion Framework for SCD Type 1.
Follow the instructions to copy the scripts to your Hadoop Big Data environment.
Change Directory to the location where you have copied the scripts
Use the sample command line below to ingest data into Hadoop.

Note

Ensure Sqoop is installed and configured correctly.

Use bash shell to execute the shell scripts in this Framework.

Command

./Type1.sh Source_SQL="Source Data as SQL statement" Target_Table="Table where data should be loaded" Target_Columns="Target Columns to load data" Mapped_Columns="Source to Target Column Map" Data_Base_Name="Hadoop Database Name to load data" SQL_File_Location="SQL File Location" Log_File="Log File Name to log the script execution details"

No	Key Name	Description
1	Target_Table	Table where data should be loaded
2	Join_Columns	It joins the source and target columns.
3	Mapped_Columns	source and target columns are mapped to load the data.
4	Source_SQL	Enter Source SQL query, which has new and updated records.
5	Target_Columns	It loads the comma separated list in target column.
6	Data_Base_Name	It names the Hadoop database where the target table is located.
7	SQL_File_Location	Enter the path from where the Type_1.sql file is downloaded.
8	Log_File	It shows the script execution details.

Example

./Type_1.sh Target_Table="FRAUD_DEPT_2" Source_SQL="Select * From dept_1" Mapped_Columns="DEPARTMENT_ID,DEPARTMENT_ID,DEPARTMENT_NAME,DEPARTMENT_NAME,MANAGER_ID,MANAGER_ID,LOCATION_ID,LOCATION_ID,PHONE_NUMBER,PHONE_NUMBER" Join_Columns="DEPARTMENT_ID,DEPARTMENT_ID" Target_Columns="DEPARTMENT_ID,DEPARTMENT_NAME,MANAGER_ID,LOCATION_ID,PHONE_NUMBER,DEPT_PRIOR" Data_Base_Name="default" SQL_File_Location="/home/cloudera/Desktop/Type_1" Log_File="/home/cloudera/Desktop/Type_1/Log_File.txt"

Disclaimer

Please ensure you read and understand the following general disclaimer:

IMPORTANT:

THIS SOFTWARE END USER LICENSE AGREEMENT (“EULA”) IS A LEGAL AGREEMENT BETWEEN YOU AND PRODEN TECHNOLOGIES, INC. READ IT CAREFULLY BEFORE COMPLETING THE INSTALLATION PROCESS AND USING THE SOFTWARE. IT PROVIDES A LICENSE TO USE THE SOFTWARE AND CONTAINS WARRANTY INFORMATION AND LIABILITY DISCLAIMERS. BY INSTALLING AND USING THE SOFTWARE, YOU ARE CONFIRMING YOUR ACCEPTANCE OF THE SOFTWARE AND AGREEING TO BECOME BOUND BY THE TERMS OF THIS AGREEMENT. IF YOU DO NOT AGREE TO BE BOUND BY THESE TERMS, THEN SELECT THE "CANCEL" BUTTON. DO NOT PROCEED TO REGISTER & INSTALL THE SOFTWARE. LIABILITY DISCLAIMER•THE accel<>DS PROGRAM IS DISTRIBUTED "AS IS". NO WARRANTY OF ANY KIND IS EXPRESSED OR IMPLIED. YOU USE IT AT YOUR OWN RISK. NEITHER THE AUTHORS NOR PRODEN TECHNOLOGIES, INC. WILL BE LIABLE FOR DATA LOSS, DAMAGES AND LOSS OF PROFITS OR ANY OTHER KIND OF LOSS WHILE USING OR MISUSING THIS SOFTWARE.

RESTRICTIONS:

You may not use, copy, emulate, clone, rent, lease, sell, modify, decompile, disassemble, otherwise reverse engineer, or transfer any version of the Software, or any subset of it, except as provided for in this agreement. Any such unauthorized use shall result in immediate and automatic termination of this license and may result in criminal and/or civil prosecution.

TERMS:

This license is effective until terminated. You may terminate it by destroying the program, the documentation and copies thereof. This license will also terminate if you fail to comply with any terms or conditions of this agreement. You agree upon such termination to destroy all copies of the program and of the documentation, or return them to the author.

Download Data Ingestion Framework (Free)

SCD Type 1 Data Ingestion Scripts for Hadoop

Type 2 SCD for Hadoop (Free Script)

What are Slowly Changing Dimensions?

Slowly Changing Dimensions (SCD) - dimensions that change slowly over time, rather than changing on regular schedule, time-base. In Data Warehouse there is a need to track changes in dimension attributes in order to report historical data. In other words, implementing one of the SCD types should enable users assigning proper dimension's attribute value for given date. Example of such dimensions are: customer, geography, employee.

Type 0 - The passive method
Type 1 - Overwrites the old value
Type 2 - Creates a new additional record
Type 3 - Adds a new column
Type 4 - Uses a historical table
Type 6 - Combines types 1,2,3 approaches (1+2+3=6)

There are many approaches how to deal with SCD. The most popular are:

accel-DS Shell Script Engine V2.0 is Now Available

accel-DS Shell Script Engine can transform data using sql,files with most of the popular database and file formats used in Hadoop.

In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.

In our example, recall we originally have the following table:

Customer Key	Name	State
5001	Peter	Washington

After Peter moved from Washington to Maine, we add the new information as a new row into the table:

Customer Key	Name	State
5001	Peter	Washington
5005	Peter	Maine

Advantages:

This allows us to accurately keep all historical information.

Disadvantages:

This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
This necessarily complicates the ETL process.

Usage:

About 50% of the time.

When to use Type 2:

Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.

No	Key Name	Description
1	Target_Table	Table where data should be loaded
2	Join_Columns	Enter source sql column and target table column to join those columns.
3	Mapped_Columns	Enter source sql column and target table column, source column data will be loaded in the respective target table column.
4	Source_SQL	Enter Source SQL query, which has new and updated records..
5	Target_Columns	Enter all the target table column names seperated by comma in the insersion order.
6	Eff_Date_Column	Enter target table column name, where record created date has to be loaded. Enter 'NA' if you don't want to track created date.
7	End_Date_Column	Enter target table column name where, recently record updated date has to be loaded. Enter 'NA' if you don't want to track updated date.
8	Latest_Rec_Column	Enter target table column name where flag specifies the record as latest or old. Enter 'NA' if you don't want to track latest record.
9	Data_Base_Name	Hadoop Database Name where the Target Table is located.
10	Transpose_Flag	Flag for Transpose. Enter 'Y' if you have Transpose column to load.
11	Transpose_Track_Column	Enter the source sql transpose column name, target table transpose column name and source column names to track transpose column (#This option is not applicable if (Arg) Transpose_Flag is 'N')
12	Audit_Columns	Audit columns delimited by ',' FORMAT : "creation_ts-CURRENT_TIMESTAMP,created_by-DEFAULT-VISHNU,process_control_id-UNIQUE_VALUE"
13	SQL_File_Location	Enter the path in which downloaded Type_2.sql file present.
14	Log_File	Enter the File path and name to log the script execution details.

Syntax

Target_Table="$Target_Table$" Source_SQL="$Source_SQL$" Mapped_Columns="$Mapped_Columns$" Join_Columns="$Join_Columns$" Target_Columns="$Target_Columns$" Data_Base_Name="$Data_Base_Name$" SQL_File_Location="$SQL_File_Location$" Log_File="$Log_File$" ./Type_1.sh

Example

Target_Table="EDS_Request_Prod_KV" Join_Columns="ACCOUNT_ID,ACCOUNT_ID,SSN,SSN,SRC_SYSTEM,SRC_SYSTEM,ROW_TYPE,ROW_TYPE" Mapped_Columns="ACCOUNT_ID,ACCOUNT_ID,SSN,SSN,SRC_SYSTEM,SRC_SYSTEM,KV_COLUMN,KV_COLUMN,ROW_TYPE,ROW_TYPE,ADDRESS1,ADDRESS1,ADDRESS2,ADDRESS2,ADDRESS3,ADDRESS3,ADDRESS4,ADDRESS4,CITY,CITY,COUNTRY,COUNTRY,STATE,STATE,POSTAL,POSTAL" Source_SQL="SELECT * FROM cygnus.EDS_Request_STG_KV" Target_Columns="ACCOUNT_ID,SSN,SRC_SYSTEM,KV_COLUMN,ROW_TYPE,ADDRESS1,ADDRESS2,ADDRESS3,ADDRESS4,CITY,COUNTRY,STATE,POSTAL,as_of_date,creation_ts, created_by,process_control_id,effective_date,latestcol" Eff_Date_Column="effective_date" End_Date_Column="as_of_date" Latest_Rec_Column="latestcol" Data_Base_Name="cygnus" Transpose_Flag="Y" Transpose_Track_Column="kv_column,kv_column,ADDRESS1+ADDRESS2+ADDRESS3+ADDRESS4+CITY+COUNTRY+STATE+POSTAL,ADDRESS1+ADDRESS2+ADDRESS3+ADDRESS4+CITY+COUNTRY+STATE+POSTAL" Audit_Columns="creation_ts-CURRENT_TIMESTAMP,created_by-DEFAULT-VISHNU,process_control_id-UNIQUE_VALUE" SQL_File_Location="/home/cloudera/Desktop/Type_2" Log_File="/home/cloudera/Desktop/Type_2/type2_EDS_log.txt" ./Type_2.sh

Disclaimer

Please ensure you read and understand the following general risk disclaimer:

IMPORTANT:

RESTRICTIONS:

TERMS:

Download Data Ingestion Framework (Free)

SCD Type 2 Data Ingestion Scripts for Hadoop

Data Ingestion Framework for Hadoop

data ingestion tool for hadoop

Data Ingestion Framework for Hadoop

This version of the Data Ingestion Framework is a Script Engine you can use to ingest data from any database, data files (both fixed width and delimited) into Hadoop environment. This lets you get started, ingest the data in a matter of days.
If you are looking for a solution beyond Data Ingestion, please take a look at accel-DS for Data Integration. This solution let you ingest, clean and Transform data from a variety of data sources into Hadoop and vice versa.

Objective

To provide a simple, easy to use the framework to ingest data into Hadoop from a variety of data sources.

Framework

This is a set of shell script to which you can pass various parameters, such as source database or files details, target (Hadoop) details, target table name etc.,
This framework has a very small footprint and you can start ingesting data from day one.

Benefits

Ingest from a variety of data sources - database, data files (both fixed width and delimited)
Target Tables, Data Types are created by the Framework
Load multiple data files with a single call to the engine.

How to Ingest Data

Download the Data Ingestion Framework.
Follow the instructions to copy it to your Hadoop environment.
Change Directory to the location where you have copied the scripts
Use any of the Data Ingestion commands listed under Sample Scripts section, to ingest data into Hadoop.

Note

Ensure Sqoop is installed and configured correctly.

Use bash shell to execute the shell scripts in this Framework.

Sample Scripts

In this article, sample scripts are provided for the following scenarios:

Create and Insert - Delimited File.
Create and Insert - Fixed width File.
Create and Insert - Table.
Create and Insert - SQL.
Create and Insert - XML File.
Insert only - Delimited File.
Insert only - Fixed Width File.
Insert only - Table.
Insert only - Query.
Insert only - XML File.

1. Create and Insert - Delimited File

Command Template

#	Key Name	Description
1	Type_Of_Ingestion	Options are "Create & Insert, Insert, Create".Create & Insert option will create table and load data into the created table. Insert option will load data into the specified table. Create option will only create the table without loading any data.
2	Type_Of_Create	Options "External, Managed". If External is chosen, then the table created will be EXTERNAL HIVE table. If Managed is chosen, then the table created will be HIVE MANAGED table.
3	Data_Base_Name	Enter the Database name in which the table needs to be created.
4	Target_Table	Enter the table name to create and load.
5	Type_Of_Table	Options are: Delimited, Fixed width, XML_File and Table. Choose Delimited, if the Source File is a delimited file. Choose Fixed width, if the Source File is fixed width data file. Choose Table, if data needs to be imported from another database to hive.
6	Table_Layout_Path	Enter Table layout file path and name. If the #Type_Of_Table is Delimited or Table, then the Table Layout File should have column names and their data types, it must be tab delimited. If the #Type_Of_Table is the Fixed width, then the Table Layout file should have column names, their data types, column start position and column end position, it must be tab delimited.
7	Table_Delimiter	Enter the Table delimiter.
8	File_Delimiter	Enter the column delimiter used in Source File.
9	Load_Data_Path	Enter the Source File path with name. If you need to load multiple files, enter all the file names with path delimited by a comma.
10	Transpose_Flag	Flag for Transpose. Enter 'Y' if you want to load data using Transpose Ingest.
11	Null_Insert_Flag	Enter 'Y' if you don't want to insert null data into target table.
12	Table_Delete_Flag	Enter 'Delete Target Table', if you need to delete the target table, if exists.
13	Log_File	Enter the Log file path and name to Store the logs generated by this tool.

Example

Type_Of_Ingestion="Create & Insert" Type_Of_Create="External" Data_Base_Name="default" Target_Table="Temp_Delimited_Data" Type_Of_Table="Delimited" Table_Layout_Path="/home/cloudera/Desktop/Table_Creation/Table_Layout_Delimited_Table.txt" Table_Delimiter="~" File_Delimiter="|" Load_Data_Path="/home/hadoop/Desktop/eds_request_data.txt" Transpose_Flag="Y" Null_Insert_Flag="Y" Table_Delete_Flag="Delete Target Table" Log_File="/home/cloudera/Desktop/Table_Creation/Log_File.txt"./Data_Ing_Eng.sh

Example:( Transpose )

Type_Of_Ingestion="Create & Insert" Type_Of_Create="External" Data_Base_Name="default" Target_Table="EDS_Request_Prod_KV" Type_Of_Table="Delimited" Table_Layout_Path="/home/hadoop/Desktop/eds_request_layout.txt" Table_Delimiter="~" File_Delimiter="," Load_Data_Path="/home/cloudera/Desktop/Table_Creation/Comma_File.txt" Transpose_Flag="Y" Null_Insert_Flag="Y" Table_Delete_Flag="Delete Target Table" Log_File="/home/cloudera/Desktop/Table_Creation/Log_File.txt" ./Data_Ing_Eng.sh

2. Create and Insert - Fixed width File.

Command Template

#	Key Name	Description
1	Type_Of_Ingestion	Options are "Create & Insert, Insert, Create".Create & Insert option will create table and load data into the created table. Insert option will load data into the specified table. Create option will only create the table without loading any data.
2	Type_Of_Create	Options "External, Managed". If External is chosen, then the table created will be EXTERNAL HIVE table. If Managed is chosen, then the table created will be HIVE MANAGED table.
3	Data_Base_Name	Enter the Database name in which the table needs to be created.
4	Target_Table	Enter the table name to create and load.
5	Type_Of_Table	Options are: Delimited, Fixed width, XML_File and Table. Choose Delimited, if the Source File is a delimited file. Choose Fixed width, #if the Source File is fixed width data file.
6	Convert_Fixed_delimited_Flag	Options are Y or N. Enter Y, if you want to convert the fixed width data file to delimited data file. Enter N, if you want to load data in fixed width format.
7	Fixed_Delimiter	Enter the delimiter, if you choose 'Y' for #Convert_Fixed_delimited_Flag. This delimiter will be used to create the delimited data file.
8	Load_Data_Path	Enter Table layout file path and name. If the #Type_Of_Table is Delimited or Table, then the Table Layout File should have column names and their data types, tab delimited.If the #Type_Of_Table is the Fixed width, then the Table Layout file should have column names, their data types, column start position and column end position, it must be tab delimited.
9	Load_Data_Path	Enter the Source File path with name. If you need to load multiple files, enter all the file names with path delimited by a comma.
10	Table_Delete_Flag	Enter 'Delete Target Table', if you need to delete the target table, if exists.
11	Log_File	Enter the Log file path and name to Store the logs generated by this tool.

Example :(If Arg Convert_Fixed_delimited_Flag = "Y")

Type_Of_Ingestion="Create & Insert" Type_Of_Create="Managed" Data_Base_Name="default" Target_Table="Temp_Fixed_Data" Type_Of_Table="Fixed width" Convert_Fixed_delimited_Flag "Y" Fixed_Delimiter="|" Table_Layout_Path="/home/cloudera/Desktop/Table_Creation/Table_Layout_Fixed_Data.txt" Load_Data_Path="/home/cloudera/Desktop/Table_Creation/Fixed_Width_sample_value.txt" Table_Delete_Flag="Delete Target Table" Log_File="/home/cloudera/Desktop/Table_Creation/Log_File.txt" ./Data_Ing_Eng.sh

Example :(If Arg Convert_Fixed_delimited_Flag = "N")

Type_Of_Ingestion="Create & Insert" Type_Of_Create="Managed" Data_Base_Name="default" Target_Table="Temp_Fixed_Data" Type_Of_Table="Fixed width" Convert_Fixed_delimited_Flag="N" Fixed_Delimiter="NA" Table_Layout_Path="/home/cloudera/Desktop/Table_Creation/Table_Layout_Fixed_Data.txt" Load_Data_Path="/home/cloudera/Desktop/Table_Creation/Fixed_Width_sample_value.txt" Table_Delete_Flag="Delete Target Table" Log_File="/home/cloudera/Desktop/Table_Creation/Log_File.txt" ./Data_Ing_Eng.sh

3. Create and Insert - Table.

Command Template

#	Key Name	Description
1	Type_Of_Ingestion	Options are "Create & Insert, Insert, Create".Create & Insert option will create table and load data into the created table. Insert option will load data into the specified table. Create option will only create the table without loading any data.
2	Type_Of_Create	Options "External, Managed". If External is chosen, then the table created will be EXTERNAL HIVE table. If Managed is chosen, then the table created will be HIVE MANAGED table.
3	Data_Base_Name	Enter the Database name in which the table needs to be created.
4	Target_Table	Enter the table name to create and load.
5	Type_Of_Table	Options are: Delimited, Fixed width, XML_File and Table. Choose Delimited, if the Source File is a delimited file. Choose Fixed width, if the Source File is fixed width data file.
6	Create_Layout_Flag	Options are Y or N. Enter Y, if you want to create layout file automatically. Enter N, if you give layout file.
7	Table_Layout_Path	Enter layout file path and name, if you chose 'N' for #Create_Layout_Flag otherwise enter 'NA'.
8	Column_Name_Query	If you choose 'Y' for #Create_Layout_Flag, provide a Metadata SQL that can return the Source Table's, column names, data type, precision, and scale. This will be used to build a comparable table in Hive.
9	Mapping_Data_Path	Enter the Source to Hive Data Type mapping file path and name. This file will be used to convert Source Column Data Types to appropriate Hive Columns. Refer to supplied Oracle2HiveDataTypeMapping.txt.
10	Load_Data_Path	Enter source table connection string, username, password file path, source tablename and hdfs file path where the data from The source table will be stored. This information should be entered and delimited by ','(comma) in the same order.
11	Table_Delimiter	Enter the table delimiter.
12	Transpose_Flag	Flag for Transpose. Enter 'Y' if you want to load data using Transpose Ingest. #This option is not applicable if (Arg) Create_Layout_Flag is 'Y'
13	Audit_Columns	Enter the Audit Column details. It should contain audit column name, audit column datatype and function name,. all details should be '~' delimited. Each audit column details should be delimited by ','.
14	Null_Insert_Flag	Enter 'Y' if you don't want to insert null data into target table. This option is not applicable if (Arg) Create_Layout_Flag is 'Y'.
15	Table_Delete_Flag	Enter 'Delete Target Table', if you need to delete the target table, if exists.
16	Log_File	Enter the Log file path and name to Store the logs generated by this tool.

Example :(Auto creates the Table layout)

Type_Of_Ingestion="Create & Insert" Type_Of_Create="External" Data_Base_Name="stage_db" Target_Table="EMP_TARGET" Type_Of_Table="Table" Create_Layout_Flag="Y" Table_Layout_Path="NA" Column_Name_Query="SELECT COLUMN_NAME,DATA_TYPE,DATA_PRECISION,DATA_SCALE FROM ALL_TAB_COLUMNS WHERE TABLE_NAME='EMP_TEMP' ##CONDITION## ORDER BY COLUMN_ID" Mapping_Data_Path="/home/hadoop/Desktop/Oracle2HiveDataTypeMapping.txt" Load_Data_Path="jdbc:oracle:thin:@192.168.100.8:1521:orcl,scott,file:/home/hadoop/Desktop/pass.txt,EMP_TEMP,/user/hadoop/" Table_Delimiter="~" Transpose_Flag="N" Audit_Columns="AS_OF_DATE~DATE~CURRENT_DATE,CREATION_TS~VARCHAR(50)~CURRENT_TIMESTAMP,CREATED_BY~VARCHAR(50)~DEFAULT-J712798,PROCESS_CONTROL_ID~BIGINT~UNIQUE_VALUE" Null_Insert_Flag="N" Table_Delete_Flag="Delete Target Table" Log_File="/home/hadoop/Desktop/Temp_Fixed_Data_Log.txt" ./Data_Ing_Eng.sh

Example :(User provided Table layout)

Type_Of_Ingestion="Create & Insert" Type_Of_Create="External" Data_Base_Name="default" Target_Table="Temp_Fixed_Data" Type_Of_Table="Table" Create_Layout_Flag="N" Table_Layout_Path="/home/hadoop/Desktop/Fixed_Length_Data.txt" Column_Name_Query="NA" Mapping_Data_Path="NA" Load_Data_Path="jdbc:oracle:thin:@192.168.100.8:1521:orcl,scott,file:/home/hadoop/Desktop/pass.txt,EXPORT_SQOOP,/user/hadoop/" Table_Delimiter="~" Transpose_Flag="Y" Audit_Columns="NA" Null_Insert_Flag="Y" Table_Delete_Flag="Delete Target Table" Log_File="/home/hadoop/Desktop/Temp_Fixed_Data_Log.txt" ./Data_Ing_Eng.sh

4. Create and Insert - SQL

Command Template

#	Key Name	Description
1	Type_Of_Ingestion	Options are "Create & Insert, Insert, Create". Create & Insert option will create table and load data into the created table. Insert option will load data into the specified table. Create option will only create the table without loading any data.
2	Type_Of_Create	Options "External, Managed". If External is chosen, then the table created will be EXTERNAL HIVE table. If Managed is chosen, then the table created will be HIVE MANAGED table.
3	Data_Base_Name	Enter the Database name in which the table needs to be created.
4	Target_Table	Enter the table name to create and load.
5	Type_Of_Table	Options are: Delimited, Fixed width, XML_File and Table. Choose Delimited, if the Source File is a delimited file. #Choose Fixed width of the Source File is fixed width data file. Choose Table, if data needs to be imported from another database to hive.
6	Table_Layout_Path	Enter Table layout file path and name. If the #Type_Of_Table is Delimited or Table, then the Table Layout File should have column names and their data types, tab delimited.
7	Source_SQL_Query	Enter the Source SQL query in which data will be exported to target table.
8	Load_Data_Path	Enter source table connection string, username, password file path and hdfs file path where the data from source table will be stored. This information should be entered and delimited by ','(comma) in the same order.
9	Table_Delimiter	Enter the table delimiter.
10	Transpose_Flag	Flag for Transpose. Enter 'Y' if you want to load data using Transpose Ingest.
11	Null_Insert_Flag	Enter 'Y' if you don't want to insert null data into target table.
12	Table_Delete_Flag	Enter 'Delete Target Table', if you need to delete the target table, if exists.
13	Log_File	Enter the Log file path and name to Store the logs generated by this tool.

Example

Type_Of_Ingestion="Create & Insert" Type_Of_Create="External" Data_Base_Name="Cygnus" Target_Table="EMPLOYEE_MASTER" Type_Of_Table="SQL" Table_Layout_Path="/home/hadoop/Desktop/EMPLOYEE_LAYOUT_FILE.txt" Source_SQL_Query="SELECT EMP_NO, NAME, POSITION, CLUB, NATIONALITY, BIRTHPLACE, HIREDATE, SALARY, PHONE_NO, EMAIL FROM EMPLOYEE WHERE HIREDATE > '25-05-2009' AND ##CONDITION##" Load_Data_Path="jdbc:oracle:thin:@192.168.100.8:1521:orcl,scott,file:/home/hadoop/Desktop/pass.txt,/user/hadoop/" Table_Delimiter="~" Transpose_Flag="Y" Null_Insert_Flag="Y" Table_Delete_Flag="Delete Target Table" Log_File="/home/hadoop/Desktop/EMPLOYEE_MASTER.txt" ./Data_Ing_Eng.sh

5. Create and Insert - XML File.

Command Template

#	Key Name	Description
1	Type_Of_Ingestion	Options are "Create & Insert, Insert, Create". Create & Insert option will create table and load data into the created table. Insert option will load data into the specified table. Create option will only create the table without loading any data.
2	Type_Of_Create	Options "External, Managed". If External is chosen, then the table created will be EXTERNAL HIVE table. If Managed is chosen, then the table created will be HIVE MANAGED table.
3	Data_Base_Name	Enter the Database name in which the table needs to be created.
4	Target_Table	Enter the table name to create and load.
5	Type_Of_Table	Options are: Delimited, Fixed width, XML_File and Table. Choose Delimited, if the Source File is a delimited file. Choose Fixed width of the Source File is fixed width data file. Choose Table, if data needs to be imported from another database to hive.
6	Table_Layout_Path	Enter Table layout file path and name. If the #Type_Of_Table is Delimited or Table, then the Table Layout File should have column names and their data types, tab delimited. If the #Type_Of_Table is the Fixed width, then the Table Layout file should have column names, their data types, column start position and column end position, it must be tab delimited.
7	Table_Delimiter	Enter the Table delimiter.
8	Load_Data_Path	Enter the Source File path with name. If you need to load multiple files, enter all the file names with path delimited by a comma.
9	Transpose_Flag	Flag for Transpose. Enter 'Y' if you want to load data using Transpose Ingest.
10	Null_Insert_Flag	Enter 'Y' if you don't want to insert null data into target table.
11	Table_Delete_Flag	Enter 'Delete Target Table', if you need to delete the target table, in case it exists.
12	Log_File	Enter the Log file path and name to Store the logs generated by this tool.

Example

Type_Of_Ingestion="Create & Insert" Type_Of_Create="External" Data_Base_Name="default" Target_Table="XML_Test" Type_Of_Table="XML_File" Table_Layout_Path="/home/hadoop/Desktop/xml/XML_Layout.txt" Load_Data_Path="/home/hadoop/Desktop/xml/log_file.xml" Table_Delimiter="~" Transpose_Flag="N" Null_Insert_Flag="N" Table_Delete_Flag="Delete Target Table" Log_File="/home/hadoop/Desktop/xml/XML_Test_Log.txt" ./Data_Ing_Eng.sh

6. Insert only - Delimited File

Command Template

#	Key Name	Description
1	Type_Of_Ingestion	Options are "Create & Insert, Insert, Create". Create & Insert option will create table and load data into the created table. Insert option will load data into the specified table. Create option will only create the table without loading any data.
2	Data_Base_Name	Enter the Database name in which the table needs to be created.
3	Target_Table	Enter the table name to create and load..
4	Type_Of_Table	Options are: Delimited, Fixed width, XML_File and Table. Choose Delimited, if the Source File is a delimited file.Choose Fixed width, if the Source File is fixed width data file. Choose Table, if data needs to be imported from another database to hive.
5	Table_Layout_Path	Enter Table layout file path and name. If the #Type_Of_Table is Delimited or Table, then the Table Layout File should have column names and their data types, tab delimited
6	Table_Delimiter	Enter the Table delimiter.
7	File_Delimiter	Enter the column delimiter used in Source File.
8	Load_Data_Path	Enter the Source File path with name. If you need to load multiple files, enter all the file names with path delimited by a comma. NOTE: The column delimiter in this Source file should match with the delimiter configured in the target table.
9	Transpose_Flag	Flag for Transpose. Enter 'Y' if you want to load data using Transpose Ingest.
10	Null_Insert_Flag	Enter 'Y' if you don't want to insert null data into target table.
11	Log_File	Enter the Log file path and name to Store the logs generated by this tool.

Example

Type_Of_Ingestion="Insert" Data_Base_Name="default" Target_Table="Temp_Delimited_Data" Type_Of_Table="Delimited" Table_Layout_Path="/home/hadoop/Desktop/KV_LAYOUT.txt" Table_Delimiter="~" File_Delimiter="|" Load_Data_Path="/home/cloudera/Desktop/Table_Creation/Comma_File.txt" Transpose_Flag="Y" Null_Insert_Flag="Y" Log_File"/home/cloudera/Desktop/Table_Creation/Log_File.txt"

Example:(Transpose)

Type_Of_Ingestion="Insert" Data_Base_Name="default" Target_Table="Individual_Prod_KV" Type_Of_Table="Delimited" Table_Layout_Path="/home/hadoop/Desktop/KV_LAYOUT.txt" Table_Delimiter="~" File_Delimiter="|" Load_Data_Path="/home/hadoop/Desktop/individual_data.txt" Transpose_Flag="Y" Null_Insert_Flag="Y" Log_File="/home/hadoop/Desktop/Individual_Prod_KV_Log.txt" ./Data_Ing_Eng.sh

7. Insert only - Fixed Width File

Command Template

#	Key Name	Description
1	Type_Of_Ingestion	Options are "Create & Insert, Insert, Create". Create & Insert option will create table and load data into the created table. Insert option will load data into the specified table. Create option will only create the table without loading any data.
2	Data_Base_Name	Enter the Database name in which the table needs to be created.
3	Target_Table	Enter the table name to create and load..
4	Type_Of_Table	Options are: Delimited, Fixed width, XML_File and Table. Choose Delimited, if the Source File is a delimited file.Choose Fixed width, if the Source File is fixed width data file. Choose Table, if data needs to be imported from another database to hive.
5	Convert_Fixed_delimited_Flag	Options are Y or N. Enter Y, if you want to convert the fixed width data file to delimited data file. Enter N, if you want to load data in fixed width format.
6	Fixed_Delimiter	Enter the delimiter, if you chose 'Y' for # Arg Convert_Fixed_delimited_Flag. This delimiter will be used to create the delimited data file.
7	Table_Layout_Path	Enter Table layout file path and name. If the # Arg Convert_Fixed_delimited_Flag is Delimited or Table, then the Table Layout File should have column names and their data types, tab delimited. If the # Arg Convert_Fixed_delimited_Flag is the Fixed width, then the Table Layout file should have column names, their data types, column start position and column end position, it must be tab delimited.
8	Load_Data_Path	Enter the Source File path with name. If you need to load multiple files, enter all the file names with path delimited by a comma. NOTE: The Fixed width layout in this Source file should match with the Fixed width or delimiter configured in the target table.
9	Log_File	Enter the Log file path and name to Store the logs generated by this tool.

Example:(If Arg Convert_Fixed_delimited_Flag = "Y")

Type_Of_Ingestion="Insert" Data_Base_Name="default" Target_Table="Temp_Fixed_Data" Type_Of_Table="Fixed width" Convert_Fixed_delimited_Flag="Y" Fixed_Delimiter="|" Table_Layout_Path="/home/cloudera/Desktop/Table_Creation/Table_Layout_Fixed_Data.txt" Load_Data_Path="/home/cloudera/Desktop/Table_Creation/Fixed_Width_sample_value.txt,/home/cloudera/Desktop/Table_Creation/Fixed_Width_sample_value1.txt" Log_File="/home/cloudera/Desktop/Table_Creation/Log_File1.txt" ./Data_Ing_Eng.sh

Example:(If Arg Convert_Fixed_delimited_Flag = "N")

Type_Of_Ingestion="Insert" Data_Base_Name="default" Target_Table="Temp_Fixed_Data" Type_Of_Table="Fixed width" Convert_Fixed_delimited_Flag="N" Fixed_Delimiter="NA" Table_Layout_Path="/home/cloudera/Desktop/Table_Creation/Table_Layout_Fixed_Data.txt" Load_Data_Path="/home/cloudera/Desktop/Table_Creation/Fixed_Width_sample_value.txt" Log_File="/home/cloudera/Desktop/Table_Creation/Log_File1.txt" ./Data_Ing_Eng.sh

8. Insert only - Table

Command Template

#	Key Name	Description
1	Type_Of_Ingestion	Options are "Create & Insert, Insert, Create". Create & Insert option will create table and load data into the created table. Insert option will load data into the specified table. Create option will only create the table without loading any data.
2	Data_Base_Name	Enter the Database name in which the source table is available.
3	Target_Table	Enter the table name to create and load..
4	Type_Of_Table	Options are: Delimited, Fixed width, XML_File and Table. Choose Delimited, if the Source File is a delimited file.Choose Fixed width, if the Source File is fixed width data file. Choose Table, if data needs to be imported from another database to hive.
5	Load_Data_Path	Enter source table connection string, username, password file path, source tablename and hdfs file path where the data from the source table will be stored. This information should be entered and delimited by ','(comma) in the same order.
6	Table_Delimiter	Enter the Column Delimiter to create the Target table.
7	Create_Layout_Flag	Options are Y or N. Enter Y if you want to create Table layout file automatically. Enter N, if you will provide the Table layout file.
8	Table_Layout_Path	Enter layout file path and name, if you chose 'N' for #Table_Layout_Path, otherwise enter 'NA'.
9	Column_Name_Query	If you chose 'Y' for #Create_Layout_Flag, provide a Metadata SQL that can return the Source Table's column names, datatype, precision and scale. This will be used to build a comparable table in Hive.
10	Mapping_Data_Path	Enter the Source to Hive Data Type mapping file path and name. This file will be used to convert Source Column Data Types to appropriate Hive Columns.
11	Transpose_Flag	Flag for Transpose. Enter 'Y' if you want to load data using Transpose Ingest. This option is not applicable if #Create_Layout_Flag is 'Y'.
12	Null_Insert_Flag	Enter 'Y' if you don't want to insert null data into target table.
13	Audit_Columns	Enter the additional columns that you like to ingest. It should contain the column name, data type, and function name, Functions supported are UNIQUE_VALUE, DEFAULT-, CURRENT_DATE, CURRENT_TIMESTAMP delimited by ~ (Tilde). If there are multiple columns delimit them by, (Comma).
14	Log_File	Enter the Log file path and name to Store the logs generated by this tool.

Example:(If Arg Convert_Fixed_delimited_Flag = "Y")

Type_Of_Ingestion="INSERT" Data_Base_Name="STAGE_DB" Target_Table="EMP_TARGET" Type_Of_Table="Table" Load_Data_Path="jdbc:oracle:thin:@192.168.100.8:1521:orcl,scott,file:/home/hadoop/Desktop/pass.txt,EXPORT_SQOOP,/user/hadoop/" Table_Delimiter="~" Create_Layout_Flag="Y" Table_Layout_Path="NA" Column_Name_Query="SELECT COLUMN_NAME,DATA_TYPE,DATA_PRECISION,DATA_SCALE FROM ALL_TAB_COLUMNS WHERE TABLE_NAME='EMP_TARGET' ##CONDITION## ORDER BY COLUMN_ID" Mapping_Data_Path="/home/hadoop/Desktop/Oracle2HiveDataTypeMapping.txt" Transpose_Flag="N" Null_Insert_Flag="Y" Audit_Columns="AS_OF_DATE~DATE~CURRENT_DATE,CREATION_TS~VARCHAR(50)~CURRENT_TIMESTAMP,CREATED_BY~VARCHAR(50)~DEFAULT-J712798,PROCESS_CONTROL_ID~BIGINT~UNIQUE_VALUE" Log_File="/home/hadoop/Desktop/Log_File.txt" ./Data_Ing_Eng.sh

9. Insert only - Query.

Command Template

#	Key Name	Description
1	Type_Of_Ingestion	Options are "Create & Insert, Insert, Create". Create & Insert option will create table and load data into the created table. Insert option will load data into the specified table. Create option will only create the table without loading any data.
2	Data_Base_Name	Enter the Database name in which the source table is available.
3	Target_Table	Enter the target table name to load.
4	Type_Of_Table	Options are: Delimited, Fixed width, XML_File, Table and SQL. Choose Delimited, if the Source File is a delimited file. Choose Fixed width of the Source File is fixed width data file.Choose Table, if whole table data needs to be imported from another database to hive. Choose SQL if you are passing SQL as a data source. In this case, you need to provide the Table Layout file as well.
5	Load_Data_Path	Enter source table connection string, username, password file path and hdfs file path where the data from The source table will be stored.This information should be entered and delimited by ','(comma) in the same order
6	Table_Delimiter	Enter the Column Delimiter to create the Target table.
7	Table_Layout_Path	Enter Table layout file path and name. If the Type_Of_Table is Delimited or Table, then the Table Layout File should have column names and their data types, tab delimited.
8	Source_SQL_Query	Enter the Source SQL query in which data will be exported to target table.
9	Transpose_Flag	Flag for Transpose. Enter 'Y' if you want to load data using Transpose Ingest.
10	Null_Insert_Flag	Enter 'Y' if you don't want to insert null data into target table.
11	Log_File	Enter the Log file path and name to Store the logs generated by this tool.

Example

Type_Of_Ingestion="INSERT" Data_Base_Name="stage_db" Target_Table="EMP_SQL_TEST" Type_Of_Table="SQL" Load_Data_Path="jdbc:oracle:thin:@192.168.100.8:1521:orcl,scott,file:/home/hadoop/Desktop/pass.txt,/user/hadoop/" Table_Delimiter="~" Table_Layout_Path="/home/hadoop/Desktop/EMP_LAYOUT_FILE.txt" Source_SQL_Query="select * from emp_temp where hiredate > '30-05-2017' AND ##CONDITION##" Transpose_Flag="Y" Null_Insert_Flag="Y" Log_File="/home/hadoop/Desktop/Log_File.txt"

Example:(Transpose)

Type_Of_Ingestion="INSERT" Data_Base_Name="default" Target_Table="STG_INDIVIDUAL_QUERY" Type_Of_Table="SQL" Load_Data_Path="jdbc:oracle:thin:@192.168.100.8:1521:orcl,scott,file:/home/hadoop/Desktop/pass.txt,/user/hadoop/" Table_Delimiter="~" Table_Layout_Path="/home/hadoop/Desktop/KV_LAYOUT.txt" Source_SQL_Query="select * from STG_GARWIN_INDIVIDUAL where CONTRACT_RELATIONSHIP = 'OWN' AND ##CONDITION##" Transpose_Flag="Y" Null_Insert_Flag="Y" Log_File="/home/hadoop/Desktop/Log_File.txt"

10. Insert only - XML File.

Command Template

#	Key Name	Description
1	Type_Of_Ingestion	Options are "Create & Insert, Insert, Create". Create & Insert option will create table and load data into the created table. Insert option will load data into the specified table. Create option will only create the table without loading any data.
2	Data_Base_Name	Enter the Database name in which the source table is available.
3	Target_Table	Enter the target table name to load.
4	Type_Of_Table	Options are: Delimited, Fixed width, XML_File, Table and SQL. Choose Delimited, if the Source File is a delimited file. Choose Fixed width of the Source File is fixed width data file.Choose Table, if whole table data needs to be imported from another database to hive. Choose SQL if you are passing SQL as a data source. In this case, you need to provide the Table Layout file as well.
5	Table_Layout_Path	Enter Table layout file path and name. If the Type_Of_Table is Delimited or Table, then the Table Layout File should have column names and their data types, tab delimited.
6	Table_Delimiter	Enter the Column Delimiter to create the Target table.
7	Load_Data_Path	Enter the Source File path with name.If you need to load multiple files, enter all the file names with path delimited by a comma. NOTE: The column delimiter in this Source file should match with the delimiter configured in the target table.
8	Transpose_Flag	Flag for Transpose. Enter 'Y' if you want to load data using Transpose Ingest.
9	Null_Insert_Flag	Enter 'Y' if you don't want to insert null data into target table.
10	Log_File	Enter the Log file path and name to Store the logs generated by this tool.

Example

Type_Of_Ingestion="Insert" Data_Base_Name="default" Target_Table="XML_Test" Type_Of_Table="XML_File" Table_Layout_Path="/home/hadoop/Desktop/xml/XML_Layout.txt" Table_Delimiter="~" Load_Data_Path="/home/hadoop/Desktop/xml/log_file.xml" Transpose_Flag="N" Null_Insert_Flag="N" Log_File="/home/hadoop/Desktop/xml/XML_File_Log.txt" ./Data_Ing_Eng.sh

Disclaimer

Please ensure you read and understand the following general disclaimer:

IMPORTANT:

RESTRICTIONS:

TERMS:

Data Ingestion Framework for Hadoop (Free)

Thursday 23 March 2017

Big Data Integration Tools Framework

A versatile, no-code Data Integration tool integrates your Enterprise and offline data in minutes!

Big Data Integration

Data is the new competitive battleground making it more important than ever to get an edge up on your competition with a fast, multi-point and modern approach to big data integration.

A versatile Data Integration tool to blend data from a variety of data sources (such as HBase, HDFS, HIVE) in minutes. Go beyond just Data Integration - clean, transform and prepare the data for analysis. Absolutely no coding required, ideal for business users.

Data Integration made easy

accel<>DS (Data Store) makes Data Integration so easy and simple to use, in matter of minutes you are able to pull data from your enterprise systems, HBase, HDFS, HIVE, Delimited files, Fixedwidth files, standalone Databases. The Data Integration module provides you with an easy to use, familiar spreadsheet interface to load data from a variety of data sources.

Versatile, Variety of Data Sources

The Data Integration module is versatile, allows you to integrate data from most of the popular databases such as HBase, HDFS, HIVE, Delimited files, Fixedwidth files.

No coding required

Data Integration tasks are predominantly drag and drop and it is ideal for users with no or very little programming background such as Sales Managers, Finance Managers, Business Analysts and power users in their team. Even better, as you build your integration tasks you are able to see as the data is built letting you visualize it.

Features

Build Data Stores / Data Warehouses integrating data from a variety of Data Sources.
Create Data Scripts to accomplish data integration tasks.
Use Workflow to assemble multiple scripts and execute them as a group.
Schedule and monitor Workflows to execute data integration tasks.
Familiar spreadsheet interface to load and see data in real time as data is integrated.

Benefits

Easy to use, business friendly tool.
Build Data Stores from a variety of data sources in matter of minutes
Maintain Data Stores with ability to refresh data periodically.
Visual, No coding required architecture make data integration easy.
No coding required.

Hadoop Ecosystem for accel<>DS

Performance, Simplicity, Flexibility

The Data Integration tool is built from grounds up with performance and simplicity in mind. It utilizes the strength of the native databases transformation and processing power. In addition, accel<>DS is bundled with a native data transformation engine as well to provide high performance post integration transformations after data is brought in from different data sources.

Disclaimer

Please ensure you read and understand the following general risk disclaimer:

IMPORTANT:

RESTRICTIONS:

TERMS:

Data Integration Framework for Hadoop (Free)