Monday, 6 May 2013

How to Execute multiple SQL queries in Talend

Lot of time we need to execute multiple SQL queries in a ETL Job. For instance, we need to setup database table on machine where we do not have direct access to the database. Also, we may need to truncate or drop temporary tables that are created as part of the ETL job for processing and temporary purpose. We can easily achieve this using Talend’s tMy<xxx>Row component. Here xxx refers to the database type. e.g. for MySQL the name of component is tMySQLRow and for oracle its tOracleRow.


We can execute multiple SQL database queries in Talend using one of the two techniques mentioned below:


1. Embedded SQL statements in Talend Job. (tMySQLRow)

2. SQL statements in file which will be read in Talend job.

1. Embedded SQL statements in Talend Job:  In this method we hard code the SQL statements in the Talend tMySQLRow component.









We have to set the additional JDBC parameters to allow multiple queries to be executed. To do this enter "allowMultiQueries=true" in the Additional JDBC parameters text box on the tMySQLConnection component.



Enter the multiple SQL separated by semicolon “;” in the Query text box.



This method is generally used when we have specific set of SQL queries that need to be executed in the Talend Job. We can not change the queries with every run.

2. SQL statements in file which will be read in Talend job. In this method we keep the SQL queries to be executed in the external file. In this we can control what all queries should run when the Job executes.



In this Talend Job, component tFileInputFullRow will read each SQL query from the file and then tMySqlRow will execute the query one by one.

To execute the queries one by one dynamically, we need to enter row1.query (query is the name of the filed to be fetched from tFileInputFullRow component) in the Query box instead of the hard coded queries. What we are doing here is that we are reading SQL queries one by one and passing it to tMySQLRow component for execution.



We can use this Job as a sub job and pass the path of the input file having multiple SQL queries as context parameter to this child job. In this way we can use this as a generic child job multiple times in a project. This will help us to execute multiple SQL queries, when we want to control the sql queries to be executed.

Lets Demonstrate both techniques using an example:

For demonstration I am going to take following emp_details table.Schema of the table is shown below:

desc talend_demo.emp_details;









Initially we do not have any data in the table.

Select * from talend_demo.emp_details;
/* 0 rows affected, 0 rows found. Duration for 1 query: 0.000 sec. */

We will run following SQL statements:

INSERT INTO talend_demo.emp_details VALUES (101,"John Miller",10,20000);
INSERT INTO talend_demo.emp_details VALUES (102,"David Kate",11,30000);
INSERT INTO talend_demo.emp_details VALUES (103,"Alex M",10,40000);
INSERT INTO talend_demo.emp_details VALUES (104,"Brad L",10,10000);
INSERT INTO talend_demo.emp_details VALUES (105,"Angel K",12,20000);
INSERT INTO talend_demo.emp_details VALUES (106,"Steve L",11,25000);
ALTER TABLE talend_demo.emp_details MODIFY emp_dept int;


Technique 1.  Embedded SQL statements in Talend Job. (tMySQLRow)

Execute the Job

Select * from talend_demo.emp_details;
/* 0 rows affected, 7 rows found. Duration for 1 query: 0.000 sec. */













You can see that 6 records has been added to the table as per following  insert statements.

INSERT INTO talend_demo.emp_details VALUES (101,"John Miller",10,20000);
INSERT INTO talend_demo.emp_details VALUES (102,"David Kate",11,30000);
INSERT INTO talend_demo.emp_details VALUES (103,"Alex M",10,40000);
INSERT INTO talend_demo.emp_details VALUES (104,"Brad L",10,10000);
INSERT INTO talend_demo.emp_details VALUES (105,"Angel K",12,20000);
INSERT INTO talend_demo.emp_details VALUES (106,"Steve L",11,25000);

desc talend_demo.emp_details;
/* 0 rows affected, 4 rows found. Duration for 1 query: 0.000 sec. */

Similarly, Table structure has been altered as per following query:

ALTER TABLE talend_demo.emp_details MODIFY emp_dept int;

Technique 2. SQL statements in file which will be read in Talend job.

Execute the Job



Select * from talend_demo.emp_details;
/* 0 rows affected, 7 rows found. Duration for 1 query: 0.000 sec. */


You can see that 6 records has been added to the table as per following  insert statements.

INSERT INTO talend_demo.emp_details VALUES (101,"John Miller",10,20000);
INSERT INTO talend_demo.emp_details VALUES (102,"David Kate",11,30000);
INSERT INTO talend_demo.emp_details VALUES (103,"Alex M",10,40000);
INSERT INTO talend_demo.emp_details VALUES (104,"Brad L",10,10000);
INSERT INTO talend_demo.emp_details VALUES (105,"Angel K",12,20000);
INSERT INTO talend_demo.emp_details VALUES (106,"Steve L",11,25000);

desc talend_demo.emp_details;
/* 0 rows affected, 4 rows found. Duration for 1 query: 0.000 sec. */


Similarly, Table structure has been altered as per following query:

ALTER TABLE talend_demo.emp_details MODIFY emp_dept int;

Let me know, if you have followed any other approach to execute multiple SQL queries in Talend. Looking forward to your comments.

22 comments:

  1. If I try to run with toraclerow component and using technique 2, Im getting error like invalid SQL statement

    ReplyDelete
    Replies
    1. My Querry was : "Select 'YourMoM' from dual;"

      Delete
    2. I guess you should run some query which doesn't expect any results. Something like DML statements such as ALTER, INSERT etc

      Delete
    3. IEEE Final Year projects Project Centers in Chennai are consistently sought after. Final Year Students Projects take a shot at them to improve their aptitudes. IEEE Final Year project centers ground for all fragments of CSE & IT engineers hoping to assemble.Final Year Projects for CSE

      Spring Framework has already made serious inroads as an integrated technology stack for building user-facing applications. Spring Framework Corporate TRaining .

      Specifically, Spring Framework provides various tasks are geared around preparing data for further analysis and visualization. Spring Training in Chennai

      The Angular Training covers a wide range of topics including Angular Directives, Angular Services, and Angular programmability.Angular Training

      Delete
  2. How to execute multiple queries in tredshiftrow ??

    ReplyDelete
  3. Needed to compose you a very little word to thank you yet again regarding the nice suggestions you’ve contributed here.
    rprogramming training in bangalore

    ReplyDelete
  4. This comment has been removed by the author.

    ReplyDelete
  5. I just recently discovered your blog and have now scrolled through the entire thing several times. I am very impressed and inspired by your skill and creativity, and your "style" is very much in line with mine. I hope you keep blogging and sharing your design idea
    data science training in chennai

    data science training in velachery

    android training in chennai

    android training in velachery

    devops training in chennai

    devops training in velachery

    artificial intelligence training in chennai

    artificial intelligence training in velachery

    ReplyDelete
  6. Fantastic blog! Thanks for sharing a very interesting post, I appreciate to blogger for an amazing post.
    Data Science
    Selenium
    ETL Testing
    AWS
    Python Online Classes

    ReplyDelete
  7. Great tips and very easy to understand. This will definitely be very useful for me when I get a chance to start my blog. ExcelR Data Science Course In Pune

    ReplyDelete
  8. That's really impressive and helpful information you have given, very valuable content.
    We are also into education and you also can take advantage really awesome job oriented courses

    ReplyDelete
  9. Thanks a lot for all your valuable articles! We are really happy about your thoughts...
    Digital marketing Training in Bangalore

    ReplyDelete