Monday 27 October 2014

Talend Data Integration Development Best Practices


1. Talend workspace path should not contain any spaces.

     Workspace paths to avoid:
        c:\Open Project\Talend Open Studio\workspace
        d:\My Projects\Talend\workspace

     Recommended Workspace Path:
        c:\Talend\workspace
        d:\OpenProject\Talend\workspace
        c:\MyProject\repository

Keep the Talend workspace path free of spaces. Spaces in the path tend to cause issues, so they are best avoided altogether.

2. Never forget to perform Null Handling.

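     Example #1 – Bad
        // Throws NullPointerException when myString is null
        if(myString.length() > 0)
          System.out.println(myString.toUpperCase());

     Example #2 – Good
        // The null check runs first, so length() is never called on null
        if(!Relational.ISNULL(myString) && myString.length() > 0)
          System.out.println(myString.toUpperCase());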

Always perform null handling on any field that is used in an expression. Otherwise the Talend Job will throw a NullPointerException whenever that field is null at run time.
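
The same guard can be wrapped in a small helper so that it is not repeated in every expression. The sketch below is plain Java with no Talend dependency; the class and method names (NullSafe, upperOrDefault) are assumptions made for illustration and are not part of Talend.

```java
// Hypothetical helper, not a Talend class: it applies the null guard from Example #2.
class NullSafe {

    // Returns the upper-cased value, or the fallback when value is null or empty.
    static String upperOrDefault(String value, String fallback) {
        if (value == null || value.isEmpty()) {
            return fallback;
        }
        return value.toUpperCase();
    }

    public static void main(String[] args) {
        String myString = null;
        // No NullPointerException: the null check runs before any method is called on myString.
        System.out.println(upperOrDefault(myString, "N/A"));
        System.out.println(upperOrDefault("talend", "N/A"));
    }
}
```

Inside a tMap expression the equivalent inline form would be a ternary such as row1.name == null ? null : row1.name.toUpperCase(), where row1.name is a hypothetical input column.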

3. Create Repository Metadata for DB connections and retrieve the schema of DB tables.

This lets you retrieve the schemas of database tables quickly and speeds up development. Creating the schema for each database table by hand, one at a time, takes a long time. Click here for more details on creating DB connections and retrieving schemas.

4. Use Repository Schema for Files/DB and DB connections.

This lets you change a schema in one place instead of in every job. You also don't need to open each job to find out whether it uses the changed schema: a change made once in the Repository can be applied to every job that uses that schema.


5. Create the database connection with a t<Vendor>Connection component and reuse that connection throughout the Job. Do not open a new connection in every component.

Most databases limit the maximum number of connections. If your Talend Job uses multiple database components and each one opens its own connection, the Job may fail once that limit is reached. Click here to learn how to share a database connection.

6. Always close the database connection using the t<Vendor>Close component.
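
Points 5 and 6 together amount to "open once, reuse everywhere, always close". The sketch below shows that pattern in plain Java. SharedConnection is a hypothetical stand-in invented for this example; it is not a Talend component or a JDBC class, and a real job would use t<Vendor>Connection and t<Vendor>Close instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a database connection; not a Talend or JDBC class.
class SharedConnection implements AutoCloseable {
    static int openCount = 0;                 // total connections opened so far
    private boolean closed = false;
    final List<String> executed = new ArrayList<>();

    SharedConnection() {
        openCount++;                          // plays the role of t<Vendor>Connection
    }

    void execute(String statement) {
        if (closed) {
            throw new IllegalStateException("connection already closed");
        }
        executed.add(statement);              // a real job would run SQL here
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;                        // plays the role of t<Vendor>Close
    }
}

class SharedConnectionDemo {
    // Runs every step on one connection and closes it even if a step fails.
    static SharedConnection runJob(String... steps) {
        SharedConnection conn = new SharedConnection();
        try (SharedConnection c = conn) {
            for (String step : steps) {
                c.execute(step);
            }
        }
        return conn;
    }

    public static void main(String[] args) {
        SharedConnection conn = runJob("read source", "write target");
        System.out.println("connections opened: " + SharedConnection.openCount);
        System.out.println("steps run: " + conn.executed.size());
        System.out.println("closed: " + conn.isClosed());
    }
}
```

However many steps run, only one connection is opened, and the try-with-resources block closes it on both the success and the failure path.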

7. Create a Repository Document corresponding to every Talend job including revision history.

This lets you track the changes made to any Talend Job.

Sample documentation below:

Talend Job Description:
In this paragraph, write a high-level description of the Talend Job and its functionality.

Revision History
1.0    04-10-2014        Initial Development
1.1    07-10-2014        Modification to Source and Target repository Schema
1.2    09-10-2014        Modification to transformation logic.

8. Give every subjob a title that describes its purpose/objective.

9. Avoid hard-coding values in Talend Job components. Use Talend context variables instead.
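
As a rough illustration of the difference, the sketch below builds a file path first from a literal and then from context values. JobContext is a hypothetical stand-in for the context object that Talend exposes to job expressions, and the field names and paths are invented for the example.

```java
// Hypothetical stand-in for the context object available in Talend job expressions.
class JobContext {
    final String inputDir;
    final String fileName;

    JobContext(String inputDir, String fileName) {
        this.inputDir = inputDir;
        this.fileName = fileName;
    }
}

class ContextVsHardCoding {
    // Hard-coded: moving to another environment means editing the job itself.
    static String hardCodedPath() {
        return "/data/dev/input/customers.csv";
    }

    // Context-driven: the same expression works unchanged in every environment.
    static String contextPath(JobContext context) {
        return context.inputDir + "/" + context.fileName;
    }

    public static void main(String[] args) {
        JobContext dev = new JobContext("/data/dev/input", "customers.csv");
        JobContext prod = new JobContext("/data/prod/input", "customers.csv");
        System.out.println(contextPath(dev));
        System.out.println(contextPath(prod));
    }
}
```

In a real job the expression in the component would read something like context.inputDir + "/" + context.fileName, with both variables defined in the job's context.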

10. Create Context Groups in the Repository

A context group lets you reuse the same context variables in any number of jobs without recreating them and reassigning their values each time. Imagine your project requires 20 context variables and 10 jobs use them: without context groups, you would have to create all 20 variables again in each of those 10 jobs.

You can create separate context groups for different kinds of variables. For example, you can have one context group each for database parameters, SMTP parameters, and SFTP parameters.


Click on the links below to learn more about context variables and context groups:
1. Understand Context Variables Part 1 (Context Variables, Context groups)
2. Understand Context Variables Part 2 (Define context variables in Repository, which can be made available to multiple jobs)
3. Understand Context Variables Part 3 (Populate the values of context variables from file using tContextLoad)
4. How to Pass Context Variables to Child Jobs.
5. How to Pass Context Variables/Parameters through command line.

11. Use a Talend.properties file to supply values to context variables through tContextLoad.

Always supply the values of context variables either from a database table or from a Talend.properties file. Below is a sample of a Talend.properties file.
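
The original sample is not reproduced in this text. As a rough illustration only, the sketch below holds a hypothetical file of key=value lines and parses it with java.util.Properties, which here is a plain-Java stand-in for what tContextLoad does inside a job. Every key and value is invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class ContextFileSketch {
    // Hypothetical file content; the keys and values are invented for illustration.
    static final String SAMPLE =
        "db_host=localhost\n" +
        "db_port=3306\n" +
        "smtp_host=mail.example.com\n";

    // Parses key=value text into a Properties object.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked if it does.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties context = load(SAMPLE);
        System.out.println(context.getProperty("db_host"));
        System.out.println(context.getProperty("db_port"));
    }
}
```

Keeping the values in a file like this means a job can move between environments by swapping the file, with no change to the job itself.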

Click here to understand how to populate the values of context variables from a file using the tContextLoad component.

12. Create Variables in tMap and use the variables to assign the values to target fields.

When a single expression is used several times, or the same mapping feeds multiple target fields, create a variable in tMap and assign that variable to the target fields. The expression is then evaluated only once, however many times its result is used.
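
The effect can be sketched in plain Java: the expression result is stored once and then reused for every target field. The class, method, and field names below are assumptions for illustration; in a real tMap the stored value would be a column in the Var table, referenced as Var.fullName.

```java
// Plain-Java sketch of a tMap variable; all names are invented for illustration.
class TMapVariableSketch {
    static int evaluations = 0;   // counts how often the shared expression runs

    // Stands in for an expression that several target fields depend on.
    static String cleanName(String first, String last) {
        evaluations++;
        return (first.trim() + " " + last.trim()).toUpperCase();
    }

    // Evaluates the expression once (like a tMap Var) and reuses the result.
    static String[] mapRow(String first, String last) {
        String fullName = cleanName(first, last);       // the "Var" column
        String displayName = fullName;                  // target field 1
        String searchKey = fullName.replace(' ', '_');  // target field 2
        return new String[] { displayName, searchKey };
    }

    public static void main(String[] args) {
        String[] out = mapRow(" Ada ", "Lovelace ");
        System.out.println(out[0]);
        System.out.println(out[1]);
        System.out.println("expression evaluated " + evaluations + " time(s)");
    }
}
```

Without the intermediate variable, each target field would call cleanName itself and repeat the same work for every row.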



13. Create user routines/functions for common transformations and validations.

Always create routines/functions for the common transformations and validation rules that are used in multiple Talend jobs.
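
A minimal sketch of what such a routine might look like follows. A Talend user routine is a Java class with public static methods; the class name StringRoutines and its methods are hypothetical, and the package declaration that Talend Studio adds to routines is left out so that the example compiles on its own.

```java
// Hypothetical routine class; in Talend Studio it would be a public class in the routines package.
class StringRoutines {

    // Returns true when value is null or contains only whitespace.
    public static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }

    // Trims value and returns null when nothing is left.
    public static String trimToNull(String value) {
        if (isBlank(value)) {
            return null;
        }
        return value.trim();
    }

    // Returns true when value consists of exactly 'length' ASCII digits.
    public static boolean isDigits(String value, int length) {
        if (value == null || value.length() != length) {
            return false;
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isBlank("   "));
        System.out.println(trimToNull("  Pune "));
        System.out.println(isDigits("411001", 6));
    }
}
```

A job expression could then call, for example, StringRoutines.trimToNull(row1.city), where row1.city is a hypothetical input column, so the rule is written once and shared by every job.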

Click here to learn how to create user routines and functions.

14. Develop Talend job iteratively.

Divide the Talend Job into multiple subjobs for easier maintenance. Build one subjob, test it, and only then move on to the next subjob.

15. Always exit Talend Open Studio before shutting down the PC.

The Talend workspace can sometimes become corrupted if you shut down your machine before exiting Talend Open Studio, so always exit Talend before shutting down the PC.

16. Always rename Main Flows in Talend Job to meaningful names.

Thanks to Balázs Gunics for this point. It is always good to give the main flows in a Talend Job more meaningful names, so that when you refer to their fields in a tMap component or with tFlowIterate, it is easy to see which data comes from which flow.

17. Always design Talend jobs with performance in mind.

Thanks to Viral Patel for this point. Keep the job's performance in mind while designing it. Visit this link to learn "How to optimize the job in order to improve the performance".

Please let me know your thoughts on these points, and tell me if you feel I have missed something.

This article is written by Vikram Takkar and published on www.vikramtakkar.com. Please let me know if you see this article on any other website/blog.
