Thursday 26 September 2013

Difference between tMap and tFilterRow - Talend

Earlier, I wrote an article on the differences between tMap and tJoin. Some readers have since asked about the difference between the tMap and tFilterRow components in Talend.

The basic difference lies in the purpose and functionality of the two components. Let's discuss the functionality of each in detail:

tMap:  
  1. tMap is a much more powerful component than tFilterRow. Apart from filtering source data, it provides many other features such as lookups and joins.
  2. tMap can have multiple input links. One of them is the Main link and all the others are Lookup links.
  3. tMap can have multiple output links (output data flows).
  4. In tMap, we can apply transformations on both the source and the output links.
  5. In tMap, we can transform source columns/fields and then apply a filter on the transformed column. For example, we can look up data from another file based on a source field and then apply a filter condition to the lookup column, which was not present in the original source data.
  6. In tMap, we can filter data from all the input links.
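
Point 5 above can be sketched in plain Java. This is only an analogy of what a tMap with one Main link and one Lookup link does, not the code Talend generates; the Employee class, the department map, and the method name are hypothetical and chosen just for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TMapSketch {
    // Hypothetical main-flow row: one employee record.
    static final class Employee {
        final String name;
        final int deptId;

        Employee(String name, int deptId) {
            this.name = name;
            this.deptId = deptId;
        }
    }

    // Joins each employee to the lookup on deptId (inner join), then keeps
    // only the rows whose looked-up department name equals wantedDept.
    static List<String> namesInDepartment(List<Employee> main,
                                          Map<Integer, String> lookup,
                                          String wantedDept) {
        List<String> out = new ArrayList<>();
        for (Employee e : main) {
            // deptName is the lookup column; it does not exist in the main flow.
            String deptName = lookup.get(e.deptId);
            if (deptName != null && deptName.equals(wantedDept)) {
                out.add(e.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> departments = new HashMap<>();
        departments.put(10, "Sales");
        departments.put(20, "Finance");

        List<Employee> employees = new ArrayList<>();
        employees.add(new Employee("Asha", 10));
        employees.add(new Employee("Ben", 20));
        employees.add(new Employee("Chen", 30)); // no matching department

        System.out.println(namesInDepartment(employees, departments, "Sales"));
    }
}
```

The filter here is written against the department name, a column that only becomes available after the lookup. That is the part tFilterRow cannot do.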

tFilterRow:
  1. The purpose of tFilterRow is to filter the source data based on conditions on the source columns.
  2. tFilterRow can have only one source/input link.
  3. tFilterRow can have only two output links: Filter (records that satisfy the filter condition) and Reject (records that fail the condition given in the component).
  4. In tFilterRow, we can apply filter conditions only on source data columns.
  5. In tFilterRow, we cannot look up data and hence cannot apply a filter on lookup fields.
  6. tFilterRow cannot have more than one input link, so a filter condition applies to that one link only.
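
The one-input, two-output behaviour described above can be pictured with a small Java sketch. It is an analogy only, not Talend's generated code; the Split class and the split method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class FilterRowSketch {
    // Holds the two outputs that mirror the Filter and Reject links.
    static final class Split<T> {
        final List<T> filter = new ArrayList<>();
        final List<T> reject = new ArrayList<>();
    }

    // Sends every input row to exactly one of the two outputs.
    static <T> Split<T> split(List<T> input, Predicate<T> condition) {
        Split<T> result = new Split<>();
        for (T row : input) {
            if (condition.test(row)) {
                result.filter.add(row);
            } else {
                result.reject.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> deptIds = new ArrayList<>();
        deptIds.add(10);
        deptIds.add(20);
        deptIds.add(10);

        Split<Integer> out = split(deptIds, id -> id == 10);
        System.out.println("Filter: " + out.filter + ", Reject: " + out.reject);
    }
}
```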

Let's look at the various ways of filtering data using tMap.




Let's look at the various ways of filtering data using tFilterRow.

In Basic settings, you can add filter conditions on the source data in the Conditions section. Click the + button to add one or more columns to filter on.




For example, to filter data on the dept_id column, add a row to the Conditions table and select dept_id from the InputColumn dropdown.



We can add more than one column to the filter condition, for example to select all employees who are older than 40 and belong to department 10. Choose the “Logical operator used to combine conditions” carefully: select And if both conditions must hold, otherwise select Or.




For complex expressions, tick the “Use advanced mode” checkbox in Basic settings and enter the expression in the Advanced box. To refer to a column from the source data flow, use input_row.<column name>; for example, to refer to dept_id use input_row.dept_id.
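
Since the advanced expression is ordinary Java, the employee example from earlier (age greater than 40, department 10) can be tried out in a small standalone class. This is a sketch: the Row class below is a hypothetical stand-in for the row structure Talend provides as input_row, and it assumes both columns are integers. Only the boolean expression inside each method is what would go into the Advanced box.

```java
public class AdvancedModeSketch {
    // Hypothetical stand-in for the incoming row; field names follow the article's columns.
    static final class Row {
        final int dept_id;
        final int age;

        Row(int dept_id, int age) {
            this.dept_id = dept_id;
            this.age = age;
        }
    }

    // Both conditions must hold, like choosing And in basic mode.
    static boolean matchesAnd(Row input_row) {
        return input_row.dept_id == 10 && input_row.age > 40;
    }

    // Either condition is enough, like choosing Or in basic mode.
    static boolean matchesOr(Row input_row) {
        return input_row.dept_id == 10 || input_row.age > 40;
    }

    public static void main(String[] args) {
        Row row = new Row(10, 35);
        System.out.println("And: " + matchesAnd(row) + ", Or: " + matchesOr(row));
    }
}
```

Advanced mode is mainly useful when And and Or need to be mixed in one condition, which the single logical operator in basic mode cannot express.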




Let me know if you come across any more differences between the two components.

This article is written by +Vikram Takkar and published on www.vikramtakkar.com, please let me know, if you see this article on any other website/blog.
