In
the previous post here, I have shown you guys how to perform lookup
operation in Talend using tJoin component. In that post we had only
select main output data from the join output. But in today's post I will
show you, How to fetch main output data as well as reject data where
the match for the mentioned keys are not found.
I am again going to use the similar example using the tJoin component. tJoin joins two tables by doing an exact match on several
columns. It compares columns from the main flow with reference columns from the
lookup flow and outputs the main flow data and or the rejected data. This
component helps you ensure the data quality of any source data against a
reference data source.
To demonstrate tJoin, I am going to create a Talend
Job to perform an exact match on department ID column between the two files and
output the main (matched) and Reject data.
Lets look at the Input
Files:
Main Employee File - This is our main Input.
Reference (Lookup) File - We will use this file for reference against the main file and fetch
department name by using deptID field.
Let’s create a new job and following
steps mentioned below;
1.
First create the metadata of the main input delimited employee
file by following the steps mentioned in post “Generating Metadata for CVS
(Delimited) using Talend Open Studio”.
Metadata will be available under the Metadata section in the Repository
pane. Metadata should look like following:
2.
Drag the metadata created in the Step 1 to the Job designer and
select component tFileInputDelimited component.
3.
Drag component tFileInputDelimited to the job designer from
Palette pane. Double click the component the edit the properties.
4.
Enter the path of the Lookup delimited file in the File
name/Stream text box. Also provide other details like Row Separator
("\n") and Field Separators (";") as per specifications of
the Lookup files.
5.
Click on the Edit Schema button (highlighted in blue in
above screenshot) and enter schema details of the lookup file as below:
6. Now, its time to add tJoin and two tLogRow components from the Palette pane to the job designer. tJoin component will help to join (perform lookup) two data stream defined above and First tLogRow will be used to display the matched data and other will be used to display the reject data from the main stream where the match from lookup file is not found.
7. Right-click on the main Input delimited file component, select Row > Main, and drop the connector onto the tJoin component.
8.Right-click on the Lookup file component, select Row > Main, and again drop this onto the tJoin component. You'll notice that, even though we selected Row > Main in both cases, the resulting connector is slightly different. The first connector we made in step 7 is noted as Main, while the second is called Lookup. The order in which we connect the components to the tJoin is important. Join the primary data stream first, and one or more lookup components subsequently.
If you join connections in the wrong order or wish to change which is the Main data flow and which is the Lookup data flow, you can do this by right-clicking on one of the connectors and selecting Set this connection as Lookup or Set this connection as Main.
Notice that there is no column in
the schema of tJoin_1 section. We
will add empID and empName column from
tFileInputDelimited_1 section from left window.
Similarly, add deptName column from tFileInputDelimited_1
section from left window to tJoin_1
window.
10. Now we are done with schema of the tJoin component.
In the component properties apply following settings:
i.
Click Checkbox “Include lookup columns in the output”.
This will allow us to fetch the value of column from the reference (Lookup)
file as well. (Of-course for the matching records)
ii.
In the “Column mapping” section, Click on the
(+) button to add a name of the column and its mapping which will be fetched
from lookup file. In our example, we are fetching deptName from the lookup file. Hence provide the mapping for it.
iii.
In the “Key Definition”
section, Click on the (+) button to add a name of the column and its mapping
which will be used to join both the file (main and the reference/lookup files).
iv.
Check the check-box “Inner Join (with output rejects)”,
Since our aim is to catch the rejects where the match is not found in different
stream.
11. Now, Right-click on the tJoin
component, select Row
> Main, and drop the connector onto the tLogRow_1 component.
This tLogRow will be used to
display the matched data in the output stream.
13. Now,
Our job is complete. It’s time to run the Job and check the output.
In the output stream, we can see that Department name has been
populated corresponding to dept id from the Lookup file. Also notice that in
the output of tLogRow_1 only those records have been fetched where the match
has been found in the lookup file. Similarly in the second tLogRow_2 output all
those records has been displayed where match from the reference/lookup file has
not been found.
Thanks Vikram!
ReplyDeleteOne thing you missed out, we need to select the 'Mode' for the tLogRow components as 'Table (print values in cells of tables)' so as to see the output in a tabular form.
I am getting nulls for joining column if we add joining column in the output, please help me on the same
ReplyDeleteI have 2 source files 1 Have CUSTOMER_NAME,CITY_ID,CITY,CUSTOMER_ID :COLUMNS and another one have
ReplyDeleteCUSTOMER_ID,STATE....I used as tjoin based on CUSTOMER_ID and I want to CUSTOMER_ID also in the output columns but getting nulls for CUSTOMER_ID columns :.-------------+-------+----+-----+-----------.
| tLogRow_1 |
|=------------+-------+----+-----+----------=|
|CUSTMORE_NAME|CITY_ID|CITY|STATE|CUSTOMER_ID|
|=------------+-------+----+-----+----------=|
|CHANNA |13 |GNT |TN |null |
|KOTIREDDY |13 |GNT |KA |null |
|HARESH |15 |SAP |DE |null |
|HASINI |15 |SAP |GN |null |
'-------------+-------+----+-----+-----------'
Please check the following two points mentioned above ---
Deletei. Click Checkbox “Include lookup columns in the output”. This will allow us to fetch the value of column from the reference (Lookup) file as well. (Of-course for the matching records)
ii. In the “Column mapping” section, Click on the (+) button to add a name of the column and its mapping which will be fetched from lookup file. In our example, we are fetching deptName from the lookup file. Hence provide the mapping for it.