Friday, 28 December 2012

Understanding the tFileList component - Talend Open Studio

Today, I am going to demonstrate the usage of the tFileList component. tFileList iterates over the files or folders of a given directory: it retrieves the set of files or folders that match a filemask pattern and iterates on each one.

We are going to create a Job that lists the files in a defined directory, reads each file in turn, extracts the delimited data, and displays the output in the Run log console.

First, let's look at the directory structure and its files. We have three files containing department information in the directory D:\TalendFiles\dir (refer to the screenshot below).





All three files have the same schema and contain departmentId and departmentName. Look at the three files below.




Now our aim is to read all the files iteratively and display them in the Run log. To achieve this, create a new Job and perform the following steps.

1. Drag the tFileList component from the Palette pane to the Job Designer pane.

2. Open the tFileList component properties and enter the path of the directory where the files are located. Also provide the filemask pattern in the Files section, so that only the files matching the pattern are selected. In our example, we want to process txt files, hence we enter the filemask pattern “*.txt”.
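To make the effect of the filemask concrete, here is a minimal sketch in plain Java of how a pattern such as “*.txt” selects file names. It is not the code Talend generates; it uses the standard glob matcher from java.nio.file, and the class and method names (FileMaskSketch, matchesMask) are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Illustrative only: this is NOT Talend's implementation of the filemask.
class FileMaskSketch {

    // Returns true when the file name matches the glob-style mask, e.g. "*.txt".
    static boolean matchesMask(String fileName, String mask) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + mask);
        // Match against the last path element only, as a filemask would.
        return matcher.matches(Paths.get(fileName).getFileName());
    }

    public static void main(String[] args) {
        // Hypothetical file names, used only to exercise the matcher.
        System.out.println(matchesMask("departments.txt", "*.txt"));
        System.out.println(matchesMask("departments.csv", "*.txt"));
    }
}
```

With the mask “*.txt”, a file ending in .txt is accepted and any other extension is skipped, which is the behaviour we rely on in this Job.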


3. Now drag the tFileInputDelimited component from the Palette pane to the Job Designer.

4. Right-click the tFileList component, select Row > Iterate and connect it to the tFileInputDelimited component. The Iterate link executes the tFileInputDelimited component once for each file received from the tFileList component.




Notice that there is a cross sign (error) on the tFileInputDelimited component. It appears because this input component requires an output component to be connected. We will connect the output component later, so let's ignore this error for now.




5. Open the component properties of the tFileInputDelimited component. We need to provide the name of the file dynamically, so remove the existing path from the File name/Stream text box and press Ctrl + Space.

6. Talend will list all the global variables that we can use. Select tFileList_1.CURRENT_FILEPATH as shown in the screenshot below. This passes the path of the current file from the tFileList component.
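Selecting this variable places the Java expression ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")) in the File name/Stream box. The sketch below imitates the mechanism in plain Java: on every iteration the current path is stored in a shared map, and the reading step looks it up by key. The class name, the method name and the file paths are hypothetical, and the loop is a simplification rather than the code Talend generates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified imitation of an Iterate link; NOT the code Talend generates.
class CurrentFilePathSketch {

    // Returns the paths as the downstream "reader" step sees them.
    static List<String> iterate(List<String> filePaths) {
        Map<String, Object> globalMap = new HashMap<>();
        List<String> seenByReader = new ArrayList<>();
        for (String path : filePaths) {
            // The listing step publishes the current file path under a fixed key.
            globalMap.put("tFileList_1_CURRENT_FILEPATH", path);
            // The reading step resolves its file name from the map on each iteration.
            String fileName = ((String) globalMap.get("tFileList_1_CURRENT_FILEPATH"));
            seenByReader.add(fileName);
        }
        return seenByReader;
    }

    public static void main(String[] args) {
        // Hypothetical file names inside the directory used in this article.
        List<String> paths = Arrays.asList(
                "D:/TalendFiles/dir/file1.txt",
                "D:/TalendFiles/dir/file2.txt");
        iterate(paths).forEach(System.out::println);
    }
}
```

Because the value is looked up again on every iteration, the same input component can read a different file each time without any change to its settings.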


7. Now change the Field separator to “,” as our files are comma-separated.



8. Click Edit schema to provide the schema of the files. In the pop-up, add two columns, departmentID and departmentName, as shown in the screenshot below.
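The field separator and the two-column schema together decide how each line of a file becomes a row. A minimal sketch of that mapping in plain Java follows. Both columns are treated as strings here, which is an assumption, since the article does not state the column types; the class names and the sample line are hypothetical as well.

```java
import java.util.regex.Pattern;

// Illustrative only: maps one delimited line onto the two schema columns.
class DepartmentRowSketch {

    // Simple holder mirroring the schema; column types are assumed to be String.
    static final class DepartmentRow {
        final String departmentID;
        final String departmentName;

        DepartmentRow(String departmentID, String departmentName) {
            this.departmentID = departmentID;
            this.departmentName = departmentName;
        }
    }

    // Splits a line on the given separator and fills the columns in order.
    static DepartmentRow parseRow(String line, String separator) {
        // Pattern.quote keeps separators such as "|" or "." from acting as regex syntax.
        String[] fields = line.split(Pattern.quote(separator), -1);
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected 2 fields but got " + fields.length);
        }
        return new DepartmentRow(fields[0].trim(), fields[1].trim());
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not taken from the article's files.
        DepartmentRow row = parseRow("10,Sales", ",");
        System.out.println(row.departmentID + " / " + row.departmentName);
    }
}
```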



9. Drag the tLogRow component from the Palette pane, then right-click the tFileInputDelimited component, select Row > Main and connect it to the tLogRow component.




10. It's time to run our Job. Look at the screenshot below: tFileList has found 3 files and executed the other components once for every file.
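For readers who think in code, the whole Job can be approximated by a short plain-Java program: list the files that match the mask, read each one line by line, split on the comma, and print every row. This is only a sketch of the behaviour, not what Talend generates. The class and method names are hypothetical, the files are sorted by name only to make the output order predictable, and the “|” between columns is an illustrative choice.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java approximation of the Job; NOT the code Talend generates.
class FileListJobSketch {

    // Reads every file in 'directory' matching 'fileMask' and returns one
    // "departmentID|departmentName" string per data row.
    static List<String> runJob(Path directory, String fileMask) {
        List<Path> files = new ArrayList<>();
        // Listing step: collect the regular files that match the mask.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory, fileMask)) {
            for (Path candidate : stream) {
                if (Files.isRegularFile(candidate)) {
                    files.add(candidate);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(files); // directory order is unspecified, so sort by path

        List<String> rows = new ArrayList<>();
        // Iterate step: the read-and-log part runs once per file.
        for (Path file : files) {
            try {
                for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                    String[] fields = line.split(",", -1);
                    if (fields.length < 2) {
                        continue; // skip blank or malformed lines
                    }
                    rows.add(fields[0] + "|" + fields[1]);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Defaults to the directory used in this article; pass another path as an argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "D:\\TalendFiles\\dir");
        runJob(dir, "*.txt").forEach(System.out::println);
    }
}
```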




In the next article, I will demonstrate another iterate component, tFlowToIterate.



2 comments:

  1. Great stuff.. I will use this code as such scenario comes handy to us

  2. That's great. But how can I set up tFileList if I want to iterate over files in multiple dynamic tFileList components? I tried to use tLoop, but the iteration only works in one directory. Does anyone have an idea?
