Tuesday 25 December 2012

Extracting data from Database tables using Talend Open Studio

In the last post here, we have learned to create metadata for the database and retrieve table schema from the database. It was first step for performing any activity with database e.g. extracting, joining and loading data to database tables.

In this post we will use the schema defined in the previous post and perform following activities:

1. Extracting data from Single table to delimited file.
2. Joining two tables outside the database component using tJoin component.
3. Joining two tables within database components (without tJoin component)

Lets start with Extracting data from single table to delimited file.

1. Drag the required table (employees) from the “Table Schemas” under the Metadata section in Repository pane to Job designer. Choose tMySQLInput from the popup.

2. Drag tFileOutputDelimited component from Pallet pane to Job designer pane.
3. Open the component properties for tFileOutputDelimited to enter the path and name of output file. Also, you can update details like field and row separators.

4. Right click employee tMYSQLInput component and select Row > Main, and connect it to tFileOutputDelimited component.

5. Run the Job and check the output from the delimited file. For your reference, I have copied the output below.

employeeNumber,lastName,firstName,extension,email,officeCode,reportsTo ,jobTitle
1056,Patterson,Mary,x4611,mpatterso@classicmodelcars.com,1,1002,VP Sales
1076,Firrelli,Jeff,x9273,jfirrelli@classicmodelcars.com,1,1002,VP Marketing
1088,Patterson,William,x4871,wpatterson@classicmodelcars.com,6,1056,Sales Manager (APAC)
1102,Bondur,Gerard,x5408,gbondur@classicmodelcars.com,4,1056,Sale Manager (EMEA)
1143,Bow,Anthony,x5428,abow@classicmodelcars.com,1,1056,Sales Manager (NA)
1165,Jennings,Leslie,x3291,ljennings@classicmodelcars.com,1,1143,Sales Rep
1166,Thompson,Leslie,x4065,lthompson@classicmodelcars.com,1,1143,Sales Rep
1188,Firrelli,Julie,x2173,jfirrelli@classicmodelcars.com,2,1143,Sales Rep
1216,Patterson,Steve,x4334,spatterson@classicmodelcars.com,2,1143,Sales Rep
1286,Tseng,Foon Yue,x2248,ftseng@classicmodelcars.com,3,1143,Sales Rep
1323,Vanauf,George,x4102,gvanauf@classicmodelcars.com,3,1143,Sales Rep
1337,Bondur,Loui,x6493,lbondur@classicmodelcars.com,4,1102,Sales Rep
1370,Hernandez,Gerard,x2028,ghernande@classicmodelcars.com,4,1102,Sales Rep
1401,Castillo,Pamela,x2759,pcastillo@classicmodelcars.com,4,1102,Sales Rep
1501,Bott,Larry,x2311,lbott@classicmodelcars.com,7,1102,Sales Rep
1504,Jones,Barry,x102,bjones@classicmodelcars.com,7,1102,Sales Rep
1611,Fixter,Andy,x101,afixter@classicmodelcars.com,6,1088,Sales Rep
1612,Marsh,Peter,x102,pmarsh@classicmodelcars.com,6,1088,Sales Rep
1619,King,Tom,x103,tking@classicmodelcars.com,6,1088,Sales Rep
1621,Nishi,Mami,x101,mnishi@classicmodelcars.com,5,1056,Sales Rep
1625,Kato,Yoshimi,x102,ykato@classicmodelcars.com,5,1621,Sales Rep
1702,Gerard,Martin,x2312,mgerard@classicmodelcars.com,4,1102,Sales Rep

Join two tables outside the database component using tJoin component.

In the database schema we have retrieved in our previous post, contains employee and offices tables. In our example we will join these two tables to find out city, state and country fields for all the employees.

Lets create a new Job and follow the steps mentioned below:

1. Drag the employees and offices tables from  the “Table Schemas” under the Metadata section in Repository pane to Job designer. Choose tMySQLInput from the popup.

2. Drag tMap and tFileOutputDelimited components from Palette pane to Job designer.

3. Right click employee tMySQLInput component and select Row > Main, and connect it to tMap component. Similarly, Right click offices tMySQLInput component and select Row > Main, and connect it to tMap component.  You'll notice that, even though we selected Row > Main in both cases, the resulting connector is slightly different. The first connector we made is noted as Main, while the second is called Lookup. The order in which we connect the components to the tJoin is important. Join the primary data stream first, and one or more lookup components subsequently.

If you join connections in the wrong order or wish to change which is the Main data flow and which is the Lookup data flow, you can do this by right-clicking on one of the connectors and selecting Set this connection as Lookup or Set this connection as Main.

4. Double-click on the tMap component to open the Map Editor. Notice that there are no fields in the output flow.

5. Now drag the required columns from row1 and row2 to the output. In our example, we will drag employeeNumber, lastName, firstName and jobTitle from row1 and drag city, state and country from row2.

Also ,connect the two input data flows (row1 and row2) within the Map Editor by clicking on officeCode from row1 and dragging this to the expression key box of officeCode in row2.

Click Ok to save the mapping.

6. Run the Job and check the output from the delimited file. For your reference, I have copied the output below.

1002;Diane;Murphy;President;San Francisco;CA;USA
1056;Mary;Patterson;VP Sales;San Francisco;CA;USA
1076;Jeff;Firrelli;VP Marketing;San Francisco;CA;USA
1088;William;Patterson;Sales Manager (APAC);Sydney;;Australia
1102;Gerard;Bondur;Sale Manager (EMEA);Paris;;France
1143;Anthony;Bow;Sales Manager (NA);San Francisco;CA;USA
1165;Leslie;Jennings;Sales Rep;San Francisco;CA;USA
1166;Leslie;Thompson;Sales Rep;San Francisco;CA;USA
1188;Julie;Firrelli;Sales Rep;Boston;MA;USA
1216;Steve;Patterson;Sales Rep;Boston;MA;USA
1286;Foon Yue;Tseng;Sales Rep;NYC;NY;USA
1323;George;Vanauf;Sales Rep;NYC;NY;USA
1337;Loui;Bondur;Sales Rep;Paris;;France
1370;Gerard;Hernandez;Sales Rep;Paris;;France
1401;Pamela;Castillo;Sales Rep;Paris;;France
1501;Larry;Bott;Sales Rep;London;;UK
1504;Barry;Jones;Sales Rep;London;;UK
1611;Andy;Fixter;Sales Rep;Sydney;;Australia
1612;Peter;Marsh;Sales Rep;Sydney;;Australia
1619;Tom;King;Sales Rep;Sydney;;Australia
1621;Mami;Nishi;Sales Rep;Tokyo;Chiyoda-Ku;Japan
1625;Yoshimi;Kato;Sales Rep;Tokyo;Chiyoda-Ku;Japan
1702;Martin;Gerard;Sales Rep;Paris;;France

Joining two tables within database components

In this example we are going to use SQL query to the database component that joins two database tables.

1. In the Repository window, expand the Metadata section, click on the MySQL_CON connection and drag it onto the Job Designer.

The available components window will appear. Choose tMySQLInput from the list.

2.  Click on the Component tab below the Job Designer. You can see that the database connections are visible, but because we selected the parent component from the Repository window, rather than a specific table within the schema, there isn't a query or schema defined for this component yet.

3.  Click on the ellipsis button to the right of the Query box and the SQL Builder window will appear. We can use this to define the tables, fields, and specific query that will be used in our job.

4. We will design our query in SQL builder and we will use its the query designer functionality to build our query. In the Database Structure pane, expand the classicmodels item to reveal the tables.

5. Right click the Designer pane and select Add Tables. Select employees and offices tables from Add Tables pop up and click Ok.

This will add two selected tables (employees and offices) to designer pane as shown in screenshot below.

6. In the next step, only check the checkboxes for those fields in both the tables which are required in the output. In our example check employeeNumber, lastName and firstName from employees table and check city, state and country from offices table.

Also, right click officeCode field from employees table and drag it and connect it to officeCode to offices table. This will add join condition between both the table. Refer the query in below screen shot below.

7. Close the window and the SQl query will appear in the Query section of tMySQLInput component properties.

8. Click on Edit Schema button to check if schema is correct. You can modify the schema based on your requirements.

9. Click on and drag tFileOutputDelimited component from Palette pane to Job designer. Open the component properties of the  tFileOutputDelimited and enter the path and name of the output delimited file.

10. Right click MySQLDB_CON (tMySQLInput) component and select Row > Main and connect it to tFileOutputdelimited component.

11. Run the Job and check the output. For your reference, I have copied the output below:

1002;Murphy;Diane;San Francisco;CA;USA
1056;Patterson;Mary;San Francisco;CA;USA
1076;Firrelli;Jeff;San Francisco;CA;USA
1143;Bow;Anthony;San Francisco;CA;USA
1165;Jennings;Leslie;San Francisco;CA;USA
1166;Thompson;Leslie;San Francisco;CA;USA
1286;Tseng;Foon Yue;NYC;NY;USA

You may also like to read..

No comments:

Post a Comment