Unit Testing Template for ETL
ETL testing is one of the crucial aspects of any Business Intelligence (BI) application. Before a BI application can get quality-assurance sign-off and acceptance to go live, it must be tested thoroughly.
The primary objective of ETL testing is to ensure that the Extract, Transform & Load functionality is working as per the business requirements and in sync with the performance standards.
Before we dig into ETL Testing with Informatica, it is essential to know what ETL and Informatica are.
What You Will Learn in This ETL Tutorial:
- Basics of ETL, Informatica & ETL testing.
- Understanding ETL testing specific to Informatica.
- Classification of ETL testing in Informatica.
- Sample test cases for Informatica ETL testing.
- Benefits of using Informatica as an ETL tool.
- Tips & Tricks to aid you in testing.
In computing, Extract, Transform, Load (ETL) refers to a process in database usage and especially in data warehousing that performs:
- Data Extraction – extracts data from homogeneous or heterogeneous data sources.
- Data Transformation – formats the data into the required type.
- Data Load – moves and stores the data in a permanent location for long-term use.
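The three stages above can be sketched in a few lines of code. This is a minimal illustration, not an Informatica feature: the CSV layout and the `Tbl_Product` table are assumptions borrowed from the example later in this article, with a CSV string as the source and an in-memory SQLite table as the target.

```python
# Minimal sketch of the three ETL stages. All names (column layout,
# Tbl_Product) are illustrative, not tied to any real system.
import csv
import io
import sqlite3

source_csv = io.StringIO(
    "Product_name,Prod_price\n"
    "ABC,150\n"
    "DEF,700\n"
)

# Extract: read raw rows from the source.
rows = list(csv.DictReader(source_csv))

# Transform: format the data into the required types.
records = [(r["Product_name"], int(r["Prod_price"])) for r in rows]

# Load: store the data in a permanent location (here, SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tbl_Product (Product_name TEXT, Prod_price INTEGER)")
conn.executemany("INSERT INTO Tbl_Product VALUES (?, ?)", records)
loaded = conn.execute("SELECT COUNT(*) FROM Tbl_Product").fetchone()[0]
print(loaded)  # 2
```

A real ETL tool adds scheduling, restartability, and metadata on top of this skeleton, but the extract/transform/load shape stays the same.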
Informatica PowerCenter ETL Testing Tool:
Informatica PowerCenter is a powerful ETL tool from Informatica Corporation. It is a single, unified enterprise data integration platform for accessing, discovering, and integrating data from virtually any business system, in any format, and delivering that data throughout the enterprise at any speed. Through Informatica PowerCenter, we create workflows that perform end-to-end ETL operations.
Download and Install Informatica PowerCenter:
To install and configure Informatica PowerCenter 9.x, use the link below, which has step-by-step instructions:
=> Informatica PowerCenter 9 Installation and Configuration Guide
Understanding ETL testing specific to Informatica:
ETL testers often have pertinent questions: what should be tested in Informatica, and how much test coverage is needed?
Let me take you through a tour on how to perform ETL testing specific to Informatica.
The main aspects which should be essentially covered in Informatica ETL testing are:
- Testing the functionality of the Informatica workflow and its components, including all the transformations used in the underlying mappings.
- Checking data completeness (i.e. ensuring that the projected data is loaded to the target without any truncation or data loss).
- Verifying that the data is loaded to the target within the estimated time limits (i.e. evaluating the performance of the workflow).
- Ensuring that the workflow does not allow any invalid or unwanted data to be loaded into the target.
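Two of these aspects, completeness and invalid-data rejection, can be expressed as simple assertions. The sketch below is a hedged illustration with made-up rows and a made-up validity rule (a numeric price), not Informatica behavior:

```python
# Illustrative completeness + validity check. The rows and the
# is_valid rule are assumptions for the sake of the example.
source_rows = [
    {"name": "ABC", "price": "150"},
    {"name": "DEF", "price": "700"},
    {"name": "BAD", "price": "oops"},  # invalid: non-numeric price
]

def is_valid(row):
    # A row is loadable only if its price parses as a number.
    return row["price"].isdigit()

# Pretend "target" load: only valid rows should arrive.
target_rows = [r for r in source_rows if is_valid(r)]

# Completeness: every valid source row must reach the target ...
assert len(target_rows) == sum(1 for r in source_rows if is_valid(r))
# ... and no invalid row may slip through.
assert all(is_valid(r) for r in target_rows)
```

In a real project the target side of these assertions would come from querying the target database rather than an in-memory list.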
Classification of ETL Testing in Informatica:
For better understanding and ease of the tester, ETL testing in Informatica can be divided into two main parts –
#1) High-level testing
#2) Detailed testing
First, in high-level testing:
- Check whether the Informatica workflow and its related objects are valid.
- Verify that the workflow completes successfully when run.
- Confirm that all the required sessions/tasks are executed in the workflow.
- Validate that the data is loaded to the desired target directory with the expected filename (in case the workflow creates a file), etc.
In a nutshell, you can say that the high-level testing includes all the basic sanity checks.
Coming to the next part, i.e. detailed testing in Informatica, you go in depth to validate whether the logic implemented in Informatica works as expected in terms of both results and performance.
- Perform output data validation at the field level to confirm that each transformation is operating correctly.
- Verify the record count at each level of processing, and confirm that the final target count is as expected.
- Thoroughly monitor elements like the source qualifier and target in the session's source/target statistics.
- Ensure that the run duration of the Informatica workflow is on par with the estimated run time.
To sum up, we can say that the detailed testing includes a rigorous end to end validation of Informatica workflow and the related flow of data.
Let us take an example here:
We have a flat file that contains data about different products. It stores details like the name of the product, its description, category, date of expiry, price, etc.
My requirement is to fetch each product record from the file, generate a unique product id corresponding to each record and load it into the target database table. I also need to suppress those products which either belong to the category ‘C’ or whose expiry date is less than the current date.
Say, my flat file (source) looks like this:
Based on my requirements stated above, my database table (Target) should look like this:
Table name: Tbl_Product
| Prod_ID (Primary Key) | Product_name | Prod_description | Prod_category | Prod_expiry_date | Prod_price |
|---|---|---|---|---|---|
| 1001 | ABC | This is product ABC. | M | 8/14/2017 | 150 |
| 1002 | DEF | This is product DEF. | S | 6/10/2018 | 700 |
| 1003 | PQRS | This is product PQRS. | M | 5/23/2019 | 1500 |
Now, say, we have developed an Informatica workflow to get the solution for my ETL requirements.
The underlying Informatica mapping will read data from the flat file and pass it through a router transformation that discards rows which either have the product category 'C' or an expiry date earlier than the current date. A sequence generator will then be used to create the unique primary key values for the Prod_ID column in the Product table.
Finally, the records will be loaded to Product table which is the target for my Informatica mapping.
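Outside Informatica, the same mapping logic can be sketched as plain code: a filter standing in for the router transformation and a counter standing in for the sequence generator. The field names follow the sample tables above; the two suppressed rows ("XYZ" and "OLD") and the stand-in current date are invented for illustration, since the source file is not shown:

```python
# Sketch of the mapping logic: router (filter) + sequence generator.
# The source rows and the pretend current date are assumptions.
from datetime import date
from itertools import count

today = date(2017, 1, 1)  # stand-in for the current date

source = [
    {"name": "ABC",  "category": "M", "expiry": date(2017, 8, 14), "price": 150},
    {"name": "XYZ",  "category": "C", "expiry": date(2018, 1, 1),  "price": 90},
    {"name": "DEF",  "category": "S", "expiry": date(2018, 6, 10), "price": 700},
    {"name": "OLD",  "category": "M", "expiry": date(2016, 2, 1),  "price": 20},
    {"name": "PQRS", "category": "M", "expiry": date(2019, 5, 23), "price": 1500},
]

# Router: suppress category 'C' products and expired products.
kept = [r for r in source if r["category"] != "C" and r["expiry"] >= today]

# Sequence generator: assign unique Prod_ID values starting at 1001.
seq = count(1001)
target = [{"Prod_ID": next(seq), **r} for r in kept]

print([r["Prod_ID"] for r in target])  # [1001, 1002, 1003]
```

Five source rows go in, two are suppressed, and three come out with generated IDs, matching the expected target table above.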
Examples:
Below are the sample test cases for the scenario explained above.
You can use these test cases as a template in your Informatica testing project and add/remove similar test cases depending upon the functionality of your workflow.
#1) Test Case ID: T001
Test Case Purpose: Validate workflow – [workflow_name]
Test Procedure:
- Go to workflow manager
- Open workflow
- Workflows menu-> click on validate
Input Value/Test Data: Sources and targets are available and connected
Sources: [all source instances name]
Mappings: [all mappings name]
Targets: [all target instances name]
Session: [all sessions name]
Expected Results: Message in workflow manager status bar: "Workflow [workflow_name] is valid"
Actual Results: Message in workflow manager status bar: "Workflow [workflow_name] is valid"
Remarks: Pass
Tester Comments:
#2) Test Case ID: T002
Test Case Purpose: To ensure if the workflow is running successfully
Test Procedure:
- Go to workflow manager
- Open workflow
- Right click in workflow designer and select Start workflow
- Check status in Workflow Monitor
Input Value/Test Data: Same as test data for T001
Expected Results: Message in the output window in Workflow manager: Task Update: [workflow_name] (Succeeded)
Actual Results: Message in the output window in Workflow manager: Task Update: [workflow_name] (Succeeded)
Remarks: Pass
Tester Comments: Workflow succeeded
Note: You can easily see the workflow run status (failed/succeeded) in the Workflow Monitor. Once the workflow completes, the status is reflected automatically in the Workflow Monitor, along with the start time and end time of the workflow.
#3) Test Case ID: T003
Test Case Purpose: To validate if the desired number of records are getting loaded to target
Test Procedure: Once the workflow has run successfully, go to the target table in database
Check the number of rows in target database table
Input Value/Test Data: 5 rows in the source file
Target: database table – [Tbl_Product]
Query to run in SQL server: Select count(1) from [Tbl_Product]
Expected Results: 3 rows selected
Actual Results: 3 rows selected
Remarks: Pass
Tester Comments:
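T003's row-count check is easy to script instead of running by hand. In the sketch below, SQLite stands in for SQL Server and the three rows are inserted directly to simulate a completed workflow run; the table name and expected count come from the example above:

```python
# Scripted version of T003: compare the target row count with the
# expected count. SQLite is a stand-in for SQL Server here.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tbl_Product (Prod_ID INTEGER PRIMARY KEY, Product_name TEXT)")
conn.executemany(
    "INSERT INTO Tbl_Product VALUES (?, ?)",
    [(1001, "ABC"), (1002, "DEF"), (1003, "PQRS")],
)

expected_rows = 3  # 5 source rows minus the 2 suppressed ones
actual_rows = conn.execute("SELECT COUNT(1) FROM Tbl_Product").fetchone()[0]
assert actual_rows == expected_rows, f"expected {expected_rows}, got {actual_rows}"
```

The same assertion, pointed at the real target database, turns this manual test case into an automated check that can run after every workflow execution.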
#4) Test Case ID: T004
Test Case Purpose: To check if sequence generator in Informatica mapping is working fine for populating [primary_key_column_name e.g. Prod_ID] column
Test Procedure: Once the workflow has run successfully, go to the target table in database
Check the unique sequence generated in column Prod_ID
Input Value/Test Data: value for Prod_ID left blank for every row in source file
Sequence Generator mapped to Prod_ID column in the mapping
Sequence generator start value set as 1001
Target: database table- [Tbl_Product] opened in SQL Server
Expected Results: Value from 1001 to 1003 populated against every row for Prod_ID column
Actual Results: Value from 1001 to 1003 populated against every row for Prod_ID column
Remarks: Pass
Tester Comments:
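T004's check can also be made explicit: the generated Prod_ID values should be unique and contiguous from the configured start value. A small sketch, with the values hard-coded as if read back from the target table:

```python
# T004 as assertions: sequence-generated keys must be unique and
# contiguous from the start value. prod_ids simulates a query result.
prod_ids = [1001, 1002, 1003]  # values read back from Tbl_Product
start_value = 1001             # sequence generator start value

assert len(prod_ids) == len(set(prod_ids)), "duplicate Prod_ID generated"
assert sorted(prod_ids) == list(range(start_value, start_value + len(prod_ids)))
```

Note that contiguity is only guaranteed for a single clean run; rerunning the workflow or caching sequence values can legitimately leave gaps, so adjust the second assertion to your sequence generator settings.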
#5) Test Case ID: T005
Test Case Purpose: To validate that the router transformation is working correctly to suppress records whose product category is 'C' or whose product has expired.
Test Procedure: Once the workflow has run successfully, go to the target table in database
Run the query on the target table to check if the desired records have got suppressed.
Input Value/Test Data: 5 rows in the source file
Target: database table – [Tbl_Product]
Query to run in SQL Server: Select * from Tbl_Product where Prod_category='C' or Prod_expiry_date < GETDATE();
Expected Results: no rows selected
Actual Results: no rows selected
Remarks: Pass
Tester Comments: (if any)
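The same suppression query can be run programmatically. In the sketch below SQLite stands in for SQL Server, `GETDATE()` is replaced with a literal date parameter, and the loaded rows mirror the expected target table; all of that is illustrative scaffolding:

```python
# Scripted version of T005: no category-'C' or expired rows may
# exist in the target. SQLite stands in for SQL Server; the literal
# current_date replaces GETDATE().
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tbl_Product (Prod_ID INTEGER, Prod_category TEXT, Prod_expiry_date TEXT)"
)
conn.executemany(
    "INSERT INTO Tbl_Product VALUES (?, ?, ?)",
    [(1001, "M", "2017-08-14"), (1002, "S", "2018-06-10"), (1003, "M", "2019-05-23")],
)

current_date = "2017-01-01"  # stand-in for GETDATE(); ISO dates compare correctly as text
suppressed = conn.execute(
    "SELECT * FROM Tbl_Product WHERE Prod_category = 'C' OR Prod_expiry_date < ?",
    (current_date,),
).fetchall()
assert suppressed == [], "invalid rows leaked into the target"
```

An empty result set here is the coded equivalent of the "no rows selected" expected result in the test case.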
#6) Test Case ID: T006
Test Case Purpose: To check the performance of the workflow by recording the workflow runtime.
Test Procedure:
- Open the Workflow Monitor and go to the run that was performed as part of T002.
- Record the start time and end time of workflow.
- Calculate total run time by subtracting start time from end time.
Input Value/Test Data: Workflow has run successfully
Start time of workflow in monitor
End time of workflow in monitor.
Expected Results: 2 min 30 secs
Actual Results: 2 min 15 secs
Remarks: Pass
Tester Comments: Considering the test as ‘Pass’ in case the actual run duration is +/- 10% of expected run duration.
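The duration calculation and the +/- 10% tolerance rule from T006 are easy to express in code. The start and end times below are invented stand-ins for the values you would read from the Workflow Monitor:

```python
# T006 as code: run duration must be within +/- 10% of the expected
# duration. The timestamps are illustrative.
from datetime import datetime

start = datetime(2017, 8, 14, 10, 0, 0)   # workflow start time (from monitor)
end   = datetime(2017, 8, 14, 10, 2, 15)  # workflow end time (from monitor)

expected_secs = 150                          # 2 min 30 secs
actual_secs = (end - start).total_seconds()  # 135 secs = 2 min 15 secs

within_tolerance = abs(actual_secs - expected_secs) <= 0.10 * expected_secs
print(within_tolerance)  # True
```

Encoding the tolerance this way removes any ambiguity about what counts as a pass when the run time drifts from one execution to the next.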
#7) Test Case ID: T007
Test Case Purpose: To validate data at target table column level in order to ensure that there is no data loss.
Test Procedure: Once the workflow has run successfully, go to the SQL Server.
Run the query on the target table to check there is no data loss.
Input Value/Test Data: Workflow has run successfully
One sample record from source flat file.
SQL Query: Select Top 1 * from Tbl_Product;
Expected Results:
1 row returned
| Prod_ID (Primary Key) | Product_name | Prod_description | Prod_category | Prod_expiry_date | Prod_price |
|---|---|---|---|---|---|
| 1001 | ABC | This is product ABC. | M | 8/14/2017 | 150 |
Actual Results:
1 row returned.
| Prod_ID (Primary Key) | Product_name | Prod_description | Prod_category | Prod_expiry_date | Prod_price |
|---|---|---|---|---|---|
| 1001 | ABC | This is product ABC. | M | 8/14/2017 | 150 |
Remarks: Pass
Tester Comments: Source record matches the target record field by field.
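T007's field-level comparison can be sketched as a column-by-column diff between one source record and the corresponding target row. The values come from the sample tables above; Prod_ID is excluded from the comparison because it is generated in the mapping rather than read from the source:

```python
# Field-level data-loss check: every source column must arrive in
# the target unchanged. Values mirror the sample tables above.
source_record = {
    "Product_name": "ABC",
    "Prod_description": "This is product ABC.",
    "Prod_category": "M",
    "Prod_expiry_date": "8/14/2017",
    "Prod_price": 150,
}
target_row = {
    "Prod_ID": 1001,  # generated column, not compared
    "Product_name": "ABC",
    "Prod_description": "This is product ABC.",
    "Prod_category": "M",
    "Prod_expiry_date": "8/14/2017",
    "Prod_price": 150,
}

# Collect every column whose target value differs from the source.
mismatches = {
    col: (val, target_row.get(col))
    for col, val in source_record.items()
    if target_row.get(col) != val
}
assert not mismatches, f"data loss/corruption in columns: {mismatches}"
```

Reporting the full mismatch dictionary, rather than just pass/fail, makes it much easier to trace which transformation mangled which column.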
Benefits of Using Informatica as an ETL tool:
Informatica is a popular and successful ETL tool because:
- It is reported to have a very high "go live" success rate.
- Informatica has the capability of enabling Lean Integration.
- It is a moderately priced tool when compared to other ETL tools.
- It comes with an internal job scheduler, so there is no need for a separate third-party scheduler, unlike some other ETL tools.
- Easy training and tool availability has made Informatica more popular.
Some useful Tips to assist you in Informatica ETL testing:
- Generate the test data before executing the test scenarios.
- The test data should be in sync with the test case it is used for.
- Make sure that you have covered all three scenarios: no data submitted, invalid data submitted, and valid data submitted as input to the Informatica workflow.
- Make sure to test that all the required data is getting loaded to target completely. For this, you can use test case – T003 described above as a sample.
- It is very important to test that the workflow is doing all the data transformations correctly as per business rules.
- I would suggest that for each transformation applied in your Informatica mapping, you should have a checklist to verify output data against it. That way, you can report bugs easily if any transformation is not working fine.
Conclusion:
So, we have seen in detail some sample test cases that can be used as a template for ETL testing in Informatica. As mentioned earlier, you can add, remove, or modify these test cases depending on the scenario you have in your project.
Informatica PowerCenter is a foundation for any data integration activity. You can easily perform script-free automated testing of data copied to test, dev, or production environments, which is one reason PowerCenter is among the most popular ETL tools today.
Recommended reading => ETL vs. DB Testing – A Closer Look at ETL Testing Need
About the author: This is a guest article by Priya K. She has 4+ years of hands-on experience in developing and supporting Informatica ETL applications.
Feel free to post your queries/comments about this ETL tool.