Ensuring the quality of data migrated from one system to another is a major challenge for any QA: the volume of data to test is usually very large, the data has often been processed and transformed before being loaded into the new system's database, and it is generally critical to the user.

In this post I will try to give some recommendations on strategies and tools that we can use to guarantee, as much as possible, the quality of the data.

I will also try to give some tips to coordinate our work with the migration team (those in charge of writing and executing the scripts that will perform the data migration).

General strategy

Our first task as QA assigned to a project that will require data migration is to coordinate with the team that will perform the migration regarding the following items:

  • Documentation: Surely the migration team will be using some type of document to track the data that will be migrated from one table to another, and the fields of each table that will be migrated. This document should look like this:
Source field | Target field | Transformation
------------ | ------------ | --------------
Source field 1 | Target field 1 | Source date format “xxx”, target date format “yyy”
Source field 2 | Target field 2 | field “xxx” * 12

We must ensure that we have access to this document, as it will be crucial for our validation when comparing source data vs. migrated data.
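The mapping document can also be turned into a machine-readable spec that drives the comparison. Below is a minimal sketch of that idea; the field names (`contract_date`, `monthly_fee`, etc.) and the two transformations are hypothetical, modeled on the example rows in the table above.

```python
from datetime import datetime

# Hypothetical machine-readable version of the mapping document.
# Each entry: (source field, target field, transformation function).
FIELD_MAPPINGS = [
    # Date reformatted from "dd/mm/yyyy" (source) to "yyyy-mm-dd" (target).
    ("contract_date", "start_date",
     lambda v: datetime.strptime(v, "%d/%m/%Y").strftime("%Y-%m-%d")),
    # Example of the "field * 12" style transformation from the table.
    ("monthly_fee", "annual_fee", lambda v: v * 12),
]

def expected_target_row(source_row):
    """Apply each documented transformation to a source record,
    producing the row we expect to find in the target system."""
    return {target: transform(source_row[source])
            for source, target, transform in FIELD_MAPPINGS}
```

Any migrated record can then be checked by comparing the actual target row against `expected_target_row(source_row)`.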

  • Testing cycle: The most effective way to coordinate our work with the migration team is to work in cycles: a period in which the data is loaded into the new system, we execute a series of tests, and we report the bugs we find so that the migration team can fix them, delete the data, and re-migrate it all to start a new round of testing. We repeat these cycles until we find no more errors.

The following image represents a planning cycle, test cycle, and then migration to production once there are no more errors.

Data Comparison and Functional Testing

There are at least two types of tests that we must perform once the data has been uploaded to the destination system:

Data comparison: At a high level, the purpose of this test is to make sure that all the data processed by the migration team's scripts was uploaded correctly to the target system. In other words, we compare the largest number of records possible between source and target, using the mapping document mentioned above.
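A source-vs-target comparison like this can be sketched with a small helper that walks both record sets and reports every field-level mismatch. The record shapes and field names here are hypothetical; in practice the field pairs would come from the mapping document.

```python
def compare_records(source_rows, target_rows, key, field_pairs):
    """Compare migrated records field by field.

    key         -- name of the shared identifier present in both systems
    field_pairs -- list of (source_field, target_field) tuples to compare
    Returns a list of (key value, target field, source value, target value)
    tuples, one per mismatch; an empty list means the data matched.
    """
    target_by_key = {row[key]: row for row in target_rows}
    mismatches = []
    for src in source_rows:
        tgt = target_by_key.get(src[key])
        if tgt is None:
            # Record was lost during migration.
            mismatches.append((src[key], None, "missing in target", None))
            continue
        for src_field, tgt_field in field_pairs:
            if src[src_field] != tgt[tgt_field]:
                mismatches.append(
                    (src[key], tgt_field, src[src_field], tgt[tgt_field]))
    return mismatches
```

Each mismatch found this way becomes a bug report for the next migration cycle.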

If the amount of data is too high, we will not be able to compare all the records manually, since this would take a prohibitive amount of time. Instead, we organize the test by assembling certain “groups” of data based on specific criteria and then testing a subset of each group.

The criteria by which we separate the groups are given by the particularities of the data itself and by the target system. For example, if we are migrating contracts from one client to another, we can place in one group all the contracts that have similar products and characteristics, and then test 5 or 6 of them. Another group can hold contracts that have some particularity or that needed to be processed differently; we will have to ask the migration team about these, and we may get answers such as “contracts ending in the middle of the month had to be adjusted, and their dates modified to be accepted by the target system.” We can then form another group from those and make sure to test a subset of them as well.
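The grouping-and-sampling approach described above can be expressed as a small utility: partition the records by a criterion and draw a handful from each partition. The grouping function and record shape are hypothetical placeholders.

```python
import random
from collections import defaultdict

def sample_per_group(records, group_key, sample_size=5, seed=42):
    """Group records by a criterion and pick a small random sample
    from each group for manual comparison.

    group_key -- function mapping a record to its group name
    seed      -- fixed so the same sample can be re-checked next cycle
    """
    groups = defaultdict(list)
    for record in records:
        groups[group_key(record)].append(record)
    rng = random.Random(seed)
    return {name: rng.sample(members, min(sample_size, len(members)))
            for name, members in groups.items()}
```

A fixed seed keeps the sample reproducible, which helps when re-verifying the same records after a re-migration.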

To perform this type of test more efficiently there are many tools that can be used. One of my favorites is Pentaho Kettle (Pentaho Data Integration), a free ETL tool that is very easy to learn.

With Pentaho Kettle we can create scripts that run different queries against the target system and build reports with records from each of the groups we identified earlier.

In the following example of a script made with Kettle, we query the different tables where the data was loaded, then join and filter the results to obtain Excel reports on the data groups we identified.
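The same query-join-filter-export flow can be sketched outside of Kettle, for example with plain SQL. The snippet below imitates the steps on an in-memory SQLite database; the table names, columns, and the "premium contracts" group are all hypothetical, and the report is written as CSV rather than Excel to keep the sketch dependency-free.

```python
import csv
import sqlite3

# Stand-in for the target database, with hypothetical tables and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contracts (id INTEGER, client TEXT, product TEXT);
    CREATE TABLE payments  (contract_id INTEGER, amount REAL);
    INSERT INTO contracts VALUES (1, 'ACME', 'premium'), (2, 'Foo', 'basic');
    INSERT INTO payments  VALUES (1, 1200.0), (2, 300.0);
""")

# Join the tables, then filter down to one previously identified data group.
report = conn.execute("""
    SELECT c.id, c.client, p.amount
    FROM contracts c
    JOIN payments p ON p.contract_id = c.id
    WHERE c.product = 'premium'
""").fetchall()

# Export the group report for manual review.
with open("premium_contracts_report.csv", "w", newline="") as f:
    csv.writer(f).writerows([("id", "client", "amount"), *report])
```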

Functional Testing: Having all the data loaded correctly in the target system does not guarantee that the system works correctly with it. Before we finish our test cycle, we will have to carry out different functional tests that operate on the data.

We must correctly identify which operations will be performed on the target system with the loaded data and make sure to test each of these functionalities with each of the data groups identified above. The best approach is to ask our client which operations they want to perform on the data once it is in the system, and build test cases from that to cover each functionality with each group of data.
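The "each operation against each data group" idea can be organized as a simple check matrix. Everything here is hypothetical: `renew_contract` stands in for one client operation, and the two data groups echo the contract example from the data-comparison section.

```python
# Hypothetical stand-in for one operation the client performs on migrated data.
def renew_contract(contract):
    if contract["end_day"] not in range(1, 29):
        raise ValueError("unsupported contract end day")
    return {**contract, "renewed": True}

# One entry per data group identified during the comparison phase.
DATA_GROUPS = {
    "standard": [{"id": 1, "end_day": 28}],
    "mid_month_adjusted": [{"id": 2, "end_day": 15}],
}

def run_functional_checks():
    """Run every operation against every data group; collect failures."""
    failures = []
    for group, contracts in DATA_GROUPS.items():
        for contract in contracts:
            try:
                assert renew_contract(contract)["renewed"]
            except Exception as exc:
                failures.append((group, contract["id"], exc))
    return failures
```

An empty failure list for every group is the signal that the functional side of the cycle can be closed.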

Both types of tests must be executed in each cycle until no more errors are found and we can end the test stage.