We have made the difficult
decision to temporarily close
our books as we currently have
no capacity for new data
cleansing projects.

Data Merging & Data Migration Services

The Data Cleansing Group is an Australian provider of data merging and data migration services. We are passionate data geeks who will do whatever it takes to solve your data problems.

Our services include extracting data from legacy or modern systems, restructuring, reorganising and merging the data into a homogeneous format. In addition, this data can be cleansed, de-duplicated and verified before being exported into your target system’s format.

Sometimes Easy, Sometimes Difficult…

Data merging & migration is a serious task due to the way that data is uniquely defined, structured and identified within the source and target systems. These issues are further aggravated when multiple sources are merged together. Typical problems include:

  • Contradicting Records: (eg “Part #: 123; Description: Widgit”, “Part #: 123; Description: Shiny Widgit” – The same item had different definitions
  • Cryptic Abbreviations: (eg occupation: Mkt. D. B)
  • Duplicated Records: (eg. Employee: John Smith, J. Smith, J.A. Smith) – Same employee included three times
  • Embedded Values: Multiple values included in a single field (eg First Name=”Bob Smith Melbourne”
  • Illegal Values: (eg. an employees date of birth is 1 Jan 1860) – Values outside of the domain’s range.
  • Incorrect References: (eg. John Smith, Dept: 27) . Whilst Dept: 27 is defined, Mr Smith actually works in dept 41)
  • Measurement Units: Cost = 10000, but this can be in dollars in System 1 but in cents or another currency in System 2
  • Misfielded Values: (eg: City=”Australia”). Correct value placed in the incorrect field
  • Missing Values: (eg data not provided or has been replaced with null or dummy data)
  • Misspellings: (eg typos, transposed values and spelling mistakes)
  • Naming Conflicts: Same name used to describe different fields in the different systems, or different names describe the same field.
  • Value Representations: (Eg Male, Female vs M,F vs 1,2
  • Violated Dependencies (eg Age = 20, but date of birth = 1 Jan 1965) – In this case, the employees age does not equal the current date minus the date of birth)
  • Violated References: ( eg: “Name: Bob Smith; Department:199” but department 199 is not defined in the old system)
  • Violated Uniqueness: (eg: different employees with the same tax file number). Tax file numbers are unique for every individual
  • Word Transpositions: (eg: ‘Smith John’ instead of ‘John Smith’)

The seriousness of the above problems depend on the project brief. Some clients just want their data migrated “warts and all” to the target system. And once it’s in the target system, they continue as before or initiate an in-house data quality project. These tend to be quick and easy migration projects.

For other clients, the brief is that the data must be as-correct-as-it-can-practically-be. These tend to be more challenging and time consuming projects.

In either case, however one requires a sound data migration strategy to ensure a cost effective solution.

The Data Merging & data migration Cycle

A Logical Approach to Data Migration

A typical data migration and merging project includes the following phases:

  1. Data and Schema Analysis:
    • Data profiling provides an indication of data quality, and includes information such as data type, length, value range, discrete values and their frequency, uniqueness, typical data pattern, etc.
    • Data mining helps discover integrity constraints and business rules. These can be used to correct illegal values, identify duplicate records and rebuild missing values across data sources.
    • Schema review through discussions with domain experts, or system support forums, to obtain insight into the definition, structure and identities within the source and target data. Where this is not possible, we rely on the data mining results.
  2. Define data formats, mapping rules and transformation workflows: Results from Step 1 above are translated into a series of programmable instructions. Their purpose is to convert the data from its source to target format, plus resolve any associated data quality issues. These transformation steps may require user input where there is no applicable logic.
  3. Verification: The accuracy and effectiveness of the transformation steps are evaluated on the source data and the transformational steps tweaked if required. Steps 2 & 3 may require multiple iterations, since some errors may only become apparent after applying the initial transformations.
  4. Transformation: The verified transformation steps are applied. Any errors requiring manual identification are identified and resolved.
  5. Export cleaned data and import into target system: The cleaned and transformed data is exported, and imported into the target system.
  6. Test, Test, Test: Key reports from the target system are compared to those from the source system to ensure that the data was correctly transformed and uploaded. Significant errors may require a reiteration of Steps 1 to 6.

Typical Data Merging & Migration Projects

Our custom database migration tools and VBA scripts enable us to quickly automate many laborious processes, providing you with a high quality, cost effective solution. This service is targeted at organisations:

  • Migrating data from old to new systems.
  • Merging data from multiple sources into an existing system.
  • Extracting data from legacy systems for archive purposes.
  • Looking to clean up their database in order to improve decision making, reduce associated data costs, or improve their data’s ROI.

Our typical data merging & migration projects include:

  • CRM data.
  • Financial data.
  • Enterprise Resource Planning (ERP) data.
  • Health, Safety and Environment data.
  • Maintenance data.
  • Product & inventory data.
  • Production data.
  • Project data.
  • Server & internet log data.

Give your new system the best possible chance of success by starting with clean data foundation.