General Cleansing Guidelines and Process


General Cleansing Guidelines and Process


General Cleansing Guidelines
Before data cleansing can begin, the data must be purified. This means finding and removing duplicate records, identifying incorrect data, and finding all records that are incomplete.


Duplicate entries include those with the same fixed asset, vendor, customer, etc. that are listed twice or more in the system. One or more of the data elements should be flagged so it’s not included in the file to be extracted. Merge the duplicated data or delete the duplicates.


Obsolete or Inactive Records

This refers to data that’s not up to date or is no longer active. Obsolete data should be removed, purged or deleted in the legacy CRM system. The rules to consider a record obsolete are:

  • Customers: No activity in the last (x) years.

  • Vehicle Ownership: Customer no longer owns the vehicle for (x) years.

Incorrect Data

Incorrect data includes data-entry errors such as spelling mistakes and stylistic inconsistencies (e.g. 2nd Street vs. Second Street, or Inc. vs Corporation). The file should be reviewed and corrected manually. If the error exists in more than one record, it may be possible to correct it automatically. Contact iSITE technical support for more information.

Incomplete Records

This refers to missing data in the current legacy CRM system. Incomplete records must be corrected, as iSITE may require some of this data.


Cleaning Process

These are the steps for the cleansing process:

  1. Run the corresponding legacy system report and download it to a .csv file.

  2. Depending on the size and/or complexity of the data file, manually or programmatically determine duplicates, as well as obsolete, incorrect or incomplete records.

  3. Correct records as detailed above. If necessary, consult with iSITE technical support.

  4. Report status per project plan and metrics sheet.

Data That Should Be Cleansed Based on SAP Requirements

Detailed data mapping and an understanding of iSITE data fields are necessary. The client and CRM provider will receive the corresponding support from the iSITE team to understand the requirements and complete field mapping. The guidelines listed below may be revised and customized for each conversion object.

Missing Required Values or Intermittent Data

The current system doesn’t require a particular field, so it has been left blank, or a given field should be filled per up-to-date procedure, but it’s skipped because the information wasn’t known at the time of data entry. This field is required in SAP per defined business process. Cleansing is required. It might be possible to automatically populate the field:

  • By plugging in a constant value, or

  • By referencing some other file to look up the information. If not, it will need manual data cleansing. Contact iSITE technical support for assistance.

Overloaded Data Fields
Two organizations use the same field to store two different elements of information. This requires cleansing in one or both data records.

Compound Data Fields

The current system doesn’t provide a separate field for a desired piece of information, so the information is stored sharing a field with other information. Example: the system includes a Contact field for an individual’s name. The system doesn’t include a separate field for the contact’s telephone number, so both the name and phone number are stored in the Contact field. It may not be possible to separate the two values reliably. Manual cleansing may be required.

Inconsistent Similar Data

Similar data is entered into separate or independent systems. Example: Two departments define projects in their systems. The same type of project-related data is entered into different systems, but the data format is different because they haven’t been validated against each other or a central system. Cleansing is required in one or both databases, depending on what the field will be used for.

Free-Form Text Fields

Free-form text fields may have data that varies in meaning, depending on the user who entered the data into the CRM system, third-party lead notes, etc. Data cleansing may be required.

Different Data Values Represent the Same Thing

Inconsistencies due to different data structures are used in different source systems. The typical problems in this instance include using different data values to represent the same thing (e.g. System A uses 1 for “yes,” System B uses Y for “yes” and System C uses a flag for “yes.”) Cleansing is required in one database file, depending on what the field will be mapped to within the CRM platform.

Intelligent Data Fields

Various positions of the data field imply additional information. CRM typically provides a separate field for the implied additional information. If there is a regular pattern to the coding, the separation can probably be done programmatically. If not, it may require manual conversion. The team will determine the solution based on the circumstances.

Encoded Data Fields

The data field in the current system contains a code to represent a full value, and SAP requires the full value or a different code to represent the same value. Example: A system that includes a one-character field named “Name Prefix”, where a code of “1” indicates “Mr.,” a code of “2” indicates “Miss,” and a code of “3” indicates “Mrs.” CRM wants the full value (“Mr.”, “Mrs.” or “Miss”), not the code. The full value can be generated programmatically from a look-up table. iSITE will propose a suggested solution.


A data field in the CRM system contains a value unique to the record. Example: The global customer ID number in VinSolutions changes if a customer record is merged with another record. This will hinder the ability to map iSITE-appended data back into the CRM and match it to a global customer ID number, creating duplicate and/or new records that will need to be reviewed manually, merged or deleted. Manual data cleansing will be required.

Field Lengths

The length of the data field in the appended data is longer than the corresponding field the CRM will support. Example: The current system has a Description field of 30 characters and the CRM has a Description field of 24 characters. The field can be unilaterally truncated or evaluated by a human and abbreviated to retain maximum readability. Manual data cleansing may be required.

Data Requiring Translation Tables

A valid field entry in the legacy data export isn’t utilized/modified in the iSITE data hygiene/append process. Establish the need for a translation table in the data cleansing procedures and describe its fields and valid entries. Data must be retained in the source file for re-upload to CRM.


Cleansing Process

These are the steps for the cleansing process:

  1. Attend a meeting to gain an understanding of iSITE data field requirements.

  2. Team up with iSITE to export legacy CRM system data. An Excel spreadsheet tool will be used to create the file to be extracted.

  3. Run corresponding CRM legacy system export report and download data to a .csv spreadsheet per previously defined data file requirements.

  4. Depending on the size and/or complexity of the data file, determine the data to be cleansed programmatically or manually as per the guidelines stated above.

  5. Correct records per the solutions suggested above. If necessary, consult with your iSITE and CRM technical support contacts.

  6. Report status to the deployment team per project plan and metrics sheet.