Skip to main content

Search Duplicate Customers (Match API)

Before creating a customer, the user has to perform a search to find whether similar record exists in the systems or not. If any such results are found, the system will display to the user to use the same one. If no record is found or user still wants to create another record, then they may proceed. Duplicate search should also happen while editing certain fields of the Customer.

How it works today (Post CMD Modernization) :

Elastic Search will be used in modernization to find the duplicate customer records, instead of Informatica Match Service. We’ll be hosting Elastic in Azure, and maintain the Customer data in the JSON Object format so that it can be used for indexing and searching.

Data :

  1. There will be an initial data load (IDL) from on-premise production to cloud tables. We’ll load the data in Elastic at the same time.
  2. With every subsequent create or update of Customer record, an entry will be made into Elastic index of Customer.

Services :

  1. One service for Indexing (insert or update) the Customer data into Elastic (Index API)
  2. One service for finding the duplicate entries based on filter criteria (Match API)

Ingest Service will store the Customer data into Relational schema of PostgreSQL, and give a call to Transformation service.

Transformation service will be responsible for inserting data into Hybrid schema of PostgreSQL (which is used for Search & Retrieve) , and also to Index (insert/update) entries into Elastic.

Match API will directly consume the data from Elastic.

Temporary Environment :

Below are the details of a temporary environment created for the purpose of POC and testing our wrapper services :

Elasticsearch URL : https://20.4.1.212:9200/

Kibana URL : http://20.4.1.212:5601/app/home#/

Username: elastic

Password: xBijp7ROOC1A4xRVXVvA

Transformation API <Index Customer records in Elastic>

Transformation API should store the same Customer JSON object in both Hybrid table and Elastic.

An indexing request should be sent to Elastic from Spring Boot based application.

Match API

API contract definition (OAS) : CUSTOMER SEARCH OAS

Acceptance criteria:

  • Duplicate match will always be within country and not cross country
  • The duplicate checks should be performed against all the customer status including Inactive
  • duplicate check should be done including customers which are in workflow

Match Rules :

Match Rules should be replicated the way they are presently configured and depicted in above table. For reference :

  1. Exact Country + Exact Tax Reference (without Tax Type)
  2. Exact Country + Exact Reference Number + Exact Reference Type
  3. Fuzzy Trading Name + Exact Country + Exact Phone Number
  4. Fuzzy trading name + Exact Country + Fuzzy City Name. (Currently Disabled)
  5. Fuzzy Address 1,2,3 (PO Box, ApartmentOrFloor, StreetName, StreetNumber, Subarea) + Fuzzy City + Fuzzy Trading Name + Exact Country (83% Match score is required)

Also note :

Value to null or null to null or null to value will be considered as match.

Technical Briefs :

Spring Boot provides support for creating wrapper services on top of Elastic.

Match rules can be sent as part of the Search Request to Elastic itself. For example, below request shows how we can customize the request to add filter attributes and match rules at the same time in the request :

The rules should be made easily configurable in the service and not tightly coupled / hard-coded.

Integrations of various APIs with Elastic & Match :

Acceptance criteria:

  • All potential customer found in above match category will carry workflow for ADS to review.
  • If a duplicate is entry found in the database is inactive then user should be given option to recreate / or proceed with own entered details (as is process)
  • In any case the system should allow user to proceed with own entered details (as is process)
  • If user selects to still proceed despite the duplicate check warning, then a workflow should be triggered to the Area Data Steward (as is process)
  • The workflow should reflect the duplicate check logic that is used for prompting the same eg. Duplicate marked due to TAX match found. Or Duplicate marked due to EORI reference found in existing database (as is process)
  • Duplicate check should be done including customers which are in workflow. (as is process)
  1. With each insert/update of Customer (only Ingest endpoint), there should be a call to insert/update record in Elastic too
  2. With each approval or rejection of workflow in Camunda, there should be a call to insert/update/delete the record in Elastic too
  3. From EMP Consumer of Customer on cloud, if we are consuming Customer feeds from other systems or from our on-premise feeds, then there should be a call to insert/update/delete the record in Elastic too
  4. CMD Portal should call Match API to find duplicates. If the duplicates are found and user still proceeds with record creation, CMD Portal should set the Workflow Indicator to “Duplicate”, while calling the Ingest Customer service. This will indicate the service that this record should land in workflow.
  5. Bulk upload should call Match API to find duplicates. If the duplicates are found and user still proceeds with record creation, CMD Portal should set the Workflow Indicator to “Duplicate”, while calling the Ingest Customer service. This will indicate the service that this record should land in workflow.
  6. Ingest Customer service should give a call to Match API to find duplicates, and if duplicates are found, then a workflow should be triggered. Only if the request is coming from CMD Portal, this check should be skipped as the check is present on CMD Portal already. For other systems, we’ll follow ingestion on “no-trust” basis i.e. we’ll always consider that Duplicate Search or Validation services were not used by the source system thus we’ll go through it before ingestion.

Below Epic is under discussion to further decide on few aspects of Match API.

MST6-2468 -CMD duplicate logic enhancement In Progress

How it used to work pre-CMD Modernization (Until Apr-2023) :

Duplicates are searched using specific combination of attributes in either Fuzzy or Exact way as mentioned below. These rules are executed one by one in the order mentioned below and results are returned based on matches found for each.

  1. Exact Country + Exact Tax Reference (without Tax Type)
  2. Exact Country + Exact Reference Number + Exact Reference Type
  3. Fuzzy Trading Name + Exact Country + Exact Phone Number
  4. Fuzzy trading name + Exact Country + Fuzzy City Name.
  5. Fuzzy Address 1,2,3 (PO Box, House Number, Street, Building No, Subarea) + Fuzzy City + Fuzzy State + Fuzzy Trading Name + Exact Country

Below table provide details about match rules used in current system. Provides details like match type, match columns and Database column names and fields used in match rules.

Match TypeRule DescriptionMatch ColumnsSelected ColumnsMatch TypeDB Col NameExisting Fields in SearchDuplicateCustomer
ExactExact Country + Tax reference (without TAX type)CTRYTax_NumCountryTax NumberECTRY_ROWIDTAX_NUMISOCountryCodeTaxNo
ExactExact Country + Reference Number + Reference TypeCTRYEX_REF_NUMCountryReference NumberReference Type CodeECTRY_ROWIDREF_NUMREF_TYPE_CDISOCountryCodeReferenceTypeNumberReferenceTypeCode
FuzzyFuzzy trading name + Exact Country + Phone Number + Fuzzy cityAttribute1CTRYEX_PH_NUM_CUSTOrganization_NameCity/TownCountryPhone NumberParty Role NameFEEFCITYCTRY_ROWIDPH_NUMPARTY_RL_NMCityISOCountryCodeNumberCustomerTradingName
FuzzyFuzzy Trading name + Exact country + Fuzzy CityTrading Name, Iso Country Code, City NameTrading NameCountryCityFEFCTRY_ROWIDCITY_ROWIDISOCountryCodeCustomerTradingNameCityName
FuzzyFuzzy Address Line 1,2,3Address_Part1PO BoxHouse NumberStreetAddress Line2Address Line3FPO_BOXHOUSE_NUMSTREET (StreetName)ADD_LN_2 (Building)ADD_LN_3 (Suburb)PoBoxHouseNoStreetNameBuildingSuburb
Fuzzy CityAttribute1City/TownFCITYCity
Exact CountryCTRYCountryECTRY_ROWIDISOCountryCode
Fuzzy Organization NameOrganization_NameParty Role NameFPARTY_RL_NMCustomerTradingName
Fuzzy Postal CodePostal_AreaPostal CodeFPSTCDPostalCode
With Match Score > 83
Was this page helpful?