Search Duplicate Customers (Match API)
Before creating a customer, the user has to perform a search to find whether similar record exists in the systems or not. If any such results are found, the system will display to the user to use the same one. If no record is found or user still wants to create another record, then they may proceed. Duplicate search should also happen while editing certain fields of the Customer.
How it works today (Post CMD Modernization) :
Elastic Search will be used in modernization to find the duplicate customer records, instead of Informatica Match Service. We’ll be hosting Elastic in Azure, and maintain the Customer data in the JSON Object format so that it can be used for indexing and searching.
Data :
- There will be an initial data load (IDL) from on-premise production to cloud tables. We’ll load the data in Elastic at the same time.
- With every subsequent create or update of Customer record, an entry will be made into Elastic index of Customer.
Services :
- One service for Indexing (insert or update) the Customer data into Elastic (Index API)
- One service for finding the duplicate entries based on filter criteria (Match API)
Ingest Service will store the Customer data into Relational schema of PostgreSQL, and give a call to Transformation service.
Transformation service will be responsible for inserting data into Hybrid schema of PostgreSQL (which is used for Search & Retrieve) , and also to Index (insert/update) entries into Elastic.
Match API will directly consume the data from Elastic.
Temporary Environment :
Below are the details of a temporary environment created for the purpose of POC and testing our wrapper services :
Elasticsearch URL : https://20.4.1.212:9200/
Kibana URL : http://20.4.1.212:5601/app/home#/
Username: elastic
Password: xBijp7ROOC1A4xRVXVvA
Transformation API <Index Customer records in Elastic>
Transformation API should store the same Customer JSON object in both Hybrid table and Elastic.
An indexing request should be sent to Elastic from Spring Boot based application.
Match API
API contract definition (OAS) : CUSTOMER SEARCH OAS
Acceptance criteria:
- Duplicate match will always be within country and not cross country
- The duplicate checks should be performed against all the customer status including Inactive
- duplicate check should be done including customers which are in workflow
Match Rules :
Match Rules should be replicated the way they are presently configured and depicted in above table. For reference :
- Exact Country + Exact Tax Reference (without Tax Type)
- Exact Country + Exact Reference Number + Exact Reference Type
- Fuzzy Trading Name + Exact Country + Exact Phone Number
- Fuzzy trading name + Exact Country + Fuzzy City Name. (Currently Disabled)
- Fuzzy Address 1,2,3 (PO Box, ApartmentOrFloor, StreetName, StreetNumber, Subarea) + Fuzzy City + Fuzzy Trading Name + Exact Country (83% Match score is required)
Also note :
Value to null or null to null or null to value will be considered as match.
Technical Briefs :
Spring Boot provides support for creating wrapper services on top of Elastic.
Match rules can be sent as part of the Search Request to Elastic itself. For example, below request shows how we can customize the request to add filter attributes and match rules at the same time in the request :
The rules should be made easily configurable in the service and not tightly coupled / hard-coded.
Integrations of various APIs with Elastic & Match :
Acceptance criteria:
- All potential customer found in above match category will carry workflow for ADS to review.
- If a duplicate is entry found in the database is inactive then user should be given option to recreate / or proceed with own entered details (as is process)
- In any case the system should allow user to proceed with own entered details (as is process)
- If user selects to still proceed despite the duplicate check warning, then a workflow should be triggered to the Area Data Steward (as is process)
- The workflow should reflect the duplicate check logic that is used for prompting the same eg. Duplicate marked due to TAX match found. Or Duplicate marked due to EORI reference found in existing database (as is process)
- Duplicate check should be done including customers which are in workflow. (as is process)
- With each insert/update of Customer (only Ingest endpoint), there should be a call to insert/update record in Elastic too
- With each approval or rejection of workflow in Camunda, there should be a call to insert/update/delete the record in Elastic too
- From EMP Consumer of Customer on cloud, if we are consuming Customer feeds from other systems or from our on-premise feeds, then there should be a call to insert/update/delete the record in Elastic too
- CMD Portal should call Match API to find duplicates. If the duplicates are found and user still proceeds with record creation, CMD Portal should set the Workflow Indicator to “Duplicate”, while calling the Ingest Customer service. This will indicate the service that this record should land in workflow.
- Bulk upload should call Match API to find duplicates. If the duplicates are found and user still proceeds with record creation, CMD Portal should set the Workflow Indicator to “Duplicate”, while calling the Ingest Customer service. This will indicate the service that this record should land in workflow.
- Ingest Customer service should give a call to Match API to find duplicates, and if duplicates are found, then a workflow should be triggered. Only if the request is coming from CMD Portal, this check should be skipped as the check is present on CMD Portal already. For other systems, we’ll follow ingestion on “no-trust” basis i.e. we’ll always consider that Duplicate Search or Validation services were not used by the source system thus we’ll go through it before ingestion.
Below Epic is under discussion to further decide on few aspects of Match API.
MST6-2468 -CMD duplicate logic enhancement In Progress
How it used to work pre-CMD Modernization (Until Apr-2023) :
Duplicates are searched using specific combination of attributes in either Fuzzy or Exact way as mentioned below. These rules are executed one by one in the order mentioned below and results are returned based on matches found for each.
- Exact Country + Exact Tax Reference (without Tax Type)
- Exact Country + Exact Reference Number + Exact Reference Type
- Fuzzy Trading Name + Exact Country + Exact Phone Number
- Fuzzy trading name + Exact Country + Fuzzy City Name.
- Fuzzy Address 1,2,3 (PO Box, House Number, Street, Building No, Subarea) + Fuzzy City + Fuzzy State + Fuzzy Trading Name + Exact Country
Below table provide details about match rules used in current system. Provides details like match type, match columns and Database column names and fields used in match rules.
| Match Type | Rule Description | Match Columns | Selected Columns | Match Type | DB Col Name | Existing Fields in SearchDuplicateCustomer |
|---|---|---|---|---|---|---|
| Exact | Exact Country + Tax reference (without TAX type) | CTRYTax_Num | CountryTax Number | E | CTRY_ROWIDTAX_NUM | ISOCountryCodeTaxNo |
| Exact | Exact Country + Reference Number + Reference Type | CTRYEX_REF_NUM | CountryReference NumberReference Type Code | E | CTRY_ROWIDREF_NUMREF_TYPE_CD | ISOCountryCodeReferenceTypeNumberReferenceTypeCode |
| Fuzzy | Fuzzy trading name + Exact Country + Phone Number + Fuzzy city | Attribute1CTRYEX_PH_NUM_CUSTOrganization_Name | City/TownCountryPhone NumberParty Role Name | FEEF | CITYCTRY_ROWIDPH_NUMPARTY_RL_NM | CityISOCountryCodeNumberCustomerTradingName |
| Fuzzy | Fuzzy Trading name + Exact country + Fuzzy City | Trading Name, Iso Country Code, City Name | Trading NameCountryCity | FEF | CTRY_ROWIDCITY_ROWID | ISOCountryCodeCustomerTradingNameCityName |
| Fuzzy | Fuzzy Address Line 1,2,3 | Address_Part1 | PO BoxHouse NumberStreetAddress Line2Address Line3 | F | PO_BOXHOUSE_NUMSTREET (StreetName)ADD_LN_2 (Building)ADD_LN_3 (Suburb) | PoBoxHouseNoStreetNameBuildingSuburb |
| Fuzzy City | Attribute1 | City/Town | F | CITY | City | |
| Exact Country | CTRY | Country | E | CTRY_ROWID | ISOCountryCode | |
| Fuzzy Organization Name | Organization_Name | Party Role Name | F | PARTY_RL_NM | CustomerTradingName | |
| Fuzzy Postal Code | Postal_Area | Postal Code | F | PSTCD | PostalCode | |
| With Match Score > 83 |