Publish & PG-Elastic Sync Issues & Resolutions
This page helps with addressing issues in the following two categories:
- Publishing Customer, Contact, Concern and Customer Facility entities to downstream systems
- Syncing data from the main database (Postgres) to the Elastic database
Publish issues and resolutions
The table below lists the publishing issues reported so far in the PROD environment. Each issue is assigned an Error Category, and its resolution is given in the Resolution column.
Issues whose resolution is Code fix must be fixed by the technical team to prevent them from recurring.
Issues that need manual intervention must be fixed by our team. For these, we should also look at how to automate the fix to reduce our dependency on manual intervention.
Any new error category that appears in the producer_sync_track table must be documented here for further analysis.
| Sl. No | Count | Error Category | Resolution | Sample Error Message |
|---|---|---|---|---|
| 1 | 212 | Publish failed for this record | Manually reprocess parent | Publish failed for this record as parent Customer has not been published yet. It will be picked by scheduler for re-publishing after some time |
| 2 | 4923 | Publish to EMP topic failed | Manually check for code existence | Publish to EMP topic failed, could not find the customer with customer code: IN03804724 |
| 3 | 8954 | geoCityCode | Manually check geocode | geoCityCode : Exception. Customer published, but without geoCityCode |
| 4 | 4 | Exception occurred while publishing | Code fix | Exception occurred while publishing customer GB02564328 :: 500 Internal Server Error from POST https://cmdprodpublishcustomerapi.trafficmanager.net/global-mdm/customer/publish?customerCode=GB02564328&eventDetails=CREATE |
| 5 | 0 | Index 0 out of bounds for length 0 | Code fix | Index 0 out of bounds for length 0 |
| 6 | 118 | com.maersk.smds.cmd.exceptions | Manual verification | com.maersk.smds.cmd.exceptions.BadRequestException: Customer with code IN03805825 not found |
| 7 | 1001 | Not able to call the (customer/contact/concern) retrieve endpoint | Code fix | Not able to call the customer retrieve endpointAn unexpected error occurred while trying to retrieve customer with code CN06492766 |
| 8 | 0 | 500 Internal Server Error | Code fix | 500 Internal Server Error from POST https://cmdsitproduceconcernapi.trafficmanager.net/global-mdm/concerns/bulk/publish |
| 9 | 96 | Error serializing Avro message | Code fix | Error serializing Avro message |
| 10 | 12 | Failed to construct kafka producer | Code fix | Failed to construct kafka producer |
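The categorisation against this table can be scripted. The following is a minimal Python sketch, assuming error messages are read from producer_sync_track.error_msg: it reuses the SQL LIKE patterns from the uncategorised-errors query below, translates each one into a regular expression, and returns the category and resolution of the first pattern that matches (None means a new category). The names PUBLISH_CATEGORIES, like_to_regex and categorise are hypothetical and not part of any existing tooling.

```python
import re
from typing import Optional, Tuple

# (SQL LIKE pattern, error category, resolution), taken from the publish issues
# table and the exclusion list of the uncategorised-errors query.
PUBLISH_CATEGORIES = [
    ("Publish failed for this record%", "Publish failed for this record", "Manually reprocess parent"),
    ("Publish to EMP topic failed%", "Publish to EMP topic failed", "Manually check for code existence"),
    ("geoCityCode%", "geoCityCode", "Manually check geocode"),
    ("%Exception occurred while publishing %", "Exception occurred while publishing", "Code fix"),
    ("Index 0 out of bounds for length 0%", "Index 0 out of bounds for length 0", "Code fix"),
    ("com.maersk.smds.cmd.exceptions%", "com.maersk.smds.cmd.exceptions", "Manual verification"),
    ("Not able to call the %", "Not able to call the (customer/contact/concern) retrieve endpoint", "Code fix"),
    ("500 Internal Server Error%", "500 Internal Server Error", "Code fix"),
    ("Error serializing Avro message%", "Error serializing Avro message", "Code fix"),
    ("Failed to construct kafka producer%", "Failed to construct kafka producer", "Code fix"),
]


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern ('%' = any run of chars, '_' = one char) to a regex."""
    parts = [".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts), re.DOTALL)


_COMPILED = [(like_to_regex(p), cat, res) for p, cat, res in PUBLISH_CATEGORIES]


def categorise(error_msg: str) -> Optional[Tuple[str, str]]:
    """Return (category, resolution) for a known error, or None for a new category."""
    for regex, category, resolution in _COMPILED:
        # LIKE must match the whole string, hence fullmatch rather than search.
        if regex.fullmatch(error_msg):
            return category, resolution
    return None
```

The order of the list matters only if two patterns could match the same message; with the current patterns each sample message matches exactly one.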
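The queries below can be used to investigate the tracked issues and categorise them as in the table above. The first query lists unprocessed records whose error message matches none of the known categories (that is, candidates for a new category). The second counts the unprocessed records in one category, Publish failed for this record; change the LIKE pattern to count another category.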
select * from mdm_smdsmd.producer_sync_track where processed='N' and error_msg not like 'Publish to EMP topic failed%' and error_msg not like 'Publish failed for this record%' and error_msg not like 'Not able to call the %' and error_msg not like 'geoCityCode%' and error_msg not like '%Exception occurred while publishing %' and error_msg not like 'Index 0 out of bounds for length 0%' and error_msg not like 'com.maersk.smds.cmd.exceptions%' and error_msg not like '500 Internal Server Error%' and error_msg not like 'Error serializing Avro message%' and error_msg not like 'Failed to construct kafka producer%';
select count(*) from mdm_smdsmd.producer_sync_track where processed='N' and error_msg like 'Publish failed for this record%' ;
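Both queries hard-code the category list, so every new category means editing the exclusion list by hand. A possible sketch, assuming the patterns are kept in a single Python list (the names KNOWN_PUBLISH_PATTERNS, build_uncategorised_query and build_category_count_query are hypothetical): it regenerates the uncategorised-errors query exactly as written above, and also builds one CASE ... GROUP BY query that counts every category in a single pass, which could be used to refresh the Count column. The quote doubling only guards against patterns that contain a single quote; the patterns are assumed to be trusted constants, not user input.

```python
from typing import List

# Same patterns, in the same order, as the exclusion list of the
# uncategorised-errors query on this page.
KNOWN_PUBLISH_PATTERNS: List[str] = [
    "Publish to EMP topic failed%",
    "Publish failed for this record%",
    "Not able to call the %",
    "geoCityCode%",
    "%Exception occurred while publishing %",
    "Index 0 out of bounds for length 0%",
    "com.maersk.smds.cmd.exceptions%",
    "500 Internal Server Error%",
    "Error serializing Avro message%",
    "Failed to construct kafka producer%",
]


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_uncategorised_query(table: str, patterns: List[str]) -> str:
    """Unprocessed rows whose error_msg matches none of the known patterns."""
    conditions = ["processed='N'"]
    conditions += [f"error_msg not like {sql_literal(p)}" for p in patterns]
    return f"select * from {table} where " + " and ".join(conditions) + ";"


def build_category_count_query(table: str, patterns: List[str]) -> str:
    """One count per known pattern (first match wins), plus an UNCATEGORISED bucket."""
    whens = " ".join(
        f"when error_msg like {sql_literal(p)} then {sql_literal(p)}" for p in patterns
    )
    return (
        f"select case {whens} else 'UNCATEGORISED' end as category, count(*) "
        f"from {table} where processed='N' group by 1 order by 2 desc;"
    )
```

Adding a category then becomes a one-line change to the list, and both generated queries stay in step with it.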
Steps to follow:
- Run the query below to check for new issues tracked in the last 10 days:
select * from mdm_smdsmd.producer_sync_track where processed='N' and create_time > CURRENT_DATE-10;
- Categorise the errors using the categories in the table above
- If an error does not fit any existing category, add a new category to the table together with its resolution
- If the issue needs a code fix, assign it to the Dev team
- If the issue needs a manual fix, perform the manual activity and look at how it can be automated
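The steps above can be sketched as a small triage routine. This assumes the rows have already been fetched as (create_time, error_msg) pairs with create_time as a plain timestamp, and that resolve is any function that maps an error message to its Resolution column value, or returns None for an unknown category. The bucket names dev_team, manual and new_category, like the function names, are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, time, timedelta
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Tuple[datetime, str]  # (create_time, error_msg)


def cutoff_for(today: date, days: int = 10) -> datetime:
    """CURRENT_DATE - 10 compared with a timestamp means midnight, 10 days ago."""
    return datetime.combine(today - timedelta(days=days), time.min)


def triage(
    rows: Iterable[Row],
    today: date,
    resolve: Callable[[str], Optional[str]],
    days: int = 10,
) -> Dict[str, List[str]]:
    """Route recent errors: code fixes to the Dev team, the rest to manual work,
    and anything without a known resolution to new-category analysis."""
    cutoff = cutoff_for(today, days)
    buckets: Dict[str, List[str]] = defaultdict(list)
    for create_time, error_msg in rows:
        if create_time <= cutoff:
            continue  # mirrors the strict "create_time > CURRENT_DATE-10" filter
        resolution = resolve(error_msg)
        if resolution is None:
            buckets["new_category"].append(error_msg)
        elif resolution == "Code fix":
            buckets["dev_team"].append(error_msg)
        else:
            buckets["manual"].append(error_msg)
    return dict(buckets)
```

Passing resolve in as a parameter keeps the routing independent of how messages are matched, so the same routine works for both the publish and the PG-Elastic tracking tables.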
PG-Elastic sync issues and resolutions
The table below lists the issues reported so far in the PROD environment for the automatic sync from Postgres to the Elastic database. Each issue is assigned an Error Category, and its resolution is given in the Resolution column.
Issues whose resolution is Code fix must be fixed by the technical team to prevent them from recurring.
Issues that need manual intervention must be fixed by our team. For these, we should also look at how to automate the fix to reduce our dependency on manual intervention.
Any new error category that appears in the pg_elk_sync_trck table must be documented here for further analysis.
| Sl. No | Count | Error Category | Entity | Resolution | Sample Error Message |
|---|---|---|---|---|---|
| 1 | 1 | timeout on connection | FCLTY | Code fix | 10,000 milliseconds timeout on connection http-outgoing-40 [ACTIVE] |
| 2 | 2 | Facility information is not available | FCLTY | Manual Verification | Facility information is not available in write db |
| 3 | 3542 | Customer information is not available | CUST | Manual Verification | Customer information is not available in write database |
| 4 | 2 | timeout on connection | CUST | Code fix | 10,000 milliseconds timeout on connection http-outgoing-602 [ACTIVE] |
| 5 | 2 | Contact information is not available | CONT | Manual Verification | Contact information is not available in write db |
| 6 | 1 | Elasticsearch exception | CONT | Code fix | Elasticsearch exception [type=mapper_parsing_exception, reason=The number of nested documents has exceeded the allowed limit of [10000]. This limit can be set by changing the [index.mapping.nested_objects.limit] index level setting.] |
| 7 | 1 | timeout on connection | CONT | Code fix | 10,000 milliseconds timeout on connection http-outgoing-1551 [ACTIVE] |
| 8 | 507 | Concern information is not available | CNCRN | Manual Verification | Concern information is not available in write database |
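Unlike the publish errors, these categories depend on the entity type as well as the message: the same timeout occurs for several entities, and each "information is not available" category belongs to one entity. A minimal Python sketch, assuming the entity_type codes are exactly those in the Entity column; the names ENTITY_LABELS and categorise_sync_error are hypothetical. A message whose entity name does not match its entity_type returns None, so it is flagged for review rather than silently counted.

```python
from typing import Optional, Tuple

# Entity codes from the Entity column, mapped to the name used in the messages.
ENTITY_LABELS = {
    "FCLTY": "Facility",
    "CUST": "Customer",
    "CONT": "Contact",
    "CNCRN": "Concern",
}


def categorise_sync_error(error_msg: str, entity_type: str) -> Optional[Tuple[str, str]]:
    """Return (error category, resolution) for a PG-Elastic sync error, or None."""
    # Timeouts carry a varying connection id, so match on the stable substring.
    if "timeout on connection" in error_msg:
        return "timeout on connection", "Code fix"
    if error_msg.startswith("Elasticsearch exception"):
        return "Elasticsearch exception", "Code fix"
    label = ENTITY_LABELS.get(entity_type)
    category = f"{label} information is not available" if label else None
    # Messages end in either "write db" or "write database", so match the prefix only.
    if category and error_msg.startswith(category):
        return category, "Manual Verification"
    return None
```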
The query below counts the unprocessed records for each combination of error message and entity type, so they can be matched to the categories in the table above.
select error_msg, entity_type, count(error_msg) from mdm_smdsmd.pg_elk_sync_trck where processed='N' group by error_msg, entity_type;
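One caveat with grouping on the raw error_msg: timeout messages embed a connection id (http-outgoing-40, http-outgoing-602, ...), so each timeout can show up as its own row. The sketch below, assuming rows are (error_msg, entity_type) pairs fetched from pg_elk_sync_trck, replaces the id with a fixed placeholder before counting; the names normalise and group_errors are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches the per-connection id in messages like "... http-outgoing-40 [ACTIVE]".
_CONNECTION_ID = re.compile(r"http-outgoing-\d+")


def normalise(error_msg: str) -> str:
    """Replace the connection id so all timeouts share one message text."""
    return _CONNECTION_ID.sub("http-outgoing-N", error_msg)


def group_errors(rows: Iterable[Tuple[str, str]]) -> Counter:
    """Count (normalised error_msg, entity_type) pairs, like the GROUP BY query."""
    return Counter((normalise(msg), entity) for msg, entity in rows)
```

The same normalisation could be done in the query itself with the Postgres function regexp_replace, for example by selecting and grouping on regexp_replace(error_msg, 'http-outgoing-[0-9]+', 'http-outgoing-N') instead of error_msg. Without the 'g' flag it replaces only the first occurrence, which is enough for these messages.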
Steps to follow:
- Run the query below to check for new issues tracked in the last 10 days:
select * from mdm_smdsmd.pg_elk_sync_trck where processed='N' and create_time > CURRENT_DATE-10;
- Categorise the errors using the categories in the table above
- If an error does not fit any existing category, add a new category to the table together with its resolution
- If the issue needs a code fix, assign it to the Dev team
- If the issue needs a manual fix, perform the manual activity and look at how it can be automated