Data Architecture - City Districts
Jira item
Requirement
Copied from Jira item:
- District level information about cities is needed for pricing reasons so that more granular pricing can be derived and rates set up involving districts and not only city level. The pricing engine will use the district reference data for cities.
- District level information is needed for populating addresses for facilities so that it can be determined in what district a pick-up or drop off will occur.
Introduction
Two options are presented for the data architecture of District. The reason is that the data domain for Geographic Defined Area can already capture any type of defined area, be it districts, regions, cantons etc. However, this data structure is often used for Business Defined Areas, and it is slightly more complex to use because of its extensibility.
It is important to note that prior to defining a data architecture a proper business analysis should have been carried out to identify how cities are actually subdivided in the real world. It is not known at the time of writing if such analysis has actually been done before stating the business requirement that cities must be able to have sub-cities (which are now referred to as Districts).
Solution
Domain model explanation
The domain “Geographic Defined Area” in the Maersk Domain Model has a data structure called “Defined Area”. A defined area can be subtyped into a class called “District”, which can then have a name and a code.
The District class can then be related to City in a one-to-many relationship, so that any number of districts can be created within a city.
Because a defined area can have relationships to other defined areas it will also be possible to create defined areas within districts, if such lower level granularity is ever needed. In other words, the defined area relationship structure allows for any number of levels in a hierarchy of geographic areas.
At the defined area level, the “Defined Area Type” class has a recursive relationship, which allows the “fixed” hierarchy to be set up as reference data, for example to persist that City is the parent type of District. This reference data can then drive the user interface implementation so that only defined areas of type District can be created for a City.
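The parent/child rule described above can be sketched as a small lookup. This is only an illustration: the type codes `DISTRICT` and `CITY` and the function names are hypothetical and not taken from the domain model or the master data hub.

```python
# Hypothetical reference data: each defined area type code maps to its parent type code.
# The recursive relationship on Defined Area Type is represented as child -> parent.
PARENT_TYPE = {
    "DISTRICT": "CITY",
}


def can_create_under(child_type: str, parent_type: str) -> bool:
    """Return True if the reference data allows child_type directly under parent_type."""
    return PARENT_TYPE.get(child_type) == parent_type


def type_path(type_code: str) -> list[str]:
    """Walk the recursive parent relationship from a type up to its root type."""
    path = [type_code]
    seen = {type_code}
    while path[-1] in PARENT_TYPE:
        parent = PARENT_TYPE[path[-1]]
        if parent in seen:  # guard against a cycle in the reference data
            raise ValueError(f"cycle in defined area type hierarchy at {parent}")
        path.append(parent)
        seen.add(parent)
    return path
```

A user interface could call something like `can_create_under` before offering the "create district" action for a given parent. Adding a lower level later would then be a reference data change rather than a code change.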
It should also be noted that Defined Area inherits from Location, which means it has all the attributes and relationships that Location has. Among these are the Perimeter Points, which allow a Location to define its geographic boundary in terms of longitude and latitude points. This could become important for routing purposes, for example to derive approximate distances between districts when defining pricing. Another important data structure that would be inherited is the Location Relationship, which allows any subclass of Location to relate to other locations, including relations between Facilities and Cities and therefore also between Facilities and Districts.
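As an illustration of how perimeter points could feed an approximate distance, the sketch below averages each perimeter into a centre point and applies the haversine great-circle formula. The representation of a perimeter as a list of (latitude, longitude) tuples and all function names are assumptions made for this example; averaging the points is a rough approximation, not a true polygon centroid.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def centroid(perimeter):
    """Approximate a centre as the plain average of (latitude, longitude) perimeter points."""
    lats = [lat for lat, _ in perimeter]
    lons = [lon for _, lon in perimeter]
    return sum(lats) / len(lats), sum(lons) / len(lons)


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def approximate_distance_km(perimeter_a, perimeter_b):
    """Distance between the approximate centres of two perimeters."""
    return haversine_km(centroid(perimeter_a), centroid(perimeter_b))
```

This gives a straight-line estimate only. Road distance, which pricing may actually need, would require a routing source.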
| Advantages | Disadvantages |
|---|---|
| Extensible without data tier and code changes<br>Inherits capabilities of Location, including perimeter and location relationships<br>Modeled in the Maersk Domain Model<br>Proven via previous implementations | Medium complexity, slightly harder to comprehend than fixed data hierarchies<br>Consumer will have to iterate or filter the array to get districts |
Domain model
The domain model for Geographic Defined Area can be found here.
The following additions to the domain model are needed to support this option:
- Add District subtype to Geographic Defined Area Model
- Add relationship between Postal Address and District with (0..* to 0..1)
JSON sample
To show how the JSON data could appear when using District (inheriting from Defined Area), a sample has been produced below. Please note that this sample is only a mock and not a complete piece of data, as additional data may accompany the defined area data when it is requested via API or emitted via an event.
The sample starts from Paris at City level, with 2 of its 20 Arrondissements created as Districts. See this article for how Paris is divided into subdivisions.
{
  "cityName": "Paris",
  "cityCode": "FRPAR",
  "districts": [
    {
      "districtCode": "FRPAR-LOU",
      "districtName": "Louvre"
    },
    {
      "districtCode": "FRPAR-BOU",
      "districtName": "Bourse"
    }
  ]
}
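One of the listed disadvantages is that a consumer has to iterate or filter the array to get a district. A minimal sketch of what that could look like on the consumer side, using the mock sample above (the function name `find_district` is hypothetical):

```python
import json

# The mock sample, reproduced so the sketch is self-contained.
SAMPLE = """
{
  "cityName": "Paris",
  "cityCode": "FRPAR",
  "districts": [
    {"districtCode": "FRPAR-LOU", "districtName": "Louvre"},
    {"districtCode": "FRPAR-BOU", "districtName": "Bourse"}
  ]
}
"""


def find_district(city: dict, district_code: str):
    """Return the district with the given code, or None if the city has no such district."""
    for district in city.get("districts", []):
        if district["districtCode"] == district_code:
            return district
    return None


city = json.loads(SAMPLE)
louvre = find_district(city, "FRPAR-LOU")
```

A city without any districts is handled by treating a missing `districts` array as empty.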
Data Coding
Each district must have a unique code assigned. This could be the city code concatenated with letters taken from the name of the district, e.g. if the city code is FRPAR and the district name is Louvre, the district code could become FRPAR-LOU. The hyphen is included specifically to avoid confusion with site codes, which have similar intelligence in the code but append e.g. TRM to the city code to form FRPARTRM.
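A possible way to propose such a code is sketched below. The text only says "letters taken from the name", so taking the first three letters is an assumption of this example, as are the function name and the `taken` set standing in for a uniqueness check against existing district codes.

```python
def propose_district_code(city_code: str, district_name: str, taken: set[str]) -> str:
    """Propose '<cityCode>-<first three letters of the name>' and fail if it is already in use."""
    letters = [ch for ch in district_name.upper() if ch.isalpha()]
    if len(letters) < 3:
        raise ValueError(f"district name {district_name!r} has fewer than three letters")
    code = f"{city_code}-{''.join(letters[:3])}"
    if code in taken:
        # Two names can share the same first letters, so a person has to pick the code.
        raise ValueError(f"{code} is already assigned; choose the letters manually")
    return code
```

Raising on a collision rather than silently picking other letters keeps the decision with a data steward, which seems safer for codes that consumers will key on.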
Data sourcing
It may be possible to source district data from external providers (OpenStreetMap, Google Maps or other providers) to ease the burden of maintaining or inventing it. There is no single authoritative source of all administrative regions (the typical name for what Maersk refers to as District). They appear to be managed at local city level within the country and decided by the city itself, e.g. the Arrondissements of Paris are set by the city council and have been redrawn over the years.
It should be recognised though, that some districts may be the creation of Maersk for Maersk purposes and may not be something that is mirrored in the real world.
Data migration
Data values that are currently stored as so-called sub-city, either in their own field or directly in the city field in existing solutions, must be migrated to the new district data structure.
There is a data migration exercise to be done in the enterprise master data hub, but also in all consumers of the data that have local copies of the sub-city data.
It will be necessary to keep the existing city code assigned as an alternative code for the district. For example, if the Louvre district is today created as a city:
{
  "cityName": "Louvre",
  "cityCode": "FRLOU"
}
The migrated data would then look like this (not showing that the district sits under Paris, only the data for the district itself):
{
  "districtCode": "FRPAR-LOU",
  "districtName": "Louvre",
  "alternativeCodes": [
    {
      "alternativeCode": "FRLOU",
      "alternativeCodeType": {
        "alternativeCodeTypeCode": "LEGACY_SUB_CITY_CODE"
      }
    }
  ]
}
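The mapping from the legacy record to the migrated record can be sketched as a small function. The function name is hypothetical, and the new district code is passed in because the source does not define how it is assigned during migration.

```python
def migrate_sub_city(legacy_city: dict, district_code: str) -> dict:
    """Build the district record, keeping the legacy city code as an alternative code."""
    return {
        "districtCode": district_code,
        "districtName": legacy_city["cityName"],
        "alternativeCodes": [
            {
                "alternativeCode": legacy_city["cityCode"],
                "alternativeCodeType": {
                    "alternativeCodeTypeCode": "LEGACY_SUB_CITY_CODE"
                },
            }
        ],
    }
```

Applied to the Louvre example, `migrate_sub_city({"cityName": "Louvre", "cityCode": "FRLOU"}, "FRPAR-LOU")` yields the migrated structure shown above.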
Integrations
Consumers of the existing solution for what is today called sub-city will have to change their implementation to cater for the district data structure. Any existing versions of APIs and events must be kept alive for a grace period.
Involved solutions
It is important to note that the consumers of this data may not be receiving the data directly from SMDS. For example, the SSIB solution uses the Synergy solution for master data and reference data. Synergy consumes data from SMDS via events, stores it in a Postgres database, and then publishes the data via APIs to http://Maersk.com solutions. It is important for the planning exercise that there is visibility of all the layers and teams that must be involved to have changes done. For the Synergy solution specifically, changes will not be allowed, as it was a stop-gap solution. This means that consumers of data coming from Synergy must build their own consumption solution, either merging the data with what they get from Synergy or building something new that replaces their usage of Synergy.
Districts backward compatibility
The new solution is required to be backward compatible. This means that once the new data structures are in place and data has been migrated internally in the master data hub, the new district data structure can be converted to the old format and be emitted / queried according to the existing implementation during the grace period, in order to give existing consumers ample time to switch to a new version of the API and/or events. Districts will therefore have to be converted to cities, using their legacy city code.
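The conversion back to the old format could look like the sketch below, assuming the legacy code is stored as an alternative code of type `LEGACY_SUB_CITY_CODE` as in the migration example. The function name is hypothetical. How to expose districts that never had a legacy city code is not stated in this document, so the sketch returns `None` for them and leaves that decision open.

```python
LEGACY_CODE_TYPE = "LEGACY_SUB_CITY_CODE"


def to_legacy_city(district: dict):
    """Convert a district to the old city format, or return None if it has no legacy code."""
    for alt in district.get("alternativeCodes", []):
        if alt["alternativeCodeType"]["alternativeCodeTypeCode"] == LEGACY_CODE_TYPE:
            return {
                "cityName": district["districtName"],
                "cityCode": alt["alternativeCode"],
            }
    return None  # open question: districts created after the migration
```
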
API
The API must be specified in an OpenAPI specification and pass through the standard design review process before implementation can start. This section describes the required capabilities of the API at a high level.
There are currently 2 versions live of the geography API:
When implementing the district solution, a new version will be required, and the old versions must expose the district data as cities, as described in https://maersk-tools.atlassian.net/wiki/spaces/GDASMDS/pages/183159784023/Data+Architecture+-+City+Districts#Districts-backward-compatibility.
Endpoints
Endpoint for getting districts:
GET /districts
This will return an array of districts, each with its code and name populated. The response from this endpoint must be paginated. Typically only code and name will be returned for each resource, and the consumer must use the endpoint for getting the specific resource to get more details about it.
Endpoint for getting all details about a specific district:
GET /districts/{districtCode}
This will return a single record of all details about the district, including code, name, is active, short name, abbreviated name, type, status etc. (attributes inherited from Defined Area can be included, e.g. alternative codes for the district)
The district can then be queried even further with an endpoint such as the one below, for example to get all alternative codes:
GET /districts/{districtCode}/alternative-codes
Endpoint for getting all districts within a specific city:
GET /cities/{cityCode}/districts
Filtering
All GET endpoints, except the ones that get a specific resource, can have additional filtering, for example to filter by defined area name:
GET /districts?districtName=Louv*
This request will search for all districts whose name starts with the letters Louv. Note that the wildcard has to be implemented in such a way that the code matches exactly on name by default, but checks for the wildcard character and, when it is present, applies the correct matching logic.
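The matching rule can be sketched as follows. It assumes that only a trailing `*` is supported and that matching is case-sensitive. Neither is specified in this document, and the function name is hypothetical.

```python
def name_matches(pattern: str, name: str) -> bool:
    """Exact match on name, unless the pattern ends with '*', in which case prefix match."""
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return name == pattern
```

With this rule `Louv*` matches Louvre, while `Louv` without the wildcard matches only a district named exactly Louv.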
Event
The current event, which publishes the data of all geography entities merged into one combined structure, is:
MSK.geography.gda.topic.internal.any.v3
It should be revisited whether it is optimal to combine all geography data in such a manner. It forces a lot of logic onto the consumer side, which has to know how to split this data into the relevant geographic components.
A solution could be to keep the v3 publishing as per today's logic, backporting the district data into the city level as described above under https://maersk-tools.atlassian.net/wiki/spaces/GDASMDS/pages/183159784023/Data+Architecture+-+City+Districts#Districts-backward-compatibility.
The new topic should emit city data specifically, with their districts, and/or it might also be necessary to emit all districts with a foreign key to their city:
MSK.geography.city.topic.internal.any.v1
MSK.geography.district.topic.internal.any.v1
Longer term this would also mean that the other geography entities should have their own topics, e.g.
MSK.geography.definedarea.topic.internal.any.v1
MSK.geography.country.topic.internal.any.v1
MSK.geography.subdivision.topic.internal.any.v1
Note: The latter topics are not part of the city district solution.
Impact on other domains
Several domains are impacted by the implementation of district master data. These are:
- Customer
- Vendor
- Facility
Data migration
The district is currently a free text attribute on the facility, customer and vendor postal address object. Existing data must be migrated to become a district following the data structures outlined above. This means the district must be looked up in the district master data and, if found, populated as a district object on the postal address.
Where a sub-city (the previous name for district) data value has been populated in a city data field, this must be migrated so that the actual city is restored in that field. The sub-city data value itself should be looked up in the district data and, if it is found, the district must then be populated in the correct structure.
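Both migration cases can be sketched in one function. Everything about the shapes here is an assumption for illustration: the address attributes `city` and `district`, the `cityName` on the master record, and the lookup keyed by case-folded district name. Matching on name alone is a simplification, since a district could share its name with a real city.

```python
def district_object(master: dict) -> dict:
    """Reduce a district master record to the object stored on the postal address."""
    return {"districtCode": master["districtCode"], "districtName": master["districtName"]}


def migrate_postal_address(address: dict, districts_by_name: dict) -> dict:
    """Return a migrated copy of the address; values with no match are left untouched."""
    migrated = dict(address)

    # Case 1: free-text district attribute that exists in the district master data.
    district_text = address.get("district")
    if isinstance(district_text, str):
        match = districts_by_name.get(district_text.strip().casefold())
        if match is not None:
            migrated["district"] = district_object(match)

    # Case 2: sub-city value stored in the city field; restore the actual city.
    city_text = address.get("city")
    if isinstance(city_text, str):
        match = districts_by_name.get(city_text.strip().casefold())
        if match is not None:
            migrated["city"] = match["cityName"]
            migrated["district"] = district_object(match)

    return migrated
```

Leaving unmatched free text in place, rather than dropping it, keeps the original value available for manual review.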
Data maintenance
When creating or updating postal addresses for a facility, customer or vendor, the district must be sourced from the district data set in the Geography domain.