Combining Taxonomies
Organizations often use taxonomies to make their services easier for their target audience to find. When a project aims to gather information from multiple organizations, cleanse and de-duplicate that data, and provide a viable source of information that better serves the target audience, properly combining the organizations’ varying taxonomies becomes an important task.
Combining taxonomies serves two main purposes:
1) To facilitate search, so that a keyword search for a piece of information will bring up all instances of such information available from various organizations, regardless of how each organization has classified that piece of information.
2) To allow sparse or missing data from one organization to be augmented with data from another organization.
There are two approaches to combining taxonomies: mapping and merging. Below is a brief description of how each approach works, along with its benefits and challenges.
Approach 1: Mapping taxonomies
In this approach, a single, often comprehensive taxonomy is selected as the target reference, and each organization’s custom taxonomy will be mapped into the vocabulary of the target taxonomy.
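A minimal sketch of the mapping approach, assuming each organization supplies a crosswalk (here a plain dictionary) from its own terms to terms in the target taxonomy. The organization names, terms, and service records are hypothetical. Once every record is re-tagged with target terms, a single search on a target term returns matching services from all organizations, which is the first purpose listed above.

```python
from collections import defaultdict

# Hypothetical crosswalks: organization term -> target taxonomy term.
CROSSWALKS = {
    "org_a": {"financial": "Money", "food bank": "Food"},
    "org_b": {"financial counselling": "Money", "meals": "Food"},
}

# Hypothetical service records, each tagged with its organization's own terms.
SERVICES = [
    {"id": "a1", "org": "org_a", "terms": ["financial"]},
    {"id": "a2", "org": "org_a", "terms": ["food bank"]},
    {"id": "b1", "org": "org_b", "terms": ["financial counselling"]},
    {"id": "b2", "org": "org_b", "terms": ["meals"]},
]


def map_services(services, crosswalks):
    """Re-tag each service with target terms; terms without a mapping are dropped."""
    mapped = []
    for service in services:
        crosswalk = crosswalks[service["org"]]
        target_terms = sorted({crosswalk[t] for t in service["terms"] if t in crosswalk})
        mapped.append({**service, "terms": target_terms})
    return mapped


def build_index(services):
    """Inverted index from each target term to the ids of services tagged with it."""
    index = defaultdict(list)
    for service in services:
        for term in service["terms"]:
            index[term].append(service["id"])
    return index


index = build_index(map_services(SERVICES, CROSSWALKS))
print(index["Money"])  # ['a1', 'b1']: one search spans both organizations
```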
Issue #1: How to decide on the target reference model?
The two main factors to consider when choosing a target reference taxonomy are size and complexity. A larger reference model is more likely to be comprehensive, covering terms from many different taxonomies. This means more options to choose from when mapping, and potentially a more accurate mapping. However, comprehensiveness may come at the cost of additional complexity, especially for target models with a nested structure: e.g., if a target taxonomy has 3 levels of nested vocabulary, the chances that all 3 levels can be mapped onto the nested levels of an organization’s taxonomy are slim.
Issue #2: How to resolve mapping conflicts?
Since each taxonomy belongs to a team that has put a lot of thought into it, mapping taxonomies across organizations requires careful consideration. Data managers may be unwilling to let go of certain terms. To avoid ambiguities and less-than-ideal mapping results, data managers and SMEs need to review and confirm the proposed mapping.
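Since the final decision rests with data managers and SMEs, an automated step can at most propose mappings and route the uncertain ones to them. The sketch below assumes a deliberately naive matcher (case-insensitive containment between labels, chosen only for illustration) and flags every organization term with zero or several candidate targets for review; all term lists are hypothetical.

```python
def normalize(label):
    """Lower-case and collapse whitespace so labels compare consistently."""
    return " ".join(label.lower().split())


def candidate_targets(org_term, target_terms):
    """Naive matcher: a target is a candidate if either label contains the other."""
    org = normalize(org_term)
    return [t for t in target_terms if normalize(t) in org or org in normalize(t)]


def propose_mapping(org_terms, target_terms):
    """Split org terms into auto-proposed mappings and items needing SME review."""
    proposed, needs_review = {}, {}
    for term in org_terms:
        candidates = candidate_targets(term, target_terms)
        if len(candidates) == 1:
            proposed[term] = candidates[0]
        else:
            # Zero candidates (no match) or several (ambiguous): a human decides.
            needs_review[term] = candidates
    return proposed, needs_review


targets = ["Money", "Legal", "Legal Aid", "Housing"]
proposed, review = propose_mapping(["Housing support", "legal aid", "Debt advice"], targets)
print(proposed)  # {'Housing support': 'Housing'}
print(review)    # {'legal aid': ['Legal', 'Legal Aid'], 'Debt advice': []}
```

Note that "Debt advice" arguably belongs under "Money", which a label matcher cannot see; this is exactly the kind of case the review queue is meant to catch.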
Issue #3: Unmatched terms cannot be utilized
It must be noted that mapping taxonomies is most helpful when narrower vocabularies are mapped to broader ones (e.g., “financial” and “financial counselling” to “Money”). If there are terms in an organization’s taxonomy that cannot be mapped to any term in the target taxonomy, those classifications are lost. In other words, some level of accuracy is likely to be lost during the mapping process.
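Before committing to a mapping, it may help to quantify what it discards. A minimal sketch, assuming the crosswalk is a plain dictionary from organization terms to target terms (the terms are hypothetical), that reports two kinds of loss: terms with no target at all, and target terms that absorb several distinct organization terms, so that the finer distinction between them disappears.

```python
from collections import defaultdict


def mapping_loss(org_terms, crosswalk):
    """Report classifications lost when org_terms are mapped through crosswalk."""
    unmapped = [t for t in org_terms if t not in crosswalk]
    merged = defaultdict(list)
    for term in org_terms:
        if term in crosswalk:
            merged[crosswalk[term]].append(term)
    # Keep only targets that absorbed more than one org term (granularity lost).
    collapsed = {target: terms for target, terms in merged.items() if len(terms) > 1}
    return {"unmapped": unmapped, "collapsed": collapsed}


org_terms = ["financial", "financial counselling", "tenant rights"]
crosswalk = {"financial": "Money", "financial counselling": "Money"}
report = mapping_loss(org_terms, crosswalk)
print(report)
# {'unmapped': ['tenant rights'], 'collapsed': {'Money': ['financial', 'financial counselling']}}
```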
Approach 2: Merging taxonomies
Another approach to reconciling multiple taxonomies is merging. Merging combines two or more redundant vocabularies covering the same area into one, eliminating duplicate terms. The end result is a new and improved taxonomy that takes the best of each legacy taxonomy. In this approach, each time an organization is onboarded, the master taxonomy is augmented with any terms from the new organization’s taxonomy that did not already exist in the master taxonomy.
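A minimal sketch of incremental merging, assuming taxonomies are flat lists of labels and that two labels are duplicates when they match after case and whitespace normalization (a deliberately simple rule; real de-duplication usually needs SME input). Each master entry records which organizations use it, so provenance survives the merge. Organization names and terms are hypothetical.

```python
def normalize(label):
    """Lower-case and collapse whitespace; labels equal after this count as duplicates."""
    return " ".join(label.lower().split())


def merge_into_master(master, org_name, org_terms):
    """Add an organization's terms to the master taxonomy in place.

    master maps a normalized key to {"label": display label, "orgs": set of orgs}.
    Returns the labels that were new to the master taxonomy.
    """
    added = []
    for term in org_terms:
        key = normalize(term)
        if key not in master:
            master[key] = {"label": term, "orgs": set()}
            added.append(term)
        master[key]["orgs"].add(org_name)
    return added


master = {}
merge_into_master(master, "org_a", ["Food Bank", "Housing"])
new_terms = merge_into_master(master, "org_b", ["food bank", "Legal Aid"])
print(new_terms)  # ['Legal Aid']
print(sorted(entry["label"] for entry in master.values()))
```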
Issue #1: Scalability
Onboarding each new organization requires merging its custom taxonomy terms into the existing master taxonomy. Over time, this could lead to an oversized master vocabulary, which might cause performance issues.
Issue #2: Misclassification
With merging, identifying and removing duplicates requires careful consideration; otherwise, the same concept may end up represented by different terms depending on which organization classified it. This may not cause findability issues, but it can cause problems with faceted navigation.
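Exact matching misses duplicates that differ only slightly in spelling or form, such as “counselling” and “counseling”. One way to surface such candidates is a string-similarity check. The sketch below uses Python’s standard `difflib.SequenceMatcher` with an assumed threshold of 0.85, which would need tuning against real data, and it only proposes pairs for review rather than merging them automatically. The term list is hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(terms, threshold=0.85):
    """Return (term_a, term_b, ratio) for pairs whose similarity meets the threshold."""
    pairs = []
    for a, b in combinations(terms, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


terms = ["Financial Counselling", "financial counseling", "Food Bank", "Food Banks", "Housing"]
matches = near_duplicates(terms)
for a, b, ratio in matches:
    print(f"{a!r} ~ {b!r} ({ratio})")
```

Comparing every pair is quadratic in the number of terms, which ties back to the scalability concern: for a large master vocabulary, candidates would likely need to be blocked (e.g., by first letter or category) before comparison.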
Issue #3: Faceted Navigation
When new organizations are onboarded, their custom taxonomies need to be merged into the master taxonomy. The resulting taxonomy will be structurally closer to a “folksonomy” (free-form tags) than to a controlled vocabulary. One issue with this scheme is that with free-form vocabularies, defining facets to enable faceted navigation would not be feasible.
Conclusion
Considering the pros and cons of each of the two approaches above, it is clear that neither approach provides a perfect solution for the task at hand. Each case needs to be investigated and evaluated carefully with data managers and SMEs before embarking on a solution. Here are some final considerations for deciding on the best approach for combining taxonomies:
Faceted navigation is a useful capability for search-heavy systems. Faceted systems work best for applications used by such a wide range of users that no single tree will match everyone’s way of thinking. They are also easier to maintain than trees, because adding a new item requires only filling in its facet values rather than deciding exactly which category it belongs in.
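To make the contrast with a single tree concrete: in a faceted scheme each item carries values for several independent facets, and users narrow results by selecting values in any order. The facet names (“topic”, “audience”, “delivery”) and records below are hypothetical; the sketch also computes facet counts, which a search interface would typically show next to each value.

```python
from collections import Counter

# Hypothetical services described by independent facets instead of one tree position.
SERVICES = [
    {"id": "s1", "topic": "Money", "audience": "Seniors", "delivery": "In person"},
    {"id": "s2", "topic": "Money", "audience": "Youth", "delivery": "Online"},
    {"id": "s3", "topic": "Food", "audience": "Seniors", "delivery": "In person"},
]


def filter_by_facets(items, **selected):
    """Keep items whose values match every selected facet value."""
    return [item for item in items if all(item.get(f) == v for f, v in selected.items())]


def facet_counts(items, facet):
    """Count how many items carry each value of a facet."""
    return Counter(item[facet] for item in items)


seniors = filter_by_facets(SERVICES, audience="Seniors")
print([s["id"] for s in seniors])  # ['s1', 's3']
print(facet_counts(seniors, "topic"))
```

Adding a new service here means appending one more dictionary with its facet values; no decision about a single position in a hierarchy is needed.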
Regardless of whether a tree or faceted structure is adopted, mapping will lose some of the detailed categorization that organizations currently have. One way to mitigate this to some degree is to merge the taxonomies first, and then map each organization to the resulting merged taxonomy. While this keeps losses minimal, it is not a process that can be repeated every time a new organization is onboarded.
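The merge-then-map idea can be sketched as follows, assuming flat term lists and the same simple normalization rule for spotting duplicates; the organizations and terms are hypothetical. Because every organization term exists in the merged vocabulary, nothing is left unmapped. The cost is visible in the function’s shape: it takes all organizations at once, so the merged vocabulary and every crosswalk must be rebuilt whenever a new organization joins.

```python
def normalize(label):
    """Lower-case and collapse whitespace; labels equal after this are duplicates."""
    return " ".join(label.lower().split())


def merge_then_map(org_taxonomies):
    """Merge all org terms into one vocabulary, then map each org into it.

    org_taxonomies maps an organization name to its list of term labels.
    Returns (merged vocabulary as sorted labels, per-org crosswalks).
    """
    canonical = {}  # normalized key -> first label seen for that key
    for terms in org_taxonomies.values():
        for term in terms:
            canonical.setdefault(normalize(term), term)
    crosswalks = {
        org: {term: canonical[normalize(term)] for term in terms}
        for org, terms in org_taxonomies.items()
    }
    return sorted(canonical.values()), crosswalks


vocabulary, crosswalks = merge_then_map({
    "org_a": ["Food Bank", "Financial Counselling"],
    "org_b": ["food bank", "Tenant Rights"],
})
print(vocabulary)  # ['Financial Counselling', 'Food Bank', 'Tenant Rights']
print(crosswalks["org_b"])
```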