2022 Preprint
Prediction and Curation of Missing Biomedical Identifier Mappings with Biomappings
Abstract: Biomedical identifier resources (ontologies, taxonomies, controlled vocabularies) commonly overlap in scope and contain equivalent entries under different identifiers. Maintaining mappings for these relationships is crucial for interoperability and for the integration of data and knowledge. However, there are substantial gaps in the available mappings, motivating their semi-automated curation. Biomappings implements a curation cycle workflow for missing mappings which combines automated prediction with human-in-the-loo…
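The abstract describes a cycle that pairs automated prediction of missing mappings with human review. The Python sketch below illustrates that general idea only and is not the Biomappings implementation. It assumes, hypothetically, that predictions come from exact matching of normalized labels, that curator decisions arrive as a dictionary standing in for a human reviewer, and that mappings already known are skipped; the names `Mapping`, `predict`, and `curate` and the toy CURIEs in the usage note are invented for illustration.

```python
"""Hypothetical sketch of a predict-then-curate cycle for identifier mappings.

Assumptions (not taken from Biomappings): predictions come from exact matching
of normalized labels, and reviewer decisions are supplied as a dict.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    source: str       # CURIE in the first resource, e.g. "prefix:id"
    target: str       # CURIE in the second resource
    relation: str     # mapping relation, e.g. "skos:exactMatch"
    confidence: float # heuristic score attached to the prediction


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivially different labels match."""
    return " ".join(label.lower().split())


def predict(left: dict[str, str], right: dict[str, str]) -> list[Mapping]:
    """Propose exact-match mappings between two {curie: label} vocabularies."""
    index: dict[str, list[str]] = {}
    for curie, label in right.items():
        index.setdefault(normalize(label), []).append(curie)
    predictions = []
    for curie, label in left.items():
        for target in index.get(normalize(label), []):
            # Assumed fixed score for an exact normalized-label match.
            predictions.append(Mapping(curie, target, "skos:exactMatch", 0.9))
    return predictions


def curate(
    predictions: list[Mapping],
    decisions: dict[tuple[str, str], bool],
    known: set[tuple[str, str]],
) -> tuple[list[Mapping], list[Mapping], list[Mapping]]:
    """Split predictions into accepted, rejected, and pending lists.

    `decisions` maps (source, target) to True (correct) or False (incorrect);
    pairs without a decision stay pending. Pairs already in `known` are
    skipped so that only missing mappings are sent for review.
    """
    accepted, rejected, pending = [], [], []
    for m in predictions:
        key = (m.source, m.target)
        if key in known:
            continue
        verdict = decisions.get(key)
        if verdict is True:
            accepted.append(m)
        elif verdict is False:
            rejected.append(m)
        else:
            pending.append(m)
    return accepted, rejected, pending
```

For example, with the hypothetical vocabularies `{"a:1": "Lung Cancer", "a:2": "asthma"}` and `{"b:10": "lung  cancer", "b:20": "Asthma"}`, `predict` proposes `a:1`→`b:10` and `a:2`→`b:20`; accepting only the first leaves the second pending for a later review round, which is what makes the process a cycle rather than a one-off match.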
Cited by 2 publications (10 citation statements). References 47 publications.
“…SeMRA’s reasoning approaches are inherently limited by the availability of mappings from the set of sources we integrated and the quality of such mappings. To mitigate this, the modular nature of SeMRA allows additional sources to be readily added by curating new prefixes and their associated ontology artifacts in the Bioregistry (Hoyt et al 2022), by implementing additional database plugins in PyOBO (Hoyt et al 2025), or by directly implementing custom processors in SeMRA. However, more generally, the limitations of existing mapping sources highlight the need to incorporate semi-automated mapping curation workflows like Biomappings (Hoyt et al 2023); automated approaches like Logmap (Jiménez-Ruiz and Cuenca Grau 2011), LOOM (Ghazvinian et al 2009), and K-Boom (Mungall et al 2019); and LLM-based automated review workflows like MapperGPT (Matentzoglu et al 2023).…”
Section: Discussion (mentioning), confidence: 99%
“…Where possible, SeMRA wraps preexisting parsers for standard representations. For instance, SeMRA reads mappings from ontologies in OBO format by wrapping the PyOBO Python package (Hoyt et al 2025). Similarly, SeMRA reads mappings from ontologies in the OWL and OBO Graph JSON formats using the Bioontologies Python package (Hoyt and Gyori 2025).…”
Section: Methods (mentioning), confidence: 99%
“…In addition, we provide detailed analysis and metrics of using SeMRA on specific areas of biomedicine crucial for data integration. First, we demonstrate integrating cell and cell line resources using SeMRA, significantly expanding on results from [28]. Second, we integrate resources cataloging diseases and show substantial expansion in scope compared to results published by [21].…”
Section: Contribution (mentioning), confidence: 91%
“…Semantic mappings are often made available by individual ("primary") resources, for example, most Open Biological and Biomedical Ontologies (OBO) [31] provide cross-references to equivalent or related entries in overlapping ontologies. In addition, "secondary" mappings resources including aggregators such as BridgeDB [50], TogoID [30], or independent mapping repositories like Biomappings [28] provide mappings that are collected from multiple sources or extend upon what is available from individual resources using additional curation. However, mappings remain difficult to assemble at scale because of the variety of ad hoc storage formats they are made available in, the ways they are produced (e.g., manual curation, rule-based inference, lexical matching), and the availability of metadata (e.g., precise mapping relations, curator confidence).…”
Section: Problem Statement (mentioning), confidence: 99%
