While humans find it easy to process visual information from the real world, machines struggle with this task because the information is unstructured and complex. Computer vision (CV) is the field of artificial intelligence that attempts to automatically analyze, interpret, and extract such information. Recent CV approaches mainly use deep learning (DL) because of its high accuracy. DL extracts useful features from unstructured images in a training dataset and uses them for specific real-world tasks. However, DL requires a large number of parameters, substantial computational power, and meaningful training data, and in specific domains the available data can be noisy, sparse, and incomplete. Furthermore, DL tends to learn correlations from the training data that do not hold in reality, making deep neural networks (DNNs) poorly generalizable and error-prone.
Therefore, the field of visual transfer learning seeks methods that depend less on training data and are thus more applicable in a constantly changing world. One idea is to enrich DL with prior knowledge. Knowledge graphs (KGs) are a powerful tool for this purpose because they can formalize and organize prior knowledge based on an underlying ontological schema. They support symbolic operations such as logic, rules, and reasoning, and can be created, adapted, and interpreted by domain experts. Because symbols allow abstraction, KGs are well suited to generalizing the knowledge they encode. To take advantage of both the generalization properties of KGs and the ability of DL to learn from large-scale unstructured data, attempts have long been made to combine explicit graph representations with implicit vector representations. The recent development of knowledge graph embedding methods, which map a graph into a vector space, opens up new perspectives for combining the two directly in vector space.
In this work, we combine prior knowledge from a KG with DL to improve visual transfer learning in four steps. First, we explore the potential benefits of using prior knowledge encoded in a KG for DL-based visual transfer learning. Second, we investigate approaches that already combine KGs and DL and categorize them by their general idea of knowledge integration. Third, we propose a novel method for the specific category of using the knowledge graph as a trainer, where a deep neural network is trained to adapt to a representation given by the prior knowledge of a KG. Fourth, we extend the proposed method by extracting relevant context in the form of a subgraph of the KG to investigate the relationship between prior knowledge and performance on a specific CV task. In summary, this work provides insights into the combination of KGs and DL, with the goal of making DL approaches more generalizable, more efficient, and more interpretable through prior knowledge.