004 Datenverarbeitung; Informatik
Refine
Year of publication
- 2017 (3) (remove)
Keywords
- (general) boustrophedon (returning) finite automata (1)
- (general) jumping finite automata (1)
- (regular : regular) array grammars (1)
- (regulär: regulär) Array-Grammati (1)
- Disambiguierung von Personennamen (1)
- Elektronische Bibliothek (1)
- Eulerian trails (1)
- Graphen mit Eulerschen Pfaden (1)
- Immunoglobulin (1)
- Immunsystem (1)
Institute
- Informatik (2)
- Fachbereich 1 (1)
With the advent of highthroughput sequencing (HTS), profiling immunoglobulin (IG) repertoires has become an essential part of immunological research. The dissection of IG repertoires promises to transform our understanding of the adaptive immune system dynamics. Advances in sequencing technology now also allow the use of the Ion Torrent Personal Genome Machine (PGM) to cover the full length of IG mRNA transcripts. The applications of this benchtop scale HTS platform range from identification of new therapeutic antibodies to the deconvolution of malignant B cell tumors. In the context of this thesis, the usability of the PGM is assessed to investigate the IG heavy chain (IGH) repertoires of animal models. First, an innovate bioinformatics approach is presented to identify antigendriven IGH sequences from bulk sequenced bone marrow samples of transgenic humanized rats, expressing a human IG repertoire (OmniRatTM). We show, that these rats mount a convergent IGH CDR3 response towards measles virus hemagglutinin protein and tetanus toxoid, with high similarity to human counterparts. In the future, databases could contain all IGH CDR3 sequences with known specificity to mine IG repertoire datasets for past antigen exposures, ultimately reconstructing the immunological history of an individual. Second, a unique molecular identifier (UID) based HTS approach and network property analysis is used to characterize the CLLlike CD5+ B cell expansion of A20BKO mice overexpressing a natural short splice variant of the CYLD gene (A20BKOsCYLDBOE). We could determine, that in these mice, overexpression of sCYLD leads to unmutated subvariant of CLL (UCLL). Furthermore, we found that this short splice variant is also seen in human CLL patients highlighting it as important target for future investigations. Third, the UID based HTS approach is improved by adapting it to the PGM sequencing technology and applying a custommade data processing pipeline including the ImMunoGeneTics (IMGT) database error detection. Like this, we were able to obtain correct IGH sequences with over 99.5% confidence and correct CDR3 sequences with over 99.9% confidence. Taken together, the results, protocols and sample processing strategies described in this thesis will improve the usability of animal models and the Ion Torrent PGM HTS platform in the field if IG repertoire research.
Digital libraries have become a central aspect of our live. They provide us with an immediate access to an amount of data which has been unthinkable in the past. Support of computers and the ability to aggregate data from different libraries enables small projects to maintain large digital collections on various topics. A central aspect of digital libraries is the metadata -- the information that describes the objects in the collection. Metadata are digital and can be processed and studied automatically. In recent years, several studies considered different aspects of metadata. Many studies focus on finding defects in the data. Specifically, locating errors related to the handling of personal names has drawn attention. In most cases the studies concentrate on the most recent metadata of a collection. For example, they look for errors in the collection at day X. This is a reasonable approach for many applications. However, to answer questions such as when the errors were added to the collection we need to consider the history of the metadata itself. In this work, we study how the history of metadata can be used to improve the understanding of a digital library. To this goal, we consider how digital libraries handle and store their metadata. Based in this information we develop a taxonomy to describe available historical data which means data on how the metadata records changed over time. We develop a system that identifies changes to metadata over time and groups them in semantically related blocks. We found that historical meta data is often unavailable. However, we were able to apply our system on a set of large real-world collections. A central part of this work is the identification and analysis of changes to metadata which corrected a defect in the collection. These corrections are the accumulated effort to ensure data quality of a digital library. In this work, we present a system that automatically extracts corrections of defects from the set of all modifications. We present test collections containing more than 100,000 test cases which we created by extracting defects and their corrections from DBLP. This collections can be used to evaluate automatic approaches for error detection. Furthermore, we use these collections to study properties of defects. We will concentrate on defects related to the person name problem. We show that many defects occur in situations where very little context information is available. This has major implications for automatic defect detection. We also show that properties of defects depend on the digital library in which they occur. We also discuss briefly how corrected defects can be used to detect hidden or future defects. Besides the study of defects, we show that historical metadata can be used to study the development of a digital library over time. In this work, we present different studies as example how historical metadata can be used. At first we describe the development of the DBLP collection over a period of 15 years. Specifically, we study how the coverage of different computer science sub fields changed over time. We show that DBLP evolved from a specialized project to a collection that encompasses most parts of computer science. In another study we analyze the impact of user emails to defect corrections in DBLP. We show that these emails trigger a significant amount of error corrections. Based on these data we can draw conclusions on why users report a defective entry in DBLP.
Automata theory is the study of abstract machines. It is a theory in theoretical computer science and discrete mathematics (a subject of study in mathematics and computer science). The word automata (the plural of automaton) comes from a Greek word which means "self-acting". Automata theory is closely related to formal language theory [99, 101]. The theory of formal languages constitutes the backbone of the field of science now generally known as theoretical computer science. This thesis aims to introduce a few types of automata and studies then class of languages recognized by them. Chapter 1 is the road map with introduction and preliminaries. In Chapter 2 we consider few formal languages associated to graphs that has Eulerian trails. We place few languages in the Chomsky hierarchy that has some other properties together with the Eulerian property. In Chapter 3 we consider jumping finite automata, i. e., finite automata in which input head after reading and consuming a symbol, can jump to an arbitrary position of the remaining input. We characterize the class of languages described by jumping finite automata in terms of special shuffle expressions and survey other equivalent notions from the existing literature. We could also characterize some super classes of this language class. In Chapter 4 we introduce boustrophedon finite automata, i. e., finite automata working on rectangular shaped arrays (i. e., pictures) in a boustrophedon mode and we also introduce returning finite automata that reads the input, line after line, does not alters the direction like boustrophedon finite automata i. e., reads always from left to right, line after line. We provide close relationships with the well-established class of regular matrix (array) languages. We sketch possible applications to character recognition and kolam patterns. Chapter 5 deals with general boustrophedon finite automata, general returning finite automata that read with different scanning strategies. We show that all 32 different variants only describe two different classes of array languages. We also introduce Mealy machines working on pictures and show how these can be used in a modular design of picture processing devices. In Chapter 6 we compare three different types of regular grammars of array languages introduced in the literature, regular matrix grammars, (regular : regular) array grammars, isometric regular array grammars, and variants thereof, focusing on hierarchical questions. We also refine the presentation of (regular : regular) array grammars in order to clarify the interrelations. In Chapter 7 we provide further directions of research with respect to the study that we have done in each of the chapters.