<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
<channel>
<title>Computer Science Dissertations</title>
<copyright>Copyright (c) 2013 Georgia State University All rights reserved.</copyright>
<link>http://digitalarchive.gsu.edu/cs_diss</link>
<description>Recent documents in Computer Science Dissertations</description>
<language>en-us</language>
<lastBuildDate>Mon, 13 May 2013 11:31:52 PDT</lastBuildDate>
<ttl>3600</ttl>








<item>
<title>Scientific High Performance Computing (HPC) Applications On The Azure Cloud Platform</title>
<link>http://digitalarchive.gsu.edu/cs_diss/75</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/75</guid>
<pubDate>Wed, 01 May 2013 08:31:10 PDT</pubDate>
<description>
	<![CDATA[
	<p>Cloud computing is emerging as a promising platform for compute- and data-intensive scientific applications. Thanks to its on-demand elastic provisioning capabilities, cloud computing has drawn interest from researchers in a wide range of disciplines. However, even though many vendors have rolled out commercial cloud infrastructures, their service offerings are usually best-effort only, without any performance guarantees. The usefulness of these resources is questionable if they cannot meet the performance expectations of deployed applications. Additionally, the lack of familiar development tools hampers the productivity of eScience developers writing robust scientific high performance computing (HPC) applications. No standard framework is currently supported by any large set of vendors offering cloud computing services. Consequently, porting scientific applications among different cloud platforms is hard. Among all clouds, the emerging Azure cloud from Microsoft in particular remains a challenge for HPC program development, both because it lacks support for traditional parallel programming models such as Message Passing Interface (MPI) and map-reduce and because its application programming interfaces (APIs) are still evolving. We have designed new frameworks and runtime environments to help HPC application developers by providing them with easy-to-use tools, similar to those known from traditional parallel and distributed computing environments such as MPI, for scientific application development on the Azure cloud platform. It is challenging to create an efficient framework for any cloud platform, including the Windows Azure platform, as they are mostly offered to users as a black box with a set of APIs to access various service components. The primary contributions of this Ph.D. thesis are (i) creating a generic framework for bag-of-tasks HPC applications to serve as the basic building block for application development on the Azure cloud platform, (ii) creating a set of APIs for HPC application development over the Azure cloud platform, similar to the Message Passing Interface (MPI) from the traditional parallel and distributed setting, and (iii) implementing Crayons using the proposed APIs as the first end-to-end parallel scientific application to parallelize fundamental GIS operations.</p>

	]]>
</description>

<author>Dinesh Agarwal</author>


</item>






<item>
<title>Collaborative Communication And Storage In Energy-Synchronized Sensor Networks</title>
<link>http://digitalarchive.gsu.edu/cs_diss/74</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/74</guid>
<pubDate>Thu, 25 Apr 2013 09:01:36 PDT</pubDate>
<description>
	<![CDATA[
	<p>In a battery-less sensor network, all operations of sensor nodes are strictly constrained by and synchronized with the fluctuations of harvested energy, causing nodes to drop out of the network and making network connectivity unstable. Such wireless sensor networks are called energy-synchronized sensor networks. The unpredictable network disruptions and challenging communication environments make traditional communication protocols inefficient and require a paradigm shift in design. In this thesis, I propose a set of algorithms for collaborative data communication and storage in energy-synchronized sensor networks. The solutions are based on erasure codes and probabilistic network coding. The proposed algorithms significantly improve data communication throughput and persistency, and they are inherently amenable to the probabilistic nature of transmission in wireless networks.</p>
<p>The technical contributions explore collaborative communication both with and without network coding. First, I propose a collaborative data delivery protocol that exploits the optimal performance of multiple energy-synchronized paths without network coding, i.e., a new max-flow min-variance algorithm. In concert with this data delivery protocol, a localized TDMA MAC protocol is designed to synchronize nodes' duty cycles and mitigate media access contention. However, the energy supply can change dynamically over time, making deterministic duty-cycle synchronization difficult in practice, so a probabilistic approach is investigated. I therefore present the Opportunistic Network Erasure Coding protocol (ONEC) to collect data collaboratively. ONEC derives the probability distribution of the coding degree at each node, enables opportunistic in-network recoding, and guarantees that the original sensor data can be recovered with high probability upon receiving a sufficient number of encoded packets. Next, OnCode, an opportunistic in-network data coding and delivery protocol, is proposed to further improve data communication under the constraints of energy synchronization. It is resilient to packet loss and network disruptions and does not require explicit end-to-end feedback messages. Moreover, I present a network Erasure Coding with randomized Power Control (ECPC) mechanism for collaborative data storage in disruptive sensor networks. ECPC only requires each node to perform a single broadcast at each of its several randomly selected power levels, and thus incurs very low communication overhead. Finally, I propose an integrated algorithm and middleware (Ravine Stream) to improve data delivery throughput as well as data persistency in energy-synchronized sensor networks.</p>

	]]>
</description>

<author>Mingsen Xu</author>


</item>






<item>
<title>Maintaining Integrity Constraints in Semantic Web</title>
<link>http://digitalarchive.gsu.edu/cs_diss/73</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/73</guid>
<pubDate>Mon, 08 Apr 2013 10:01:46 PDT</pubDate>
<description>
	<![CDATA[
	<p>As an expressive knowledge representation language for the Semantic Web, the Web Ontology Language (OWL) plays an important role in areas like science and commerce. The problem of maintaining integrity constraints arises because OWL employs the Open World Assumption (OWA) as well as the Non-Unique Name Assumption (NUNA). These assumptions are typically suitable for representing knowledge distributed across the Web, where complete knowledge about a domain cannot be assumed, but they make it challenging to use OWL itself for closed world integrity constraint validation. Integrity constraints (ICs) on ontologies have to be enforced; otherwise conflicting results would be derivable from the same knowledge base (KB). Current approaches to incorporating ICs into OWL are based on its query language SPARQL, alternative semantics, or logic programming. These methods usually suffer from the limited types of constraints they can handle and/or inherited computational expense.</p>
<p>This dissertation presents a comprehensive and efficient approach to maintaining integrity constraints. The design enforces data consistency throughout the OWL life cycle, including the processes of OWL generation, maintenance, and interactions with other ontologies. For OWL generation, the Paraconsistent model is used to maintain integrity constraints during the relational database to OWL translation process. Then a new rule-based language with set extension is introduced as a platform to allow users to specify constraints, along with a demonstration of 18 commonly used constraints written in this language. In addition, a new constraint maintenance system, called Jena2Drools, is proposed and implemented, to show its effectiveness and efficiency. To further handle inconsistencies among multiple distributed ontologies, this work constructs a framework to break down global constraints into several sub-constraints for efficient parallel validation.</p>

	]]>
</description>

<author>Ming Fang</author>


</item>






<item>
<title>Simulation Software as a Service and Service-Oriented Simulation Experiment</title>
<link>http://digitalarchive.gsu.edu/cs_diss/72</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/72</guid>
<pubDate>Tue, 04 Dec 2012 09:46:03 PST</pubDate>
<description>
	<![CDATA[
	<p>Simulation software is being increasingly used in various domains for system analysis and/or behavior prediction. Traditionally, researchers and field experts need to have access to the computers that host the simulation software to do simulation experiments. With recent advances in cloud computing and Software as a Service (SaaS), a new paradigm is emerging where simulation software is used as services that are composed with others and dynamically influence each other for service-oriented simulation experiment on the Internet.</p>
<p>The new service-oriented paradigm brings new research challenges in composing multiple simulation services in a meaningful and correct way for simulation experiments. To systematically support simulation software as a service (SimSaaS) and service-oriented simulation experiment, we propose a layered framework that includes five layers: an infrastructure layer, a simulation execution engine layer, a simulation service layer, a simulation experiment layer and finally a graphical user interface layer. Within this layered framework, we provide a specification for both simulation experiment and the involved individual simulation services. Such a formal specification is useful in order to support systematic compositions of simulation services as well as automatic deployment of composed services for carrying out simulation experiments. Built on this specification, we identify the issue of mismatch of time granularity and event granularity in composing simulation services at the pragmatic level, and develop four types of granularity handling agents to be associated with the couplings between services. The ultimate goal is to achieve standard and automated approaches for simulation service composition in the emerging service-oriented computing environment. Finally, to achieve more efficient service-oriented simulation, we develop a profile-based partitioning method that exploits a system’s dynamic behavior and uses it as a profile to guide the spatial partitioning for more efficient parallel simulation. We develop the work in this dissertation within the application context of wildfire spread simulation, and demonstrate the effectiveness of our work based on this application.</p>

	]]>
</description>

<author>Song Guo</author>


</item>






<item>
<title>Algorithms for Transcriptome Quantification and Reconstruction from RNA-Seq Data</title>
<link>http://digitalarchive.gsu.edu/cs_diss/71</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/71</guid>
<pubDate>Tue, 04 Dec 2012 09:41:30 PST</pubDate>
<description>
	<![CDATA[
	<p>Massively parallel whole transcriptome sequencing, with its ability to generate full transcriptome data at the single-transcript level, provides a powerful tool with multiple interrelated applications, including transcriptome reconstruction and gene/isoform expression estimation, also known as transcriptome quantification. As a result, whole transcriptome sequencing has become the technology of choice for performing transcriptome analysis, rapidly replacing array-based technologies. The most commonly used transcriptome sequencing protocol, referred to as RNA-Seq, generates short (single or paired) sequencing tags from the ends of randomly generated cDNA fragments. The RNA-Seq protocol reduces sequencing cost and significantly increases data throughput, but reconstructing full-length transcripts and accurately estimating their abundances across all cell types from its data is computationally challenging.</p>
<p>We focus on two main problems in transcriptome data analysis, namely, transcriptome reconstruction and quantification. Transcriptome reconstruction, also referred to as novel isoform discovery, is the problem of reconstructing the transcript sequences from the sequencing data. Reconstruction can be done de novo or it can be assisted by existing genome and transcriptome annotations. Transcriptome quantification refers to the problem of estimating the expression level of each transcript. We present genome-guided and annotation-guided transcriptome reconstruction methods as well as methods for transcript and gene expression level estimation. Empirical results on both synthetic and real RNA-Seq datasets show that the proposed methods improve transcriptome quantification and reconstruction accuracy compared to previous methods.</p>

	]]>
</description>

<author>Serghei Mangul</author>


</item>






<item>
<title>Connected Dominating Set Based Topology Control in Wireless Sensor Networks</title>
<link>http://digitalarchive.gsu.edu/cs_diss/70</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/70</guid>
<pubDate>Fri, 31 Aug 2012 10:36:18 PDT</pubDate>
<description>
	<![CDATA[
	<p>Wireless Sensor Networks (WSNs) are now widely used for monitoring and controlling systems where human intervention is not desirable or possible. Connected Dominating Set (CDS) based topology control in WSNs is a hierarchical method that ensures sufficient coverage while reducing redundant connections in a relatively crowded network. Moreover, the Minimum-sized Connected Dominating Set (MCDS) has become a widely used approach for constructing a Virtual Backbone (VB) that alleviates the broadcast storm problem and supports efficient routing in WSNs. However, no existing work considers the load-balance factor of CDSs in WSNs. In this dissertation, we first propose a new concept, the Load-Balanced CDS (LBCDS), and a new problem, the Load-Balanced Allocate Dominatee (LBAD) problem. We then propose a two-phase method to solve LBCDS and LBAD one by one, and a one-phase Genetic Algorithm (GA) to solve the two problems simultaneously.</p>
<p>Secondly, since the previously mentioned work includes no performance ratio analysis, three further problems are investigated and analyzed: the MinMax Degree Maximal Independent Set (MDMIS) problem, the Load-Balanced Virtual Backbone (LBVB) problem, and the MinMax Valid-Degree non Backbone node Allocation (MVBA) problem. Approximation algorithms and comprehensive theoretical analysis of the approximation factors are presented in the dissertation.</p>
<p>On the other hand, in the current related literature, networks are deterministic: two nodes are assumed to be either connected or disconnected. In most real applications, however, there are many intermittently connected wireless links, called lossy links, which only provide probabilistic connectivity. For WSNs with lossy links, we propose a Stochastic Network Model (SNM). Under this model, we measure the quality of CDSs using CDS reliability. In this dissertation, we construct an MCDS whose reliability is above a preset application-specified threshold, called a Reliable MCDS (RMCDS). We propose a novel Genetic Algorithm (GA) with immigrant schemes, called RMCDS-GA, to solve the RMCDS problem.</p>
<p>Finally, we apply the constructed LBCDS to a practical application under the realistic SNM model, namely data aggregation. Specifically, a new problem, the Load-Balanced Data Aggregation Tree (LBDAT), is introduced. Our simulation results show that the proposed algorithms significantly outperform the existing state-of-the-art approaches.</p>

	]]>
</description>

<author>Jing S. He</author>


</item>






<item>
<title>Resource Management in Survivable Multi-Granular Optical Networks</title>
<link>http://digitalarchive.gsu.edu/cs_diss/67</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/67</guid>
<pubDate>Thu, 12 Jul 2012 12:15:13 PDT</pubDate>
<description>
	<![CDATA[
	<p>The last decade witnessed explosive growth in Internet traffic, driven by bandwidth-hungry applications such as YouTube, P2P, and VoIP. This growth is expected to continue at an annual rate of 34% in the near future, posing a huge challenge to the Internet infrastructure. One foremost solution to this problem is advancing optical networking and switching, by which abundant bandwidth can be provided in an energy-efficient manner. For instance, with Wavelength Division Multiplexing (WDM) technology, each fiber can carry a large number of wavelengths with bandwidth up to 100 Gbits/s or higher. To keep up with the traffic explosion, however, simply scaling the number of fibers and/or wavelengths per fiber leads to scalability issues in WDM networks. One major motivation of this dissertation is to address this issue with the idea of waveband switching (WBS). This work includes the author's study of multiple aspects of waveband switching: how to address dynamic user demand, how to accommodate static user demand, and how to achieve a survivable WBS network. Combined, the proposed approaches form a framework that enables an efficient WBS-based Internet in the near or middle term. As a long-term solution for the Internet backbone, <em>Spectrum Sliced Elastic Optical Path (SLICE)</em> networks have recently attracted significant interest. SLICE aims to provide abundant bandwidth by managing the spectrum resources as orthogonal sub-carriers, a finer granularity than the wavelengths of WDM networks. Another important component of this dissertation is the author's timely study of this new frontier: in particular, how to efficiently accommodate user demand in SLICE networks. We refer to the overall study as resource management in <em>multi-granular optical networks</em>. In WBS networks, the multi-granularity includes the fiber, waveband, and wavelength, while in SLICE networks, the traffic granularity refers to the fiber and the variety of demand sizes (in terms of number of sub-carriers).</p>

	]]>
</description>

<author>Yang Wang</author>


</item>






<item>
<title>DI-SEC: Distributed Security Framework for Heterogeneous Wireless Sensor Networks</title>
<link>http://digitalarchive.gsu.edu/cs_diss/66</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/66</guid>
<pubDate>Tue, 17 Apr 2012 07:57:27 PDT</pubDate>
<description>
	<![CDATA[
	<p>Wireless Sensor Networks (WSNs) are deployed for monitoring in a range of critical domains (e.g., health care, military, critical infrastructure). Accordingly, these WSNs should be resilient to attacks. The current approach to defending against malicious threats is to develop and deploy a specific defense mechanism for a specific attack. However, the problem with this traditional approach is that the solution for one attack (e.g., a jamming attack) does not defend against other attacks (e.g., Sybil and Selective Forwarding). This work addresses the challenges with the traditional approach to securing sensor networks and presents a comprehensive framework, Di-Sec, that can defend against all known and forthcoming attacks. At the heart of Di-Sec lies the monitoring core (M-Core), an extensible and lightweight layer that gathers information and statistics relevant for creating defense modules. Along with Di-Sec, a new user-friendly domain-specific language was developed, the M-Core Control Language (MCL). Using the MCL, a user can implement new defense mechanisms without the overhead of learning the details of the underlying software architecture (i.e., TinyOS, Di-Sec). Hence, the MCL expedites the development of sensor defense mechanisms by significantly simplifying the coding process for developers. The Di-Sec framework has been implemented and tested on real sensors to evaluate its feasibility and performance. Our evaluation shows that Di-Sec is feasible on today’s resource-limited sensors and has a nominal overhead. Furthermore, we illustrate the functionality of Di-Sec by implementing four detection and defense mechanisms for attacks at various layers of the communication stack.</p>

	]]>
</description>

<author>Marco Valero</author>


</item>






<item>
<title>Protein Tertiary Model Assessment Using Granular Machine Learning Techniques</title>
<link>http://digitalarchive.gsu.edu/cs_diss/65</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/65</guid>
<pubDate>Wed, 11 Apr 2012 09:30:26 PDT</pubDate>
<description>
	<![CDATA[
	<p>The automatic prediction of a protein's three-dimensional structure from its amino acid sequence has become one of the most important and most researched fields in bioinformatics. Because models are not experimental structures determined with known accuracy but rather predictions, it is vital to estimate model quality. We attempt to solve this problem using machine learning techniques and information from both the sequence and the structure of the protein. The goal is to generate a machine that understands structures from the Protein Data Bank (PDB) and, when given a new model, predicts whether it belongs to the same class as the PDB structures (correct or incorrect protein models). Different subsets of the PDB are considered for evaluating the prediction potential of the machine learning methods. Here we show two such machines, one using support vector machines (SVM) and another using fuzzy decision trees (FDT). First, using a preliminary encoding style, SVM achieved around 70% accuracy in protein model quality assessment, and an improved fuzzy decision tree (IFDT) reached above 80% accuracy. To reduce computational overhead, a multiprocessor environment and a basic feature selection method are used in the SVM-based machine learning algorithm.</p>
<p>Next, an enhanced scheme is introduced using a new encoding style. In the new style, information such as the amino acid substitution matrix, polarity, secondary structure information, and relative distance between alpha carbon atoms is collected through spatial traversal of the 3D structure to form training vectors. This guarantees that the properties of alpha carbon atoms that are close together in 3D space, and thus interacting, are used in vector formation. With the fuzzy decision tree, we obtained a training accuracy of around 90%. This is a significant improvement over the previous encoding technique in prediction accuracy and execution time. This outcome motivates further exploration of effective machine learning algorithms for accurate protein model quality assessment.</p>
<p>Finally these machines are tested using CASP8 and CASP9 templates and compared with other CASP competitors, with promising results. We further discuss the importance of model quality assessment and other information from proteins that could be considered for the same.</p>

	]]>
</description>

<author>Anjum A. Chida</author>


</item>






<item>
<title>Shadow Price Guided Genetic Algorithms</title>
<link>http://digitalarchive.gsu.edu/cs_diss/64</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/64</guid>
<pubDate>Wed, 11 Apr 2012 09:12:30 PDT</pubDate>
<description>
	<![CDATA[
	<p>The Genetic Algorithm (GA) is a popular global search algorithm. Although it has been used successfully in many fields, there are still performance challenges that prevent GA’s further success: it has difficulty reaching optimal solutions for complex problems, and it takes a very long time to solve difficult problems. This dissertation researches new ways to improve GA’s solution quality and convergence speed. The main focus is to present the concept of shadow price and propose a two-measurement GA. The new algorithm uses the fitness value to measure solutions and the shadow price to evaluate components. New shadow price guided operators are used to achieve good measurable evolutions. Simulation results show that the new shadow price guided genetic algorithm (SGA) is effective in terms of performance and efficient in terms of speed.</p>

	]]>
</description>

<author>Gang Shen</author>


</item>






<item>
<title>Innovative Algorithms and Evaluation Methods for Biological Motif Finding</title>
<link>http://digitalarchive.gsu.edu/cs_diss/63</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/63</guid>
<pubDate>Wed, 11 Apr 2012 07:27:51 PDT</pubDate>
<description>
	<![CDATA[
	<p>Biological motifs are defined as overly recurring sub-patterns in biological systems. Sequence motifs and network motifs are examples of biological motifs. Due to the wide range of applications, many algorithms and computational tools have been developed for efficient search for biological motifs. As a result, there are more computationally derived motifs than experimentally validated motifs, and how to validate the biological significance of the ‘candidate motifs’ becomes an important question. Some sequence motifs are verified by their structural similarities or their functional roles in DNA or protein sequences, and stored in databases. However, the biological role of network motifs has not yet been validated, and currently no databases exist for this purpose.</p>
<p>In this thesis, we focus not only on the computational efficiency but also on the biological meanings of the motifs. We provide an efficient way to incorporate biological information into clustering analysis methods. For example, a sparse nonnegative matrix factorization (SNMF) method is used with Chou-Fasman parameters for protein motif finding. Biological network motifs are searched by various clustering algorithms with Gene Ontology (GO) information. Experimental results show that the algorithms perform better than existing algorithms by producing a larger number of high-quality biological motifs.</p>
<p>In addition, we apply biological network motifs for the discovery of essential proteins. Essential proteins are defined as a minimum set of proteins which are vital for development to a fertile adult and in a cellular life in an organism. We design a new centrality algorithm with biological network motifs, named MCGO, and score proteins in a protein-protein interaction (PPI) network to find essential proteins. MCGO is also combined with other centrality measures to predict essential proteins using machine learning techniques.</p>
<p>This thesis makes three contributions to the study of biological motifs: 1) clustering analysis is used efficiently, and biological information is easily integrated with the analysis; 2) we focus more on the biological meanings of motifs by adding biological knowledge to the algorithms and by suggesting biologically related evaluation methods; and 3) biological network motifs are successfully applied to the practical problem of predicting essential proteins.</p>

	]]>
</description>

<author>Wooyoung Kim</author>


</item>






<item>
<title>Multiple Biological Sequence Alignment: Scoring Functions, Algorithms, and Evaluations</title>
<link>http://digitalarchive.gsu.edu/cs_diss/62</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/62</guid>
<pubDate>Tue, 22 Nov 2011 09:22:55 PST</pubDate>
<description>
	<![CDATA[
	<p>Aligning multiple biological sequences such as protein sequences or DNA/RNA sequences is a fundamental task in bioinformatics and sequence analysis. These alignments may contain invaluable information that scientists need to predict the sequences' structures, determine the evolutionary relationships between them, or discover drug-like compounds that can bind to the sequences. Unfortunately, multiple sequence alignment (MSA) is NP-Complete. In addition, the lack of a reliable scoring method makes it very hard to align the sequences reliably and to evaluate the alignment outcomes.</p>
<p>In this dissertation, we have designed a new scoring method for use in multiple sequence alignment. Our scoring method encapsulates stereo-chemical properties of sequence residues and their substitution probabilities into a tree-structure scoring scheme. This new technique provides a reliable scoring scheme with low computational complexity.</p>
<p>In addition to the new scoring scheme, we have designed an overlapping sequence clustering algorithm for use in our three new multiple sequence alignment algorithms. One of our alignment algorithms uses a dynamic weighted guidance tree to perform multiple sequence alignment in a progressive fashion. The use of a dynamic weighted tree allows errors in the early alignment stages to be corrected in subsequent stages. The other two algorithms utilize sequence knowledge bases and sequence consistency to produce biologically meaningful sequence alignments. To improve the speed of multiple sequence alignment, we have developed a parallel algorithm that can be deployed on reconfigurable computer models. Analytically, our parallel algorithm is the fastest progressive multiple sequence alignment algorithm.</p>

	]]>
</description>

<author>Ken D. Nguyen</author>


</item>






<item>
<title>Syntactic and Semantic Analysis and Visualization of Unstructured English Texts</title>
<link>http://digitalarchive.gsu.edu/cs_diss/61</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/61</guid>
<pubDate>Thu, 13 Oct 2011 06:23:37 PDT</pubDate>
<description>
	<![CDATA[
	<p>People have complex thoughts, and they often express their thoughts with complex sentences using natural languages. This complexity may facilitate efficient communications among the audience with the same knowledge base. But on the other hand, for a different or new audience this composition becomes cumbersome to understand and analyze. Analysis of such compositions using syntactic or semantic measures is a challenging job and defines the base step for natural language processing.</p>
<p>In this dissertation I explore and propose a number of new techniques to analyze and visualize the syntactic and semantic patterns of unstructured English texts.</p>
<p>The syntactic analysis is done through a proposed visualization technique which categorizes and compares different English compositions based on their different reading complexity metrics. For the semantic analysis I use Latent Semantic Analysis (LSA) to analyze the hidden patterns in complex compositions. I have used this technique to analyze comments from a social visualization web site for detecting the irrelevant ones (e.g., spam). The patterns of collaborations are also studied through statistical analysis.</p>
<p>Word sense disambiguation is used to figure out the correct sense of a word in a sentence or composition. Using textual similarity measure, based on the different word similarity measures and word sense disambiguation on collaborative text snippets from social collaborative environment, reveals a direction to untie the knots of complex hidden patterns of collaboration.</p>

	]]>
</description>

<author>Saurav Karmakar</author>


</item>






<item>
<title>Diversified Ensemble Classifiers for Highly Imbalanced Data Learning and their Application in Bioinformatics</title>
<link>http://digitalarchive.gsu.edu/cs_diss/60</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/60</guid>
<pubDate>Tue, 03 May 2011 05:42:46 PDT</pubDate>
<description>
	<![CDATA[
	<p>In this dissertation, the problem of learning from highly imbalanced data is studied. Imbalanced data learning is both important and challenging in many real applications. Dealing with a minority class normally needs new concepts, observations and solutions in order to fully understand the underlying complicated models. We try to systematically review and solve this special learning task in this dissertation.<br />We propose a new ensemble learning framework, Diversified Ensemble Classifiers for Imbalanced Data Learning (DECIDL), based on the advantages of existing ensemble imbalanced learning strategies. Our framework combines three learning techniques: a) ensemble learning, b) artificial example generation, and c) diversity construction by reverse data re-labeling. As a meta-learner, DECIDL utilizes general supervised learning algorithms as base learners to build an ensemble committee. <br />We create a standard benchmark data pool, which contains 30 highly skewed sets with diverse characteristics from different domains, in order to facilitate future research on imbalanced data learning. We use this benchmark pool to evaluate and compare our DECIDL framework with several ensemble learning methods, namely under-bagging, over-bagging, SMOTE-bagging, and AdaBoost. Extensive experiments suggest that our DECIDL framework is comparable with other methods. The data sets, experiments and results provide a valuable knowledge base for future research on imbalanced learning. <br />We develop a simple but effective artificial example generation method for data balancing. Two new methods, DBEG-ensemble and DECIDL-DBEG, are then designed to improve the power of imbalanced learning. Experiments show that these two methods are comparable to state-of-the-art methods, e.g., GSVM-RU and SMOTE-bagging. <br />Furthermore, we investigate learning on imbalanced data from a new angle: active learning. By combining active learning with the DECIDL framework, we show that the newly designed Active-DECIDL method is very effective for imbalanced learning, suggesting the DECIDL framework is very robust and flexible.<br />Lastly, we apply the proposed learning methods to a real-world bioinformatics problem: protein methylation prediction. Extensive computational results show that the DECIDL method performs very well on this imbalanced data mining task. Importantly, the experimental results have confirmed our new contributions on this particular data learning problem.</p>

	]]>
</description>

<author>ZEJIN DING</author>


</item>






<item>
<title>Inferring Genomic Sequences</title>
<link>http://digitalarchive.gsu.edu/cs_diss/59</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/59</guid>
<pubDate>Fri, 29 Apr 2011 05:57:34 PDT</pubDate>
<description>
	<![CDATA[
	<p>Recent advances in next generation sequencing have provided unprecedented opportunities for high-throughput genomic research, inexpensively producing millions of genomic sequences in a single run. Analysis of massive volumes of data results in a more accurate picture of genome complexity and requires adequate bioinformatics support. We explore the computational challenges of applying next generation sequencing to particular applications, focusing on the problem of reconstructing the viral quasispecies spectrum from pyrosequencing shotgun reads and the problem of inferring informative single nucleotide polymorphisms (SNPs) that statistically cover the genetic variation of a genome region in genome-wide association studies.</p>
<p>The genomic diversity of viral quasispecies is a subject of great interest, particularly for chronic infections, since it can lead to resistance to existing therapies. High-throughput sequencing is a promising approach to characterizing viral diversity, but unfortunately standard assembly software cannot be used to simultaneously assemble and estimate the abundance of multiple closely related (but non-identical) quasispecies sequences. Here, we introduce a new Viral Spectrum Assembler (ViSpA) for inferring the quasispecies spectrum and compare it with the state-of-the-art ShoRAH tool on both synthetic and real 454 pyrosequencing shotgun reads from HCV and HIV quasispecies. While ShoRAH has an advanced error correction algorithm, ViSpA is better at quasispecies assembly, producing a more accurate reconstruction of a viral population. We also foresee applying ViSpA to the analysis of high-throughput sequencing data from bacterial metagenomic samples and ecological samples of eukaryote populations.</p>
<p>Due to the large data volume in genome-wide association studies, it is desirable to find a small subset of SNPs (tags) that covers the genetic variation of the entire set. We explore the trade-off between the number of tags used per non-tagged SNP and possible overfitting and propose an efficient 2LR-Tagging heuristic.</p>

	]]>
</description>

<author>Irina A. Astrovskaya</author>


</item>






<item>
<title>Virtual Dynamic Tunnel: A Target-Agnostic Assistive User Interface Algorithm for Head-Operated Input Devices</title>
<link>http://digitalarchive.gsu.edu/cs_diss/58</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/58</guid>
<pubDate>Tue, 30 Nov 2010 12:46:52 PST</pubDate>
<description>
	<![CDATA[
	<p>Today the effective use of computers (e.g. those with Internet browsers and graphical interfaces) involves some form of cursor control, such as that provided by a mouse. However, a standard mouse is not always the best option for all users. There are currently many devices available to provide alternative computer access. These devices may be divided into categories: brain-computer interfaces (BCI), mouth-based controls, camera-based controls, and head-tilt controls. There is no single solution, as each device and application has to be tailored to each user's unique preferences and abilities. Furthermore, each device category has certain strengths and weaknesses that need to be considered when making an effective match between a user and a device. One problem that remains is that these alternative input devices do not perform as well as standard mouse devices. To help with this, assistive user interface techniques can be employed. While research shows that these techniques help, most require that modifications be made to the user interfaces or that a user's intended target be known beforehand by the host computer. In this research, a novel target-agnostic assistive user interface algorithm intended to improve usage performance for both head-operated and standard mouse devices is designed, implemented (as a mouse device driver and in host computer software) and experimentally evaluated. In addition, a new wireless head-operated input device requiring no special host computer hardware is designed, built and evaluated. It was found that the Virtual Dynamic Tunnel algorithm improved performance for a standard mouse in straight tunnel trials and that nearly 60% of users would be willing to use the head-tilt mouse as a hands-free option for cursor control.</p>

	]]>
</description>

<author>Ferrol R. Blackmon</author>


</item>






<item>
<title>Dynamic Data Driven Application System for Wildfire Spread Simulation</title>
<link>http://digitalarchive.gsu.edu/cs_diss/57</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/57</guid>
<pubDate>Mon, 29 Nov 2010 08:12:39 PST</pubDate>
<description>
	<![CDATA[
	<p>Wildfires have a significant impact on both ecosystems and human society. To effectively manage wildfires, simulation models are used to study and predict wildfire spread. The accuracy of wildfire spread simulations depends on many factors, including GIS data, fuel data, weather data, and high-fidelity wildfire behavior models. Unfortunately, due to the dynamic and complex nature of wildfire, it is impractical to obtain all these data without error. Therefore, predictions from the simulation model will differ from the behavior of a real wildfire. Without assimilating data from the real wildfire and dynamically adjusting the simulation, the difference between the simulation and the real wildfire is very likely to grow continuously. With the development of sensor technologies and the advance of computer infrastructure, dynamic data driven application systems (DDDAS) have become an active research area in recent years. In a DDDAS, data obtained from wireless sensors is fed into the simulation model to make predictions of the real system. This dynamic input is treated as the measurement used to evaluate the output and adjust the states of the model, thereby improving simulation results. To improve the accuracy of wildfire spread simulations, we apply the concept of DDDAS to wildfire spread simulation by dynamically assimilating sensor data from real wildfires into the simulation model. The assimilation system relates the system model and the observation data of the true state, and uses analysis approaches to obtain state estimations. We employ Sequential Monte Carlo (SMC) methods (also called particle filters) to carry out data assimilation in this work. Based on the structure of DDDAS, this dissertation presents the data assimilation system and data assimilation results in wildfire spread simulations. 
We carry out sensitivity analysis for different densities, frequencies, and qualities of sensor data, and quantify the effectiveness of SMC methods based on different measurement metrics. Furthermore, to improve simulation results, the image-morphing technique is introduced into the DDDAS for wildfire spread simulation.</p>

	]]>
</description>

<author>Feng Gu</author>


</item>






<item>
<title>A Framework for Group Modeling in Agent-Based Pedestrian Crowd Simulations</title>
<link>http://digitalarchive.gsu.edu/cs_diss/56</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/56</guid>
<pubDate>Tue, 23 Nov 2010 06:58:05 PST</pubDate>
<description>
	<![CDATA[
	<p>Pedestrian crowd simulation explores crowd behaviors in virtual environments. It is extensively studied in many areas, such as safety and civil engineering, transportation, social science, and the entertainment industry. As a common phenomenon in pedestrian crowds, grouping can play important roles in crowd behaviors. To achieve more realistic simulations, it is important to support group modeling in crowd behaviors. Nevertheless, group modeling is still an open and challenging problem. The influence of groups on the dynamics of crowd movement has not been incorporated into most existing crowd models because of the complex nature of social groups. This research develops a framework for group modeling in agent-based pedestrian crowd simulations. The framework includes multiple layers that support a systematic approach for modeling social groups in pedestrian crowd simulations. These layers include a simulation engine layer that provides efficient simulation engines to simulate the crowd model; a behavior-based agent modeling layer that supports developing agent models using the developed BehaviorSim simulation software; a group modeling layer that provides a well-defined way to model inter-group relationships and intra-group connections among pedestrian agents in a crowd; and finally a context modeling layer that allows users to incorporate various social and psychological models into the study of social groups in pedestrian crowds. Each layer utilizes the layer below it to fulfill its functionality, and together these layers provide an integrated framework for supporting group modeling in pedestrian crowd simulations. To our knowledge this work is the first to focus on a systematic group modeling approach for pedestrian crowd simulations. This systematic modeling approach allows users to create social group simulation models in a well-defined way for studying the effect of social and psychological factors on a crowd’s grouping behavior. 
To demonstrate the capability of the group modeling framework, we developed an application of dynamic grouping for pedestrian crowd simulations.</p>

	]]>
</description>

<author>Fasheng Qiu</author>


</item>






<item>
<title>Energy-Efficient Data Management in Wireless Sensor Networks</title>
<link>http://digitalarchive.gsu.edu/cs_diss/55</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/55</guid>
<pubDate>Wed, 15 Sep 2010 11:40:21 PDT</pubDate>
<description>
	<![CDATA[
	<p>Wireless Sensor Networks (WSNs) are deployed widely for various applications. A variety of useful data are generated by these deployments. Since WSNs have limited resources and unreliable communication links, traditional data management techniques are not suitable. Therefore, designing effective data management techniques for WSNs becomes important. In this dissertation, we address three key issues of data management in WSNs. For data collection, a scheme of making some nodes sleep and estimating their values according to the other active nodes’ readings has been proven energy-efficient. For the purpose of improving the precision of estimation, we propose two powerful estimation models, Data Estimation using a Physical Model (DEPM) and Data Estimation using a Statistical Model (DESM). Most existing data processing approaches for WSNs are real-time. However, historical data of WSNs are also significant for various applications. No previous study has specifically addressed distributed historical data query processing. We propose an Index based Historical Data Query Processing scheme which stores historical data locally and processes queries energy-efficiently by using a distributed index tree. Area query processing is significant for various applications of WSNs. No previous study has specifically addressed this issue. We propose an energy-efficient in-network area query processing scheme. In our scheme, we use an intelligent method (Grid lists) to describe an area, thus reducing the communication cost and dropping useless data as early as possible. A thorough simulation study shows that our schemes are effective and energy-efficient. Based on the area query processing algorithm, an Intelligent Monitoring System is designed to detect various events and provide real-time and accurate information for escape, rescue, and evacuation when a dangerous event happens.</p>

	]]>
</description>

<author>Chunyu Ai</author>


</item>






<item>
<title>Enhanced Web Search Engines with Query-Concept Bipartite Graphs</title>
<link>http://digitalarchive.gsu.edu/cs_diss/54</link>
<guid isPermaLink="true">http://digitalarchive.gsu.edu/cs_diss/54</guid>
<pubDate>Wed, 15 Sep 2010 11:40:20 PDT</pubDate>
<description>
	<![CDATA[
	<p>With the rapid growth of information on the Web, Web search engines have gained great momentum for exploiting valuable Web resources. Although keyword-based Web search engines provide relevant search results in response to users’ queries, further enhancement is still needed. Three important issues are that (1) search results can be diverse because ambiguous keywords in queries can be interpreted with different meanings; (2) identifying keywords in long queries is difficult for search engines; and (3) generating query-specific Web page summaries is desirable for Web search results’ previews. Based on clickthrough data, this thesis proposes a query-concept bipartite graph for representing queries’ relations, and applies the queries’ relations to applications such as (1) personalized query suggestions, (2) long-query Web searches and (3) query-specific Web page summarization. Experimental results show that query-concept bipartite graphs are useful for improving performance in the three applications.</p>

	]]>
</description>

<author>Yan Chen</author>


</item>





</channel>
</rss>
