The Co-evolution of Bioinformatics and “Big Data” Analytics

Bioinformatics is a multidisciplinary field that combines biology, computer science, and statistics to develop methods for the processing and interpretation of biological data. It has grown exponentially since the late 1980s, when the first databases of protein sequence motifs emerged. Boosted first by the growth of the internet and later by the increasing popularity of high-throughput biological experimentation, bioinformatics has in recent years evolved far beyond “motif finding”. The increasing industrialization of laboratory techniques to make them “high throughput” has revolutionized many fields of biological inquiry, and bioinformatics has rapidly evolved in conjunction with the emergence of the “big data” produced by such techniques.

An early application of bioinformatics to process and interpret “big data” was the analysis of microarrays, which allowed the expression levels of thousands of genes to be examined simultaneously. More recently, the development of next-generation sequencing technologies that can determine the sequence of hundreds of millions of short pieces of DNA or RNA per experiment has spawned whole new sub-fields of bioinformatics. As the cost of sequencing has decreased, an explosive increase in the use of whole genome sequencing techniques has revolutionized molecular biology.

More importantly, the tremendous increase in the quantity and variety of data generated by high-throughput assays has changed the very nature of hypothesis generation and experimental design. Whereas most experiments used to be designed to test a specific hypothesis (e.g., “that expression of gene A will be altered in response to X”), it has now become more common to design experiments that are “data-driven”. Rather than looking individually at gene A, one can simultaneously examine the expression of every gene in the genome and formulate a hypothesis later based on the results. While “hypothesis-driven” experimentation will always be the cornerstone of scientific inquiry, the ability to perform “data-driven” experiments frees the process of discovery from the confines of expectation. For example, next-generation sequencing studies have demonstrated the existence of thousands of new non-coding RNAs and novel gene isoforms that were never detected by more targeted assays.

At the same time, the quantity and multidimensional nature of all of this new data has also affected the nature and scope of bioinformatics. Because of the statistical rules surrounding “multiple testing”, an expression change detected for a single gene in a hypothesis-driven experiment carries much greater significance than the same change detected in “any” gene in a genome-wide experiment: the more genes that are tested, the more likely it is that some will appear to change by chance alone. Bioinformatics tools that are designed to analyze these types of experiments must therefore account for such considerations, and bioinformaticians must accordingly have a strong grasp of biostatistics. In addition, bioinformaticians must increasingly use sophisticated programming and data management skills to create and maintain relational databases that are too large and complex for standard commercial software. The sheer quantity of data that is generated is also too great to be uploaded, downloaded, or otherwise transferred between computers in a timely manner, so bioinformaticians must become proficient at working remotely on a server to manipulate and use data. Because the skills required to analyze “big data” for bioinformatics are also highly applicable to other kinds of data, such as hospital records, bioinformatics has co-evolved with an array of related specialties, such as health informatics, that serve to further drive demand for skilled analysts. As techniques and systems continue to grow in power, scale, and sophistication, we can expect to see ever-increasing demand for “big data analytics” in bioinformatics and related fields. How can we encourage young professionals to evolve to meet this demand?
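
For readers who want to see the “multiple testing” point in numbers, here is a minimal sketch. It assumes a significance threshold of 0.05, a genome-wide experiment covering 20,000 genes, and an observed p-value of 0.001 for one gene; all three figures are illustrative assumptions rather than values from any study. It uses the Bonferroni correction, which is one common (and conservative) way to account for multiple tests, and the function names are made up for this example.

```python
# Illustrative sketch only: alpha, n_genes, and p are assumed values,
# not results from a real experiment.

def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of tests that pass `alpha` by chance alone
    when no gene has truly changed."""
    return alpha * n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff under the Bonferroni correction,
    which keeps the chance of any false positive at or below `alpha`."""
    return alpha / n_tests


def is_significant(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if `p_value` survives Bonferroni correction for `n_tests` tests."""
    return p_value < bonferroni_threshold(alpha, n_tests)


alpha = 0.05       # assumed significance threshold
n_genes = 20_000   # assumed number of genes tested genome-wide
p = 0.001          # hypothetical p-value observed for one gene

# Roughly how many genes would look "significant" by chance at alpha.
print(expected_false_positives(alpha, n_genes))

# Hypothesis-driven experiment: one gene, one test.
print(is_significant(p, alpha, 1))        # True

# Data-driven experiment: the same p-value among 20,000 tests.
print(is_significant(p, alpha, n_genes))  # False
```

The same p-value that is convincing for a single pre-specified gene no longer clears the bar once it is one of 20,000 tests. In practice, genome-wide analyses often use less conservative corrections that control the false discovery rate instead, but the underlying concern is the same.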

Posted by
Miranda M. Darby
Author Bio
Miranda M. Darby, Ph.D., is an assistant professor of bioinformatics at Hood College and director of the bioinformatics MS and certificate programs at the Hood College Graduate School. Professor Darby earned a bachelor's degree in biology from Carleton College and a Ph.D. in molecular biology and genetics from the Johns Hopkins School of Medicine. She gained practical experience in bioinformatics while completing a postdoctoral fellowship at the Johns Hopkins School of Medicine, where she developed and implemented bioinformatics tools to study the expression of repetitive elements (repeated sequences that make up roughly 50% of the genome) and to identify concerted changes in the expression of functionally related genes across individuals with schizophrenia, bipolar disorder, and major depression. She has mentored undergraduate researchers studying subjects ranging from the mechanisms that regulate RNA transcription in yeast to the characterization of novel mRNA isoforms expressed in the human brain. Professor Darby has published articles on the function and regulation of a non-canonical RNA transcription termination pathway in yeast; altered gene expression in psychiatric disease; and novel mRNA isoforms expressed in the human brain that are generated by splicing of repetitive RNA sequences. Her current research focuses on the development of computational methods to identify and quantify the expression of novel RNAs using whole genome RNA sequencing.
Big Announcements from Maryland Tech Council

Please see important information below regarding our office move, guest blogs and member videos! Let me know if you have questions.  I’m looking forward to seeing you soon!

  • Big Move

    Maryland Tech Council is saying goodbye to our old digs on September 20, 2017. Please note that our communications will be down that day, and we will resume full activity on September 21, 2017. MTC’s new headquarters will be located at Launch Workplaces, 9841 Washingtonian Boulevard, Suite 200, Gaithersburg, MD 20878.

  • Be a Guest Blogger

    Maryland Tech Council is launching the Member Point of View (POV) guest blogs. We are inviting members to submit content for our blog page. The content will focus on your niche or industry, where you can add a new POV for the MTC audience. Our goal is to position you as an authority and a well-known name in the industry. For our part, we will have fresh new content for the page and bring new readers to our blogger community. It’s simple and a win-win. There will be numerous categories that you can write articles for; those will be available in the next few weeks. We are kicking off the Member POV blogs during Cyber Security Awareness Month in October. If you are interested in submitting a blog on that topic, please let me know and we will get you started.

  • Become a Familiar Face in the Community

    Maryland Tech Council is revitalizing the “member spotlight” that is featured in the VIBE E-newsletter. We now offer the opportunity to feature you, the member, through our new and exciting video blog, or vlog. The video will be a 30-45 second piece about your company, prerecorded at our offices. We will then feature the vlog in our monthly VIBE E-newsletter. The vlogs also allow us to distribute the member spotlight through other channels such as Twitter, Facebook, etc. to get you more exposure. I mean, we are the Tech Council, right?


Remember, everyone in your company is a member of MTC. Please share this important information with your team.

Warm Wishes,

Michelle Ferrone
EVP, Operations
Maryland Tech Council