Big data is a big deal
Due to the proliferation of smart devices like phones, televisions, computers, and sensors of all types, whether in buildings, planes, or weather stations, a zettabyte of data--a billion terabytes, more than the entire existing record of civilization to this date--is being generated every day. The motto on the now-infamous Cambridge Analytica’s website is as true as it ever was: Data drives all that we do.
The trick is making sense of it all. More, and more powerful, computers aren’t the sole answer. The challenge is how to effectively manage and interpret all this data through improved statistical and computational methods. In short, the world needs a better algorithm. A qualitative approach is required, one that combines the computational powers of the world’s most advanced devices with the superior analytical selectivity of the human mind.
SUNY is at the forefront of creating solutions in this growing field, employing expert faculty, developing top-level business partnerships and building leading-edge facilities for conducting research in areas such as energy, healthcare, life sciences and social analysis.
The National Science Foundation has recognized many of the following SUNY faculty members with CAREER awards, which support the early career activities of teacher-scholars who show promise in research and education.
We are visual animals. The vast majority of the sensory information sent to the brain is received through the eyes. Visualization is becoming increasingly useful in the big data field, because information is being generated so quickly that it can’t be handled by standard manual techniques alone.
A research assistant professor in the Department of Computer Science at Stony Brook University and a computer scientist in the Computational Science Initiative (CSI) at the US Department of Energy’s (DOE) Brookhaven National Laboratory, Wei Xiu has developed visual analytics tools to empower scientific analysis. One of her inventions is a color-coded visualization system for large datasets that can be zoomed and panned like Google Maps. This allows users to find details that might otherwise go unnoticed, and to focus on specific areas of interest.
“Even subtle differences that are hard to identify in separate image displays, such as differences in elemental ratios, can be picked up with our tool—a capability essential for new scientific discovery,” she says.
At SUNY Polytechnic Institute, Professor of Nanobioscience Dr. Nate Cady is developing neuromorphic circuits that could speed up the process of examining large data sets by enabling greater processing and computational power. Inspired by the way neurons and synapses connect and form networks in the human brain, Cady’s nanoengineered circuits have potential applications in pattern recognition and data mining as well as in artificial intelligence (AI), autonomous control and navigation.
The simplest mistakes — from data entry errors to malfunctioning sensors — can skew datasets. An assistant professor in the Department of Computer Science and Engineering at the University at Buffalo, Oliver Kennedy is creating software to ‘clean’ large datasets, ensuring the efficient use of big data. Kennedy’s CAREER award will aid in the refinement of his software tool--called Mimir--which aggressively pinpoints data errors and helps users make more effective decisions.
Feng Chen, an assistant professor of computer science at the University at Albany, is working to construct a comprehensive theoretical framework for discovering complex patterns in big data. This can be a valuable tool in domains ranging from social events and natural disasters to civil unrest, disease outbreaks, and cyberattacks. Chen’s CAREER award will support the further development of this framework.
Research and educational efforts are currently underway to make the energy grid smarter and more efficient. At Stony Brook University, the innovative Science Training and Research to Inform DEcisions (STRIDE) program trains STEM graduate students in interdisciplinary skills so that they can aid, create, and eventually lead in translating complex data-enabled research into effective decisions and sound energy policies. In addition to training in advanced data analytics and visualization, STRIDE students are educated in science communication, understanding stakeholder perspectives, and deciphering scientific uncertainty.
Genomic medicine, a critical component of precision medicine, relies on the power of supercomputing to analyze large-scale genetic and clinical data sets and provide insights that can lead to individualized treatment strategies. SUNY Downstate Medical Center professors Dr. Carlos N. Pato and Dr. Michele T. Pato have led efforts to develop an extensive genetic data resource, the Genomic Psychiatry Cohort, to study multiple highly heritable diseases, including schizophrenia, bipolar disorder, and obsessive-compulsive disorder.
The Patos began developing the cohort while at Stony Brook University (1990-1992). To date, over 40,000 individuals have been enrolled in the cohort, and the Patos, who have established the Institute for Genomic Health at Downstate, plan to add another 20,000 enrollees of African American heritage with the help of Downstate’s Center for Health Disparities and funding from the National Institute of Mental Health. The expansion will allow the Patos and researchers from over 15 participating institutions, including Stony Brook University, Upstate Medical Center, and the University at Buffalo, to better understand the genetic variations that influence the risk of schizophrenia, using strategic combinations of genome sequencing and other custom array analyses.
Chemistry is essential to curing diseases and to solving energy and environmental problems. The challenge now is to address contemporary problems as the field expands to include issues previously dealt with through biology, materials science, and nanotechnology.
Chemistry breakthroughs are traditionally achieved through an often lengthy and expensive hit-and-miss process. Johannes Hachmann, assistant professor in the Department of Chemical and Biological Engineering at the University at Buffalo, is developing a cyberinfrastructure that brings big data methodology to chemical research. By streamlining the chemical innovation process, his idea will contribute to economic development worldwide. Dr. Hachmann received a CAREER award for this project, titled “Building an Advanced Cyberinfrastructure for the Data-Driven Design of Chemical Systems and the Exploration of Chemical Space.”
Thanks to the exponential growth of social media, social data analysis is a booming field. It relies on data-driven analysis of how people interact socially, often with data obtained from social networking services. Whether the goal is to understand human behavior or to market to a target audience, the process involves understanding the flow of data through a network, pinpointing critical nodes, or identifying trending topics.
If you rant about the snow on Twitter, it can ease traffic on slippery roads. It’s true. Jing Gao, PhD, associate professor in the Department of Computer Science and Engineering, University at Buffalo, coauthored a study with researchers Adel Sadek, Qing He, Ming Ni and Lei Lin on how weather-related tweets can be analyzed to enhance computer models that in turn can calculate safe driving speeds and identify roads to avoid. Her CAREER award will help her continue to work on creating new systems for mining data from mobile devices and social media, in the hope of offering affordable, robust solutions to improve transportation, health care and other industries.
The growing field of materials science involves studying the structure of materials and relating it to their properties. Many new advanced materials are currently being developed, including nanomaterials, biomaterials, and energy materials.
Headed by Leading Professor and Chairperson Dr. Arie E. Kaufman, the Visualization Lab of the Department of Computer Science at Stony Brook focuses on the development of volume visualization techniques used in scientific visualization and virtual reality applications. A major current project is a comprehensive visualization tool known as the VolVis system, which gives scientists and engineers a powerful analytical and management tool in fields ranging from geophysics to the biomedical sciences.
Cancer research requires increasingly advanced equipment for the gathering and analysis of data. Recent upgrades to the technology at the University at Albany’s Cancer Research Center include an advanced mass spectrometer for protein analysis and instrumentation for monitoring changes in DNA and RNA. The instrumentation is capable of handling massive amounts of information. Researchers at the Cancer Research Center use these tools as they study the genetics of breast and prostate cancer. The devices will also be used for training the next generation of scientists.
At a time when the incidence of liver diseases, including liver cancer, is spiking in the U.S., UB is harnessing big data to promote earlier detection and treatment. Andrew H. Talal, MD, professor in the Department of Medicine, and Marianthi Markatou, PhD, professor in the Department of Biostatistics, are co-principal investigators on a $3 million grant from the Troup Fund of the Kaleida Health Foundation. They will use big data techniques to analyze patterns of liver disease among specific populations in order to develop better ways to screen and identify patients with chronic liver disease. Markatou is funded to develop new methods that can extract credible information from big data and enable the development of population health algorithms.
Wherever big data is employed, SUNY researchers are influencing the development of advanced methods for the gathering, analysis and interpretation of these colossal datasets, from Wei Xiu’s visualization color mapping tool to Jing Gao’s Twitter dataset techniques. Leading-edge SUNY facilities enable advanced research, while corporate partnerships speed the flow of ideas to market and help foster the big data revolution. As John Quackenbush of the Dana-Farber Cancer Institute said, “From Kepler using Tycho Brahe’s data to build a heliocentric model of the solar system, to the birth of statistical quantum mechanics, to Darwin’s theory of evolution, to the modern theory of the gene, every major scientific revolution has been driven by one thing, and that is data.”