We live in a data-laden world. From the smartphone in your pocket and the satellites orbiting overhead to the demographics of a city and global banking transactions, data is being produced constantly.
Famed statistician John Tukey once said, “The best thing about being a statistician is that you get to play in everyone’s backyard.”
That statement rings especially true for Xiao Hui Tai, an assistant professor in the Department of Statistics. Tai specializes in applying statistical and machine learning methods to large-scale, granular sources of data to study problems concerning conflict and the developing world. Her research is interdisciplinary in nature, requiring her to collaborate with colleagues across academic disciplines.
“As a statistician, I try to use tools that would not be readily available to other types of researchers, like people who are more on the qualitative side,” Tai said. “I’ve worked with data from mobile phones, satellite imagery, as well as detailed geospatial data, all of which can present some challenges to those without statistical expertise.”
Her research runs the gamut, from using satellite data to map opium poppy cultivation and understand its relation to socioeconomic outcomes, to using phone data to study how violence displaces people in Afghanistan.
“Increasingly, people are thinking about statistics a little bit more broadly, especially because data science has become quite popular,” Tai said. “Data is not often like you think, like in an Excel sheet, where it’s all nice and well-organized. That’s not what real-world data looks like.”
The effects of air pollution
Recent research from Tai and colleagues, appearing in Communications Earth & Environment, uses satellite-based measurements of fine particulate matter (PM2.5) combined with comprehensive public records to study the effects of pollution on elderly mortality in Chile.
Tai said the study grew out of a student project spearheaded by UC Davis Energy and Efficiency Institute Ph.D. student Pablo Busch for her class STA 250: Topics in Applied and Computational Statistics.
“The innovation here is that in previous studies, typically you need to use physical monitors placed at appropriate locations, but sometimes these monitors might not be easily available throughout an entire country,” Tai said. “For example, in rural areas, it’s less common to have monitors, and also in the developing world, because these monitors are really expensive to set up and maintain.”
Advances in satellite-based measurements have provided a new data-gathering avenue for researchers studying the effects of air pollution. In their study, Tai’s team combined data produced by the Atmospheric Composition Analysis Group at Washington University in St. Louis with death certificate records to reveal the relationship between increases in PM2.5 and mortality rates for those over the age of 75.
In the study, the team reported that monthly increases of 10 micrograms per cubic meter in PM2.5 exposure are associated with a 1.7% increase in all-cause mortality for the elderly.
“Methods have been developed that are quite good at using satellite imagery to infer what PM2.5 levels are,” Tai said. “The advantage is that this kind of data is now available throughout an entire geography and over time as well.”
Data and decision-making
Tai’s research emphasizes the importance and innovative potential of interdisciplinary collaborations across campus.
Recently, Tai has partnered with Lauren Peritz, an associate professor in the Department of Political Science; Katheryn Russ, professor and chair of the Department of Economics; and Carl Stahmer, executive director of the UC Davis DataLab, on a project titled “Uncovering the Private Sector Influence on Global Public Health Policy Using Automated Text Analysis.”
The project is the first to receive funding through the L&S Unites Initiative, a Letters & Science program that aims to cultivate interdisciplinary research collaborations.
“In a nutshell, the idea is that we have these emails from lobbyists in the U.S. to trade representatives, and we have country statements at World Health Assembly meetings,” Tai said. “We’re trying to see if the content of these emails is later reflected in what happens in the World Health Organization.”
Accomplishing this requires a natural language processing algorithm capable of sifting through thousands of pages of text. According to Tai, the task goes beyond the scope of typical text analysis algorithms and requires input from domain experts across campus.
“The emphasis on interdisciplinary work on this campus is really unique and it fits with my research interests,” Tai said. “The college is actively trying to support these types of collaborations, which is not always the case in other places.”