I’m a data scientist at Princeton University, formerly in Computer Science and currently in Ecology and Evolutionary Biology. I’m also affiliated with the Center for Statistics and Machine Learning.
My interests and expertise are in bioinformatics, software engineering, machine learning, and analyzing large-scale datasets with extensive visualization.
As a senior data scientist, I primarily collaborate as an independent contributor. I rotate among labs in ~4-month terms to maximize the diversity of my projects, and I line up new projects by networking within the department. Additionally, I:
- develop computational workflows that enable or accelerate several projects
- mentor individual students or postdocs
- teach workshops
- serve as a consultant
Highlights
Here, I highlight my technical contributions to selected collaborations in which I led a modular component. These contributions apply the skills above to a range of data modalities, including spatial transcriptomics, 3D movies of neural activity, and whole-genome sequencing (WGS).
- GASTON: deep neural network to segment domains and study continuous variation in gene expression from spatial transcriptomics data
- refactored code into a Python package to run at scale and to optimize the neural network architecture
- featurized H&E-stained histology images to facilitate tissue segmentation
- analyzed several colorectal cancer datasets to characterize metabolic gradients and the tumor microenvironment
- 3D movie analysis: image segmentation to decode neural activity in the mosquito antennal lobe (AL)
- created Python workflows to measure activity across the entire AL
- identified technical batch effects and corrected them with custom statistical models that leverage the experimental design
- segmented individual glomeruli (clusters of nerve endings) within the AL via nonnegative matrix factorization (NMF)
- HATCHet2: copy-number calling (amplifications/deletions of DNA) for tumor WGS data
- wrote several modules to phase genotypes into haplotypes using the 1000 Genomes Project (1000GP) reference panel
- containerized the software with Docker for cloud computing on GCP and AWS
- analyzed large datasets in collaboration with the Genomic Data Analysis Network
- gave a virtual invited talk at the University of Edinburgh’s biomedical AI seminar
- snpArcher: workflow to automate variant calling in nonmodel organisms
- authored the original Snakemake code and supplementary algorithms to massively parallelize two variant callers
- Tuskless African elephants: poaching drives evolution of tusklessness, a female-specific trait encoded by a male-lethal mutation
- contributed all genomic analyses as co-first author
- covered in more than 411 news stories from 301 media outlets
- invited guest on the Nice Genes! podcast