Papers

My research is supported in part by the Simons Foundation. Previously, my research was supported in part by the National Science Foundation (CAREER award DMS-1653017; DMS-1405746) and the National Institutes of Health (R01 GM123993).

Preprints

Kwon, O., Mukherjee, G., and Bien, J. (2024), Semi-Supervised Learning of Noisy Mixture of Experts Models [pdf]
Dharamshi, A., Neufeld, A., Gao, L., Bien, J., and Witten, D. (2024b), Decomposing Gaussians with Unknown Covariance [pdf] [software]
Neufeld, A., Dharamshi, A., Gao, L., Witten, D., and Bien, J. (2024), Discussion of “Data Fission: Splitting a Single Data Point” [pdf] [software]
Perry, R., Panigrahi, S., Bien, J., and Witten, D. (2024), Inference on the Proportion of Variance Explained in Principal Component Analysis [pdf]
Faletto, G., and Bien, J. (2022), Cluster Stability Selection [pdf] [software]
Yu, G., Bien, J., and Tibshirani, R. (2019b), Reluctant Interaction Modeling [pdf] [software]
Bien, J. (2016), Simulator: An Engine to Streamline Simulations, arXiv preprint arXiv:1607.00021 [pdf] [website]

Publications

Bien, J., and Mukherjee, G. (2024), Generative AI for Data Science 101: Coding Without Learning to Code, accepted to Journal of Statistics and Data Science Education [pdf] [website]
Javanmard, A., Shao, S., and Bien, J. (2024), Prediction Sets for High-Dimensional Mixture of Experts Models, accepted to Journal of the Royal Statistical Society, Series B [pdf]
Dharamshi, A., Neufeld, A., Motwani, K., Gao, L., Witten, D., and Bien, J. (2024a), Generalized Data Thinning Using Sufficient Statistics, accepted to Journal of the American Statistical Association [pdf] [software]
Shao, S., Bien, J., and Javanmard, A. (2024), Controlling the False Split Rate in Tree-Based Aggregation, accepted to Journal of the American Statistical Association [pdf] [software]
Saha, A., Witten, D., and Bien, J. (2024), Inferring Independent Sets of Gaussian Variables After Thresholding Correlations, accepted to Journal of the American Statistical Association [pdf] [software]
Faletto, G., and Bien, J. (2023), Predicting Rare Events by Shrinking Towards Proportional Odds, International Conference on Machine Learning 2023 [pdf] [software]
Hyun, S., Rolf Cape, M., Ribalet, F., and Bien, J. (2023), Modeling Cell Populations Measured by Flow Cytometry with Covariates Using Sparse Mixture of Regressions, Annals of Applied Statistics, 17, 357–377 [pdf] [software]
Kaplan, A., and Bien, J. (2023), Interactive Exploration of Large Dendrograms with Prototypes, The American Statistician, 77, 201–211 [pdf] [software] [code for paper examples]
Jönsson, B. F., Follett, C., Bien, J., Dutkiewicz, S., Hyun, S., Kulk, G., Forget, G., Müller, C., Racault, M.-F., Hill, C. N., and others (2023), Using Probability Density Functions to Evaluate Models (PDFEM, V1.0) to Compare a Biogeochemical Model with Satellite Derived Chlorophyll, Geoscientific Model Development, 16, 4639–4657 [pdf]
Reynolds, R., Hyun, S., Tully, B., Bien, J., and Levine, N. M. (2023), Identification of Microbial Metabolic Functional Guilds from Large Genomic Datasets, Frontiers in Microbiology, 14 [pdf]
Wilms, I., Basu, S., Bien, J., and Matteson, D. S. (2023), Sparse Identification and Estimation of Large-Scale Vector Autoregressive Moving Averages, Journal of the American Statistical Association, 118, 571–582 [pdf] [software]
Wilms, I., and Bien, J. (2022), Tree-Based Node Aggregation in Sparse Graphical Models, The Journal of Machine Learning Research, 23, 11078–11113 [pdf] [software]
Gao, L. L., Bien, J., and Witten, D. (2022a), Selective Inference for Hierarchical Clustering, accepted to Journal of the American Statistical Association [pdf] [website] [software]
Ray, E., Brooks, L., Bien, J., Biggerstaff, M., Bosse, N., Bracher, J., Cramer, E., Funk, S., Gerding, A., Johansson, A., Rumack, A., Wang, Y., Zorn, M., Tibshirani, R., and Reich, N. (2022), Comparing Trained and Untrained Probabilistic Ensemble Forecasts of COVID-19 Cases and Deaths in the United States, International Journal of Forecasting [pdf]
Hyun, S., Mishra, A., Follett, C., Johnsson, B., Kulk, G., Forget, G., Racault, M.-F., Jackson, T., Dutkiewicz, S., Müller, C., and Bien, J. (2022), Ocean Mover’s Distance: Using Optimal Transport for Analyzing Oceanographic Data, Proceedings of the Royal Society A, 478 [pdf] [software]
Cramer, E., ..., Bien, J., ..., and Reich, N. (2022), Evaluation of Individual and Ensemble Probabilistic Forecasts of COVID-19 Mortality in the US, Proceedings of the National Academy of Sciences, 119 [pdf]
Gao, L. L., Witten, D., and Bien, J. (2022b), Testing for Association in Multiview Network Data, Biometrics, 78, 1018–1030 [pdf] [software] [code to reproduce all results]
Yu, G., Witten, D., and Bien, J. (2022), Controlling Costs: Feature Selection on a Budget, Stat, 11, e427 [pdf]
McDonald, D., Bien, J., Green, A., Hu, A., DeFries, N., Hyun, S., Oliveira, N., Sharpnack, J., Tang, J., Tibshirani, R., Ventura, V., Wasserman, L., and Tibshirani, R. (2021), Can Auxiliary Indicators Improve COVID-19 Forecasting and Hotspot Prediction?, Proceedings of the National Academy of Sciences, 118 [pdf] [supplement] [code to reproduce all results]
Reinhart, A., ..., Bien, J., ..., Rosenfeld, R., and Tibshirani, R. (2021), An Open Repository of Real-Time COVID-19 Indicators, Proceedings of the National Academy of Sciences, 118 [pdf] [supplement] [code to reproduce all results]
Bien, J., Yan, X., Simpson, L., and Müller, C. (2021), Tree-Aggregated Predictive Modeling of Microbiome Data, Scientific Reports, 11 [pdf] [software] [code to reproduce all results]
Yan, X., and Bien, J. (2021), Rare Feature Selection in High Dimensions, Journal of the American Statistical Association, 116, 887–900 [pdf] [software] [vignette]
Nicholson, W. B., Wilms, I., Bien, J., and Matteson, D. S. (2020), High Dimensional Forecasting via Interpretable Vector Autoregression, The Journal of Machine Learning Research, 21, 6690–6741 [pdf] [software]
Chen, S., and Bien, J. (2020), Valid Inference Corrected for Outlier Removal, Journal of Computational and Graphical Statistics, 29, 323–334 [pdf] [software]
Gao, L. L., Bien, J., and Witten, D. (2020), Are Clusterings of Multiple Data Views Independent?, Biostatistics, 21, 692–708 [pdf] [software] [code to reproduce all results]
Yu, G., Bien, J., and Witten, D. (2019a), Discussion of “Covariate-Assisted Ranking and Screening for Large-Scale Two-Sample Inference”, Journal of the Royal Statistical Society, Series B [pdf] [supplement]
Yu, G., and Bien, J. (2019), Estimating the Error Variance in a High-Dimensional Linear Model, Biometrika, 106, 533–546 [pdf] [journal] [software] [vignette]
Bien, J. (2019), Graph-Guided Banding of the Covariance Matrix, Journal of the American Statistical Association, 114, 782–792 [pdf] [software]
Bien, J., Gaynanova, I., Lederer, J., and Müller, C. L. (2019), Prediction Error Bounds for Linear Regression with the TREX, Test, 28, 451–474 [pdf]
Bien, J., Gaynanova, I., Lederer, J., and Müller, C. L. (2018), Non-Convex Global Minimization and False Discovery Rate Control for the TREX, Journal of Computational and Graphical Statistics, 27, 23–33 [pdf] [software]
Yan, X., and Bien, J. (2017), Hierarchical Sparse Modeling: A Choice of Two Group Lasso Formulations, Statistical Science, 32, 531–560 [pdf] [software]
Yu, G., and Bien, J. (2017), Learning Local Dependence in Ordered Data, Journal of Machine Learning Research, 18, 1–60 [pdf] [software] [vignette]
Wilms, I., Basu, S., Bien, J., and Matteson, D. S. (2017), Interpretable Vector AutoRegressions with Exogenous Time Series, arXiv preprint arXiv:1711.03623 [pdf]
Nicholson, W. B., Matteson, D. S., and Bien, J. (2017), VARX-L: Structured Regularization for Large Vector Autoregressions with Exogenous Variables, International Journal of Forecasting, 33, 627–651 [pdf] [software]
Lou, Y., Bien, J., Caruana, R., and Gehrke, J. (2016), Sparse Partially Linear Additive Models, Journal of Computational and Graphical Statistics, 25, 1126–1140 [pdf] [software]
Bien, J., Bunea, F., and Xiao, L. (2016), Convex Banding of the Covariance Matrix, Journal of the American Statistical Association, 111, 834–845 [pdf] [software] [vignette]
Bien, J., and Witten, D. (2016), Penalized Estimation in Complex Models, in Handbook of Big Data, eds. P. Bühlmann, P. Drineas, M. Kane, and M. van der Laan [link]
Bien, J., Simon, N., and Tibshirani, R. (2015), Convex Hierarchical Testing of Interactions, Annals of Applied Statistics, 9, 27–42 [pdf] [supplement] [software]
Bien, J., Taylor, J., and Tibshirani, R. (2013), A Lasso for Hierarchical Interactions, Annals of Statistics, 41, 1111–1141 [pdf] [software]
Bien, J., and Wegkamp, M. (2013), Discussion of “Correlated Variables in Regression: Clustering and Sparse Estimation”, Journal of Statistical Planning and Inference, 143, 1859–1862 [pdf]
Bien, J., and Tibshirani, R. (2011a), Hierarchical Clustering with Prototypes via Minimax Linkage, Journal of the American Statistical Association, 106, 1075–1084 [pdf] [software]
Bien, J., and Tibshirani, R. (2011c), Sparse Estimation of a Covariance Matrix, Biometrika, 98, 807–820 [pdf] [software]
Tibshirani, R., Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J., and Tibshirani, R. J. (2012), Strong Rules for Discarding Predictors in Lasso-Type Problems, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74, 245–266 [pdf]
Bien, J., and Tibshirani, R. (2011b), Prototype Selection for Interpretable Classification, Annals of Applied Statistics, 5, 2403–2424 [pdf] [software]
Moraveji, N., Russell, D., Bien, J., and Mease, D. (2011), Measuring Improvement in User Search Performance Resulting From Optimal Search Tips, in Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information [abstract]
Bien, J., Xu, Y., and Mahoney, M. (2010), CUR from a Sparse Optimization Viewpoint, in Advances in Neural Information Processing Systems 23 [pdf]