Masatoshi Uehara
Biography
I am an incoming tenure-track assistant professor in the computer science department at the University of Wisconsin-Madison (joining in Fall 2025). Until then, I am a postdoctoral scholar at Genentech Research and Early Development, hosted by Tommaso Biancalani. I also spend some time at UC Berkeley, hosted by Sergey Levine.
I received a Ph.D. from the computer science department at Cornell University (2020 to 2023). Prior to this, I was in the Ph.D. program in the statistics department at Harvard University (2017 to 2020). I hold a B.E. from the University of Tokyo (2013 to 2017).
You can find my Google Scholar profile here. Additionally, my CV is here. My email address is AB136@gmail.com (A=uehara, B=masatoshi). I no longer use the email address I had at Cornell. I am from Japan, and my name in kanji (漢字) is 上原 (=Uehara = Shàng-yuán) 雅俊 (=Masatoshi=Yǎ-jùn).
My advisor during the Ph.D. program was Nathan Kallus (Cornell). My committee members were Wen Sun (Cornell), Thorsten Joachims (Cornell), Victor Chernozhukov (MIT), Xiao-Li Meng (Harvard), and Nan Jiang (UIUC).
Here are answers to questions I often encounter.
Research interests
My research focuses primarily on reinforcement learning (RL), causal machine learning, and online learning. Currently, my main research interest is developing these ML methods for drug/target discovery. Examples of such methods include:
RL/Contextual bandits/Black-box optimization + Deep generative models (Diffusion models), LLMs
My previous works are summarized as follows:
Sample-Efficient Offline Policy Evaluation: Doubly robust and semiparametrically efficient estimators (1,2), Incorporating deep neural networks with minimax loss (1,2). Slide is here.
Robust Offline RL under Insufficient Coverage: Model-based pessimistic offline RL. Talk is here.
Representation Learning in Interactive Settings: Model-based RL in low-rank MDPs. Talk is here.
Causal Inference + RL: Accounting for Unmeasured Confounders ([1], [2]), Data combination
Integration of Complex ML into Inverse Problems: Semiparametric IV methods to estimate functionals without identification (+how to perform inference), Nonparametric IV methods without identification. Slide is here.
Sample-Efficient RL in Partially Observable MDPs: OPE with general function approximation, Online RL with general function approximation, Computationally and statistically efficient PAC RL methods. Slide is here.
Publication
Red means I am the co-first/corresponding author. Blue means alphabetical order following the convention. The other papers follow the contribution-based ordering.
Conference Proceedings
Masatoshi Uehara (*), Haruka Kiyohara (*), Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, and Wen Sun. Future-Dependent Value-Based Off-Policy Evaluation in POMDPs. NeurIPS 2023 (Spotlight). (SLIDE)
Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun. Refined Value-Based Offline RL under Realizability and Partial Coverage. NeurIPS 2023.
Haruka Kiyohara, Masatoshi Uehara, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto, and Yuta Saito. Off-Policy Evaluation of Ranking Policies under Diverse User Behavior. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 1154-1163, 2023.
Runzhe Wu, Masatoshi Uehara, and Wen Sun. Distributional offline policy evaluation with predictive error guarantees. ICML, 2023. (Code)
Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun. Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings. ICML 2023. (arXiv preprint arXiv:2206.12081)
Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, and Masatoshi Uehara. Minimax Instrumental Variable Regression and L2 Convergence Guarantees without Identification or Closedness. COLT 2023.
Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, and Masatoshi Uehara. Inference on strongly identified functionals of weakly identified functions. COLT 2023.
Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee. PAC Reinforcement Learning for Predictive State Representations. ICLR 2023.
Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun. Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems. NeurIPS 2022.
Chengchun Shi, Masatoshi Uehara, Jiawei Huang, and Nan Jiang. A minimax learning approach to off-policy evaluation in partially observable Markov decision processes. ICML (Long presentation), 2022. (Slide Code)
Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Wen Sun, and Alekh Agarwal. Efficient reinforcement learning in block MDPs: A model-free representation learning approach. ICML 2022. Presented at RL THEORY VIRTUAL SEMINAR 2021 by Xuezhou. (Code)
Masatoshi Uehara, Xuezhou Zhang, and Wen Sun. Representation learning for online and offline RL in low-rank MDPs. ICLR (Spotlight), 2022. Oral Paper in Ecological Theory of Reinforcement Learning Workshop at NeurIPS. (Talk Slide)
Masatoshi Uehara and Wen Sun. Pessimistic model-based offline RL: PAC bounds and posterior sampling under partial coverage. ICLR, 2022. Presented at RL THEORY VIRTUAL SEMINAR 2021. (Talk Slide)
Jonathan D Chang (*), Masatoshi Uehara (*), Dhruv Sreenivas, Rahul Kidambi, and Wen Sun. Mitigating covariate shift in imitation learning via offline data without great coverage. NeurIPS, 2021. (Code)
Nathan Kallus, Yuta Saito, and Masatoshi Uehara. Optimal off-policy evaluation from multiple logging policies. ICML, 2021. (Code)
Yichun Hu, Nathan Kallus, and Masatoshi Uehara. Fast rates for the regret of offline reinforcement learning. COLT, 2021. Presented at RL THEORY VIRTUAL SEMINAR 2021/11/26 by Yichun. (“Minor Revision” requested from Mathematics of Operations Research)
Masatoshi Uehara (*), Masahiro Kato (*), and Shota Yasui. Off-policy evaluation and learning for external validity under a covariate shift. NeurIPS (Spotlight), 2020. (Talk Code)
Nathan Kallus and Masatoshi Uehara. Doubly robust off-policy value and gradient estimation for deterministic policies. NeurIPS, 2020. (Talk )
Masatoshi Uehara, Jiawei Huang, and Nan Jiang. Minimax weight and q-function learning for off-policy evaluation. ICML, 2020. (Code)
Nathan Kallus and Masatoshi Uehara. Statistically efficient off-policy policy gradients. ICML, 2020.
Nathan Kallus and Masatoshi Uehara. Double reinforcement learning for efficient and robust off-policy evaluation. ICML, 2020. (Code)
Masatoshi Uehara, Takeru Matsuda, and Jae Kwang Kim. Imputation estimators for unnormalized models with missing data. AISTATS, 2020.
Masatoshi Uehara, Takafumi Kanamori, Takashi Takenouchi, and Takeru Matsuda. Unified estimation framework for unnormalized models with statistical efficiency. AISTATS, 2020.
Nathan Kallus and Masatoshi Uehara. Intrinsically efficient, stable, and bounded off-policy evaluation for reinforcement learning. NeurIPS, 2019.
Journal Articles
Nathan Kallus and Masatoshi Uehara. Efficient evaluation of natural stochastic policies in offline reinforcement learning. Biometrika, 2023+
Masatoshi Uehara, Danhyang Lee, and Jae Kwang Kim. Semiparametric response model with nonignorable nonresponse. Scandinavian Journal of Statistics, 2023
Nathan Kallus and Masatoshi Uehara. Efficiently breaking the curse of horizon: Double reinforcement learning in infinite-horizon processes. Operations Research, 2021. (The version at INFORMS is here, but it contains several typos.)
Takeru Matsuda, Masatoshi Uehara, and Aapo Hyvarinen. Information criteria for non-normalized models. Journal of Machine Learning Research, 2021.
Nathan Kallus and Masatoshi Uehara. Double reinforcement learning for efficient off-policy evaluation in Markov decision processes. Journal of Machine Learning Research, 2020. (Code)
Unpublished Articles Under Revision
Nathan Kallus, Xiaojie Mao, and Masatoshi Uehara. Localized debiased machine learning: Efficient estimation of quantile treatment effects, conditional value at risk, and beyond. arXiv preprint arXiv:1912.12945, 2020. Presented at Online Causal Inference Seminar 2020/9/15. (“Minor Revision” requested from Journal of Machine Learning Research)
Masatoshi Uehara, Chengchun Shi, and Nathan Kallus. An overview of off-policy evaluation in reinforcement learning. arXiv preprint arXiv:2212.06355. (“Minor Revision” requested from Statistical Science)
Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, and Tengyang Xie. Finite sample analysis of minimax offline reinforcement learning: Completeness, fast rates and first-order efficiency. arXiv preprint arXiv:2102.02981, 2021. (Rejection with Resubmission from Annals of Statistics) (SLIDE )
Working Drafts
Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, and Wen Sun. Provable Offline Reinforcement Learning with Human Feedback. arXiv preprint arXiv:2305.14816, 2023.
Wenhao Zhan, Masatoshi Uehara, Wen Sun, and Jason D. Lee. How to Query Human Feedback Efficiently in RL? arXiv preprint arXiv:2305.18505, 2023.
Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, and Masatoshi Uehara. Source Condition Double Robust Inference on Functionals of Inverse Problems. arXiv preprint arXiv:2307.13793, 2023.
Nathan Kallus, Xiaojie Mao, and Masatoshi Uehara. Causal inference under unmeasured confounding with negative controls: A minimax learning approach. arXiv preprint arXiv:2103.14029, 2021.
Masatoshi Uehara, Issei Sato, Masahiro Suzuki, Kotaro Nakayama, and Yutaka Matsuo. Generative adversarial nets from a density ratio estimation perspective. arXiv preprint arXiv:1610.02920, 2016.
Tutorials/Talks
About Dynamic treatment regime
About the Deep CREST Meeting (深層CRESTミーティング)