Masatoshi Uehara

Biography

I am a PhD student in the Computer Science department at Cornell. Previously, I was a PhD student in the Statistics department at Harvard. I am interested in statistical learning theory and methods for sequential decision making.

My Google Scholar profile is here. My CV is here.

My interests are reinforcement learning, causal inference, online learning, and their applications to e-commerce.

My advisor is Nathan Kallus (Cornell). My committee members are Wen Sun (Cornell), Thorsten Joachims (Cornell), Victor Chernozhukov (MIT), and Xiao-Li Meng (Harvard).

Publications

Red indicates that I am a co-first/corresponding author. Blue indicates alphabetical author ordering, following the convention in that area. The remaining papers use contribution-based author ordering.

(Journal Articles)

Nathan Kallus and Uehara, Masatoshi. Efficiently breaking the curse of horizon: Double reinforcement learning in infinite-horizon processes. Operations Research, 2021.


Takeru Matsuda, Uehara, Masatoshi, and Aapo Hyvarinen. Information criteria for non-normalized models. JMLR, 2021.


Nathan Kallus and Uehara, Masatoshi. Double reinforcement learning for efficient off-policy evaluation in Markov decision processes. JMLR, 2020.

(This is a longer version of "Double reinforcement learning for efficient and robust off-policy evaluation".)

(Conference Proceedings)


Chengchun Shi, Uehara, Masatoshi, Jiawei Huang, and Nan Jiang. A minimax learning approach to off-policy evaluation in partially observable Markov decision processes. ICML, 2022 (Long presentation).


Xuezhou Zhang, Yuda Song, Uehara, Masatoshi, Mengdi Wang, Wen Sun, and Alekh Agarwal. Efficient reinforcement learning in block MDPs: A model-free representation learning approach. ICML, 2022.


Uehara, Masatoshi, Xuezhou Zhang, and Wen Sun. Representation learning for online and offline RL in low-rank MDPs. ICLR, 2022 (Spotlight). Oral paper at the Ecological Theory of Reinforcement Learning Workshop at NeurIPS. Presented at the RL Theory Virtual Seminar, 2021. (Talk is here.)


Uehara, Masatoshi and Wen Sun. Pessimistic model-based offline RL: PAC bounds and posterior sampling under partial coverage. ICLR, 2022. Presented at the RL Theory Virtual Seminar, 2021. (Talk is here.)


Jonathan D. Chang, Uehara, Masatoshi, Dhruv Sreenivas, Rahul Kidambi, and Wen Sun. Mitigating covariate shift in imitation learning via offline data without great coverage. NeurIPS, 2021.


Nathan Kallus, Yuta Saito, and Uehara, Masatoshi. Optimal off-policy evaluation from multiple logging policies. In ICML, 2021.


Yichun Hu, Nathan Kallus, and Uehara, Masatoshi. Fast rates for the regret of offline reinforcement learning. COLT, 2021. Presented at the RL Theory Virtual Seminar, 2021/11/26.


Uehara, Masatoshi, Masahiro Kato, and Shota Yasui. Off-policy evaluation and learning for external validity under a covariate shift. In NeurIPS (Spotlight), 2020. (Talk is here)


Nathan Kallus and Uehara, Masatoshi. Doubly robust off-policy value and gradient estimation for deterministic policies. NeurIPS, 2020. (Talk is here.)


Uehara, Masatoshi, Jiawei Huang, and Nan Jiang. Minimax weight and q-function learning for off-policy evaluation. In ICML, 2020.


Nathan Kallus and Uehara, Masatoshi. Statistically efficient off-policy policy gradients. In ICML, 2020.


Nathan Kallus and Uehara, Masatoshi. Double reinforcement learning for efficient and robust off-policy evaluation. In ICML, 2020.


Uehara, Masatoshi, Takeru Matsuda, and Jae Kwang Kim. Imputation estimators for unnormalized models with missing data. In AISTATS, 2020.


Uehara, Masatoshi, Takafumi Kanamori, Takashi Takenouchi, and Takeru Matsuda. Unified estimation framework for unnormalized models with statistical efficiency. AISTATS, 2020.


Nathan Kallus and Uehara, Masatoshi. Intrinsically efficient, stable, and bounded off-policy evaluation for reinforcement learning. NeurIPS, 2019.


(Unpublished Articles)



Uehara, Masatoshi, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, and Tengyang Xie. Finite sample analysis of minimax offline reinforcement learning: Completeness, fast rates and first-order efficiency. arXiv preprint arXiv:2102.02981, 2021.


Nathan Kallus, Xiaojie Mao, and Uehara, Masatoshi. Causal inference under unmeasured confounding with negative controls: A minimax learning approach. arXiv preprint arXiv:2103.14029, 2021.


Nathan Kallus and Uehara, Masatoshi. Efficient evaluation of natural stochastic policies in offline reinforcement learning. arXiv preprint arXiv:2006.03886, 2020.


Nathan Kallus, Xiaojie Mao, and Uehara, Masatoshi. Localized debiased machine learning: Efficient estimation of quantile treatment effects, conditional value at risk, and beyond. arXiv preprint arXiv:1912.12945, 2020. Presented at the Online Causal Inference Seminar, 2020/9/15.


Uehara, Masatoshi and Jae Kwang Kim. Semiparametric response model with nonignorable nonresponse. arXiv preprint arXiv:1810.12519, 2018.


Uehara, Masatoshi, Issei Sato, Masahiro Suzuki, Kotaro Nakayama, and Yutaka Matsuo. Generative adversarial nets from a density ratio estimation perspective. arXiv preprint arXiv:1610.02920, 2016.