About
I am always looking for self-motivated undergraduate, master's, and Ph.D. interns with strong programming skills. If you are interested in working with me, please email me your CV.
Experience
- Microsoft
  - M365 Research, Senior Researcher, Redmond, Feb 2024 - Present
  - DKI, Senior Researcher, Beijing, Jul 2023 - Jan 2024
  - MSRA, Researcher, Beijing, Jul 2021 - Jun 2023
- Georgia Tech, Visiting Scholar, Atlanta, Sep 2019 - Aug 2020
- Alibaba, Research Intern, Beijing, Dec 2018 - Aug 2019
- Sogou, Research Intern, Beijing, Dec 2016 - Apr 2018
Publications
* corresponding author, + equal contribution, 📜 paper, 💻 code, 💽 dataset, and 📎 BibTeX.
Survey/Benchmark
- FSE'25 OpsEval: A Comprehensive Benchmark Suite for Evaluating Large Language Models’ Capability in IT Operations Domain
📜
Yuhe Liu, Changhua Pei, Longlong Xu, Bohan Chen, Mingze Sun, Zhirui Zhang, Yongqian Sun, Shenglin Zhang, Kun Wang, Haiming Zhang, Jianhui Li, Gaogang Xie, Xidao Wen, Xiaohui Nie, Minghua Ma, Dan Pei
- MLSys'25 AIOpsLab: A Holistic Framework for Evaluating AI Agents for Enabling Autonomous Clouds
📜
📎
Yinfang Chen, Manish Shetty, Gagan Somashekar, Minghua Ma*, Yogesh Simmhan, Jonathan Mace, Chetan Bansal, Rujia Wang, Saravan Rajmohan
- TMLR'25 Large Language Model-Brained GUI Agents: A Survey
📜
📎
Chaoyun Zhang, Shilin He, Jiaxu Qian, Bowen Li, Liqun Li, Si Qin, Yu Kang, Minghua Ma, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang
- TOSEM'25 Failure Diagnosis in Microservice Systems: A Comprehensive Survey and Analysis
📜
📎
Shenglin Zhang, Sibo Xia, Wenzhao Fan, Binpeng Shi, Xiao Xiong, Zhenyu Zhong, Minghua Ma, Yongqian Sun, Dan Pei
- A Survey of Time Series Anomaly Detection Methods in the AIOps Domain
📜
📎
Zhenyu Zhong, Qiliang Fan, Jiacheng Zhang, Minghua Ma, Shenglin Zhang, Yongqian Sun, Qingwei Lin, Yuzhi Zhang, Dan Pei
- Constructing Large-Scale Real-World Benchmark Datasets for AIOps
📜
💽
📎
Zeyan Li, Nengwen Zhao, Shenglin Zhang, Yongqian Sun, Pengfei Chen, Xidao Wen, Minghua Ma, Dan Pei
Conference
- ASE'25 Triangle: Empowering Incident Triage with Multi-Agent
📜
Zhaoyang Yu, Aoyang Fang, Minghua Ma*, Jaskaran Singh Walia, Chaoyun Zhang, Shu Chi, Ze Li, Murali Chintalapati, Xuchao Zhang, Rujia Wang, Chetan Bansal, Saravan Rajmohan, Qingwei Lin, Shenglin Zhang, Dan Pei, Pinjia He
- ISSRE'25 Too Many Cooks: Assessing the Need for Multi-Source Data in Microservice Failure Diagnosis
📜
Shenglin Zhang, Xiaoyu Feng, Runzhou Wang, Minghua Ma, Wenwei Gu, Yongqian Sun, Zedong Jia, Jinrui Sun and Dan Pei
- ISSRE'25 An Empirical Study of Production Incidents in Generative AI Cloud Services
📜
Haoran Yan, Yinfang Chen, Minghua Ma*, Ming Wen, Shan Lu, Shenglin Zhang, Tianyin Xu, Rujia Wang, Chetan Bansal, Saravan Rajmohan, Chaoyun Zhang, Dongmei Zhang
- KDD'25 Large Language Models can Deliver Accurate and Interpretable Time Series Anomaly Detection
📜
Jun Liu, Chaoyun Zhang, Jiaxu Qian, Minghua Ma, Si Qin, Chetan Bansal, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
- NAACL'25 UFO: A UI-Focused Agent for Windows OS Interaction
📜
📎
Chaoyun Zhang, Liqun Li, Shilin He, Xu Zhang, Bo Qiao, Si Qin, Minghua Ma, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang
- SANER'25 AIOpsArena: Scenario-Oriented Evaluation and Leaderboard for AIOps Algorithms in Microservices
Yongqian Sun, Jiaju Wang, Zhengdan Li, Xiaohui Nie, Minghua Ma, Shenglin Zhang, Yuhe Ji, Lu Zhang, Wen Long, Yongnan Luo, Hengmao Chen, Dan Pei
- ICDE'25 AllHands: Ask Me Anything on Large-scale Verbatim Feedback via Large Language Models
📜
Chaoyun Zhang, Zicheng Ma, Yuhao Wu, Shilin He, Si Qin, Minghua Ma, Xiaoting Qin, Yu Kang, Yuyi Liang, Xiaoyu Gou, Yajie Xue, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang
- SoCC'24 Building AI Agents for Autonomous Clouds: Challenges and Design Principles
📜
📎
Manish Shetty, Yinfang Chen, Gagan Somashekar, Minghua Ma, Yogesh Simmhan, Xuchao Zhang, Jonathan Mace, Dax Vandevoorde, Pedro Las-Casas, Shachee Mishra Gupta, Suman Nath, Chetan Bansal, Saravan Rajmohan
- SoCC'24 Deoxys: A Causal Inference Engine for Unhealthy Node Mitigation in Large-scale Cloud Infrastructure
📜
📎
Chaoyun Zhang, Randolph Yao, Si Qin, Ze Li, Shekhar Agrawal, Binit Mishra, Tri Tran, Minghua Ma, Qingwei Lin, Murali Chintalapati, Dongmei Zhang
- ASE'24 End-to-End AutoML for Unsupervised Log Anomaly Detection
📎
Shenglin Zhang, Yuhe Ji, Jiaqi Luan, Xiaohui Nie, Zi’ang Chen, Minghua Ma, Yongqian Sun, Dan Pei
- ASE'24 ART: A Unified Unsupervised Framework for Incident Management in Microservice Systems
📎
Yongqian Sun, Binpeng Shi, Mingyu Mao, Minghua Ma, Sibo Xia, Shenglin Zhang, Dan Pei
- ISSRE'24 Enhanced Fine-Tuning of Lightweight Domain-Specific Q&A Model Based on Large Language Models
📎
Shenglin Zhang, Pengtian Zhu, Minghua Ma, Jiagang Wang, Yongqian Sun, Dongwen Li, Jingyu Wang, Qianying Guo, Xiaolei Hua, Lin Zhu, Dan Pei
- ISSRE'24 Early Bird: Ensuring Reliability of Cloud Systems Through Early Failure Prediction
📎
Yudong Liu, Minghua Ma*, Pu Zhao, Tianci Li, Bo Qiao, Shuo Li, Ze Li, Murali Chintalapati, Yingnong Dang, Chetan Bansal, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang
🏆 Best Industry Paper Candidate, the 35th IEEE ISSRE
- ISSRE'24 Large Language Models Can Provide Accurate and Interpretable Incident Triage
📎
Zexin Wang, Jianhui Li, Minghua Ma*, Ze Li, Yu Kang, Chaoyun Zhang, Chetan Bansal, Murali Chintalapati, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang, Changhua Pei, Gaogang Xie
- ISSRE'24 Can We Trust Auto-Mitigation? Improving Cloud Failure Prediction with Uncertain Positive Learning
📎
Haozhe Li, Minghua Ma*, Yudong Liu, Pu Zhao, Shuo Li, Lingling Zheng, Ze Li, Murali Chintalapati, Yingnong Dang, Chetan Bansal, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang
- KDD'24 Pre-trained KPI Anomaly Detection Model Based on Disentangled Transformer
📎
Zhaoyang Yu, Changhua Pei, Xin Wang, Minghua Ma, Chetan Bansal, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang, Xidao Wen, Jianhui Li, Gaogang Xie, Dan Pei
- KDD'24 Microservice Root Cause Analysis With Limited Observability Through Intervention Recognition in the Latent Space
📎
Zhe Xie, Shenglin Zhang, Yitong Geng, Yao Zhang, Minghua Ma, Xiaohui Nie, Zhenhe Yao, Longlong Xu, Yongqian Sun, Wentao Li, Dan Pei
- ACL'24 Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation
📜
📎
Ruomeng Ding, Chaoyun Zhang, Lu Wang, Yong Xu, Minghua Ma, Wei Zhang, Si Qin, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang
- FSE'24 MonitorAssistant: Simplifying Cloud Service Monitoring via Large Language Models
📜
📎
Zhaoyang Yu, Minghua Ma*, Chaoyun Zhang, Si Qin, Yu Kang, Chetan Bansal, Saravan Rajmohan, Yingnong Dang, Changhua Pei, Dan Pei, Qingwei Lin, Dongmei Zhang
- FSE'24 Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4
📜
📎
Xuchao Zhang, Supriyo Ghosh, Chetan Bansal, Rujia Wang, Minghua Ma, Yu Kang, Saravan Rajmohan
- TheWebConf'24 Revisiting VAE for Unsupervised Time Series Anomaly Detection: A Frequency Perspective
📜
💻
📎
Zexin Wang, Changhua Pei, Minghua Ma, Xin Wang, Zhihan Li, Dan Pei, Saravan Rajmohan, Dongmei Zhang, Qingwei Lin, Haiming Zhang, Jianhui Li, Gaogang Xie
- VLDB'24 ImDiffusion: Imputed Diffusion Models for Multivariate Time Series Anomaly Detection
📜
💻
📎
Yuhang Chen, Chaoyun Zhang, Minghua Ma, Yudong Liu, Ruomeng Ding, Bowen Li, Shilin He, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang
- ICSE'24 Xpert: Empowering Incident Management with Query Recommendations via Large Language Models
📜
📎
Yuxuan Jiang, Chaoyun Zhang, Shilin He, Zhihao Yang, Minghua Ma, Si Qin, Yu Kang, Yingnong Dang, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang
- EuroSys'24 Automatic Root Cause Analysis via Large Language Models for Cloud Incidents
📜
📎
Yinfang Chen, Huaibing Xie, Minghua Ma*, Yu Kang, Xin Gao, Liu Shi, Yunjie Cao, Xuedong Gao, Hao Fan, Ming Wen, Jun Zeng, Supriyo Ghosh, Xuchao Zhang, Chaoyun Zhang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
- ISSRE'23 CODEC: Cost-Effective Duration Prediction System for Deadline Scheduling in the Cloud
📜
📎
Haozhe Li, Minghua Ma, Yudong Liu, Si Qin, Bo Qiao, Randolph Yao, Harshwardhan Chaturvedi, Tri Tran, Murali Chintalapati, Saravan Rajmohan, Qingwei Lin and Dongmei Zhang
- FSE'23 Assess and Summarize: Improve Outage Understanding with Large Language Models
📜
📎
Pengxiang Jin+, Shenglin Zhang+, Minghua Ma, Haozhe Li, Yu Kang, Liqun Li, Yudong Liu, Bo Qiao, Chaoyun Zhang, Pu Zhao, Shilin He, Federica Sarro, Yingnong Dang, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang
- FSE'23 Detection is Better Than Cure - A Cloud Incidents Perspective
📜
📎
Vaibhav Ganatra, Anjaly Parayil, Supriyo Ghosh, Yu Kang, Minghua Ma, Chetan Bansal, Suman Nath, Jonathan Mace
- FSE'23 TraceDiag: Adaptive, Interpretable and Efficient Root Cause Analysis on Large-Scale Microservice Systems
📜
📎
Ruomeng Ding, Chaoyun Zhang, Lu Wang, Yong Xu, Minghua Ma, Xiaomin Wu, Meng Zhang, Qingjun Chen, Xin Gao, Xuedong Gao, Hao Fan, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang
- KDD'23 Robust Multimodal Failure Detection for Microservice Systems
📜
📎
Chenyu Zhao+, Minghua Ma+, Zhenyu Zhong, Shenglin Zhang, Zhiyuan Tan, Xiao Xiong, LuLu Yu, Jiayi Feng, Yongqian Sun, Yuzhi Zhang, Dan Pei, Qingwei Lin, Dongmei Zhang
- DSN'23 Characterizing Large-Scale Private and Public Cloud Workloads
📜
📎
Xiaoting Qin, Minghua Ma, Yuheng Zhao, Jue Zhang, Chao Du, Yudong Liu, Anjaly Parayil, Chetan Bansal, Saravan Rajmohan, Inigo Goiri, Eli Cortez, Si Qin, Qingwei Lin, Dongmei Zhang
- TheWebConf'23 EDITS: An Easy-to-difficult Training Strategy for Cloud Failure Prediction
📜
📎
Tianci Li, Pu Zhao, Yudong Liu, Minghua Ma, Lingling Zheng, Murali Chintalapati, Bo Liu, Paul Wang, Hongyu Zhang, Yingnong Dang, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang
- ICSE-SEIP'23 Aegis: Attribution of Control Plane Change Impact across Layers and Components for Cloud Systems
📜
📎
Xiaohan Yan, Ken Hsieh, Yasitha Liyanage, Minghua Ma, Murali Chintalapati, Qingwei Lin, Yingnong Dang, Dongmei Zhang
- ICSE-SEIP'23 TraceArk: Towards Actionable Performance Anomaly Alerting for Online Service Systems
📜
📎
Zhengran Zeng, Yuqun Zhang, Yong Xu, Minghua Ma, Bo Qiao, Wentao Zou, Qingjun Chen, Meng Zhang, Xu Zhang, Hongyu Zhang, Xuedong Gao, Hao Fan, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang
- ICSE-SEIP'23 CONAN: Diagnosing Batch Failures for Cloud Systems
📜
📎
Liqun Li, Xu Zhang, Shilin He, Yu Kang, Hongyu Zhang, Minghua Ma, Yingnong Dang, Zhangwei Xu, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang
- FSE'22 An Empirical Investigation of Missing Data Handling in Cloud Node Failure Prediction
📜
📎
Minghua Ma, Yudong Liu, Yuang Tong, Haozhe Li, Pu Zhao, Yong Xu, Hongyu Zhang, Shilin He, Lu Wang, Yingnong Dang, Saravan Rajmohan, Qingwei Lin
- FSE'22 An Empirical Study of Log Analysis at Microsoft
📜
📎
Shilin He, Xu Zhang, Pinjia He, Yong Xu, Liqun Li, Yu Kang, Minghua Ma, Yining Wei, Yingnong Dang, Saravan Rajmohan, Qingwei Lin
- KDD'22 Multi-task Hierarchical Classification for Disk Failure Prediction in Online Service Systems
📜
📎
Yudong Liu, Hailan Yang, Pu Zhao, Minghua Ma, Chengwu Wen, Hongyu Zhang, Chuan Luo, Qingwei Lin, Chang Yi, Jiaojian Wang, Chenjian Zhang, Paul Wang, Yingnong Dang, Saravan Rajmohan, Dongmei Zhang
- TheWebConf'22 UniParser: A Unified Log Parser for Heterogeneous Log Data
📜
📎
Yudong Liu, Xu Zhang, Shilin He, Hongyu Zhang, Liqun Li, Yu Kang, Yong Xu, Minghua Ma, Qingwei Lin, Yingnong Dang, Saravan Rajmohan, Dongmei Zhang
- DEXA'22 Mining Fluctuation Propagation Graph among Time Series with Human-in-the-Loop
📜
📎
Mingjie Li, Minghua Ma, Xiaohui Nie, Kanglin Yin, Li Cao, Xidao Wen, Zhiyun Yuan, Duogang Wu, Guoying Li, Wei Liu, Xin Yang, Dan Pei
- ATC'21 Jump-Starting Multivariate Time Series Anomaly Detection for Online Service Systems
📜
💻
📎
Minghua Ma, Shenglin Zhang, Junjie Chen, Jim Xu, Haozhe Li, Yongliang Lin, Xiaohui Nie, Bo Zhou, Yong Wang, Dan Pei
- VLDB'20 Diagnosing Root Causes of Intermittent Slow Queries in Cloud Databases
📜
📎
Minghua Ma, Zheng Yin, Shenglin Zhang, Sheng Wang, Christopher Zheng, Xinhao Jiang, Hanwen Hu, Cheng Luo, Yilin Li, Nengjun Qiu, Feifei Li, Changcheng Chen, Dan Pei
- ISSRE'18 Robust and Rapid Adaption for Concept Drift in Software System Anomaly Detection
📜
📎
Minghua Ma, Shenglin Zhang, Dan Pei, Xin Huang, Hongwei Dai
🏆 Best Research Paper, the 29th IEEE ISSRE
- IWQoS'17 You Can Hide, but Your Periodic Schedule Can’t
📜
📎
Minghua Ma, Kai Zhao, Kaixin Sui, Lei Xu, Yong Li, Dan Pei
- IWQoS'16 Your Trajectory Privacy Can Be Breached Even If You Walk in Groups
📜
📎
Kaixin Sui, Youjian Zhao, Dapeng Liu, Minghua Ma, Lei Xu, Li Zimu, Dan Pei
- UbiComp'16 EDUM: Classroom Education Measurements via Large-scale WiFi Networks
📜
📎
Mengyu Zhou, Minghua Ma, Yangkun Zhang, Kaixin Sui, Dan Pei, Thomas Moscibroda
- MobiSys'16 Characterizing and Improving WiFi Latency in Large-Scale Operational Networks
📜
📎
Kaixin Sui, Mengyu Zhou, Dapeng Liu, Minghua Ma, Dan Pei
- INFOCOM'16 WiFi Can Be the Weakest Link of Round Trip Network Latency
📜
📎
Changhua Pei, Youjian Zhao, Guo Chen, Ruming Tang, Yuan Meng, Minghua Ma, Ken Ling, Dan Pei
Journal
- TSC'25 Bridging Edge and Cloud: A Knowledge-Enhanced Framework for Efficient Time Series Anomaly Detection
Shenglin Zhang, Jiacheng Zhang, Guohua Liu, Shiqi Chen, Chenyu Zhao, Minghua Ma, Yutong Chen, Yongqian Sun, Dan Pei
- TOSEM'25 Enhanced Fairness Testing via Generating Effective Initial Individual Discriminatory Instances
📜
📎
Zhao Tian, Minghua Ma, Max Hort, Federica Sarro, Hongyu Zhang
- TSC'23 Robust Failure Diagnosis of Microservice System through Multimodal Data
📜
📎
Shenglin Zhang, Pengxiang Jin, Zihan Lin, Yongqian Sun, Bicheng Zhang, Sibo Xia, Zhengdan Li, Zhenyu Zhong, Minghua Ma, Wa Jin, Dai Zhang, Zhenyu Zhu, Dan Pei
- TNSM'19 Automatic and Generic Periodicity Adaptation for KPI Anomaly Detection
📜
📎
Nengwen Zhao, Jing Zhu, Yao Wang, Minghua Ma, Wenchi Zhang, Dapeng Liu, Ming Zhang, Dan Pei
Services
- Organizer
- PC Member
  - 2025: FSE, FSE Industry, ASE, ISSRE, KDD, TheWebConf, COLM
  - 2024: FSE Industry, ISSRE, APSEC, KDD, TheWebConf, MILETS
  - 2023: ASE, KDD, MILETS
- Journal Reviewer