About

    I am enthusiastic about contributing to the AIOps ecosystem to facilitate collaboration between industry and academia. Together with Prof. Dan Pei, I launched the 4th CCF AIOps Challenge and Workshop, which ran from November 2020 to May 2021. Our team has released AIOps tools and benchmarks for multiple AIOps scenarios.
    🆕 Our team has released AIOpsLab, an open-source framework designed to evaluate, improve, and standardize AI agents for automating cloud operations. It provides a reproducible environment for realistic service operations, allowing researchers and engineers to enhance agent performance and capabilities with improved observability.
    I am always looking for self-motivated undergraduate, master's, and Ph.D. intern students with strong programming skills. Please email me your CV if you are interested in working with me.

Experience

  • Microsoft
    • M365 Research    Senior Researcher   Redmond, Feb, 2024 - Now
    • DKI              Senior Researcher   Beijing, Jul, 2023 - Jan, 2024
    • MSRA             Researcher          Beijing, Jul, 2021 - Jun, 2023
  • Georgia Tech       Visiting Scholar    Atlanta, Sep, 2019 - Aug, 2020
  • Alibaba            Research Intern     Beijing, Dec, 2018 - Aug, 2019
  • Sogou              Research Intern     Beijing, Dec, 2016 - Apr, 2018

Publications

* corresponding author, + equal contribution, 📜 paper, 💻 code, 💽 dataset, and 📎 BibTeX.

Benchmark, Survey

  • FSE'25 OpsEval: A Comprehensive Benchmark Suite for Evaluating Large Language Models’ Capability in IT Operations Domain
    Yuhe Liu, Changhua Pei, Longlong Xu, Bohan Chen, Mingze Sun, Zhirui Zhang, Yongqian Sun, Shenglin Zhang, Kun Wang, Haiming Zhang, Jianhui Li, Gaogang Xie, Xidao Wen, Xiaohui Nie, Minghua Ma, Dan Pei
  • MLSys'25 AIOpsLab: A Holistic Framework for Evaluating AI Agents for Enabling Autonomous Cloud 📜 📎
    Yinfang Chen, Manish Shetty, Gagan Somashekar, Minghua Ma*, Yogesh Simmhan, Jonathan Mace, Chetan Bansal, Rujia Wang, Saravan Rajmohan
  • Constructing Large-Scale Real-World Benchmark Datasets for AIOps 📜 💽 📎
    Zeyan Li, Nengwen Zhao, Shenglin Zhang, Yongqian Sun, Pengfei Chen, Xidao Wen, Minghua Ma, Dan Pei.
  • Large Language Model-Brained GUI Agents: A Survey 📜 📎
    Chaoyun Zhang, Shilin He, Jiaxu Qian, Bowen Li, Liqun Li, Si Qin, Yu Kang, Minghua Ma, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang
  • TOSEM'25 Failure Diagnosis in Microservice Systems: A Comprehensive Survey and Analysis 📜 📎
    Shenglin Zhang, Sibo Xia, Wenzhao Fan, Binpeng Shi, Xiao Xiong, Zhenyu Zhong, Minghua Ma, Yongqian Sun, Dan Pei
  • A Survey of Time Series Anomaly Detection Methods in the AIOps Domain 📜 📎
    Zhenyu Zhong and Qiliang Fan, Jiacheng Zhang, Minghua Ma, Shenglin Zhang, Yongqian Sun, Qingwei Lin, Yuzhi Zhang, Dan Pei.

Conference

  • NAACL'25 UFO: A UI-Focused Agent for Windows OS Interaction 📜 📎
    Chaoyun Zhang, Liqun Li, Shilin He, Xu Zhang, Bo Qiao, Si Qin, Minghua Ma, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang.
  • SANER'25 AIOpsArena: Scenario-Oriented Evaluation and Leaderboard for AIOps Algorithms in Microservices
    Yongqian Sun, Jiaju Wang, Zhengdan Li, Xiaohui Nie, Minghua Ma, Shenglin Zhang, Yuhe Ji, Lu Zhang, Wen Long, Yongnan Luo, Hengmao Chen, Dan Pei
  • ICDE'25 AllHands: Ask Me Anything on Large-scale Verbatim Feedback via Large Language Models 📜
    Chaoyun Zhang, Zicheng Ma, Yuhao Wu, Shilin He, Si Qin, Minghua Ma, Xiaoting Qin, Yu Kang, Yuyi Liang, Xiaoyu Gou, Yajie Xue, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang
  • SoCC'24 Building AI Agents for Autonomous Clouds: Challenges and Design Principles 📜 📎
    Manish Shetty, Yinfang Chen, Gagan Somashekar, Minghua Ma, Yogesh Simmhan, Xuchao Zhang, Jonathan Mace, Dax Vandevoorde, Pedro Las-Casas, Shachee Mishra Gupta, Suman Nath, Chetan Bansal, Saravan Rajmohan
  • SoCC'24 Deoxys: A Causal Inference Engine for Unhealthy Node Mitigation in Large-scale Cloud Infrastructure 📜 📎
    Chaoyun Zhang, Randolph Yao, Si Qin, Ze Li, Shekhar Agrawal, Binit Mishra, Tri Tran, Minghua Ma, Qingwei Lin, Murali Chintalapati, Dongmei Zhang
  • ASE'24 End-to-End AutoML for Unsupervised Log Anomaly Detection 📎
    Shenglin Zhang, Yuhe Ji, Jiaqi Luan, Xiaohui Nie, Zi’ang Chen, Minghua Ma, Yongqian Sun, Dan Pei
  • ASE'24 ART: A Unified Unsupervised Framework for Incident Management in Microservice Systems 📎
    Yongqian Sun, Binpeng Shi, Mingyu Mao, Minghua Ma, Sibo Xia, Shenglin Zhang, Dan Pei.
  • ISSRE'24 Enhanced Fine-Tuning of Lightweight Domain-Specific Q&A model Based on Large Language Models 📎
    Shenglin Zhang, Pengtian Zhu, Minghua Ma, Jiagang Wang, Yongqian Sun, Dongwen Li, Jingyu Wang, Qianying Guo, Xiaolei Hua, Lin Zhu, Dan Pei.
  • ISSRE'24 Early Bird: Ensuring Reliability of Cloud Systems Through Early Failure Prediction 📎
    Yudong Liu, Minghua Ma*, Pu Zhao, Tianci Li, Bo Qiao, Shuo Li, Ze Li, Murali Chintalapati, Yingnong Dang, Chetan Bansal, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang.
    🏆 The 35th IEEE ISSRE Best Industry Paper Candidate
  • ISSRE'24 Large Language Models Can Provide Accurate and Interpretable Incident Triage 📎
    Zexin Wang, Jianhui Li, Minghua Ma*, Ze Li, Yu Kang, Chaoyun Zhang, Chetan Bansal, Murali Chintalapati, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang, Changhua Pei, Gaogang Xie.
  • ISSRE'24 Can We Trust Auto-Mitigation? Improving Cloud Failure Prediction with Uncertain Positive Learning 📎
    Haozhe Li, Minghua Ma*, Yudong Liu, Pu Zhao, Shuo Li, Lingling Zheng, Ze Li, Murali Chintalapati, Yingnong Dang, Chetan Bansal, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang.
  • KDD'24 Pre-trained KPI Anomaly Detection Model Based on Disentangled Transformer 📎
    Zhaoyang Yu, Changhua Pei, Xin Wang, Minghua Ma, Chetan Bansal, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang, Xidao Wen, Jianhui Li, Gaogang Xie, Dan Pei.
  • KDD'24 Microservice Root Cause Analysis With Limited Observability Through Intervention Recognition in the Latent Space 📎
    Zhe Xie, Shenglin Zhang, Yitong Geng, Yao Zhang, Minghua Ma, Xiaohui Nie, Zhenhe Yao, Longlong Xu, Yongqian Sun, Wentao Li, Dan Pei.
  • ACL'24 Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation 📜 📎
    Ruomeng Ding, Chaoyun Zhang, Lu Wang, Yong Xu, Minghua Ma, Wei Zhang, Si Qin, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang.
  • FSE'24 MonitorAssistant: Simplifying Cloud Service Monitoring via Large Language Models 📜 📎
    Zhaoyang Yu, Minghua Ma*, Chaoyun Zhang, Si Qin, Yu Kang, Chetan Bansal, Saravan Rajmohan, Yingnong Dang, Changhua Pei, Dan Pei, Qingwei Lin, Dongmei Zhang.
  • FSE'24 Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4 📜 📎
    Xuchao Zhang, Supriyo Ghosh, Chetan Bansal, Rujia Wang, Minghua Ma, Yu Kang, Saravan Rajmohan.
  • TheWebConf'24 Revisiting VAE for Unsupervised Time Series Anomaly Detection: A Frequency Perspective 📜 💻 📎
    Zexin Wang, Changhua Pei, Minghua Ma, Xin Wang, Zhihan Li, Dan Pei, Saravan Rajmohan, Dongmei Zhang, Qingwei Lin, Haiming Zhang, Jianhui Li, Gaogang Xie.
  • VLDB'24 ImDiffusion: Imputed Diffusion Models for Multivariate Time Series Anomaly Detection 📜 💻 📎
    Yuhang Chen, Chaoyun Zhang, Minghua Ma, Yudong Liu, Ruomeng Ding, Bowen Li, Shilin He, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang.
  • ICSE'24 Xpert: Empowering Incident Management with Query Recommendations via Large Language Models 📜 📎
    Yuxuan Jiang, Chaoyun Zhang, Shilin He, Zhihao Yang, Minghua Ma, Si Qin, Yu Kang, Yingnong Dang, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang.
  • EuroSys'24 Automatic Root Cause Analysis via Large Language Models for Cloud Incidents 📜 📎
    Yinfang Chen, Huaibing Xie, Minghua Ma*, Yu Kang, Xin Gao, Liu Shi, Yunjie Cao, Xuedong Gao, Hao Fan, Ming Wen, Jun Zeng, Supriyo Ghosh, Xuchao Zhang, Chaoyun Zhang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang.
  • ISSRE'23 CODEC: Cost-Effective Duration Prediction System for Deadline Scheduling in the Cloud 📜 📎
    Haozhe Li, Minghua Ma, Yudong Liu, Si Qin, Bo Qiao, Randolph Yao, Harshwardhan Chaturvedi, Tri Tran, Murali Chintalapati, Saravan Rajmohan, Qingwei Lin and Dongmei Zhang.
  • FSE'23 Assess and Summarize: Improve Outage Understanding with Large Language Models 📜 📎
    Pengxiang Jin+, Shenglin Zhang+, Minghua Ma, Haozhe Li, Yu Kang, Liqun Li, Yudong Liu, Bo Qiao, Chaoyun Zhang, Pu Zhao, Shilin He, Federica Sarro, Yingnong Dang, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang.
  • FSE'23 Detection is Better Than Cure - A Cloud Incidents Perspective 📜 📎
    Vaibhav Ganatra, Anjaly Parayil, Supriyo Ghosh, Yu Kang, Minghua Ma, Chetan Bansal, Suman Nath, Jonathan Mace.
  • FSE'23 TraceDiag: Adaptive, Interpretable and Efficient Root Cause Analysis on Large-Scale Microservice Systems 📜 📎
    Ruomeng Ding, Chaoyun Zhang, Lu Wang, Yong Xu, Minghua Ma, Xiaomin Wu, Meng Zhang, Qingjun Chen, Xin Gao, Xuedong Gao, Hao Fan, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang.
  • KDD'23 Robust Multimodal Failure Detection for Microservice Systems 📜 📎
    Chenyu Zhao+, Minghua Ma+, Zhenyu Zhong, Shenglin Zhang, Zhiyuan Tan, Xiao Xiong, LuLu Yu, Jiayi Feng, Yongqian Sun, Yuzhi Zhang, Dan Pei, Qingwei Lin, Dongmei Zhang.
  • DSN'23 Characterizing Large-Scale Private and Public Cloud Workloads 📜 📎
    Xiaoting Qin, Minghua Ma, Yuheng Zhao, Jue Zhang, Chao Du, Yudong Liu, Anjaly Parayil, Chetan Bansal, Saravan Rajmohan, Inigo Goiri, Eli Cortez, Si Qin, Qingwei Lin, Dongmei Zhang.
  • TheWebConf'23 EDITS: An Easy-to-difficult Training Strategy for Cloud Failure Prediction 📜 📎
    Tianci Li, Pu Zhao, Yudong Liu, Minghua Ma, Lingling Zheng, Murali Chintalapati, Bo Liu, Paul Wang, Hongyu Zhang, Yingnong Dang, Saravan Rajmohan, Qingwei Lin and Dongmei Zhang.
  • ICSE-SEIP'23 Aegis: Attribution of Control Plane Change Impact across Layers and Components for Cloud Systems 📜 📎
    Xiaohan Yan, Ken Hsieh, Yasitha Liyanage, Minghua Ma, Murali Chintalapati, Qingwei Lin, Yingnong Dang and Dongmei Zhang.
  • ICSE-SEIP'23 TraceArk: Towards Actionable Performance Anomaly Alerting for Online Service Systems 📜 📎
    Zhengran Zeng, Yuqun Zhang, Yong Xu, Minghua Ma, Bo Qiao, Wentao Zou, Qingjun Chen, Meng Zhang, Xu Zhang, Hongyu Zhang, Xuedong Gao, Hao Fan, Saravan Rajmohan, Qingwei Lin and Dongmei Zhang.
  • ICSE-SEIP'23 CONAN: Diagnosing Batch Failures for Cloud Systems 📜 📎
    Liqun Li, Xu Zhang, Shilin He, Yu Kang, Hongyu Zhang, Minghua Ma, Yingnong Dang, Zhangwei Xu, Saravan Rajmohan, Qingwei Lin and Dongmei Zhang.
  • FSE'22 An Empirical Investigation of Missing Data Handling in Cloud Node Failure Prediction 📜 📎
    Minghua Ma, Yudong Liu, Yuang Tong, Haozhe Li, Pu Zhao, Yong Xu, Hongyu Zhang, Shilin He, Lu Wang, Yingnong Dang, Saravan Rajmohan, Qingwei Lin.
  • FSE'22 An Empirical Study of Log Analysis at Microsoft 📜 📎
    Shilin He, Xu Zhang, Pinjia He, Yong Xu, Liqun Li, Yu Kang, Minghua Ma, Yining Wei, Yingnong Dang, Saravan Rajmohan, Qingwei Lin.
  • KDD'22 Multi-task Hierarchical Classification for Disk Failure Prediction in Online Service Systems 📜 📎
    Yudong Liu, Hailan Yang, Pu Zhao, Minghua Ma, Chengwu Wen, Hongyu Zhang, Chuan Luo, Qingwei Lin, Chang Yi, Jiaojian Wang, Chenjian Zhang, Paul Wang, Yingnong Dang, Saravan Rajmohan, Dongmei Zhang.
  • TheWebConf'22 UniParser: A Unified Log Parser for Heterogeneous Log Data 📜 📎
    Yudong Liu, Xu Zhang, Shilin He, Hongyu Zhang, Liqun Li, Yu Kang, Yong Xu, Minghua Ma, Qingwei Lin, Yingnong Dang, Saravan Rajmohan, Dongmei Zhang.
  • DEXA'22 Mining Fluctuation Propagation Graph among Time Series with Human-in-the-Loop 📜 📎
    Mingjie Li, Minghua Ma, Xiaohui Nie, Kanglin Yin, Li Cao, Xidao Wen, Zhiyun Yuan, Duogang Wu, Guoying Li, Wei Liu, Xin Yang, Dan Pei.
  • ATC'21 Jump-Starting Multivariate Time Series Anomaly Detection for Online Service Systems 📜 💻 📎
    Minghua Ma, Shenglin Zhang, Junjie Chen, Jim Xu, Haozhe Li, Yongliang Lin, Xiaohui Nie, Bo Zhou, Yong Wang, Dan Pei.
  • VLDB'20 Diagnosing Root Causes of Intermittent Slow Queries in Cloud Databases 📜 📎
    Minghua Ma, Zheng Yin, Shenglin Zhang, Sheng Wang, Christopher Zheng, Xinhao Jiang, Hanwen Hu, Cheng Luo, Yilin Li, Nengjun Qiu, Feifei Li, Changcheng Chen, Dan Pei.
  • ISSRE'18 Robust and Rapid Adaption for Concept Drift in Software System Anomaly Detection 📜 📎
    Minghua Ma, Shenglin Zhang, Dan Pei, Xin Huang, Hongwei Dai.
    🏆 The 29th IEEE ISSRE Best Research Paper
  • IWQoS'17 You Can Hide, but Your Periodic Schedule Can’t 📜 📎
    Minghua Ma, Kai Zhao, Kaixin Sui, Lei Xu, Yong Li, Dan Pei.
  • IWQoS'16 Your Trajectory Privacy Can Be Breached Even If You Walk in Groups 📜 📎
    Kaixin Sui, Youjian Zhao, Dapeng Liu, Minghua Ma, Lei Xu, Li Zimu, Dan Pei.
  • UbiComp'16 EDUM: Classroom Education Measurements via Large-scale WiFi Networks 📜 📎
    Mengyu Zhou, Minghua Ma, Yangkun Zhang, Kaixin Sui, Dan Pei, Thomas Moscibroda.
  • MobiSys'16 Characterizing and Improving WiFi Latency in Large-Scale Operational Networks 📜 📎
    Kaixin Sui, Mengyu Zhou, Dapeng Liu, Minghua Ma, Dan Pei.
  • INFOCOM'16 WiFi can Be the Weakest Link of Round Trip Network Latency 📜 📎
    Changhua Pei, Youjian Zhao, Guo Chen, Ruming Tang, Yuan Meng, Minghua Ma, Ken Ling, Dan Pei.

Journal

  • TSC'23 Robust Failure Diagnosis of Microservice System through Multimodal Data 📜 📎
    Shenglin Zhang, Pengxiang Jin, Zihan Lin, Yongqian Sun, Bicheng Zhang, Sibo Xia, Zhengdan Li, Zhenyu Zhong, Minghua Ma, Wa Jin, Dai Zhang, Zhenyu Zhu, Dan Pei.
  • TNSM'19 Automatic and Generic Periodicity Adaptation for KPI Anomaly Detection 📜 📎
    Nengwen Zhao, Jing Zhu, Yao Wang, Minghua Ma, Wenchi Zhang, Dapeng Liu, Ming Zhang, Dan Pei.

Preprint

  • Large Language Models can Deliver Accurate and Interpretable Time Series Anomaly Detection 📜
    Jun Liu, Chaoyun Zhang, Jiaxu Qian, Minghua Ma, Si Qin, Chetan Bansal, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
  • TaskWeaver: A Code-First Agent Framework 📜 📎
    Bo Qiao, Liqun Li, Xu Zhang, Shilin He, Yu Kang, Chaoyun Zhang, Fangkai Yang, Hang Dong, Jue Zhang, Lu Wang, Minghua Ma, Pu Zhao, Si Qin, Xiaoting Qin, Chao Du, Yong Xu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang.
  • Enhanced Fairness Testing via Generating Effective Initial Individual Discriminatory Instances 📜 📎
    Minghua Ma, Zhao Tian, Max Hort, Federica Sarro, Hongyu Zhang, Qingwei Lin, Dongmei Zhang.
  • DockerMock: Pre-Build Detection of Dockerfile Faults through Mocking Instruction Execution 📜 📎
    Mingjie Li, Xiaoying Bai, Minghua Ma, Dan Pei.

Services

  • Organizer
    • 2021: AIOps Challenge Technical Chair
  • PC Member
    • 2025: FSE, FSE Industry, ASE, ISSRE, KDD, COLM
    • 2024: FSE Industry, ISSRE, APSEC, KDD, TheWebConf, MILETS
    • 2023: ASE, KDD, MILETS
  • Journal Reviewer
    • TOSEM
    • Neurocomputing
    • TCC