Minghua Ma is a Senior Researcher at Microsoft M365 Research. His work focuses on AIOps: building AI systems that autonomously detect, diagnose, and resolve failures in large-scale cloud infrastructure, with multiple systems deployed in production at Microsoft. He received his Ph.D. from Tsinghua University in 2021, advised by Prof. Dan Pei in the Netman Group. He has published 60+ papers at venues including ICSE, FSE, EuroSys, KDD, VLDB, and MLSys. He is a Senior Member of CCF.
Internship Opportunities: I am looking for self-motivated undergraduate, master's, and Ph.D. students to join as interns. If you are interested in working with me, please email me your CV.
Research Highlights
SKILLGen
Transforms playbooks, historical incidents, and domain knowledge into actionable skills for AI agents. The work started with automated troubleshooting guide (TSG) generation (FSE'26) and is extending to broad skill synthesis. Deployed at Microsoft.
News
Experience
Microsoft
2021 – Present
Tsinghua University
2016 – 2021
Georgia Tech
2019 – 2020
Awards
- 🏆 IEEE ISSRE 2025 Best Research Paper Candidate
  "Too Many Cooks: Assessing the Need for Multi-Source Data in Microservice Failure Diagnosis"
- 🏆 IEEE ISSRE 2024 Best Industry Paper Candidate
  "Early Bird: Ensuring Reliability of Cloud Systems Through Early Failure Prediction"
- 🏆 IEEE ISSRE 2018 Best Research Paper
  "Robust and Rapid Adaption for Concept Drift in Software System Anomaly Detection"
- 🎓 Outstanding Graduate
  Department of Computer Science and Technology, Tsinghua University, 2021
Teaching
- Mentor
  - 2026 Spring: Boston University – EC-528 Cloud Computing
- TA
  - 2017 Fall: Tsinghua University – Software Engineering
  - 2017 Spring: Tsinghua University – Advanced Network Management
Services
- Organizer
  - SANER 2027: Tool Demo Track Chair
  - EASE 2026: Industry Track Chair
  - The 4th CCF AIOps Challenge: Technical Chair
- PC Member
  - 2026: FSE Industry, TheWebConf, COLM, AIOps Workshop
  - 2025: FSE, FSE Industry, ASE, ISSRE, KDD, COLM
  - 2024: FSE Industry, ISSRE, APSEC, KDD, TheWebConf, MILETS
  - 2023: ASE, KDD, MILETS
- Journal Reviewer
  - ACM Transactions on Intelligent Systems and Technology (TIST)
  - ACM Transactions on Software Engineering and Methodology (TOSEM)
  - IEEE Transactions on Knowledge and Data Engineering (TKDE)
  - IEEE Transactions on Services Computing (TSC)
  - IEEE Transactions on Cloud Computing (TCC)
  - Neurocomputing
- Talks
  - "LLM-based Root Cause Analysis for Cloud Incidents" (Keynote), CCF AIOps Challenge, Beijing, 2023.
  - "Improving Cloud Reliability at Scale using Generative AI" (Invited), University of Michigan, Online, 2025.
Publications
* corresponding author, + equal contribution, 📝 paper, code, 📦 dataset, and 📎 BibTeX.