Predicting Buggy Files with commit-prophet and Git History
These articles are AI-generated summaries. Please check the original sources for full details.
commit-prophet: I Built a Tool That Predicts Buggy Files Using Git History
Lakshmi Sravya Vedantham developed commit-prophet to mine longitudinal data from git logs. The tool flags defect-related commits by scanning commit messages for keywords like ‘fix’ or ‘bug’, then combines that signal with churn and co-change data to calculate a 0–100 risk score per file.
Why This Matters
Linters and review tools typically evaluate the current state of a diff or its style and ignore the historical trajectory of a file. The article cites a rough rule of thumb that about 10% of files account for 90% of bugs. If that holds for a given codebase, a file's history of instability is a stronger predictor of future failures than its current code quality alone.
Key Insights
- Defect coupling is the strongest risk signal, weighted at 50% in the algorithm because frequent appearances in ‘fix’ commits indicate a file that attracts bugs.
- The 90/10 rule of code history suggests that a small fraction of files are responsible for the vast majority of defects, yet this data is often ignored.
- Co-change analysis can reveal hidden dependencies where two files (e.g., auth and payments) always change together despite having no direct code imports.
- High churn alone is only a yellow flag (40% weight), as frequent changes without bug-fix keywords may simply indicate evolving requirements rather than instability.
- The tool is built in Python using zero external git libraries, relying on subprocess calls to git log for high-performance data extraction.
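The subprocess-based extraction mentioned in the last insight can be sketched roughly as follows. This is a minimal illustration, not commit-prophet's actual code: the function names `read_git_log` and `parse_git_log`, the `MARK` separator, and the dict layout are all assumptions made for this example. Only the git flags (`-C`, `--since`, `--name-only`, `--pretty=format:`) are standard git options.

```python
import subprocess

# Hypothetical record separator; any string unlikely to appear in commit text works.
MARK = "@@commit@@"

def read_git_log(repo_path, since_days=90):
    """Return raw `git log` text: one header line per commit, then its file paths."""
    cmd = [
        "git", "-C", repo_path, "log",
        f"--since={since_days} days ago",
        "--name-only",
        f"--pretty=format:{MARK}%H%x09%s",  # hash, a tab, then the subject line
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def parse_git_log(raw):
    """Split raw log text into dicts with 'sha', 'message', and 'files'."""
    commits = []
    for chunk in raw.split(MARK)[1:]:
        lines = [ln for ln in chunk.splitlines() if ln.strip()]
        if not lines:
            continue
        sha, _, message = lines[0].partition("\t")
        commits.append({"sha": sha, "message": message, "files": lines[1:]})
    return commits
```

Keeping the parser separate from the subprocess call means it can be exercised on captured log text without a repository on disk.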
Working Examples
The weighted scoring algorithm used by commit-prophet to determine file risk.
churn_score = min(file_churn / max_churn, 1.0) * 40
defect_score = min(defect_commits / max_defects, 1.0) * 50
coupling_score = min(risky_cochanged_files / 10, 1.0) * 10
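The three lines above leave their inputs undefined and do not show how the components are combined. The sketch below wraps them in a function under two assumptions that the summary does not confirm: that the final score is the plain sum of the three components (the weights 40, 50, and 10 add up to the stated 0–100 range), and that a zero maximum should yield a zero component instead of a division error. The function name `risk_score` is hypothetical.

```python
def risk_score(file_churn, max_churn, defect_commits, max_defects, risky_cochanged_files):
    """Combine churn, defect, and coupling components into a 0-100 score."""
    def ratio(value, ceiling):
        # Guard against division by zero in an empty or bug-free history.
        return min(value / ceiling, 1.0) if ceiling > 0 else 0.0

    churn_score = ratio(file_churn, max_churn) * 40
    defect_score = ratio(defect_commits, max_defects) * 50
    coupling_score = ratio(risky_cochanged_files, 10) * 10
    # Assumption: the overall score is the sum of the weighted components.
    return churn_score + defect_score + coupling_score
```

For instance, a file with the highest churn in the repository, a quarter of the maximum defect commits, and five risky co-changed files would score 40 + 12.5 + 5 under these assumptions.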
Core implementation steps for parsing git history and calculating risk metrics.
from commit_prophet import get_commits, calculate_churn, calculate_defect_coupling
commits = get_commits("/path/to/repo", since_days=90)
churn = calculate_churn(commits)
defects = calculate_defect_coupling(commits)
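The summary does not show what `calculate_churn` and `calculate_defect_coupling` do internally. As a rough idea of the kind of counting involved, the sketch below tallies per-file commit counts and per-file appearances in defect-related commits. The names `count_churn`, `count_defect_commits`, and `DEFECT_PATTERN`, along with the commit dict layout, are hypothetical and are not part of the commit-prophet API. The keyword pattern is an assumption based on the ‘fix’ and ‘bug’ keywords the article mentions.

```python
import re
from collections import Counter

# Matches words starting with "fix" or "bug" (fix, fixes, bugfix), but not "prefix".
DEFECT_PATTERN = re.compile(r"\b(fix|bug)", re.IGNORECASE)

def count_churn(commits):
    """Count how many commits touched each file."""
    churn = Counter()
    for commit in commits:
        churn.update(set(commit["files"]))
    return churn

def count_defect_commits(commits):
    """Count how many defect-related commits touched each file."""
    defects = Counter()
    for commit in commits:
        if DEFECT_PATTERN.search(commit["message"]):
            defects.update(set(commit["files"]))
    return defects
```

Keyword matching of this kind depends on commit message discipline: a message like "hotfix" or "resolve crash" would be missed by this particular pattern.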
Practical Applications
- Use case: Running ‘commit-prophet scan --days 90’ to generate a Hotspot Risk Report identifying critical files like billing processors before deployment.
- Pitfall: Treating high-churn files as inherently buggy; commit-prophet distinguishes between evolution and instability by weighting defect coupling more heavily than churn.
- Use case: Using co-change analysis to discover that an auth module and payment module are secretly coupled through shared environmental state.
- Pitfall: Ignoring historical commit data in PR reviews; commit-prophet provides the longitudinal context that standard diff tools lack.
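The co-change analysis mentioned above can be approximated by counting how often pairs of files appear in the same commit. The sketch below is one simple way to do that and is not taken from commit-prophet. The function name `cochange_pairs`, the `min_count` and `max_files` parameters, and the commit dict layout are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def cochange_pairs(commits, min_count=2, max_files=50):
    """Return file pairs that changed together in at least `min_count` commits."""
    pairs = Counter()
    for commit in commits:
        files = sorted(set(commit["files"]))
        if len(files) > max_files:
            # Skip bulk commits (mass renames, reformatting) that would couple everything.
            continue
        pairs.update(combinations(files, 2))
    return {pair: n for pair, n in pairs.items() if n >= min_count}
```

A pair that surfaces here without any import relationship between the two files is a candidate for the kind of hidden coupling described in the auth and payments example, though the count alone does not explain the cause.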