
Capa: Malware Capability Detector

What is the project about?

Capa is a tool that automatically identifies capabilities in executable files (PE, ELF, .NET modules, and shellcode) and in sandbox reports. It analyzes the input and reports what the program appears able to do, for example act as a backdoor, install services, or communicate over HTTP. In practice it serves as an automated first triage step in malware analysis.

What problem does it solve?

Capa helps malware analysts, incident responders, and security researchers quickly understand the likely behavior of a suspicious file without fully reverse engineering it. It bridges the gap between high-level descriptions (like "backdoor") and low-level assembly code, and it saves time by pointing analysts to the code that implements each detected capability. It also supports dynamic analysis by processing sandbox reports.

What are the features of the project?

  • Capability Detection: Identifies a wide range of capabilities, categorized using the MITRE ATT&CK framework and custom namespaces.
  • Multiple Input Formats: Supports PE, ELF, .NET, shellcode, and sandbox reports (CAPE, DRAKVUF, VMRay).
  • Rule-Based System: Uses a flexible, human-readable rule format (YAML) to define capabilities. A large, curated set of rules is maintained in a separate repository.
  • Detailed Reporting: Provides both high-level summaries and detailed reports showing the exact locations (addresses) where capabilities were detected. This helps analysts verify the findings and focus their reverse engineering efforts.
  • Extensible: Users can easily write their own rules to detect new or custom capabilities.
  • IDA Pro Plugin (capa explorer): Integrates directly with IDA Pro, allowing analysts to explore capabilities and create rules within the disassembler.
  • Ghidra Integration: Integrates with Ghidra, running analysis and displaying results within the Ghidra UI.
  • Web Interface (capa Explorer Web): Provides a browser-based interface to interactively explore capa results, both online and offline.
  • Standalone Binaries: Can be downloaded and run directly, without a complex installation.
  • Library Usage: Can be used as a Python library for integration with other tools.
  • Dynamic Analysis Support: Integrates with sandboxes (CAPE, DRAKVUF, VMRay) to analyze dynamic behavior captured during execution.
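
To make the rule-based idea concrete, here is a minimal sketch of how a rule can be evaluated against features extracted from a program, returning the addresses that support the match. It is an illustration only, not capa's engine or rule syntax: the `Feature` class, the `evaluate` function, the dictionary layout of `RULE`, and the sample addresses are all hypothetical. Real capa rules are YAML files with their own schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical record of one feature seen in a program."""
    kind: str     # e.g. "api", "string", "number"
    value: str
    address: int  # where the feature was observed


def evaluate(node, features):
    """Return the set of supporting addresses, or None if the node does not match."""
    if "and" in node:
        hits = set()
        for child in node["and"]:
            result = evaluate(child, features)
            if result is None:
                return None  # one failed child fails the whole "and"
            hits |= result
        return hits
    if "or" in node:
        hits, matched = set(), False
        for child in node["or"]:
            result = evaluate(child, features)
            if result is not None:
                matched = True
                hits |= result
        return hits if matched else None
    # Leaf node: a single {kind: value} pair such as {"api": "WinHttpOpen"}.
    (kind, value), = node.items()
    hits = {f.address for f in features if f.kind == kind and f.value == value}
    return hits or None


# Toy rule in an assumed dictionary layout (not capa's YAML schema).
RULE = {
    "name": "communicate via HTTP (toy)",
    "namespace": "communication/http",
    "logic": {"and": [
        {"or": [{"api": "InternetOpenA"}, {"api": "WinHttpOpen"}]},
        {"api": "HttpSendRequestA"},
    ]},
}

# Made-up features, standing in for what a disassembler backend would extract.
FEATURES = [
    Feature("api", "InternetOpenA", 0x401000),
    Feature("api", "HttpSendRequestA", 0x401050),
    Feature("string", "Mozilla/5.0", 0x402000),
]

addresses = evaluate(RULE["logic"], FEATURES)
if addresses is not None:
    print(RULE["name"], "matched at", [hex(a) for a in sorted(addresses)])
```

Returning addresses rather than a plain boolean mirrors the detailed-reporting feature above: an analyst can jump straight to the locations that triggered the match.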

What are the technologies used in the project?

  • Python: The core of capa is written in Python.
  • vivisect: A Python-based static analysis framework used for disassembly and code analysis.
  • pefile: A Python library for parsing PE files.
  • .NET metadata parser: For analyzing .NET assemblies.
  • ELF parsing libraries: For analyzing ELF files.
  • YAML: Used for the rule format.
  • IDA Pro API: Used for the IDA Pro plugin.
  • Ghidra API: Used for the Ghidra integration.
  • JavaScript/HTML/CSS: For the web interface.

What are the benefits of the project?

  • Faster Malware Analysis: Quickly identifies potential capabilities, speeding up the triage process.
  • Improved Efficiency: Focuses reverse engineering efforts on the most relevant parts of the code.
  • Automated Analysis: Automates the identification of common malware techniques.
  • Extensible and Customizable: Allows users to adapt the tool to their specific needs.
  • Open Source and Community-Driven: Benefits from community contributions and improvements.
  • Integration with Popular Tools: Runs inside IDA Pro and Ghidra, so results appear alongside the disassembly.
  • Dynamic and Static Analysis: Supports both static analysis of the file itself and dynamic analysis of sandbox reports.

What are the use cases of the project?

  • Malware Analysis: Initial triage and capability assessment of unknown binaries.
  • Incident Response: Quickly assessing what a file recovered from a compromised system is capable of.
  • Threat Hunting: Identifying potentially malicious files based on their capabilities.
  • Security Research: Studying malware techniques and developing new detection methods.
  • Red Teaming: Understanding the capabilities of tools and techniques used by adversaries.
  • Automated Threat Intelligence: Integrating capa into automated analysis pipelines.
  • Sandbox Report Analysis: Extracting capabilities from dynamic analysis reports.
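
For the automated-pipeline use case, a typical pattern is to have capa emit JSON (for example with `capa -j <sample>`) and post-process it. The sketch below groups matched rule names by namespace. The field layout it reads (`rules`, then `meta`, then `namespace`) is an assumption about the result document and should be checked against real output from the capa version in use. `summarize_by_namespace` and the embedded sample document are hypothetical.

```python
import json
from collections import defaultdict


def summarize_by_namespace(result_doc):
    """Group matched rule names by namespace (assumed field layout)."""
    summary = defaultdict(list)
    for rule_name, rule in result_doc.get("rules", {}).items():
        namespace = rule.get("meta", {}).get("namespace") or "(no namespace)"
        summary[namespace].append(rule_name)
    # Sort namespaces and rule names so the output is stable across runs.
    return {ns: sorted(names) for ns, names in sorted(summary.items())}


# Hand-written stand-in for a capa JSON result; not real tool output.
SAMPLE_JSON = """
{
  "rules": {
    "create service": {"meta": {"namespace": "host-interaction/service/create"}},
    "send HTTP request": {"meta": {"namespace": "communication/http/client"}},
    "receive HTTP response": {"meta": {"namespace": "communication/http/client"}}
  }
}
"""

summary = summarize_by_namespace(json.loads(SAMPLE_JSON))
for namespace, names in summary.items():
    print(f"{namespace}: {', '.join(names)}")
```

In a real pipeline the JSON string would come from the tool's output rather than a literal, and the summary could feed a ticketing system or a threat intelligence store.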