Advanced Indicator of Compromise (IOC) extractor.
Overview
This library extracts URLs, IP addresses, MD5/SHA hashes, email addresses, and YARA rules from text corpora. It also detects some encoded and "defanged" IOCs, and can optionally decode or refang them.
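Hashes are the easiest of these categories to picture. The sketch below is not iocextract's own code. It assumes a hash is a standalone run of hex characters and classifies it by digest length (MD5 = 32, SHA-1 = 40, SHA-256 = 64, SHA-512 = 128 hex characters). `find_hashes` and `HASH_LENGTHS` are hypothetical names used only for illustration.

```python
import re

# Hypothetical lookup table: hex-digest length -> hash algorithm.
HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256", 128: "sha512"}

# A run of hex characters with word boundaries on both sides, so that
# slices of a longer hex token are never reported on their own.
HEX_RUN = re.compile(r"\b[0-9a-fA-F]{32,128}\b")


def find_hashes(text):
    """Return (hash_type, digest) pairs for hex runs of a known digest length."""
    results = []
    for match in HEX_RUN.finditer(text):
        digest = match.group(0)
        kind = HASH_LENGTHS.get(len(digest))
        if kind is not None:  # ignore hex runs of any other length
            results.append((kind, digest.lower()))
    return results


sample = "md5 d41d8cd98f00b204e9800998ecf8427e sha1 da39a3ee5e6b4b0d3255bfef95601890afd80709"
print(find_hashes(sample))
```

Length alone cannot distinguish a hash from any other hex blob of the same size, which is one reason real extractors add context checks on top of a pattern like this.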
The Problem
It is common practice for malware analysts or endpoint software to "defang" IOCs such as URLs and IP addresses to prevent accidental exposure to live malicious content. Being able to extract and aggregate these IOCs is often valuable for analysts. Unfortunately, existing "IOC extraction" tools often miss defanged IOCs entirely, because standard regular expressions do not match them.
For example, the simple defanging technique of surrounding periods with brackets:
Code:
127[.]0[.]0[.]1
Existing tools that use a simple IP address regex will ignore this IOC entirely.
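To make the failure concrete, here is a typical dotted-quad pattern of the kind a naive extractor might use (an illustrative assumption, not taken from any particular tool). It only accepts a literal "." between octets, so the bracketed address never matches.

```python
import re

# A "simple" IPv4 pattern: four groups of 1-3 digits separated by literal dots.
SIMPLE_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

text = "Beacon observed to 127[.]0[.]0[.]1 and 10.0.0.5"

print(SIMPLE_IPV4.findall(text))
# ['10.0.0.5']  (the bracket-defanged address is skipped entirely)
```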
The Solution
By combining specially crafted regular expressions with some custom postprocessing, we are able to both detect and deobfuscate "defanged" IOCs. This saves time and effort for the analyst, who might otherwise have to manually find IOCs and convert them into a machine-readable format.
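The two-step idea (a defang-tolerant pattern, then a refanging pass) can be sketched for IPv4 addresses as follows. This is a minimal illustration under stated assumptions, not iocextract's implementation: the accepted separators (`.`, `[.]`, `(.)`, `{.}`, `[dot]`, `(dot)`, `{dot}`) are an illustrative subset, and `extract_ipv4s_sketch` is a hypothetical name. URL defanging such as `hxxp://` is not covered.

```python
import re

# Separators accepted between octets. Illustrative subset only.
SEP = r"(?:\.|\[\.\]|\(\.\)|\{\.\}|\[dot\]|\(dot\)|\{dot\})"
# One octet in the range 0-255 (longer alternatives are tried first).
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# Step 1: detection. The lookbehind rejects matches that start mid-number;
# the lookahead rejects a fifth octet but still allows a sentence-ending ".".
DEFANGED_IPV4 = re.compile(
    rf"(?<![\d.]){OCTET}(?:{SEP}{OCTET}){{3}}(?!\.?\d)", re.IGNORECASE
)

# Step 2: postprocessing. Collapse every accepted separator back to ".".
SEP_RE = re.compile(SEP, re.IGNORECASE)


def extract_ipv4s_sketch(text, refang=False):
    """Yield IPv4 addresses (plain or defanged) found in text."""
    for match in DEFANGED_IPV4.finditer(text):
        ip = match.group(0)
        yield SEP_RE.sub(".", ip) if refang else ip


report = "C2 at 127[.]0[.]0[.]1, fallback 10(.)0(.)0(.)5 and 192.168.1.1."
print(list(extract_ipv4s_sketch(report)))
print(list(extract_ipv4s_sketch(report, refang=True)))
```

Keeping detection and refanging separate means the raw (still defanged) match can be reported unchanged when an analyst wants to preserve the original text, and converted only on request.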
Installation
You may need to install the Python development headers in order to install the regex dependency. On Ubuntu/Debian-based systems, try:
Code:
sudo apt-get install python3-dev
Then install iocextract from pip:
Code:
pip install iocextract
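After installing, a quick way to confirm that both iocextract and its regex dependency are visible to the interpreter is to ask the import system for them. This only checks importability; it does not exercise iocextract itself. `missing_modules` is a hypothetical helper.

```python
import importlib.util


def missing_modules(names=("regex", "iocextract")):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("iocextract and regex are importable")
```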
If you have problems installing on Windows, try installing regex directly by downloading the appropriate wheel from PyPI and running e.g.:
Code: