Threat detection through ZAT combined with machine learning


Series of articles
Threat detection through ZAT combined with machine learning (2)

Threat detection through ZAT combined with machine learning (3)

Machine learning overview
When applying machine learning to security data, how we process that data matters a great deal. Different types of attack data call for different algorithms, but the general workflow consists of the following steps:

Data retrieval
Feature extraction
Feature preprocessing
Feature dimensionality reduction
Model training
Model evaluation
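As a rough illustration of the feature extraction and feature preprocessing steps, the Python sketch below turns domain names into numeric vectors and then standardizes each column. The three features (name length, Shannon entropy, digit count) and the function names are illustrative assumptions, not features prescribed by this article; dimensionality reduction, training and evaluation are left out.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy (bits per character) of a string."""
    if not text:
        return 0.0
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())


def extract_features(domain):
    """Feature extraction: turn one domain name into a numeric vector.

    The three features (length, entropy, digit count) are illustrative choices.
    """
    return [
        float(len(domain)),
        shannon_entropy(domain),
        float(sum(ch.isdigit() for ch in domain)),
    ]


def standardize(matrix):
    """Feature preprocessing: z-score each column (zero mean, unit variance)."""
    stats = []
    for column in zip(*matrix):
        mean = sum(column) / len(column)
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
        stats.append((mean, std or 1.0))  # avoid dividing by zero on constant columns
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in matrix]


domains = ["google.com", "x1y2z3q4.tk", "facebook.com"]
features = standardize([extract_features(d) for d in domains])
print(features)
```

Random-looking names tend to score higher on entropy and digit count, which is why features like these are often tried first on DNS data.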

Processing Zeek full-traffic log data with zat
This article introduces how to analyze Zeek’s full-traffic log data with zat.
Zeek is an open-source network intrusion detection system (NIDS); its most common use today is in the risk-control operations of Internet companies. zat (Zeek Analysis Tools) is a Python toolkit for analyzing Zeek output.
zat offers a variety of ways to process Zeek output, including:

Dynamic polling of log data

Zeek records to Pandas DataFrames and Scikit-Learn

Dynamically monitor files.log and run VirusTotal queries

Dynamically monitor http.log and display “uncommon” user agents

Run Yara signatures on extracted files

Check x509 certificates

Anomaly detection

First, process Zeek’s dhcp.log data:

This outputs one dictionary per record, including its timestamp:
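According to zat’s documentation, its reader for this step is ZeekLogReader, whose readrows() method yields one dictionary per log row and can follow a live log. The stdlib-only sketch below is not zat’s code; it approximates that behaviour assuming the log uses Zeek’s default tab-separated ASCII format. The helper name read_zeek_log and the sample record are made up for illustration, and the sketch reads a finished log rather than polling a live one.

```python
import datetime


def read_zeek_log(lines):
    """Yield one dict per record of a Zeek ASCII (tab-separated) log.

    Assumes the default "#separator \\x09" header; "-" (unset) becomes None
    and "ts" is converted to a timezone-aware UTC datetime.
    """
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):  # skip #types, #open, #close, ...
            row = {name: (None if value == "-" else value)
                   for name, value in zip(fields, line.split("\t"))}
            if row.get("ts") is not None:
                row["ts"] = datetime.datetime.fromtimestamp(
                    float(row["ts"]), tz=datetime.timezone.utc)
            yield row


# Made-up, trimmed dhcp.log excerpt (real dhcp.log files have more columns).
SAMPLE_DHCP = "\n".join([
    "#separator \\x09",
    "#path\tdhcp",
    "#fields\tts\tuids\tmac\thost_name\tassigned_addr",
    "#types\ttime\tset[string]\tstring\tstring\taddr",
    "1511913652.573422\tCRU2Gx3n5qI5ZDYl3b\t00:0b:82:01:fc:42\t-\t192.168.0.10",
])

for record in read_zeek_log(SAMPLE_DHCP.splitlines()):
    print(record)
```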

Then process Zeek’s dns.log data and use Pandas to display the dns.log file as a DataFrame.
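In zat this conversion is handled by LogToDataFrame, whose create_dataframe('dns.log') returns a Pandas DataFrame. To show what the conversion involves, the sketch below, a simplified stand-in rather than zat’s implementation, uses the #types header to cast each column to Python types and returns a column-oriented dict; if pandas is installed, pandas.DataFrame(columns) turns it into a DataFrame. The function names and the sample dns.log lines are hypothetical.

```python
def _cast(ztype, raw):
    """Convert one Zeek ASCII value according to its #types entry."""
    casts = {"time": float, "interval": float, "double": float,
             "count": int, "int": int, "port": int,
             "bool": lambda v: v == "T"}
    return casts.get(ztype, str)(raw)  # addr, string, etc. stay strings


def zeek_log_to_columns(lines):
    """Read a Zeek ASCII log into {column: [typed values]}, a DataFrame-like layout."""
    fields, types, columns = [], [], {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
            columns = {name: [] for name in fields}
        elif line.startswith("#types"):
            types = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            for name, ztype, raw in zip(fields, types, line.split("\t")):
                if raw == "-":
                    value = None  # unset field
                elif ztype.startswith(("set[", "vector[")):
                    inner = ztype[ztype.index("[") + 1:-1]  # element type
                    value = [] if raw == "(empty)" else [_cast(inner, v) for v in raw.split(",")]
                else:
                    value = _cast(ztype, raw)
                columns[name].append(value)
    return columns


# Made-up, trimmed dns.log excerpt.
SAMPLE_DNS = "\n".join([
    "#separator \\x09",
    "#path\tdns",
    "#fields\tts\tuid\tid.orig_h\tquery\tqtype_name\trcode\tanswers\tTTLs",
    "#types\ttime\tstring\taddr\tstring\tstring\tcount\tvector[string]\tvector[interval]",
    "1425579914.062245\tCJ1LjS3dO8OX8zVv4c\t192.168.1.103\twww.example.com\tA\t0\t93.184.216.34\t3600.000000",
    "1425579915.104711\tC5cJxb2ZxWq8hK2Ttc\t192.168.1.103\tnosuchhost.example\tA\t3\t-\t-",
])

columns = zeek_log_to_columns(SAMPLE_DNS.splitlines())
print(columns["query"], columns["rcode"])
```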

Next, we use sklearn to split the data set into training and test sets. Here is part of the code:
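scikit-learn’s train_test_split performs this split. For readers without scikit-learn at hand, the hypothetical split_dataset helper below sketches the same idea with the standard library: shuffle indices with a fixed seed, hold out a fraction for testing (rounded up) and keep features and labels aligned. The toy X and y are placeholders, not data from the article.

```python
import math
import random


def split_dataset(X, y, test_size=0.25, seed=42):
    """Shuffle (X, y) pairs reproducibly and hold out test_size of them.

    Returns (X_train, X_test, y_train, y_test), the same order that
    scikit-learn's train_test_split uses.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(len(X) * test_size)  # test share rounded up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


X = [[float(i), float(i % 3)] for i in range(10)]  # toy feature rows
y = [i % 2 for i in range(10)]                     # toy labels
X_train, X_test, y_train, y_test = split_dataset(X, y)
print(len(X_train), len(X_test))
```

Fixing the seed makes the split reproducible between runs, which matters when comparing models trained on the same data.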

Output:

Query VirusTotal for the files recorded in Zeek’s files.log; here is part of the code. vt_query is the helper library used for the VirusTotal queries.

The sha256 (or sha1) hash of each file is looked up against the VirusTotal service.
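The article’s vt_query code is not reproduced here. The sketch below shows the same flow under stated assumptions: hashes are taken from files.log rows (sha256 first, then sha1), each distinct hash is sent only once, and the real fetcher calls the public VirusTotal v3 endpoint /api/v3/files/{hash} with an x-apikey header. The names vt_fetch, hashes_from_files_log and CachedLookup are hypothetical.

```python
import json
import urllib.error
import urllib.request

VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{}"


def vt_fetch(file_hash, api_key):
    """Look one hash up via the VirusTotal v3 REST API (needs network + API key).

    Returns the parsed JSON report, or None when VirusTotal does not know the file.
    """
    request = urllib.request.Request(VT_FILE_URL.format(file_hash),
                                     headers={"x-apikey": api_key})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # hash unknown to VirusTotal
            return None
        raise


def hashes_from_files_log(rows):
    """Yield each file's sha256, falling back to sha1 when sha256 is unset."""
    for row in rows:
        for field in ("sha256", "sha1"):
            value = row.get(field)
            if value and value != "-":
                yield value
                break


class CachedLookup:
    """Query each distinct hash only once (public API keys are rate-limited)."""

    def __init__(self, fetch):
        self.fetch = fetch  # e.g. lambda h: vt_fetch(h, API_KEY)
        self.cache = {}

    def query(self, file_hash):
        key = file_hash.lower()
        if key not in self.cache:
            self.cache[key] = self.fetch(key)
        return self.cache[key]
```

Passing the fetch function in keeps the caching logic testable offline; in production it would wrap vt_fetch with your own API key.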

Next, process Zeek’s http.log, focusing on the values of the User-Agent (UA) header.
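One simple way to define “uncommon” is by frequency. The sketch below, a batch heuristic rather than zat’s exact logic, counts the user_agent column of http.log rows and returns the agents seen at most max_count times; the function name, the threshold and the sample rows are assumptions.

```python
from collections import Counter


def uncommon_user_agents(rows, max_count=1):
    """Return user agents seen at most max_count times in http.log rows."""
    counts = Counter(
        row.get("user_agent") for row in rows
        if row.get("user_agent") not in (None, "-"))  # skip unset values
    return sorted(ua for ua, n in counts.items() if n <= max_count)


http_rows = ([{"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}] * 5
             + [{"user_agent": "curl/7.58.0"}, {"user_agent": "-"}])
print(uncommon_user_agents(http_rows))
```

For live monitoring, the same counter can be updated row by row as the log grows and the rare agents reported periodically.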

Output:

Use Yara to monitor the extract_files directory dynamically: whenever Zeek drops a newly extracted file into it, the code runs a set of Yara rules against that file.
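The polling pattern can be sketched with the standard library alone. Here a plain byte-pattern check stands in for Yara so the sketch runs without extra packages; with yara-python installed, you would compile rules with yara.compile(filepath=...) and call rules.match(path) instead. scan_new_files, watch and the rule dict are hypothetical names, and a production watcher should also make sure Zeek has finished writing a file before scanning it.

```python
import os
import time


def scan_new_files(directory, seen, rules):
    """One polling pass over `directory`.

    Files not yet in `seen` are read and checked against `rules`, a dict of
    rule name -> byte pattern (a simplified stand-in for compiled Yara rules).
    Returns {file name: [matching rule names]}.
    """
    hits = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file() or entry.path in seen:
                continue
            seen.add(entry.path)
            with open(entry.path, "rb") as fh:
                data = fh.read()
            matched = [name for name, pattern in rules.items() if pattern in data]
            if matched:
                hits[entry.name] = matched
    return hits


def watch(directory, rules, interval=5.0):
    """Poll forever, printing matches as new files appear."""
    seen = set()
    while True:
        for name, matched in scan_new_files(directory, seen, rules).items():
            print(f"{name}: {matched}")
        time.sleep(interval)
```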

Output:

Detect the queried domain names and run a VirusTotal check on them.
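Before querying VirusTotal it helps to shrink the list of domains. The sketch below assumes you maintain a set of trusted registered domains (for example, taken from a top-sites list) and only forwards names outside it, each one once. The helper name and the two-label approximation of a registered domain are assumptions, not the article’s code.

```python
def domains_to_check(queries, known_good, already_checked):
    """Pick queried domains worth a VirusTotal lookup.

    known_good: set of registered domains you trust (e.g. from a top-sites list).
    already_checked: set updated in place so each domain is looked up once.
    """
    selected = []
    for query in queries:
        if not query or query == "-":
            continue
        domain = query.rstrip(".").lower()
        # Crude "registered domain": the last two labels (ignores suffixes like co.uk).
        base = ".".join(domain.split(".")[-2:])
        if base in known_good or domain in already_checked:
            continue
        already_checked.add(domain)
        selected.append(domain)
    return selected


checked = set()
queries = ["www.google.com", "uni10.tk", "UNI10.TK.", "-", "cdn.example.org"]
print(domains_to_check(queries, {"google.com"}, checked))
```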

When a machine on your network accesses uni10.tk, the output is as follows:

Because traffic to some phishing or malicious websites is encrypted, we can examine the x509.log data and judge such sites by their certificates.
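Which certificate properties to check is a judgment call. The sketch below flags two commonly used signals in x509.log records: subject equal to issuer (a cheap proxy for self-signed; real verification would check the signature) and a validity window that does not include the current time. The field names follow Zeek’s x509.log (certificate.subject, certificate.issuer, certificate.not_valid_before, certificate.not_valid_after); the function name and flag wording are hypothetical.

```python
import time


def certificate_flags(row, now=None):
    """Return simple warning flags for one x509.log record (a dict of strings)."""
    now = time.time() if now is None else now
    flags = []

    def field(name):
        value = row.get(name)
        return None if value in (None, "-") else value  # "-" means unset

    subject, issuer = field("certificate.subject"), field("certificate.issuer")
    if subject and subject == issuer:
        flags.append("self-signed (subject == issuer)")
    not_before = field("certificate.not_valid_before")
    not_after = field("certificate.not_valid_after")
    if not_before is not None and float(not_before) > now:
        flags.append("not yet valid")
    if not_after is not None and float(not_after) < now:
        flags.append("expired")
    return flags
```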

After running it, the output is as follows:

For anomaly detection, we can use the isolation forest algorithm to find anomalous records. Once anomalies are found, a clustering algorithm can group them into clusters, so that an analyst can review the groups instead of going through the output line by line.
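scikit-learn provides both pieces as sklearn.ensemble.IsolationForest and sklearn.cluster.KMeans. To show how the isolation forest score itself works, the sketch below is a small pure-Python version: each tree splits a random feature at a random value, anomalies get isolated in fewer splits, and the score is 2^(−E[h(x)]/c(ψ)), where c(ψ) is the average path length for a subsample of size ψ. It is a teaching sketch with hypothetical function names, not the article’s code, and it omits the clustering step.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path(n):
    """c(n): average path length of an unsuccessful search in a BST of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # chosen feature is constant here: stop (simplification)
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(point, node):
    """Depth at which `point` lands, plus c(size) for the unresolved leaf."""
    depth = 0
    while "size" not in node:
        node = node["left"] if point[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth + avg_path(node["size"])


def isolation_forest_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly score s = 2 ** (-E[h(x)] / c(psi)); values near 1 are anomalous."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return [2.0 ** (-sum(path_length(p, t) for t in trees) / n_trees / avg_path(psi))
            for p in points]


# A tight grid of "normal" points plus one far-away point.
points = [(x * 0.5, y * 0.5) for x in range(6) for y in range(5)] + [(50.0, 50.0)]
scores = isolation_forest_scores(points)
print(max(range(len(points)), key=scores.__getitem__))  # index of the most anomalous point
```

The highest-scoring records would then be the input to the clustering step described above.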

Output (grouped anomalies):

Detect Tor and count destination ports: identify Tor traffic by iterating over Zeek’s ssl.log while tallying the ports seen. Part of the code is shown here.
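The article’s ssl.log code is not reproduced here. The sketch below tallies destination ports (id.resp_p) and flags connections using an assumed heuristic: Tor relay certificates are often described as carrying random-looking names of the form CN=www.<random>.com or .net, with subject and issuer differing. It also assumes your Zeek version still writes subject and issuer columns to ssl.log. The regex, function name and sample rows are hypothetical.

```python
import re
from collections import Counter

# Assumed heuristic for Tor link certificates: random "www.<name>.com/.net" CNs.
TOR_NAME = re.compile(r"^CN=www\.[a-z0-9]+\.(com|net)$")


def tor_and_ports(rows):
    """Return (suspected Tor connections, Counter of destination ports)."""
    port_count = Counter()
    suspects = []
    for row in rows:
        port_count[row.get("id.resp_p")] += 1
        subject = row.get("subject") or ""
        issuer = row.get("issuer") or ""
        if subject != issuer and TOR_NAME.match(subject) and TOR_NAME.match(issuer):
            suspects.append((row.get("id.orig_h"), row.get("id.resp_h"), row.get("id.resp_p")))
    return suspects, port_count
```

Legitimate sites can occasionally match a pattern like this, so hits are leads for review rather than verdicts.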

The output is as follows:

The next article takes Zeek’s dns.log as an example to introduce feature engineering on dns.log data with sklearn, matrix operations with NumPy, and PCA dimensionality reduction.
