"Weakly Supervised Deep Learning for the Detection of Domain Generation" by B. Yu, J. Pan et al.

School of Engineering and Technology Publications

Title

Weakly Supervised Deep Learning for the Detection of Domain Generation Algorithms

Authors

B. Yu
J. Pan
D. Gray
J. Hu
C. Choudhary
A. C. Nascimento, University of Washington Tacoma
M. De Cock, University of Washington Tacoma

Publication Date

4-15-2019

Document Type

Article

Abstract

Domain generation algorithms (DGAs) have become commonplace in malware that seeks to establish command and control communication between an infected machine and the botmaster. DGAs dynamically and consistently generate large volumes of malicious domain names, only a few of which are registered by the botmaster, within a short time window around their generation time, and subsequently resolved when the malware on the infected machine tries to access them. Deep neural networks that can classify domain names as benign or malicious are of great interest in the real-time defense against DGAs. In contrast with traditional machine learning models, deep networks do not rely on human-engineered features. Instead, they can learn features automatically from data, provided that they are supplied with sufficiently large amounts of suitable training data. Obtaining cleanly labeled ground truth data is difficult and time-consuming. Heuristically labeled data could potentially provide a source of training data for weakly supervised training of DGA detectors. We propose a set of heuristics for automatically labeling domain names monitored in real traffic, and then train and evaluate classifiers with the proposed heuristically labeled dataset. We show through experiments on a dataset with 50 million domain names that such heuristically labeled data is very useful in practice to improve the predictive accuracy of deep learning-based DGA classifiers, and that these deep neural networks significantly outperform a random forest classifier with human-engineered features.
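
The contrast the abstract draws between human-engineered features and features learned from raw data can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the feature set (length, digit ratio, character entropy), the alphabet, the `MAX_LEN` value, and the function names are hypothetical and are not taken from the article. The first function computes the kind of hand-built features a random forest might consume; the second produces the raw character-index sequence a deep network might take as input and learn from directly.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical alphabet for illustration; index 0 is reserved for padding
# (and, in this sketch, also for any character outside the alphabet).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-."
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 16  # illustrative fixed input length, not a value from the article


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy of a string, in bits."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def handcrafted_features(domain: str) -> Dict[str, float]:
    """Examples of human-engineered features (assumed, not the article's set)."""
    name = domain.lower()
    digits = sum(ch.isdigit() for ch in name)
    return {
        "length": float(len(name)),
        "digit_ratio": digits / len(name) if name else 0.0,
        "entropy": shannon_entropy(name),
    }


def encode_characters(domain: str, max_len: int = MAX_LEN) -> List[int]:
    """Map characters to integer indices, truncating or right-padding with 0."""
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in domain.lower()[:max_len]]
    return indices + [0] * (max_len - len(indices))
```

For example, `handcrafted_features("example.com")` returns a small dictionary of numeric features, while `encode_characters("example.com")` returns a fixed-length list of integers that an embedding layer could consume without any manual feature design.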

Publication Title

IEEE Access

Volume

First Page

51542

Last Page

51556

DOI

10.1109/ACCESS.2019.2911522

Publisher Policy

open access

Open Access Status

OA Journal

Recommended Citation

Yu, B.; Pan, J.; Gray, D.; Hu, J.; Choudhary, C.; Nascimento, A. C.; and De Cock, M., "Weakly Supervised Deep Learning for the Detection of Domain Generation Algorithms" (2019). School of Engineering and Technology Publications. 352.
https://digitalcommons.tacoma.uw.edu/tech_pub/352
