New Franco-Dutch research project on automatic classification of domain name abuse

[Image: two signposts, one pointing left (Compromised) and one pointing right (Maliciously registered)]

Wednesday 3 October 2018
Article by: Benoît Ampeau, Maciej Korczyński, Cristian Hesselman

SIDN Labs, Afnic Labs, and Grenoble Alps University will be starting a new research project called “Classification of compromised versus maliciously registered domains” (COMAR) on 1 October 2018. The Franco-Dutch project will address the problem of automatically distinguishing between domain names registered by cybercriminals for the purpose of malicious activities, and domain names exploited through vulnerable web applications. The project is designed to help intermediaries such as registrars and ccTLD registries further optimize their anti-abuse processes.

Domain name abuse

Domain names are easy-to-use shorthand for IP addresses that helps us navigate the many online services we use in our daily lives. While the vast majority of domain name registration and use is benign, some cybercriminals misuse domain names, for instance to launch large-scale phishing attacks, drive-by downloads, and spam campaigns. Security organizations such as the Anti-Phishing Working Group (APWG) and StopBadware collect information about these misused domain names and make it available to their customers (e.g., hosting providers and domain name registries) in the form of URL blacklists.

Compromised vs. maliciously registered

Both the operational and research communities distinguish two types of domain name abuse: legitimate domains that criminals have compromised and new domain names that have been registered specifically for malicious purposes [1][2]. An example of a compromised domain name is studentflats.gr, a legitimate site running a WordPress installation that cybercriminals hacked to host a banking-related phishing site. This is visible in the blacklisted URL (http://studentflats.gr/wp-content/uploads/2016/.co.nz/login/personal-banking/login/auth_security.php), which has an illegally installed banking script (/uploads/…/auth_security.php) underneath the WordPress directory (/wp-content). An example of a maliciously registered domain name is continue-details.com, which was used for a PayPal phishing site. This is visible in the blacklisted URL (http://paypal.com.login.continue-details.com/), which does not explicitly contain a malicious program such as a PHP script, but instead refers to a site set up specifically for the phish using a fifth-level domain name (continue-details.com being the first and second levels and paypal.com.login. adding three more levels).
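The structure of the two example URLs above can be inspected with a few lines of Python. This is an illustrative sketch only, not part of the COMAR classifier: the helper name describe_url is hypothetical, and it assumes that the registered domain is simply the last two labels of the hostname. That assumption holds for the two examples here but not for suffixes such as .co.uk, where a real tool would consult the Public Suffix List.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts discussed in the text (hypothetical helper)."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    return {
        # Assumption: the registered domain is the last two labels.
        "registered_domain": ".".join(labels[-2:]),
        # Labels that the domain holder added in front of the registered domain.
        "extra_labels": labels[:-2],
        # Total number of levels in the hostname.
        "levels": len(labels),
        "path": parts.path,
    }


compromised = describe_url(
    "http://studentflats.gr/wp-content/uploads/2016/.co.nz/login/"
    "personal-banking/login/auth_security.php"
)
malicious = describe_url("http://paypal.com.login.continue-details.com/")

for name, info in (("compromised", compromised), ("malicious", malicious)):
    print(name, info)
```

For the compromised example the hostname has only two levels and the suspicious part sits in the path, whereas for the maliciously registered example the path is empty and the deception sits in the three extra hostname labels.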

The distinction between these two groups is critical because they require different mitigation actions by different intermediaries. For example, hosting providers together with webmasters typically concentrate on cleaning up the content of compromised websites [3], whereas domain registries (e.g., SIDN and Afnic) and registrars tend to focus on handling malicious domain name registrations.

Blacklist-based classification

From an operational point of view, intermediaries typically use URL blacklists in their security systems to automatically block malicious content. However, a compromised domain name requires more fine-grained mitigation. For example, if an intermediary simply blocks studentflats.gr, it will also block the legitimate part of the site (the content that the WordPress installation serves to visitors). Instead, a security engineer needs to inspect the site's WordPress installation and remove the malicious PHP script from the hosting platform, either manually or automatically. This example illustrates that it is crucial to unambiguously label the domains of blacklisted URLs as compromised or maliciously registered, so that security systems can use the labels reliably.
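The over-blocking problem can be illustrated with a minimal sketch. The function names and the legitimate page URL below are hypothetical and chosen only for illustration; real security systems match blacklist entries in more sophisticated ways.

```python
from urllib.parse import urlsplit

# Blacklisted URL taken from the studentflats.gr example.
BLACKLISTED_URL = (
    "http://studentflats.gr/wp-content/uploads/2016/.co.nz/login/"
    "personal-banking/login/auth_security.php"
)
# Hypothetical legitimate page on the same compromised site.
LEGITIMATE_URL = "http://studentflats.gr/"


def blocked_by_domain(url: str, blacklist: set) -> bool:
    """Coarse policy: block every URL whose hostname appears in the blacklist."""
    blocked_hosts = {urlsplit(entry).hostname for entry in blacklist}
    return urlsplit(url).hostname in blocked_hosts


def blocked_by_url(url: str, blacklist: set) -> bool:
    """Fine-grained policy: block only the exact blacklisted URL."""
    return url in blacklist


blacklist = {BLACKLISTED_URL}

# Domain-level blocking also hits the legitimate page (over-blocking).
print(blocked_by_domain(LEGITIMATE_URL, blacklist))
# URL-level blocking leaves the legitimate page reachable.
print(blocked_by_url(LEGITIMATE_URL, blacklist))
```

Under these assumptions, the domain-level policy blocks both URLs, while the URL-level policy blocks only the phishing script. For a maliciously registered domain the coarse policy is the appropriate one, which is why the label matters.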

The ultimate goal of COMAR is to develop a machine learning-based classifier that labels blacklisted domains as compromised or maliciously registered, then extensively evaluate its accuracy, and implement it for a production-level environment. We also plan to study the attackers’ profit-maximizing behavior and their business models. We shall apply our classifier to unlabeled domain names of URL blacklists, for example, to answer the following question: do attackers prefer to register malicious domains, compromise vulnerable websites, or misuse domains of legitimate services such as cloud-based file-sharing services in their criminal activities?

Partner capabilities and interests

All three COMAR partners have extensive experience in the analysis of large heterogeneous datasets and in engineering the underlying platforms. Grenoble Alps University will concentrate on the statistical analysis of large-scale Internet measurement and incident data and on publishing scientific papers, whereas both registry Labs will focus on advancing the COMAR classifier for operational environments (e.g., at SIDN and Afnic) and making it available to their stakeholders, such as .nl and .fr registrars. The complementary approach of this partnership is in line with the need for registries to continuously reinforce their capabilities, increase the security of their Top-Level Domains (TLDs), and ultimately provide enhanced levels of trust for end-users.

Sourena Maroofi, a Ph.D. student at Grenoble Alps University, will develop and evaluate the COMAR classifier under the supervision of Maciej Korczyński, COMAR’s Principal Investigator. COMAR, funded by SIDN and Afnic, will start in October 2018 and will last for three years. The steering committee of the project consists of Cristian Hesselman (SIDN Labs), Benoît Ampeau (Afnic Labs), and Maciej Korczyński (Drakkar team, Grenoble INP, Grenoble Alps University).

About COMAR partners

COMAR is a joint project of SIDN Labs, Afnic Labs, and Grenoble Alps University.

SIDN Labs is the research team of SIDN, the registry of the .nl Top-Level Domain (TLD) in the Domain Name System (DNS). SIDN Labs’ goal is to increase the operational security and resilience of end-to-end Internet communications through world-class measurement-based research and technology development. Our research challenges include DNS and Internet security and resilience, and Internet evolution.

Afnic Labs is a key team devoted to the development and future of the Internet at Afnic. Afnic manages .fr and five other French overseas TLDs. Afnic is also the back-end registry for 14 companies and local and regional authorities that have chosen to have their own TLD suffix. Each day Afnic Labs initiates and contributes to projects in line with Afnic’s assignments: an Internet that is secure and stable, open to innovation, and in which the French Internet community plays a key role. Just as with other partnerships in which Afnic is involved, Afnic Labs believes in the added value of collaborative research work to ultimately provide a highly valuable, mature, state-of-the-art classifier.

Grenoble Alps University aims to establish a leading center in cybersecurity research in the Rhône-Alpes region in France with a particular focus on active and passive measurements for cybersecurity. The members of the Drakkar team have been involved in collaborative projects with law enforcement agencies and with security and Internet policy organizations devoted to fighting cybercrime. Our focus is on the statistical analysis of large-scale Internet measurement and incident data to identify how cybercriminals misuse domain names and how providers of Internet services deal with security risks and incidents. The COMAR project is at the heart of these issues.

Project website: www.comar-project.fr and www.comar-project.nl (available shortly).