Content Disarm & Reconstruction Technology

So, what is Content Disarm & Reconstruction (CDR)? Here is a brief explanation from Votiro:

Votiro: What is Content Disarm and Reconstruction (CDR)?

Content Disarm and Reconstruction (CDR) is a security technology that — depending on the type of CDR involved — flattens malicious files (CDR Type 1), removes active content from the file (CDR Type 2), or cleanses malicious code from files without impacting the usability of the file (CDR Type 3).

Also known as file sanitization, CDR has multiple forms. In general, CDR does not need to rely on detection to prevent threats. Instead of relying on databases of known signatures, the technology assumes all files are malicious and scrutinizes all files that are outside of the approved firewall.

Depending on the type of CDR, which is explained in detail below, content disarm and reconstruction removes malware, strips any embedded code, and rebuilds the file in a way that disrupts any additional covert malicious code.

The end result of using CDR technology is:

• A flattened file delivered as a safe but unfunctional PDF (CDR Type 1)
• A file with active content, macros, and other malicious and safe content removed (CDR Type 2)
• A safe copy of the original file on a clean template, with all functionality intact (CDR Type 3, Positive Selection technology)

Before we discuss CDR technology in more detail, let's first explain the concept of the Cyber Kill Chain.

The 7 Stages of the Cyber Kill Chain

Figure 1. The 7 Stages of a Cyber Attack (kill chain)

The cyber kill chain is a concept that describes the methodology of how hacking activities are generally carried out against a target. The activity begins with the process of identifying the target and ends with the theft of data from the target system or network.

LM White Paper Intel Driven Defense

When an attacker wants to compromise a target, they will typically go through the seven steps described in the Cyber Kill Chain.

1. Reconnaissance

Reconnaissance is a set of processes and techniques (Footprinting, Scanning & Enumeration) that are generally used to find and collect information from the target. The information collected can be technical information such as the target's IT system or IT network, or non-technical information such as data on a company's employees collected through social media.

The goal is to find as much information as possible from the target's assets, then process it and look for vulnerabilities.

This process is very important but often underestimated. Reconnaissance can provide a great deal of information that increases the chance of a successful attack. Some companies even offer professional reconnaissance services, usually restricted to government agencies, which shows how much value is placed on this stage.

2. Weaponization

Once the target data has been collected, the next step is to prepare the weapons that can be used to penetrate the target. There are many weapons that can be used and it all depends on the results of the target identification.

For example, if it is known that the target has an unpatched IT system, then exploits are prepared to exploit the system.

Over time, it has become more difficult to attack a target by directly exploiting system vulnerabilities. Another option is malware such as viruses, worms, and exploit kits, which relies on human weakness: the victim is tricked into running the malware inadvertently. Some of this malware is written by the attackers themselves, but it can also be bought and sold on the dark web.

Creating or purchasing weaponised malware is called weaponisation.

3. Delivery

The delivery phase involves sending the prepared weapon to the target. Delivery media and methods vary and include email, USB drives, public internet access, social media, and phishing via previously compromised websites. The attacker uses the information gathered in the reconnaissance stage to decide which delivery method is most likely to get the prepared weapon to the target.

In the case of malware, the vehicle is usually a file. The malware is attached to a file, and the file is then sent to the target via email, USB, etc.

4. Exploitation

A weapon, whether in the form of a program designed to exploit the system directly - often referred to as an exploit - or malware designed to target humans, will essentially carry out the exploitation process.

The exploitation process is deemed successful if the attacker obtains access to the intended destination, whether it be the server machine, network, or computers used by the target. This may include laptops used by HR or GA staff in a business.

If the exploitation stage is successful, the attacker will attempt to retain access to the target that they have successfully controlled via stage 5.

5. Installation

After successfully infiltrating the system, the attacker's next objective would be to install a backdoor or other malware. This allows the attacker to maintain access to the target system in future, without the need to repeat the exploitation process.

The Trojan horse is a common type of malware, taking its name from the Greek myth in which the Greeks defeated the Trojans by hiding inside a massive wooden horse.

The same trick is frequently used in attacks on IT systems. For instance, an attacker might distribute a crack for a video game. When the victim runs it, the crack may indeed unlock the game, but it can also install malware on the victim's computer at the same time, giving the attacker control over the infected machine.

6. Command and Control

Once successfully installed on a target, malware acts like a covert agent, continuously reporting its status to the attacker's control centre. The attacker can communicate with the malware at any time and instruct it on what actions to take once inside the target system.

7. Action on Objectives

Examples of commands that can be issued to the agent installed on the target system include requesting file transfers from the target computer, covertly activating its camera, and using the device as a proxy or launching point for attacks against other systems, so that if those systems are later investigated, the primary suspect is the initial target computer.

This is the ultimate stage where the attacker gains complete control of the target, allowing them to exploit various vulnerabilities in the IT system over time.
Media reports of data breaches indicate that the attacker has successfully progressed to this seventh stage.

Breaking The Kill Chain

Having understood the seven phases of the cyber kill chain, we can devise tactical measures that stop the attacker from reaching their objectives by breaking the chain at one or more stages.

For instance, consistently applying security standards such as ISO 27001 to an organisation's assets, including its personnel, reduces the risk arising from the reconnaissance and weaponisation stages. This in turn makes it harder for attackers to advance to later phases such as exploitation, and lowers their success rate there.

Several techniques and technologies can break this chain, among which is the Content Disarm & Reconstruction technology (CDR).

CDR technology plays a crucial role in breaking the third link in this chain, namely delivery. As discussed above, attackers use various means to transport prepared weapons, such as exploits or malware, to the intended target. CDR's primary function is to cleanse incoming files of any malware so that they are safe to use.
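
To make the idea of breaking the chain concrete, here is a minimal Python sketch that models the seven stages as an ordered enumeration. The names `Stage`, `stages_reached`, and `stages_prevented` are illustrative assumptions and are not taken from the white paper or from any CDR product. It also assumes, simplistically, that a stage that is blocked stops every stage after it.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The seven kill chain stages, in the order described above."""
    RECONNAISSANCE = 1
    WEAPONIZATION = 2
    DELIVERY = 3
    EXPLOITATION = 4
    INSTALLATION = 5
    COMMAND_AND_CONTROL = 6
    ACTION_ON_OBJECTIVES = 7


def stages_reached(broken_at: Stage) -> list[Stage]:
    """Stages the attacker still completes when the chain is broken at `broken_at`."""
    return [s for s in Stage if s < broken_at]


def stages_prevented(broken_at: Stage) -> list[Stage]:
    """Stages the attacker never completes when the chain is broken at `broken_at`."""
    return [s for s in Stage if s >= broken_at]


print([s.name for s in stages_prevented(Stage.DELIVERY)])
# ['DELIVERY', 'EXPLOITATION', 'INSTALLATION', 'COMMAND_AND_CONTROL', 'ACTION_ON_OBJECTIVES']
```

Breaking the chain at delivery still leaves reconnaissance and weaponisation to the attacker, but under this simplified model nothing prepared in those stages reaches the target.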

However, before delving into the workings of CDR technology, let's first examine other widely used technologies that share the objective of disrupting the delivery stage of the kill chain.

Breaking Delivery Stage

Figure 3. Common Technology to Break Delivery Stage

We will discuss three popular methods below:

• Signature-based
• Sandboxing
• Artificial Intelligence

Each of the above technologies has advantages and disadvantages.

1. Signature-Based

Just as humans have fingerprints, every program has a signature. Malware, which comprises viruses, worms, trojans, exploits, and so on, is simply a computer program. That is quite unlike what I imagined in my primary school days: when my father mentioned viruses, I pictured dirt on a disc 😋.

Malware, however, is a computer program intended to cause harm. As with any program, a signature can be extracted from it. This signature is distinctive and can be extracted and represented in various ways. When a computer system is hit by malware, an investigation is carried out until a sample of the malware is obtained. The signature of the sample is then recorded and entered into a database, which antivirus products use in their signature-based approach.

The antivirus vendor updates the database regularly and distributes it to every user. Subsequently, when a user encounters a file infected with malware, the antivirus product scans it and immediately notifies the user if the file or one of its components matches a signature in the database.

However, there are two significant problems associated with this method. First, the signature must be accurate. If malware evolves or changes, its signature changes with it. Although antivirus companies have developed innovative techniques to detect these changes, the signature is primarily determined by the structure of known malware. When previously unidentified malware appears, antivirus software may fail to identify it. From the user's standpoint, this approach means that a target has to be infected first before a remedy can be found.

The term 0-day refers to day zero, the very first day. A 0-day attack occurs when newly developed malware is being used while the world is still unaware of it. During this time, only the malware's creator knows that it exists, which puts it beyond the reach of signature-based identification. As a result, there will most likely be initial victims.

Secondly, malware usually operates in secrecy, either by hiding in files or by infecting them. Even though the files sent to the user appear legitimate, the malware is meticulously concealed to avoid detection. Once malware is detected by the antivirus software, the file is usually quarantined to prevent user access, because the overall aim is to protect users, even though they may actually need the file.

In particular, when the infecting malware employs complex methods to be tightly attached to the file, it is difficult for most antivirus products to clean it without causing damage to the original file.
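
The limitation described above can be illustrated with a minimal Python sketch. It assumes the simplest possible kind of signature, a SHA-256 digest of the whole file; real antivirus signatures are richer than this (byte patterns, heuristics, and so on). The database and the sample bytes are harmless placeholders invented for the example.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of known-bad samples.
# The sample bytes below are a harmless placeholder, not real malware.
KNOWN_BAD_SAMPLE = b"pretend this is a known malware sample"
SIGNATURE_DB = {hashlib.sha256(KNOWN_BAD_SAMPLE).hexdigest()}


def is_known_malware(data: bytes) -> bool:
    """Return True if the file's digest appears in the signature database."""
    return hashlib.sha256(data).hexdigest() in SIGNATURE_DB


# The known sample is caught...
print(is_known_malware(KNOWN_BAD_SAMPLE))         # True
# ...but a variant that differs by a single byte slips through.
print(is_known_malware(KNOWN_BAD_SAMPLE + b"!"))  # False
```

The second call shows why a sample nobody has catalogued yet, or a slightly modified one, is invisible to a purely signature-based check.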

2. Sandboxing

Considering the difficulties of relying solely on signature-based methods (specifically the first point), a new approach for detecting 0-day malware has been developed.

Files that are suspected of containing malware but do not match anything in the database are isolated in a sandbox. This sandbox is a virtual environment that mimics a real-world computer system. The process of sandboxing is illustrated in the following picture.

Figure 4. Computer Sandboxing Illustration

Files suspected of containing malware are uploaded to the Sandbox program for analysis. The file is then executed and monitored for any malicious activity. If malware is detected, the user receives an immediate notification.

This sandboxing approach enables antivirus products to identify and verify new malware strains even if their signatures have not been registered before. The methodology employed in the sandbox is complex, but a basic example helps. A PDF file is intended solely for displaying a document. If, upon opening it in the sandbox, components within the file try to open an external connection over the Internet, it can be inferred that the file harbours malware. Such a connection attempt is one of the defining features of a Trojan.
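
The PDF example can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how any particular sandbox works: the behaviour names, the `EXPECTED_BEHAVIOUR` policy, and the `SandboxReport` type are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical policy: behaviours we expect from each file type when it is opened.
EXPECTED_BEHAVIOUR = {
    "pdf": {"read_file", "render_page"},
    "docx": {"read_file", "render_page"},
}


@dataclass
class SandboxReport:
    file_type: str
    observed: set[str]  # behaviours recorded while the file ran in the sandbox


def unexpected_behaviour(report: SandboxReport) -> set[str]:
    """Return the behaviours that this file type has no reason to exhibit."""
    allowed = EXPECTED_BEHAVIOUR.get(report.file_type, set())
    return report.observed - allowed


report = SandboxReport("pdf", {"read_file", "render_page", "network_connect"})
print(unexpected_behaviour(report))  # {'network_connect'}
```

Note that the verdict depends entirely on what was observed: a sample that stays quiet while it is being watched produces an empty set and looks clean.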

We can break down the challenges of this sandboxing method into two parts.

Firstly, malware authors who understand the sandboxing process can add anti-sandboxing components to their malware. There are various ways of achieving this, but typically the malware remains dormant for a period and watches for signs that indicate whether it is running on a genuine computer or in a sandbox. For instance, it may monitor mouse and keyboard activity. If a file is opened without any mouse movement, the malware concludes that it is in a sandbox and refrains from contacting its C&C server for the next step. As a result, it is difficult for the sandbox to determine whether the file harbours malware.

Secondly, sandboxing shares the second problem of the signature-based approach described above. Malware that is attached to a file is typically difficult to remove, which means users are unable to access the file.

3. Artificial Intelligence

As technology advances, Artificial Intelligence (AI) can also be used to detect malware.

Figure 5. Train the Machine to Identify Malware Behaviour

AI methods vary widely but are fundamentally aimed at enabling computers to learn autonomously and identify malware characteristics. The approach is effective even against new 0-day malware variations.
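
Before a model can learn anything, each file has to be turned into numbers. The sketch below shows one possible feature-extraction step in Python; the choice of features (size, byte entropy, share of printable bytes) is an assumption for illustration, and the function names are hypothetical. Training a model on such features is a separate step that is not shown here.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # sum of p * log2(1/p) over all byte values that occur
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def extract_features(data: bytes) -> dict[str, float]:
    """Turn raw bytes into a small numeric feature vector for a model."""
    printable = sum(1 for b in data if 32 <= b < 127)
    return {
        "size": float(len(data)),
        "entropy": byte_entropy(data),
        "printable_ratio": printable / len(data) if data else 0.0,
    }


print(extract_features(b"AB\x00\x01")["printable_ratio"])  # 0.5
```

Computing features like these for every file, and then running a model over them, is part of the resource cost on end-user machines.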

One significant challenge, however, is the training process, which requires both a vast quantity of sample data and a suitably large computing system. This is particularly problematic when AI is applied to everyday products used by end-users, whose machines become encumbered with computational processes that demand substantial resources.

Another challenge, as with the previous methods, is that contaminated files cannot be cleaned effectively, so users are unable to access a file once it is classified as malware.

Content Disarm & Reconstruction

CDR is not a recent development and has been in existence for over a decade. Philippe Lagadec showcased a preliminary implementation of the CDR concept at the CanSecWest security conference in 2008. You can download Lagadec's presentation from here. Lagadec even went a step further and published the source code for this initial version of CDR technology on GitHub.

ExeFilter is an open-source tool and framework to filter file formats in e-mails, web pages or files. It detects many common file formats and can remove active content (scripts, macros, etc) according to a configurable policy ~ http://www.decalage.info/exefilter

CDR technology does not aim to replace other existing technologies, such as anti-virus software. Its purpose is to complement such technology, particularly for those requiring superior protection against malware attacks, whether they are 0-day or not.

CDR technology works through the following steps:

1. Break the file down into its most fundamental components and analyse each part and its metadata.
2. Neutralise (discard) any component that should not form part of the file.
3. Reconstruct the file from the remaining components, preserving the original file's integrity and functionality.

The three aforementioned steps are fundamental in guaranteeing the efficacy of CDR technology. Although simple, they are exceptionally potent.

The initial phase is a customary measure that many other approaches and technologies also take. It is in the second and third phases that the real magic happens. Other technologies attempt to detect malware by matching signatures in a database, by studying its behaviour in a sandbox, or by recognising its properties with a mathematical model trained through an AI approach. Core CDR technology, by contrast, simply examines each component and checks whether it is compatible with the file being processed.

If the component is not a match, no further analysis is necessary, and the component can be removed immediately. The CDR product does not require knowledge of the malware's behaviour, type, or any other details. As long as the component is not intended to be part of the basic structure of a file, it will be removed.

Naturally, exactly how the suspicious element is identified once the file has been broken down is the secret sauce of each CDR implementation 🙂. Nonetheless, this is the fundamental principle of CDR technology, which distinguishes it from other methods.

For example, if an email contains a Microsoft Word document, it is broken down into its basic components.

Figure 6. Breaking a file into its basic components
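
A modern Word document is a ZIP package of named parts, so the break down, discard, and rebuild idea can be sketched with Python's standard `zipfile` module. This is a toy illustration under stated assumptions, not how any CDR product works: the `ALLOWED_PARTS` allowlist is hypothetical and far too small for real documents, the sample package is not a valid Word file, and a real implementation would also inspect the inside of each part and repair the content-type and relationship entries that refer to removed parts.

```python
import io
import zipfile

# Hypothetical allowlist: the only parts this sketch accepts in a .docx package.
ALLOWED_PARTS = {
    "[Content_Types].xml",
    "_rels/.rels",
    "word/document.xml",
    "word/styles.xml",
}


def disarm_and_rebuild(package: bytes) -> tuple[bytes, list[str]]:
    """Copy allowlisted parts into a fresh package and report what was dropped."""
    removed = []
    rebuilt = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(rebuilt, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():                 # break into components
            if name in ALLOWED_PARTS:
                dst.writestr(name, src.read(name))  # rebuild from known-good parts
            else:
                removed.append(name)                # discard, no detection needed
    return rebuilt.getvalue(), removed


def make_sample() -> bytes:
    """Build a toy package with one part that does not belong (placeholder bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("[Content_Types].xml", "<Types/>")
        z.writestr("word/document.xml", "<document>Hello</document>")
        z.writestr("word/vbaProject.bin", b"placeholder macro storage")
    return buf.getvalue()


clean, removed = disarm_and_rebuild(make_sample())
print(removed)  # ['word/vbaProject.bin']
```

The sketch never asks whether the dropped part is malicious. It only asks whether the part belongs in the file, which is the distinction drawn above between CDR and detection-based methods.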

If a component that is known not to belong in the file is simply removed, the file structure will inevitably be damaged, and if the file is not reconstructed properly, it will be unreadable. As described in the weaponisation section above, an attacker embeds malware in a file very neatly. Naturally, once such a file is cleaned, part of its structure is lost and the file becomes corrupted, making it impossible to open even though it is now clean. Core CDR technology implements a secret recipe that has proven its reliability in reconstructing files without damaging the original file in the slightest.

In the third step of the process, core CDR technology ensures that the file's information is neither tampered with nor modified, so users can access 100% of the file's contents.

Implementing CDR technology presents challenges that need to be addressed. It requires a comprehensive understanding of the basic structure of each file type and its relationship to the operating system. Accordingly, not all file types are supported.

Nevertheless, the CDR product's core technology can proficiently handle nearly all standard file types that the public utilises while adequately protecting them from malware.

Integration

Effective solutions require integration of core technology with existing infrastructure, as relying solely on technology will not lead to optimal results.

CDR products can integrate with numerous anti-virus engines, with the core CDR technology complementing them by disarming files that carry 0-day malware. Additionally, CDR products safeguard email and removable devices such as USB sticks. CDR products also offer APIs that secure the file upload functions that many applications provide.

Let's imagine a government institution with various applications that require document uploads, including family card copies, ID cards, diplomas, and more. These documents are then uploaded via the API, where the CDR product can clean them in advance. This ensures that the documents are free from malware before they are stored in the government application's storage media.
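
The upload scenario can be sketched as follows. Everything here is hypothetical: `UploadService` stands in for the government application, the dictionary stands in for its storage, and `fake_cdr` is a placeholder for a call to a CDR product's API, whose real interface is not described in this article.

```python
from typing import Callable

# Hypothetical type: a sanitiser takes raw bytes and returns cleaned bytes.
Sanitiser = Callable[[bytes], bytes]


class UploadService:
    """Stores uploads only after they have passed through a CDR step."""

    def __init__(self, sanitise: Sanitiser) -> None:
        self._sanitise = sanitise
        self._storage: dict[str, bytes] = {}  # stand-in for real storage

    def upload(self, name: str, data: bytes) -> None:
        cleaned = self._sanitise(data)  # clean before anything is stored
        self._storage[name] = cleaned

    def download(self, name: str) -> bytes:
        return self._storage[name]


def fake_cdr(data: bytes) -> bytes:
    """Placeholder for a CDR API call; it only strips a marker string."""
    return data.replace(b"<ACTIVE-CONTENT>", b"")


service = UploadService(fake_cdr)
service.upload("id_card.pdf", b"scan<ACTIVE-CONTENT>data")
print(service.download("id_card.pdf"))  # b'scandata'
```

The design point is the ordering: the raw upload never reaches storage, so every later reader of the file only ever sees the cleaned copy.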

This concludes our lengthy explanation of CDR technology and how it can break the cyber kill chain at the delivery stage, preventing attackers from advancing to the next stage (exploitation).

Hopefully, this summary can provide an outline of how Content Disarm and Reconstruction technology works, which can be a preferred option to secure sensitive information, particularly for organisations or strategic companies with high risks of being targeted by hackers.

Please contact our sales team at ITSEC Asia for more information regarding Content Disarm and Reconstruction technology and solutions.