Almost two years ago I received a very strange piece of spam. It contained an account (with a password) for a site billed as “the best black board around”. (If you don’t know what this is, black boards are the Internet underground – sites dedicated to, and run by, cybercrime.) Curiosity quickly took over: I set up a virtual machine (just in case), put it under heavy monitoring and entered the site from it. To my amazement, the site didn’t try to infect me.
It turned out to be really big. Hundreds of exploits, and maybe thousands of malware samples of every kind I knew to exist, for sale. Forums covering every cybercrime, cyber-counterculture and geeky topic I could think of. One of them attracted my attention.
It was some kind of cyber-libertarian place. There appeared to be only one person participating in it. However, s/he was prolific, and had posted hundreds of messages, despite getting no attention from anyone. The texts seemed to have been collected from all over the Net. Some were pulled out of old Usenet newsgroups. Others – from blogs or forums. For some, I couldn’t trace the origin at all. It was one of these last that surprised me.
I saved it locally, logged out of the site and examined it. When I tried to log back in and investigate further, the site demanded payment to continue my “expired” access. So much for the spam, the greatest black board and all that.
The saved text, however, continued to fascinate me. After more than a year of thinking, I finally decided to post it here and ask for your opinion. (It is not under my copyright, but I doubt that the authors will come forward and assert their rights over it. 🙂 )
The P2P networks around the world are under attack.
Until recently the attack came from copyright holders who want to prevent the use of content they claim as theirs, and are ready to attack file sharing as a technology in order to achieve this goal. Now they are joined by governments who see in P2P information sharing a dangerous instrument, one able to check their abuse of power. If this trend continues, it is only a matter of time before the freedom P2P provides is taken away.
The most popular P2P networks (BitTorrent, eDonkey, Kazaa, LimeWire) are not very resistant to attack. Some, like Freenet or GNUnet, are more resistant, but less popular, and carry much less content. This might allow the attackers to take down the popular networks first, and most of the available content with them. Then they may shift to the more resistant ones, using the experience gained, and ultimately destroy all of them.
To survive, the P2P networks need strengthening, enrichment and better defense. For this, a tool is needed that:
– is abundant on the Internet, and constantly gains more of a foothold
– shares content via as many popular P2P networks as possible, including the ones most resistant to attack
– stores very large amounts of content in many places on the Internet
– copies content between the different P2P networks (thus automatically enriching the more resistant ones)
– does all this without the need for human participation
– can do all this without the knowledge of the resource owners
– acts non-predictably, to avoid creating patterns for detecting and cleaning it
The ideal software for this goal will be a computer virus that:
– infects in a large number of ways, and is hard to find and disinfect
– has no payload other than the P2P ability
– automatically exchanges content found in the P2P networks
– stores the exchanged content hidden and encrypted
– allows programmers to enhance it without compromising its stealth and resistance
– creates no incentive for the cyber gangsters to use it
What follows is a blueprint for such a virus. It is based on leaked cyberwarfare research, modified for the purpose of filesharing.
The first versions will probably implement only a part of the blueprint; later ones might implement all, and add more.
The virus will be programmed in C, with some sections in assembly language.
The source code must be licensed under a BSD-type license, in order to remove copyright obstacles to improving it.
— Loader
A small part of the virus code that decrypts, decompresses and runs the rest of it. It will be heavily obfuscated (uniquely for each copy), to make detection of the virus harder.
All other parts will be compressed and encrypted with a key unique for each copy of the virus.
— Obfuscator
A module that is able to heavily obfuscate binary code in random, non-repeating ways, producing a different result every time. It is used to obfuscate the loader while making a copy of the virus.
— Encryptor / Compressor
A module that assembles, compresses and encrypts the virus body, sans the loader. It also encrypts / decrypts the stored or transferred content as needed, and handles the signing / checking of contents etc.
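The blueprint does not fix a cipher. As a minimal sketch, assuming a per-copy 32-bit key and a toy xorshift keystream (a real implementation would use an established stream cipher), the round trip shared by the Loader and the Encryptor might look like:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy keystream for illustration only; a real implementation would use
 * an established stream cipher keyed per copy. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Encrypts or decrypts `len` bytes in place with the per-copy key.
 * XOR is its own inverse, so the same routine serves the Loader
 * (decryption) and the Encryptor (encryption). */
void body_crypt(uint8_t *buf, size_t len, uint32_t copy_key) {
    uint32_t state = copy_key ? copy_key : 1; /* xorshift state must be non-zero */
    for (size_t i = 0; i < len; i++)
        buf[i] ^= (uint8_t)xorshift32(&state);
}
```

Because the key differs for every copy, two infected machines never carry byte-identical encrypted bodies, which is what denies scanners a stable signature.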
— Infector
A module that is responsible for installing the virus on the machine it was run on. It may have sub-modules that check whether the machine is already infected, check for different OS versions, and apply different methods of installing.
— Attacker
A module that propagates copies of the virus. It will have sub-modules that apply different methods of attack (e.g. network infection, removable device infection, sending the virus as e-mail, etc.).
— File sharer
A module that implements the file sharing. Will have sub-modules for:
— Network handling
Will determine what transfer speed is quick enough to effectively share content, but not so quick as to noticeably clog the machine’s network bandwidth. If available, may use transfer priorities, setting the file sharing at the lowest priority. Sometimes, for random periods, it will not follow these rules (see “incomplete disclosure” below).
Will also determine what P2P protocols can be used on this machine, what may trigger the firewall, etc. If possible, may try to stealthily declare itself as an exception in the firewall config.
Will have plugins for the different file sharing protocols, and will manage them. (Sometimes, for random periods, may deactivate plugins. Each plugin will be (de)activated separately from the others: there might be periods when all are active, or none is. See “incomplete disclosure” below.)
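The throttling rule above can be sketched as a simple budget function. All names here are assumptions, as is the 10% cap on idle capacity (the blueprint fixes no numbers):

```c
#include <stdint.h>

/* Hypothetical throttle sketch: cap sharing at a fraction of the link's
 * measured idle capacity, except during randomly chosen "off" periods
 * (part of the incomplete-disclosure policy). */
typedef struct {
    uint64_t link_capacity_bps;   /* measured link speed             */
    uint64_t foreground_use_bps;  /* owner's own current traffic     */
    int      off_period;          /* nonzero while sharing is paused */
} net_state;

/* Returns the bytes-per-second budget the file sharer may use right now. */
uint64_t share_budget(const net_state *s) {
    if (s->off_period)
        return 0;                 /* randomly chosen silence */
    if (s->foreground_use_bps >= s->link_capacity_bps)
        return 0;                 /* link saturated by the owner */
    uint64_t idle = s->link_capacity_bps - s->foreground_use_bps;
    return idle / 10;             /* use at most 10% of idle capacity */
}
```

The `off_period` flag is what the plugin (de)activation schedule would toggle at random.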
— Content cache handling
Will implement the cache as hidden from the machine owner. Will determine what part of the disk space may be used for it without disrupting the machine’s work or risking notice. May dynamically grow or shrink the cache, depending on the needs of the filesharing network and on staying unnoticed amid the machine’s usage.
Will handle the caching process. Will encrypt all stored content with a hidden random key, so that the machine owner cannot know what is cached on his machine (and thus is not legally liable for the sharing done). If possible, will hide the key well enough to prevent node raiders (authorities etc.) from decrypting the contents (thus making it impossible to prosecute the machine owner, as there is no proof that he stores illegal content).
Will cache some of the content that is indirectly routed through this node, thus increasing the content availability without making extra transfer requests. This will decrease the network traffic, and the attacker’s ability to identify the true sender or receiver of the content, even if the node is raided.
Will make a part of the cached content units unavailable for sharing (requests, searches etc.), for randomly selected periods of time, different for each content unit. This is a part of the incomplete disclosure policy. Since the network is expected to be big enough, this will not significantly compromise content availability.
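The per-unit unavailability periods could be represented as a timestamp carried by each cache entry. A minimal sketch, with illustrative field and function names:

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical sketch: each cached unit carries a timestamp before which
 * it is hidden from requests and searches. */
typedef struct {
    char   id[64];              /* content unit ID (e.g. a checksum) */
    time_t unavailable_until;   /* hidden until this moment          */
} cache_unit;

/* Decide whether a unit may be disclosed at time `now`. */
int unit_available(const cache_unit *u, time_t now) {
    return now >= u->unavailable_until;
}

/* Start a new hiding period of `secs` seconds (chosen at random by the
 * caller, differently for each unit). */
void unit_hide(cache_unit *u, time_t now, time_t secs) {
    u->unavailable_until = now + secs;
}
```

Every code path that answers requests or searches would consult `unit_available` first, so the hiding is enforced in one place.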
— Transfer handling
On request for content that is present, will send it, unless the content is currently made unavailable.
Sometimes it will check not only the content store, but will also scan the machine for content that matches the request, and may send that.
On request for content that is not present or is temporarily unavailable, in some cases it will refuse, and in some cases it will request the content from another node and re-send it, thus enabling indirect routing and effectively hiding the sender and the receiver from each other. Will check for and prevent request looping. May route indirectly even some content that is available on the node, as a part of the incomplete disclosure. Increasing the percentage of indirect transfers, while decreasing the number of nodes a content unit is requested from, may make it harder to identify the node that actually requests the content. Technologies similar to those of the Tor network may be implemented.
Will sometimes scan the machine for popular types of content, and on discovering content that is not in the file sharing networks, may adopt some of it into the content store.
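The request-loop prevention mentioned above might be sketched with a small ring buffer of recently seen request IDs; a request whose ID is already present has looped back and is refused rather than re-routed. Sizes and names are assumptions, and IDs are assumed non-zero:

```c
#include <stddef.h>
#include <stdint.h>

#define SEEN_MAX 256

/* Hypothetical loop-prevention sketch: remember the last SEEN_MAX
 * request IDs that passed through this node. */
typedef struct {
    uint64_t ids[SEEN_MAX];   /* 0 means "empty slot" (IDs are non-zero) */
    size_t   next;            /* ring-buffer write position              */
} seen_set;

/* Returns 1 if the request is new (and records it), 0 if it looped. */
int request_is_new(seen_set *s, uint64_t request_id) {
    for (size_t i = 0; i < SEEN_MAX; i++)
        if (s->ids[i] == request_id)
            return 0;                    /* already routed through here */
    s->ids[s->next] = request_id;
    s->next = (s->next + 1) % SEEN_MAX;
    return 1;
}
```

A real node would age entries out by time as well as by count, but the ring keeps the memory footprint fixed.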
— Search handling
On a search request, will declare whether the pattern is found locally. (In a percentage of cases, the pattern will be mandatorily declared as not found. Some content units will always be reported as not found on search, but may still be transferred on a request for content. This is a part of the incomplete disclosure.)
Will resend the request to a number of other nodes (sometimes even if the content is found locally).
Will refuse to process a search request that has already passed through this node (to avoid request looping).
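The mandatory “not found” answers could be sketched as a per-unit flag set at caching time, plus a fresh random draw per query. The 10% figure and all names are illustrative assumptions:

```c
#include <stdint.h>

/* Hypothetical sketch of incomplete search disclosure. A unit flagged
 * never_disclose is permanently invisible to searches at this node
 * (though it can still be served on a direct content request). */
typedef struct {
    uint64_t id;
    int      never_disclose;   /* set randomly when the unit is cached */
} search_entry;

/* `present`: whether the unit is actually in the local cache.
 * `rng_draw`: a fresh random number in [0, 99] supplied by the caller. */
int search_reports_found(const search_entry *e, int present, int rng_draw) {
    if (!present)
        return 0;
    if (e->never_disclose)
        return 0;              /* permanent incomplete disclosure */
    if (rng_draw < 10)
        return 0;              /* ~10% of remaining answers withheld */
    return 1;
}
```

Because the flag lives per unit and per node, an observer probing the network cannot distinguish “withheld” from “genuinely absent”.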
— Rating handling
Will determine which content units in the network need to be cached, and which are abundant enough. Will automatically request for download some content that needs caching. If a content unit is too abundant, in some cases it may be marked for deletion from the local cache. (Some randomly selected content units will never be marked for deletion: thus, even a large number of ‘insiders’ falsely claiming to have the content will not be able to completely flush it from the network. This is a part of the ‘incomplete disclosure’ policy.)
Will try its best to determine whether a content unit is a dummy and should be dropped from the network. Will do that in an ‘insider-resistant’ way, so that even a large number of ‘insiders’ voting a real content unit as a dummy (and other ‘insiders’ falsely claiming to have it) will not be able to eliminate it completely. Every node will make the decision for every content unit separately.
Will try its best to detect ‘insider’-type attacks (where a large number of dummy ‘nodes’ try to flush a content unit from the network in some way, or to flood the network with dummy content). May block these nodes from interactions, or decrease their rating. Every node will make these decisions separately.
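The “never marked for deletion” rule can be made insider-resistant by deriving the pinned set from a per-node secret, so no outsider can predict which nodes will refuse to drop a given unit. A sketch, assuming a splitmix64-style keyed hash and an illustrative 1-in-16 pin rate:

```c
#include <stdint.h>

static uint64_t keyed_mix(uint64_t x) {   /* splitmix64 finalizer */
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Roughly 1 unit in 16 is pinned on this node; which ones depends on
 * the node's secret, so fake 'insider' nodes cannot predict them. */
int unit_is_pinned(uint64_t node_secret, uint64_t content_id) {
    return keyed_mix(node_secret ^ content_id) % 16 == 0;
}

/* Abundance reports may only delete unpinned units. */
int may_delete(uint64_t node_secret, uint64_t content_id, int too_abundant) {
    return too_abundant && !unit_is_pinned(node_secret, content_id);
}
```

Since every node draws a different secret, a content unit survives an abundance-faking attack on any node that happens to pin it.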
Miscellaneous other modules may be added. Some examples follow.
— Exposed virus module
On some machines the virus might also be written to an exposed, pre-designated, well-known place. Thus, if the users wish to keep it, they will be able to back it up before cleaning, to re-infect the machine afterwards, and even to re-distribute it, if they want to.
— Exposed source module
The virus may carry its own source code with it, in compressed form. If it finds on the machine evidence that the owner is a programmer (e.g. a C/C++ compiler), in some cases it may drop the source code in a predefined place, in unencrypted form. This will enable some programmers to improve the code and produce new, improved or simply different versions, thus making the overall cleaning of the virus harder.
— Forbidden types of modules
Some types of modules are forbidden:
– Modules that allow any form of remote control (eg. botnet software): their action may prompt the machine owner to clean the virus.
– Modules that implement any interface to the local machine, incl. filesharing: they may compromise the virus’s secrecy, may compromise the machine owner’s ability to deny knowledge of the sharing, and may deter some machine owners from also using standard filesharing software and enriching the local content through it.
– Modules that exchange virus modules or plugins: the nature of the process makes relying on any trust mechanism, including public key infrastructure, too dangerous to be permitted; releasing newer versions will be a better practice.
– Modus operandi
— The incomplete disclosure
In all of its activities, the virus will sometimes intentionally fail to perform. All types of resources it uses will be subject to randomly determined periods of non-usage. Since almost all types of attacks against it and its network rely at some point on its predictable activity, this will disrupt them. Examples follow:
— Incomplete presence
Sometimes the virus may clean itself (and the content cache) from the system, leaving no trace at all. In some cases the virus may leave the content cache on the system, and sometimes may even decrypt it and make it visible. This will give the filesharers, if caught, the opportunity to claim that they may have unknowingly had the virus, and that it probably cleaned itself at some point without them noticing.
— Incomplete activity
For random periods of time, the virus may stop all of its network activity. (The periods may vary from seconds to years.) During these periods it may, at random, also delete the local cache, or even ‘lie dormant’, cleaning itself from the system but leaving an obfuscated watchdog that will reinstall it at the end of the dormancy period.
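A range of “seconds to years” suggests a heavily skewed distribution, so that short naps are common and year-long dormancy rare but possible. One hypothetical way to draw such periods is as a random power of two (the range and the scheme are assumptions):

```c
#include <stdint.h>

/* Hypothetical dormancy sketch: the sleep length is a random power of
 * two between one second and about two years. The caller supplies the
 * random draw. */
uint64_t dormancy_seconds(unsigned rng_draw) {
    unsigned exponent = rng_draw % 27;   /* 2^0 .. 2^26 seconds       */
    return (uint64_t)1 << exponent;      /* 1 s .. about two years    */
}
```

Drawing the exponent uniformly makes each doubling of the period equally likely, which is what keeps the activity pattern unpredictable at every timescale.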
— Incomplete content presence
Some content units will be marked as ‘unavailable’, for different, randomly selected periods of time. Until this time is over, the unit will be unavailable for any kind of access – it will not be reported as present at the node, will not be transmitted on request (unless via indirect routing from another node), will not be deleted due to overabundance, etc. Since the network is expected to be big enough, this will not noticeably diminish content availability for download. At the same time, attempts to clean a content unit from the network will be hampered: even if successful (possibly using types of attacks that are not anticipated today), sleeping copies will continue to appear and be redistributed. The frequency and duration of the unavailability can be varied to achieve the desired characteristics of this retention process.
— Incomplete adding
On request, sometimes not only the content cache, but all or a part of the machine storage may be checked for content matching the description. Sometimes this content will be sent. Sometimes it will be included into the cache, without being sent. Sometimes it will be ignored.
— Incomplete request processing
Some requests (for data or search) will be processed. Some (randomly) will not be.
— Incomplete request retransmission
Some requests will be transmitted further. Some will not be. Some will be transmitted only to some of the neighboring nodes. Similarly to the content units, some neighboring nodes will be marked as ‘unavailable’, and no exchange with them will be done for a random period of time, even if they try to connect to this node.
As with all viruses, measures should be taken to hide the virus from discovery. Among these:
* it should not allow connections from programs running on the same host (at least most of the time), or let any other filesharing software benefit from it
* the cache contents should be as hard to find and decrypt as possible (they are a marker for the virus’s presence)
– Content handling
In some P2P networks, e.g. BitTorrent, the basic content unit is a form of ‘package’ that may contain any number of files. In most, however, the basic content unit is a file. This creates some problems, as a piece of content may consist of more than one file, and some of the described network functions may require appending extra info (signatures, ratings etc.) that cannot be inserted into some types of content files.
On these networks it might be a good idea to use as a basic content unit not a single content file, but an archive that contains all the necessary files. Since the goal is not compression but packaging, the most appropriate types of archives are those that offer maximal recoverability. For example, if an archive is not received completely, it must be trivial to extract from it all files that were received completely.
The content unit package must be able to contain:
* the content itself (as many files as needed)
* content description block (content name; descriptors – keywords etc; unique ID – most probably a long checksum; etc)
* error / info loss correction codes (to allow recovery in case of transmission errors or lost parts)
* ratings of the content that are signed with the rating node signature
* other types of files, according to the need
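The “long checksum” unique ID could be any strong hash over the packaged bytes. As a stand-in sketch, 64-bit FNV-1a is shown here; a real network would prefer a cryptographic hash, so that IDs cannot be forged:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative package ID: FNV-1a (64-bit) over the packaged bytes.
 * FNV is fast and well-distributed but NOT collision-resistant against
 * an adversary; a cryptographic hash would replace it in practice. */
uint64_t package_id(const uint8_t *data, size_t len) {
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;            /* FNV-1a 64-bit prime */
    }
    return h;
}
```

Hashing the whole package (content plus description block) ties the ID to the exact bytes, so a tampered package no longer matches its advertised ID.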
Most of the modules require a lot of work, but are more or less straightforward. The Infector and Attacker modules require a good deal of knowledge about writing viruses.
The work of the network will be subject to many attacks with huge financial and administrative resources behind them. The design of the virus, if properly implemented, makes it relatively hard to identify and destroy a crucial number of nodes. This makes destroying the network harder than attacking the content in it. So most attacks (and the most dangerous ones) will be directed at the content.
This makes Rating handling the truly non-trivial module, as it is expected to deal with these attacks. Probably an ‘arms race’ will follow between the attackers and the defenders. As the attackers will probably have a lot of resources behind them, the defenders may have to be productive and inventive in order to keep a favorable balance.
Currently, the following types of attacks are considered:
– Eliminating content from the networks
In this attack, one or more of the following methods will be used:
— Identifying and eliminating the nodes that carry it. (Countered with a big number of nodes across many different jurisdictions, with indirect routing, and with incomplete disclosure.)
— Creating a fake impression (through a big number of fake nodes) that a content unit is too abundant and may be freely deleted. (Countered by the rule that on each node some randomly selected content units will never be deleted.)
— Creating a fake rating (through a big number of fake nodes) that a content unit is a dummy and should be deleted. (Countered by decreasing the rating weight of external data and increasing the rating weight of internal analysis. If possible, by retaining or increasing the rating weight of external data whose source is very hard to falsify or destroy.)
— Scaring machine owners into cleaning the virus, with false or intimidating information. (Countered by the stealthy work of the virus, which makes plausible the explanation that the machine owner didn’t know about it, and makes the virus actually hard to detect.)
– Flooding the networks with false content
— Injecting into the network (through fake nodes) a big amount of dummy content that effectively drowns out the real one. (Extremely hard to counter. File type analysis can detect files that do not match the content type they declare, or that otherwise appear not to contain real content, e.g. movies shorter than a minute, or sound files that have the characteristics of noise. However, a better fake is impossible to detect automatically. Most rating-based systems can be circumvented, and in fact used to flush out the real content, by big numbers of fake nodes. Currently we are aware of no bulletproof defense against this attack.)
— Injecting into the network (through fake nodes) a big amount of partial or advertising content. (Extremely hard to counter, exactly like the previous kind of attack. In addition, this attack compromises the P2P network even worse than the previous one.)
One defense against the injection attacks may be analyzing the visible, non-hidden content presence and dynamics as a guide. If a content unit is visible on the machine, and stays on it for a long time even after the human users play it to the end, its rating may go up. Alternatively, if a content unit has been deleted from the machine soon after it became visible there, and especially if deleted immediately after being watched partially, its rating may go down. If both things happen, their indicativeness for the content might be considered higher than otherwise. (Other methods of rating the content may be used, too.)
Another defense might be the ability to add and check signed node ratings of the contents. If a node’s ratings tend to be similar to the local ratings, they might be considered a boost to a content’s rating. (E.g. the local rating of this node might go up.)
Using the rating opinions of other nodes to influence the decisions of a node may be dangerous, as a big number of fake nodes would be able to suppress the real content and promote the fake one. However, the usage of rating opinions may be beneficial if the difference in ratings for the same content is used to rate the other node’s credibility. The process can be fine-tuned so that nodes with similar opinions tend to ‘group’, and thus to separate real from dummy nodes. (More complex trust topologies can be achieved by this fine-tuning, too.) Other methods of rating the network nodes may be used as well.
If done properly, the node rating will, to an extent, isolate the real nodes and the fake nodes into separate groups. This will decrease the ability of the fake nodes to influence the rating of the content or to flood the network with fake or spam content, and may be used to limit their participation in the network.
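The credibility mechanism described above might be sketched as an exponential moving average of rating agreement, with the squared credibility used as the weight of that peer's future opinions. All constants and names here are illustrative assumptions:

```c
#include <math.h>

/* Each known peer carries a credibility score in [0, 1], starting
 * neutral at 0.5. Ratings are assumed normalized to [0, 1]. */
typedef struct {
    double credibility;
} peer;

/* Move credibility toward the observed agreement between the local
 * rating and the peer's rating of the same content unit. */
void credibility_update(peer *p, double local_rating, double peer_rating) {
    double agreement = 1.0 - fabs(local_rating - peer_rating);
    p->credibility = 0.9 * p->credibility + 0.1 * agreement; /* EMA */
}

/* Weight of this peer's opinion when aggregating content ratings;
 * squaring discounts low-credibility peers quickly. */
double opinion_weight(const peer *p) {
    return p->credibility * p->credibility;
}
```

Repeated agreement slowly raises a peer toward full weight, while a burst of fake votes from a fresh node starts at the neutral weight and sinks with every disagreement, which is the grouping effect the text describes.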
Not every filesharing network supports ratings. Most of them, however, support content rating, and some have, or may be enhanced with, node rating. The data from them can be used to manage the content and node ratings for the entire node.