Diversity in Computing

Larry T. Chen, larryc@ics.uci.edu

Introduction: Biological Mechanisms of Diversity

A fundamental reason why biological systems work so well is the property of diversity. In this paper we explore how the concepts of diversity in biological systems may be applied to the construction of computer systems, with resulting benefits such as adaptability and survivability, which are natural objectives in both biological and computer systems.

Diversity in biological systems occurs on many levels, from the diversity of lymphocytes in the immune system, to the diversity of genes in offspring, to the resulting diversity of individuals in a species, and finally to the diversity among different species.  Each mechanism of diversity at each level often works to achieve apparently selfish or altruistic goals, but often results in survivability at a higher emergent level.  Thus, diversity occurs at multiple layers in biological systems, often with diversity mechanisms in lower layers supporting those in higher layers.

Mechanisms

Many surprisingly imperfect yet extremely effective mechanisms for diversity exist in biological systems.  We will cite a few examples from the human immune system and heredity.

Randomization

The human immune system is fundamentally supported by a mechanism of diversity using randomization.  The immune system is responsible for detecting and eliminating foreign agents; it must therefore distinguish agents of "self" (such as the body's own proteins) from agents of "nonself" (such as harmful viruses) [3].  It does so by randomizing lymphocyte receptors: a large population of receptors with random variability is generated continually, so that some receptors will match and bind to potential foreign agents.  The elegance of this mechanism is that randomly generated receptors are able to detect previously unseen and unknown foreign agents.
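As a minimal sketch of how randomized detection might look in software, random detectors can be generated and screened against a known set of "self" strings, so that whatever a surviving detector matches is treated as "nonself."  The bit-string representation, the contiguous-match rule with threshold r, and all function names below are assumptions made for illustration; they are not taken from [3].

```python
import random


def matches(detector: str, sample: str, r: int) -> bool:
    """Return True if the two strings agree in at least r contiguous positions."""
    run = 0
    for d, s in zip(detector, sample):
        run = run + 1 if d == s else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, count, length, r, rng):
    """Generate random bit strings, discarding any that match a 'self' string.

    Assumes the self set is small enough that acceptable candidates exist.
    """
    detectors = []
    while len(detectors) < count:
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(matches(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors


def is_nonself(sample, detectors, r):
    """A sample is flagged as nonself if any detector matches it."""
    return any(matches(d, sample, r) for d in detectors)
```

By construction no detector matches a "self" string, so "self" is never flagged; whether a given foreign string is flagged depends on chance, which mirrors the probabilistic nature of the biological mechanism.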

Renewal

Although lymphocyte receptors are randomly generated, there are not enough of them present at any one time to cover the space of all possible foreign agents.  Thus, the immune system uses a process of renewal, continually generating random lymphocyte receptors and eliminating unresponsive ones, to increase the probability of detecting a wider variety of foreign agents.  This process of renewal (or lymphocyte turnover) also increases the probability of detecting a foreign agent the longer it is present in the body, because it will encounter a greater variety of lymphocyte detectors [3].
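The effect of renewal on detection probability can be illustrated with a simple model.  Assume (our simplification, not a claim from [3]) that each freshly generated detector independently matches a given foreign agent with probability p, and that n new detectors are produced in each turnover period.  The probability that the agent is detected within t periods is then 1 - (1 - p)^(n*t), which grows toward 1 as t increases.  The function name and parameters below are hypothetical.

```python
def detection_probability(p: float, n: int, t: int) -> float:
    """Probability that at least one of n*t independent detectors matches.

    p: assumed per-detector match probability
    n: assumed number of new detectors per turnover period
    t: number of turnover periods the agent has been present
    """
    return 1.0 - (1.0 - p) ** (n * t)
```

Under these assumptions, even a very small p yields a high detection probability once the agent has been exposed to enough turnover periods.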

Recombination

The hereditary process in biological reproduction relies on recombination of the parents' genes to produce unique offspring [4].  The set of genes (chromosome) active in an offspring is a combination of the father's and mother's sets of genes.  In addition, any particular gene is subject to a diversifying process by the mechanism of cross-over, where portions of the paternal gene are swapped with portions of the maternal gene to create a new unique gene.
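Cross-over translates readily to strings.  The sketch below shows a single-point cross-over, which is only one possible variant and is our assumption; the representation of a gene as a string and the function name are likewise illustrative.

```python
import random


def crossover(paternal: str, maternal: str, rng: random.Random) -> str:
    """Single-point cross-over: paternal prefix joined to maternal suffix.

    Both genes must have the same length (at least 2) so that the cut
    point leaves a non-empty portion from each parent.
    """
    if len(paternal) != len(maternal) or len(paternal) < 2:
        raise ValueError("genes must have equal length of at least 2")
    point = rng.randrange(1, len(paternal))
    return paternal[:point] + maternal[point:]
```

Each call with a different random cut point may yield a different child, so two parents can give rise to several distinct offspring genes.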

None of the above mechanisms is perfect.  For example, randomization of lymphocyte receptors to match foreign agents is a probabilistic process.  However, the emergent result of a massive collaboration of apparently imperfect individual mechanisms turns out to be a rather robust system. See Reproduction for a detailed discussion on generating diversity through reproduction.

Natural Selection

Diversity in biological systems is coupled with natural selection to adapt populations towards more favorable configurations.  Natural selection evolves the population of diverse individuals towards favorability by selecting individuals whose behaviors are beneficial to the population as a whole.  See Natural Selection for a detailed discussion on this topic.
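The coupling of diversity with selection can be sketched as a simple loop: rank a diverse population, keep the more favorable half, and refill the population with slightly varied copies of the survivors.  Everything specific here is an assumption for illustration, including the bit-string individuals, the fitness measure (the number of 1 bits), and the function names.

```python
import random


def fitness(individual: str) -> int:
    """Hypothetical measure of favorability: the number of 1 bits."""
    return individual.count("1")


def mutate(individual: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in individual
    )


def next_generation(population, rate, rng):
    """Keep the fitter half unchanged; refill with mutated copies of survivors."""
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[: len(ranked) // 2]
    offspring = [
        mutate(rng.choice(survivors), rate, rng)
        for _ in range(len(population) - len(survivors))
    ]
    return survivors + offspring
```

Because the survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next, while mutation keeps supplying new variation for selection to act on.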

Diversity in Computing

What is the basic unit for diversity in computing?

There are many levels of abstraction in computer systems where diversity is possible.  To develop the computational-biological metaphor further, bits and bytes may be thought of as the proteins comprising genes, while sequences of bytes (strings) may be thought of as genes.  An object or agent, then, is the result of a combination of strings of bits and bytes, just as a biological individual is the result of a combination of genes.  At the most fundamental level, variation in biological systems arises from variations in the genes and variations in the combination of genes.

Hierarchical organization

A plausible approach is not to limit ourselves to selecting one of bits, bytes, objects, programs, computers, or networks as the fundamental unit for diversity, but to view each as the unit of diversity for a particular level of abstraction, where diversity at any particular level is an emergent behavior resulting from the diversity at the previous (smaller granularity) level.  This model approaches the reality of biological models, where diversity on many different levels works in conjunction to achieve the ultimate objective, the survivability of life.  In computer systems we may incorporate diversity at different levels to achieve an end goal, such as the survivability of a computer system or the suitability of a computer system for some function.

Mechanisms for diversity

We will describe possible mechanisms for implementing diversity at various levels of abstraction in computer systems.

Bits, Bytes, Strings

Sequences of bits in conventional computer systems carry very exact functional meaning, and the slight alteration of a sequence of bits may cause drastic errors in the computer system or render it inoperable. This is true in computer communication, where bits across a network medium conform to specific protocols mutually understood by the communicating parties; in data representation, where bits stored on disk conform to specific file formats for data; and in program representation, where bits stored on disk or in memory represent executable code.  In conventional computer systems, there is not much tolerance for any alteration of bits in message, data, or code representation.

We propose that such tolerance for variation in bits in message, data, or code representation may be essential for building robust computer systems.

Objects, agents

Objects and agents are essentially fragments of code (usually components of a program or software system) which have semantic meaning and functional purpose.  There are two levels of diversity possible here.  One is merely bit-level diversity; that is, altering the bit-pattern signatures of objects or agents but retaining the exact functional and behavioral characteristics.  The second is behavioral diversity, where a population of similar code fragments possesses minor variations in behavior.

The most promising mechanisms for generating bit-level diversity in code fragments such as objects and agents are code obfuscation techniques [1][2].  These techniques "scramble" the program code but retain exact functional behavior. In [1], techniques such as variable recomposition, structure dissolving, and re-modularization provide ways to "mess up" code but retain exact behavior.  In [2], techniques such as function composition and homomorphic functions may be applied to "rewrite" code but retain the same functionality.
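The notion of bit-level diversity with identical behavior can be illustrated with two hand-written variants of the same function.  This is only a toy illustration and not an implementation of the techniques in [1] or [2]; the checksum function, the restructuring chosen, and the use of a hash over CPython bytecode as a "bit-pattern signature" are all our assumptions.

```python
import hashlib


def checksum_plain(data: bytes) -> int:
    """Sum of all bytes, modulo 256."""
    total = 0
    for b in data:
        total = (total + b) % 256
    return total


def checksum_variant(data: bytes) -> int:
    """Same result as checksum_plain, computed with a different code structure.

    The single accumulator is split into two, and the reduction modulo 256
    is applied once at the end with a bit mask.
    """
    even, odd = 0, 0
    for i, b in enumerate(data):
        if i % 2:
            odd += b
        else:
            even += b
    return (even + odd) & 0xFF


def signature(func) -> str:
    """Hash of the function's compiled bytecode (CPython-specific attribute)."""
    return hashlib.sha256(func.__code__.co_code).hexdigest()
```

The two functions return the same value for every input, yet their bytecode signatures differ, so a pattern tuned to one variant would not recognize the other.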

In a population of objects or agents which are deemed to perform the same function, minor variations in behavior among individuals may increase the robustness of the population as a whole.

In addition, the object/agent may use varying messaging protocols to communicate with other objects/agents. (See next section.)

Messages

Ensuring interoperability of objects/agents while trying to promote diversity in the messages between them may seem to be a contradiction, but there are several possibilities. One possibility is to utilize self-describing messages, where message format and semantics are entirely self-described within the message itself, thus enabling transmission of critical data elements in varying orders and formats. For example, each sent message may contain random placement of parameters, or a random encoding of parameters.  (Of course, a method for determining the placement and encoding of parameters must still be encoded in the message or mutually understood by the communicating parties.)  Such a mechanism has been common in the area of cryptography, where random encryption keys and nonce values are used for each message transmission.
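A minimal sketch of such a self-describing message follows.  Each parameter travels with its own name and a tag naming the encoding applied to it, so the sender is free to randomize both the order of parameters and the encoding of each one.  The wire format (a JSON list of tagged fields), the two encodings, and all names are assumptions for illustration; the only mutually understood convention is the set of encoding tags.

```python
import base64
import json
import random

# Encodings both parties are assumed to understand, keyed by tag.
ENCODERS = {
    "hex": lambda raw: raw.hex(),
    "b64": lambda raw: base64.b64encode(raw).decode("ascii"),
}
DECODERS = {
    "hex": bytes.fromhex,
    "b64": base64.b64decode,
}


def encode_message(params: dict, rng: random.Random) -> str:
    """Serialize string parameters in random order with a random encoding each."""
    items = list(params.items())
    rng.shuffle(items)
    fields = []
    for name, value in items:
        tag = rng.choice(sorted(ENCODERS))
        data = ENCODERS[tag](value.encode("utf-8"))
        fields.append({"name": name, "enc": tag, "data": data})
    return json.dumps(fields)


def decode_message(wire: str) -> dict:
    """Recover the parameters using only the tags carried in the message."""
    return {
        field["name"]: DECODERS[field["enc"]](field["data"]).decode("utf-8")
        for field in json.loads(wire)
    }
```

Two transmissions of the same parameters will usually differ on the wire, yet the receiver recovers the same values each time because it relies on the descriptions inside the message rather than on fixed positions.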

Programs, software systems

Computers, hosts

At the computer or host level, diversity in operating systems is likely the most useful mechanism.  Since many security attacks rely on flaws in particular operating systems (UNIX buffer overflow, Windows NT kernel bugs...), building computer systems based on a diverse set of operating systems may prove beneficial in preventing one particular flaw from compromising all hosts in the system.

The need for diversity in operating systems also argues for operating systems that are dynamically self-repairing and self-adaptive. [***]  Possibilities include dynamic OS kernels that can be modified at runtime, dynamic OS services that are activated or migrated based on need, and operating systems that automatically detect and isolate faulty services in order to repair or replace them, possibly with evolutionary changes.

Networks

Currently, the Internet is composed of many interconnected networks, held together by critical infrastructure components such as backbones and routers. Although redundancy was incorporated into the design of the Internet, diversity is definitely lacking. The Internet uses standard, well-known protocols with very little variation; thus attackers with a detailed understanding of Internet protocols may disrupt the Internet fairly easily. [*]  In addition, such attacks may be used repeatedly on different networks, since many rely on the same architecture and protocols.

Diversity in networks may involve topological diversity, functional diversity, and even behavioral diversity.

Self-organizing, self-repairing networks definitely increase topological diversity among networks.  Mechanisms have been proposed to isolate faulty hosts, or hosts under denial-of-service attack, by altering the routing tables [3]. Such dynamic network reconfiguration provides diversity in network topology, which brings many benefits in network security, such as keeping attacks on particular network components short-lived, or deterring the attacker altogether through frequent variation in network configuration and topology.
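Isolation by altering routing tables can be sketched as recomputing next hops while skipping the isolated hosts.  The breadth-first computation, the adjacency-set representation of the network, and the function name are our assumptions; this is not the specific mechanism proposed in the cited work.

```python
from collections import deque


def routing_table(links, source, isolated=frozenset()):
    """Map each reachable destination to the next hop from `source`.

    links:    dict mapping each host to the set of its direct neighbors
    isolated: hosts to be routed around; they never appear in the table
    """
    table = {}
    visited = {source}
    queue = deque()
    # Direct neighbors are their own next hop.
    for neighbor in sorted(links[source]):
        if neighbor not in isolated:
            visited.add(neighbor)
            table[neighbor] = neighbor
            queue.append(neighbor)
    # Farther hosts inherit the next hop of the node they were reached through.
    while queue:
        node = queue.popleft()
        for neighbor in sorted(links[node]):
            if neighbor not in visited and neighbor not in isolated:
                visited.add(neighbor)
                table[neighbor] = table[node]
                queue.append(neighbor)
    return table
```

For a four-host network in which A connects to B and C, and both B and C connect to D, host A normally reaches D through B; once B is isolated, the recomputed table routes D through C and contains no entry for B.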

Ad-hoc wireless networks also provide an opportunity to explore topological variation, since they are by definition malleable in network topology.  New mechanisms for self-repair and reconfiguration may be discovered in an ad-hoc wireless context.

Conclusion

The most challenging issue in designing diversity for computing is the apparent contradiction between the call for diversity and the call for interoperability among all components of the computer system, whether it's a society of agents or a society of networks in the Internet.  Despite this apparent difficulty, many mechanisms already exist (and more are waiting to be discovered) for implementing diversity in computer systems.

[1] Fritz Hohl.  "An approach to solve the problem of malicious hosts."
[2] Tomas Sander and Christian F. Tschudin.  "On Software Protection Via Function Hiding."  2nd International Workshop on Information Hiding.
[3] Anil Somayaji, Steven Hofmeyr, Stephanie Forrest.  "Principles of a Computer Immune System."
[4] Richard Dawkins.  The Selfish Gene.

September 16, 1998