Diversity in Computing
Larry T. Chen, larryc@ics.uci.edu
Introduction: Biological Mechanisms of Diversity
Diversity is a fundamental reason
why biological systems work so well. In
this paper we explore how the concepts of diversity in biological systems may be
applied to the construction of computer systems to obtain benefits such as
adaptability and survivability, which are natural objectives in both biological
and computer systems.
Diversity in biological systems occurs on many levels, from the diversity of
lymphocytes in the immune system, to the diversity of genes in offspring, to the
resulting diversity of individuals in a species, and finally to the diversity
among different species. The mechanisms of diversity at each level often
work to achieve apparently selfish or altruistic goals, but often result in
survivability at a higher emergent level. Thus, we can see that
diversity occurs at multiple layers in biological systems, often with diversity
mechanisms in lower layers supporting diversity mechanisms in higher layers.
Mechanisms
Many surprisingly imperfect yet extremely effective
mechanisms for diversity exist in biological systems. We will cite a few
examples from the human immune system and heredity.
Randomization
The human immune system is fundamentally supported by a
mechanism of diversity using randomization. The immune system is
responsible for detecting and eliminating foreign agents, so it must
distinguish agents of "self" (such as the body's own proteins) from agents of
"nonself" (such as harmful viruses) [3]. The mechanism by which this
occurs is the randomization of lymphocyte receptors, in which a large population
of such receptors with random variability is generated continually so that
some receptors will match and bind to potential foreign agents. The
elegance of such a mechanism is that the randomly generated receptors are able
to detect previously unseen and unknown foreign agents.
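The idea can be sketched in a few lines of code. This is only a minimal illustration, assuming that agents and receptors are modeled as fixed-length bit strings and that a receptor "binds" when it agrees with an agent in at least a threshold number of positions. The function names, the 16-bit length, and the threshold are hypothetical choices made for the example and are not taken from [3]. The sketch also omits the self/nonself distinction.

```python
import random

LENGTH = 16      # bits per receptor/agent (assumed for illustration)
THRESHOLD = 12   # agreeing positions needed to bind (assumed)

def random_receptor(rng):
    """Generate one receptor as a tuple of random bits."""
    return tuple(rng.randint(0, 1) for _ in range(LENGTH))

def binds(receptor, agent):
    """A receptor binds an agent if enough bit positions agree."""
    matches = sum(r == a for r, a in zip(receptor, agent))
    return matches >= THRESHOLD

def detects(population, agent):
    """True if any receptor in the population binds the agent."""
    return any(binds(receptor, agent) for receptor in population)

rng = random.Random(0)
population = [random_receptor(rng) for _ in range(500)]
unseen_agent = random_receptor(rng)  # not used when building the population
print(detects(population, unseen_agent))
```

Nothing about the agent is known when the receptors are generated. Whether it is detected depends only on the size of the population and on how permissive the binding rule is.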
Renewal
Although lymphocyte receptors are randomly generated, there are
not enough lymphocyte receptors present at any one time to cover the space of
all possible foreign agents. Thus, the immune system uses a process of renewal to
continually generate random lymphocyte receptors and eliminate unresponsive ones
to increase the probability of detecting a wider variety of foreign
agents. This process of renewal (or lymphocyte turnover) also increases
the probability of detecting a foreign agent the longer it is present in the
body because it will encounter a greater variety of lymphocyte detectors [3].
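As a rough worked estimate of this effect, make three assumptions purely for illustration (these symbols do not appear in [3]). Each randomly generated receptor matches a given foreign agent independently with probability p. There are n receptors present at any one time. The whole population is replaced once per turnover period. The probability that the agent has been detected after t periods is then:

```latex
P_{\mathrm{detect}}(t) = 1 - (1 - p)^{n t}
```

With p = 10^{-4} and n = 1000, a single population detects the agent with a probability of about 0.095. After 30 turnover periods the probability rises to about 0.95.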
Recombination
The hereditary process in biological reproduction relies
on recombination of the parents' genes to produce unique offspring [4].
The set of genes (chromosome) active in an offspring is a combination of the
father's and mother's sets of genes. In addition, any particular gene is
subject to a diversifying process by the mechanism of cross-over, where
portions of the paternal gene are swapped with portions of the maternal gene to
create a new unique gene.
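A minimal sketch of these two steps follows. It assumes that a gene is modeled as a string of symbols and a chromosome as a list of genes. The single cut point, the equal-length requirement (at least two symbols per gene), and the function names are simplifying assumptions made for the example, not a description of the biological process in [4].

```python
import random

def crossover(paternal, maternal, rng):
    """Single-point cross-over: swap the tails of two equal-length genes."""
    if len(paternal) != len(maternal):
        raise ValueError("genes must have equal length")
    point = rng.randrange(1, len(paternal))  # cut strictly inside the gene
    first = paternal[:point] + maternal[point:]
    second = maternal[:point] + paternal[point:]
    return first, second

def recombine(father, mother, rng):
    """Build an offspring chromosome, one crossed-over gene per position."""
    offspring = []
    for paternal, maternal in zip(father, mother):
        candidates = crossover(paternal, maternal, rng)
        offspring.append(rng.choice(candidates))
    return offspring

rng = random.Random(1)
father = ["AAAAAAAA", "CCCCCCCC"]
mother = ["GGGGGGGG", "TTTTTTTT"]
print(recombine(father, mother, rng))
```

Each offspring gene is a prefix from one parent joined to a suffix from the other, so repeated calls with different random states yield different offspring from the same two parents.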
None of the above mechanisms is perfect. For example, randomization
of lymphocyte receptors to match foreign agents is a probabilistic process
that offers no guarantee of a match.
However, the emergent result of a massive collaboration of apparently imperfect
individual mechanisms turns out to be a rather robust system. See Reproduction
for a detailed discussion on generating diversity through reproduction.
Natural Selection
Diversity in biological systems is coupled with
natural selection to adapt populations towards more favorable
configurations. Natural selection evolves the population of diverse
individuals towards favorability by selecting individuals whose behaviors are
beneficial to the population as a whole. See Natural
Selection for a detailed discussion on this topic.
Diversity in Computing
What is the basic unit for diversity in computing?
There are many levels
of abstraction in computer systems where diversity is possible. Consider
the following:
- Bits, Bytes, Strings
- Objects, agents
- Messages
- Programs, software systems
- Computers, hosts
- Networks
To further develop the computational-biological metaphor,
bits and bytes may be thought of as the proteins comprising genes, while
sequences of bytes (strings) may be thought of as genes. An object or
agent, then, is a result of the combination of strings of bits and bytes, just as
biological individuals are a result of a combination of genes. At the most
fundamental level, variation in biological systems arises from the variations in
the genes, and the variations in the combination of genes.
Hierarchical organization
A plausible approach is to not limit ourselves
to selecting one of bits, bytes, objects, programs, computers, or networks as
the fundamental unit for diversity, but to view each as the unit of diversity
for a particular level of abstraction, where diversity at any particular level
is an emergent behavior resulting from the diversity at the previous (smaller
granularity) level. This model approximates the reality of biological
systems, where diversity at many different levels works in conjunction to
achieve the ultimate objective, the survivability of life. In
computer systems we may incorporate diversity at different levels to achieve an
end goal, such as the survivability of a computer system or the suitability of
a computer system for some function.
Mechanisms for diversity
We will describe possible mechanisms for
implementing diversity at various levels of abstraction in computer systems.
Bits, Bytes, Strings
Sequences of bits in conventional computer systems
carry very exact functional meaning, and the slight alteration of a sequence of
bits may cause drastic errors in the computer system or render it inoperable.
This is true in computer communication, where bits across a network medium
conform to specific protocols mutually understood by the communicating parties,
as well as data representation, where bits stored on disk conform to specific
file formats for data, and finally program representation, where bits stored on
disk or in memory represent executable code. In conventional computer
systems, there is not much tolerance for any alteration of bits in message,
data, or code representation.
We propose that such tolerance for variation in bits in message, data, or
code representation may be essential for building robust computer systems.
Objects, agents
Objects and agents are essentially fragments of code
(usually components of a program or software system) which have semantic meaning
and functional purpose. There are two levels of diversity possible
here. One is merely bit-level diversity; that is, altering the bit-pattern
signatures of objects or agents but retaining the exact functional and
behavioral characteristics. The second is behavioral diversity,
where a population of similar code fragments possesses minor variations in
behavior.
The most promising mechanisms for generating bit-level diversity in code
fragments such as objects and agents are code obfuscation techniques
[1][2]. These techniques "scramble" the program code but retain exact
functional behavior. In [1], techniques such as variable recomposition,
structure dissolving, and re-modularization provide ways to "mess up" code but
retain exact behavior. In [2], techniques such as function composition and
homomorphic functions may be applied to "rewrite" code but retain the same
functionality.
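As a toy illustration of the function-composition idea, the sketch below hides a simple function f(x) = a*x + b (mod P) behind a randomly chosen invertible map g, so that each generated variant carries different coefficients yet yields the same result once g is undone. This is a deliberately simplified sketch and not the construction given in [2]. The modulus, the restriction to affine functions, and the names `make_variant`, `evaluate`, and `unmask` are all assumptions made for the example.

```python
import random

P = 2_147_483_647  # prime modulus 2**31 - 1 (assumed for illustration)

def make_variant(a, b, rng):
    """Hide f(x) = a*x + b (mod P) behind a random map g(y) = c*y + d.

    Returns the coefficients of h(x) = g(f(x)), which is all a host would
    see, together with an `unmask` function that applies the inverse of g.
    """
    c = rng.randrange(1, P)   # nonzero, hence invertible modulo the prime P
    d = rng.randrange(P)
    hidden = ((c * a) % P, (c * b + d) % P)
    c_inv = pow(c, -1, P)     # modular inverse (requires Python 3.8+)

    def unmask(z):
        return (c_inv * (z - d)) % P

    return hidden, unmask

def evaluate(coeffs, x):
    """Evaluate an affine function given as (slope, offset) modulo P."""
    slope, offset = coeffs
    return (slope * x + offset) % P

rng = random.Random(7)
variants = [make_variant(3, 5, rng) for _ in range(3)]
for hidden, unmask in variants:
    print(hidden, unmask(evaluate(hidden, 10)))
```

Every variant recovers the same value, 3*10 + 5 = 35, even though the coefficients a host sees differ from one variant to the next.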
In a population of objects or agents which are deemed to perform the same
function, minor variations in behavior among individuals may increase the
robustness of the population as a whole.
In addition, the object/agent may use varying messaging protocols to
communicate with other objects/agents. (See next section.)
Messages
Ensuring interoperability of objects/agents while trying to
promote diversity in the messages between them may seem to be a contradiction,
but there are several possibilities. One possibility is to utilize
self-describing messages, where message format and semantics are entirely
self-described within the message itself, thus enabling transmission of critical
data elements in varying orders and formats. For example, each sent message may
contain random placement of parameters, or a random encoding of
parameters. (Of course, a method for determining the placement and
encoding of parameters must still be encoded in the message or mutually
understood by the communicating parties.) A similar mechanism has long been
common in the area of cryptography, where random encryption keys and nonce
values are used for each message transmission.
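A minimal sketch of such a self-describing message follows, assuming a JSON envelope in which every field carries its own name and the name of the encoding applied to its value. The envelope layout (`name`, `enc`, `value`), the two encodings, and the function names are hypothetical choices made for the example. As noted above, the envelope itself must still be mutually understood by both parties.

```python
import base64
import json
import random

ENCODERS = {
    "hex": lambda raw: raw.hex(),
    "b64": lambda raw: base64.b64encode(raw).decode("ascii"),
}
DECODERS = {
    "hex": bytes.fromhex,
    "b64": base64.b64decode,
}

def encode_message(params, rng):
    """Serialize params with a random field order and per-field encoding."""
    names = list(params)
    rng.shuffle(names)  # random placement of parameters
    fields = []
    for name in names:
        encoding = rng.choice(sorted(ENCODERS))  # random encoding per field
        raw = str(params[name]).encode("utf-8")
        fields.append(
            {"name": name, "enc": encoding, "value": ENCODERS[encoding](raw)}
        )
    return json.dumps(fields)

def decode_message(wire):
    """Recover the parameters using only the description in the message."""
    return {
        field["name"]: DECODERS[field["enc"]](field["value"]).decode("utf-8")
        for field in json.loads(wire)
    }

rng = random.Random(42)
params = {"op": "transfer", "amount": "100", "to": "agent-7"}
first = encode_message(params, rng)
second = encode_message(params, rng)
print(first)
print(second)
print(decode_message(first) == decode_message(second) == params)
```

Two transmissions of the same parameters may differ on the wire in both field order and encoding, while the receiver recovers identical content from each.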
Programs, software systems
Computers, hosts
At the computer or host level, diversity in operating
systems is likely the most useful mechanism. Since many security
attacks rely on flaws in particular operating systems (UNIX buffer overflows,
Windows NT kernel bugs, ...), building computer systems based on a diverse set of
operating systems may prove beneficial in preventing one particular flaw from
compromising all hosts in the system.
The need for diversity in operating systems also argues for
operating systems that are dynamically self-repairing and self-adaptive.
[***] Possibilities include dynamic OS kernels that can be
modified at runtime, dynamic OS services that are activated or migrated based
on need, and OSs that automatically detect and isolate faulty services in order
to repair or replace them, possibly with evolutionary changes.
Networks
Currently, the Internet is composed of many interconnected
networks, held together by critical infrastructure components such as backbones
and routers. Although redundancy was incorporated into the design of the
Internet, diversity is definitely lacking. The Internet uses standard well-known
protocols with very little variation, so attackers with a detailed understanding
of Internet protocols may disrupt the Internet fairly easily. [*] In
addition, such attacks may be reused against different networks, since many
rely on the same architecture and protocols.
Diversity in networks may involve topological diversity, functional
diversity, and even behavioral diversity.
Self-organizing, self-repairing networks definitely increase topological
diversity among networks. Mechanisms have been proposed that alter the
routing tables to isolate faulty hosts or hosts under denial-of-service
attack [3]. Such dynamic network reconfiguration provides
diversity in network topology, which provides many benefits in network security,
such as keeping attacks on particular network components short-lived, or
deterring the attacker altogether due to frequent variation in network
configuration and topology.
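The effect of isolating a host can be sketched with a small routing example. This is a minimal illustration, assuming that the network is modeled as an adjacency list and that "altering the routing tables" amounts to recomputing shortest paths while skipping isolated hosts. The four-host topology and the function name `shortest_path` are hypothetical and are not taken from [3].

```python
from collections import deque

def shortest_path(links, source, target, isolated=frozenset()):
    """Breadth-first shortest path that never routes through isolated hosts."""
    if source in isolated or target in isolated:
        return None
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor chain back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in links.get(node, ()):
            if neighbor not in previous and neighbor not in isolated:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # target unreachable without the isolated hosts

links = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
print(shortest_path(links, "A", "D"))                  # ['A', 'B', 'D']
print(shortest_path(links, "A", "D", isolated={"B"}))  # ['A', 'C', 'D']
```

Once host B is isolated, traffic from A to D is rerouted through C. If both B and C were isolated, no route would remain and the function would return None.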
Ad-hoc wireless networks also provide an opportunity to explore topological
variation, since they are by definition malleable in network topology. New
mechanisms for self-repair and reconfiguration may be discovered in an ad-hoc
wireless context.
Conclusion
The most challenging issue in designing diversity for
computing is the apparent contradiction between the call for diversity and the
call for interoperability among all components of the computer system, whether
it is a society of agents or a society of networks in the Internet. Despite
this apparent difficulty, many mechanisms already exist (and more are waiting to
be discovered) for implementing diversity in computer systems.
[1] Fritz Hohl. "An approach to solve the problem of malicious hosts."
[2] Tomas Sander and Christian F. Tschudin. "On Software Protection Via Function Hiding." 2nd International Workshop on Information Hiding.
[3] Anil Somayaji, Steven Hofmeyr, and Stephanie Forrest. "Principles of a Computer Immune System."
[4] Richard Dawkins. The Selfish Gene.
September 16, 1998