Recent Accomplishments


1. Protocols and Middleware for Supporting Real Time Resource Management

2. Protocols and Infrastructure for Supporting Fault Tolerance

3. Methodology and Protocols for Supporting Security

4. Protocols for Supporting Scalable Real-Time Multicasts

5. Protocols for Supporting Mobile Wireless Envinronemnts

6. Bio-Networking Architectures

7. Middleware for Supporting Application Programs with End-to-end QoS

8. Tools for Supporting Network Simulation/Emulation


1. Protocols and Middleware for Supporting Real Time Resource Management

Real-time CORBA support (Schmidt): Developers of mission critical distributed real-time applications have historically used low-level, non-standard network programming interfaces, application-specific protocols, and customized real-time scheduling mechanisms. This legacy approach mixes real-time network programming throughout application programs, which makes their development non-portable, tedious, and error-prone, and requires highly skilled developers. To address these problems, we have developed and optimized the TAO Object Request Broker (ORB), which is an open-source, standards-compliant implementation of the Real-time CORBA specification that is being used on hundreds of research projects and commercial products world-wide. TAO is designed to meet end-to-end application QoS requirements by vertically integrating distributed object computing middleware with OS I/O subsystems, network protocols, and network interfaces, as shown in the following figure.

 

TAO was the first real-time ORB endsystem to support end-to-end QoS guarantees over high-speed networks (such as ATM), embedded system interconnects (such as VME and Fibrechannel), and QoS-enabled network protocols (such as IntServ, DiffServ, and SCTP).  

The following is a synopsis of the key accomplishments stemming from the Real-time CORBA portion of the PERC project:

  1. An ORB Core that supports deterministic real-time scheduling and dispatching strategies.  The TAO ORB Core concurrency models minimize context switching, synchronization, dynamic memory allocation, and data movement.

  2. An active demultiplexing strategy that associates client requests with target objects in constant time, regardless of the number of objects and operations.

  3. A highly-optimized CORBA GIOP protocol engine and an IDL compiler that generates compiled and/or interpreted stubs and skeletons, which enables applications to make fine-grained time/space tradeoffs. 

Exploitation of LRD for resource and traffic control (Hou): Central to exploitation of the abundant correlation structure for traffic control and measurement is (i) prediction of the future traffic at multiple time scales based on recent measurements and (ii) use of the prediction results in the decision making of traffic control/measurement. We have carried out the research along the following two major directions: on the one hand, we investigate both fractional-model-based and non-fractional-model-based predictors and determine which predictor(s) render the best performance in predicting Internet traffic. The fractional brownian motion model, the fractal ARIMA model, and the multifractal wavelet model are representatives of the former, and the linear minimum mean square error predictor falls in the latter.  This study is conducted with performance measures of accuracy, ease of deployment, computational complexity, and adaptability, and is based on analytical reasoning, ns-2 simulation, and empirical experiments with the large amount of traffic measurement and traces available at the Cooperative Association for Internet Data Analysis (CAIDA). On the other hand, we have investigated how we can harness traffic prediction (obtained with appropriate choices of traffic predictors) in traffic control/measurement in the following three research thrust areas:

(1)   Leveraging traffic predictability in active queue management: We show that the correlation structure present in long-range dependent traffic can be detected on-line and used to predict the future traffic.  We then devise a novel scheme, called TCP with traffic prediction (TCP-TP), that exploits the prediction result to infer, in the context of AIMD steady-state dynamics, the optimal operational point at which a TCP connection should operate.  Through analytical reasoning, we show that the impact of prediction errors on fairness is minimal.  We also conduct ns-2 simulation and FreeBSD 4.1-based implementation studies to validate the design and to demonstrate the performance improvement in terms of packet loss ratio and throughput attained by connections.

(2)   Exploring use of traffic predictability to improve TCP congestion control: We figure in, with the objective of stabling the instantaneous queue length, the prediction results in the calculation of the packet dropping probability in active queue management (AQM). The resulting scheme is termed as predictive AQM (PAQM). Through analytical reasoning, we show that PAQM is a generalized version of the well-known AQM scheme, random early detection (RED) that takes the future arrival rate as a new dimension of congestion index. By stabilizing the queue at a desirable level with consideration of future traffic, PAQM enables the link capacity to be fully utilized, while not incurring excessive packet loss ratio. Through ns-2 simulation, we compare PAQM against existing AQM schemes with respect to different performance criteria, and show that under most cases PAQM outperforms SRED in stabilizing the instantaneous queue length, and adaptive virtual queue (AVQ) in reducing packet loss ratio and better utilizing the link capacity.}  

  back to top


 2. Protocols and Infrastructure for Supporting Fault Tolerance

Interception-Based Fault Tolerant CORBA Infrastructure (Melliar-Smith and Moser): We have designed and implemented the Eternal system that provides fault tolerance for CORBA applications using the interception approach.  The Eternal system intercepts the IIOP messages that would have been sent by the CORBA ORB and, instead, multicasts them to the replicas of an object using a reliable totally-ordered multicast protocol.  Eternal filters duplicate operations, and controls the dispatching of operations to the multiple threads of an object.  It logs messages and checkpoints, and transfers the state of a replica to a new or recovering replica, and logs and replays messages. 

Pluggable Fault Tolerant CORBA Infrastructure (Melliar-Smith and Moser): We have designed and implemented a Pluggable Fault Tolerant CORBA Infrastructure that provides fault tolerance for CORBA applications by utilizing the Pluggable Protocols Framework (PPF) that is available for most CORBA ORBs.  The approach does not require modification to the CORBA ORB, and requires only minimal modification to the application. Moreover, it avoids the difficulty of retrieving and assigning the ORB state, by incorporating the fault tolerance mechanisms into the ORB.

The Pluggable Protocols Framework separates the messaging and network protocols from other parts of the ORB core and from the application objects.  The PPF allows the network protocol to be replaced, and also allows the messaging protocol to be replaced. Thus, the PPF makes it possible to develop CORBA applications for, and to deploy them in, environments for which the standard IIOP protocol is not appropriate.  The PPF also provides CORBA applications with improved quality of service by enabling the use of customized protocols that are tailored to those applications and their environments.

In the Pluggable Fault Tolerant CORBA Infrastructure, we use the PPF to replace the network protocol with a reliable totally-ordered multicast protocol. The infrastructure achieves performance that is better than that of existing Fault Tolerant CORBA systems, while providing strong replica consistency and supporting a wider range of applications.

Totem Redundant Ring Protocol (Melliar-Smith and Moser): We have developed the Totem Redundant Ring Protocol (Totem RRP) that supports three different styles of network replication: Active, Passive and a hybrid Active-Passive network replication style. 

In active replication, all messages and tokens are sent over all networks at the same time. All data are received multiple times. The bandwidth consumption increases linearly with the number N of networks. The maximum throughput equals the throughput of the slowest network. The system is able to mask the loss of a message on up to N-1 networks without any message retransmission delay.  In passive replication, messages are sent alternately over one of the available networks. The bandwidth consumption equals the bandwidth consumption of an unreplicated system. In the fault-free case, the maximum throughput equals the sum of the throughputs of all networks. If one of the networks fails, the maximum throughput is reduced. If a message is lost, Totem must wait until the message has been retransmitted. Active-passive replication is a mixture of active replication and passive replication. Every message or token is sent over K networks simultaneously, 1<K<N. The bandwidth consumption increases K-fold. The system is able to mask the loss of a message on up to K-1 networks without any message retransmission delay.  

Performance measurements show that a system using the Totem RRP with Active replication experiences a decrease in effective throughput, as is to be expected.  In contrast, in a system that uses the RRP with Passive replication, the throughput exceeds the throughput of the unreplicated system while being more resilient to network faults.

Latency of Replication and Multicast Protocols (Melliar-Smith and Moser): We have performed experimental measurements of the Pluggable Fault Tolerant CORBA Infrastructure and a reliable totally-ordered multicast group communication protocol.  We have measured probability density functions for the latency from a request transmitted by a client, through service by a (possibly replicated) server, to reply delivered to the client.  These measurements show that, while the logical token ring protocol does achieve excellent throughput, it has an adverse effect on latency.  The 120 microsecond latency measured for an unreplicated server is increased to 600 to 900 microsecond for a three-way replicated server.  With an alternative, rotating sequencer protocol, the latency is still about 400 microseconds for a three-way replicated server.  This increased latency is not significant in enterprise applications operating over a wide-area network, but is undesirable in real-time embedded applications.

Based on insights derived from these measurements, we have developed a new replication strategy using semi-active replication.  Messages are processed immediately at the primary replica and that message order is communicated to the backup replicas so that they can mimic the behavior of the primary replicas.  Considerable care is required to ensure that a faulty processor coupled with the loss of messages cannot leave the system in an inconsistent state.  Preliminary measurements show a latency of 170 microseconds, which we hope to reduce to about 150 microseconds.

Hub Implemented Protocols (Melliar-Smith and Moser): We are currently investigating novel implementation strategies for fault tolerant protocols.  There have been recent efforts to move the protocol stack from the host computer to the network interface card.  We are investigating moving parts of the protocol stack one step further to the Ethernet switch.  It is still necessary to operate a simple protocol between the switch and the host computer, currently TCP, but many of the difficult issues are more easily handled at the switch, including multicasting, message ordering, retransmission and flow control.  The primary challenge is whether it is possible to achieve adequate performance from the Ethernet switch while it performs these additional functions.  Another challenge is replication of the Ethernet switch to avoid a single point of failure, while still ensuring consistency of the state of the two switches.

back to top


3. Methodology and Protocols for Supporting Security

Protocol Security Specification (Levitt): Traditionally, automatic detection of intrusions is done following one of two methods. First, intrusions can be recognized by looking for features that match a pattern associated with malicious behavior. One collects, in advance, the signatures of suspicious events and monitors for them. While this can detect attacks with fairly low false alarms, this has the disadvantage of not being able to handle previously unseen types of attacks. In the second method, a statistical profile of normal behavior is accumulated over time. Then, alerts are generated for every unusual event not consistent with the profile. While this method can potentially detect new and previously unseen attacks, it triggers indiscriminately on any new behavior, malicious or not, resulting in unacceptably large false-alarm rates.

A new, third approach pioneered by UC Davis (Calvin Ko, now at NAI labs, Karl Levitt and Manfred Ruschitzka) is a specification based approach to intrusion detection. Using this method, the correct, allowed behavior of a system is specified. Events not matching the specification generate an alert. New attacks will be detected, and unusual, but perfectly correct behavior won’t generate spurious false alarms. We are applying the specification-based approach to network protocols to monitor all aspects of the system’s security. To this end, UC Davis has been developing specifications for both low-level network protocols as well as a specification for application level protocols. Both of these efforts will be described below.

(a) A Specification for the ARP Protocol: At the application level, computers communicating over the Internet use ip addressing. Physically, however, computers communicate using devices with hardwired hardware addresses. The ARP protocol allows one to map application level ip-addresses into the physical address that the hardware needs to send the electronic signal. Applications using IP address information as a basis for trust, then, can be fooled into communicating with an untrusted machine if this mapping mechanism can be corrupted.

A machine attempting to connect to another will broadcast an ARP address request to all other machines on its network. The message is of the form,

who-has <destination ip address> Tell <source hardware address>.

The remote host with the requested destination ip address will reply with his hardware address to the specific machine making the query. The reply looks like,

<destination ip address> is-at <destination hardware address>.

Once a machine’s query has been answered it will typically keep a copy of the mapping in its local ARP cache, alleviating the need to continually query for each message. Entries in the cache will timeout after a specified time interval after which a remote machine will need to be requeried for further communication. In this manner, a network of computers automatically handles the physical communications of applications using ip addressing, even when machines leave or join the domain.

Various methods are currently used to protect the ARP mapping protocol. Static tables of hardware to ip address maps are kept on a central authoritative server that handles all address requests. When protocol mappings with contradictory mappings are detected on the network, and alarm is sent. This, however, defeats the automatic support for dynamic address mapping that ARP provides.

We have developed a specification for the ARP protocol itself. With this single specification, all known methods of ARP cache poisoning attacks, spanning multiple versions and operating systems, are detectable as incorrect and, therefore disallowed ARP protocol transactions. We specify correct ARP protocol behavior using a state transition diagram, shown in Figure 1. From the initial state, a ARP request on the network asking for an ip address mapping, moves the system to the “Reply_wait” state for that particular address. Additional address requests coming while in this state have no effect since it is perfectly valid for machines to resend the request if they weren’t answered in a timely fashion. After the response message is seen, the system moves to the “Cached” state. Finally, after the timeout period expires, the system moves back to the initial state.

Figure 1: A state transition diagram specification for the ARP protocol.

We have also implemented a prototype live intrusion detection system using this specification to monitor the network. The system is implemented as a pre-processor to the snort open source IDS. All ARP traffic on a local network is used to move the system through the series of states, keeping track of valid IP address to MAC address mappings.

Due to the relatively small volume of ARP traffic in general, our system is easily able to handle the ARP traffic generated by the ~50 machines in our monitored domain.

To test whether this single specification is sufficient to handle known attacks, we have identified four different types of ARP cache poisoning.

  1. An unsolicited response: Some OS’s blindly accept ARP replies whether or not they have sent a previous request. Forging a fake response message poisons that machines ARP cache.

  2. Malformed request: Oftentimes, a third party machine will cache the information contained in a third party broadcast request even though it isn’t involved in the protocol transaction. Sending requests with fake source information poisons these machines cache.

  3. Bogus response. Attacker waits to see a request on the network and exploits the race condition vulnerability by replying first with his faked mapping.

  4. Both fake request and bogus response. Attacker fakes both the request and response to poison those machines implementing a solution to the unsolicited malformed request and unsolicited response problem.

Using our single specification for correct ARP behavior, our IDS system is able to identify all of these variants of the ARP cache poisoning attack. There are, however, considerable false reports due to DHCP automatic allocation of IP addresses. To remove these false alarms, we are currently developing a specification for DHCP itself. This specification would not only monitor for misuse of the DHCP protocol, but DHCP specific transitions will feed into the ARP protocol specification, indicating that a particular ARP message is perfectly valid in the DHCP context.

(b) Specifying the Network Behavior of HTTP:  As an example of a complex application level protocol, we have also worked on specifications for the HTTP protocol. One of the resources used by system administrators for preventative security maintenance is the vulnerability database.  With access to one of these databases (e.g. www.securityfocus.com) it is possible to identify by vendor and version number vulnerabilities for applications and operating systems that they currently maintain.  Ideally, for any relevant vulnerability that they find the problem can be addressed (with patches or by disabling the service) or at least they will know what attacks to be alert for.  Potential attackers also have access to these databases.  Identifying a target accurately, therefore, gives the attacker a great advantage.  It is common for an attacker to use various probing tools to learn about a system's operating system, version level and configuration so that they can identify targets for exploitation.  One tool for doing this, NMAP, uses vagaries of a system's TCP/IP stack to deduce the identity of a operating system.

One of the most common applications working on the Internet at this time is the web server.  Web servers use HTTP (Hypertext Transfer Protocol) to receive and respond to a wide variety of requests.  Obviously they are used for fetching HTML as the name suggests, but other uses for HTTP are popping up all of the time (distributed authoring (DAV), shared disk space (www.driveway.com), etc).  Because of its ubiquity, it is naturally a target of attacker activity.  If the attacker can determine the type of server and its version number then they can efficiently zero in on possible vulnerabilities.  Currently this information is typically gained by trusting the servers volunteered self description (e.g. this can be done with a netcat one-liner).

echo -e ``GET / HTTP/1.0\n\n'' | nc www.someserver.com 80 | grep Server

Generally this is good enough since at this time most servers reply accurately with this information.  Add-on packages and configuration information is often advertised in this field as well.

We believe that this situation will change.  Since the purpose of protocols is to allow transactions between systems without having a priori information about the vendor or version of the interlocutor on the other end of the communication, it shouldn't be necessary to volunteer this application-specific information.  In fact, because of this, the HTTP/1.1 RFC recommends that giving out server vendor and version number should be a configurable feature. Some sites have already started hiding this information (e.g. www.yahoo.com, www.securityfocus.com).  This trend is bound to increase since it really only gives an advantage to those trying to remotely acquire information about a system.  Simply removing this line from the headers, however, will not provide the desired anonymity for the server.  Our HTTP specification monitoring shows that it is possible to identify with a high degree of confidence the HTTP server vendor and version level by examining only the details of HTTP based communication.  

Using Properties of Network Topology to Detect Malicious Routing Behaivor (Levitt): When considering the secure performance of complex networks, much effort has been placed on developing authentication and cryptographic protections that are essential for such operation. But these approaches are not sufficient, especially in the case of compromised routers. We believe that along with these, it is important to examine the routing processes at the core of these networks for their inherent properties in controlling and dispersing information. A goal of this study is to identify characteristics in the routing that might render some routers more in control of the network, or conversely, more susceptible to degraded performance due to congestion.

In our research, we present a methodology to abstract an intrinsic feature of computer network topology, i.e., the centrality of any one node and the centrality of parts of the network as a snapshot of the dynamic behavior.  Centrality may be defined as capturing the structurally central part of a network. In this project we propose to study the role of centrality analysis in abstracting characteristics of network behavior within the context of routing processes, and especially within the context of malicious routing behavior.

Specific research areas within this component of PERC include:

a)      Analysis of Centrality within Computer Networks: Centrality has been an important structural attribute of social networks. Centrality may be defined as capturing the structurally central part of a network. We believe that these notions in social network analysis resonate strongly when considering certain aspects of wired and wireless networking. Our research suggests that centrality is an important attribute of complex networks, especially in monitoring for intrusions, since it may be used to describe the potential that a node has (or set of nodes have) for control of the communication process. In applying the ideas to the computer networking domain, however, we are addressing a highly dynamic environment. Also, different routing protocols apply different mechanisms for distributing information; they employ differing aspects of the network state information.  Our research, therefore, is developing these notions within the core of an incremental approach, one that constantly refines the relational behavior in the network.  In doing so, we will seek to integrate the centrality computation with the local protocol behavior. The limitations of the protocol to capture state information will then dictate the nature of the centrality algorithms that we will develop - the heuristics we will use, or the approximations we will enforce. 

b)      Centrality Analysis within Specific Routing Protocols: Given known vulnerabilities, we believe that centrality analysis will enable us to identify the changes in the control of flow of routing in the network.  In our current research, we are

  1. developing algorithms for computing centrality in light of the computational and scalability issues and examining  strategies that make the computation more tractable,

  2. integrating these algorithms within existing protocols such as OSPF (in intra-domain routing protocols) and DSR and  AODV (in ad hoc mobile routing protocols),

  3. employing the results of this in detecting malicious routing within these protocols, and

  4. conducting preliminary simulations using ns-2 with the link state protocol. This will be extended to also include experiments with ad hoc routing protocols such as DSR and AODV.

JIGSAW – A Formal Attack Description Language (Levitt): In order to support detection of unknown attack scenarios and build a framework for reasoning about attacks, we are constructing an abstract causal domain model of computer attack scenarios. Such a model allows us to reason about attacks theoretically and can be used for many purposes including attack detection, vulnerability analysis, attack planning, and new methods of event detection. While such a model is not free from ad hoc construction, it provides a level of formalism not previously seen in intrusion detection.

The following example illustrates a simple multistage attack scenario, and highlights the basis for our domain model. Two computers, kafka and sartre, have a trust relation in that kafka may execute commands on sartre using rsh. An attacker on host spock causes a synflood[1] denial-of-service attack on port TCP/11111 on kafka. This effectively prevents kafka from responding to packets sent to this port. Spock then sends syn packets to the rsh port on sartre and monitors what starting sequence number sartre should use in sending the ack packet. The attacker now uses this information and sends a syn packet to the rsh port on sartre, with the source address and port in the packet forged to be kafka, TCP/11111. Sartre sends the syn/ack packet to kafka, but due to the denial-of-service, kafka is not responding (if it responded it would realize it did not start the connection and reset). The attacker now sends an ack packet, again claiming to be from kafka, and using the sequence number he guessed sartre would have sent to kafka. Assuming the sequence number matches, the packet is accepted. Because the data portion of the packet specifies commands to be executed, this attack provides the capability to execute arbitrary code on the target computer.

This example provides a framework for illustrating the primary relations of our domain model. First, each of the various events has preconditions which must be met for the event to occur. For the forged packet send to succeed, both the correct sequence number must be guessed and kafka must be prevented from responding to the forged packets. In our example, for the denial-of-service we used synflood, but any attack that provided this capability will suffice; others include packet-storms or having someone unplug the network cable to kafka. Rather than exploit rsh, a similar attack on NIS password authentication would work. Alternately, rather than using connection spoofing to provide the capability of executing arbitrary code, we could have exploited a vulnerability such as a stack-overflow.

 

The particular system configuration must also be considered. For rsh spoofing to occur, sartre must be running rsh and an rsh trust relation exist between kafka and sartre. For a Sun rpc vulnerability to be exploited sartre would have to be running Solaris and rpcd. For a packet-storm to be useful, of the mentioned hosts, only kafka can be effected.

In this way, lower-level event provides capabilities that support the preconditions of other events that in turn provide the capabilities required by the preconditions of other events. In this way events link together to form complex scenarios such as a worm that requires the ability to transfer and remotely execute code for its propagation.

From this we see that many combinations of events can singly or in combination provide the preconditions for a specific attack. In some cases these are hierarchical in that the effects of one subsumes the effects of others, or lateral in that they have different but equivalent capabilities. Similarly, the capabilities provided by a successful attack can provide the requirements for many different attacks. In this way attack events form a lattice of relations where any subtree defines an attack scenario.

In addition to capabilities, an attack can provide knowledge about a target system. For example, a port scan will tell the attacker what services are active on the target. The result of the scan is to provide knowledge to the attacker, but it is not required for the attack to succeed. However, the particular domain knowledge must be true for the attack to succeed. Also, domain knowledge is not required for attack detection, but is useful in reasoning under uncertainty, in looking for probes or attack planning.

In our model we bind these pre and post conditions into “concept” objects that form the functional components of the domain. These describe security relevant events that can occur if the proper preconditions exist, and the effects the events have on the system. We are developing the language JIGSAW as a tool for specifying these concepts.

Using this language we have written specifications for a number of low-level attacks and several specific scenarios. Additionally, we have identified a number of concepts which link events without regard to a particular scenario. We identify these concepts by studying historical attacks, and goal-to-vulnerability analysis. While a complete domain theory is always desired it is not likely to ever be achieved. Our hope is that the model can be expanded sufficiently to be effectively complete, i.e. useful in all situations we will encounter. Even with a small, preliminary model we have discovered new combinations of events which lead to novel attacks.

While this model was designed primarily for reasoning about attack scenarios, it suggests an execution model for detecting attacks. By recasting the requires and effects sections as publish and subscribe, providing attack sensors for each known low-level vulnerability, and considering each concept as an independent agent, through message passing and local inferencing the population of agents form a detection engine.

This model can also be used for vulnerability analysis. When provided with the configuration for a specific system, scenarios, which exploit vulnerabilities in that configuration, can be entailed. Similarly, by replacing the low-level attack sensors with attack scripts, the model can be used for attack planning and generation.

A Protocol for Inter-Organization Cooperation in Detecting and Responding to Large-Scale Worm Attacks (Levitt): In this work we are developing a framework and testbed for coping with a large-scale worm attack.  As indicated in the introduction to this report, a system that copes with a worm attack will include (1) sensors, associated with hosts or network components, to detect that unexpected activity is occurring on a site observed by the sensor, (2) correlators to determine if the unexpected activity on disparate sites is related and indicative of a propagating attack, (3) response agents that can filter packets or other activity deemed to be associated with the worm.  In previous work, the Intruder Detection and Isolation Protocol (IDIP), each detection agent broadcasts its report to all response agents.

The IDIP work was limited, however, in that it includes an implicit trust relationship between partners. All partners were assumed to comply with all filtering requests of their neighbors and to supply the attack details necessary for that blocking action; this will not be the case for inter-domain cooperation. Each control node will have a collection of cooperating partners, not necessarily connected via network devices participating in the response mitigation protocol. The protocol controlling a cooperative mitigating response is complex and needs to take into account,

  1. The varying trust relationship between members,

  2. The cost of response, including false alarms,

  3. Mutations and polymorphism in the attack signatures.

As a preliminary model of this dynamic response system, we have implemented a simulation of a cooperative network of organizational nodes. This simulation is based upon the Swarm simulation package. Swarm simulates large collections of concurrently interacting agents, using an architecture in which one can implement a large variety of agent-based models. In our simulations, agents are cooperating peer-to-peer mitigating response network devices. In addition, worm agents are included that propagate and spread themselves to vulnerable host nodes that the mitigation agents can protect. We use this simulation to study the effects of different types of cooperative response strategies upon different models of worm spread.

Our preliminary world model consists of 729 mitigating response agents. Each response agent protects an organization with eight vulnerable hosts, for a total of 5,832 vulnerable hosts in the world model. The worm propagates pseudorandomly, and is parameterized by,

  1. Its initial density in the world,

  2. Likelihood of infection – how many vulnerable machines are encountered upon propagation,

  3. Its scanning method (random or permutative) and,

  4. Speed of infection – how many time steps between initial infection and propagation attempts.

The model of response is based upon mitigation enabled network boundary controllers, such as firewalls and filtering routers. In our preliminary simplified model, these mitigation agents cooperate by sharing reports of detected worms with a collection of cooperating “friend” agents. Upon receipt of a worm report from a friend, agents who may have not seen the attacks themselves have the opportunity to filter the malicious traffic themselves and additionally broadcast the worm message to their collection of friends in turn. If a specified time period has elapsed with no new receipt of attack reports, response agents will back off, removing the potentially costly filtering rule. The only two parameters controlling the mitigating response strategy in our simplified model, then, are

  1. Average number of cooperating friends and,

  2. Time steps till response back off.

Hierarchical Mitigation (Levitt): Peer-to-peer cooperative control produces a response to an Internet-scale attack that depends only upon a policy set locally. This is an advantage considering that unrelated and potentially untrusting organizations can exchange data, but with the disadvantage that if even only a few members decide that cooperative mitigating responses aren’t in their best interest, an Internet-scale attack may still continue to spread. An alternative to this peer-to-peer cooperation is a hierarchical model of control. Using our experience with the GrIDS system, we have a preliminary model of hierarchical mitigating response to an Internet scale attack.

In this model, each organization is a sub-node or child of parent node controller. The parent functions both as a report aggregator and a mitigating response policy disseminator for a collection of child organizations. When a child detects worm behavior, it forwards the alert to its parent. Parents, furthermore, are themselves children of other control nodes; the control architecture forms a true hierarchy. The parent itself then, has a policy that determines the conditions under which he forwards aggregated reports further up. For example, reports may only be forwarded if two or more children have reported similar activity. This reduces independent false alarm reports moving up the hierarchy of control nodes. This type of hierarchical aggregation can provide early detection of very sparse, stealthy worm activity in widely separated areas of the network. Each parent can also do attack correlation on multiple reports, potentially recognizing polymorphic worm spread and other malicious variants.

In addition to report aggregation and correlation, each parent contains a policy as to when a mitigating response is warranted. Once the decision to respond is made, the parent propagates the mitigation strategy to its children. For example, suppose that a node in the control hierarchy has 10 child devices that it controls. These devices are organizational firewalls, say. Now, in the early stages of an Internet scale worm attack, one of the children sees suspicious spreading activities consistent with a worm and reports it to the parent. The parent’s policy is not to request a mitigating response for single alarms, lest the ambient false alarm level in all children lead to unintentional self-inflicted denial of service. A very short time later, however, a second child reports similar activity. Since a second confirming report has been received by the parent node, its policy states that all children be requested to block the offending traffic for a specified time period. Reports from only two children, then, result in all 10 receiving the information needed to protect themselves from further attack. The parent node also propagates the alarms up to its own parent control node. A very wide spread worm, then, is blocked over a very wide number of sites in a relatively short time period. Such an effect is difficult to produce using strictly peer-to-peer models of control.

Deception (Levitt): Deception has been touted as an effective technique against a human attacker, in that (1) it can increase his workload, (2) keep him away from the objects he is after, and (3) not impact and, in fact be invisible to, normal users.  Cohen suggests using Shannon's theory of information to determine measures of uncertainty afforded the attacker with respect to his goals. In addition, Cohen suggests the use of attack trees to model the options open to the attacker and those of a normal user.

We believe deception can also be useful as a response against a suspected worm, both to (1) slow down the worm's rate of infection of target sites, and (2) to provide further confirmation (or denial) of the presence of a worm, as opposed to normal traffic that happens to resemble a worm or a diversionary attack.  A worm being automated will have less "intelligence" than a human attacker, rendering it more amenable to deception.

A key aspect of our deception approach is to provide "bait" machines to which the worm will be attracted.   Typically, worms infect only particular cultures, such as servers of a certain kind running on certain version of operating system.  Not knowing in advance what machines a worm will attack, it will be necessary to configure machines in real time to look to the worm like the machine it is seeking.  We suggest that this configuration be accomplished by providing wrappers around machines that will be designated as bait.  Key characteristics of the machines to which the worm is attracted will have to be identified in real time and transmitted to wrappers.  The "friends" protocol we are using to transmit worm alerts, blocking signatures, and backoff suggestions can also be used to transmit wrapper configuration information.

Once configured and attacked, the bait machines can keep the worm occupied so as to slow down its propagation rate, and possibly direct traffic only to other bait sites thus keeping the worm away from normal machines. Possibly, filters associated with normal servers can reject traffic from bait sites. A key aspect of the research is to determine how many bait machines are required to have a major impact on the worm.


[1] By sending a SYN packet to a port, but never responding to its SYN/ACK, the host waits for you to respond. By sending large numbers of these you fill up the waiting queue on the host. This prevents the host from responding to new connections. They are discarded unanswered.

back to top


4. Protocols for Supporting Scalable Real-Time Multicasts

Multicast Group Communication Engine and Bridge (Melliar-Smith and Moser): Multicast group communication protocols are needed for fault-tolerant distributed systems to maintain strong replica consistency.  However, different multicast protocols are appropriate for different applications and environments.  As part of this MURI project, we have defined a multicast group communication engine and bridge that allows use of multiple group communication protocols concurrently.  The group communication engine uses Lamport timestamps for message ordering and  heartbeat (null) messages for liveness.  Timestamps and heartbeat messages provide a convenient mechanism by which multiple group communication protocols can coexist without constraining the on-the-wire message formats and internal algorithms of the protocols

The group communication engine places timestamps on messages, and multicasts messages to groups, using one or more group communication protocols.  Each multicast group communication protocol reliably delivers its timestamped messages in timestamp order to the group communication engine.  The group communication engine integrates the streams of messages into a single stream of messages for delivery in timestamp order.  No forwarding or conversion from one multicast protocol to another is necessary, and the multicast protocols never communicate directly with one another or even know of each others' existence.

QoS-driven multicast routing (Hou): We have investigated the problem of QoS provisioning in shared multicast routing protocols such as Core-Based Tree, Simple Multicast (SM) and Protocol Independent Multicast Sparse Mode (PIM-SM), and devised a unified QoS extension framework for this type of multicast routing protocols.  Specifically,  the QoS requirement may be heterogeneous among receivers, and is specified as either (R1) an upper/lower bound of an additive, multiplicative, or concave QoS parameter (e.g., the end-to-end delay bound), or (R2) an upper bound of the inter-receiver difference of a QoS parameter (e.g., the inter-receiver delay jitter bound). We consider these QoS requirements and devise a set of enhancements in the member join/leave and state update/refresh procedures to facilitate the deployment of additive (e.g., end-to-end delay bound), multiplicative (e.g., packet loss ratio along a path) and concave (e.g., minimum bandwidth available) QoS. In particular, we (i) devise eligibility tests to verify whether or not a new member can join a multicast tree at adequate QoS, without violating the existing QoS to the other on-tree members; (ii) determine the set of states kept at each router in order to conduct eligibility tests; (iii) devise a state update/refresh procedure that is based on soft state and can be readily integrated with the tree maintenance mechanism that already exists in most core based multicast routing  protocols; and (iv) implement, based on the CBT daemon in FreeBSD (developed by British Telecom), and empirically evaluate  the performance of the proposed QoS framework. 

We have reported our evaluation results in the Inter-domain multicast routing working group of Internet Engineering Task Force.  The presentation was very well received.

Scalable, Real-time Video Multicast Protocols (Suda): In this research, We have proposed and investigated a novel protocol, called Source-Adaptive Multi-layered Multicast (SAMM), to control congestion caused by the real-time video multicast over the Internet [AVS01, VAS00]. SAMM relies on the exchange of feedback between end systems (a video source and multicast destinations) and rate adjustment at a video source. In SAMM, multi-layered video encoding is deployed, and raw video data is encoded into one or more layers of differing priority. Video sources adapt the number of video layers they generate as well as the transmission rate of each video layer in response to congestion feedback from the network and receivers. Using simulations that incorporate multi-layered video codecs, we have demonstrated that SAMM exhibits better scalability and responsiveness to congestion than existing protocols. Accomplishments in this work include design of the SAMM protocols, a simulator to evaluate the proposed protocols and simulation results confirming the performance of the proposed protocols.

back to top


5. Protocols for Supporting Mobile Wireless Envinronemnts

Mitigating misbehavior in ad hoc routing protocols (Baker): Gaining reasonable traffic throughput in ad hoc networks requires the nodes in the network to participate in packet forwarding on behalf of other nodes.  The presence of nodes that misbehave in this respect degrades network performance.  Misbehaving nodes can assume at least two forms: nodes that agree to forward packets on behalf of others but then fail to do so, or merely selfish nodes that attempt to preserve their own resources by not agreeing to forward packets.

 

We have developed two general types of routing enhancements to handle such misbehaviors.  The first set of techniques, called Watchdog and Pathrater, allows us to isolate and route around nodes that agree to participate but fail to do so.  The second technique, called OCEAN (Observation-based Cooperation Enforcement in Ad hoc Networks), attempts to limit both forms of misbehavior through punishing bad nodes by denying their traffic.

 

Our approach in both avoiding misbehaving nodes and in providing participation incentives is to avoid modifying existing routing protocols, which would be error-prone and complex.  Instead, we provide information to an existing routing protocol to allow it to make a better choice than the default route choice.

 

A second goal, especially with the incentive-based techniques, is that the protocol enhancements be lightweight.  This means that in forming opinions about the behavior of other nodes, we rely only on directly observed events, and not on reputation information as reported by other parties.  Such direct observation allows us to avoid the overhead of propagating reputation information throughout the network.  In addition, this means that unlike Blazevic’s work we do not need a tamper-resistant layer on each node, and unlike Buchegger’s work we do not need signing or other encryption.

 

We simulated Watchdog and Pathrater in the Dynamic Source Routing protocol (DSR) implementation in the ns-2 simulator. The Watchdog and Pathrater algorithms when used together in an ad hoc network running the Dynamic Source Routing protocol (DSR) improve packet throughput by 17% to 27% (depending upon the level of mobility) in the presence of even 40% misbehaving nodes while doubling the routing protocol overhead.

 

Our incentive-based techniques are still under development.  So far we have evaluated these techniques within a high-level Java simulation environment specifically designed for testing such protocols quickly.  This simulator does not give us accurate timing results, since it has no model for the underlying wireless network, but it allows us to measure overall throughput during the course of the simulation.  Within these simulations, we find that OCEAN improves the traffic flows of well-behaved nodes by up to 15% while dramatically lowering the throughput of misbehaving nodes (by up to 75%).

 

Improving ad hoc network performance through new packet scheduling algorithms (Baker): We have investigated improving the overall performance of ad hoc networks through new packet scheduling.  For this study, we examined the queuing dynamics of nodes in an ad hoc network across a wide mobility spectrum (from static to non-stop motion) in both on-demand (DSR) and pro-active protocols (Greedy Perimeter Stateless Routing, or GPSR [Karp]) as implemented in the ns-2 simulator.  For this packet scheduling work, we again do not modify the routing protocols themselves but instead modify only the processing of packet queues at a node.  We find that the common scheduling practice of giving priority to routing control packets over data packets is an advantage in on-demand routing protocols such as DSR, but that it can actually reduce performance in proactive routing protocols.  Most ad hoc network packet schedulers do not distinguish between different types of data packets, but we find that it is useful to do so.  In particular, giving priority to data packets with short distance metrics (fewest remaining hops in DSR and shortest remaining distance in GPSR) shows the smallest delay – about a 30% reduction over standard DSR and standard GPSR and the highest throughput without increasing routing overhead.

 

Analytical model for collision avoidance protocols (Garcia-Luna-Aceves): We have developed the first analytical model for collision avoidance protocols with omni-directional transmission mode, all-directional transmission mode, and combinations of both.  This work extends the analytical model first introduced by Takagi and Kleinrock for the analysis of CSMA in multi-hop networks with omni-directional transmissions.  Extensive simulations in GloMoSim of the popular 802.11 MAC protocol and variants of it have been used to validate the analytical model.  The results show that the overall performance of the sender-initiated collision avoidance scheme degrades rather rapidly when the number of competing nodes allowed within a region increases, in contrast to the case of fully-connected networks and networks with limited hidden terminals reported in the literature. We have also extended this model to networks with directional antennas. The collision-avoidance scheme based on directional handshakes and data transfer was shown to achieve much better spatial reuse (thus higher throughput) and less time in waiting (thus less delay) at the expense of higher collision rate.

 

We have started to analyze the interaction between broadcast and unicast traffic in multihop networks that use collision avoidance for channel access.

 

New channel access protocols capable of providing QoS in ad hoc networks (Garcia-Luna -Aceves): We designed and analyzed a new family of channel access protocols based on dynamic transmission scheduling. These protocols establish node-activation and link-activation schedules at each network node based on the identifiers of those nodes in the two-hop neighborhood of the node running the protocol.  The Hybrid Activation Multiple Access (HAMA) protocol was introduced, which is the first known channel access protocol capable of establishing dynamically node- and link-activation schedules for both broadcast and unicast traffic. HAMA is remarkably simple, requires only two-hop neighborhood information, and avoids the complexities of prior conflict-free scheduling approaches that demand global topology information. We have also specified and analyzed a neighbor protocol for coordinating two-hop neighbor information in networks with dynamic topologies. The performance of HAMA was compared by analyses or simulations with that of idealized CSMA and CSMA/CA, dynamic scheduling based on node activation, and UxDMA. The results of our analyses clearly show that HAMA is far more effective than CSMA, CSMA/CA and scheduling based on node activation, and that it renders comparable performance to that of UxDMA without requiring to maintain complete topology information at each node.

 

We also introduced the receiver-oriented multiple access (ROMA) channel access scheduling protocol for ad hoc networks with directional antennas. Unlike random access schemes, ROMA determines a number of links for activation in every time slot using only two-hop topology information without complex handshakes or signal scanning to resolve communication targets. Exploiting the multiple directional beam forming capability of directional antennas in both transmission and reception, we have shown that significant improvements on network throughput and delay can be achieved.

 

Hybrid routing for ad hoc networks (Garcia-Luna-Aceves): We developed  node-centric approaches to hybrid routing for ad hoc networks in which normal nodes are distinguished from special nodes, called netmarks, hosting popular network services or functioning as points of attachment to the Internet.  With node-centric hybrid routing, netmarks force other common nodes to maintain routing information for them by either advertising their routing information as in table-driven routing protocols, or by requiring nodes to maintain routing entries towards them for extended periods of time. This reduces the network-wide flooding and the corresponding delay for route set up every time a session needs to be established between a normal node and a netmark.  Routes between peer nodes are set up on-demand.  Two node-centric routing solutions were developed based on partial link state information. One solution, called NEST, maintains table-driven routing for netmarks and on-demand routing is supported for common nodes.  Simulation results using ns2 show that NEST performs much better than on-demand routing protocols based on distance vectors, path information or link-state information.

 

We developed a new routing metric, called the energy drain rate, which can be used as part of the route selection mechanism in routing protocols for ad hoc networks in order to preserve the remaining battery life of untethered nodes.

 

Bandwidth efficient multicast protocol and multipath transmission protocol for smooth handoff (Suda): We have investigated key protocol design issues in both infrastructure based wireless networks and ad hoc wireless networks.

  1. Bandwidth Efficient Multicast Protocol in Ad Hoc networks. In this research, we have proposed and investigated a bandwidth-efficient multicast routing protocol for ad-hoc networks [OKS01]. The proposed protocol achieves low communication overhead (i.e., it requires a small number of control packet transmissions for route setup and maintenance). The proposed protocol also achieves high multicast efficiency (i.e., it delivers multicast packets to receivers with a small number of transmissions). Simulation results confirmed key features of the proposed protocol. Accomplishments in this work include design of the proposed protocol, and a simulator to evaluate the proposed and other ad hoc network protocols.  

  2. Multipath Transmission Protocol for Smooth Wireless Handoff. The current mobile wireless protocols often fail to provide smooth handoffs due to packet rerouting and different amounts of available bandwidth in different wireless cells. In order to provide smooth handoffs for real-time video applications, the PI has proposed an end-to-end soft handoff protocol with multipath transmission. In the proposed protocol, the sender encodes video using a layered coding technique. Multiple paths are established from the sender to the mobile node (destination) when the destination is in a handoff area (i.e., an area that two wireless cells overlap). The sender transmits (duplicated) base layer video packets on all the paths to the destination to minimize the base layer video packet loss. Enhanced layer video packets are transmitted only through paths with additional bandwidth to increase the video throughput. The proposed protocol reduces the negative impact of the base layer video packet loss and increases the video throughput during handoffs. Accomplishments in this work include design of the proposed protocol, a simulation model for the proposed protocol, and the simulator that evaluates the proposed protocol and existing handoff protocols.

Protocols for Small-Device Networks in Homeland Security Scenarios (Suda): In order to provide efficient and uninterrupted communication services among victims, rescue team members, and law enforcement officers in disaster scenarios, it is important to restore connectivity among sub-networks that become segmented due to cable and communication equipment failures caused by disasters. It is also important to efficiently route information over these segmented sub-networks while taking into consideration the severely limited capabilities of the available communication devices (such as limited processing and transmission power capabilities of mobile phones that victims carry, or of sensors imbedded in building infrastructures).

 

In order to regain connectivity between segmented sub-networks, we consider two approaches: one approach involves deploying intelligent robot-type communication devices that autonomously move and place themselves in strategic locations, and the other approach involves deploying a massive number of simple communication devices (such as sensor type devices) scattered from, for instance, an aircraft flying over the disaster struck area. The PI is currently investigating protocols for communication between intelligent robot-type devices, as well as protocols for simple devices to relay information between segmented sub-networks. In considering such protocols, special attention is paid to the fact that simple devices may not be able to identify their locations in a precise manner and that some devices may be more critical than others in maintaining network connectivity. Accomplishments in this work include initial designs of protocols for simple devices to exchange coordinate information, and routing protocols for simple devices to relay information while considering the limitations of the devices.

 

back to top


6. Bio-Networking Architectures

Bio-Networking Architecture (Suda): In the Bio-Networking Architecture, complex network applications emerge through interactions among multiple objects (i.e., service components). Since objects may be developed independently by different designers, it is important to ensure that independently developed objects can communicate and interact to collectively provide an application. We are currently developing a set of specifications (called Flexible Interface Definition) that defines the interfaces and protocols of each object in a flexible manner so that objects implementing different interfaces and protocols can communicate and interact. We have shown that Flexible Interface Definition increases the number and variety of network applications that objects can create. Accomplishments in this work include design and preliminary implementation of Flexible Interface Definition.

 

We are also developing the Bio-Networking platform, a middleware that provides reusable software components for deploying and executing objects (i.e., service components) and communication protocols between objects. Platform components abstract low-level operating and networking details (e.g. message marshalling/unmarshalling, I/O, concurrency, and network connection management) and provide a series of runtime services that objects frequently use for performing their services and invoking their protocols. Accomplishments in this work include the design and preliminary implementation of the Bio-Networking platform, as well as preliminary measurements showing the efficiency and scalability of the Bio-Networking platform. The design of the Bio-Networking platform and its protocols has been proposed to OMG for possible standards for Super Distributed Objects specifications.

 

Peer-to-Peer Discovery Protocols (Suda): Peer-to-peer systems contain distributed objects (users, data, and services) with highly dynamic properties. For example, objects may dynamically enter or leave the peer-to-peer network, or attributes of the object may change. Discovery protocols in peer-to-peer systems define how to locate specific objects based on the content or attributes of the objects. We have developed and evaluated two distinct discovery protocols that perform discovery to locate objects with particular keywords. Both discovery protocols perform forwarding of queries along logical links (relationships) between the objects in a distributed manner.

 

In this first discovery protocol that we developed, discovery queries are forwarded based on keyword similarity and discovery history of relationships between objects. In the second discovery protocol, a user’s (i.e., discovery originator’s) evaluation of the received discovery hits is used to guide forwarding of queries. In both discovery protocols, relationships are weighted or selected in order to enhance the forwarding properties of the protocol. Accomplishments for this work include: designs for each discovery protocol; simulations demonstrating protocol efficiency, the quality of results returned, the robustness to dynamic environments and the scalability of these protocols.  

 

back to top


7. Middleware for Supporting Application Programs with End-to-end QoS

TAO pluggable protocols framework (Schmidt): One of the problems with conventional middleware solutions is that they hard-code the distributed object programming model together with the use of general-purpose network protocols (such as TCP/IP) that are inadequate to meet the QoS needs of distributed real-time applications.  To overcome this problem, another part of our PERC project has focused on enhancing the TAO pluggable protocols framework shown in the following figure so that developers can concentrate on designing their application-specific logic without having to worry about timing, fault tolerance, mobility, or security issues. 


 

 

 

 

 

 

 

 

 

 

 

 

 

 

The key TAO pluggable protocols framework components illustrated in the figure above are described briefly below:

  1. ORB Messaging Component: This component is responsible for implementing ORB messaging protocols, such as the standard CORBA General InterORB Protocol (GIOP) ORB messaging protocol, as well as custom Environmentally-Specific InterORB Protocols (ESIOPs).  An ORB messaging protocol defines a data representation, an ORB message format, an ORB transport protocol or transport adapter, and an object addressing format.  Within this framework, ORB protocol developers are free to implement optimized Inter-ORB protocols and enhanced transport adaptors, as long as they respect the standard CORBA interfaces.

  2. ORB Transport Adapter Component: This component maps a specific ORB messaging protocol, such as GIOP or DCE-CIOP, onto a specific instance of an underlying transport protocol, such as TCP, SCTP, or ATM.  The figure above shows an example in which TAO's transport adapter maps the GIOP messaging protocol onto TCP – this standard mapping is called IIOP.  In this case, the ORB transport adapter combined with TCP corresponds to the transport layer in the Internet reference model.  However, if ORBs are communication over an embedded interconnect, such as a VME bus, the bus driver and DMA controller provide the transport layer in the communication infrastructure.

  3. ORB Policy Control Component: We have defined QoS APIs that allow applications to specify their QoS requirements using industry-standard CORBA IDL interfaces. In particular, TAO's pluggable protocols framework provides an extensible policy control component that implements the QoS framework defined in the CORBA Messaging and Real-time CORBA specifications.  This component allows applications to control the QoS attributes of configured ORB transport protocols. Example policies relevant for pluggable protocols include buffer pre-allocations, fragmentation, bandwidth reservation, and maximum transport queue sizes. Policies in CORBA can be set at the ORB, thread, or object level. Thus, application developers can set global policies that take effect for any request issued in a particular ORB.  Moreover, these global settings can be overridden on a per-thread basis, a per-object basis, or even before a particular request.  CORBA's Policy framework provides very fine-grained control over the ORB behavior while providing simplicity for the common case. 

QoS-enabled network protocols supported by TAO’s pluggable protocols framework (Schmidt): We have implemented and/or integrated a wide range of network protocols into TAO’s pluggable protocols framework. These protocols can be classified into the following main categories:

  1. General-purpose network protocols, such as TCP/IP, SSL over TPC/IP, UDP/IP, UNIX domain sockets, and shared memory

  2. Embedded system interconnects, such as VME, Fibrechannel, and SCRAMNet.

  3. QoS-enabled network protocols, such as IntServ, DiffServ, and the Stream Control Transmission Protocol (SCTP).

Our two most recent and interesting accomplishments involve integrating DiffServ and SCTP support into TAO’s pluggable protocols framework. By adding DiffServ capabilities to the TAO ORB and then mapping Real-time CORBA priority values to Diffserv service classes, we have provided a rich middleware platform for developing and deploying QoS-enabled distributed real-time applications that can preserve task priorities via OS schedulers and network prioritization mechanisms end-to-end. Similarly, the Signal Control Transmission Protocol (SCTP) provides a highly configurable, connection-oriented, message and bytestream transport service. SCTP exposes a large set of parameters that can be configured via TAO’s QoS policy framework to customize the connection performance to specific application requirements. Properties that can be customized include message ordering semantics (e.g., ordered or unordered), reliability semantics (e.g., retransmit timeouts and max retransmit tries), connection multiplexing (e.g., number of streams), and network path multiplexing (e.g., network interface set). SCTP's customizable properties can be leveraged by distributed real-time CORBA middleware and applications to reduce the complexity of developing high performance, fault tolerant systems.

 

 

Dependable Communication Middleware (Hou): In order to make the multiple networking technologies transparent to the applications, we are currently developing a communication middleware that supports fault-tolerant and secure communication. In the middleware, abstractions, such as dependable pipe and secure pipe, respectively are developed. Dependable pipe will employ multiple communication mediums and secure pipe will use encryption.   Moreover, we have identified and developed a set of large-grained middleware services (building blocks) with precise specifications and well-defined interfaces, and will export a collection of modular and composable middleware services with a rich, well-defined API. The following figure illustrates the components in the suite of middleware services from the user prospective. Each layer exports a well-defined interface to other services or to applications that are built on top of it and supports standard protocols. Related services are grouped together to illustrate functional dependencies among them:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1. QoS negotiation and adaptation: provides negotiation services of parameterized QoS at the application level. Applications may specify and negotiate QoS in terms of quantitative measures, and qualitative terms.

 

2. QoS monitoring and notification: provides support and feedback from lower-layer protocols to adaptive higher-layer protocols and applications. As part of the dependability-related operations implemented in a protocol module, the protocol module will cause an up call into each other protocol module and application that has registered interest in such changes.

 

3. Fault detection and recovery management: is closely related to fault-tolerant communication services. Associated with each protocol module is an application-specific recovery procedure that may be invoked when a failure is detected through timeouts or lack of heartbeat messages.  Applications and higher-layer protocols may register specifying the fault/error latency and the level of fault tolerance desired.

 

4. Dependable communication service: exploits spatial redundancy (edge-disjoint backup paths) or temporal redundancy (pre-allocations of both primary and alternate transmission time intervals for a connection) to achieve fault tolerance without sacrificing the predictability of the system. An application may invoke the service, specifying its fault tolerance requirements in terms of packet loss, fault tolerance latency, and the number of replicated messages desired.

 

5. Predictable unicast/multicast services: provide a collection of network services with deterministic or statistical temporal QoS for sending messages to a single destination or a group of destinations. The set of services to be exported include unicast with end-to-end delay bound, unicast with inter-message delay jitter bound, (datagram or casual) multicast with delay bound, inter-destination delay jitter bound, or a combination thereof, and (datagram or casual) multicast with TCP-friendly congestion control.

 

6. System resource tracking and management: provides interfaces for tracking both active and passive resources involved in a communication activity, for determining their availability, and for supporting application-specified management/security policies.  Based on the system resource available and the dependability/QoS required, an application can then compose the protocol modules for admission control, resource reservation/adaptation, and/or traffic control in a plug-and-play fashion.  

 

back to top


8. Tools for Supporting Network Simulation/Emulation

Simulation of large distributed application-level protocols (Baker): In our work with ad hoc networks, as well as other work with peer-to-peer networks, we have found a need for high-level simulation tools, that is, tools that can simulate application-level protocols in large networks with large numbers of flows over long periods of time.  Packet-level simulation tools, such as ns-2 and the new JavaSim (work in this project), are designed to provide a level of detail that is not geared, for instance, for simulation of a month’s worth of application behavior.  We have thus designed and begun implementation of Narses [Giuli], a new simulator targeted towards large distributed applications.  The goal of Narses is to validate the behavior of large applications efficiently using network models of varying levels of detail.  For efficiency, our lowest-level construct is not a packet but an application “flow.” We introduce several assumptions that allow us to simulate many flows over many nodes over long periods of time.  One assumption is that the bandwidth bottlenecks will be the last-hop links and not links internal to the Internet.  These assumptions mean that Narses is not appropriate for all topologies, but it is appropriate for all of the topologies we have so far desired to simulate, including large Internet topologies with thousands of nodes and tens of thousands of flows.  While fine-grained accuracy of simulated runtimes is not the main goal of Narses, early results with our most detailed network model, as compared to ns-2, show up to a five times speed up in simulation time, with a 53% decrease in memory consumption while maintaining a reasonable degree of accuracy (within 8% on average).  

 

Design, implementation, and module enhancement of JavaSim (Hou): We have laid a component-based software architecture, called autonomous component architecture (ACA), that deploys a message-passing, independent execution model to more closely mimic hardware systems, in terms of how components are specified and assembled and how components interact with one another.  To fully specify, implement, and evaluate ACA, and to explore its application in building large-scale, truly extensible network simulation/emulation environments, we have carried out tasks along the following two thrusts of research:

  1. We have implemented a proof-of-concept version of ACA in Java. Through the lessons we learned from the implementation and experimentation, we have continued to refine/enrich the implementation.  For example, we have designed and implemented an execution context management and scheduling mechanism in the ACA runtime. We have also fine-tuned the performance of the ACA implementation to reduce the component overhead in terms of execution time and memory usage. 

  1. To explore how ACA facilitates the building of large-scale, extensible, and reusable software systems, we have built, on top of the ACA implementation, a compositional network simulation environment, JavaSim. Succinctly, we have devised a packet-based network modeling framework, called extensible internetworking framework (EIF), and implemented on top of the ACA and EIF implementation, an (almost) complete suite of essential network protocols in the Internet best-effort service, integrated service, and differentiated service architectures.   Table 1 gives the set of protocol classes currently supported in JavaSim.  Very recently we have also extended JavaSim to include components in mobile wireless environments, i.e., antenna propagation models, terrain models, IEEE 802.11 and power saving mode, ad hoc routing classes (DSR/AODV), and will make another release in September 2002.

Network Architecture

 

Application

Socket Layer

Transport

Routing

Traffic Model

Tagger Marker

Buffer Management

NI Scheduling

Best Effect Services

 

FTP, FSP
WWW

BSD 4.3

TCP-Reno
TCP-Tahoe
TCP-Vegas
TCP Sack
UDP

RIP (DV)
OSPFv2
Multicast shortest path tree
Multicast Steiner tree
DVMRP
MOSPF
CBT

 

 

Drop-Tail

FIFO

Differentiated Services

 

 

 

 

 

 

Token Bucket
TSW
ETSW

RED
FRED
SRED
BRED

FIFO

Integrated Services

 

 

 

RSVP

Unicast QoS routing
QoS-enhanced OSPFv2
QoS-enhanced CBT

Periodic message (CBR)
Peak rate model
Leaky bucket model
Token bucket model
IETF/Intserv Flowspec
(r,t)-smooth model
(C,D)-smooth model

 

 

RM
EDF
DCTS
VirtualClock
LFVC
PGPS
STFQ
WF2Q

FTP: file transfer protocol

FSP: file service protocol

BSD: Berkeley socket distribution

TCP: transmission control protocol

RIP: routing information protocol

OSPF: open shortest path first

DVMRP: distance vector multicast routing protocol         

MOSPF: multicast extension to OSPF

CBT: core based tree protocol

FIFO: first in first out

TSW: time sliding window

ETSW: enhanced time sliding window

RED: random early drop

FRED: fair random early drop

SRED: stable random early drop

BRED: balanced random early drop

RSVP: Resource reservation protocol

CBR: constant bit rate

RM: rate monotonic

EDF: earliest deadline first

DCTS: distance constrained task system

LFVC: leap forward virtual clock

SCFQ: self-clocked fair queueing

PGPS: packet-by-packet generalized processing sharing

STFQ: start time fair queueing

WF2Q: worst-case fair weighted fair queueing

Table 1: Protocol classes supported in JavaSim

We have conducted extensive stress tests, compared the performance of JavaSim against ns-2 and SSFNET, and provided a detailed qualitative and quantitative comparison report in http://www.javasim.org/comparison.html.  In spite of the overhead inherited from the component architecture, JavaSim demonstrates better scalability (both in terms of simulation completion time and experiment setup time in large simulation scenarios (e.g., in the case that the number of nodes >= 5000) on a dual-Pentium III-600 MHz, 256MB RAM PC.

 

back to top