Intelligence at the Edge #13


A system of monitoring stations reveals performance of unknown networks.

By David S. Isenberg

From America's Network, August 1, 1999

Monitoring the performance of the old, circuit-switched network was easy. The telco owned the network elements and the routing tables. It knew the route of every call. If there were network impairments, offending elements could be detected directly, taken out of service and fixed.

With the Internet, service providers don’t necessarily own the network. They don’t know its shape or size. They don’t control routing. They can’t reach into the middle to upgrade slow network elements — especially when they don’t own them.

But if services get flaky, customers complain. Better network monitoring gives service providers an advantage. The network may be stupid; nonetheless, knowledge of it is power.

How do you monitor a network that you don’t own, don’t control and don’t even know much about? Christian Huitema, chief scientist of the Telcordia (formerly Bellcore) Internet Architecture Research Laboratory, is leading the three-year, six-investigator, DARPA-funded Felix project into its final year of work on the problem.

Intelligence by inference

Felix monitor stations are attached to various points at the edge of the network. These stations, connected by Internet Protocol (IP), each with a unique IP address, perform their task via a higher-level protocol written by the Felix team.

When deployed, each monitor station generates trains of short packets to each of the others. Every packet’s precise transit time is measured. For each pair of stations, transit times are accumulated. These are used to discover the topology of the network.

It works like this: Suppose that four monitoring stations — A, B, C and D — are placed at various locations around the edge of the network. Suppose that packets one through five travel from A to B in 10 milliseconds, but that packet six takes 80 milliseconds. Now suppose that six other packets, launched at the same time, travel from Station A to Station C with a similar pattern of transit times. Now suppose that a third set of packets travels from Station A to Station D but shows a very different pattern.

From these data, the Felix team infers that packets traveling from A to B pass through the same sources of delay as those traveling from A to C. Thus, the routes AB and AC must share common elements. By the same logic, they conclude that the route from A to D, because it shows a very different pattern of transit times, shares few elements with AB or AC.
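The inference in this example can be sketched in a few lines of code. This is a minimal illustration with made-up delay numbers, not the Felix team's actual algorithm: routes whose delay spikes rise and fall together score a high correlation, suggesting shared network elements.

```python
# Hypothetical transit times (ms) for six packets on each route.
# Packet six hits a shared slow element on A->B and A->C,
# but the A->D route shows an unrelated pattern.
ab = [10, 10, 10, 10, 10, 80]
ac = [12, 12, 12, 12, 12, 82]
ad = [30, 15, 40, 22, 35, 18]

def correlation(x, y):
    """Pearson correlation of two equal-length delay series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Routes whose delay spikes coincide likely share elements.
print(correlation(ab, ac))  # near 1.0: AB and AC overlap
print(correlation(ab, ad))  # much lower: AD is mostly disjoint
```

In practice the project works with many stations and long packet trains, but the core idea is the same: coincident delay patterns imply shared path elements.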

(At this point I object, voicing my understanding that Internet routing is dynamic, that it can change packet by packet. Huitema agrees, but says that observational work by other investigators shows that in practice routing is usually stable for many minutes or even hours.)

To discover the shape of the network, the Felix team analyzes the correlations among the time series of many pairs of monitoring stations. Huitema compares this approach to a CT scan. In a CT scan of the human body, a series of X-ray "slices" is taken at regular intervals and stored digitally. Algorithms permit interpolation from slice to slice, so a new slice through the body — one that has never actually been photographed — can be constructed.

Huitema says, "The amount of inference that you can make is dependent on the richness and precision of the measurements you take." Rich measurements demand robust, scalable analytic techniques. With many stations and many routes for each station pair, the task rapidly grows complex. Furthermore, transit times are not normally distributed — Huitema calls them "heavy tailed" — so new mathematical tools must be developed. "It is a hard problem," he says.
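A small simulation can show why heavy-tailed delays demand tools beyond ordinary averages. This sketch uses made-up numbers, not Felix data: a mostly well-behaved delay sample plus a few extreme spikes drags the mean well above the typical delay, while a robust statistic such as the median stays put.

```python
import random

random.seed(1)

# Hypothetical delay sample: mostly ~10-11 ms, plus a few
# extreme spikes of the kind a heavy-tailed distribution produces.
delays = [10 + random.expovariate(1.0) for _ in range(1000)]
delays += [500, 800, 1200]  # rare large outliers

def median(xs):
    """Middle value of a sample; insensitive to extreme outliers."""
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

mean = sum(delays) / len(delays)
# The mean is dragged upward by the tail; the median stays near 10 ms.
print(mean, median(delays))
```

With a normal distribution the two numbers would nearly agree; the gap between them is one simple symptom of the heavy tail Huitema describes.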

New ways of network management

The official name of the Felix project, "Independent Monitoring for Network Survivability," illuminates a big reason for Department of Defense funding: In the event of attack or other widespread outage (Y2K?), Felix would be poised to reveal the topology and performance of the remaining network.

The project has other applications, too. For example, it could help identify optimal routes. It could assist network engineering efforts by identifying sources of congestion. It could help the provider of a distributed service decide which server should serve certain information to a given client. And it could be used to specify and enforce service level contracts.

As the Internet has profoundly altered the business models of network service, so it will irrevocably transform the practice of network management. The Felix project lays a foundation that doesn’t introduce complication or proprietary technology to current Internet platforms — it follows the principles that continue to make the Internet great.



Copyright 1999 Advanstar Communications.