A network management framework using mobile agents

In this paper, we present a framework for performing network management tasks on heterogeneous networks. The framework copes with the inability of many network devices to run mobile agents. To keep up with the fast evolution of networks, we take a modular approach that provides tools to quickly build network management agents for virtually any task. While the main focus of the project is the framework, we also present an example of mobile agents that locate a fixed set of network failures and accurately detect their possible causes.


INTRODUCTION
Mobile agents have yet to take over the world of networks with real, convincing and useful applications. Meanwhile, managing heterogeneous components in large networks automatically and efficiently remains hard. It is now clear that networks will not grow using simple and similar technologies, but will instead use complex protocols and many different devices from many different vendors.
Mobile agents can migrate between nodes and cope with different technologies. They also decentralize management and distribute load over the network, and they provide a flexible way to create quick solutions in fast-evolving environments, making it possible to automate network management tasks efficiently. Since our target is large-scale networks where many tasks are still executed by humans, this ability is crucial.
Many research groups have worked on network management solutions using mobile agents. The objective of the JAMES project [1] is to create an efficient mobile agent platform used mainly for network management. Compared to commercial mobile agent platforms, it provides many optimizations that are mandatory to achieve efficiency and high performance in the domain of network management. The Network Management and Artificial Intelligence Laboratory of Carleton University [2] is doing research on many applications and models to bring mobile agents to network management. Their aims are broad and they have studied many aspects of the subject, such as the need for a uniform management interface, the use of the Simple Network Management Protocol (SNMP) [3], fault location [4] and intelligent agents [4]. Recently, a novel architecture called ECOMOBILE [5] has been proposed that uses mobile agents to execute task objectives, although these agents are not themselves network tasks. They have a life, compete with each other, die, and exchange, take or leave task objectives at any time. One interesting feature of this architecture is its capability to regulate its population while still achieving network management tasks. On the concept of proximity, which we will introduce later, a research group [6] has studied efficient ways to place mobile agents on a network, combining both mobility and remote monitoring. Monitoring is one part of network management and may be used for fault management as well.
In Section 2, we present the aims of the project and the framework. In Section 3, we briefly present the mobile agents used to show that they can achieve real network management tasks in a network involving heterogeneous devices. In Section 4, we present the test network and preliminary results from these mobile agents. Finally, we conclude with a brief summary and future work.

Goals and Aims
As stated in the introduction, our goal was to create a network management framework using mobile agents. Our primary interest was to prove their utility in real networks and applications. This is, from our point of view, a first step toward validating the efforts made on mobile agents and network management. Because of this, the framework has to cope with network components that cannot receive and execute mobile agents. It was also important to show the ability of those agents to manage heterogeneous devices. One of our goals was to give agents as little information about their environment as possible, enabling them to manage an evolving network.
Another important aspect is performance. Mobile agents should not impose a much higher overhead on the network than their client-server counterparts. Performance is therefore an important goal, and it is achieved in part by not letting mobile agents carry the network topology or node information.

Framework Model
Our framework can easily integrate multiple mobile agents in existing networks by using an expert system and existing management facilities.
The model suggests using typically one or two agents per network management task. This allows us to limit inter-agent communication, which can be costly if the architecture relies on too many small but inefficient agents.
Network management is done using network tools already available, so no development has to be done in that field.
Mobile agents tend to use these tools more efficiently and in new ways. We present the important aspects of the framework in this section.

Framework Concepts
A place where network management is done is called a management station. In mobile agent terms, it could also be called an agency, place or region, and it must be able to run a mobile agent platform. This usually implies a station that can run a Java virtual machine. A component is what the mobile agent has to manage. A component may itself be a management station, and a network station may manage itself, in which case it is its own component. The framework was built with these concepts in mind.

Management Table
The management table is the only knowledge of the network that the framework provides. Any other knowledge is taken from the network as needed by mobile agents. This table implements the link between a network component and a network management station. For our experiments, the associations in this table are static, meaning that one management station is permanently bound to one or many components. The association is based on proximity. However, this does not mean that a component is always managed by the same station. It only tells the mobile agent the preferred management station for a particular component. The framework could update this table dynamically and adapt to changes in network topology or behavior. One study [6] offers interesting ideas on how this aspect could be improved.
For optimization purposes, these tables are installed on each management station, which keeps them out of the mobile agents' memory and saves bandwidth. This also allows local optimizations when it is not clear whether a central component should be managed by one station or another, depending on the point of view.
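The role of the management table can be illustrated with a minimal sketch. It assumes a static, in-memory mapping from component to preferred station and abilities; the class names, component names and ability labels (`"snmp"`, `"cli"`, `"os-api"`) are hypothetical and are not taken from the framework itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TableEntry:
    """One row of a hypothetical management table."""
    station: str                # preferred management station for the component
    abilities: Tuple[str, ...]  # ways the component can be managed


class ManagementTable:
    """Static component -> (preferred station, abilities) mapping."""

    def __init__(self, entries: Dict[str, TableEntry]):
        self._entries = dict(entries)

    def lookup(self, component: str) -> Optional[TableEntry]:
        """Return the entry for a component, or None if it is unknown."""
        return self._entries.get(component)

    def components_of(self, station: str) -> List[str]:
        """List the components for which `station` is the preferred station."""
        return sorted(c for c, e in self._entries.items() if e.station == station)


# Illustrative data only: one station bound to two routers, and a
# station that manages itself (a station can be its own component).
table = ManagementTable({
    "router-a": TableEntry("station-1", ("snmp",)),
    "router-b": TableEntry("station-1", ("snmp", "cli")),
    "station-2": TableEntry("station-2", ("os-api",)),
})
```

Because a copy of such a table would live on every management station, an agent only needs to carry the name of the next component and can ask the local copy where to go.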

Network Management Interfaces
To limit the size of mobile agents, and above all the complexity of building them, a minimum of uniform management is needed. To allow more flexibility, the framework offers many standard interfaces. Because standards never evolve as fast as networks, mobile agents are not tied to these interfaces. This lets us introduce two kinds of mobile agents: general agents and specialized agents. A general agent mostly uses the uniform interfaces, managing the network with a limited set of tools. A specialized agent can do many more tasks and use specialized features, at the expense of complexity and size. Behind these interfaces are stationary agents, which are the subject of the next subsection.

Technology Stationary Agents and Abilities
Stationary agents are used to implement network management code that has to be dynamic, but that would be inefficient to move along with mobile agents. By dynamic, we mean that they can keep state, be modified easily, and cache local information for fast and efficient retrieval. This code is moved once and then stays permanently on the station.
Technology stationary agents implement the set of uniform management interfaces introduced in the previous subsection. The management table, which keeps the associations between management stations and components, also keeps a set of network management abilities for each component. Such an ability could be an operating system application programming interface (API), a protocol like SNMP, or any other way to manage a component.

A Small Example
A small example helps to show what the framework does for an agent. A mobile agent that starts on a station asks which station manages the next component. The management table returns which station does, and how this component may be managed (its abilities). The mobile agent then moves to the given station and tries to manage this component using an interface. To do so, the mobile agent searches the list of technology stationary agents. It selects the first stationary agent that matches two criteria: it implements the interface, and it implements one of the management abilities for this component. If the agent cannot move, it may try to manage the component remotely.
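The selection rule in this example (first stationary agent that implements the requested interface and at least one of the component's abilities) can be sketched as follows. The data structure and the interface names `"RoutingInfo"` and `"InterfaceStatus"` are assumptions made for illustration; the framework's actual interfaces are not listed in this paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class StationaryAgent:
    """Hypothetical description of a technology stationary agent."""
    name: str
    interfaces: FrozenSet[str]  # uniform management interfaces it implements
    abilities: FrozenSet[str]   # management abilities it can use


def select_stationary_agent(
    agents: Iterable[StationaryAgent],
    interface: str,
    component_abilities: Iterable[str],
) -> Optional[StationaryAgent]:
    """Return the first agent that implements `interface` and shares at
    least one ability with the component, or None if nothing matches."""
    wanted = set(component_abilities)
    for agent in agents:
        if interface in agent.interfaces and agent.abilities & wanted:
            return agent
    return None


# Illustrative station with two stationary agents installed.
agents = [
    StationaryAgent("os-agent", frozenset({"RoutingInfo"}), frozenset({"os-api"})),
    StationaryAgent("snmp-agent",
                    frozenset({"RoutingInfo", "InterfaceStatus"}),
                    frozenset({"snmp"})),
]
chosen = select_stationary_agent(agents, "InterfaceStatus", ["snmp"])
```

When the function returns `None`, the mobile agent has no local way to manage the component through that interface and would have to fall back to another interface or another station.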

Optimizations
Depending on the need for dynamism, mobile agent code may be stored on each management station so that only memory and state are moved. This increases the responsiveness of the system and limits the traffic. This could also be done using version checking, downloading recent agents as needed, similar to the architecture of JAMES [1].
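A minimal sketch of the version-checking idea, assuming that agent code is identified by a name and an integer version and that a station can fetch missing code through some callable. Both the `AgentCodeCache` class and the `fetch` signature are hypothetical; they do not describe the JAMES platform or the framework's real mechanism.

```python
from typing import Callable, Dict, Tuple


class AgentCodeCache:
    """Per-station cache of agent code, refreshed only when outdated."""

    def __init__(self, fetch: Callable[[str, int], bytes]):
        self._fetch = fetch  # hypothetical loader: (name, version) -> code
        self._store: Dict[str, Tuple[int, bytes]] = {}
        self.downloads = 0   # counts transfers of code over the network

    def get(self, name: str, required_version: int) -> bytes:
        """Return cached code if it is recent enough, else download it."""
        cached = self._store.get(name)
        if cached is not None and cached[0] >= required_version:
            return cached[1]  # only the agent's state had to travel
        code = self._fetch(name, required_version)
        self._store[name] = (required_version, code)
        self.downloads += 1
        return code


# Illustrative loader that fabricates a payload instead of using the network.
cache = AgentCodeCache(lambda name, version: f"{name}-v{version}".encode())
first = cache.get("diagnostic", 1)   # downloaded
second = cache.get("diagnostic", 1)  # served from the local store
third = cache.get("diagnostic", 2)   # newer version requested: downloaded
```

With this arrangement, repeated visits of the same agent version to a station cost a single code transfer.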

Mobile Agent Intelligence
Mobile agents need a great deal of intelligence to manage networks of heterogeneous devices. Although the framework uses uniform interfaces, it is still difficult to give agents enough intelligence to manage those networks with confidence. Many research efforts use artificial intelligence for this purpose. Our focus was on using an expert system, but the framework is not limited to a specific form of intelligence.

Security and Fault Tolerance
The framework is meant for closed networks, meaning that the security level may be lowered. For networks where agents should move onto user stations, the framework suggests, but does not yet implement, letting only approved mobile agent code return from user stations. For confidentiality purposes, mobile agents should discard sensitive information from the network before entering a user station. Quotas may also be used to counter flooding.
Since the framework sits on top of any network management system, it does not interfere with those systems and is not needed for management. The fault management mobile agents described in the next section are able to tell whether the network management system is faulty, but they cannot recover completely from such a failure.

FAULT MANAGEMENT AGENTS
To validate the framework, we have implemented two mobile agents that are able to find a set of network failures in a given network. These mobile agents do not claim to find all possible errors. Since they rely on an expert system, they must be instructed to do so. The first mobile agent, called Diagnostic, tries to go as far as possible in a network to find the cause of a failure. The capability of this agent could easily be reproduced by its client-server counterpart. The second agent, named Search, is used to pinpoint the cause of a network failure more precisely.
For the validation, the mobile agents are limited to single-failure scenarios and use networks with static routing, no default routes and no route updates of any kind. These choices were made to limit the scope of errors that the agents have to deal with. This setup reveals areas where mobile agents are better suited than remote management: path finding and searching.
The mobile agents described in the next subsections are used when a connection failure occurs from a source to a destination. The mobile agents are informed of these two parameters, as well as the port for the connection, and nothing more.

The Diagnostic Agent
One fact stated in many fault management papers is that a single network failure can raise many alarms and cause many direct or indirect failures. For this reason, the diagnostic agent never stops at the first failure: its first task is not to diagnose, but to accumulate a series of proofs containing facts and the places where those proofs were found. Then, at the end of the proof-finding phase, it can establish a diagnosis. The proof-finding phase ends when the diagnostic agent is unable to move further, has moved onto or near the destination, or has no clue on how to continue (management system down or no route to host). Before terminating, it may try to launch a search agent that returns with an alternate path to the next component. If this agent takes too long, a timeout tells the diagnostic agent to continue without waiting any longer. The last phase is called the diagnostic phase.
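The end conditions of the proof-finding phase can be sketched on a simulated network. This is an illustration under simplifying assumptions, not the agent's actual code: routing tables are plain dictionaries, node failures are given as sets, proofs are `(place, fact)` pairs, and the `max_hops` guard is an addition of this sketch.

```python
from typing import Dict, FrozenSet, List, Tuple

Proof = Tuple[str, str]  # (place where the fact was found, fact)


def proof_finding(
    source: str,
    destination: str,
    routes: Dict[str, Dict[str, str]],        # node -> {destination: next hop}
    unreachable: FrozenSet[str] = frozenset(),  # nodes the agent cannot move to
    unmanaged: FrozenSet[str] = frozenset(),    # nodes with management system down
    max_hops: int = 32,                       # safety guard, assumption of this sketch
) -> List[Proof]:
    """Follow static routes toward the destination, accumulating proofs
    until one of the end conditions of the phase is met."""
    proofs: List[Proof] = []
    current = source
    for _ in range(max_hops):
        if current == destination:
            proofs.append((current, "destination reached"))
            break
        if current in unmanaged:
            proofs.append((current, "management system down"))
            break
        next_hop = routes.get(current, {}).get(destination)
        if next_hop is None:
            proofs.append((current, "no route to host"))
            break
        if next_hop in unreachable:
            proofs.append((current, f"next hop {next_hop} unreachable"))
            break
        proofs.append((current, f"route via {next_hop}"))
        current = next_hop
    else:
        proofs.append((current, "hop limit reached"))
    return proofs


routes = {"A": {"D": "B"}, "B": {"D": "C"}, "C": {"D": "D"}}
healthy = proof_finding("A", "D", routes)
broken = proof_finding("A", "D", routes, unreachable=frozenset({"C"}))
```

The diagnostic phase would then reason over the whole list of proofs rather than over the last entry alone, which is how a single root cause can be separated from the failures it induces.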

The Search Agent
The search agent clones itself on each route it finds on a given node. Its goal is to find the destination using another path in the network that the routing tables may not contain. When it finds the destination, it tries to come back to the source using the routing tables. When it finds a point from which it cannot move using these tables, it knows that this may be the other end that the diagnostic agent was trying to reach. It then reuses the alternate path to return to the diagnostic agent and give it this extra information. The diagnostic agent then suspends its proof-finding phase to move to the component found by the search agent. Once there, it resumes its proof-finding phase.
If the search agent never returns, the diagnostic agent is still able to give a good estimation of the problem, just like a remote management solution. The search agent is an addition that takes advantage of multi-path networks. Because it clones itself, it is costly in traffic. To limit its spread, a hop counter makes each clone terminate itself after too many jumps.
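The cloning behavior and the hop counter can be approximated by a breadth-first flood over the physical links, where each queue entry stands for one clone. This is a sketch under assumptions: the `links` adjacency map, the rule that a clone never revisits a node on its own path, and the returned clone count are illustrative choices, not details given by the framework.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Tuple


def search_paths(
    start: str,
    destination: str,
    links: Dict[str, Sequence[str]],  # node -> directly connected nodes
    max_hops: int,
) -> Tuple[Optional[List[str]], int]:
    """Flood clones over `links`; return the first path that reaches the
    destination (or None) and the number of clones created."""
    queue = deque([(start, [start])])  # one entry per live clone
    clones = 1
    while queue:
        node, path = queue.popleft()
        if node == destination:
            return path, clones
        if len(path) - 1 >= max_hops:
            continue  # hop counter exhausted: this clone terminates itself
        for neighbor in links.get(node, ()):
            if neighbor not in path:  # assumption: a clone never loops back
                queue.append((neighbor, path + [neighbor]))
                clones += 1
    return None, clones


links = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
path, clones = search_paths("A", "D", links, max_hops=3)
```

The clone count grows with the number of alternative routes, which is why the hop limit matters: with `max_hops=1` on the same network the search gives up without finding the destination.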

Test Setup
The test setup used to evaluate and validate our fault management agents is shown in Figure 1. It uses three Cisco 3640 routers, a Cisco Catalyst 8540 ATM switch, a hub and three management stations running Windows 2000. All links are Ethernet unless stated otherwise. The mobile agent platform uses GrassHopper and Java. [...] to manage faulty devices. When all devices on the path are healthy, response times are fast (about 6 to 7 seconds).
To get a sense of how our diagnostic mobile agent compares to a remote management system, we made a single reading for the service-not-started test (Test 1 I). We found a response time of I seconds and a total traffic of 53.3 kbytes. These results seem promising, since the diagnostic agent has to come back, it does not involve much network management traffic, and it may still be optimized. However, we are aware that more tests are needed before drawing conclusions.

CONCLUSIONS
In this paper, we presented, after a brief introduction, our network management framework using mobile agents and described its important parts. We then showed two mobile agents that we used to validate the framework and to demonstrate the network management capabilities of mobile agents in real, heterogeneous networks. These mobile agents do not provide the best way to do fault management, but our goal was to show the value of using mobile agents for network management. Finally, we presented some preliminary results.
Throughout this article, we have pointed out possible optimizations and interesting papers that could be used to further improve this architecture. Determining the optimal association between components and management stations is a research field in itself. Many more mobile agents could also be built to implement other network management tasks. The tested agents could be reworked for other technologies and/or types of networks. They could also be improved further to use dynamic routing capabilities and to find multiple failures.