Application Discovery and Mapping: Exploring Different Approaches

An Application Dependency Map illustrates how software, services, and applications interact both with one another and with the physical, virtual, and cloud servers and the network components that support them.

Creating a map for one or two applications on a few servers isn’t hard. However, as the number of applications, services, and servers increases, the complexity grows much faster than the component count, because each new component can potentially depend on any of the existing ones. Even a relatively small infrastructure can be very hard to map initially and to keep up to date. As soon as you have more than a handful of servers, an automated Application Dependency Mapping (ADM) tool is a must: one that can analyze the large amount of data traversing a busy network and connect the dots between the hardware and software generating that traffic, along with all of their interdependencies.

There are four general categories of methods for discovering the software, services, and applications that are running in an IT infrastructure and their inter-relationships and interdependencies. Each method has advantages and disadvantages. This post attempts to explain the pros and cons of the different methods.

The methods to be discussed include:

Point-In-Time Discovery
- Agent-less
- Agent Based
NetFlow
Nmap
Packet Capture

Point-In-Time Discovery

Agent-less

WMI (Windows Management Instrumentation) on Windows and SSH on Linux and UNIX enable a tool like Device42 to “crawl” a network and log in to all of the physical, virtual, and cloud servers and PCs on it. Once logged in, the tool can discover which services are running, which applications (often composed of multiple services) are running, and application details (e.g. config files, installed apps on a web server, and instances and named pipes on a database), and it can determine which services are using which ports. Device42 agent-less discovery is a powerful tool and provides great application mapping. However, by itself, it does have some limitations:

Credentials for every machine must be entered into Device42. Where all machines on a network segment use the same credentials, only one entry is needed.
The discovery is point-in-time: only what is running at the time of the discovery is found. However, discoveries can be scheduled to run repeatedly, building a more complete picture of application dependencies over time.
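A minimal sketch of why repeated runs help: each run only sees the dependencies that were active at that moment, so a scheduler can accumulate the results across runs and record when each dependency was first and last observed. The edge format and the host and service names below are hypothetical; this illustrates the idea and is not how Device42 stores discovery results.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# A dependency observed in one discovery run: (client service, server service)
Edge = Tuple[str, str]


@dataclass
class EdgeHistory:
    first_seen: datetime
    last_seen: datetime
    times_seen: int = 1


def merge_snapshots(runs: Iterable[Tuple[datetime, List[Edge]]]) -> Dict[Edge, EdgeHistory]:
    """Accumulate the dependencies found by successive point-in-time discovery runs."""
    history: Dict[Edge, EdgeHistory] = {}
    for run_time, edges in sorted(runs, key=lambda run: run[0]):
        for edge in set(edges):  # count each edge at most once per run
            if edge in history:
                history[edge].last_seen = run_time
                history[edge].times_seen += 1
            else:
                history[edge] = EdgeHistory(first_seen=run_time, last_seen=run_time)
    return history


# Three nightly runs (hypothetical hosts and services). The batch job only runs
# occasionally, so only the third snapshot catches its database connection.
runs = [
    (datetime(2024, 1, 1, 2, 0), [("web01:nginx", "app01:tomcat")]),
    (datetime(2024, 1, 2, 2, 0), [("web01:nginx", "app01:tomcat"),
                                  ("app01:tomcat", "db01:postgres")]),
    (datetime(2024, 1, 3, 2, 0), [("app01:tomcat", "db01:postgres"),
                                  ("batch01:cron-job", "db01:postgres")]),
]
merged = merge_snapshots(runs)
for edge, h in sorted(merged.items()):
    print(edge, "seen", h.times_seen, "time(s), last on", h.last_seen.date())
```

A dependency that shows up in only one of many runs is not necessarily noise; it may simply belong to a process that runs rarely, which is exactly what a single snapshot would miss.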

When agentless discovery is augmented with data gathered via NetFlow and Nmap discovery (see below), and all of the information is automatically combined, as Device42 does, you get the best of both worlds.
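To make the agentless approach a little more concrete: on a Linux host reached over SSH, one common way to see which processes are listening on which ports is the `ss -tlnp` command from iproute2 (`-t` TCP, `-l` listening sockets, `-n` numeric ports, `-p` owning processes). The sketch below assumes the command's text output has already been collected over SSH and shows only the parsing step. The sample output is invented, column spacing varies between iproute2 versions, and this is not Device42's implementation.

```python
import re
from typing import List, Tuple

# Captures the first process in the ss "users:((...))" column, e.g. users:(("sshd",pid=812,fd=3))
PROCESS_RE = re.compile(r'users:\(\("([^"]+)",pid=(\d+)')


def parse_ss_listeners(output: str) -> List[Tuple[str, int, str, int]]:
    """Return (local_address, port, process_name, pid) for each listening TCP socket."""
    listeners = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue  # skips the header row and anything that is not a listener
        # The local address column looks like 0.0.0.0:22 or [::]:443; the port follows the last colon
        address, _, port = fields[3].rpartition(":")
        match = PROCESS_RE.search(line)
        if match is None:
            continue  # ss shows process details only when it has permission to see them
        listeners.append((address, int(port), match.group(1), int(match.group(2))))
    return listeners


# Invented sample of `ss -tlnp` output from a hypothetical host
SAMPLE_SS_OUTPUT = """State  Recv-Q Send-Q Local Address:Port  Peer Address:Port Process
LISTEN 0      128          0.0.0.0:22         0.0.0.0:*     users:(("sshd",pid=812,fd=3))
LISTEN 0      511             [::]:443           [::]:*     users:(("nginx",pid=1440,fd=7),("nginx",pid=1441,fd=7))
LISTEN 0      244        127.0.0.1:5432       0.0.0.0:*     users:(("postgres",pid=960,fd=5))
"""

for address, port, process, pid in parse_ss_listeners(SAMPLE_SS_OUTPUT):
    print(f"{process} (pid {pid}) listens on {address}:{port}")
```

On Windows, the equivalent information would come from WMI queries instead, but the end result is the same kind of port-to-service table.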

Point-In-Time Discovery

Agent Based

Agent-based discovery works much like agentless discovery, except that instead of crawling the network, agents installed on the machines themselves report back to a tool like Device42. In Device42, agent-based discovery collects the same service and application information as agentless discovery and sends it back to the Device42 appliance, where it is seamlessly combined with service and application data from other sources.
Many Device42 customers primarily use agentless discovery, but augment it with agent-based discovery for machines that are sometimes offline (e.g. user PCs) and for machines in secure network segments.
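Combining results from several sources mostly comes down to agreeing on a key for each machine. The sketch below merges per-host reports keyed by a normalized hostname. The report fields (`hostname`, `source`, `services`) are a hypothetical format, not Device42's agent protocol, and a real system would likely need additional identifiers to reconcile hosts whose names disagree.

```python
from typing import Dict, List


def combine_reports(reports: List[dict]) -> Dict[str, dict]:
    """Merge per-host discovery reports from several sources into one record per host."""
    combined: Dict[str, dict] = {}
    for report in reports:
        host = report["hostname"].lower()  # normalize so "WEB01" and "web01" match
        record = combined.setdefault(host, {"services": set(), "sources": set()})
        record["services"].update(report["services"])
        record["sources"].add(report["source"])
    return combined


# Hypothetical reports: web01 is seen by both methods; laptop-17 is often offline,
# so only its installed agent reports on it.
reports = [
    {"hostname": "web01", "source": "agentless", "services": ["nginx", "sshd"]},
    {"hostname": "WEB01", "source": "agent", "services": ["nginx", "node_exporter"]},
    {"hostname": "laptop-17", "source": "agent", "services": ["outlook"]},
]
for host, record in sorted(combine_reports(reports).items()):
    print(host, sorted(record["services"]), sorted(record["sources"]))
```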

NetFlow

NetFlow was originally introduced by Cisco as a feature that enabled analysis of traffic flowing through Cisco routers and switches. By analyzing NetFlow data, it is possible to determine that whatever is running on one IP address and port is communicating with whatever is running on another IP address and port. If one also has a separate list of which applications are running on which IP/port combinations, the combined data can be used to work out which applications interact with which other applications.
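A minimal sketch of that join, assuming the flow records have already been exported and decoded into simple tuples. The `FlowRecord` fields, the inventory mapping (IP, port) to an application name, and the IP-to-host-name table are all hypothetical inputs. Because NetFlow records are unidirectional, a conversation typically appears as two records (request and reply), so the sketch treats whichever endpoint matches a known service as the server. The client usually connects from an ephemeral port, so it can only be labeled by its host.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Tuple


class FlowRecord(NamedTuple):
    """A decoded, unidirectional flow record (hypothetical minimal field set)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    byte_count: int


def app_edges(flows: List[FlowRecord],
              inventory: Dict[Tuple[str, int], str],
              host_names: Dict[str, str]) -> Counter:
    """Aggregate bytes exchanged per (client, server application) pair."""
    edges: Counter = Counter()
    for f in flows:
        if (f.dst_ip, f.dst_port) in inventory:
            # Request direction: the destination endpoint is a known service
            client_ip, server_app = f.src_ip, inventory[(f.dst_ip, f.dst_port)]
        elif (f.src_ip, f.src_port) in inventory:
            # Reply direction: the source endpoint is the known service
            client_ip, server_app = f.dst_ip, inventory[(f.src_ip, f.src_port)]
        else:
            continue  # neither endpoint is in the inventory, so the flow cannot be labeled
        # Clients usually use ephemeral ports, so they are labeled by host only
        client = host_names.get(client_ip, client_ip)
        edges[(client, server_app)] += f.byte_count
    return edges


inventory = {("10.0.2.7", 8080): "orders-api", ("10.0.3.9", 5432): "orders-db"}
host_names = {"10.0.1.5": "web-frontend", "10.0.2.7": "orders-api"}
flows = [
    FlowRecord("10.0.1.5", 51514, "10.0.2.7", 8080, 1200),
    FlowRecord("10.0.2.7", 8080, "10.0.1.5", 51514, 48000),  # reply half of the same conversation
    FlowRecord("10.0.2.7", 40222, "10.0.3.9", 5432, 3000),
    FlowRecord("10.0.9.9", 33000, "10.0.9.10", 9999, 500),   # endpoints missing from the inventory
]
edges = app_edges(flows, inventory, host_names)
for (client, server), total in edges.items():
    print(f"{client} -> {server}: {total} bytes")
```

The last flow is silently dropped, which is the practical weakness described next: the result is only as good as the IP/port inventory it is joined against.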

While NetFlow is a very valuable tool for creating application dependency maps, using NetFlow as the sole source of data is a fairly limited approach due to the following:

It can be a huge manual effort to assemble the list of which applications are running on which IP/port combinations, and then another huge effort to match that list against the NetFlow data. Worse still, the IP/port list can change regularly. For NetFlow to be useful, you really need a tool like Device42 that always has an up-to-date list of IPs/ports and the applications running on them. More importantly, Device42 integrates this information automatically and presents it in graphical form.
NetFlow can’t “see” application interactions that take place inside a physical/virtual/cloud server. NetFlow can only see interactions that go through a router or a switch. So, many dependencies will be missed.
While NetFlow works well for physical routers and switches, it is less useful for the virtual routers and switches found in hypervisors, because many hypervisors do not support NetFlow.
On routers and switches, NetFlow must be set up for every segment. If some segments are not set up, application interactions on those segments will not be found.

Nmap

Nmap is a tool primarily used for security scanning. However, it can also be used to “guess” which services are running on which ports. Device42 uses Nmap to discover which services are running on which ports and automatically combines this data with NetFlow data to create a map of services and application dependencies.
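Nmap can write its results as XML with `-oX`, for example `nmap -sV -oX scan.xml 10.0.2.0/24`, where `-sV` enables service/version detection. The sketch below parses a trimmed, invented sample of that format with Python's standard `xml.etree.ElementTree` and keys each open port by (IP, port), the same key a flow record's destination provides, which is what makes combining the two data sets possible. The `method` and `conf` attributes on each `service` element show how Nmap arrived at its guess; `method="table"` means the name was simply looked up from the port number.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def parse_nmap_services(xml_text: str) -> Dict[Tuple[str, int], dict]:
    """Map (IPv4 address, port) to Nmap's service guess for every open port."""
    services: Dict[Tuple[str, int], dict] = {}
    root = ET.fromstring(xml_text)
    for host in root.findall("host"):
        address = host.find("address[@addrtype='ipv4']")
        if address is None:
            continue
        for port in host.findall("ports/port"):
            state = port.find("state")
            if state is None or state.get("state") != "open":
                continue  # skip closed and filtered ports
            service = port.find("service")
            attrs = service.attrib if service is not None else {}
            services[(address.get("addr"), int(port.get("portid")))] = {
                "name": attrs.get("name", "unknown"),
                "product": attrs.get("product", ""),
                "version": attrs.get("version", ""),
                # "table" = name looked up from the port number; "probed" = inferred from responses
                "method": attrs.get("method", ""),
                "conf": int(attrs.get("conf", "0")),  # Nmap's confidence, 0 to 10
            }
    return services


# Trimmed, invented sample; real Nmap XML carries many more elements and attributes
SAMPLE_XML = """<nmaprun>
  <host>
    <address addr="10.0.2.7" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="8080">
        <state state="open"/>
        <service name="http-proxy" method="table" conf="3"/>
      </port>
      <port protocol="tcp" portid="22">
        <state state="open"/>
        <service name="ssh" product="OpenSSH" version="8.9p1" method="probed" conf="10"/>
      </port>
      <port protocol="tcp" portid="3389">
        <state state="filtered"/>
      </port>
    </ports>
  </host>
</nmaprun>"""

for (ip, port), guess in sorted(parse_nmap_services(SAMPLE_XML).items()):
    print(ip, port, guess["name"], guess["version"] or "(no version)", guess["method"], guess["conf"])
```

The port 8080 entry shows why these results are guesses: a name taken from the port table says nothing about which application is actually listening there.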

In Device42, NetFlow and Nmap can be used by themselves, together, or in combination with Point-In-Time discovery. Using NetFlow and Nmap data together but without Point-In-Time discovery results in a good services dependency mapping capability. However, just using these two sources of data is still quite limited in the following ways:

(1) A map of service interdependencies and interrelationships can be created. However, multiple services often combine to form an application. For example, several Oracle services plus configuration files may together form the Oracle application. Applications and their associated information (e.g. installed apps on a web server, and instances and named pipes on a database) cannot be discovered by the NetFlow/Nmap combination.

(2) The services that Nmap finds are guesses, and the guessed version numbers are often wrong.

(3) Some enterprises have firewall rules so restrictive that Nmap discovers few, if any, services.

(4) NetFlow can’t “see” application interactions inside a physical/virtual/cloud server. NetFlow can only see interactions that go through a router or switch. So, many dependencies will be missed.

(5) While NetFlow works well for physical routers and switches, it is less useful for the virtual routers and switches found in hypervisors, because many hypervisors do not support NetFlow.

(6) On routers and switches, NetFlow must be set up for every segment. If some segments are not set up, application interactions on those segments will not be found.

To overcome these limitations, it is better to use NetFlow/Nmap in conjunction with Point-In-Time discovery.
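Item (1) above is essentially a grouping problem: once point-in-time discovery has established which services belong to which application, service-level edges (from any source) can be collapsed into application-level edges. A minimal sketch with hypothetical service and application names; edges between services of the same application become internal to it and drop out of the application map.

```python
from typing import Dict, Set, Tuple


def roll_up(service_edges: Set[Tuple[str, str]],
            app_of_service: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Collapse service-to-service edges into application-to-application edges."""
    app_edges = set()
    for client, server in service_edges:
        a = app_of_service.get(client, client)  # fall back to the service name if ungrouped
        b = app_of_service.get(server, server)
        if a != b:  # edges between services of the same application are internal
            app_edges.add((a, b))
    return app_edges


service_edges = {
    ("web01:httpd", "erp01:app-server"),
    ("erp01:app-server", "erp01:oracle-listener"),
    ("erp01:oracle-listener", "erp01:oracle-db"),
}
# Grouping that point-in-time discovery would supply (hypothetical)
app_of_service = {
    "erp01:app-server": "ERP",
    "erp01:oracle-listener": "Oracle Application",
    "erp01:oracle-db": "Oracle Application",
}
print(sorted(roll_up(service_edges, app_of_service)))
```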

Packet Capture

Unlike NetFlow, which records only header-level metadata such as IP addresses, ports, protocol, and byte and packet counts, packet capture methods analyze entire packets, including their payloads, and can determine what is happening in individual packets. Network packet capture can lead to discovery of applications, services, and their interrelationships. However, packet capture has some limitations:

It can be resource intensive. One method of packet capture is to mirror every switch port and connect listening servers to the mirror ports, which requires duplicating every switch port. A second method is to place a server inline (similar to a network tap), which requires a separate server for every path to be analyzed.
Like NetFlow, packet capture can’t “see” application interactions inside a physical/virtual/cloud server. It can only see interactions that go through a router or switch. So, many dependencies will be missed.
While packet capture works well for physical routers and switches, many hypervisors do not support packet capture.
On routers and switches, packet capture must be set up for every port and/or route. Application interactions that use ports or routes that are not set up will not be found.

Another form of packet capture, machine packet capture, deploys an agent on each server to be analyzed. The agent inspects all network traffic passing between the operating system’s kernel and the physical NIC, in both directions.

Despite these limitations, packet capture can still be a useful discovery method when combined with other methodologies.
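To make the contrast with NetFlow concrete, the sketch below decodes a single Ethernet/IPv4/TCP frame with Python's standard `struct` and `socket` modules. The addresses, ports, and HTTP request are invented, and the frame is built by hand rather than captured from a real interface. A real collector would read frames from a capture library or a pcap file and would also need to handle VLAN tags, IPv6, fragmentation, and TCP stream reassembly, none of which this sketch attempts.

```python
import socket
import struct
from typing import Optional


def parse_tcp_frame(frame: bytes) -> Optional[dict]:
    """Decode an Ethernet II / IPv4 / TCP frame and peek at the start of its payload."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != 0x0800:  # not IPv4
        return None
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    (total_length,) = struct.unpack("!H", ip[2:4])
    if ip[9] != 6:  # IP protocol 6 = TCP
        return None
    tcp = ip[ihl:total_length]
    src_port, dst_port = struct.unpack("!HH", tcp[0:4])
    data_offset = (tcp[12] >> 4) * 4  # TCP header length in bytes
    payload = tcp[data_offset:]
    return {
        # Addresses and ports: roughly what a flow record also carries
        "src": socket.inet_ntoa(ip[12:16]),
        "dst": socket.inet_ntoa(ip[16:20]),
        "src_port": src_port,
        "dst_port": dst_port,
        # Payload content: visible only to packet capture
        "first_line": payload.split(b"\r\n", 1)[0].decode("ascii", errors="replace"),
    }


def build_sample_frame() -> bytes:
    """Hand-build a frame carrying the start of an HTTP request (checksums left as zero)."""
    payload = b"GET /api/orders HTTP/1.1\r\nHost: orders.example.internal\r\n\r\n"
    tcp = struct.pack("!HHLLBBHHH", 51514, 8080, 1, 0, 5 << 4, 0x18, 65535, 0, 0)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp) + len(payload), 0, 0, 64, 6, 0,
                     socket.inet_aton("10.0.1.5"), socket.inet_aton("10.0.2.7"))
    ethernet = bytes(12) + struct.pack("!H", 0x0800)  # zeroed MAC addresses, EtherType IPv4
    return ethernet + ip + tcp + payload


print(parse_tcp_frame(build_sample_frame()))
```

The request line identifies what the client is asking the service on port 8080 to do, which is the kind of detail that lets packet capture go beyond the address-and-port view of NetFlow.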

The Device42 Solution

Device42 will comprehensively discover your entire infrastructure, leveraging all of the auto-discovery methods described above and automatically correlating the gathered information. Unlike many other products on the market, Device42 is also a comprehensive CMDB solution.

Offering more than just the ability to discover your infrastructure, Device42 is your one-stop single source of truth because it includes a fully integrated suite of enterprise CMDB features: DCIM, IPAM, Password Management, Certificate Management, Inventory and Lifecycle Management, Automatic Rack Diagrams, Cable Management, Drag-and-drop Room Layouts, Advanced Visualizations, Powerful Customizable Reporting, and an advanced, well-documented RESTful API. Download a trial today to try Device42 for yourself.