Discovery Troubleshooting


Collection Validation

The Collection Validation feature is the primary troubleshooting tool for data collection issues during Discovery or Performance collection.

The Collection Validation feature is available on the RN150 Virtual Appliance. It gives the user an in-depth review of the data collection activities performed against a device, with details about any operations that are not successful. The Collection Validation feature is a complement to the existing Credential Test feature on the RN150, and a crucial part of troubleshooting during the Discovery stage of an engagement.

Currently, the Collection Validation feature is available only for the SSH Collection Module (Linux/UNIX devices using SSH) and the Windows Collection Module (Windows servers and workstations, using the WMI and SMB protocols).


Credential Test

For each credential type, the Test button is available when entering a new credential or editing an existing credential. The Credential Test feature accepts the IP address of a device, and runs a simple connection and authentication test against that device using the credential entry being added or edited. The Credential Test does not test whether the device will respond to ICMP, and only runs the minimum operations required to confirm whether the credential can be used to communicate with the device. The Collection Validation feature has been introduced to go beyond this simple test and to provide extensive, detailed feedback, although the Credential Test feature remains a valuable tool in many situations. Upon a successful test, the Credential Test feature will automatically add the new credential entry or apply the modification to the existing credential entry.

Collection Validation

The Collection Validation feature, like the Credential Test, can be used when entering a new credential entry or editing an existing credential entry. Selecting the Validate button will open a dialog where the IP address of a device you would like to test against is entered. Once the validation process is complete, a report will be displayed on the screen detailing each operation that was performed, the level of success of that operation, and details on any operations that failed. Unlike the Credential Test feature, Collection Validation will not automatically add a credential entry or update an existing entry.

The Collection Validation process is run in the context of a particular credential entry, against a particular device. Once the Validate button has been selected, you will be asked to enter the IP address of a device that you would like to run the Validation process against. Make sure that the IP address you select corresponds to the proper type of device. For instance, if running Collection Validation from the Windows credential section, select a Windows Server or Workstation.

The process may take up to a couple of minutes to complete. The Discovery, Inventory, and Performance collection processes will be run against the selected IP address. These processes exactly match the operations that are run during the normal scanning and performance collection stages of the engagement, although the Collection Validation feature does not store any collected data and will not affect any aspect of the platform, such as Assets or reported performance analytics.

The validation process will start by confirming that the device will be discovered during a scan of the environment. This involves an ICMP Echo Request (ping) to determine if the device exists on the network, combined with a TCP/UDP port test to discover what protocols are available on the device. If an ICMP Echo Reply is not received, then the validation process will stop. Similarly, if the type of protocol associated with the type of credential being tested is not available, for instance if WMI (TCP 135) is not available when testing a Windows credential, then the validation process will stop, as it will not be able to communicate with the device.
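The protocol availability part of this check can be approximated by hand from another machine. The following is a minimal sketch, not the appliance's implementation: it only attempts a TCP connection and does not send an ICMP Echo Request. The function name tcp_port_open and the PROTOCOL_PORTS mapping are hypothetical; TCP 135 for WMI is taken from the text above, while TCP 22 for SSH is the conventional default and an assumption here.

```python
import socket

# Ports checked per credential type. TCP 135 (WMI) is named in this document;
# TCP 22 is the conventional SSH port and is an assumption of this sketch.
PROTOCOL_PORTS = {"windows": 135, "ssh": 22}


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def protocol_available(host: str, credential_type: str) -> bool:
    """Check the port associated with a credential type ("windows" or "ssh")."""
    return tcp_port_open(host, PROTOCOL_PORTS[credential_type])
```

For example, protocol_available("x.x.x.x", "windows") returning False suggests that a Windows credential validation against that address would stop at the discovery step.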

Next, the Inventory process will run, which is responsible for collecting the data from the device that is shown in the Assets page. This may include data such as the Operating System type and version, network interfaces and addresses, disk storage, etc. The operations that collect this data will be logged, and the result of each collection operation will be shown. Details on the result codes are provided below. The Discovery process checks to see whether the relevant protocol is reported to be available, but the Inventory process tests the ability to communicate and authenticate with that device using the appropriate protocol. If communication or authentication fails, then the validation process will be stopped, and any available details on why the attempt failed will be displayed.

Finally, the Performance collection process will run. When a device is licensed for performance collection, this process is periodically run against that device for the duration of the performance collection period, typically as long as the device is licensed. Like the Inventory process, each operation that is issued to the device for data collection will be logged, along with the result of that operation.

Once the collection processes have finished, a report is presented with the details of what operations were performed, the Result Code of each operation, an Overall Status describing how complete the collection activity was, and any available details for operations that did not complete successfully. See below for details on the Overall Status, and the failure details section. The report can also be copied to the clipboard, in case it needs to be saved or sent to one of our support staff. To do this, select the Text Version button, which displays the report as plain text, and then select the Copy button to copy the text to your clipboard.

Overall Status

The Overall Status provides a high-level status of how complete the data collection activity was. The possible Overall Statuses and their descriptions are:

Overall Status | Description
SUCCESS | All collection operations completed successfully, and the device is fully prepared for participation in the engagement. No further action is required.
PARTIAL | Some non-critical operations did not complete successfully. The device is prepared for participation in the engagement, but some further action may be required to ensure the best possible data. Further action is at the user's discretion.
FAIL | At least one critical operation failed. The device may not be prepared for participation in the engagement, or some critical data will not be available for the device. Further action is required.

Result Codes

For each operation run during the validation process, the operation performed and its result are logged. Understanding the result codes is the key to understanding the output of the Collection Validation feature.

All operations may result in the SUCCESS code, which indicates that the operation completed successfully and collected the associated data. Any other result code indicates that the operation failed, and classifies the failure by its effect on data collection.

Result Code | Description | Overall Status | Cause for Concern
SUCCESS | The operation was successful. | - | No
FAIL | The operation failed, where either the collection attempt was not successful, or data was not returned. A FAIL code is always cause for concern and further action. | FAIL | Yes
EXPLORE | The operation failed, but the specific operation or the data it is intended to collect is not critical. Often, this indicates an operation that tests for the availability of some data or collection method. | - | No
FALLBACK | The operation failed, but a fallback operation to collect equivalent data will immediately follow. Used in cases where a preferred operation may not be available, and an alternative method is available. | - | Maybe (see below)
INCOMPLETE | The operation failed, which results in a failure to collect non-critical data. Correcting the issue is recommended for the best experience with the platform, but not required. | PARTIAL | User Discretion
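The relationship between the result codes and the Overall Status in the table above can be sketched as a small function. This is an illustration only, not the product's code: the function name overall_status is hypothetical, and the rule that FAIL takes precedence over PARTIAL when both apply is an assumption based on the status descriptions.

```python
from typing import Iterable


def overall_status(result_codes: Iterable[str]) -> str:
    """Derive an Overall Status from the result codes of all logged operations."""
    codes = set(result_codes)
    # Assumption: any FAIL outranks INCOMPLETE, since FAIL marks a critical operation.
    if "FAIL" in codes:
        return "FAIL"
    # INCOMPLETE is the only code the table associates with PARTIAL.
    if "INCOMPLETE" in codes:
        return "PARTIAL"
    # SUCCESS, EXPLORE, and FALLBACK do not change the Overall Status by themselves.
    return "SUCCESS"
```

Under these assumptions, a log containing only SUCCESS, EXPLORE, and FALLBACK results yields SUCCESS, and a single INCOMPLETE result is enough to yield PARTIAL.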

Any operation that reports a FAIL result should be investigated and corrected. Under normal collection activity, such a failure may prevent data collection for the device or leave critical data missing from the platform.

Operations that result in an EXPLORE status can be safely ignored. A failure from an EXPLORE operation will not negatively affect data collection, and is shown in the Collection Validation report for logging purposes.

Operations that result in INCOMPLETE may or may not be an issue, depending on which data is of interest in the context of a specific engagement. An operation related to critical data for the platform will never result in INCOMPLETE. For instance, a failure to collect hardware platform information from a Linux/UNIX device using the SSH Collection Module is not critical for the core value of the platform, but where this data is important for the goals of an engagement, further action to correct the issue may be necessary.

The FALLBACK result warrants the most explanation. This is reported for an operation that failed, but another (typically less preferred) operation is available to be run immediately afterwards to attempt to collect equivalent data. This means that the operation logged immediately after one that reported a FALLBACK result will indicate whether further action is required. If a FALLBACK is immediately followed by a SUCCESS, then the second operation was successful and no further action is required. If a FALLBACK is immediately followed by a FAIL, then all attempts to collect particular critical data were exhausted, and the issue must be corrected. If a FALLBACK is immediately followed by an INCOMPLETE, then all attempts to collect particular non-critical data were exhausted, and further action to correct the issue is at the user's discretion. Collection of particular data may have multiple fallback operations, so a FALLBACK may be immediately followed by another FALLBACK. Such a string of operations will always terminate with a result code other than FALLBACK, which indicates the ultimate outcome of the attempt to collect that data.
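When reading a long report, it can help to collapse each run of FALLBACK results into the result that terminates it. The following is a minimal sketch of that reading rule, assuming the result codes are available as a list in logged order; the function name resolve_fallbacks is hypothetical.

```python
from typing import List, Tuple


def resolve_fallbacks(result_codes: List[str]) -> List[Tuple[int, str]]:
    """Collapse FALLBACK runs into (number of attempts, terminating result) pairs."""
    outcomes: List[Tuple[int, str]] = []
    attempts = 0
    for code in result_codes:
        attempts += 1
        if code == "FALLBACK":
            # The outcome is decided by a later operation in the same run.
            continue
        outcomes.append((attempts, code))
        attempts = 0
    if attempts:
        # A run is expected to end with a code other than FALLBACK.
        raise ValueError("log ends with FALLBACK and has no terminating result")
    return outcomes


if __name__ == "__main__":
    log = ["SUCCESS", "FALLBACK", "FALLBACK", "INCOMPLETE", "FALLBACK", "SUCCESS"]
    for attempts, outcome in resolve_fallbacks(log):
        print(f"{attempts} attempt(s): {outcome}")
```

In the sample log, the second entry reads as three attempts ending in INCOMPLETE, so further action for that data is at the user's discretion.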

Failure Details

If any operations did not complete with a SUCCESS result code, further details on those operations are provided at the top of the report. The data is broken into sections for the Inventory and Performance collection processes, although some operations may be common to both components and may be displayed twice in the report. The particular format of the data depends on the protocol in use, but generally will include the operation itself, the class of failure encountered, and any error messages or other output produced by the operation.

As an example using the SSH Collection Module, the output may be:

Reason | Command Failure
Result | FAIL
Command | sudo ifconfig -a
Exit Code | 1
Standard Output | (none)
Standard Error | sudo: no tty present and no askpass program specified

In this case, the command 'sudo ifconfig -a' returned a non-zero exit status, indicating that the command did not complete successfully. As this command is critical for proper data collection, a failure from the command results in a FAIL result code, which in turn sets the Overall Status to FAIL as well. No output was produced on the standard output stream (STDOUT), but the standard error stream (STDERR) produced the error message 'sudo: no tty present and no askpass program specified', indicating that sudo has not been properly configured to allow the command. Further action is required, in this case to review the sudo configuration to ensure that the ifconfig command is properly permitted, according to the SSH Collection Module documentation.
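The fields in this example correspond to what any process reports when it exits: an exit status plus the contents of STDOUT and STDERR. The sketch below shows how such a record could be assembled for a local command. It is an illustration under stated assumptions, not the SSH Collection Module's code: run_and_report is a hypothetical name, the command runs locally rather than over SSH, and mapping a non-critical failure to INCOMPLETE is a simplification that ignores EXPLORE and FALLBACK.

```python
import subprocess
import sys
from typing import Dict, List, Union


def run_and_report(command: List[str], critical: bool = True) -> Dict[str, Union[str, int]]:
    """Run a command and summarize it using the field names from the report above."""
    proc = subprocess.run(command, capture_output=True, text=True)
    if proc.returncode == 0:
        reason, result = "", "SUCCESS"
    else:
        # Simplification: a failed critical command is FAIL, any other failure INCOMPLETE.
        reason = "Command Failure"
        result = "FAIL" if critical else "INCOMPLETE"
    return {
        "Reason": reason,
        "Result": result,
        "Command": " ".join(command),
        "Exit Code": proc.returncode,
        "Standard Output": proc.stdout.strip(),
        "Standard Error": proc.stderr.strip(),
    }


if __name__ == "__main__":
    # Stand-in for a failing command: writes to STDERR and exits with status 1.
    failing = [sys.executable, "-c",
               "import sys; sys.stderr.write('simulated error'); sys.exit(1)"]
    for field, value in run_and_report(failing).items():
        print(f"{field}: {value}")
```

Note that subprocess.run raises FileNotFoundError when the executable itself does not exist; a fuller version would report that case as a failure as well.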

Common Issues Resulting In A PARTIAL Status

Some operations have a higher than normal likelihood of issues that cause an Overall Status of PARTIAL. Some of these errors are documented below, with the degree of severity and recommendations on how to resolve them.

SSH

Command | cat /sys/class/net/eth0/speed
Error | cat: /sys/class/net/eth0/speed: Invalid argument
Result | INCOMPLETE
Description | The Linux kernel cannot report the speed of a network interface, in this case the eth0 interface. Typically this data is not available to Linux itself, a situation usually seen with older kernels in virtualized environments.
Recommendation | This issue does not currently have a resolution, so no action is required.

Command | which dmidecode
Error | which: no dmidecode in (/usr/local/bin:/usr/bin)
Result | INCOMPLETE
Description | This will typically follow the operation against /sys/devices/virtual/dmi/id, with a result of FALLBACK. Newer Linux kernels (2.6 and above) expose hardware platform data (hardware vendor, product, serial, etc.) under the sysfs filesystem, which is the preferred approach for collection. When this is not available, the collection process checks for the availability of the dmidecode utility to collect the same data. This utility is typically not present in the default installation of most Linux distributions, so a failure to detect this utility as the fallback method will result in the exhaustion of the methods for collecting that data, and an INCOMPLETE status.
Recommendation | If hardware platform data, which is shown in the Assets page, is desired, installing the dmidecode utility and making it available using sudo, as described in the SSH Collection Module documentation, will resolve this behavior.
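The dmidecode case is a concrete instance of a FALLBACK followed by an INCOMPLETE. The sketch below reproduces that sequence for a local machine, assuming the check amounts to testing whether the sysfs directory exists and, if not, whether the utility is on the PATH. The function name and the operation labels in the log are hypothetical; the real module runs its checks over SSH and may test more than this.

```python
import os
import shutil
from typing import List, Tuple

DMI_SYSFS_DIR = "/sys/devices/virtual/dmi/id"


def hardware_data_source(sysfs_dir: str = DMI_SYSFS_DIR,
                         utility: str = "dmidecode") -> List[Tuple[str, str]]:
    """Return (operation, result code) pairs for locating hardware platform data."""
    log: List[Tuple[str, str]] = []
    # Preferred method: hardware platform data exposed under sysfs.
    if os.path.isdir(sysfs_dir):
        log.append((f"sysfs {sysfs_dir}", "SUCCESS"))
        return log
    log.append((f"sysfs {sysfs_dir}", "FALLBACK"))
    # Fallback method: the utility must be installed and discoverable.
    if shutil.which(utility):
        log.append((f"which {utility}", "SUCCESS"))
    else:
        # Hardware platform data is non-critical, so exhausting both methods is INCOMPLETE.
        log.append((f"which {utility}", "INCOMPLETE"))
    return log


if __name__ == "__main__":
    for operation, result in hardware_data_source():
        print(f"{result:<10} {operation}")
```

On a system where the sysfs directory is missing and dmidecode is not installed, the log ends with INCOMPLETE, matching the case described above.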

Common Issues Resulting In A FAIL Status

The following are commonly seen issues that result in an Overall Status of FAIL, and that may prevent collection from continuing on a device.

SSH

Command | df -P
Error | df: /some/mount/point: Stale file handle
Result | FAIL
Description | The df utility is used to collect the filesystems present on a Linux/UNIX device. In some cases, particularly related to NFS, the kernel thinks that a filesystem is mounted when it is not, or an NFS filesystem has been detached. According to the POSIX standards, the df utility will return a non-zero exit status on an error, which it does in this case. In order to ensure that all filesystems mounted on the system are properly accounted for, in the case that df returns an error it is considered a critical command failure.
Recommendation | Investigate the system in question to determine why df is unable to report on certain filesystems, and fully mount or unmount the problematic filesystem. It may also be possible to cause the system's cache of mounted filesystems to resync.
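The rule described here, that any non-zero exit status from df is treated as a critical failure even if some filesystems were listed, can be sketched as follows. This is an illustration only: evaluate_df is a hypothetical name, the inputs are assumed to have been captured already, and the parsing relies on the six-column POSIX output format of df -P with no spaces in the filesystem name.

```python
from typing import Dict, List, Union


def evaluate_df(returncode: int, stdout: str,
                stderr: str) -> Dict[str, Union[str, List[str]]]:
    """Classify a captured `df -P` run and list mount points on success."""
    if returncode != 0:
        # Any error is critical: partial output on STDOUT is deliberately discarded.
        return {"result": "FAIL", "detail": stderr.strip(), "mounts": []}
    mounts: List[str] = []
    for line in stdout.splitlines()[1:]:  # skip the header row
        # Split into at most six fields so mount points containing spaces stay intact.
        fields = line.split(None, 5)
        if len(fields) == 6:
            mounts.append(fields[5])
    return {"result": "SUCCESS", "detail": "", "mounts": mounts}
```

Passing the exit status 1 together with the STDERR text from the example above yields a FAIL result with the stale file handle message as the detail.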

Protocol Specific Troubleshooting

For specifics on troubleshooting certain devices, please proceed to the troubleshooting subsections of the following pages:

Windows Collection Module

SSH Collection Module

Discovery Help

If you have any questions or need further assistance that is not covered in the documentation, please open a ticket with our support team. When opening a ticket regarding devices not being discovered, the following information is required:

Windows Devices

From Host | Result | Command
Target Host | IP Information | ipconfig /all
Target Host | Firewall Information | Screenshot of Windows Firewall status
Target Host | WBEM Test local | How to Use wbemtest.exe to validate Windows Credentials (video); wbemtest.exe doc
Target Host | Service List | Screenshot of all services that are running (for 3rd party firewall check)
Source Host (some other box on the customer’s network) | IP Information (this box should be on the same subnet as the RN150) | ipconfig /all
Source Host | Ping to Target host | ping x.x.x.x
Source Host | WBEM Test Remote | How to Use wbemtest.exe to validate Windows Credentials (video); wbemtest.exe doc

Linux Devices

From Host | Result | Command
Target Host | IP Information | ifconfig -a
Target Host | Firewall Information | sudo iptables -L -n
Source Host (some other box on the customer’s network) | IP Information (this box should be on the same subnet as the RN150) | ifconfig -a
Source Host | Ping to Target host | ping x.x.x.x
Source Host | SNMP Walk of All MIBs (UCD) | snmpwalk -v2c -cstring x.x.x.x ucdavis
Source Host | SNMP Walk of All MIBs (UCD-DISKIO) | snmpwalk -v2c -cstring x.x.x.x ucddiskiomib
Source Host | SNMP Walk of All MIBs (HOST-RESOURCES) | snmpwalk -v2c -cstring x.x.x.x host
Source Host | SNMP Walk of All MIBs (TCP) | snmpwalk -v2c -cstring x.x.x.x tcp
Source Host | SNMP Walk of All MIBs (IF) | snmpwalk -v2c -cstring x.x.x.x interfaces
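To avoid retyping the five SNMP walks for each ticket, the commands can be generated from the target address and community string. The following is a small convenience sketch and not a supported tool: the function name is hypothetical, the subtree names are taken from the list above, and the address and community string in the example are placeholders.

```python
import shlex
from typing import Dict, List

# MIB label -> subtree argument, as listed above.
SUBTREES: Dict[str, str] = {
    "UCD": "ucdavis",
    "UCD-DISKIO": "ucddiskiomib",
    "HOST-RESOURCES": "host",
    "TCP": "tcp",
    "IF": "interfaces",
}


def snmpwalk_commands(host: str, community: str) -> Dict[str, List[str]]:
    """Build one snmpwalk argument list per MIB subtree for the given host."""
    return {
        label: ["snmpwalk", "-v2c", "-c", community, host, subtree]
        for label, subtree in SUBTREES.items()
    }


if __name__ == "__main__":
    # Placeholder address and community string; substitute real values.
    for label, argv in snmpwalk_commands("x.x.x.x", "string").items():
        print(f"{label}: {shlex.join(argv)}")
```

The argument lists can be passed to subprocess.run directly, or printed and pasted into a shell on the source host.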