> Monitoring and Troubleshooting on Tribblix

When you look at a system, the questions you ask tend to fall into several categories.

What have I got?

It helps to understand what you're looking at before you start. I'm just talking about the basics here: how much memory there is, how many CPUs and of what type, what disk and network devices are present, how the system is structured at a high level, and roughly what kinds of tasks it performs.

CPU

To get a listing of the CPUs present in your system, the command is

    /usr/sbin/psrinfo -vp

which will tell you how many CPUs you have, how many cores and threads each one has, and the model details of each CPU.

Memory

The easiest way to get the total memory is

    /usr/sbin/prtconf -m

which gives the total memory in megabytes. (This might be a few megabytes smaller than the marketing capacity, as the system can reserve memory.)
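If you'd rather see that figure in gigabytes, a small helper can do the conversion. This is a minimal sketch: the name mb_to_gib is made up for illustration, and it assumes the input is a plain number of megabytes, as prtconf -m is described as printing above.

```shell
# Convert a size in megabytes to gigabytes (1024 MB each), one decimal place.
# mb_to_gib is a hypothetical helper name, not a system command.
mb_to_gib() {
    echo "$1" | awk '{ printf "%.1f\n", $1 / 1024 }'
}

# On a live system (assumes prtconf -m prints only the number):
#   mb_to_gib "$(/usr/sbin/prtconf -m)"
```

Because the reported figure can be slightly below the nominal capacity, the result may come out a little under a round number.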

You can see details of which memory slots are occupied in the output from

    /usr/sbin/prtdiag

although the output can be erratic. Even more details of the components can be found by looking through the output of /usr/sbin/smbios. Optionally, the memconf tool may be installed, which can parse these information sources into more easily understood forms.

A partner to memory is swap space. You can get a summary with

    /usr/sbin/swap -sh

and a list of swap devices and their usage with

    /usr/sbin/swap -lh

Disk

There are a couple of ways to get information on the installed disk (storage) devices.

If you're root, the easiest way is to type

    diskinfo

and you'll get a list, with the properties of each disk.

If you're not root, then

    iostat -En

will give information on all storage devices known to the system. The weakness of this method is that the information is a little hard to read, and that it will include removable media devices, such as CD-ROM drives, whether or not there is media in the drive. (In some circumstances it will also report on phantom or removed devices, especially if connected to external storage arrays.)

There's then the question of what the disk devices are used for. With illumos, it's most likely to be ZFS. You can see the available ZFS pools with

    /usr/sbin/zpool list

and their status, along with how they're constructed from the available disk devices, with

    /usr/sbin/zpool status

You can also see how the storage pool is split up into file systems using

    /usr/sbin/zfs list

Generally, if you have problems, zpool status will tell you about hardware failures, while zpool list will show utilization. It's important to keep usage of a ZFS pool below about 90% capacity, as performance can decline sharply above that point.
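The 90% guideline is easy to check from a script. The following is a minimal sketch rather than a finished tool: flag_full_pools is a made-up name, and it assumes its input comes from zpool list with the -H (no headers) and -o name,capacity options, which print one pool per line with the capacity as a percentage.

```shell
# Print the name and capacity of each pool fuller than a threshold.
# Reads "name capacity%" lines on stdin; the threshold defaults to 90.
# flag_full_pools is a hypothetical helper name, not a system command.
flag_full_pools() {
    limit=${1:-90}
    while read -r name cap; do
        cap=${cap%\%}                   # strip the trailing percent sign
        case $cap in
            ''|*[!0-9]*) continue ;;    # skip non-numeric values such as "-"
        esac
        if [ "$cap" -gt "$limit" ]; then
            echo "$name $cap%"
        fi
    done
}

# On a live system:
#   /usr/sbin/zpool list -H -o name,capacity | flag_full_pools 90
```

No output means that no pool is above the threshold.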

Network

The simplest and most traditional way to see the state of the network is to use

    /usr/sbin/ifconfig

(Traditionally, ifconfig on its own produced a usage message, but now the default output is the same as that given by the -a flag.)

The two other tools available are dladm and ipadm, for working with network interfaces and configuring IP on those interfaces respectively.

Running these commands without arguments produces a usage message. To see what's going on, a variety of show- subcommands are available. For example

    /usr/sbin/dladm show-phys
    /usr/sbin/dladm show-aggr
    /usr/sbin/dladm show-vnic
    /usr/sbin/dladm show-etherstub
    /usr/sbin/dladm show-wifi

will show physical interfaces, link aggregations, virtual NICs, etherstubs (virtual switches), and wifi links. There are more objects that can be shown. Usually, if the system has none of the appropriate objects, nothing will be output. At the IP level, you'll probably want to start with

    /usr/sbin/ipadm show-if
    /usr/sbin/ipadm show-addr

To see how the system will route traffic, use

    netstat -nr

which will print the IPv4 and IPv6 routing tables.

Zones

You should also check which zones, if any, are present with

    /usr/sbin/zoneadm list -icv

The important point here is that certain resources (networking, ZFS datasets) may be delegated to zones, so those resources may not be fully visible in the global zone. Also, the workloads running in zones need to be taken into account.

Of course, one fundamental question is whether you're actually looking at a non-global zone. The zonename command will output the string "global" if you're in the global zone, or the name of the zone if you're in a non-global zone.
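In a script, that check might look something like the sketch below. The function name zone_kind is made up for illustration. It takes the zone name as an argument, so the only assumption about the system is that zonename prints "global" in the global zone, as described above.

```shell
# Describe the kind of zone, given the output of zonename as $1.
# zone_kind is a hypothetical helper name, not a system command.
zone_kind() {
    if [ "$1" = "global" ]; then
        echo "global zone"
    else
        echo "non-global zone: $1"
    fi
}

# On a live system:
#   zone_kind "$(zonename)"
```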

Processes

A general view of what's actually running can be given by

    top

which will list the running processes and how much memory and CPU each is using. For an alternative view, try htop.

While the load average isn't a terribly accurate measure of the load on a system, a load average in excess of the number of CPUs in the system likely indicates that the system is overloaded.
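That rule of thumb can be turned into a quick check. This is only a sketch: load_exceeds_cpus is a made-up name, and the commented lines that gather the inputs assume that psrinfo with no arguments prints one line per virtual processor and that uptime reports its figures after the text "load average:".

```shell
# Succeed (exit status 0) when the load average exceeds the CPU count.
# $1 = load average (for example 4.52), $2 = number of CPUs
# load_exceeds_cpus is a hypothetical helper name, not a system command.
load_exceeds_cpus() {
    printf '%s %s\n' "$1" "$2" | awk '{ exit !($1 > $2) }'
}

# On a live system the inputs might be gathered like this (an assumption,
# as the uptime output format can vary):
#   ncpu=$(/usr/sbin/psrinfo | wc -l)
#   load=$(uptime | sed -e 's/.*load average: //' -e 's/,.*//')
#   load_exceeds_cpus "$load" "$ncpu" && echo "load $load exceeds $ncpu CPUs"
```

Given how rough the load average is as a measure, treat a positive result as a prompt to look more closely with top, not as a verdict.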

