When designing the graphical interface of the WATCHDOG system,
we treated visual monitoring as one of the most important aspects of an IT operations monitoring system.
The interface must be user-friendly and professional, easy to understand for experts and non-experts alike,
while also meeting professional engineers' needs for advanced debugging and tracing【Debug】.
Integrating and correlating information across all IT equipment is a basic requirement.
WATCHDOG System - Graphical Monitoring Classified by Use:
➢ Real-time Information War Room Monitoring Charts
➢ Real-time Status Charts for Each Monitoring Category
➢ Traffic Analysis Charts
➢ Performance Analysis Charts
➢ Data Analysis Charts
➢ Period-over-Period (Sequential) Comparison Charts
➢ Ranking Statistical Analysis Charts
➢ Alarm Analysis Charts
➢ Long-term Data Lists
➢ Ranking Analysis Charts
➢ Text-Based Detail Lists
The following sections introduce the various graphical interfaces of the Watchdog system, covering key screens from the central control center and the alarm center to the individual detection screens.
WATCHDOG System - IT Operations Center Homepage
The operations center homepage is designed as a one-stop overview platform that summarizes overall system status,
including the total number of detection items, the number of current alarm events, and multi-dimensional statistical analysis charts.
It also provides quick links to each detection device,
so users can drill down for in-depth inspection or management, supporting real-time system monitoring and data analysis.
Server Host - Alarm Ranking Analysis Chart - Top 10 Servers in the Last Three Months
Designed specifically for server hosts, this chart ranks hosts by IP address according to the number of alarms each triggered in the past three months.
The system also provides annual alarm analysis charts that support long-term trend monitoring and evaluation, helping to optimize server operation and maintenance strategies.
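A ranking like this is essentially a filtered count followed by a sort. The sketch below is a minimal illustration, not WATCHDOG's implementation: it assumes alarms are available as hypothetical `(ip, timestamp)` pairs and approximates "three months" as 90 days; passing `days=365` gives the annual view.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_alarm_hosts(alarms, now, days=90, limit=10):
    """Rank host IPs by alarm count within the last `days` days.

    `alarms` is an iterable of (ip, timestamp) pairs; this record shape is a
    hypothetical stand-in for the system's alarm log.
    """
    cutoff = now - timedelta(days=days)
    counts = Counter(ip for ip, ts in alarms if cutoff <= ts <= now)
    # most_common() sorts by count, highest first; ties keep first-seen order.
    return counts.most_common(limit)


if __name__ == "__main__":
    now = datetime(2024, 6, 30)
    sample = [
        ("10.0.0.1", datetime(2024, 6, 1)),
        ("10.0.0.2", datetime(2024, 5, 20)),
        ("10.0.0.1", datetime(2024, 4, 15)),
        ("10.0.0.3", datetime(2023, 12, 1)),  # outside the 90-day window
    ]
    print(top_alarm_hosts(sample, now))
```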
Alarm Center
Designed specifically for real-time alarm monitoring and management, the Alarm Center gives users a centralized view for monitoring and tracking triggered alarm events.
It links directly to the detection page of the related device and lets users view memo data and details for each alarm event,
including occurrence time, trigger reason, and other key data, to support efficient fault diagnosis and decision-making.
Monitoring Categories (Major Items)
Each icon represents an independent monitoring item, and color coding shows its health status.
When all sub-items under a monitoring item are operating normally, the interface shows a green indicator, meaning the item is stable;
if any sub-item triggers an alarm, the indicator immediately turns red, alerting managers to a potential problem.
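The green/red rule is an "any alarm wins" roll-up. A minimal sketch follows, assuming each monitoring item is represented as a hypothetical mapping from sub-item name to an "in alarm" flag; the names and structure are illustrative only.

```python
from enum import Enum


class Light(Enum):
    GREEN = "green"  # every sub-item is normal
    RED = "red"      # at least one sub-item is in alarm


def rollup(sub_item_alarms):
    """Return the indicator for one monitoring item.

    `sub_item_alarms` maps sub-item name -> True if that sub-item is in alarm.
    An item with no sub-items has nothing in alarm, so it shows GREEN.
    """
    return Light.RED if any(sub_item_alarms.values()) else Light.GREEN


def dashboard(items):
    """Compute the indicator for every monitoring item (item -> sub-items)."""
    return {name: rollup(subs) for name, subs in items.items()}


if __name__ == "__main__":
    items = {
        "Server Host": {"cpu": False, "disk": True},
        "UPS": {"battery": False},
    }
    for name, light in dashboard(items).items():
        print(name, light.value)
```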
Server Host - Overall
On the server host monitoring homepage, the display can be switched among three sizes (small, medium, and large) to suit different monitoring needs. In large mode,
each host can display about 30 different status items.
VMHost - Virtual Host
For VMHost (ESXi) virtual host environments, a dedicated monitoring page is provided for in-depth monitoring and analysis.
It covers key information such as the list of virtual guests (Guest hosts) and their resource allocation.
Server Host - Detection Items
The server host monitoring architecture is divided into three layers to provide comprehensive detection coverage:
➢Hardware layer: hardware indicators and management interfaces such as IPMI and iLO, as well as RAID status, temperature, power, and fan conditions.
➢Operating System layer: monitoring of core system resources, including CPU usage, memory allocation, etc.
➢Application System layer: the operating status of application programs, event alarms, and data collection, supporting high-level application performance monitoring and analysis.
Network Equipment
This category includes switches (Switch), firewalls (Firewall), and load balancers (Load Balancer).
Through the graphical interface, managers can monitor connection status in real time and obtain detailed information.
Switch monitoring receives particular emphasis,
with visual tools such as switch topology maps, traffic load ratios, and Core Switch topology maps that support comprehensive evaluation and management of network status.
For switches, the system also displays key indicators including VLAN distribution charts, port speed distribution charts, and port status charts.
If the switch exposes hardware information such as voltage, fan, and temperature via SNMP, that information is displayed as well.
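The port speed and port status charts boil down to counting ports per value. The sketch below assumes per-port values have already been polled via SNMP, using the standard IF-MIB objects `ifHighSpeed` (in Mbit/s) and `ifOperStatus`; the per-port dictionary keys, including `vlan` (for example a port VLAN ID from a bridge MIB), are hypothetical.

```python
from collections import Counter

# ifOperStatus enumeration as defined in IF-MIB (RFC 2863).
OPER_STATUS = {
    1: "up", 2: "down", 3: "testing", 4: "unknown",
    5: "dormant", 6: "notPresent", 7: "lowerLayerDown",
}


def port_distributions(ports):
    """Count ports by speed, operational status, and VLAN.

    `ports` is a list of dicts with the hypothetical keys 'speed_mbps'
    (ifHighSpeed value), 'oper_status' (ifOperStatus integer), and 'vlan'.
    """
    speed = Counter(p["speed_mbps"] for p in ports)
    status = Counter(OPER_STATUS.get(p["oper_status"], "invalid") for p in ports)
    vlan = Counter(p["vlan"] for p in ports)
    return speed, status, vlan


if __name__ == "__main__":
    ports = [
        {"speed_mbps": 1000, "oper_status": 1, "vlan": 10},
        {"speed_mbps": 1000, "oper_status": 2, "vlan": 10},
        {"speed_mbps": 10000, "oper_status": 1, "vlan": 20},
    ]
    for chart in port_distributions(ports):
        print(dict(chart))
```

Each returned `Counter` maps directly onto one pie or bar chart.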
Server Hardware - IPMI
Covers key monitoring indicators such as temperature sensors, fan speeds, voltage and current, and power supply status.
Supported management controllers include HP's iLO (Integrated Lights-Out), IBM's IMM (Integrated Management Module), and Dell's iDRAC (Integrated Dell Remote Access Controller).
By integrating IPMI into monitoring, managers can not only detect and handle hardware failures in real time,
but also issue power on/off commands to server hosts through IPMI from within the WATCHDOG system; combined with the【command gateway】function, IPMI also enables【one-click shutdown】of hundreds of hosts.
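A batch power operation over IPMI can be approximated with the standard `ipmitool` CLI. The sketch below only builds one command per BMC address (a dry run) and is not how WATCHDOG's【command gateway】works internally. The BMC addresses and user name are hypothetical. The `-E` option makes `ipmitool` read the password from the `IPMI_PASSWORD` environment variable, so the password never appears on the command line.

```python
import shlex

# Power actions accepted by `ipmitool chassis power`.
POWER_ACTIONS = {"status", "on", "off", "cycle", "reset", "soft"}


def power_commands(bmc_hosts, user, action="off"):
    """Build one ipmitool command (argument list) per BMC; nothing is executed."""
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    return [
        ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E",
         "chassis", "power", action]
        for host in bmc_hosts
    ]


def preview(commands):
    """Render commands as shell-quoted strings for operator review."""
    return [shlex.join(cmd) for cmd in commands]


if __name__ == "__main__":
    # Hypothetical BMC addresses and account name.
    cmds = power_commands(["10.0.1.5", "10.0.1.6"], user="opsadmin")
    print("\n".join(preview(cmds)))
```

After review, each list could be passed to `subprocess.run(cmd, check=True)`. Keeping build and execute separate gives operators a chance to confirm the target list before shutting down hundreds of hosts.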
Packet Test
Packet testing measures the reliability and efficiency of network connections.
It focuses on two core indicators:
➢Packet loss percentage (loss)
➢Packet response time (time)
It automates the【routine ping operations performed by system administrators】
(i.e., sending packets to test the response of network devices), turning them into regular automatic checks.
In addition, with network path tracing, packet testing can pinpoint where a connection fails, enabling quick and effective problem resolution and keeping the network running stably.
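The two indicators follow directly from one round of probes. The minimal sketch below records each sent packet as its round-trip time in milliseconds, or `None` if no reply arrived. The alarm thresholds are hypothetical placeholders, not WATCHDOG defaults.

```python
from statistics import mean

# Hypothetical alarm thresholds; real limits would be configured per device.
MAX_LOSS_PERCENT = 20.0
MAX_AVG_MS = 200.0


def packet_stats(rtts_ms):
    """Return (loss percentage, average response time in ms) for one round.

    `rtts_ms` has one entry per packet sent: a round-trip time in ms, or
    None when the packet was lost. The average is None if every packet was lost.
    """
    if not rtts_ms:
        raise ValueError("no packets sent")
    replies = [t for t in rtts_ms if t is not None]
    loss = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    avg = mean(replies) if replies else None
    return loss, avg


def should_alarm(loss, avg_ms, max_loss=MAX_LOSS_PERCENT, max_avg_ms=MAX_AVG_MS):
    """Alarm on excessive packet loss, or when replies arrive too slowly."""
    if loss > max_loss:
        return True
    return avg_ms is not None and avg_ms > max_avg_ms


if __name__ == "__main__":
    loss, avg = packet_stats([1.0, None, 3.0, None])
    print(f"loss={loss}% time={avg}ms alarm={should_alarm(loss, avg)}")
```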
IP Ports
Regularly performs socket connection checks against devices that provide network services, to confirm whether the host's network service programs or connections have been interrupted,
and checks whether any service waiting for connections is in an unauthorized (illegal) Listening state.
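Both checks can be sketched with Python's standard `socket` module: a TCP connect attempt tells whether a service is reachable, and a set difference against an approved port list flags unexpected listeners. This is an illustrative sketch only. How the list of listening ports is collected from a host (agent, `ss`/`netstat`, SNMP) is left as an assumption.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and DNS failures are all OSError subclasses.
        return False


def unexpected_listeners(listening_ports, allowed_ports):
    """Ports observed in the listening state that are not on the approved list."""
    return sorted(set(listening_ports) - set(allowed_ports))


if __name__ == "__main__":
    print(port_is_open("127.0.0.1", 22, timeout=1.0))
    print(unexpected_listeners([22, 80, 4444], allowed_ports=[22, 80]))
```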
Website Detection
Website detection uses periodic sampling, collecting data from up to 10 web pages per website,
to verify that every node along the network connection is working and that the website is operating normally.
Each check covers network connectivity and DNS resolution; the status of the WEB Server service (e.g., Apache or IIS) and of middleware such as Java AP;
the backend database connection or system status; the time taken to retrieve each page; and a comparison of page content to confirm that the pages have not been tampered with.
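Response time and tamper detection can be illustrated with the standard library alone. In the sketch below, a page is fetched and timed, and a SHA-256 fingerprint of its body is compared with a previously recorded baseline. The baseline-hash approach is an assumption chosen for illustration, not necessarily how WATCHDOG compares content.

```python
import hashlib
import time
import urllib.request


def fetch_page(url, timeout=10.0):
    """Fetch a page and return (body bytes, elapsed seconds, HTTP status).

    The elapsed time covers connecting, sending the request, and reading the body.
    """
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read()
        status = resp.status
    return body, time.perf_counter() - start, status


def fingerprint(body):
    """SHA-256 hex digest of the raw page bytes."""
    return hashlib.sha256(body).hexdigest()


def is_tampered(body, baseline_hash):
    """True if the page content no longer matches the recorded baseline."""
    return fingerprint(body) != baseline_hash


if __name__ == "__main__":
    baseline = fingerprint(b"<html>ok</html>")
    print(is_tampered(b"<html>ok</html>", baseline))
    print(is_tampered(b"<html>defaced</html>", baseline))
```

An exact hash suits static pages. Pages that embed timestamps, session tokens, or rotating content would need the volatile parts stripped before hashing, or a comparison limited to selected regions.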
Uninterruptible Power Supply - UPS
【Power Equipment-UPS】is an auxiliary monitoring function for the computer room environment,
and it plays an indispensable role in maintaining the information center's uninterruptible power supply system (UPS).
The Watchdog IT operations monitoring system focuses not only on the daily monitoring of information equipment
but also on power equipment as part of environmental monitoring, ensuring continuous operation and optimal overall performance.
Network Cabling
The network cabling diagram shows all nodes between two network devices. The system pings, one by one, each node in the diagram that has not yet been ruled out,
to locate the faulty network device.
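The node-by-node search can be sketched as a walk along the ordered path that stops at the first node that does not answer. The fault then lies at that node or on the link leading to it. The probe is passed in as a function so it can be tested without a network. The `ping_probe` helper is a hypothetical example that assumes Linux iputils `ping`, where `-c` is the packet count and `-W` the reply timeout in seconds. Other platforms use different flags.

```python
import subprocess


def ping_probe(node):
    """Hypothetical probe: one ICMP echo via Linux iputils ping (-c count, -W timeout in s)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", node],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_unreachable(path, probe, excluded=()):
    """Return the first node on `path` that fails `probe`, or None if all answer.

    path: node addresses in order from one device to the other.
    probe: callable(node) -> bool, e.g. ping_probe.
    excluded: nodes already ruled out, which are skipped.
    """
    skip = set(excluded)
    for node in path:
        if node in skip:
            continue
        if not probe(node):
            return node
    return None


if __name__ == "__main__":
    # Hypothetical node names; a fake probe stands in for real pings.
    path = ["sw-core", "sw-floor3", "ap-301", "pc-12"]
    reachable = {"sw-core", "sw-floor3"}
    print(first_unreachable(path, lambda n: n in reachable))
```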
Temperature/Humidity Sensors
In the Watchdog IT operations monitoring system, temperature and humidity sensors are not a primary monitoring item,
but Watchdog still provides auxiliary monitoring for them.
Other detection items are also available; refer to each detection function item for more details.