Quick Start: Installing and Configuring STIMS Disk Insider
This quick-start guide walks you through installing and configuring STIMS Disk Insider so you can start collecting and analyzing disk and telemetry data.
Before you begin
- System requirements: 64-bit Windows Server or Linux (a recent LTS release), 8+ GB RAM, 50+ GB free disk, network access to target devices.
- Permissions: Administrator/root access on each host where you’ll install the server or an agent, plus access credentials for any devices you’ll monitor.
- Network: Ensure required ports are open between the Disk Insider server and endpoints (default ports: 443 for HTTPS, plus any custom management ports).
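On a Linux host, the RAM and disk minimums above can be checked before installing. The following is a minimal sketch, not a vendor tool: the function names are illustrative, the 8 GB and 50 GB values come from the requirements list, and the default target directory is an assumption you should replace with the path where you plan to store data.

```bash
# Preflight sketch for a Linux host. Function names are illustrative only.
MIN_RAM_GB=8     # from the system requirements above
MIN_DISK_GB=50   # from the system requirements above

# Total RAM in GB, rounded to the nearest integer
# (a machine sold as "8 GB" reports slightly less in /proc/meminfo).
mem_total_gb() {
  awk '/^MemTotal:/ { printf "%.0f\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Free space, in whole GB, on the filesystem that holds the given path.
free_disk_gb() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Succeeds when the observed value is at least the required value.
meets_minimum() {
  [ "$1" -ge "$2" ]
}

# Runs both checks; the default directory is an assumption, not a product path.
preflight() {
  target_dir=${1:-/var/lib}
  rc=0
  meets_minimum "$(mem_total_gb)" "$MIN_RAM_GB" \
    || { echo "RAM is below ${MIN_RAM_GB} GB" >&2; rc=1; }
  meets_minimum "$(free_disk_gb "$target_dir")" "$MIN_DISK_GB" \
    || { echo "Free disk under $target_dir is below ${MIN_DISK_GB} GB" >&2; rc=1; }
  return "$rc"
}
```

Run `preflight /path/to/data/dir` and check the exit status; a non-zero status means at least one minimum was not met.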
1. Download the installer
- Obtain the latest STIMS Disk Insider installer package for your OS from your vendor portal or internal distribution.
- Verify the package checksum (SHA256) matches the provided checksum to ensure integrity.
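The checksum comparison can be scripted so that a mismatch stops an automated install. This is a small sketch using the standard `sha256sum` tool (falling back to `shasum -a 256` where that is what the system provides); the function name `verify_sha256` is illustrative, and the expected value must come from wherever your vendor publishes it.

```bash
# verify_sha256 FILE EXPECTED_HEX
# Succeeds only when the file's SHA-256 digest equals the expected value.
verify_sha256() {
  file=$1
  # Normalize the expected digest to lowercase before comparing.
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum < "$file" | awk '{ print $1 }')
  else
    actual=$(shasum -a 256 < "$file" | awk '{ print $1 }')
  fi
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}

# Example (paste the published checksum in place of the variable):
#   verify_sha256 stimsdiskinsider-server.deb "$PUBLISHED_SHA256" || exit 1
```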
2. Install the server component (central collector)
Linux (systemd example):
```bash
sudo dpkg -i stimsdiskinsider-server.deb      # Debian/Ubuntu
# or
sudo rpm -ivh stimsdiskinsider-server-.rpm    # RHEL/CentOS
sudo systemctl enable --now stimsdiskinsider
```
Windows:
- Run the MSI as Administrator and follow the installer prompts.
- Allow the service through Windows Firewall when prompted.
3. Configure basic server settings
- Open the web console at https://:443 (or the configured management port).
- Set an administrative password and configure TLS with your organization’s certificate (recommended) or use the provided self-signed cert for testing.
- Configure storage paths and retention policy: set local cache directory and how long raw telemetry is kept (e.g., 90 days).
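Before you settle on a retention window, a rough estimate of raw telemetry volume helps you size the storage path. The sketch below is back-of-the-envelope arithmetic only: the function name is illustrative, and the bytes-per-sample figure is a placeholder assumption, not a value measured from STIMS Disk Insider. Measure your own fleet and substitute the real number.

```bash
# estimate_storage_gib NODES INTERVAL_S RETENTION_DAYS BYTES_PER_SAMPLE
# Prints the estimated raw telemetry volume in GiB, rounded up.
estimate_storage_gib() {
  nodes=$1
  interval_s=$2
  days=$3
  bytes_per_sample=$4   # assumption: supply a figure measured on your fleet
  samples_per_day=$(( 86400 / interval_s ))
  total_bytes=$(( nodes * samples_per_day * days * bytes_per_sample ))
  gib=$(( 1024 * 1024 * 1024 ))
  echo $(( (total_bytes + gib - 1) / gib ))
}

# 500 nodes, 60 s interval, 90 days, 512 bytes per sample (placeholder):
#   estimate_storage_gib 500 60 90 512
```

With those placeholder inputs the estimate is about 31 GiB. Raising the interval to 300 s cuts the volume to roughly a fifth, which is why the sampling interval and retention window are best tuned together.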
4. Install endpoint agents
Linux agent example:
```bash
sudo dpkg -i stimsdiskinsider-agent*.deb
sudo systemctl enable --now stimsdiskinsider-agent
```
Windows agent:
- Run agent MSI as Administrator.
- During agent setup, point the agent to the server hostname/IP and provide the enrollment token generated in the server console.
5. Enroll and verify endpoints
- In the server web console, open Devices → Enrollment to generate tokens or invite links.
- Apply a token during agent install or paste it in the agent UI.
- Verify each endpoint shows as “Online” in the Devices list and that basic health metrics are populated (disk usage, IO rates, SMART status).
6. Configure collection policies
- Navigate to Policies → Collection.
- Create a policy that selects target device groups and configures:
  - Sampling interval (e.g., 60s for high-frequency telemetry).
  - Types of data collected: SMART attributes, disk read/write stats, partition maps, error logs.
  - Local retention before upload (helps for intermittent connectivity).
- Apply the policy to device groups.
7. Alerts and notifications
- Go to Alerts → Create Alert and define conditions (e.g., SMART attribute threshold, disk latency > 50ms, available space < 10%).
- Configure notification channels: email, Slack, or webhook. Test notifications to ensure delivery.
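To make the example conditions concrete, here is a small sketch of how the latency and free-space thresholds combine. It is not the product's rule engine: the function names are illustrative, the thresholds are the example values from the step above, and the webhook URL in the commented usage is hypothetical.

```bash
# evaluate_disk_alert LATENCY_MS FREE_PCT
# Prints "ok", "latency", "space", or "latency,space".
evaluate_disk_alert() {
  latency_ms=$1
  free_pct=$2
  status="ok"
  # Example threshold from the text: disk latency > 50 ms.
  if [ "$latency_ms" -gt 50 ]; then
    status="latency"
  fi
  # Example threshold from the text: available space < 10%.
  if [ "$free_pct" -lt 10 ]; then
    if [ "$status" = "ok" ]; then status="space"; else status="$status,space"; fi
  fi
  printf '%s\n' "$status"
}

# Percentage of space still available on the filesystem holding PATH.
free_pct() {
  df -Pk "$1" | awk 'NR == 2 {
    total = $3 + $4
    if (total > 0) printf "%d\n", ($4 * 100) / total; else print 0
  }'
}

# Example: test a webhook channel by hand (WEBHOOK_URL is hypothetical):
#   status=$(evaluate_disk_alert 72 "$(free_pct /var)")
#   [ "$status" = "ok" ] || curl -fsS -X POST -H 'Content-Type: application/json' \
#     -d "{\"text\": \"disk alert: $status\"}" "$WEBHOOK_URL"
```

Both comparisons are strict, so a latency of exactly 50 ms or free space of exactly 10% does not trigger an alert.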
8. Dashboards and reporting
- Use the built-in dashboards to view:
  - Fleet health overview (percent with warnings, top failing disks).
  - Per-device timelines of I/O, latency, and error counts.
- Create a scheduled report (weekly or daily) summarizing critical alerts and disk replacements needed.
9. Security hardening (recommended)
- Replace any default credentials immediately.
- Enable role-based access control (RBAC) and create least-privilege accounts for operators.
- Enforce TLS for agent-server communication and rotate enrollment tokens periodically.
- Limit network access to the management interface by IP allow-listing.
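If the server runs on a Linux host that uses `ufw`, IP allow-listing can be done at the host firewall. The sketch below only prints the commands so you can review them before running anything; the function name and the example CIDR ranges are assumptions. Note that if agents reach the server on the same port as the management interface, their subnets must be in the allow list too, or they will be cut off.

```bash
# allowlist_rules PORT CIDR [CIDR...]
# Prints ufw commands: one allow rule per source range, then a deny for
# everything else on that port. ufw uses the first matching rule, so the
# allow rules must be added before the deny.
allowlist_rules() {
  port=$1
  shift
  for cidr in "$@"; do
    echo "ufw allow from $cidr to any port $port proto tcp"
  done
  echo "ufw deny $port/tcp"
}

# Review the output, then run each line with sudo
# (the address ranges here are examples only):
#   allowlist_rules 443 10.0.0.0/24 192.168.1.10
```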
10. Troubleshooting checklist
- Agent not connecting: verify DNS, network ports, and that the enrollment token matches; check agent logs at /var/log/stimsdiskinsider/agent.log or Windows Event Viewer.
- Missing metrics: confirm collection policy applies to the device and sampling interval is appropriate.
- High disk usage on server: review retention and increase storage or reduce retention window.
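When an agent will not connect, scanning its log for likely failure keywords is a quick first pass. The sketch below uses the Linux log path given above; the keyword pattern and the function names are assumptions, because the agent's actual log wording is not documented here. Adjust the pattern once you have seen real entries.

```bash
AGENT_LOG=/var/log/stimsdiskinsider/agent.log   # path from the checklist above
ERROR_PATTERN='error|timeout|refused|token'     # assumed keywords, adjust as needed

# count_agent_errors [LOGFILE]
# Prints the number of log lines that match the pattern (case-insensitive).
count_agent_errors() {
  log=${1:-$AGENT_LOG}
  grep -Eic "$ERROR_PATTERN" "$log"
}

# recent_agent_errors [LOGFILE] [LINES]
# Prints the most recent matching lines (default: the last 20).
recent_agent_errors() {
  log=${1:-$AGENT_LOG}
  grep -Ei "$ERROR_PATTERN" "$log" | tail -n "${2:-20}"
}

# Example:
#   sudo sh -c '. ./agent-log-helpers.sh; recent_agent_errors'
# (agent-log-helpers.sh is a hypothetical file holding these functions)
```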
Quick operational tips
- Start with a conservative sampling interval (e.g., 60–300s) and shorten it only where higher resolution is required.
- Group devices by role (production, test, archival) to apply different retention and alert rules.
- Schedule maintenance windows to suppress non-actionable alerts during planned work.
Next steps
- Add integrations (ticketing, CMDB) so alerts create automated work items.
- Pilot the system on a subset of devices for 1–2 weeks, then scale across your fleet after tuning policies.