Frequently Asked Questions
Virtualisation Specific Examples
Below are genuine Proxmox incidents that would be escalated to a third‑line technical support provider. Use these as templates when opening tickets – the clearer the description, the faster the resolution.
1. Cluster node loses quorum and VMs fail to start
Symptoms
- The cluster GUI shows “Node X is down”.
- All VMs on the affected node report “Failed to start – no suitable host”.
- `pvecm status` returns “Quorum: 0/3”.
Ticket description
“Cluster of three Proxmox VE 8 nodes (node01, node02, node03) entered split‑brain. Node01 lost connectivity to the shared storage (Ceph RBD pool) and the cluster quorum dropped to 0. All VMs on node01 stopped with ‘no suitable host’ errors. Request assistance to restore quorum and bring VMs back online.”
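A short command transcript strengthens this ticket. A minimal sketch, assuming standard Proxmox VE tooling (`pvecm`, `qm`) and a hypothetical VMID:

```shell
# Check cluster membership and quorum state on each node
pvecm status                                      # quorum, votes, membership
journalctl -u corosync -b --no-pager | tail -50   # recent corosync messages

# Last-resort workaround on an isolated node (risk of split-brain; only
# with third-line guidance): lower the expected vote count so the node
# regains quorum and VMs can be started manually.
pvecm expected 1
qm start 100        # hypothetical VMID; repeat for each affected VM
```

Attaching the `pvecm status` output from every node lets the engineer see which side of the partition holds quorum.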
2. VM hangs on suspend/resume after a host reboot
Symptoms
- VM defined with “Suspend when host shuts down” enabled.
- After a scheduled host reboot, the VM remains stuck at “paused” and cannot be resumed.
- `vzdump` reports the VM’s disk as “locked”.
Ticket description
“VM 101 (Debian 12) configured with ‘Suspend on host shutdown’ hangs on resume after host reboot. The VM shows status ‘paused’ and cannot be started or resumed. Disk appears locked. Need assistance to recover the VM and prevent recurrence.”
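A hedged sketch of the usual recovery path, assuming the lock is stale (i.e., no backup or migration task is still running against the VM):

```shell
# Inspect the VM state and any lock flag recorded in its config
qm status 101 --verbose | grep -Ei 'status|lock'
qm config 101 | grep -i lock

# Clear the stale lock, then resume (or start) the VM
qm unlock 101
qm resume 101       # fall back to: qm start 101
```

Include the outputs of both inspection commands in the ticket even if the unlock succeeds, so the cause of the stale lock can be investigated.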
3. Unexpected VM panic with no clear log entry
Symptoms
- VM abruptly powers off with “Kernel panic - not syncing” on console.
- No corresponding entry in `/var/log/syslog` on the host.
- Host logs only show “VM 202: power state changed to off”.
Ticket description
“VM 202 (Windows Server 2022) experienced an unexplained kernel panic; the VM powered off without any panic message in the host logs. Request investigation into possible hardware or configuration cause and guidance on reproducing the issue.”
4. Backup job fails with “No space left on device” despite sufficient storage
Symptoms
- Proxmox backup job (`vzdump`) exits with error “No space left on device”.
- `df -h` on the backup storage shows more than 30 % free.
- The backup repository is a ZFS dataset with compression enabled.
Ticket description
“Scheduled backup of VM 305 to ZFS repository ‘rpool/backups’ fails with ‘No space left on device’ even though df -h shows ample free space. Need troubleshooting of ZFS quota/compression settings and resolution.”
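On ZFS, “No space left on device” frequently comes from a dataset quota, reservation, or snapshot usage rather than a full pool, which is why `df -h` looks healthy. A diagnostic sketch (dataset name taken from the ticket):

```shell
zpool list rpool                                   # real pool capacity
zfs list -o space rpool/backups                    # usage incl. snapshots/children
zfs get quota,refquota,reservation,compression rpool/backups

# If a quota is the culprit, raising it (example value) unblocks vzdump:
# zfs set quota=500G rpool/backups
```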
5. HA resource group fails to migrate due to “Node not authorized”
Symptoms
- Attempt to move a VM from node01 to node02 via HA group results in error “Node not authorized”.
- `pvecm add <node2-IP>` was run, but the token expired.
- Cluster config shows a mismatched `cluster_name`.
Ticket description
“HA group migration from node01 to node02 fails with ‘Node not authorized’. The cluster token on node02 expired after a recent network outage, causing a mismatch in cluster_name. Assistance required to re‑establish token and complete migration.”
6. Container fails to start after host kernel upgrade
Symptoms
- LXC container 101 crashes immediately after host kernel update to 6.5.
- `lxc-start -n 101` returns “Failed to mount cgroup2”.
- `dmesg` shows “cgroup2: unknown filesystem type ‘cgroup2’”.
Ticket description
“LXC container 101 fails to start after upgrading Proxmox VE kernel to 6.5. The container crashes with ‘Failed to mount cgroup2’ and dmesg reports unknown cgroup2 filesystem. Need assistance to restore container operation post‑upgrade.”
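Before escalating, it helps to confirm whether the unified cgroup hierarchy is actually mounted on the upgraded host. A minimal check, assuming standard Proxmox `pct` tooling:

```shell
mount | grep cgroup
stat -fc %T /sys/fs/cgroup      # expect "cgroup2fs" on a unified-hierarchy host

# Start the container in debug mode and capture the full failure output
pct start 101 --debug
```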
7. VM network bridge loses connectivity after NIC teaming change
Symptoms
- Physical NIC eno1 is part of a bonding interface (mode 802.3ad).
- After adding eno2 to the bond, VMs on bridge `vmbr0` cannot obtain DHCP leases.
- `bridge link` shows the bridge still up, but `tcpdump` on eno2 shows no traffic.
Ticket description
“After adding NIC eno2 to an existing 802.3ad bond (eno1+eno2) for increased bandwidth, VMs on bridge vmbr0 lose network connectivity and cannot obtain DHCP leases. Bridge appears up but traffic on eno2 is not forwarded. Assistance required to correct bonding configuration and restore VM networking.”
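For reference, a working 802.3ad bond-plus-bridge stanza for `/etc/network/interfaces` looks roughly like the sketch below (addresses and interface names are placeholders). A matching LACP configuration on the switch ports is also required; its absence produces exactly this “bridge up but no traffic” symptom.

```
auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-mode 802.3ad
    bond-miimon 100
    bond-xmit-hash-policy layer2+3

auto vmbr0
iface vmbr0 inet static
    address 192.0.2.10/24
    gateway 192.0.2.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
```

Attaching your actual stanza alongside `cat /proc/net/bonding/bond0` output lets the engineer verify that both bond members negotiated LACP.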
8. Proxmox VE fails to boot after ZFS pool import error
Symptoms
- System hangs at “Importing ZFS pool ‘rpool’… failed”.
- Boot proceeds to initramfs prompt.
- `zpool status` shows the pool offline due to a missing device.
Ticket description
“Proxmox VE 8 server does not boot; ZFS pool ‘rpool’ fails to import with ‘device not found’. System stops at initramfs prompt. Need help diagnosing missing ZFS device and restoring boot.”
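From the initramfs prompt, a few commands narrow down whether the device is truly missing or merely renamed. A cautious sketch:

```shell
zpool import                 # lists importable pools and flags missing devices
ls -l /dev/disk/by-id/       # is the expected disk visible under its stable name?

# If the pool can be imported degraded, import without mounting and resume boot
zpool import -N rpool
exit
```

Copy the `zpool import` listing into the ticket verbatim; it shows exactly which device is absent.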
9. High availability failover takes >30 minutes, causing extended downtime
Symptoms
- Primary node experiences network partition; HA group should failover to secondary.
- Monitoring shows VMs remain down for >30 minutes before secondary takes over.
- The HA manager logs indicate “resource lock timeout”.
Ticket description
“During a network partition, HA failover from primary to secondary node took over 30 minutes, resulting in prolonged VM downtime. The HA manager logs show ‘resource lock timeout’. Assistance required to tune HA parameters and reduce failover time.”
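When gathering evidence for this ticket, the HA state and logs can be captured as follows (paths and unit names per current Proxmox VE; adjust if your version differs):

```shell
ha-manager status                                     # resource and node states
cat /etc/pve/ha/resources.cfg /etc/pve/ha/groups.cfg  # HA configuration

# HA manager activity around the partition window
journalctl -u pve-ha-crm -u pve-ha-lrm --since "-1 hour"
```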
10. VM snapshot fails with “Permission denied” on read‑only disk
Symptoms
- Attempt to take a snapshot of VM 402 fails with error “Permission denied (read-only file system)”.
- Disk is marked as “readonly” in the VM config.
- Host file system shows no errors.
Ticket description
“Snapshot operation on VM 402 fails with ‘Permission denied (read-only file system)’ despite the VM’s disk being configured as read‑only. Need guidance on correcting the permission issue or alternative method to snapshot the VM.”
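A short evidence-gathering sketch for the snapshot failure (VMID from the ticket; `pre-change` is an arbitrary snapshot name):

```shell
qm config 402          # check disk entries for read-only/snapshot-related options
pvesm status           # state of the underlying storage backends

# Retry and capture the complete error text for the ticket
qm snapshot 402 pre-change 2>&1 | tee /tmp/vm402-snapshot.log
```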
Virtualisation Specific Examples
Below are real‑world incidents that would normally be escalated to a third‑line technical support team. Each example shows the exact wording you can use when opening a ticket, so the support engineers have all the information they need to reproduce and resolve the problem.
1. Bridge appears up but traffic on eno2 is not forwarded
Ticket description
“The Linux bridge `vmbr0` shows as UP in `brctl show`, but traffic arriving on the physical interface eno2 is not being forwarded to the VMs attached to the bridge. `tcpdump` on the bridge captures packets, yet the VMs do not receive them. Assistance required to verify the bonding/bridge configuration and restore normal networking.”
Key points to include
- Output of `brctl show` and `ip addr show vmbr0`
- Output of `cat /proc/net/dev` for eno2
- Contents of `/etc/network/interfaces` (or Netplan YAML) for the bridge and physical NIC
- Result of `ethtool -i eno2` and any bonding mode configured
- Whether the bridge is part of a bond or a VLAN trunk
2. Proxmox VE fails to boot after ZFS pool import error
Ticket description
“Proxmox VE 8 server hangs at the message ‘Importing ZFS pool ‘rpool’… failed’ and drops to an initramfs prompt. Running `zpool status` from the prompt shows the pool is offline because the underlying device is missing. Need help diagnosing the missing ZFS device and restoring the boot process.”
Key points to include
- Exact boot log excerpt (e.g., `dmesg | grep -i zfs`)
- Output of `zpool status -v`
- Contents of `/etc/fstab` and `/etc/default/grub` (ZFS options)
- Whether the pool is mirrored, RAID-Z, or single-disk
- Any recent hardware changes or disk replacements
3. High‑availability failover takes >30 minutes, causing extended downtime
Ticket description
“During a simulated network partition, the HA group should have failed over from the primary node to the secondary node within a few seconds. Instead, VMs remained down for over 30 minutes before the secondary node took over. The HA manager logs contain the entry ‘resource lock timeout’. Assistance required to tune the HA configuration (e.g., `/etc/pve/ha/resources.cfg` and `/etc/pve/ha/groups.cfg`) and reduce the failover time.”
Key points to include
- Output of `ha-manager status` on both nodes
- Configuration of the HA group (resource IDs, thresholds)
- Current settings in `/etc/pve/ha/resources.cfg` and `/etc/pve/ha/groups.cfg`
- HA manager log entries (`journalctl -u pve-ha-crm -u pve-ha-lrm`) showing the timeout
- Whether the issue occurs only on specific VMs or across the whole cluster
4. VM snapshot fails with “Permission denied” on read‑only disk
Ticket description
“Attempting to take a snapshot of VM 402 fails with the error ‘Permission denied (read‑only file system)’. The VM’s disk is explicitly marked as read‑only in the VM configuration, yet the host file system shows no errors. Need guidance on correcting the permission issue or an alternative method to snapshot the VM.”
Key points to include
- Full command used (`qm snapshot 402 ...`) and its output
- The relevant section of the VM’s config file (`/etc/pve/qemu-server/402.conf`) showing the read-only flag
- Output of `ls -l /var/lib/vz/images/...` for the VM’s disk files
- Whether the host’s storage backend is Ceph, LVM, ZFS, or NFS
- Any recent changes to storage permissions or ACLs
How to use these examples when raising a ticket
- Copy the “Ticket description” verbatim (or adapt it to your exact situation).
- Add the “Key points to include” as separate bullet items in the ticket body.
- Attach relevant command outputs (e.g., `brctl show`, `zpool status`, logs).
- Mention the impact (e.g., “Network traffic loss for all VMs on bridge vmbr0”, “Extended downtime of critical service”).
- Specify the desired outcome (e.g., “Restore bridge forwarding”, “Recover boot”, “Reduce HA failover time”, “Enable successful snapshots”).
By providing this level of detail, third‑line engineers can quickly pinpoint the root cause and apply the appropriate fix.
If any of the required outputs or configuration files are missing, check the next relevant item in the list above before submitting the ticket.
Virtualisation Specific Examples
1. Bridge forwarding broken after network change
Ticket description
“The VMs on bridge vmbr0 are no longer receiving network traffic after a recent host network re‑configuration. All VMs attached to the bridge show ‘no link’ and cannot obtain IP addresses.”
Key points to include
- Full command used to create/modify the bridge (e.g., `brctl addbr` or `ip link add name vmbr0 type bridge`).
- Output of `brctl show` before and after the change.
- Relevant section of `/etc/network/interfaces` or Netplan YAML showing the bridge definition and member ports.
- Whether any VLANs or VXLANs were added/removed.
- Impact: loss of connectivity for critical services (e.g., monitoring, web servers).
- Desired outcome: restore forwarding so VMs can communicate normally.
2. VM fails to boot after storage migration
Ticket description
“After migrating the storage of VM 105 from an LVM volume to a Ceph RBD image, the VM will not boot and stays at the GRUB rescue prompt.”
Key points to include
- Command used for migration (`qm move-disk 105 ...` or `rbd export`).
- Output of the migration command and any warnings.
- Content of `/etc/pve/qemu-server/105.conf` showing the new disk entry pointing to the RBD image.
- Result of `rbd info` for the migrated image.
- Storage backend: Ceph (RBD) or LVM.
- Recent changes to Ceph pool settings or LVM flags.
- Impact: service outage for the application running in VM 105.
- Desired outcome: restore normal boot of the VM.
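To gather the evidence above in one pass, a sketch along these lines works (pool and image names are placeholders; Ceph typically names VM disks `vm-<vmid>-disk-<n>`):

```shell
qm config 105 | grep -E '^(scsi|virtio|ide|sata)'   # disk entries after migration
rbd ls <pool-name>                                  # is the image present?
rbd info <pool-name>/vm-105-disk-0                  # size/features of the image

# Watch the boot attempt on the console to see where GRUB stops
qm start 105
```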
3. HA failover takes longer than SLA
Ticket description
“During a simulated node failure, the HA manager took 12 minutes to migrate VM 219 to the standby node, exceeding our 5‑minute SLA.”
Key points to include
- Command or UI action that triggered the failover (e.g., `ha-manager status`, `pvecm status`).
- Log excerpt (`journalctl -u pve-ha-crm -u pve-ha-lrm`) showing the start and end timestamps.
- Current HA configuration for VM 219 (e.g., `max_restart 2` in `/etc/pve/ha/resources.cfg`).
- Resource usage on the target node at the time of failover (CPU, memory, storage).
- Storage backend: shared storage (Ceph, NFS, or iSCSI).
- Any recent changes to network latency or firewall rules.
- Impact: prolonged service interruption beyond acceptable limits.
- Desired outcome: reduce failover time to ≤ 5 minutes.
4. VM snapshot fails with “Permission denied” on read‑only disk
Ticket description
“Attempting to take a snapshot of VM 402 fails with the error ‘Permission denied (read‑only file system)’. The VM’s disk is explicitly marked as readonly=1 in the VM configuration, yet the host file system shows no errors.”
Key points to include
- Full command used (`qm snapshot 402 backup1`).
- Output of the command, highlighting the permission error.
- Relevant section of `/etc/pve/qemu-server/402.conf` showing the read-only flag.
- Output of `ls -l /var/lib/vz/images/...` for the VM’s disk files (permissions, ownership).
- Storage backend type (Ceph, LVM, ZFS, NFS).
- Any recent changes to storage permissions, ACLs, or Ceph pool settings.
- Impact: inability to create backups or perform maintenance on the VM.
- Desired outcome: either remove the read‑only flag safely or provide an alternative snapshot method.
How to use these examples when raising a ticket
- Copy the “Ticket description” verbatim (or adapt it to your exact situation).
- Add the “Key points to include” as separate bullet items in the ticket body.
- Attach relevant command outputs (e.g., `brctl show`, `qm snapshot`, `ls -l`, `cat /etc/pve/...conf`).
- Mention the impact (e.g., “Network traffic loss for all VMs on bridge vmbr0”, “Extended downtime of critical service”).
- Specify the desired outcome (e.g., “Restore bridge forwarding”, “Recover boot”, “Reduce HA failover time”, “Enable successful snapshots”).
Providing this level of detail enables third‑line engineers to quickly pinpoint the root cause and apply the appropriate fix.
If any of the required outputs or configuration files are missing, check the next relevant item in the list above before submitting the ticket.
Virtualisation Specific Examples
Example 1 – Proxmox VM snapshot fails with “Permission denied (read‑only file system)”
Ticket description (copy‑and‑paste into the ticket body)
When attempting to take a snapshot of VM 402 the command fails with the error
“Permission denied (read‑only file system)”. The VM’s disk is explicitly marked
as readonly=1 in the VM configuration, yet the host file system shows no errors.
Key points to include
- Full command used: `qm snapshot 402 backup1`
- Output of the command, highlighting the permission error
- Relevant section of `/etc/pve/qemu-server/402.conf` showing the read-only flag
- Output of ls -l /var/lib/vz/images/... for the VM’s disk files (permissions,
ownership)
- Storage backend type (Ceph, LVM, ZFS, NFS)
- Any recent changes to storage permissions, ACLs, or Ceph pool settings
- Impact: inability to create backups or perform maintenance on the VM
- Desired outcome: either remove the read‑only flag safely or provide an
alternative snapshot method
How to use this example when raising a ticket
- Copy the “Ticket description” verbatim (or adapt it to your exact situation).
- Add the “Key points to include” as separate bullet items in the ticket body.
- Attach relevant command outputs (e.g., `qm snapshot`, `ls -l`, `cat /etc/pve/...conf`).
- Mention the impact (e.g., “Extended downtime of critical service”).
- Specify the desired outcome (e.g., “Enable successful snapshots”).
Example 2 – Bridge interface vmbr0 loses forwarding after network change
Ticket description
The bridge interface vmbr0 on host node01 has stopped forwarding traffic.
All VMs attached to vmbr0 lose network connectivity. The issue appeared after
changing the underlying physical NIC from eth0 to eno1 and updating the
/etc/network/interfaces file accordingly.
Key points to include
- Full command used: ip link set dev eno1 master vmbr0
- Output of `brctl show` before and after the change
- Relevant section of /etc/network/interfaces showing the bridge definition
- Output of `iptables -L -v -n` (no unexpected DROP rules)
- Storage backend type (N/A – pure networking)
- Any recent changes to firewall or VLAN tagging
- Impact: loss of network access for all VMs on that bridge
- Desired outcome: restore forwarding on vmbr0 and prevent recurrence
How to use this example when raising a ticket
- Paste the ticket description into the ticket form.
- Insert the bullet list under a “Key points to include” heading.
- Attach screenshots or command output for `brctl show`, `ip link`, and `/etc/network/interfaces`.
- State the impact clearly (e.g., “All services on the affected VLAN are
unreachable”).
- Request a specific resolution (e.g., “Re‑apply bridge ports correctly”).
If any of the required outputs or configuration files are missing, check the next relevant item in the list above before submitting the ticket.
Virtualisation Specific Examples
Below are three genuine incidents that would normally be escalated to a third‑line technical support provider. Use them as templates when opening tickets – copy the wording, add the required details, and attach the requested artefacts.
1. Bridge interface mis‑configuration causing loss of connectivity
Ticket description
Host: pve-node01 (Proxmox VE 8.2)
Bridge: vmbr0 (used by VMs on VLAN 10)
Issue: After a recent network change, all VMs attached to vmbr0 lost network
connectivity. The bridge still shows as “up” but no traffic passes.
Key points to include
- Output of `brctl show` (or `bridge link`) – note any ports listed as “blocked”
or missing.
- Output of `ip link show vmbr0` – check state and assigned IP.
- Content of `/etc/network/interfaces` – the stanza defining vmbr0.
- Impact: All services on VLAN 10 are unreachable; critical monitoring tools
cannot poll the host.
- Desired outcome: Re‑apply bridge ports correctly and restore connectivity.
How to use this example when raising a ticket
- Paste the description verbatim (or adapt the host/bridge names).
- Add the bullet list as “Key points to include”.
- Attach the command outputs (`brctl show`, `ip link show vmbr0`, `cat /etc/network/interfaces`).
- State the impact clearly (e.g., “All services on the affected VLAN are unreachable”).
- Request a specific resolution (e.g., “Re‑apply bridge ports correctly”).
2. VM fails to boot after host OS upgrade
Ticket description
VM 127 does not power on after the recent upgrade from Debian 11 to Debian 12
on the host node. The VM console shows “Failed to mount root filesystem”.
The VM’s disk is stored on an LVM volume that was migrated to a new VG during
the upgrade.
Key points to include
- Full command used: `qm start 127`
- Output of the command, highlighting the mount error.
- Relevant section of `/etc/pve/qemu-server/127.conf` showing the disk path (e.g., `/dev/vg_new/lv_root`).
- Output of `lvdisplay` for the affected logical volume (VG name, LV name,
attributes).
- Storage backend type: LVM on local disk.
- Any recent changes to LVM configuration or `/etc/fstab` inside the VM.
- Impact: Extended downtime of a critical business application.
- Desired outcome: Restore ability to boot the VM or provide a migration path
to a compatible disk layout.
How to use this example when raising a ticket
- Insert the description verbatim (or adapt the VM ID and host details).
- Add the bullet list as “Key points to include”.
- Attach the command outputs (`qm start`, `lvdisplay`, `cat /etc/pve/qemu-server/127.conf`).
- Clearly state the impact (e.g., “Service outage affecting order processing”).
- Request the desired outcome (e.g., “Recover VM boot” or “Provide migration
instructions”).
3. Unexpected loss of HA resource after node failure
Ticket description
Node: pve-node02 (Proxmox VE 8.2) – part of a 3‑node HA cluster.
Event: Node lost power at 14:32 UTC; HA manager automatically fenced the node.
After the node came back online, the virtual IP (VIP) for the clustered
service remained on the failed node instead of moving to the surviving node.
Key points to include
- Output of `ha-manager status` on the surviving node.
- Relevant HA resource configuration (`/etc/pve/ha/resources.cfg`) covering the VIP/service.
- Output of `pvecm status` – note the “offline” state of the failed node and the VIP still bound to it.
- Impact: External clients cannot reach the clustered service; SLA breach.
- Desired outcome: Move the VIP to the correct node and verify HA failover
works as expected.
How to use this example when raising a ticket
- Copy the description verbatim (or adjust the node names and timestamps).
- Append the bullet list as “Key points to include”.
- Attach the command outputs (`ha-manager status`, `cat /etc/pve/ha/resources.cfg`, `pvecm status`).
- State the impact clearly (e.g., “External clients cannot reach the clustered
service; SLA breach”).
- Request a specific resolution (e.g., “Move the VIP to the correct node and
verify HA failover”).
General checklist before submitting
- Verify you have captured all required command outputs (`brctl show`, `ip link`, `lvdisplay`, `qm start`, `pvecm status`, etc.).
- Confirm the affected services and business impact are clearly described.
- Attach any relevant configuration files (`/etc/network/interfaces`, `/etc/pve/qemu-server/*.conf`, `corosync.conf`).
- State the desired outcome in a single, unambiguous sentence.
If any of the required artefacts are missing, check the next relevant item in the list above before submitting the ticket.
These examples are intended to help you craft precise, actionable tickets that third‑line engineers can resolve efficiently.
Virtualisation Specific Examples
Example 1 – Proxmox HA fail‑over not triggering
Scenario: A Proxmox VE cluster (3 nodes) hosts a critical VM that is configured for HA. After a scheduled maintenance reboot on node02, the VM remained powered off on the failed node instead of being automatically started on node01. The cluster status showed the resource still bound to the offline node, causing an SLA breach for external clients.
Key points to include
- Output of `ha-manager status` on the surviving node (node01).
- Relevant HA resource configuration (`/etc/pve/ha/resources.cfg`) covering the affected service.
- Output of `pvecm status` – note the “offline” state of the failed node and the VIP still bound to it.
- Impact: External clients cannot reach the clustered service → SLA breach.
- Desired outcome: Move the VIP to the correct node and verify HA failover works as expected.
How to use this example when raising a ticket
- Copy the description verbatim (or adjust node names and timestamps).
- Append the bullet list as “Key points to include”.
- Attach the command outputs (`ha-manager status`, `cat /etc/pve/ha/resources.cfg`, `pvecm status`).
- State the impact clearly (e.g., “External clients cannot reach the clustered service – SLA breach”).
- Request a specific resolution (e.g., “Move the VIP to the correct node and verify HA failover”).
Example 2 – VM fails to start after storage pool expansion
Scenario: A new LVM thin-pool was added to the storage server and the cluster was rescanned. One VM (ID 101) that uses the pool as its primary disk failed to start with the error “No space left on device” despite the pool showing free extents.
Key points to include
- `lvs` output showing the thin-pool size and free space.
- `qm start 101` output and the exact error message.
- `/etc/pve/storage.cfg` snippet that defines the storage entry.
- Impact: Service that depends on VM 101 is unavailable, causing a downstream outage.
- Desired outcome: Correct the storage allocation or adjust the thin‑pool size so the VM can start.
How to use this example when raising a ticket
- Provide the exact command outputs listed above.
- Include the storage configuration snippet.
- Clearly describe the impact on the dependent service.
- Request the resolution: “Resize the thin‑pool or free up additional extents so that VM 101 can start”.
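Thin pools can run out of data or metadata space even while the volume group still has free extents, which matches the symptom above. A hedged sketch (VG and pool names are placeholders):

```shell
lvs -a -o lv_name,lv_size,data_percent,metadata_percent <vg-name>
vgs <vg-name>                        # free extents in the VG

# Grow the pool (and its metadata) into free VG space - example sizes
lvextend -L +100G <vg-name>/<thinpool-name>
lvextend --poolmetadatasize +1G <vg-name>/<thinpool-name>
```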
Example 3 – Bridge network mis‑configuration causing VM loss of connectivity
Scenario: After applying a network-interface rename via netplan, the Linux bridge `vmbr0` on a Proxmox host was left without any physical ports attached. Consequently, all VMs attached to `vmbr0` lost network connectivity, resulting in a loss of remote management access.
Key points to include
- `brctl show` output before and after the change.
- `/etc/network/interfaces` (or `/etc/netplan/*.yaml`) snippet showing the bridge definition.
- `ip link show vmbr0` output indicating “NO-CARRIER”.
- Impact: No remote console or SSH access to affected VMs; troubleshooting is delayed.
- Desired outcome: Restore at least one physical NIC to `vmbr0` and verify VMs obtain network connectivity.
How to use this example when raising a ticket
- Attach the `brctl show` and `ip link` outputs.
- Include the relevant network configuration file excerpt.
- State the impact on remote access.
- Request the specific fix: “Add a physical NIC (e.g., `eth0`) to `vmbr0` and confirm link status”.
Example 4 – Backup job fails with “Permission denied” after user‑role change
Scenario: A backup job that uses the built-in Proxmox backup service started failing with the error “Permission denied (publickey)” after a sysadmin changed the backup user’s SSH key permissions from 600 to 644.
Key points to include
- Output of the backup job status (e.g., `proxmox-backup-manager task list`) on the backup server.
- Relevant line from `/etc/pve/user.cfg` showing the backup user’s role.
- `ssh -v` debug output highlighting the permission error.
- Impact: Nightly backups are not created, risking data loss.
- Desired outcome: Restore correct permissions on the backup user’s private key (`chmod 600`) and verify the backup job runs successfully.
How to use this example when raising a ticket
- Provide the command outputs and configuration snippet.
- Clearly state the impact (missing backups).
- Request the precise remediation: “Set the private key permissions to 600 and re‑run the backup job”.
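The underlying fix is a one-liner: OpenSSH refuses to use a private key that is group- or world-readable, which is what surfaces as “Permission denied (publickey)”. A self-contained demonstration with a stand-in key file (the real key path is environment-specific):

```shell
key=$(mktemp)        # stands in for the backup user's private key
chmod 644 "$key"     # the bad state introduced by the permission change
chmod 600 "$key"     # the fix: owner read/write only
stat -c %a "$key"    # prints 600
```

After correcting the real key, re-test with `ssh -v -i <key-path> <backup-target>` before re-enabling the scheduled job.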
Example 5 – Container fails to start due to missing cgroup subsystem
Scenario: An LXC container (ID 202) that runs a critical monitoring service failed to start after a kernel upgrade. The container’s status showed “cgroup2 not mounted”, and the container entered a failed state.
Key points to include
- `systemctl status pve-container@202.service` output.
- `mount | grep cgroup` output missing the `cgroup2` entry.
- `/etc/pve/lxc/202.conf` snippet that defines the container’s resources.
- Impact: Monitoring service is unavailable, affecting operational visibility.
- Desired outcome: Mount the required cgroup subsystem on the host and restart the container.
How to use this example when raising a ticket
- Attach the status and mount outputs.
- Include the container configuration snippet.
- Describe the impact on the monitoring service.
- Request the resolution: “Mount cgroup2 on the host (e.g., add a `cgroup2` entry to `/etc/fstab`) and restart container 202”.
Virtualisation Specific Examples – How to raise a ticket with a proper description
Below are genuine Proxmox incidents that would be escalated to a third‑line technical support provider. Use them as a template when you open a ticket – include the exact artefacts, impact and desired outcome.
Example 1 – VM fails to start after a host reboot
Scenario: A Windows 10 VM (ID 101) does not power on after the node was rebooted for scheduled maintenance. The VM shows “shutdown – unexpected power loss” in the console and remains in a stopped state.
What to include in the ticket
- `qm start 101` output (error message).
- `dmesg | grep -i 101` snippet showing a missing virtio-scsi driver.
- `/etc/pve/qemu-server/101.conf` configuration (CPU, memory, storage).
- Impact: Critical business application is offline, affecting end-users.
- Desired outcome: Restore the VM to a running state and confirm it starts cleanly after a host reboot.
Example 2 – Bridge interface mis‑configuration causing VM network loss
Scenario: All VMs attached to `vmbr0` lose external connectivity after a network change. `brctl show` reports duplicate MAC addresses and the bridge is down.
What to include in the ticket
- `brctl show` output before and after the change.
- `ip link show vmbr0` indicating the interface is DOWN.
- `/etc/network/interfaces` snippet that defines `vmbr0`.
- Impact: Monitoring and user-facing services are unreachable.
- Desired outcome: Re‑configure the bridge so it comes up, removes duplicate MACs and restores VM network access.
Example 3 – LVM volume not found after adding a new disk
Scenario: A new 2 TB disk was added to the storage pool, but the logical volume `vg_data/lv_backup` could not be extended. `lvdisplay` reports “Volume group vg_data not found”.
What to include in the ticket
- `pvs` output showing the new disk is visible but not part of the volume group.
- `vgs` and `lvs` output confirming the missing LV.
- `/etc/pve/storage.cfg` entry for the new disk.
- Impact: Backup jobs cannot write to the intended volume, risking data loss.
- Desired outcome: Add the new disk to `vg_data` and extend the logical volume to the required size.
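The resolution usually follows the standard LVM growth sequence. A sketch with placeholder device names (verify the disk with `lsblk` first; the VG/LV names from the scenario are assumptions):

```shell
pvcreate /dev/sdX                          # the new 2 TB disk (placeholder name)
vgextend vg_data /dev/sdX                  # add it to the volume group
lvextend -L +2T /dev/vg_data/lv_backup     # grow the backup LV (assumed name)
resize2fs /dev/vg_data/lv_backup           # ext4; use xfs_growfs for XFS
```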
Example 4 – HA failover does not trigger when the primary node becomes unreachable
Scenario: A VM protected by Proxmox HA (resource group web-cluster) does not migrate to the standby node when the primary node’s network cable is unplugged. `ha-manager status` shows the resource still listed as running on the failed node.
What to include in the ticket
- `ha-manager status` output showing the HA resource status.
- `systemctl status pve-ha-crm` output indicating no recent health checks.
- Cluster HA configuration snippet (`/etc/pve/ha/groups.cfg`) for the HA group.
- Impact: Service level agreement breach – users experience downtime.
- Desired outcome: Force a manual failover (`ha-manager migrate <sid> <standby-node>`) and verify automatic failover works on subsequent node failure.
Example 5 – Container fails to start due to missing cgroup subsystem
Scenario: An LXC container (ID 202) that runs a critical monitoring service fails to start after a kernel upgrade. `systemctl status pve-container@202.service` reports “cgroup2 not mounted”.
What to include in the ticket
- Output of `mount | grep cgroup` – missing `cgroup2` entry.
- Relevant snippet from `/etc/pve/lxc/202.conf` (resource limits, features).
- Impact: Monitoring service is unavailable, affecting operational visibility.
- Desired outcome: Mount the required cgroup subsystem on the host (e.g., add a `cgroup2` entry to `/etc/fstab`) and restart container 202.
