Frequently Asked Questions
Virtualisation Specific Examples
Below are genuine Proxmox incidents that would be escalated to a third‑line technical support provider. Use these as templates when opening tickets – the clearer the description, the faster the resolution.
1. Cluster node loses quorum and VMs fail to start
Symptoms
- The cluster GUI shows “Node X is down”.
- All VMs on the affected node report “Failed to start – no suitable host”.
- `pvecm status` returns “Quorum: 0/3”.
Ticket description
“Cluster of three Proxmox VE 8 nodes (node01, node02, node03) entered split‑brain. Node01 lost connectivity to the shared storage (Ceph RBD pool) and the cluster quorum dropped to 0. All VMs on node01 stopped with ‘no suitable host’ errors. Request assistance to restore quorum and bring VMs back online.”
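A short command transcript strengthens this ticket. A minimal sketch, assuming standard Proxmox VE tooling (`pvecm`, `qm`) and a hypothetical VMID:

```shell
# Check cluster membership and quorum state on each node
pvecm status                                      # quorum, votes, membership
journalctl -u corosync -b --no-pager | tail -50   # recent corosync messages

# Last-resort workaround on an isolated node (risk of split-brain; only
# with third-line guidance): lower the expected vote count so the node
# regains quorum and VMs can be started manually.
pvecm expected 1
qm start 100        # hypothetical VMID; repeat for each affected VM
```

Attaching the `pvecm status` output from every node lets the engineer see which side of the partition holds quorum.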
2. VM hangs on suspend/resume after a host reboot
Symptoms
- VM defined with “Suspend when host shuts down” enabled.
- After a scheduled host reboot, the VM remains stuck at “paused” and cannot be resumed.
- `vzdump` reports the VM’s disk as “locked”.
Ticket description
“VM 101 (Debian 12) configured with ‘Suspend on host shutdown’ hangs on resume after host reboot. The VM shows status ‘paused’ and cannot be started or resumed. Disk appears locked. Need assistance to recover the VM and prevent recurrence.”
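A hedged sketch of the usual recovery path, assuming the lock is stale (i.e., no backup or migration task is still running against the VM):

```shell
# Inspect the VM state and any lock flag recorded in its config
qm status 101 --verbose | grep -Ei 'status|lock'
qm config 101 | grep -i lock

# Clear the stale lock, then resume (or start) the VM
qm unlock 101
qm resume 101       # fall back to: qm start 101
```

Include the outputs of both inspection commands in the ticket even if the unlock succeeds, so the cause of the stale lock can be investigated.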
3. Unexpected VM panic with no clear log entry
Symptoms
- VM abruptly powers off with “Kernel panic - not syncing” on console.
- No corresponding entry in `/var/log/syslog` on the host.
- Host logs only show “VM 202: power state changed to off”.
Ticket description
“VM 202 (Windows Server 2022) experienced an unexplained kernel panic; the VM powered off without any panic message in the host logs. Request investigation into possible hardware or configuration cause and guidance on reproducing the issue.”
4. Backup job fails with “No space left on device” despite sufficient storage
Symptoms
- Proxmox backup job (`vzdump`) exits with error “No space left on device”.
- `df -h` on the backup storage shows more than 30 % free.
- The backup repository is a ZFS dataset with compression enabled.
Ticket description
“Scheduled backup of VM 305 to ZFS repository ‘rpool/backups’ fails with ‘No space left on device’ even though df -h shows ample free space. Need troubleshooting of ZFS quota/compression settings and resolution.”
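On ZFS, “No space left on device” frequently comes from a dataset quota, reservation, or snapshot usage rather than a full pool, which is why `df -h` looks healthy. A diagnostic sketch (dataset name taken from the ticket):

```shell
zpool list rpool                                   # real pool capacity
zfs list -o space rpool/backups                    # usage incl. snapshots/children
zfs get quota,refquota,reservation,compression rpool/backups

# If a quota is the culprit, raising it (example value) unblocks vzdump:
# zfs set quota=500G rpool/backups
```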
5. HA resource group fails to migrate due to “Node not authorized”
Symptoms
- Attempt to move a VM from node01 to node02 via HA group results in error “Node not authorized”.
- `pvecm add <node2-IP>` was run, but the token expired.
- Cluster config shows a mismatched `cluster_name`.
Ticket description
“HA group migration from node01 to node02 fails with ‘Node not authorized’. The cluster token on node02 expired after a recent network outage, causing a mismatch in cluster_name. Assistance required to re‑establish token and complete migration.”
6. Container fails to start after host kernel upgrade
Symptoms
- LXC container 101 crashes immediately after host kernel update to 6.5.
- `lxc-start -n 101` returns “Failed to mount cgroup2”.
- `dmesg` shows “cgroup2: unknown filesystem type ‘cgroup2’”.
Ticket description
“LXC container 101 fails to start after upgrading Proxmox VE kernel to 6.5. The container crashes with ‘Failed to mount cgroup2’ and dmesg reports unknown cgroup2 filesystem. Need assistance to restore container operation post‑upgrade.”
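Before escalating, it helps to confirm whether the unified cgroup hierarchy is actually mounted on the upgraded host. A minimal check, assuming standard Proxmox `pct` tooling:

```shell
mount | grep cgroup
stat -fc %T /sys/fs/cgroup      # expect "cgroup2fs" on a unified-hierarchy host

# Start the container in debug mode and capture the full failure output
pct start 101 --debug
```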
7. VM network bridge loses connectivity after NIC teaming change
Symptoms
- Physical NIC eno1 is part of a bonding interface (mode 802.3ad).
- After adding eno2 to the bond, VMs on bridge `vmbr0` cannot obtain DHCP leases.
- `bridge link` shows the bridge still up, but `tcpdump` on eno2 shows no traffic.
Ticket description
“After adding NIC eno2 to an existing 802.3ad bond (eno1+eno2) for increased bandwidth, VMs on bridge vmbr0 lose network connectivity and cannot obtain DHCP leases. Bridge appears up but traffic on eno2 is not forwarded. Assistance required to correct bonding configuration and restore VM networking.”
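For reference, a working 802.3ad bond-plus-bridge stanza for `/etc/network/interfaces` looks roughly like the sketch below (addresses and interface names are placeholders). A matching LACP configuration on the switch ports is also required; its absence produces exactly this “bridge up but no traffic” symptom.

```
auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-mode 802.3ad
    bond-miimon 100
    bond-xmit-hash-policy layer2+3

auto vmbr0
iface vmbr0 inet static
    address 192.0.2.10/24
    gateway 192.0.2.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
```

Attaching your actual stanza alongside `cat /proc/net/bonding/bond0` output lets the engineer verify that both bond members negotiated LACP.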
8. Proxmox VE fails to boot after ZFS pool import error
Symptoms
- System hangs at “Importing ZFS pool ‘rpool’… failed”.
- Boot proceeds to initramfs prompt.
- `zpool status` shows the pool offline due to a missing device.
Ticket description
“Proxmox VE 8 server does not boot; ZFS pool ‘rpool’ fails to import with ‘device not found’. System stops at initramfs prompt. Need help diagnosing missing ZFS device and restoring boot.”
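From the initramfs prompt, a few commands narrow down whether the device is truly missing or merely renamed. A cautious sketch:

```shell
zpool import                 # lists importable pools and flags missing devices
ls -l /dev/disk/by-id/       # is the expected disk visible under its stable name?

# If the pool can be imported degraded, import without mounting and resume boot
zpool import -N rpool
exit
```

Copy the `zpool import` listing into the ticket verbatim; it shows exactly which device is absent.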
9. High availability failover takes >30 minutes, causing extended downtime
Symptoms
- Primary node experiences network partition; HA group should failover to secondary.
- Monitoring shows VMs remain down for >30 minutes before secondary takes over.
- The HA manager logs indicate “resource lock timeout”.
Ticket description
“During a network partition, HA failover from primary to secondary node took over 30 minutes, resulting in prolonged VM downtime. The HA manager logs show ‘resource lock timeout’. Assistance required to tune HA parameters and reduce failover time.”
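When gathering evidence for this ticket, the HA state and logs can be captured as follows (paths and unit names per current Proxmox VE; adjust if your version differs):

```shell
ha-manager status                                     # resource and node states
cat /etc/pve/ha/resources.cfg /etc/pve/ha/groups.cfg  # HA configuration

# HA manager activity around the partition window
journalctl -u pve-ha-crm -u pve-ha-lrm --since "-1 hour"
```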
10. VM snapshot fails with “Permission denied” on read‑only disk
Symptoms
- Attempt to take a snapshot of VM 402 fails with error “Permission denied (read-only file system)”.
- Disk is marked as “readonly” in the VM config.
- Host file system shows no errors.
Ticket description
“Snapshot operation on VM 402 fails with ‘Permission denied (read-only file system)’ despite the VM’s disk being configured as read‑only. Need guidance on correcting the permission issue or alternative method to snapshot the VM.”
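A short evidence-gathering sketch for the snapshot failure (VMID from the ticket; `pre-change` is an arbitrary snapshot name):

```shell
qm config 402          # check disk entries for read-only/snapshot-related options
pvesm status           # state of the underlying storage backends

# Retry and capture the complete error text for the ticket
qm snapshot 402 pre-change 2>&1 | tee /tmp/vm402-snapshot.log
```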
Virtualisation Specific Examples
Below are real‑world incidents that would normally be escalated to a third‑line technical support team. Each example shows the exact wording you can use when opening a ticket, so the support engineers have all the information they need to reproduce and resolve the problem.
1. Bridge appears up but traffic on eno2 is not forwarded
Ticket description
“The Linux bridge `vmbr0` shows as UP in `brctl show`, but traffic arriving on the physical interface eno2 is not being forwarded to the VMs attached to the bridge. `tcpdump` on the bridge captures packets, yet the VMs do not receive them. Assistance required to verify the bonding/bridge configuration and restore normal networking.”
Key points to include
- Output of `brctl show` and `ip addr show vmbr0`
- Output of `cat /proc/net/dev` for eno2
- Contents of `/etc/network/interfaces` (or Netplan YAML) for the bridge and physical NIC
- Result of `ethtool -i eno2` and any bonding mode configured
- Whether the bridge is part of a bond or a VLAN trunk
2. Proxmox VE fails to boot after ZFS pool import error
Ticket description
“Proxmox VE 8 server hangs at the message ‘Importing ZFS pool ‘rpool’… failed’ and drops to an initramfs prompt. Running `zpool status` from the prompt shows the pool is offline because the underlying device is missing. Need help diagnosing the missing ZFS device and restoring the boot process.”
Key points to include
- Exact boot log excerpt (e.g., `dmesg | grep -i zfs`)
- Output of `zpool status -v`
- Contents of `/etc/fstab` and `/etc/default/grub` (ZFS options)
- Whether the pool is mirrored, RAID-Z, or single-disk
- Any recent hardware changes or disk replacements
3. High‑availability failover takes >30 minutes, causing extended downtime
Ticket description
“During a simulated network partition, the HA group should have failed over from the primary node to the secondary node within a few seconds. Instead, VMs remained down for over 30 minutes before the secondary node took over. The HA manager logs contain the entry ‘resource lock timeout’. Assistance required to tune the HA configuration (e.g., `/etc/pve/ha/resources.cfg` and `/etc/pve/ha/groups.cfg`) and reduce the failover time.”
Key points to include
- Output of `ha-manager status` on both nodes
- Configuration of the HA group (resource IDs, thresholds)
- Current settings in `/etc/pve/ha/resources.cfg` and `/etc/pve/ha/groups.cfg`
- HA manager log entries (`journalctl -u pve-ha-crm -u pve-ha-lrm`) showing the timeout
- Whether the issue occurs only on specific VMs or across the whole cluster
4. VM snapshot fails with “Permission denied” on read‑only disk
Ticket description
“Attempting to take a snapshot of VM 402 fails with the error ‘Permission denied (read‑only file system)’. The VM’s disk is explicitly marked as read‑only in the VM configuration, yet the host file system shows no errors. Need guidance on correcting the permission issue or an alternative method to snapshot the VM.”
Key points to include
- Full command used (`qm snapshot 402 ...`) and its output
- The relevant section of the VM’s config file (`/etc/pve/qemu-server/402.conf`) showing the read-only flag
- Output of `ls -l /var/lib/vz/images/...` for the VM’s disk files
- Whether the host’s storage backend is Ceph, LVM, ZFS, or NFS
- Any recent changes to storage permissions or ACLs
How to use these examples when raising a ticket
- Copy the “Ticket description” verbatim (or adapt it to your exact situation).
- Add the “Key points to include” as separate bullet items in the ticket body.
- Attach relevant command outputs (e.g., `brctl show`, `zpool status`, logs).
- Mention the impact (e.g., “Network traffic loss for all VMs on bridge vmbr0”, “Extended downtime of critical service”).
- Specify the desired outcome (e.g., “Restore bridge forwarding”, “Recover boot”, “Reduce HA failover time”, “Enable successful snapshots”).
By providing this level of detail, third‑line engineers can quickly pinpoint the root cause and apply the appropriate fix.
If any of the required outputs or configuration files are missing, check the next relevant item in the list above before submitting the ticket.
Virtualisation Specific Examples
1. Bridge forwarding broken after network change
Ticket description
“The VMs on bridge vmbr0 are no longer receiving network traffic after a recent host network re‑configuration. All VMs attached to the bridge show ‘no link’ and cannot obtain IP addresses.”
Key points to include
- Full command used to create/modify the bridge (e.g., `brctl addbr` or `ip link add name vmbr0 type bridge`).
- Output of `brctl show` before and after the change.
- Relevant section of `/etc/network/interfaces` or Netplan YAML showing the bridge definition and member ports.
- Whether any VLANs or VXLANs were added/removed.
- Impact: loss of connectivity for critical services (e.g., monitoring, web servers).
- Desired outcome: restore forwarding so VMs can communicate normally.
2. VM fails to boot after storage migration
Ticket description
“After migrating the storage of VM 105 from an LVM volume to a Ceph RBD image, the VM will not boot and stays at the GRUB rescue prompt.”
Key points to include
- Command used for migration (`qm move-disk 105 ...` or `rbd export`).
- Output of the migration command and any warnings.
- Content of `/etc/pve/qemu-server/105.conf` showing the new disk entry pointing to the RBD image.
- Result of `rbd info` for the migrated image.
- Storage backend: Ceph (RBD) or LVM.
- Recent changes to Ceph pool settings or LVM flags.
- Impact: service outage for the application running in VM 105.
- Desired outcome: restore normal boot of the VM.
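To gather the evidence above in one pass, a sketch along these lines works (pool and image names are placeholders; Ceph typically names VM disks `vm-<vmid>-disk-<n>`):

```shell
qm config 105 | grep -E '^(scsi|virtio|ide|sata)'   # disk entries after migration
rbd ls <pool-name>                                  # is the image present?
rbd info <pool-name>/vm-105-disk-0                  # size/features of the image

# Watch the boot attempt on the console to see where GRUB stops
qm start 105
```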
3. HA failover takes longer than SLA
Ticket description
“During a simulated node failure, the HA manager took 12 minutes to migrate VM 219 to the standby node, exceeding our 5‑minute SLA.”
Key points to include
- Command or UI action that triggered the failover (e.g., `ha-manager status`, `pvecm status`).
- Log excerpt (`journalctl -u pve-ha-crm -u pve-ha-lrm`) showing the start and end timestamps.
- Current HA configuration for VM 219 (e.g., `max_restart 2` in `/etc/pve/ha/resources.cfg`).
- Resource usage on the target node at the time of failover (CPU, memory, storage).
- Storage backend: shared storage (Ceph, NFS, or iSCSI).
- Any recent changes to network latency or firewall rules.
- Impact: prolonged service interruption beyond acceptable limits.
- Desired outcome: reduce failover time to ≤ 5 minutes.
4. VM snapshot fails with “Permission denied” on read‑only disk
Ticket description
“Attempting to take a snapshot of VM 402 fails with the error ‘Permission denied (read‑only file system)’. The VM’s disk is explicitly marked as readonly=1 in the VM configuration, yet the host file system shows no errors.”
Key points to include
- Full command used (`qm snapshot 402 backup1`).
- Output of the command, highlighting the permission error.
- Relevant section of `/etc/pve/qemu-server/402.conf` showing the read-only flag.
- Output of `ls -l /var/lib/vz/images/...` for the VM’s disk files (permissions, ownership).
- Storage backend type (Ceph, LVM, ZFS, NFS).
- Any recent changes to storage permissions, ACLs, or Ceph pool settings.
- Impact: inability to create backups or perform maintenance on the VM.
- Desired outcome: either remove the read‑only flag safely or provide an alternative snapshot method.
How to use these examples when raising a ticket
- Copy the “Ticket description” verbatim (or adapt it to your exact situation).
- Add the “Key points to include” as separate bullet items in the ticket body.
- Attach relevant command outputs (e.g., `brctl show`, `qm snapshot`, `ls -l`, `cat /etc/pve/...conf`).
- Mention the impact (e.g., “Network traffic loss for all VMs on bridge vmbr0”, “Extended downtime of critical service”).
- Specify the desired outcome (e.g., “Restore bridge forwarding”, “Recover boot”, “Reduce HA failover time”, “Enable successful snapshots”).
Providing this level of detail enables third‑line engineers to quickly pinpoint the root cause and apply the appropriate fix.
If any of the required outputs or configuration files are missing, check the next relevant item in the list above before submitting the ticket.
Virtualisation Specific Examples
Example 1 – Proxmox VM snapshot fails with “Permission denied (read‑only file system)”
Ticket description (copy‑and‑paste into the ticket body)
When attempting to take a snapshot of VM 402 the command fails with the error
“Permission denied (read‑only file system)”. The VM’s disk is explicitly marked
as readonly=1 in the VM configuration, yet the host file system shows no errors.
Key points to include
- Full command used: `qm snapshot 402 backup1`
- Output of the command, highlighting the permission error
- Relevant section of `/etc/pve/qemu-server/402.conf` showing the read-only flag
- Output of ls -l /var/lib/vz/images/... for the VM’s disk files (permissions,
ownership)
- Storage backend type (Ceph, LVM, ZFS, NFS)
- Any recent changes to storage permissions, ACLs, or Ceph pool settings
- Impact: inability to create backups or perform maintenance on the VM
- Desired outcome: either remove the read‑only flag safely or provide an
alternative snapshot method
How to use this example when raising a ticket
- Copy the “Ticket description” verbatim (or adapt it to your exact situation).
- Add the “Key points to include” as separate bullet items in the ticket body.
- Attach relevant command outputs (e.g., `qm snapshot`, `ls -l`, `cat /etc/pve/...conf`).
- Mention the impact (e.g., “Extended downtime of critical service”).
- Specify the desired outcome (e.g., “Enable successful snapshots”).
Example 2 – Bridge interface vmbr0 loses forwarding after network change
Ticket description
The bridge interface vmbr0 on host node01 has stopped forwarding traffic.
All VMs attached to vmbr0 lose network connectivity. The issue appeared after
changing the underlying physical NIC from eth0 to eno1 and updating the
/etc/network/interfaces file accordingly.
Key points to include
- Full command used: ip link set dev eno1 master vmbr0
- Output of `brctl show` before and after the change
- Relevant section of /etc/network/interfaces showing the bridge definition
- Output of `iptables -L -v -n` (no unexpected DROP rules)
- Storage backend type (N/A – pure networking)
- Any recent changes to firewall or VLAN tagging
- Impact: loss of network access for all VMs on that bridge
- Desired outcome: restore forwarding on vmbr0 and prevent recurrence
How to use this example when raising a ticket
- Paste the ticket description into the ticket form.
- Insert the bullet list under a “Key points to include” heading.
- Attach screenshots or command output for `brctl show`, `ip link`, and `/etc/network/interfaces`.
- State the impact clearly (e.g., “All services on the affected VLAN are
unreachable”).
- Request a specific resolution (e.g., “Re‑apply bridge ports correctly”).
If any of the required outputs or configuration files are missing, check the next relevant item in the list above before submitting the ticket.
Virtualisation Specific Examples
Below are three genuine incidents that would normally be escalated to a third‑line technical support provider. Use them as templates when opening tickets – copy the wording, add the required details, and attach the requested artefacts.
1. Bridge interface mis‑configuration causing loss of connectivity
Ticket description
Host: pve-node01 (Proxmox VE 8.2)
Bridge: vmbr0 (used by VMs on VLAN 10)
Issue: After a recent network change, all VMs attached to vmbr0 lost network
connectivity. The bridge still shows as “up” but no traffic passes.
Key points to include
- Output of `brctl show` (or `bridge link`) – note any ports listed as “blocked”
or missing.
- Output of `ip link show vmbr0` – check state and assigned IP.
- Content of `/etc/network/interfaces` – the stanza defining vmbr0.
- Impact: All services on VLAN 10 are unreachable; critical monitoring tools
cannot poll the host.
- Desired outcome: Re‑apply bridge ports correctly and restore connectivity.
How to use this example when raising a ticket
- Paste the description verbatim (or adapt the host/bridge names).
- Add the bullet list as “Key points to include”.
- Attach the command outputs (`brctl show`, `ip link show vmbr0`, `cat /etc/network/interfaces`).
- State the impact clearly (e.g., “All services on the affected VLAN are unreachable”).
- Request a specific resolution (e.g., “Re‑apply bridge ports correctly”).
2. VM fails to boot after host OS upgrade
Ticket description
VM 127 does not power on after the recent upgrade from Debian 11 to Debian 12
on the host node. The VM console shows “Failed to mount root filesystem”.
The VM’s disk is stored on an LVM volume that was migrated to a new VG during
the upgrade.
Key points to include
- Full command used: `qm start 127`
- Output of the command, highlighting the mount error.
- Relevant section of `/etc/pve/qemu-server/127.conf` showing the disk path (e.g., `/dev/vg_new/lv_root`).
- Output of `lvdisplay` for the affected logical volume (VG name, LV name,
attributes).
- Storage backend type: LVM on local disk.
- Any recent changes to LVM configuration or `/etc/fstab` inside the VM.
- Impact: Extended downtime of a critical business application.
- Desired outcome: Restore ability to boot the VM or provide a migration path
to a compatible disk layout.
How to use this example when raising a ticket
- Insert the description verbatim (or adapt the VM ID and host details).
- Add the bullet list as “Key points to include”.
- Attach the command outputs (`qm start`, `lvdisplay`, `cat /etc/pve/qemu-server/127.conf`).
- Clearly state the impact (e.g., “Service outage affecting order processing”).
- Request the desired outcome (e.g., “Recover VM boot” or “Provide migration
instructions”).
3. Unexpected loss of HA resource after node failure
Ticket description
Node: pve-node02 (Proxmox VE 8.2) – part of a 3‑node HA cluster.
Event: Node lost power at 14:32 UTC; HA manager automatically fenced the node.
After the node came back online, the virtual IP (VIP) for the clustered
service remained on the failed node instead of moving to the surviving node.
Key points to include
- Output of `ha-manager status` on the surviving node.
- Relevant HA resource configuration (`/etc/pve/ha/resources.cfg`) covering the VIP/service.
- Output of `pvecm status` – note the “offline” state of the failed node and the VIP still bound to it.
- Impact: External clients cannot reach the clustered service; SLA breach.
- Desired outcome: Move the VIP to the correct node and verify HA failover
works as expected.
How to use this example when raising a ticket
- Copy the description verbatim (or adjust the node names and timestamps).
- Append the bullet list as “Key points to include”.
- Attach the command outputs (`ha-manager status`, `cat /etc/pve/ha/resources.cfg`, `pvecm status`).
- State the impact clearly (e.g., “External clients cannot reach the clustered
service; SLA breach”).
- Request a specific resolution (e.g., “Move the VIP to the correct node and
verify HA failover”).
General checklist before submitting
- Verify you have captured all required command outputs (`brctl show`, `ip link`, `lvdisplay`, `qm start`, `pvecm status`, etc.).
- Confirm the affected services and business impact are clearly described.
- Attach any relevant configuration files (`/etc/network/interfaces`, `/etc/pve/qemu-server/*.conf`, `corosync.conf`).
- State the desired outcome in a single, unambiguous sentence.
If any of the required artefacts are missing, check the next relevant item in the list above before submitting the ticket.
These examples are intended to help you craft precise, actionable tickets that third‑line engineers can resolve efficiently.
Virtualisation Specific Examples
Example 1 – Proxmox HA fail‑over not triggering
Scenario: A Proxmox VE cluster (3 nodes) hosts a critical VM that is configured for HA. After a scheduled maintenance reboot on node02, the VM remained powered off on the failed node instead of being automatically started on node01. The cluster status showed the resource still bound to the offline node, causing an SLA breach for external clients.
Key points to include
- Output of `ha-manager status` on the surviving node (node01).
- Relevant HA resource configuration (`/etc/pve/ha/resources.cfg`) covering the affected service.
- Output of `pvecm status` – note the “offline” state of the failed node and the VIP still bound to it.
- Impact: External clients cannot reach the clustered service → SLA breach.
- Desired outcome: Move the VIP to the correct node and verify HA failover works as expected.
How to use this example when raising a ticket
- Copy the description verbatim (or adjust node names and timestamps).
- Append the bullet list as “Key points to include”.
- Attach the command outputs (`ha-manager status`, `cat /etc/pve/ha/resources.cfg`, `pvecm status`).
- State the impact clearly (e.g., “External clients cannot reach the clustered service – SLA breach”).
- Request a specific resolution (e.g., “Move the VIP to the correct node and verify HA failover”).
Example 2 – VM fails to start after storage pool expansion
Scenario: A new LVM thin-pool was added to the storage server and the cluster was rescanned. One VM (ID 101) that uses the pool as its primary disk failed to start with the error “No space left on device” despite the pool showing free extents.
Key points to include
- `lvs` output showing the thin-pool size and free space.
- `qm start 101` output and the exact error message.
- `/etc/pve/storage.cfg` snippet that defines the storage entry.
- Impact: Service that depends on VM 101 is unavailable, causing a downstream outage.
- Desired outcome: Correct the storage allocation or adjust the thin‑pool size so the VM can start.
How to use this example when raising a ticket
- Provide the exact command outputs listed above.
- Include the storage configuration snippet.
- Clearly describe the impact on the dependent service.
- Request the resolution: “Resize the thin‑pool or free up additional extents so that VM 101 can start”.
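Thin pools can run out of data or metadata space even while the volume group still has free extents, which matches the symptom above. A hedged sketch (VG and pool names are placeholders):

```shell
lvs -a -o lv_name,lv_size,data_percent,metadata_percent <vg-name>
vgs <vg-name>                        # free extents in the VG

# Grow the pool (and its metadata) into free VG space - example sizes
lvextend -L +100G <vg-name>/<thinpool-name>
lvextend --poolmetadatasize +1G <vg-name>/<thinpool-name>
```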
Example 3 – Bridge network mis‑configuration causing VM loss of connectivity
Scenario: After applying a network-interface rename via netplan, the Linux bridge `vmbr0` on a Proxmox host was left without any physical ports attached. Consequently, all VMs attached to `vmbr0` lost network connectivity, resulting in a loss of remote management access.
Key points to include
- `brctl show` output before and after the change.
- `/etc/network/interfaces` (or `/etc/netplan/*.yaml`) snippet showing the bridge definition.
- `ip link show vmbr0` output indicating “NO-CARRIER”.
- Impact: No remote console or SSH access to affected VMs; troubleshooting is delayed.
- Desired outcome: Restore at least one physical NIC to `vmbr0` and verify VMs obtain network connectivity.
How to use this example when raising a ticket
- Attach the `brctl show` and `ip link` outputs.
- Include the relevant network configuration file excerpt.
- State the impact on remote access.
- Request the specific fix: “Add a physical NIC (e.g., `eth0`) to `vmbr0` and confirm link status”.
Example 4 – Backup job fails with “Permission denied” after user‑role change
Scenario: A backup job that uses the built-in Proxmox backup service started failing with the error “Permission denied (publickey)” after a sysadmin changed the backup user’s SSH key permissions from 600 to 644.
Key points to include
- Output of the backup job status (e.g., `proxmox-backup-manager task list`) on the backup server.
- Relevant line from `/etc/pve/user.cfg` showing the backup user’s role.
- `ssh -v` debug output highlighting the permission error.
- Impact: Nightly backups are not created, risking data loss.
- Desired outcome: Restore correct permissions on the backup user’s private key (`chmod 600`) and verify the backup job runs successfully.
How to use this example when raising a ticket
- Provide the command outputs and configuration snippet.
- Clearly state the impact (missing backups).
- Request the precise remediation: “Set the private key permissions to 600 and re‑run the backup job”.
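The underlying fix is a one-liner: OpenSSH refuses to use a private key that is group- or world-readable, which is what surfaces as “Permission denied (publickey)”. A self-contained demonstration with a stand-in key file (the real key path is environment-specific):

```shell
key=$(mktemp)        # stands in for the backup user's private key
chmod 644 "$key"     # the bad state introduced by the permission change
chmod 600 "$key"     # the fix: owner read/write only
stat -c %a "$key"    # prints 600
```

After correcting the real key, re-test with `ssh -v -i <key-path> <backup-target>` before re-enabling the scheduled job.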
Example 5 – Container fails to start due to missing cgroup subsystem
Scenario: An LXC container (ID 202) that runs a critical monitoring service failed to start after a kernel upgrade. The container’s status showed “cgroup2 not mounted”, and the container entered a failed state.
Key points to include
- `systemctl status pve-container@202.service` output.
- `mount | grep cgroup` output missing the `cgroup2` entry.
- `/etc/pve/lxc/202.conf` snippet that defines the container’s resources.
- Impact: Monitoring service is unavailable, affecting operational visibility.
- Desired outcome: Mount the required cgroup subsystem on the host and restart the container.
How to use this example when raising a ticket
- Attach the status and mount outputs.
- Include the container configuration snippet.
- Describe the impact on the monitoring service.
- Request the resolution: “Mount cgroup2 on the host (e.g., add a `cgroup2` entry to `/etc/fstab`) and restart container 202”.
Virtualisation Specific Examples – How to raise a ticket with a proper description
Below are genuine Proxmox incidents that would be escalated to a third‑line technical support provider. Use them as a template when you open a ticket – include the exact artefacts, impact and desired outcome.
Example 1 – VM fails to start after a host reboot
Scenario: A Windows 10 VM (ID 101) does not power on after the node was rebooted for scheduled maintenance. The VM shows “shutdown – unexpected power loss” in the console and remains in a stopped state.
What to include in the ticket
- `qm start 101` output (error message).
- `dmesg | grep -i 101` snippet showing a missing virtio-scsi driver.
- `/etc/pve/qemu-server/101.conf` configuration (CPU, memory, storage).
- Impact: Critical business application is offline, affecting end-users.
- Desired outcome: Restore the VM to a running state and confirm it starts cleanly after a host reboot.
Example 2 – Bridge interface mis‑configuration causing VM network loss
Scenario: All VMs attached to `vmbr0` lose external connectivity after a network change. `brctl show` reports duplicate MAC addresses and the bridge is down.
What to include in the ticket
- `brctl show` output before and after the change.
- `ip link show vmbr0` indicating the interface is DOWN.
- `/etc/network/interfaces` snippet that defines `vmbr0`.
- Impact: Monitoring and user-facing services are unreachable.
- Desired outcome: Re‑configure the bridge so it comes up, removes duplicate MACs and restores VM network access.
Example 3 – LVM volume not found after adding a new disk
Scenario: A new 2 TB disk was added to the storage pool, but the logical volume `vg_data/lv_backup` could not be extended. `lvdisplay` reports “Volume group vg_data not found”.
What to include in the ticket
- `pvs` output showing the new disk is visible but not part of the volume group.
- `vgs` and `lvs` output confirming the missing LV.
- `/etc/pve/storage.cfg` entry for the new disk.
- Impact: Backup jobs cannot write to the intended volume, risking data loss.
- Desired outcome: Add the new disk to `vg_data` and extend the logical volume to the required size.
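The resolution usually follows the standard LVM growth sequence. A sketch with placeholder device names (verify the disk with `lsblk` first; the VG/LV names from the scenario are assumptions):

```shell
pvcreate /dev/sdX                          # the new 2 TB disk (placeholder name)
vgextend vg_data /dev/sdX                  # add it to the volume group
lvextend -L +2T /dev/vg_data/lv_backup     # grow the backup LV (assumed name)
resize2fs /dev/vg_data/lv_backup           # ext4; use xfs_growfs for XFS
```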
Example 4 – HA failover does not trigger when the primary node becomes unreachable
Scenario: A VM protected by Proxmox HA (resource group web-cluster) does not migrate to the standby node when the primary node’s network cable is unplugged. `ha-manager status` shows the resource still listed as running on the failed node.
What to include in the ticket
- `ha-manager status` output showing the HA resource status.
- `systemctl status pve-ha-crm` output indicating no recent health checks.
- Cluster HA configuration snippet (`/etc/pve/ha/groups.cfg`) for the HA group.
- Impact: Service level agreement breach – users experience downtime.
- Desired outcome: Force a manual failover (`ha-manager migrate <sid> <standby-node>`) and verify automatic failover works on subsequent node failure.
Example 5 – Container fails to start due to missing cgroup subsystem
Scenario: An LXC container (ID 202) that runs a critical monitoring service fails to start after a kernel upgrade. `systemctl status pve-container@202.service` reports “cgroup2 not mounted”.
What to include in the ticket
- Output of `mount | grep cgroup` – missing `cgroup2` entry.
- Relevant snippet from `/etc/pve/lxc/202.conf` (resource limits, features).
- Impact: Monitoring service is unavailable, affecting operational visibility.
- Desired outcome: Mount the required cgroup subsystem on the host (e.g., add a `cgroup2` entry to `/etc/fstab`) and restart container 202.
