Troubleshooting
A guide to common problems encountered when setting up and running SREGym.
Docker and Kind Issues
Docker Issues
Ensure Docker is running within your WSL2 environment. Verify with docker ps to list running containers.
Troubleshooting:
- Check Docker Desktop is running (if using WSL2)
- Verify Docker daemon is accessible:
docker info
- Restart Docker if needed
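The daemon check above can also be scripted. The following is a minimal sketch, assuming the docker CLI is on PATH and that docker info exits with a non-zero status when the daemon cannot be reached. The function name docker_daemon_reachable is hypothetical and not part of SREGym.

```python
import shutil
import subprocess


def docker_daemon_reachable(timeout_s: float = 10.0) -> bool:
    """Return True if `docker info` exits successfully, False otherwise."""
    # Without the docker CLI on PATH the daemon cannot be queried at all.
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung or unlaunchable CLI is treated as "not reachable".
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Docker daemon reachable:", docker_daemon_reachable())
```

If this prints False inside WSL2, the likely causes are the ones listed above: Docker Desktop is not running, or the daemon is not accessible from the WSL2 distribution.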
Cluster Creation Failures
Check that Docker is correctly installed and that your system has enough resources (CPU, memory). Examine the output of kind export logs <cluster-name> for details.
Common causes:
- Insufficient system resources (CPU/memory)
- Docker not running or misconfigured
- Port conflicts
- Incorrect kind configuration file
Solutions:
- Check system resources:
docker stats
- Review kind logs:
kind export logs <cluster-name>
- Verify kind configuration file references correct image
- Ensure no port conflicts on required ports
Deployment Problems
Use kubectl logs <pod-name> to view pod logs and diagnose application issues. Make sure that your kind-config.yaml file references the correct image.
Troubleshooting steps:
- Check pod status:
kubectl get pods --all-namespaces
- View pod logs:
kubectl logs <pod-name> -n <namespace>
- Describe pod for events:
kubectl describe pod <pod-name> -n <namespace>
- Verify image exists and is accessible
- Check kind-config.yaml references correct image name
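When many pods are running, it can help to filter the pod list before reading logs. The sketch below assumes kubectl is on PATH and relies on the standard JSON shape of kubectl get pods -o json (items, metadata.namespace, metadata.name, status.phase). The helper names unhealthy_pods and fetch_pods are hypothetical.

```python
import json
import subprocess


def unhealthy_pods(pod_list: dict) -> list[tuple[str, str, str]]:
    """Return (namespace, name, phase) for pods not Running or Succeeded."""
    healthy = {"Running", "Succeeded"}
    found = []
    for item in pod_list.get("items", []):
        meta = item.get("metadata", {})
        phase = item.get("status", {}).get("phase", "Unknown")
        if phase not in healthy:
            found.append((meta.get("namespace", ""), meta.get("name", ""), phase))
    return found


def fetch_pods() -> dict:
    """Run kubectl and parse its JSON output (requires a reachable cluster)."""
    completed = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)


# Example usage against a live cluster:
#   for namespace, name, phase in unhealthy_pods(fetch_pods()):
#       print(f"{namespace}/{name}: {phase}")
```

The phase alone does not reveal every failure: a pod whose container is repeatedly crashing can still report the Running phase, so kubectl describe pod remains the more reliable source for events.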
Resource Allocation
WSL2 may require additional resources. Adjust the WSL2 settings in your .wslconfig file on Windows if you encounter performance issues.
WSL2 Configuration:
Create or edit .wslconfig in your Windows user directory:
[wsl2]
memory=8GB
processors=4
swap=2GB
After modifying .wslconfig, restart WSL2:
wsl --shutdown
Then restart your WSL2 distribution.
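To double-check the values before restarting WSL2, the file can be parsed as an INI document. This is a rough sketch that reads the example settings from a string; the helper names parse_wsl2_settings and size_to_gb are hypothetical, and the size parser only handles whole-number GB or MB values.

```python
import configparser
import re


def parse_wsl2_settings(text: str) -> dict:
    """Return the key/value pairs of the [wsl2] section, or {} if absent."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["wsl2"]) if parser.has_section("wsl2") else {}


def size_to_gb(value: str) -> float:
    """Convert a size such as '8GB' or '512MB' to gigabytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(GB|MB)\s*", value, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = int(match.group(1)), match.group(2).upper()
    return float(number) if unit == "GB" else number / 1024


EXAMPLE = """\
[wsl2]
memory=8GB
processors=4
swap=2GB
"""

settings = parse_wsl2_settings(EXAMPLE)
print("memory (GB):", size_to_gb(settings["memory"]))
print("processors:", int(settings["processors"]))
```

To inspect a real file, read its contents from the Windows user directory and pass the text to parse_wsl2_settings.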
Ansible Issues
Host Key Authentication Errors
If Ansible reports host key authentication errors, type yes at the prompt for each node, or disable host key checking as follows:
Create a file named ansible.cfg in the same directory as the Ansible README with the following contents:
[defaults]
host_key_checking = False
Be mindful about the security implications of disabling host key checking. Only use this in trusted environments.
General Troubleshooting
Checking Cluster Status
Verify your cluster is running correctly:
kubectl get nodes
All nodes should show Ready status.
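The Ready check can be automated by inspecting each node's conditions. The sketch below works on the parsed output of kubectl get nodes -o json and assumes the standard status.conditions layout. The function name not_ready_nodes and the sample node names are hypothetical.

```python
def not_ready_nodes(node_list: dict) -> list[str]:
    """Return names of nodes whose Ready condition is not 'True'."""
    names = []
    for item in node_list.get("items", []):
        conditions = item.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            names.append(item.get("metadata", {}).get("name", "<unknown>"))
    return names


# Hand-written sample in the shape of `kubectl get nodes -o json` output.
SAMPLE_NODES = {
    "items": [
        {
            "metadata": {"name": "example-control-plane"},
            "status": {"conditions": [{"type": "Ready", "status": "True"}]},
        },
        {
            "metadata": {"name": "example-worker"},
            "status": {"conditions": [{"type": "Ready", "status": "False"}]},
        },
    ]
}

print("Nodes not ready:", not_ready_nodes(SAMPLE_NODES))
```

Against a live cluster, the same function can be fed the result of json.loads applied to the kubectl output.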
Viewing Logs
To diagnose issues with specific components:
# View pod logs
kubectl logs <pod-name>
# View logs for all containers of the pods matching a label selector in a namespace
kubectl logs -n <namespace> -l <label-selector> --all-containers=true
Dashboard Access
If the dashboard at http://localhost:11451 is not accessible, check the following:
- Verify the benchmark is running with python main.py
- Check if the port is already in use
- Review the console output for any error messages
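One way to check the port is to attempt a TCP connection to it. This is a minimal sketch using only the Python standard library; the function name port_in_use is hypothetical, and it probes the IPv4 loopback address only.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout_s: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout_s)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("Port 11451 in use:", port_in_use(11451))
```

If nothing is listening, the benchmark is probably not running. If something is listening but the dashboard still does not load, another process may be holding the port.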
