// enterprise kubernetes · projects · SCCs · operators · imagestreams · gitops · senior → principal
project-request template controls what gets bootstrapped. Users cannot create bare namespaces — oc new-project enforces the template, ensuring consistent baseline security across all tenants.
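As a rough illustration of what such a template can bootstrap, here is a minimal sketch of a project-request Template. The parameter names follow those generated by oc adm create-bootstrap-project-template; the LimitRange object and its values are illustrative assumptions, not settings taken from this guide.

```yaml
apiVersion: template.openshift.io/v1
kind: Template
metadata:
  name: project-request
  namespace: openshift-config
objects:
  # The Project itself, created for every `oc new-project` request
  - apiVersion: project.openshift.io/v1
    kind: Project
    metadata:
      name: ${PROJECT_NAME}
      annotations:
        openshift.io/description: ${PROJECT_DESCRIPTION}
        openshift.io/display-name: ${PROJECT_DISPLAYNAME}
        openshift.io/requester: ${PROJECT_REQUESTING_USER}
  # RBAC bootstrap: the requesting user becomes project admin
  - apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: admin
      namespace: ${PROJECT_NAME}
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: admin
    subjects:
      - apiGroup: rbac.authorization.k8s.io
        kind: User
        name: ${PROJECT_ADMIN_USER}
  # Assumed baseline object: container defaults for every new project
  - apiVersion: v1
    kind: LimitRange
    metadata:
      name: container-defaults
      namespace: ${PROJECT_NAME}
    spec:
      limits:
        - type: Container
          defaultRequest:
            cpu: 100m
            memory: 128Mi
parameters:
  - name: PROJECT_NAME
  - name: PROJECT_DISPLAYNAME
  - name: PROJECT_DESCRIPTION
  - name: PROJECT_ADMIN_USER
  - name: PROJECT_REQUESTING_USER
```

The template only takes effect once the cluster Project config (projects.config.openshift.io/cluster) names it under spec.projectRequestTemplate.name.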
restricted (default, safest) to privileged (no restrictions).
:latest, :3.1.0) and can trigger automatic rollouts when an upstream image updates. BuildConfig defines how to build images: Source-to-Image (S2I) injects source code into a builder image without writing a Dockerfile; Docker strategy uses a Dockerfile; the deprecated Pipeline strategy delegates to a Jenkins pipeline (Tekton-based OpenShift Pipelines is its replacement and runs outside BuildConfig). ImageStreams decouple your Deployment from a specific registry URL, enabling promotion across environments by pointing the stream tag to a different digest.
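A minimal sketch of an S2I BuildConfig that writes into an ImageStream may make the relationship concrete. The repository URL and the names myapp and nodejs:latest are hypothetical; it assumes a nodejs builder stream is present in the openshift namespace.

```yaml
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: myapp            # hypothetical output stream
---
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: myapp
spec:
  source:
    type: Git
    git:
      uri: https://git.example.com/team/myapp.git   # hypothetical repository
      ref: main
  strategy:
    type: Source                 # S2I: no Dockerfile needed
    sourceStrategy:
      from:
        kind: ImageStreamTag
        namespace: openshift
        name: nodejs:latest      # assumed builder image stream tag
  output:
    to:
      kind: ImageStreamTag
      name: myapp:latest         # build result lands in the ImageStream
  triggers:
    - type: ConfigChange         # build when the BuildConfig changes
    - type: ImageChange          # rebuild when the builder image updates
      imageChange: {}
```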
etcdctl snapshot save) is the primary disaster recovery mechanism. etcd performance (fsync latency) directly affects API server responsiveness — slow etcd disk is a common root cause of API timeouts under load.
m6i.xlarge minimum; workers are configurable. Storage defaults to EBS via the EBS CSI driver (gp3 StorageClass). ROSA (Red Hat OpenShift Service on AWS): fully managed OCP where Red Hat operates the control plane. You pay per node-hour plus the OCP subscription. ROSA uses AWS STS for IAM (no long-lived credentials on the cluster). Red Hat handles upgrades, etcd backups, and control plane incidents. ROSA HCP (Hosted Control Plane) is the newer architecture — control plane runs as pods in a Red Hat-managed cluster, provisioning in under 15 minutes.
restricted SCC prevents pods from running as root or as a UID of their own choosing; the pod gets an arbitrary UID from the namespace's allocated range. Many upstream images (databases, tools) assume they run as root (UID 0) or as a fixed UID. The fix is granting the pod's service account the anyuid SCC, but do this selectively, not cluster-wide. Always check which UID the image actually requires first; many images can be configured to use arbitrary UIDs with a simple env var.
DeploymentConfig is an OCP-specific resource predating K8s Deployment. It was deprecated in OCP 4.14 and will be removed. DC offered image change triggers and lifecycle hooks that K8s Deployment didn't have, but those gaps are now filled by OpenShift GitOps, Tekton, and ImageStream triggers on Deployments. Migrate DCs to Deployments before upgrading past 4.14.
oc adm upgrade. Rushing an upgrade without checking the graph wastes time and risks cluster instability.
LimitRange in a namespace, pods with no resource requests are assigned BestEffort QoS class, the first to be evicted under node pressure. ResourceQuotas without LimitRanges also cause friction: a quota on CPU requires every pod to declare CPU requests, so without a LimitRange default, pods that omit requests are rejected at admission. Always pair ResourceQuota with a LimitRange.
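The pairing described above could look roughly like this. The namespace name and all quantities are placeholders chosen for illustration, not recommendations from this guide.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: team-a-dev        # hypothetical namespace
spec:
  hard:
    requests.cpu: "8"          # total CPU requests allowed in the namespace
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: team-a-dev
spec:
  limits:
    - type: Container
      defaultRequest:          # applied when a container omits requests
        cpu: 100m
        memory: 128Mi
      default:                 # applied when a container omits limits
        cpu: 500m
        memory: 512Mi
```

With both objects in place, a pod that declares no resources is admitted with the defaults and counted against the quota instead of being rejected.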
kubectl works for standard K8s resources and can read or apply OCP-specific ones (Routes, BuildConfigs, ImageStreams, SCCs, Projects, MachineConfigs) through the API, but the workflows built around them (oc new-project, oc new-app, oc adm, oc debug) exist only in oc. More critically, the oc workflows apply OCP defaults that raw kubectl usage may bypass: for example, kubectl create namespace skips the ProjectRequest template and creates a raw namespace without RBAC bootstrapping. Use oc as your primary CLI in OCP environments.
| Kubernetes | OpenShift equivalent |
|---|---|
| Namespace | Project (wraps namespace, adds RBAC bootstrap and metadata) |
| Ingress | Route (HAProxy-backed, TLS termination modes, weighted routing) |
| Deployment | Deployment (preferred) or DeploymentConfig (deprecated in 4.14) |
| PodSecurityAdmission | Security Context Constraint (SCC) — more granular |
| ImagePullSecret | ImageStream + integrated registry pull-through |
| Helm chart | Operator (for stateful apps) or Helm (both supported) |
| kubectl | oc (superset of kubectl, required for OCP-specific resources) |
| Node OS management | MachineConfig + MCO (declarative, rolling reboots) |
| Cluster upgrades | Cluster Version Operator (CVO) — operator-managed, sequential |
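To illustrate the Route row in the table above, here is a small sketch of a Route with edge TLS termination and weighted backends. The hostname and Service names are hypothetical.

```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: myapp
spec:
  host: myapp.apps.example.com       # hypothetical hostname
  to:
    kind: Service
    name: myapp-v1                   # receives ~90% of traffic
    weight: 90
  alternateBackends:
    - kind: Service
      name: myapp-v2                 # canary backend, ~10% of traffic
      weight: 10
  port:
    targetPort: 8080
  tls:
    termination: edge                # other modes: passthrough, reencrypt
    insecureEdgeTerminationPolicy: Redirect
```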
| SCC | What it allows |
|---|---|
| restricted-v2 | Default in OCP 4.11+. No root, no privilege escalation, drops all capabilities, seccomp enforced. |
| restricted | Legacy default (pre-4.11). No root, arbitrary UID from namespace range. Use restricted-v2 for new workloads. |
| nonroot | Pod must run as non-root UID. No specific UID constraint. Less strict than restricted. |
| anyuid | Allows any UID including root. Required for images that assume root. Grant selectively to service accounts. |
| privileged | No restrictions — host network, host PID, all capabilities. Reserved for trusted system workloads only. |
| hostnetwork | Allows host network and host ports. Used for network infrastructure pods (CNI, monitoring agents). |
| node-exporter | OCP-specific for Prometheus node exporter. Host PID + network access. |
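Granting an SCC selectively, as the anyuid row suggests, can be expressed declaratively instead of through one-off commands. This sketch assumes the built-in ClusterRole system:openshift:scc:anyuid that OCP 4 provides for each SCC; the service account and namespace names are hypothetical.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: legacy-db                 # dedicated SA for the one workload that needs it
  namespace: payments-dev         # hypothetical namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: legacy-db-anyuid
  namespace: payments-dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:openshift:scc:anyuid   # grants "use" on the anyuid SCC
subjects:
  - kind: ServiceAccount
    name: legacy-db
    namespace: payments-dev
```

Only pods that set serviceAccountName: legacy-db are eligible for anyuid; everything else in the namespace stays on the default SCC.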
| Condition / command | Meaning |
|---|---|
| Available | Operator is running and functional. Normal state. |
| Progressing | Operator is rolling out a change (upgrade in progress). |
| Degraded | Operator has a problem. OCP blocks cluster upgrades until resolved. |
| oc get co | Check all cluster operator statuses — first command in any troubleshooting session. |
| AWS topic | Guidance |
|---|---|
| Install method (IPI) | openshift-install creates all AWS infra automatically. Requires AWS credentials with broad IAM permissions during install only. |
| Master node sizing | Minimum m6i.xlarge (4 vCPU / 16 GB). Recommend m6i.2xlarge for etcd I/O headroom. Never use burstable (T-series) for masters — etcd requires consistent disk performance. |
| Worker node sizing | Depends on workload. m6i.xlarge–2xlarge for general; r6i for memory-heavy (JVM, databases); c6i for CPU-intensive. Spot instances viable for stateless workers with Cluster Autoscaler. |
| Default storage class | gp3-csi (EBS gp3 via EBS CSI driver). 3000 IOPS / 125 MiB/s baseline. Increase IOPS for etcd disks and database PVCs. Set reclaimPolicy: Retain on the StorageClass backing production PVCs. |
| IAM — IPI | Cloud Credential Operator creates per-component IAM users (not roles) by default. Use manual mode with STS (credentialsMode: Manual in install-config.yaml, with the IAM roles pre-created by ccoctl) for short-lived tokens; required for FIPS/FedRAMP compliance. |
| IAM — ROSA | ROSA uses AWS STS exclusively — no long-lived IAM credentials on the cluster. Operator IAM roles are created per-cluster via rosa CLI during install. |
| Ingress / Load Balancer | IPI creates a Classic ELB for the default Ingress Controller. Prefer NLB for better performance and TLS passthrough; set it on the IngressController (spec.endpointPublishingStrategy.loadBalancer.providerParameters.aws.type: NLB) rather than annotating the operator-managed Service by hand. |
| Private cluster | Set publish: Internal in install-config.yaml to create a private cluster (API + ingress on internal ELBs only). Requires VPN or Direct Connect to access. Common for regulated workloads. |
| ROSA vs IPI cost | ROSA: per node-hour charge + OCP subscription. IPI: EC2 + EBS + ELB + OCP subscription. ROSA eliminates control plane EC2 costs but adds managed service premium. Break-even depends on control plane size. |
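Several rows in the table above map onto install-config.yaml fields. The fragment below is a sketch only: the cluster name, domain, and region are placeholders, pullSecret and sshKey are omitted, and a private install additionally needs existing VPC subnet settings that are not shown here.

```yaml
apiVersion: v1
baseDomain: example.com          # placeholder domain
metadata:
  name: prod-cluster             # placeholder cluster name
credentialsMode: Manual          # manual/STS mode instead of minted IAM users
publish: Internal                # private cluster: API and ingress on internal LBs
controlPlane:
  name: master
  replicas: 3
  platform:
    aws:
      type: m6i.2xlarge          # headroom for etcd I/O; avoid burstable types
compute:
  - name: worker
    replicas: 3
    platform:
      aws:
        type: m6i.xlarge         # general-purpose workers
platform:
  aws:
    region: us-east-1            # placeholder region
```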
| Dimension | OpenShift (OCP) | Amazon EKS | Google GKE | Azure AKS |
|---|---|---|---|---|
| K8s version currency | Lags upstream by ~1–2 minor versions; tested, supported | Current, fast releases | Current, fast releases; Autopilot available | Current; sometimes slow on patches |
| Security defaults | Hardened by default (SCCs, RHCOS). Strong baseline. | Permissive; security is your responsibility to configure | Good defaults; Binary Authorization, Workload Identity | Decent defaults; integrates with Azure AD and Defender |
| Managed control plane | Self-managed or ROSA (managed); you own masters on self-managed | Fully managed; $0.10/hr per cluster | Fully managed; $0.10/hr per cluster (free-tier credit covers one zonal or Autopilot cluster) | Fully managed; free on the Free tier, $0.10/hr per cluster on Standard |
| Multi-cloud / on-prem | Yes — bare metal, vSphere, AWS, Azure, GCP, IBM Cloud | AWS-only (EKS Anywhere for on-prem) | GCP-only (Anthos for on-prem) | Azure-only (Arc for on-prem) |
| Built-in CI/CD | Tekton, OpenShift GitOps (ArgoCD), BuildConfigs, S2I | None native; use CodePipeline, Jenkins, GitHub Actions | Cloud Build integration; no native K8s CI/CD | Azure DevOps integration; no native K8s CI/CD |
| Cost model | Subscription per core (significant). Infrastructure on top. | Pay per node + $0.10/hr cluster fee. Data transfer costs. | Pay per node + $0.10/hr cluster fee. Autopilot charges per Pod resource request. | Pay per node. No cluster fee on the Free tier. Azure Spot available. |
| Best for | Enterprise, regulated, hybrid/on-prem, Red Hat shops | AWS-native teams; large existing AWS investment | GCP teams; strong ML/data workloads; Autopilot simplicity | Azure/Microsoft shops; Azure AD integration critical |
restricted-v2, anyuid, privileged, etc.) rather than requiring you to define the policy from scratch.
anyuid. Govern SCC grants the same way you govern RBAC — audit regularly with oc adm policy who-can use scc anyuid, and automate provisioning rather than allowing manual grants. The question to ask at design time: "what's the minimum SCC this workload needs?" not "what SCC makes it work?"
registry.example.com/myapp:3.1.0, it points to an ImageStream tag (myapp:production). The stream tracks what digest that tag currently resolves to. When the underlying image updates — either a new push to the registry or a promotion via oc tag — OCP can automatically trigger a new rollout on any Deployment or DeploymentConfig watching that tag.
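For a plain K8s Deployment, "watching" an ImageStream tag is done with the image.openshift.io/triggers annotation. The sketch below assumes a container named myapp and a stream tag myapp:production in the same namespace; both names are illustrative.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  annotations:
    # JSON list: which stream tag to watch and which container image field to rewrite
    image.openshift.io/triggers: '[{"from":{"kind":"ImageStreamTag","name":"myapp:production"},"fieldPath":"spec.template.spec.containers[?(@.name==\"myapp\")].image"}]'
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          # Placeholder value; the trigger controller replaces it with the
          # reference the stream tag currently resolves to.
          image: myapp:production
```

Removing the annotation keeps the ImageStream usable for promotion while leaving rollout timing to whatever applies the manifest.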
ImageStreams also enable image promotion across environments: oc tag myapp:staging myapp:production points the production tag at the staging image digest without any re-build or re-push. They abstract the registry URL from the workload definition, making environment-specific registry differences transparent.
image.openshift.io/triggers) causes automatic rollouts on any tag update. In production, you often don't want automatic rollouts triggered by image changes — you want GitOps to control rollout timing. Audit your production Deployments for ImageStream trigger annotations and disable them for workloads where rollout timing must be controlled. Use ImageStreams for image promotion (the oc tag workflow) without enabling automatic triggers in sensitive environments.
DeploymentConfig (DC) is an OCP-specific resource that predates the K8s Deployment. It offered features K8s lacked at the time: image change triggers (auto-rollout on ImageStream update), custom lifecycle hooks (pre/mid/post deployment), and recreate/rolling strategies similar to K8s. K8s Deployment has since caught up on most functionality. DCs were deprecated in OCP 4.14 and will eventually be removed.
Use K8s Deployment for all new workloads. Migrate existing DCs using the migration guide in the OCP docs — most are straightforward, replacing DC-specific fields with Deployment equivalents. ImageStream triggers on Deployments are possible via annotations, though as noted they should be used deliberately.
oc and test in a dev namespace first. The main gap to cover is lifecycle hooks — if your DC uses a mid-lifecycle hook (e.g., running a DB migration between old and new pods), model this as an init container or a pre-upgrade Job in your GitOps pipeline instead.
admin (full project control), edit (deploy and manage apps, no RBAC changes), view (read-only).
Pod-level SCCs are a parallel gate: even if a user has edit on a namespace, the pod's service account must have access to an SCC that allows the pod's security requirements. The two systems don't interact directly — RBAC controls what you can do with the K8s API; SCCs control what the pod can do at the OS level.
edit RBAC who deploys a pod that needs anyuid will get an admission error even though their RBAC is correct — because the pod's service account (not the user) needs the SCC grant. Automate SCC grants as part of namespace provisioning: if a team deploys database workloads that need specific UIDs, grant the appropriate SCC to a dedicated service account via a GitOps-managed RoleBinding, not via one-off oc adm policy commands. This makes the security posture visible, auditable, and reproducible.
oc describe pod <name> to see the specific SCC validation failure; oc get events for the admission error message. Understand what the pod needs — check securityContext in the pod spec for runAsUser, runAsGroup, privileged, capabilities. Then check which SCCs the pod's service account can use: oc adm policy who-can use scc anyuid (or the SCC you think is needed).
Fix: determine the minimum SCC required. If the image must run as root, grant anyuid to the service account: oc adm policy add-scc-to-serviceaccount anyuid -z <sa> -n <ns>. Prefer a dedicated service account per workload rather than using the default SA.
anyuid, push back on the image. Many images that appear to need root only need a specific UID range, which can be set with runAsUser in the pod spec — and then a custom SCC (or nonroot) works without full root access. Granting anyuid to the default service account in a namespace is a common shortcut that grants root capability to every pod in that namespace without a service account specified. Use oc adm create-bootstrap-project-template to establish a project template that provisions a dedicated service account per workload class as part of namespace setup.
/etc/sysctl.d/ tuning, custom CA certs), and SSH authorized keys. The MCO renders MachineConfigs into an Ignition config and applies it by draining the node and rebooting it.
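A MachineConfig that drops a sysctl tuning file onto worker nodes might look like the following sketch. The object name, file name, sysctl value, and Ignition spec version are assumptions for illustration; match the Ignition version to what your OCP release supports.

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-sysctl-tuning                       # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: worker    # selects the worker pool
spec:
  config:
    ignition:
      version: 3.2.0                                  # assumed Ignition spec version
    storage:
      files:
        - path: /etc/sysctl.d/99-tuning.conf
          mode: 420                                   # decimal for octal 0644
          overwrite: true
          contents:
            # URL-encoded inline file body: "vm.max_map_count=262144\n"
            source: data:,vm.max_map_count%3D262144%0A
```

Applying it causes the MCO to roll through the worker pool node by node, so plan for drains and reboots.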
Common use cases: adding corporate CA certificates to all nodes, kernel tuning for database or high-performance workloads, enabling specific kernel modules, deploying a custom systemd service for monitoring agents that must run on the host.
pom.xml → Java, package.json → Node.js, etc.). Based on detection, it selects an S2I builder ImageStream from the cluster's catalog. It creates: a BuildConfig (S2I build with the source repo), an ImageStream for the output image, a Deployment (or DC in older versions) pointing to the ImageStream, and a Service exposing the pod. A first build is triggered immediately. The Route must be created separately with oc expose service.
oc new-app is great for demos and learning but rarely appropriate for production. It creates opinionated resources with defaults that may not match your org's standards (resource requests, liveness probes, security context, label taxonomy). In production, use Helm charts, Kustomize, or GitOps-managed manifests that you control explicitly. Treat oc new-app as a learning scaffold — run it to see what it creates, then use those manifests as a starting point to customize, not as the final deployment artifact.
RollingUpdate strategy) creates new pods with the updated image, waits for them to pass readiness checks, then terminates old pods. Controlled by maxUnavailable (how many old pods can be down at once) and maxSurge (how many extra pods can exist during rollout). Zero-downtime requires: a properly configured readinessProbe (without it, new pods receive traffic before they're ready), preStop hook with a sleep if the app needs time to drain connections, and terminationGracePeriodSeconds long enough for in-flight requests to complete.
minAvailable: 1 for any Deployment that must be continuously available — this guarantees at least one replica stays up through voluntary disruptions such as node drains (it does not protect against node failures).
oc get secret access reads the value; Secrets in env vars are exposed in oc describe pod output and container process lists.
Prefer volume mounts over env vars for sensitive values.
edit role in a namespace includes get secrets by default — review this and use a more restrictive role if developers shouldn't see production secrets. Never store secrets in ConfigMaps (unencrypted by policy).
(1) oc logs <pod> --previous — get logs from the last crash; most root causes are in the application logs. (2) oc describe pod <pod> — check Events for OOMKilled, Liveness probe failures, volume mount errors, image pull failures. (3) If the container starts then immediately exits, use oc debug pod/<pod> to open a shell in a copy of the pod with the entrypoint overridden — inspect the filesystem, env vars, and volume mounts. (4) Check resource limits — OOMKilled means the container hit its memory limit; increase the limit or fix the memory leak.
oc debug pod/<pod> is the most underused OCP troubleshooting tool. It creates a copy of the pod with the command overridden to a shell, using the same image, volumes, env vars, and security context — letting you reproduce the environment exactly without the crash loop. For nodes, oc debug node/<node> -- chroot /host bash gives you a root shell on the host OS, useful for diagnosing kubelet issues, disk pressure, or CNI problems. Build the habit of reaching for oc debug before resorting to exec into a running container — it's safer and doesn't affect production pods.
etcdctl snapshot save) captures a point-in-time snapshot of the entire cluster state. It's the primary DR mechanism: restoring from backup to a rebuilt control plane brings back all workloads, config, and RBAC. Without a backup, a failed cluster means rebuilding everything from scratch.
etcd_disk_wal_fsync_duration_seconds is high, the etcd disk is too slow — use dedicated SSDs for master nodes, never shared NFS.
payments-prod), a MachineConfigPool per node class if teams need node isolation, and NetworkPolicy defaults in the project template to deny cross-namespace traffic by default.
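The NetworkPolicy defaults mentioned above could be sketched as the pair below, placed in the project template so every new namespace gets them. The policy names are illustrative, and the ingress namespace label is an assumption that depends on the cluster's network plugin and version.

```yaml
# Allow traffic only from pods in the same namespace; because the policy
# selects all pods, traffic from other namespaces is implicitly denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}
---
# Keep Routes working: admit traffic from the router's namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-ingress
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              policy-group.network.openshift.io/ingress: ""   # assumed label
```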
Per-team onboarding: create projects (dev, staging, prod) via a GitOps-managed namespace provisioner (not manually); apply ResourceQuota and LimitRange from a template; bind the team's RBAC group to edit in dev/staging and a custom restricted-edit (no secret reads) in prod; provision a dedicated service account per workload type with the minimum required SCC pre-granted.
oc apply directly in production. Enforce this via ArgoCD's auto-sync + self-heal (ArgoCD reverts manual changes), and remove direct edit access to production namespaces for developers. The transition period is painful: teams resist losing oc access. Invest in developer experience — fast PR-to-deploy pipelines, clear feedback from ArgoCD on sync status, and runbooks for emergency changes (break-glass access with full audit trail). The discipline pays off in auditability and rollback capability.
oc exec into a pod + curl is still the first step, but the actual policy is programmed into OVS flows that require ovn-nbctl and ovs-ofctl to inspect at the node level. Build familiarity with oc get networkpolicy, oc describe networkpolicy, and the OVN diagnostic tools before you need them in an incident. OVN's egress IP feature (stable IP per namespace for external firewall rules) is worth knowing — it's frequently asked about in network-sensitive enterprises.
oc adm upgrade), confirm the target version is in the supported path (no skipping minors), verify all cluster operators are healthy (oc get co), ensure etcd backup is current, and review the release notes for deprecated APIs. Process: oc adm upgrade --to=<version> triggers the CVO. The CVO upgrades the control plane first (masters one at a time via MachineConfigPool), then worker nodes (drain, upgrade RHCOS, reboot, uncordon). Worker upgrades are rate-limited by maxUnavailable in the worker MachineConfigPool — tune this for your workload tolerance.
SecretStore (Vault credentials, address) and ExternalSecret CRs that map Vault paths to K8s Secret keys.
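A SecretStore and ExternalSecret pair along those lines might look like this sketch. It assumes the External Secrets Operator v1beta1 API (newer releases may serve a different version), a Vault KV v2 mount named secret, and hypothetical names for the Vault role, service account, and secret path.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-backend
  namespace: payments-prod
spec:
  provider:
    vault:
      server: https://vault.example.com:8200   # hypothetical Vault address
      path: secret                             # assumed KV mount
      version: v2
      auth:
        kubernetes:
          mountPath: kubernetes                # Vault K8s auth mount
          role: payments-prod-app              # hypothetical Vault role
          serviceAccountRef:
            name: payments-app                 # hypothetical service account
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: payments-prod
spec:
  refreshInterval: 1h              # how often the Secret is re-synced
  secretStoreRef:
    kind: SecretStore
    name: vault-backend
  target:
    name: db-credentials           # K8s Secret that gets created
  data:
    - secretKey: password          # key inside the K8s Secret
      remoteRef:
        key: payments/db           # hypothetical Vault path
        property: password         # field within the Vault secret
```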
ESO reconciles on a schedule — the Secret is kept in sync with Vault, rotating automatically.
Vault Agent Injector (Sidecar): Vault injects a sidecar that authenticates to Vault via the pod's service account (Vault K8s Auth) and writes secrets to a shared in-memory volume at /vault/secrets. The application reads files, not env vars — rotation is handled by the sidecar without pod restart.
oc get secret (with appropriate RBAC), and it works with any application without sidecar changes. The trade-off: the Secret is materialized in etcd — a cluster compromise exposes it. Vault Agent Sidecar keeps the secret out of etcd entirely, which is the stronger isolation model for highly sensitive values (private keys, payment credentials). For most workloads, ESO + etcd encryption at rest is sufficient. Regardless of approach: audit Vault policy assignments regularly, use namespaced K8s Auth roles (one Vault role per namespace/service-account pair), and rotate all secrets on a schedule — not just after breaches.
oc tag myapp:commit-sha myapp:staging — the staging ImageStream now references the exact digest that passed tests. (4) After staging validation, oc tag myapp:staging myapp:production. Each environment's Deployment watches its environment-specific ImageStream tag; the image is the identical binary at each stage.
cluster-monitoring-config ConfigMap. Key cluster health signals: cluster operator status (oc get co), etcd wal_fsync_duration_seconds (latency), node resource pressure (disk, memory, PID), API server request latency and error rate, and pod scheduling latency.
privileged: true, require all Deployments to have resource requests. ACM integrates with Gatekeeper to distribute policies fleet-wide.
Kyverno is an alternative with a simpler YAML-based policy language — better for teams without Rego expertise; supports mutation (automatically adding labels, resource defaults) as well as validation.
reclaimPolicy: Retain for production PVCs so a PVC delete doesn't destroy data; size PVs with growth headroom; use Pod Anti-Affinity to spread replicas across nodes/zones.
monitoring namespace can scrape metrics on port 8080).
existing VPC install), configure private clusters, air-gapped installs, or custom AMIs. Control plane costs are visible EC2 spend. Requires platform engineering bandwidth for cluster operations.
ROSA — Red Hat operates the control plane (with ROSA HCP it runs in Red Hat's AWS account, not yours; with ROSA classic it runs in your account but is still operated by Red Hat SRE). You manage worker nodes and workloads. ROSA uses AWS STS for all IAM interactions — no long-lived credentials on the cluster. Upgrades are managed with a maintenance window policy; Red Hat responds to control plane incidents. ROSA integrates natively with AWS services: IAM roles for service accounts (IRSA), PrivateLink for private cluster access, CloudWatch log forwarding. Faster to provision (30–45 minutes vs. 90+ for IPI).
kube_pod_container_resource_requests vs. container_cpu_usage_seconds_total); charge-back by namespace to make teams accountable for waste.
edit in their own namespaces only; use a custom role in prod that removes exec and get secrets from developers; service accounts for CI/CD are separate from human user accounts
anyuid; stateless namespaces get only restricted-v2; document and audit with oc adm policy who-can use scc anyuid
oc adm policy add-scc-to-serviceaccount anyuid -z default in production namespaces — grants root capability to every pod using the default SA; creates ungoverned, audit-invisible security gaps
oc adm upgrade to verify the upgrade graph; audit cluster operators (oc get co) — all must be Available before starting; pull the API deprecation report from the OCP web console; check for DCs (deprecated) and migrate to Deployments; back up etcd
oc get apirequestcounts to identify which deprecated APIs are still in active use; update manifests and Helm charts before the upgrade window
minAvailable: 1 at minimum; the node drain process respects PDBs — it waits for new pods to come up before evicting old ones; payment services should have PDB minAvailable: 2 to maintain the 2-replica constraint
maxUnavailable: 1 before upgrading — this limits how many workers drain simultaneously; for a 20-node cluster, the worker upgrade takes ~20 node-reboot cycles but prevents capacity collapse
analytics namespace is making outbound connections to an external IP not in your approved egress list. Investigation reveals the pod is running with anyuid SCC and has host network access. The pod belongs to a third-party analytics agent deployed 6 months ago. You suspect the agent image has been compromised or contains malicious behavior.
oc label pod <pod> quarantine=true and apply a NetworkPolicy targeting that label for more surgical isolation
oc exec <pod> -- ps aux, netstat -tlnp, /proc/<pid>/environ (env vars including any injected secrets), oc get pod <pod> -o yaml (full spec including SCC assignment), and pod logs; use oc debug pod/<pod> to inspect the filesystem without killing the running pod
oc adm policy who-can use scc anyuid to see all service accounts with anyuid; oc get rolebindings,clusterrolebindings -A | grep anyuid to trace how the grant was made; check git history of the namespace manifests for when and who added the SCC binding
oc get pod <pod> -o jsonpath='{.status.containerStatuses[*].imageID}'; scan with a container image scanner; compare the digest against the known-good build artifacts from 6 months ago — if the digest changed, the tag was re-pushed after the original deployment and the new content must be treated as untrusted until verified
:latest or :stable can be silently updated; always pin production images by digest and verify digest against your build pipeline