Docker Core Concepts: Containers Are Not Magic, They Are Linux Processes + Isolation
DevOps learning notes.
This post clarifies Docker core concepts from first principles, including the relationship between image, layer, and container, multi-stage build, and why a container is a Linux process rather than magic.
Read-Only Build Template
An image is a read-only template used to create containers.
Analogy:
- Image = cake mold (read-only, shape does not change when used, can make unlimited identical cakes)
- Container = cake made from the mold (each is an independent entity)
Compared with OOP: an image corresponds to a class, and a container corresponds to an instance created from that class.
Stacked Artifacts and Non-Mutation
An image is not one single whole. It is stacked layer by layer. Each Dockerfile instruction that changes the filesystem (RUN, COPY, ADD) creates one layer; instructions such as ENV or CMD only record metadata.
Analogy: a layer is one sheet in a mille-feuille cake. You can keep adding sheets on top, but you cannot swap a middle sheet because upper sheets depend on it.
Rebuild Creates Fresh Hash Entries
Changing a Dockerfile does not “modify the old layer.” It “creates a new layer.” Old layers still exist in the Docker cache and are not touched.
It is like a git commit: you cannot modify old commits, you can only add new ones. Once a layer is created, its content never changes, and it is uniquely identified by a SHA256 hash.
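The git comparison can be made concrete with a toy model. The sketch below is not Docker's real algorithm (Docker hashes the actual layer contents); it assumes a hypothetical `layerID` helper that hashes the parent ID together with the instruction text, which is enough to show why a changed layer forces new IDs for everything above it while everything below stays identical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// layerID is a hypothetical helper: it derives an ID from the parent
// layer's ID and the instruction that produced this layer.
func layerID(parent, instruction string) string {
	sum := sha256.Sum256([]byte(parent + "\n" + instruction))
	return hex.EncodeToString(sum[:])
}

// buildIDs returns one ID per instruction, each chained to the one before it.
func buildIDs(instructions []string) []string {
	ids := make([]string, 0, len(instructions))
	parent := ""
	for _, ins := range instructions {
		parent = layerID(parent, ins)
		ids = append(ids, parent)
	}
	return ids
}

func main() {
	before := buildIDs([]string{"FROM golang:1.25-alpine", "COPY go.mod go.sum ./", "COPY . ."})
	// Only the middle instruction differs in the second build.
	after := buildIDs([]string{"FROM golang:1.25-alpine", "COPY go.mod ./", "COPY . ."})
	for i := range before {
		fmt.Printf("layer %d unchanged: %v\n", i, before[i] == after[i])
	}
}
```

In this model the first layer keeps its ID, while the edited layer and the layer stacked on top of it both get new IDs. Nothing is overwritten: the old IDs still describe the old content.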
Dependency-First Ordering Saves Build Time
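This is why a Dockerfile should separate dependencies from source code: copy go.mod and go.sum and download dependencies first, then copy the rest of the source.
If go.mod is unchanged, the dependency download layer comes straight from the cache and nothing is re-downloaded. If only code changes, only the last layers rerun, which saves a large amount of build time.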
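The effect of instruction ordering on the cache can be sketched with a small simulation. This is a simplified model, not Docker's implementation: the `step` type, the `depsFirst` and `singleCopy` layouts, and the fingerprint strings are all assumptions made for illustration. A step is a cache hit only when its instruction, its input files, and every step before it are unchanged.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// step is one Dockerfile instruction plus a fingerprint of the files it reads.
type step struct {
	instruction string
	inputs      string // stand-in for the checksum of the copied files
}

// build walks the steps in order and returns the instructions that had to run.
// The cache key of a step covers its parent key, its instruction, and its inputs.
func build(steps []step, cache map[[32]byte]bool) []string {
	var ran []string
	var parent [32]byte
	for _, s := range steps {
		data := append([]byte{}, parent[:]...)
		data = append(data, (s.instruction + "\x00" + s.inputs)...)
		key := sha256.Sum256(data)
		if !cache[key] {
			ran = append(ran, s.instruction)
			cache[key] = true
		}
		parent = key
	}
	return ran
}

// depsFirst copies the dependency manifests before the source code.
func depsFirst(modHash, srcHash string) []step {
	return []step{
		{"COPY go.mod go.sum ./", modHash},
		{"RUN go mod download", ""},
		{"COPY . .", modHash + "|" + srcHash},
		{"RUN go build", ""},
	}
}

// singleCopy copies everything at once, then downloads and builds.
func singleCopy(modHash, srcHash string) []step {
	return []step{
		{"COPY . .", modHash + "|" + srcHash},
		{"RUN go mod download && go build", ""},
	}
}

func main() {
	cache := map[[32]byte]bool{}
	build(depsFirst("mod-v1", "src-v1"), cache) // first build fills the cache
	fmt.Println("deps first, source changed:", build(depsFirst("mod-v1", "src-v2"), cache))

	cache = map[[32]byte]bool{}
	build(singleCopy("mod-v1", "src-v1"), cache)
	fmt.Println("single copy, source changed:", build(singleCopy("mod-v1", "src-v2"), cache))
}
```

With dependencies first, a source-only change reruns just the source copy and the build. With a single copy, the same change reruns the step that downloads dependencies as well.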
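If everything is put in one layer, any change to the source code invalidates that layer, and dependencies are downloaded again on every build.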
Runtime Instance with Private Write Space
An image is read-only, but a container needs writable behavior (for example logs and temp files).
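Docker does it this way: when a container is created, a thin writable layer is added on top of the read-only image layers. All writes go into that layer (copy-on-write), and the image layers underneath are never modified.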
Analogy:
- Image = CD disc (read-only, cannot write)
- Container = program running from CD, with a note pad beside it for temporary writes
When the container is removed, the note pad is removed. The CD never changes.
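The CD-and-notepad idea can be modeled in a few lines. This is only a sketch with hypothetical `image` and `container` types. Real Docker gets the same effect from a union filesystem such as overlay2, and the model ignores file deletion.

```go
package main

import "fmt"

// image is a read-only set of files shared by every container created from it.
type image struct{ files map[string]string }

// container pairs an image with its own private writable layer.
type container struct {
	img   *image
	upper map[string]string
}

func newContainer(img *image) *container {
	return &container{img: img, upper: map[string]string{}}
}

// write never touches the image; it only records the change in the upper layer.
func (c *container) write(path, data string) { c.upper[path] = data }

// read prefers the writable layer and falls back to the image.
func (c *container) read(path string) (string, bool) {
	if v, ok := c.upper[path]; ok {
		return v, true
	}
	v, ok := c.img.files[path]
	return v, ok
}

func main() {
	img := &image{files: map[string]string{"/etc/app.conf": "port=8080"}}
	a, b := newContainer(img), newContainer(img)

	a.write("/etc/app.conf", "port=9090") // visible only inside container a
	a.write("/tmp/log", "started")

	va, _ := a.read("/etc/app.conf")
	vb, _ := b.read("/etc/app.conf")
	fmt.Println(va, vb, img.files["/etc/app.conf"]) // port=9090 port=8080 port=8080
}
```

Dropping container `a` discards its `upper` map, which is the notepad being thrown away. The image and container `b` never see the change.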
Two-Phase Compilation Pattern
Why split stages?
scratch is a fully empty image. It has no Go compiler, so you cannot run go build in an empty environment.
The builder stage is scaffolding for construction: when the house is done, the scaffolding is removed. golang:alpine is only a build-time tool and does not end up in the final image.
Dockerfile
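The builder stage starts from golang:1.25-alpine, sets WORKDIR to /app, copies go.mod and go.sum, downloads dependencies, copies the source, and builds the binary with CGO_ENABLED=0. The final stage starts from scratch and receives only the binary through COPY --from=builder.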
Several details:
- CGO_ENABLED=0: disables cgo (the bridge to C code) and produces a fully static binary with no dependency on system libraries. That is what lets the Go binary run on an empty scratch image.
- COPY go.mod go.sum ./ with ./: both ./ and . refer to the current directory inside the container (/app, set by WORKDIR). The trailing slash makes it explicit that the destination is a directory, which Docker's documentation requires when copying more than one source.
- COPY --from=builder: copies only that one binary from the builder stage and discards everything else.
Artifact Footprint Comparison
| Image | Size | Content |
|---|---|---|
| golang:1.25-alpine | ~300MB | Go toolchain + standard library + OS |
| go-api (multi-stage) | ~10MB | only one binary |
.dockerignore
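Like .gitignore, it lists the files Docker should exclude from the build context. A smaller context is sent to the builder faster, and changes to excluded files cannot invalidate COPY layers.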
Empty Base Versus Minimal Runtime Base
scratch: fully empty, nothing inside.
distroless: built by Google, includes only minimum required pieces (CA certs, timezone data, basic user info), no shell, no curl, no apt, no tools.
Selection criteria:
| Language | Choice | Reason |
|---|---|---|
| Go (CGO_ENABLED=0) | scratch | static binary, no runtime dependency |
| Java | distroless/java | needs JVM, does not need shell |
| Python | distroless/python | needs interpreter, does not need shell |
Relaunch Versus Regenerate
At the moment a container is created, it is frozen on that image version.
Even if you later pull a newer image, the running container is completely unaffected.
Sometimes docker images shows an entry whose repository and tag are both <none>. The old image has lost its tag to the newer one, but it cannot be deleted because a container still depends on it.
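Both behaviors come from the same fact: a tag is a pointer to an image, while a container remembers the image itself. The sketch below models that with hypothetical `store` and `container` types and placeholder IDs. It is an illustration, not Docker's data model.

```go
package main

import "fmt"

// store maps tags to image IDs, like the local image list.
type store struct{ tags map[string]string }

// container records the image ID it was created from, not the tag.
type container struct{ imageID string }

// create resolves the tag once, at creation time.
func (s *store) create(tag string) container {
	return container{imageID: s.tags[tag]}
}

// pull moves the tag to a new image ID; the old ID is left without a tag.
func (s *store) pull(tag, newID string) { s.tags[tag] = newID }

// tagged reports whether any tag still points at the given image ID.
func (s *store) tagged(id string) bool {
	for _, v := range s.tags {
		if v == id {
			return true
		}
	}
	return false
}

func main() {
	s := &store{tags: map[string]string{"go-api:latest": "sha256:aaa"}}
	c := s.create("go-api:latest")

	s.pull("go-api:latest", "sha256:bbb") // a newer image takes over the tag

	fmt.Println(c.imageID, s.tagged(c.imageID)) // sha256:aaa false
}
```

The container still points at the old ID, which now has no tag. That is the untagged entry in the image list, and it stays until the container is removed.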
Restart: stop the process and start the same container again. The binary is still the old one, so changes to .go files have no effect.
Rebuild: rerun the Dockerfile, produce a new image, then create a new container from that image. Only then do code changes take effect.
Persistent Data Mounts
The container's writable layer is temporary. If the container is removed, its data disappears. For persistent data, use a volume.
Analogy: container is a computer, volume is an external hard drive plugged in. If computer dies, data on external drive remains. Plug into a new computer and all data is back.
A volume disappears only when it is explicitly removed, for example with docker volume rm.
Moving-Label Pitfall
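latest is not a version number. It is a moving label. Real-world practice is to pin an exact version, for example golang:1.25-alpine instead of golang:latest.
Pulling at different times and on different machines then gets the same version. Upgrades become intentional behavior, not accidental. A version tag can still be re-pushed by its publisher, so the only fully immutable reference is a digest (name@sha256:...).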
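A pinning rule like this is easy to enforce in a script or CI check. The sketch below uses a hypothetical `isPinned` helper, not part of any Docker tooling. It treats a reference as pinned when it carries a digest or a tag other than latest, which is a simplification because tags such as stable also move.

```go
package main

import (
	"fmt"
	"strings"
)

// isPinned reports whether an image reference names an explicit version:
// either a digest (name@sha256:...) or a tag other than "latest".
func isPinned(ref string) bool {
	if strings.Contains(ref, "@sha256:") {
		return true
	}
	// Only look for the tag in the last path component, so a registry
	// port such as localhost:5000/app is not mistaken for a tag.
	last := ref[strings.LastIndex(ref, "/")+1:]
	i := strings.LastIndex(last, ":")
	if i < 0 {
		return false // no tag means Docker assumes :latest
	}
	tag := last[i+1:]
	return tag != "" && tag != "latest"
}

func main() {
	for _, ref := range []string{
		"golang",
		"golang:latest",
		"golang:1.25-alpine",
		"localhost:5000/go-api",
		"localhost:5000/go-api:1.4.2",
	} {
		fmt.Printf("%-30s pinned=%v\n", ref, isPinned(ref))
	}
}
```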
Distribution Methods
| Scenario | Method |
|---|---|
| internal deployment, CI/CD | Registry (AWS ECR, GitHub Container Registry) |
| offline environment | docker save / docker load |
| open source | provide source code, users build it themselves |
Execution Model on a Shared Kernel
Runtime Unit Equals One Host Task
What docker run does is start a process on your Linux host.
It does not boot a virtual machine. It is not magic. It is just a process.
You can see the server process directly in the host's process list, for example with ps. It runs on your Linux host, not inside some hidden “internal OS.”
Boundary Model with Visibility and Resource Limits
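The difference between a container and a normal process is that it is constrained by two Linux kernel mechanisms: namespaces, which limit what the process can see, and cgroups, which limit how many resources it can use.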
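Analogy: an apartment building. Namespaces are the walls of each apartment, so residents cannot see into their neighbors' homes. Cgroups are the meters on each apartment, capping how much water and electricity it can draw.
Residents are not moved to another building (that would be a VM). They are isolated inside the same building.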
Visibility Boundaries
Namespaces make the processes inside a container feel as if they live in an independent environment.
PID Namespace
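Inside the container, the main process sees itself as PID 1 and cannot see host processes. On the host, the same process appears with an ordinary PID.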
Network Namespace
Each container has its own IP address and network interfaces and cannot see the sockets of other containers.
Mount Namespace
/ inside the container is the / from the image, not the / of the host. Running ls / in the container shows a completely different result from the host.
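Namespace membership is visible from inside any Linux process, which gives a simple way to observe the boundaries described above. The sketch below reads the symlinks under /proc/self/ns, as documented in namespaces(7). The `namespaceIDs` helper name is an assumption, and on a non-Linux system it simply returns an empty map.

```go
package main

import (
	"fmt"
	"os"
)

// namespaceIDs returns the namespace identifiers of the current process,
// read from the symlinks under /proc/self/ns (Linux only).
func namespaceIDs() map[string]string {
	ids := map[string]string{}
	for _, name := range []string{"pid", "net", "mnt"} {
		link, err := os.Readlink("/proc/self/ns/" + name)
		if err != nil {
			continue // not Linux, or /proc is not mounted
		}
		ids[name] = link // has the form "pid:[<inode number>]"
	}
	return ids
}

func main() {
	fmt.Println("pid as seen by this process:", os.Getpid())
	for name, id := range namespaceIDs() {
		fmt.Println(name, "namespace:", id)
	}
}
```

Running the same binary on the host and then inside a container should print different namespace identifiers. When the binary is the container's entrypoint, it typically reports PID 1 even though the host sees it under another PID.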
Resource Quotas
Even if a process wants to consume resources aggressively, cgroups limit how much it can use.
With a limit of 0.5 CPU (for example docker run --cpus=0.5), even a program that spins in a tight loop inside the container can consume only half a CPU and cannot starve the host or other containers.
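Under the hood, a CPU limit is a quota of run time per scheduling period. The sketch below shows the arithmetic for cgroup v2, where the limit is stored in the cpu.max file as "quota period" in microseconds. The `cpuMax` helper is hypothetical, and it assumes the default period of 100000 microseconds, which is also what Docker documents for its --cpus option.

```go
package main

import (
	"fmt"
	"math"
)

// cpuMax converts a CPU limit such as 0.5 into the "quota period" pair that
// cgroup v2 stores in cpu.max, using the default 100ms period.
func cpuMax(cpus float64) string {
	const periodMicros = 100000
	if cpus <= 0 {
		return fmt.Sprintf("max %d", periodMicros) // "max" means no limit
	}
	quota := int64(math.Round(cpus * periodMicros))
	return fmt.Sprintf("%d %d", quota, periodMicros)
}

func main() {
	fmt.Println(cpuMax(0.5)) // 50000 100000
	fmt.Println(cpuMax(2))   // 200000 100000
}
```

A value of 50000 100000 means the processes in the group may run for 50ms out of every 100ms. Once the quota is used up, the kernel throttles them until the next period begins.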
Hypervisor Guest Versus Shared-Kernel Unit
| | VM | Container |
|---|---|---|
| Isolation mechanism | independent kernel (Hypervisor) | Namespace + cgroup (shared kernel) |
| Startup time | minutes (needs boot) | milliseconds (just starts process) |
| Size | GB-level | MB-level |
| Isolation strength | stronger | weaker (shared kernel) |
| Performance overhead | higher | near-native |
Why Empty Runtime Reduces Attack Surface
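Namespace isolation is not 100% perfect. If an attacker gains code execution inside a container, what they can do next, including attempting a kernel escape, depends on which tools are available inside that container. With a shell, a package manager, and curl, they can download and run more tooling. With only a single static binary, they have almost nothing to work with.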
One less tool means one less attack surface. This is the core reason why scratch and distroless are safer than full OS images.
End-to-End Mental Model
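A Dockerfile is built into an image, which is a stack of read-only layers. docker run turns that image into a container: an ordinary Linux process with a thin writable layer on top, restricted in what it can see by namespaces and in what it can use by cgroups. Data that must outlive the container goes in a volume.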