Docker Core Concepts: Containers Are Not Magic, They Are Linux Processes + Isolation

Lou Chang · filed under DevOps, Docker

2026-02-19 · about 1,900 words · 9-minute read

DevOps learning notes.

This post explains Docker's core concepts from first principles: how images, layers, and containers relate, how multi-stage builds work, and why a container is just a Linux process rather than magic.

Read-Only Build Template

An image is a read-only template used to create containers.

Analogy:

Image = cake mold (read-only: its shape does not change with use, and it can make any number of identical cakes)
Container = a cake made from the mold (each one is an independent entity)

Compared with OOP:


Image     = Class definition
Container = instance created by new

Stacked Artifacts and Non-Mutation

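An image is not a single monolithic file; it is a stack of layers. Roughly, each Dockerfile instruction adds one layer (strictly, only instructions that change the filesystem, such as RUN, COPY, and ADD, add content; FROM brings in the base image's own layers, and instructions like EXPOSE or ENTRYPOINT only record metadata):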


FROM golang:1.25-alpine    ← Layer 1: base image (itself several layers: OS + Go toolchain)
WORKDIR /app               ← Layer 2: create working directory
COPY go.mod go.sum ./      ← Layer 3: copy dependency definitions
RUN go mod download        ← Layer 4: download packages
COPY . .                   ← Layer 5: copy source code
RUN go build -o server ... ← Layer 6: compile

Analogy: a layer is one sheet in a mille-feuille cake. You can keep adding sheets on top, but you cannot swap out a middle sheet without redoing every sheet above it, because they all rest on it.

Rebuild Creates Fresh Hash Entries

Changing a Dockerfile does not modify an old layer; it creates a new one. Old layers remain in Docker's build cache, untouched, until they are pruned.


First build:
  FROM alpine  → Layer A (hash: abc123) ← created
  RUN go build → Layer B (hash: def456) ← created

Code changed, second build:
  FROM alpine  → Layer A (hash: abc123) ← cache hit, reused
  RUN go build → Layer C (hash: ghi789) ← new layer, Layer B still exists but unused

It works like git commits: you cannot modify an old commit, you can only add new ones. Once created, a layer's content never changes, and the layer is identified by the SHA256 digest of that content.
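To make "identified by its hash" concrete, here is a minimal Go sketch of a content-addressed store. It is a toy model under simplifying assumptions: real layer digests are computed over the layer's tar archive, and `layerStore` and `put` are hypothetical names, not Docker APIs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// layerStore is a toy content-addressed store: every entry's key is the
// SHA256 digest of its content, so an entry can never be changed in place.
type layerStore map[string][]byte

// put stores content under its digest and returns that digest. Identical
// content maps to the same key; different content gets a new key (barring a
// hash collision) and the old entry is left untouched.
func (s layerStore) put(content []byte) string {
	sum := sha256.Sum256(content)
	digest := "sha256:" + hex.EncodeToString(sum[:])
	if _, exists := s[digest]; !exists {
		s[digest] = append([]byte(nil), content...) // keep a private copy
	}
	return digest
}

func main() {
	store := layerStore{}
	first := store.put([]byte("compiled server v1"))
	again := store.put([]byte("compiled server v1"))  // unchanged input: same digest, reused
	second := store.put([]byte("compiled server v2")) // code changed: new digest

	fmt.Println(first == again, first == second, len(store)) // true false 2
}
```

Because the key is derived from the content, "changing" a layer is impossible by construction: new content simply lands under a new key, exactly like Layer C appearing next to Layer B above.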

Dependency-First Ordering Saves Build Time

This is why a Dockerfile should copy the dependency definitions before the source code:


COPY go.mod go.sum ./   ← copy only these two files first
RUN go mod download     ← download packages (this layer gets cached)
COPY . .                ← then copy source code
RUN go build ...

If go.mod and go.sum are unchanged, the go mod download layer comes straight from cache and nothing is re-downloaded. When only source code changes, just the last two steps rerun, which saves a lot of build time.

If the source code is copied before dependencies are downloaded:


COPY . .                ← any code change invalidates this layer
RUN go mod download     ← re-downloads packages every time
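The effect of ordering can be sketched as a chain of cache keys, where each step's key covers its parent's key, the instruction text, and the content of any copied files. This is a simplified model of how the build cache behaves, not BuildKit's actual algorithm; `step`, `cacheKeys`, `rebuilt`, `good`, and `bad` are hypothetical helpers.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// step is one Dockerfile instruction plus the content of any files it copies.
type step struct {
	instr string // e.g. "COPY go.mod go.sum ./"
	input string // content of the copied files; empty for RUN
}

// cacheKeys returns one key per step. Each key hashes the parent key, the
// instruction text, and the copied content, so a change at step i changes the
// key of step i and of every step after it.
func cacheKeys(steps []step) []string {
	keys := make([]string, len(steps))
	parent := ""
	for i, s := range steps {
		sum := sha256.Sum256([]byte(parent + "\x00" + s.instr + "\x00" + s.input))
		parent = hex.EncodeToString(sum[:])
		keys[i] = parent
	}
	return keys
}

// rebuilt counts the steps that miss the cache between two builds.
// Assumes both builds have the same number of steps.
func rebuilt(before, after []step) int {
	a, b := cacheKeys(before), cacheKeys(after)
	for i := range a {
		if a[i] != b[i] {
			return len(a) - i // this step and every later one rerun
		}
	}
	return 0
}

// good copies the dependency manifests first; bad copies everything first.
func good(mod, src string) []step {
	return []step{
		{"COPY go.mod go.sum ./", mod},
		{"RUN go mod download", ""},
		{"COPY . .", mod + src},
		{"RUN go build", ""},
	}
}

func bad(mod, src string) []step {
	return []step{
		{"COPY . .", mod + src},
		{"RUN go mod download", ""},
		{"RUN go build", ""},
	}
}

func main() {
	fmt.Println(rebuilt(good("mod v1", "code v1"), good("mod v1", "code v2"))) // 2: download stays cached
	fmt.Println(rebuilt(good("mod v1", "code v1"), good("mod v2", "code v1"))) // 4: dependencies changed
	fmt.Println(rebuilt(bad("mod v1", "code v1"), bad("mod v1", "code v2")))   // 3: download reruns too
}
```

The key design point is the chaining: because each key includes its parent's key, a miss anywhere forces every later step to rerun, so the most stable inputs belong at the top.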

Runtime Instance with Private Write Space

An image is read-only, but a running container needs to write things (for example, logs and temp files).

Docker handles this by adding one thin writable layer on top:


Image (all read-only layers)
     +
thin writable layer (belongs to this container only)
     =
Container

Analogy:

Image = CD disc (read-only, cannot write)
Container = the program running from the CD, with a notepad beside it for temporary writes

When the container is removed, the notepad is thrown away. The CD never changes.
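The lookup rule behind the writable layer can be modeled in a few lines of Go. This sketch assumes a simplified union filesystem: reads check the container's own layer first and then the image layers from top to bottom, and writes only ever land in the container's layer. Real storage drivers such as overlay2 also copy a file up before modifying it and record deletions as "whiteout" entries, which are omitted here; `unionFS` and `newContainer` are hypothetical names.

```go
package main

import "fmt"

// unionFS is a toy model of a container's filesystem: shared read-only image
// layers (index 0 = bottom) plus one private writable layer.
type unionFS struct {
	image    []map[string]string
	writable map[string]string
}

// newContainer gives each container its own empty writable layer on top of
// the same, shared image layers.
func newContainer(image []map[string]string) *unionFS {
	return &unionFS{image: image, writable: map[string]string{}}
}

// read returns the topmost copy of a file: the writable layer first, then the
// image layers from top to bottom.
func (u *unionFS) read(path string) (string, bool) {
	if v, ok := u.writable[path]; ok {
		return v, true
	}
	for i := len(u.image) - 1; i >= 0; i-- {
		if v, ok := u.image[i][path]; ok {
			return v, true
		}
	}
	return "", false
}

// write never touches the image layers: the new content goes into the
// writable layer and shadows any copy underneath.
func (u *unionFS) write(path, content string) {
	u.writable[path] = content
}

func main() {
	image := []map[string]string{
		{"/etc/os-release": "alpine"},
		{"/server": "binary v1"},
	}
	a, b := newContainer(image), newContainer(image)

	a.write("/tmp/app.log", "request served")
	_, seen := b.read("/tmp/app.log")
	fmt.Println(seen) // false: B never sees A's private writes

	a.write("/server", "patched")
	fromA, _ := a.read("/server")
	fromB, _ := b.read("/server")
	fmt.Println(fromA, fromB) // patched binary v1: the image layer itself is unchanged
}
```

Deleting container A in this model is just dropping its `writable` map, which is the notepad being thrown away while the CD stays intact.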

Two-Phase Compilation Pattern

Why split stages?

scratch is a fully empty image. It has no Go compiler, so you cannot run go build in an empty environment.


golang:alpine  → has compiler, compiles .go into binary (builder stage)
scratch        → empty, only holds the compiled binary (final stage)

The builder stage is scaffolding: once the house is built, the scaffolding comes down. golang:alpine is only a build-time tool and does not end up in the final image.

Dockerfile


FROM golang:1.25-alpine AS builder

WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY . .

RUN CGO_ENABLED=0 GOOS=linux go build -o server ./cmd/server/

FROM scratch

COPY --from=builder /app/server /server

EXPOSE 8080

ENTRYPOINT ["/server"]

Several details:

CGO_ENABLED=0: disables cgo, Go's bridge to C code, so the toolchain produces a statically linked binary that does not depend on the system C library. That is what lets the binary run on an empty scratch image.

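COPY go.mod go.sum ./: here ./ and . both refer to the working directory inside the image (/app, set by WORKDIR), so the choice is mostly a matter of style. The trailing slash does make explicit that the destination is a directory, which the Dockerfile reference calls for when COPY has more than one source.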

COPY --from=builder: copies only the compiled binary out of the builder stage; nothing else from that stage reaches the final image.

Artifact Footprint Comparison

Image	Size	Content
`golang:1.25-alpine`	~300MB	Go toolchain + standard library + OS
`go-api` (multi-stage)	~10MB	only one binary

.dockerignore

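Like .gitignore, .dockerignore lists files to leave out of the build context (the set of files sent to the builder), which keeps builds faster and keeps things like .git or a locally built binary out of COPY . .: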


.git
*.md
server
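Docker's documentation describes .dockerignore matching as Go's filepath.Match rules, extended with `**` and `!` exceptions. The sketch below applies only plain filepath.Match, to show one practical consequence: `*.md` matches Markdown files at the root of the context but not `docs/guide.md`, because `*` does not cross a `/` (matching at any depth would need `**/*.md`). `excluded` is a hypothetical helper, not Docker's implementation.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// excluded reports whether a path in the build context matches any pattern.
// Simplified: plain filepath.Match on cleaned paths only, without Docker's
// extras (** wildcards, ! exceptions, excluding everything under a directory).
func excluded(path string, patterns []string) bool {
	p := filepath.Clean(path)
	for _, pat := range patterns {
		if ok, err := filepath.Match(filepath.Clean(pat), p); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{".git", "*.md", "server"}
	for _, f := range []string{".git", "README.md", "docs/guide.md", "server", "main.go"} {
		fmt.Printf("%-14s excluded=%v\n", f, excluded(f, patterns))
	}
}
```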

Empty Base Versus Minimal Runtime Base


Full OS (ubuntu)  →  distroless  →  scratch (empty)
   largest             middle          smallest     (size and attack surface)

scratch: fully empty, nothing inside.

distroless: maintained by Google; contains only the minimum runtime pieces (CA certificates, timezone data, basic user entries such as /etc/passwd) and no shell, curl, apt, or other tools.

Selection criteria:

Language	Choice	Reason
Go (`CGO_ENABLED=0`)	scratch	static binary, no runtime dependency
Java	distroless/java	needs JVM, does not need shell
Python	distroless/python	needs interpreter, does not need shell

Relaunch Versus Regenerate

A container is tied to the exact image it was created from (by image ID, not by tag), frozen at that version.

Even if you later pull a newer image, the running container is completely unaffected:


pull latest (= 11.0) → docker run → container A frozen at 11.0

pull latest (= 12.0) → new image arrives
                       container A still runs 11.0, completely unaffected

Afterwards, docker images may show:


REPOSITORY    TAG       IMAGE ID
mssql         latest    abc123     ← new 12.0
<none>        <none>    def456     ← old 11.0, still used by a container

The old image has lost its tag (a dangling image, listed as <none>), but it cannot be removed while a container still depends on it.
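Why the old image lingers is easier to see if tags are modeled as movable pointers to immutable image IDs, with each container recording the ID it was created from. A toy Go sketch; `localImages` and its methods are hypothetical, and the IDs reuse the illustrative values above.

```go
package main

import (
	"fmt"
	"sort"
)

// localImages is a toy model of a local image store: tags are movable
// pointers to immutable image IDs, and containers remember the ID they were
// created from, never the tag.
type localImages struct {
	tags       map[string]string // tag -> image ID
	containers map[string]string // container name -> image ID
}

func newLocalImages() *localImages {
	return &localImages{tags: map[string]string{}, containers: map[string]string{}}
}

// pull re-points a tag; the previously tagged image is not deleted.
func (l *localImages) pull(tag, id string) { l.tags[tag] = id }

// run resolves the tag once, at creation time.
func (l *localImages) run(name, tag string) { l.containers[name] = l.tags[tag] }

// untaggedInUse lists image IDs that no tag points to but some container
// still uses: the <none> rows that cannot be removed yet.
func (l *localImages) untaggedInUse() []string {
	tagged := map[string]bool{}
	for _, id := range l.tags {
		tagged[id] = true
	}
	seen := map[string]bool{}
	var out []string
	for _, id := range l.containers {
		if !tagged[id] && !seen[id] {
			seen[id] = true
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	l := newLocalImages()
	l.pull("mssql:latest", "def456") // latest = 11.0
	l.run("A", "mssql:latest")       // A records def456
	l.pull("mssql:latest", "abc123") // latest = 12.0

	fmt.Println(l.containers["A"])  // def456
	fmt.Println(l.untaggedInUse()) // [def456]
}
```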

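Restart (docker restart, or docker stop then docker start): stops the process and starts it again in the same container, from the same image. The binary is still the old one, so changes to .go files have no effect.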

Rebuild: rerun the Dockerfile to produce a new image, then create a new container from that image. Only then do code changes take effect.


# correct update flow after code changes
docker build -t go-api .
docker stop old-container
docker rm old-container
docker run -p 8080:8080 go-api

Persistent Data Mounts

A container's writable layer is temporary: when the container is removed, its data is gone. For data that must persist, use a volume.

Analogy: the container is a computer, and the volume is an external hard drive plugged into it. If the computer dies, the data on the external drive remains; plug the drive into a new computer and all the data is back.


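# create container with volume mounted
# (the SQL Server image requires ACCEPT_EULA and MSSQL_SA_PASSWORD; the password is a placeholder)
docker run -d --name sql-old \
  -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=<YourStrong@Passw0rd>" \
  -v sql-data:/var/opt/mssql \
  mcr.microsoft.com/mssql/server:2019-latest

# upgrade version, keep data
docker stop sql-old
docker rm sql-old
docker run -d --name sql-new \
  -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=<YourStrong@Passw0rd>" \
  -v sql-data:/var/opt/mssql \
  mcr.microsoft.com/mssql/server:2022-latest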

A named volume is deleted only when you remove it explicitly:


docker volume ls              # list volumes
docker volume rm sql-data     # delete the volume and its data (fails while a container still uses it)

Moving-Label Pitfall

latest is not a version number; it is a moving label that the publisher re-points at each new release. In practice, pin an exact version:


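# not recommended: latest moves whenever a new release is published
docker pull mcr.microsoft.com/mssql/server:latest

# recommended: pin an exact version
docker pull mcr.microsoft.com/mssql/server:2022-CU14-ubuntu-22.04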

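Pinned this way, pulls at different times and on different machines get the same version, and upgrades become deliberate rather than accidental. (Strictly, a publisher can re-push a tag; referencing the image by digest, name@sha256:<digest>, is the only way to guarantee identical bytes.)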
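A rough check for whether a reference is pinned can be sketched by splitting it into name, tag, and digest. This is an approximation under stated assumptions: it ignores the full reference grammar and normalization rules (such as expanding short names to docker.io/library/...), and `splitRef` and `pinned` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRef is a simplified parser for name[:tag][@digest]. It does not
// implement the full reference grammar or registry normalization.
func splitRef(ref string) (name, tag, digest string) {
	if i := strings.Index(ref, "@"); i >= 0 {
		ref, digest = ref[:i], ref[i+1:]
	}
	name = ref
	// A colon after the last "/" starts the tag; a colon before it belongs to
	// a registry host:port such as localhost:5000.
	if i := strings.LastIndex(ref, ":"); i > strings.LastIndex(ref, "/") {
		name, tag = ref[:i], ref[i+1:]
	}
	return name, tag, digest
}

// pinned reports whether a reference names a fixed image: a digest always
// does; an explicit tag other than "latest" does by convention. No tag at all
// means Docker falls back to "latest".
func pinned(ref string) bool {
	_, tag, digest := splitRef(ref)
	if digest != "" {
		return true
	}
	return tag != "" && tag != "latest"
}

func main() {
	for _, ref := range []string{
		"mcr.microsoft.com/mssql/server:latest",
		"mcr.microsoft.com/mssql/server:2022-CU14-ubuntu-22.04",
		"localhost:5000/go-api",
	} {
		fmt.Printf("%-55s pinned=%v\n", ref, pinned(ref))
	}
}
```

A check like this could run in CI over the FROM lines of a Dockerfile, so that unpinned base images are caught before they cause an accidental upgrade.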

Distribution Methods

Scenario	Method
internal deployment, CI/CD	Registry (AWS ECR, GitHub Container Registry)
offline environment	`docker save` / `docker load`
open source	provide source code, users build it themselves


# export as file
docker save go-api | gzip > go-api.tar.gz

# import on another machine
docker load < go-api.tar.gz

Execution Model on a Shared Kernel

Runtime Unit Equals One Host Task

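What docker run does is start a process on your Linux host. (On macOS and Windows, Docker Desktop runs that process inside a lightweight Linux VM, so the "host" is that VM.)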

It does not boot a virtual machine. It is not magic. It is just a process.


docker run -d --name my-api go-api

# visible directly on host
ps aux | grep server

The server process appears directly in the host's process list. It runs on your host's Linux kernel, not inside some hidden “internal OS.”

Boundary Model with Visibility and Resource Limits

What distinguishes a container from an ordinary process is that it is constrained by two Linux kernel mechanisms:


Isolation
├── cannot see outside    → Namespace
└── cannot use too much   → cgroup

Analogy: an apartment building.


entire building          = your Linux host (shared kernel)
each separate room       = each container
walls and door locks     = Namespace (cannot see neighbors)
electricity quota/room   = cgroup (limited resources per container)

Residents are not moved to another building (that would be a VM); they are isolated inside the same building.

Visibility Boundaries

Namespaces make processes inside a container behave as if they had an independent environment of their own.

PID Namespace


Host sees:                         Container sees:
PID 1  → systemd                   PID 1  → server (your Go app)
PID 891 → server (Go app)          (cannot see any host processes)
PID 892 → nginx

Network Namespace


Host:          eth0 (192.168.1.100)
Container A:   eth0 (172.17.0.2)  ← isolated virtual network interface
Container B:   eth0 (172.17.0.3)

Each container gets its own network interface and IP address and cannot see other containers' sockets.

Mount Namespace

Inside the container, / is the root of the image's filesystem (plus the container's writable layer), not the host's /. Running ls / in the container shows a completely different listing from the host.
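Namespaces can be observed directly. Per namespaces(7), each process's namespaces appear as symlinks under /proc/<pid>/ns, and two processes share a namespace when those links have the same device and inode numbers. The Linux-only Go sketch below prints this process's pid, net, and mnt namespace inodes; running it once on the host and once inside a container should show different numbers. `nsInode` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// nsInode extracts the inode number from a namespace symlink target such as
// "pid:[4026531836]".
func nsInode(link string) (uint64, error) {
	start := strings.IndexByte(link, '[')
	end := strings.LastIndexByte(link, ']')
	if start < 0 || end <= start+1 {
		return 0, fmt.Errorf("unexpected namespace link %q", link)
	}
	return strconv.ParseUint(link[start+1:end], 10, 64)
}

func main() {
	// Linux-only: /proc/self/ns/<type> names the namespace this process is in.
	for _, ns := range []string{"pid", "net", "mnt"} {
		link, err := os.Readlink("/proc/self/ns/" + ns)
		if err != nil {
			fmt.Println(ns, "unavailable:", err)
			continue
		}
		ino, err := nsInode(link)
		if err != nil {
			fmt.Println(ns, "unparsable:", err)
			continue
		}
		fmt.Println(ns, "namespace inode:", ino)
	}
	// If this binary is the container's entrypoint, it usually sees itself as PID 1.
	fmt.Println("my PID as seen from here:", os.Getpid())
}
```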

Resource Quotas

Even if a process tries to consume resources aggressively, its cgroup caps how much it can use:

docker run --cpus="0.5" --memory="512m" go-api

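Even if the program spins in a busy loop, it gets at most half a CPU's worth of time, so it cannot starve the host or other containers. If it exceeds the 512 MB memory limit, the kernel's OOM killer terminates it instead of letting it exhaust host memory.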
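Inside a container on a cgroup v2 host, these limits are usually visible as plain files under /sys/fs/cgroup. With --cpus="0.5", cpu.max typically reads "50000 100000" (quota and period in microseconds), and with --memory="512m", memory.max should read 536870912 bytes (512 × 1024 × 1024). The Go sketch below assumes cgroup v2 and that mount path; `cpuLimit` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// cpuLimit parses a cgroup v2 cpu.max value, "<quota> <period>" in
// microseconds or "max <period>" when unlimited, and returns how many CPUs'
// worth of time the group may use. limited is false for "max".
func cpuLimit(cpuMax string) (cpus float64, limited bool, err error) {
	fields := strings.Fields(cpuMax)
	if len(fields) != 2 {
		return 0, false, fmt.Errorf("unexpected cpu.max %q", cpuMax)
	}
	if fields[0] == "max" {
		return 0, false, nil
	}
	quota, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, false, err
	}
	period, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return 0, false, err
	}
	if period <= 0 {
		return 0, false, fmt.Errorf("non-positive period in %q", cpuMax)
	}
	return quota / period, true, nil
}

func main() {
	// Assumes cgroup v2 with the container's own cgroup mounted at /sys/fs/cgroup.
	for _, name := range []string{"cpu.max", "memory.max"} {
		b, err := os.ReadFile("/sys/fs/cgroup/" + name)
		if err != nil {
			fmt.Println(name, "unavailable:", err)
			continue
		}
		fmt.Printf("%s = %s\n", name, strings.TrimSpace(string(b)))
		if name == "cpu.max" {
			if cpus, limited, err := cpuLimit(string(b)); err == nil && limited {
				fmt.Printf("  -> at most %.2f CPUs\n", cpus)
			}
		}
	}
}
```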

Hypervisor Guest Versus Shared-Kernel Unit

	VM	Container
Isolation mechanism	independent kernel (Hypervisor)	Namespace + cgroup (shared kernel)
Startup time	seconds to minutes (boots a full OS)	milliseconds (just starts a process)
Size	GB-level	MB-level
Isolation strength	stronger	weaker (shared kernel)
Performance overhead	higher	near-native


VM:        Hardware → Hypervisor → Guest OS (full Kernel) → Process
Container: Hardware → Host Linux Kernel → Container Runtime → Process (Namespace + cgroup)

Why Empty Runtime Reduces Attack Surface

Namespace isolation is not airtight, and every container shares the host kernel. If an attacker compromises the process inside a container, what they can do next, including probing for a kernel escape, depends heavily on which tools are available inside the container:


ubuntu image compromised:
  has bash → run arbitrary commands
  has curl → download malicious tools
  has apt  → install anything

scratch compromised:
  no shell → cannot run interactive commands
  no tools → almost nothing can be done

Every tool you leave out is one less thing an attacker can use. This is the core reason scratch and distroless images are safer than full OS images.

End-to-End Mental Model


Each Dockerfile instruction
  └── produces a Layer (read-only, hashed, immutable)
      └── multiple Layers stacked = Image (read-only template)
          └── docker run = Image + writable layer = Container (running process)
              ├── Namespace → cannot see outside (PID / Network / Mount)
              └── cgroup   → cannot use too much (CPU / Memory)

Container operations:
  docker stop / start  → restart, binary unchanged, code changes have no effect
  docker build         → rebuild, new image, code changes take effect
  Volume               → persistent data, independent of container lifecycle
