Linux Fundamentals

The goal I set for myself was simple: learn to live completely in the command line. Even when working with something like AWS, use the CLI and avoid depending on the GUI.

This post is a compilation of what I picked up while doing that.

Linux is Just the Kernel

Linux is just the kernel, the software that controls and manipulates the system's hardware. A Linux Distribution is the complete package of the Linux kernel + GUI + default apps + package managers + shell, etc. that makes up an OS.

The kernel is like the engine of a car. The distro or OS is the complete car with chassis, AC, interiors, body and everything else.

Raspberry Pi

A Raspberry Pi is just the hardware: a bare single-board computer with modest specs. It does not come with an OS, kernel or GUI. It comes with RAM, but most models have no built-in storage.

The kernel, OS, files, data - everything lives on a micro-SD card. Raspberry Pi Imager (a flashing tool) can be downloaded from the official Raspberry Pi website onto a laptop and used to write an OS image to the SD card. Then we just insert the card into the Pi and turn it on.

Flashing - What It Really Means

Usually an SD card or any storage device carries a file system, a structure that determines where each piece of data lives and where new data should be added. It works like an index of the data: it keeps track of what is stored where and is used to look files up.

When we connect a pendrive to a laptop, the laptop sees the pendrive as just another drive with its own file system, e.g. EXT4. When we paste data into the pendrive, the file system decides which blocks on the pendrive those specific bytes go into.

With flashing, we don't copy-paste the image into the SD card at a location chosen by the file system. Instead we bypass the file system entirely and write the image data block-by-block directly onto the SD card, starting from the first block. It replaces whatever existed in those blocks previously, including the old file system.

Now, the SD card doesn't contain the image as a file sitting somewhere inside a file system. The card's own layout (partition table, file systems, boot files) is the one that came from the image, so the SD card becomes the OS from the block level up.
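To make the block-by-block idea concrete, here is a minimal sketch that "flashes" a pretend image onto a pretend SD card using dd. Both are just regular files in a temp directory (os.img and fake-sdcard are made-up names), so nothing real gets overwritten. On a real card the output would be a device path instead, which I am deliberately not showing here.

```shell
# Stand-ins: a regular file plays the OS image, another plays the SD card.
workdir=$(mktemp -d)
image="$workdir/os.img"
device="$workdir/fake-sdcard"

# 1 MiB pretend image of random bytes, 2 MiB pretend card full of zeros.
dd if=/dev/urandom of="$image" bs=1024 count=1024 2>/dev/null
dd if=/dev/zero of="$device" bs=1024 count=2048 2>/dev/null

# "Flash": copy raw blocks from the image onto the card, starting at block 0.
# conv=notrunc keeps the target's size, like a real device that cannot shrink.
dd if="$image" of="$device" bs=4096 conv=notrunc 2>/dev/null

# Verify: the first 1 MiB of the card must now be byte-identical to the image.
if head -c 1048576 "$device" | cmp -s - "$image"; then
    flash_ok=yes
else
    flash_ok=no
fi
device_size=$(wc -c < "$device" | tr -d ' ')

echo "flash verified: $flash_ok, card size still $device_size bytes"
rm -rf "$workdir"
```

No file system was consulted anywhere in that copy, which is the whole point: dd only knows about blocks of bytes.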

Some tools for flashing and virtualization:

Rufus - runs on Windows and writes a bootable USB, useful to repurpose an old Windows laptop into Linux
BalenaEtcher - flashing on macOS (also available for Windows and Linux): https://etcher.balena.io/
UTM - for virtualization on Mac (virtualizes ARM-native OS images, emulates other architectures using QEMU)
VMware Fusion - also a good option for virtualization on macOS

Terminal vs Shell

Terminal is just the visual interface. It does not understand commands or anything.

Shell is the primary engine that processes commands, talks to the kernel, receives the output and conveys it to the terminal to display. When you connect to a remote host, the terminal belongs to your local Mac, but the shell belongs to the remote host.

Bash is the default shell on most Linux distros; on macOS it is Zsh.

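The flow looks like this: terminal passes what I type on the keyboard to the shell -> shell parses the command and expands variables, wildcards, etc. if needed -> shell asks the kernel to start a child process (fork + exec) that runs the program -> the program does its work through system calls to the kernel and writes its output to the terminal it inherited from the shell -> terminal displays the output -> child process exits, the shell collects its exit status and shows the prompt again.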
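A small way to watch that flow happen, as a rough sketch: start a child sh, have it report its own PID and its parent's PID, then read its exit status. The temp file is only there so the values can be read back into variables.

```shell
# The shell starts a child process that runs `sh`; the child prints its own
# PID and its parent's PID, then exits.
ids_file=$(mktemp)
sh -c 'echo "$$ $PPID"' > "$ids_file"
read -r child_pid child_ppid < "$ids_file"
rm -f "$ids_file"

# The parent shell collects the child's exit status once it finishes.
status=0
sh -c 'exit 3' || status=$?

echo "this shell PID: $$"
echo "child PID:      $child_pid"
echo "child PPID:     $child_ppid"
echo "exit status of second child: $status"
```

When this is run directly from a script or an interactive shell, the child's PPID would normally be the PID of the shell that launched it.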

Commands are just programs. cat is a program (which cat returns /usr/bin/cat). ls is a program too.

To check which shell I'm using: echo $SHELL. What follows $ is a variable. We use caps when they are system variables or env variables set by the shell itself.
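Here is a short sketch of both ideas: asking the shell where a command comes from, and expanding variables with $. One thing I noticed while trying this is that not every command is a separate program; a few, like cd, are built into the shell itself. The project variable is a made-up name for illustration.

```shell
# command -v reports how the shell would resolve a name.
cat_path=$(command -v cat)   # external program: resolves to a path on disk
cd_kind=$(command -v cd)     # shell builtin: resolves to just the name "cd"
echo "cat -> $cat_path"
echo "cd  -> $cd_kind"

# Our own variables are lowercase by convention; $name expands to the value.
project="linux-notes"        # hypothetical variable
echo "project=$project"

# SHELL is set by the system; fall back to a message if it is missing.
echo "login shell: ${SHELL:-not set}"
```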

Everything is a File on Linux

This is one of those things that sounds abstract until you see it in practice.

Even processes are files. They may not be files that sit on the hard disk, but the kernel creates them on the fly just so I can read them using cat. The /proc folder is where the kernel presents process information as files. Similarly, every process gets its own directory of files that describe it. The kernel's process management is presented as files.
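To see this in practice, a minimal sketch that reads the current shell's own entry under /proc. It assumes a Linux system with /proc mounted and simply says so if it is not there.

```shell
pid=$$   # PID of the shell running this snippet

if [ -r "/proc/$pid/status" ]; then
    # comm holds the process name; status is a text file of "Key: value" lines.
    proc_name=$(cat "/proc/$pid/comm")
    status_pid=$(awk '/^Pid:/ { print $2 }' "/proc/$pid/status")
    echo "PID $status_pid is running: $proc_name"
else
    echo "no /proc entry for $pid (probably not Linux)"
fi
```

Nothing here used a special API. It is all cat and awk on what look like plain text files.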

echo "something" > /dev/null - null is not a regular file on disk. It is a special device file provided by the kernel, and it's Linux's way of passing something into a black hole: whatever is written to it is discarded. It looks like a file to any program, which makes it convenient to forward something into it.
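A quick check of what /dev/null actually is, as a small sketch: the shell's test command can tell a character device apart from a regular file.

```shell
# -c is true for character device files, -f for regular files.
if [ -c /dev/null ]; then
    null_kind="character device"
elif [ -f /dev/null ]; then
    null_kind="regular file"
else
    null_kind="something else"
fi
echo "/dev/null is a $null_kind"

# Writing to it succeeds and the data is thrown away.
echo "discard me" > /dev/null

# Reading from it returns nothing (immediate end-of-file).
null_read=$(cat /dev/null)
echo "read back: '$null_read'"
```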

When a program opens a network connection, the kernel creates a file descriptor that describes the connection. The program interacts with the file, and the corresponding actions are handled by the kernel. The program thinks it's "reading a file", but the kernel is actually pulling information from the network. The program thinks it's "writing to a file", but the kernel is actually sending data over the network.

Even pipes work this way. ps aux | grep ssh - ps aux just writes to a file descriptor and grep ssh just reads from one. The shell asks the kernel for a pipe, an anonymous in-memory buffer that connects the two, but both commands think it's normal file I/O.
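A rough way to confirm that the reading side of a pipeline really is looking at a pipe: inside the pipeline, ask whether /dev/stdin is a FIFO. This assumes a system where /dev/stdin exists, which is the case on typical Linux installs. The process names fed to grep are made-up sample data.

```shell
# On the right-hand side of a pipe, standard input is a pipe (FIFO).
stdin_kind=$(echo "data" | {
    if [ -p /dev/stdin ]; then echo "pipe"; else echo "not a pipe"; fi
})
echo "stdin inside the pipeline is: $stdin_kind"

# grep does not care where its input comes from; it just reads lines.
match_count=$(printf 'sshd\nbash\ncron\n' | grep -c ssh)
echo "lines matching ssh: $match_count"
```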

Key Directories

/home - user home directories
/etc - system configuration files
/var - variable data: logs, databases, mail
/tmp - temporary files
/usr - where installed software lives
/bin - most command binaries
/sbin - system administration binaries
/opt - optional 3rd party software
/dev - device files
/proc - process and kernel information

Installing Software

Package managers vary by distro: apt for Debian-based (Ubuntu), apk for Alpine, pacman for Arch, dnf for RHEL/Fedora.

sudo apt update                          # update package repository list
sudo apt upgrade                         # upgrade all installed packages
apt search htop                          # search for a package
apt show htop                            # show info about a package
sudo apt install htop tree curl vim      # install packages with dependencies
sudo apt remove htop                     # remove package, keep config files
sudo apt purge htop                      # remove package including config files
apt list --installed                     # list all installed packages
apt list --installed | grep htop         # search within installed packages

When we install a program, it usually comes with 2 kinds of files: program files that carry the binaries to run it, and configuration files that customize the environment and variables to our requirements. System-wide config files are stored in /etc/. User-specific config files are usually dotfiles in the home directory, ex: .gitconfig.

Users, Groups & Permissions

whoami                    # current user
who                       # logged in users
id                        # user and group IDs
sudo adduser testuser     # create user
sudo deluser testuser     # delete user

File permissions follow the format: -rwxrwxrwx - file type, then owner, group, others. Read is 4, write is 2, execute is 1.

chmod 777 file-name       # or u+x, o+rw, etc.
chown user file           # change owner
chgrp group file          # change group
sudo -i                   # get root shell
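The numbers are easier to remember once you see them add up. A small sketch on a throwaway temp file: 640 means owner 4+2 (read + write), group 4 (read), others 0.

```shell
demo=$(mktemp)

# Numeric form: owner rw- (4+2=6), group r-- (4), others --- (0).
chmod 640 "$demo"
perms_numeric=$(ls -l "$demo" | cut -c1-10)
echo "after chmod 640:      $perms_numeric"

# Symbolic form: add execute for the owner and read for others.
chmod u+x,o+r "$demo"
perms_symbolic=$(ls -l "$demo" | cut -c1-10)
echo "after chmod u+x,o+r:  $perms_symbolic"

rm -f "$demo"
```

The first character of the string is the file type (- for a regular file), followed by three characters each for owner, group and others.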

Input, Output & Pipes

Three streams: standard input, standard output, standard error.

command > file.txt              # redirect output to file
command >> file.txt             # append instead of overwrite
wc -l < /etc/passwd            # redirect input from file
ls /nonexistent 2> /dev/null   # discard error output
ls /etc/ /nonexist &> all.txt  # redirect both stdout and stderr
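To convince myself that stdout and stderr really are separate streams, here is a minimal sketch with a made-up function, emit, that writes one line to each. The redirections decide which line survives.

```shell
# Hypothetical helper: one line to stdout, one line to stderr.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Keep stdout, throw stderr away.
out_only=$(emit 2>/dev/null)

# Keep stderr, throw stdout away. Order matters: stderr is first pointed at
# the capture, and only then is stdout sent to /dev/null.
err_only=$(emit 2>&1 >/dev/null)

# Merge both streams and count the lines.
both_count=$(emit 2>&1 | wc -l | tr -d ' ')

echo "stdout only: $out_only"
echo "stderr only: $err_only"
echo "merged line count: $both_count"
```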

Pipes send stdout of one command into stdin of another:

ls /etc | head -5
cat names.txt | sort | uniq -c
echo "HELLoo" | tr 'A-Z' 'a-z'
ls /etc/ | tee list.txt         # write to screen AND to a file

Processes

Processes have a PID, parent process (PPID), owner, and current state.

ps aux                     # list all processes
htop                       # real-time process viewer
kill -9 <PID>              # force-kill a process (SIGKILL); plain kill <PID> asks it to exit (SIGTERM)
pkill -f "python script.py" # kill by matching pattern
sleep 60 &                 # run in background
jobs                       # list background jobs
fg %1                      # bring job back to foreground

If a process is running in the foreground, Ctrl+Z pauses it, then bg takes it to the background.
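The same ideas work inside a script, where there is no Ctrl+Z. A minimal sketch: start a background process, grab its PID from $!, check that it is alive, stop it with the default SIGTERM and read its exit status.

```shell
# Start a long-running command in the background and remember its PID.
sleep 30 &
bg_pid=$!

# kill -0 sends no signal; it only checks that the process exists.
if kill -0 "$bg_pid" 2>/dev/null; then
    state="running"
else
    state="gone"
fi
echo "background PID $bg_pid is $state"

# Plain kill sends SIGTERM (signal 15).
kill "$bg_pid"

# wait returns the child's exit status; for a process ended by a signal
# the shell reports 128 + the signal number.
exit_code=0
wait "$bg_pid" 2>/dev/null || exit_code=$?
echo "exit status after SIGTERM: $exit_code"
```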

systemd

systemd initializes the system at boot and manages services. It's a compiled binary on disk. Once the Linux kernel finishes initializing, it triggers systemd which becomes PID 1.

We use systemctl to interact with systemd, which handles starting, stopping or managing daemons. Unlike SysVinit which was the standard earlier, systemd runs the booting process in parallel. It manages the order of starting services, handles dependencies, mounts filesystems, sets up networking, manages user logins and so on.

Every service produces logs. journald captures those and can be queried through journalctl.

sudo systemctl start cron            # start a service (stop, restart, enable, disable)
journalctl --since "1 hour ago"      # query logs
pstree                               # visualize process tree

Networking

192.168.xxx.xxx always refers to a private internal network. The other private ranges are 10.x.x.x and 172.16.x.x to 172.31.x.x.

ip a                # view IP address, look for inet line on eth0
ip route            # view routing table, default points to WiFi router (gateway)
hostname            # view hostname
ping <ip>           # test connectivity
curl https://example.com    # download files / make HTTP requests
wget <url>                  # download files

Ports and Sockets

A port is just a number. A socket is a combination of IP address + port number + protocol (TCP/UDP). When a program wants to connect over the network, it asks the kernel to create a socket.

One port can have multiple sockets. For example, when we start sshd, it binds a socket to port 22 and calls listen(), which is a system call. This socket is now in LISTEN state. When a client connects, the kernel creates a new socket in ESTABLISHED state for that connection (sshd receives it from accept()), while the original one continues listening.
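Since everything is a file, the kernel's socket table can be read like one too. This is a rough, Linux-only sketch that lists IPv4 TCP ports in LISTEN state by reading /proc/net/tcp, where ports are stored in hex and state 0A means LISTEN. The function names are my own, and in day-to-day use ss is the better tool.

```shell
# Convert a hex port from /proc/net/tcp (e.g. 0016) to decimal.
hex_to_port() {
    printf '%d\n' "0x$1"
}

# Print the decimal port of every IPv4 TCP socket in LISTEN state (0A).
list_listening_ports() {
    [ -r /proc/net/tcp ] || return 0
    awk 'NR > 1 && $4 == "0A" { split($2, addr, ":"); print addr[2] }' /proc/net/tcp |
        while read -r hex; do hex_to_port "$hex"; done |
        sort -un
}

echo "hex 0016 is port $(hex_to_port 0016)"
echo "listening TCP ports (IPv4):"
list_listening_ports
```

If sshd is running on the default port, 22 should show up in that list.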

ss -tunlp           # see what ports are open and listening
ss -tun             # see active/established connections

`/etc/hosts`

Used to assign hostnames or shortcuts that we can easily remember, avoiding typing IP addresses every time. Can also be used to indirectly block websites by pointing, for example, www.facebook.com to the loopback address 127.0.0.1.
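The file format is just "IP address, then one or more names" per line. Here is a sketch that builds a sample hosts-style file in a temp location and looks names up in it with awk. The linux-box name is made up, and the awk lookup only imitates what the system resolver does with the real /etc/hosts.

```shell
# Sample file in /etc/hosts format (a temp file, not the real one).
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
127.0.0.1      localhost
192.168.1.12   linux-box          # hypothetical shortcut name
127.0.0.1      www.facebook.com   # "blocked" by pointing at loopback
EOF

# Return the IP of the first line that lists the given name.
lookup() {
    awk -v name="$1" '
        { sub(/#.*/, "") }                      # strip comments
        { for (i = 2; i <= NF; i++)
              if ($i == name) { print $1; exit } }
    ' "$hosts_file"
}

box_ip=$(lookup linux-box)
blocked_ip=$(lookup www.facebook.com)
echo "linux-box        -> $box_ip"
echo "www.facebook.com -> $blocked_ip"

rm -f "$hosts_file"
```

On a real system, getent hosts <name> shows what the resolver returns, including entries from /etc/hosts.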

SSH

How it works behind the scenes:

Client runs ssh username@ip-address
SSH client creates TCP connection to server on port 22
TCP 3-way handshake
Diffie-Hellman key exchange, server proves its identity with its host key
Encrypted channel established
Authentication (password or key) over the encrypted channel
Commands from our terminal run on the remote server

# Generate SSH keys
ssh-keygen -t ed25519

# Copy public key to server
ssh-copy-id user@ip-address

# If ssh-copy-id isn't available
cat ~/.ssh/id_ed25519.pub | ssh user@192.168.100.71 "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"

# Disable password authentication on server
sudo vi /etc/ssh/sshd_config    # change PasswordAuthentication to no
sudo systemctl restart ssh      # the service is named sshd on some distros

SSH config for shortcuts (~/.ssh/config):

Host linux
    HostName 192.168.1.12
    User admin
    IdentityFile ~/.ssh/id_ed25519

Now ssh linux just works.

Copy files with SCP:

scp file.txt admin@192.168.1.12:/home/admin              # local to remote
scp admin@192.168.1.12:/var/log/syslog ./                 # remote to local
scp -r directory/ admin@192.168.1.12:/home/admin          # copy directory

Tmux

Any process we start on a remote server through SSH is normally killed if we lose the connection to the server. I generally used nohup if I knew it was going to take a lot of time, which I now realize was a pretty amateur approach. I discovered that tmux offers a much better way. It lets us run multiple terminal sessions inside one window and keeps them running even if we lose the SSH connection to the server.

Suppose we want to carry out multiple tasks on the same server. Normally we'd open multiple SSH connections. Tmux lets us work with multiple windows and panes in the same SSH session - watch logs in one terminal, edit files in another, and run scripts in a third. Without tmux, we would need 3 SSH connections.

sudo apt install tmux
tmux                              # start tmux
tmux new -s devops-lab            # create named session
tmux ls                           # list sessions
tmux attach -t devops-lab         # reattach to session
tmux kill-session -t devops-lab   # kill session

All tmux shortcuts start with the prefix Ctrl+b (press it, release, then press the next key):

Sessions: Ctrl+b d to detach

Windows (each window is a full terminal):

Shortcut	Action
`Ctrl+b c`	Create new window
`Ctrl+b n`	Next window
`Ctrl+b 2`	Jump to window 2
`Ctrl+b ,`	Rename window
`Ctrl+b &`	Close window

Panes (split each window into separate terminals):

Shortcut	Action
`Ctrl+b "`	Split into top and bottom panes (horizontal divider)
`Ctrl+b %`	Split into left and right panes (vertical divider)
`Ctrl+b arrows`	Move between panes
`Ctrl+b z`	Toggle full-screen on a pane

Configuration & Customization

Dotfiles

Hidden files meant to manage configuration for the shell, editors, or other tools. Ex: .zshrc, .bashrc, .profile.

.profile is loaded once per login, e.g. when we SSH in or log in on a console, not on every new terminal. Contains env variables, PATH, etc., which child processes inherit.

.bashrc is loaded every time a new interactive shell starts, e.g. when a new terminal opens, because customizations like aliases, prompts and shell behaviors are not inherited by child shells.

We usually put everything in .bashrc and then source it from .profile itself to maintain consistency.
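The "source it from .profile" part is only a few lines. Here is a sketch that plays it out in a throwaway directory instead of the real home directory. DEMO_BASHRC_LOADED is a made-up marker variable so we can see that the file was actually read.

```shell
# Pretend home directory with a tiny .bashrc in it.
demo_home=$(mktemp -d)
cat > "$demo_home/.bashrc" <<'EOF'
alias ll='ls -alF'
DEMO_BASHRC_LOADED=yes
EOF

# This is the part that would live in ~/.profile: if a .bashrc exists,
# read it into the current shell with the dot (source) command.
if [ -f "$demo_home/.bashrc" ]; then
    . "$demo_home/.bashrc"
fi

echo "bashrc loaded: $DEMO_BASHRC_LOADED"
rm -rf "$demo_home"
```

In a real ~/.profile the path would be $HOME/.bashrc. Since .profile can also be read by shells other than Bash, it is common to wrap this in a check that $BASH_VERSION is set.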

Starship Prompt

Starship is a prompt that works across shells, so the same config can be carried to every machine and VM.

curl -sS https://starship.rs/install.sh | sh
echo 'eval "$(starship init bash)"' >> ~/.bashrc
source ~/.bashrc

Starship reads ~/.config/starship.toml, but it doesn't get created automatically:

mkdir -p ~/.config
vi ~/.config/starship.toml

Minimal configuration:

command_timeout = 1000
"$schema" = 'https://starship.rs/config-schema.json'
add_newline = true

[character]
success_symbol = '[➜](bold green)'

[package]
disabled = true

Explore more at: https://starship.rs/config/

Vim Configuration

cp /etc/vim/vimrc ~/.vimrc
# uncomment what's relevant

Tmux Configuration

vi ~/.tmux.conf

# Start window numbering at 1 (not 0)
set -g base-index 1
setw -g pane-base-index 1

# Enable mouse support
set -g mouse on

# Increase history limit
set -g history-limit 10000

# 256 color support
set -g default-terminal "tmux-256color"
set-option -sa terminal-overrides ',xterm-256color:RGB'

Book to read next: Unix and Linux System Administration Handbook
