Linux Commands Every Sysadmin Must Know

Linux Commands

Nowadays there are countless ways to manage a Linux system, and it's tempting for sysadmins to sit back and rely on managed environments to carry the workload. But command line tools and packages still help developers find problems on their Linux-powered machines, and they often make it much simpler to organize a workflow. Linux commands provide the essential information you need to optimize applications and troubleshoot problems.

Whatever your experience level with managing a Linux system, we think the following commands will help you understand your system better. Many of them are also useful for troubleshooting – for example, working out why your application refuses to run on a remote host but works just fine locally. They apply everywhere from a development environment to virtualized Linux environments, Linux containers, and bare metal machines. Let's get started.

curl

Do you want to test an application's endpoint, or check its connectivity to an upstream service? curl is your friend: it transfers data to or from a URL. Use it to determine whether your app can contact another service (a database, for example) or to check whether a service is healthy.

Here is an example: if your app returns an HTTP 500 error saying its MongoDB database cannot be reached, you can use the following command:

$ curl -I -s superapp:5000
HTTP/1.0 500 INTERNAL SERVER ERROR

Here, -I fetches only the response headers while -s silences the progress output. Next, check the database endpoint from your own PC:

$ curl -I -s database:27017
HTTP/1.0 200 OK

The response from your PC looks fine, so next try to reach the database from the application host itself – remember, your application connects using the database's hostname, so that is what you should test there. If that request fails, check whether the host can reach an outside endpoint at all:

$ curl -I -s https://superapphost.com
HTTP/1.1 200 OK

Comparing these results shows that the application host has outbound connectivity but still cannot find the database: either the database address is unreachable, or the host (whether it is a virtual machine or a container) does not have a nameserver that can resolve the database hostname.

ls

Want a list of the files in a directory? Use the ls command. It's one of the most frequently used commands by both developers and sysadmins. If you're working with containers, it's a good way to see what files and directories exist in your container image. Another really useful aspect of ls is that it shows you the permissions on files.

In the example below you will see why you are unable to run superapp – it's a permissions problem. Check the permissions with ls -l: the results show no "x" in the sequence "-rw-r--r--", which means the file is readable and writable but not executable.

$ ./superapp
bash: ./superapp: Permission denied
$ ls -l superapp
-rw-r--r--. 1 root root 33 Jul 21 18:36 superapp

tail

Using the tail command shows you the very last part of a file. It is a useful command because, in most cases when you need to examine a log file, you only need to see the final part – not the entire file. Using this command you can check just the recent requests made to the application.

One example is using tail to see what has happened in your Apache web server logs. Use -f and you will see the requests to your Apache server as they happen.
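
For example, assuming your Apache access log lives at /var/log/httpd/access_log (as in the -n example further below), following it looks like this:

$ tail -f /var/log/httpd/access_log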

The -f option tells tail to "follow": log lines are printed to your screen as they are written to the log. In our example, a script running in the background tries to access an endpoint every few seconds, and the log records each of those requests. Another useful option is -n, which shows the number of lines you specify – the command below prints the last 100 lines of the file.

$ tail -n 100 /var/log/httpd/access_log

cat

The command you need to concatenate and print files is cat. Use it to check your dependencies file, or to double-check the version of an app you have been building on your local machine.

$ cat requirements.txt
flask
flask_pymongo

In our example we use cat to check whether the Python app's requirements file includes a specific dependency – Flask, in this instance.

env

env is the command to use if you want to set or print environment variables. Checking for incorrect environment variables can help you find issues with applications. Below, we use env to inspect the variables on the machine hosting the app:

$ env
PYTHON_PIP_VERSION=9.0.1
HOME=/root
DB_NAME=test
PATH=/usr/local/bin:/usr/local/sbin
LANG=C.UTF-8
PYTHON_VERSION=3.4.6
PWD=/
DB_URI=mongodb://database:27017/test
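
If you only care about one variable, you can pipe env into grep (covered next) – here we assume the DB_URI variable from the output above:

$ env | grep DB_URI
DB_URI=mongodb://database:27017/test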

grep

If you're searching for patterns in files, grep is your friend. You can also use it to find a pattern in the output of another command – grep highlights the matching lines for you. Use it when you want to find something in a log file or track down a specific process.

For example, if you’re trying to figure out whether Apache Tomcat has successfully started up you could find that there are too many lines to go through. However, a grep command can help you find the right line – simply pipe the output to grep and it will show you the line that says the server has started.

$ cat tomcat.log | grep org.apache.catalina.startup.Catalina.start
15-Jan-2020 14:08:48.543 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 516 ms

netstat

You can display the network status using netstat. It includes an outline of the ports currently being used as well as inbound connections. Note that netstat is not included in Linux by default; install the net-tools package to get it.

How can it be used? Say a developer tests an application on a local machine and then deploys it to the host machine, only to get an error saying the port is already being used by another application, or that a specific address is already in use. netstat shows you which process is holding the port.
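
As a hedged example, a common netstat invocation lists all listening TCP and UDP ports together with the owning process (you typically need root for the process names to appear):

$ netstat -tulpn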

ip address

ip address is the command to use when you want to see the interfaces and IP addresses on a specific host. Use it to check the IP address of your container or of the host itself. And where a host is connected to two networks, ip address will tell you which network is attached to which interface.
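
For instance, you can list every interface with its addresses, or just a single one (eth0 here is only an assumed interface name – substitute your own):

$ ip address show
$ ip address show eth0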

Note that if you can't find ip address on your host, you need to install the iproute2 package to get it.

chmod

Sometimes when you run an app binary for the first time you can get an error which states “permission denied”. With ls, of course, you can check the permission in place for a specific application’s binary:

$ ls -l
total 4
-rw-rw-r--. 1 user user 34 Jan 15 12:17 testx.sh

Our example shows that you do not have execute rights on this binary – there is no "x" in the permission string. chmod is the command to change those permissions so the current user can execute the binary.

$ chmod +x testx.sh
[user@localhost ~]$ ls -l
total 4
-rwxrwxr-x. 1 user user 34 Jan 15 12:17 testx.sh

This example shows how the permissions are now updated with the right execution privileges. When you now try to run your binary you won’t see the same permission error. Note that you can also use chmod where you have loaded a binary into your container – chmod can make sure that the container has the right permissions.

id

You can use the id command to determine the identity of the user running a specific app. In our example below, Vagrant is used to test apps and to isolate the development environment.

Once you log in to the Vagrant box you will hit a problem installing Apache: you're told that you need to be root to do it. Use id to see which user and group you are running as – in our example, you are the user "vagrant", which belongs only to the "vagrant" group.

$ yum -y install httpd
Loaded plugins: fastestmirror
You need to be root to perform this command.
$ id
uid=1000(vagrant) gid=1000(vagrant) groups=1000(vagrant) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

By the way, to fix this particular problem, you will need to execute the command as a superuser – this will give you sufficient privileges.
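
On a typical Vagrant box the vagrant user has sudo access, so a sketch of the fix looks like this:

$ sudo yum -y install httpd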

dig and nslookup

DNS – the Domain Name System – is what links a hostname to the IP address of the server running an application. Sometimes a hostname can't be resolved, for whatever reason, and what looks like a connectivity problem with the application turns out to be a name resolution problem.

Say that, from your host, you want to access a database at the hostname "superdatabase" – but you get an error saying "cannot resolve". Both dig and nslookup are great tools for finding the real problem. dig is a flexible DNS lookup utility, while nslookup queries internet name servers directly.

$ nslookup superdatabase
Server:   10.0.2.3
Address:  10.0.2.3#53
** server can't find superdatabase: NXDOMAIN

So, with nslookup you can see that the name "superdatabase" cannot be resolved. dig gives you a similar result:

$ dig superdatabase
; <<>> DiG 9.9.4-RedHat-9.9.4-50.el7_3.1 <<>> superdatabase
;; global options: +cmd
;; connection timed out; no servers could be reached

It can be tricky to pin down the root cause of errors like this, because many different problems produce them. Your sysadmin can often help, since these tend to be networking issues. If the problem is on a local test server, it may simply mean that the nameservers on your test host are not configured correctly.

sestatus

When using an application host that is enterprise-grade or managed by a large company you might encounter a Linux security module called SELinux. The goal of SELinux is to make processes run on a host with the absolute minimum of security privileges. As a result, a malicious process cannot do much damage, because by default it does not have the privileges it would need.

However, it can happen that an app legitimately needs to access a file and gets an error when it tries. You can check whether SELinux is the cause by using grep and tail on the logs in /var/log/audit and looking for messages that say "denied". Another option is to check whether SELinux is enabled at all – you do this with sestatus:

$ sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28

Our example shows a case where the Linux instance has SELinux configured and enabled. If it is your own host you manage locally you can choose to reconfigure SELinux to be less restrictive. On the other hand, if the Linux host is not under your control your sysadmin can help you change the configuration so your app has the required permissions.
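
To run the audit log check mentioned above, a hedged sketch is the following – the audit log normally lives at /var/log/audit/audit.log:

$ grep denied /var/log/audit/audit.log | tail -n 20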

du

For insight into exactly which files use the most disk space, use the du command to show the space used within a specific directory. In the example below we find out which log uses the most space in /var/log: the -h parameter gives human-readable sizes, while -s shows a summarized total for each entry.

$ du -sh /var/log/*
358K  /var/log/audit
5.0K  /var/log/boot.log
0 /var/log/chrony
4.0K  /var/log/cron
4.0K  /var/log/maillog
48K /var/log/messages

You can see from the above output that the biggest directory is in fact /var/log/audit. Using df and du together will help you figure out what is using the most disk space on your host.
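
For that broader view, df reports usage per mounted filesystem and du then drills into a suspect directory – for example:

$ df -h
$ du -sh /var/log/*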

Understanding Basic Linux Commands Can Help You

To wrap up, the basic tools outlined above are simple commands that can help you find out why one of your apps is not working as expected – and why an application runs fine in one development environment but refuses to cooperate in another.

Sysadmins make use of these tools to debug system and application issues. If you understand these commands you’ll have a head start – you can fix some problems yourself, you can help your sysadmin find a problem and at the very least you can grow to understand what sysadmins are trying to tell you while they try to fix your application.

MariaDB Performance Tuning

MariaDB performance

The performance of MariaDB is something that a multitude of users are now interested in improving. Since it emerged as a fork of MySQL, it has seen rapidly accelerating uptake in the open-source database community. Starting life as a drop-in replacement, MariaDB has begun to distinguish itself from MySQL, particularly since MariaDB 10.2 was released.

Even so, there are still few fundamental differences between MariaDB and MySQL, as both use mutually compatible storage engines. It should come as no surprise, then, that MariaDB performance tuning largely mirrors that of MySQL. Let's look at boosting MariaDB performance on systems running in a Linux environment.

MariaDB – Optimizing System and Hardware

You can increase the performance of MariaDB by improving your hardware. Here’s the right order to do it in:

Memory

If you want to adjust Server System Variables to give you larger key and table caches, then memory (and lots of it) is what you need. More memory means less disk caching, which is the considerably slower option.

But do bear in mind that just adding in extra memory might not give you the spectacular MariaDB performance improvements that you were hoping for if you don’t set the server variables correctly to take advantage of it.

Also, remember that filling every RAM slot on the motherboard can force the memory bus to run at a lower frequency and adds latency between the CPU and the RAM. It's better to use the biggest RAM sticks you can find in fewer slots instead.

Disks

Disks have always been a system bottleneck in comparison to RAM because they're just not as fast. But that doesn't mean you can't improve their speed. The most important figure to be aware of is disk seek time (how quickly the read head can move to reach the data), so choose disks with the lowest seek times you can find, and think about using dedicated disks for transaction logs and temporary files, too.

Fast Ethernet

In tandem with your internet bandwidth, fast ethernet lowers your client request response times, and your replication response times for reading binary logs across slaves. Lower response times are particularly significant with Galera-based clusters.

CPU

It’s hard to conceive of a situation where getting a faster processor wouldn’t be a good thing, as tearing through computations more quickly means more data is delivered to the client sooner. But while processor speed is very important, so are bus speed, cache size, and core count.


Setting Your Disk I/O Scheduler for MariaDB Performance

An I/O scheduler optimizes disk access by grouping together requests that target nearby areas of the disk. This speeds things up because the drive head has to travel less to find what it's looking for, so expect fewer disk operations and a drop in response time. The recommended schedulers for MariaDB are noop and deadline.

noop lets you check whether the complex scheduling decisions of other schedulers are responsible for I/O regressions. It can be useful for devices that do their own I/O scheduling, such as intelligent storage, or for devices without mechanical movement, like SSDs. Usually DEADLINE is the more appropriate choice for these devices, but because NOOP has less overhead it may deliver better MariaDB performance with particular workloads.

DEADLINE is a latency-oriented I/O scheduler. Each I/O request is assigned a deadline, and requests are stored in read and write queues sorted by sector number. The algorithm also maintains two extra queues (read and write) sorted by deadline. As long as no requests have timed out, the "sector" queues are used. If timeouts occur, requests from the "deadline" queues are served until the expired requests have been processed. The algorithm generally favors reads over writes.

PCIe devices (NVMe SSD drives) have their own deep internal queues and serve requests quickly, so they don't benefit from an I/O scheduler. For these, we recommend leaving out any explicit scheduler configuration.

You can check your scheduler setting with:

cat /sys/block/${DEVICE}/queue/scheduler

The output will look something like this:

cat /sys/block/sda/queue/scheduler

[noop] deadline cfq

To make the setting permanent, edit the /etc/default/grub configuration file. Find the GRUB_CMDLINE_LINUX variable and add the "elevator" option, like this:

GRUB_CMDLINE_LINUX="elevator=noop"
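
After editing the file you usually need to regenerate the GRUB configuration for the change to take effect at the next boot. The command depends on your distribution – for example, on Debian/Ubuntu:

sudo update-grub

or on RHEL/CentOS:

sudo grub2-mkconfig -o /boot/grub2/grub.cfg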

Increase Open Files Limit

For optimum MariaDB server performance, you need to keep the total number of client connections, database files, and log files below the operating system's maximum file descriptor limit (ulimit -n). By default, Linux limits the number of file descriptors that any single process can open to 1,024. On active database servers (and production ones in particular) it's easy to hit that limit.
To create more headroom, edit /etc/security/limits.conf and add or specify this:

mysql soft nofile 65535

mysql hard nofile 65535

You'll need to restart the system, then confirm the new limits by running the following:

$ ulimit -Sn

65535

$ ulimit -Hn

65535

Alternatively, you can set this in the mysqld_safe section if you start the mysqld process through mysqld_safe:

[mysqld_safe]

open_files_limit=4294967295

or if you are using systemd,

sudo tee /etc/systemd/system/mariadb.service.d/limitnofile.conf <<EOF
[Service]

LimitNOFILE=infinity

EOF

sudo systemctl daemon-reload

Setting Swappiness on Linux for MariaDB

Linux swap matters for MariaDB performance because it acts like a spare tire: if memory leaks or memory pressure slow the machine down, swap usually lets it limp through its assigned job rather than fail outright. You still want the database to touch swap as little as possible.
To change swappiness, just run:

sysctl -w vm.swappiness=1

This happens dynamically, with no need to reboot the server. To make it persistent, edit /etc/sysctl.conf and add the line,

vm.swappiness=1

Setting swappiness=0 used to be common, but kernel behavior changed in releases newer than 2.6.32-303, where a value of 0 makes the system far more likely to invoke the OOM killer under memory pressure, so vm.swappiness=1 is the safer choice.

Filesystem Optimizations for MariaDB

Ext4 and XFS are the file systems most commonly used in Linux environments running MariaDB. Certain setups can also implement an architecture using ZFS or BTRFS (as referenced in MariaDB's documentation).
In addition, most database setups don't need to record file access times, so you can turn that off by adding the noatime option when the volume is mounted. Edit /etc/fstab; for a volume named /dev/md2 the entry will look like this:

/dev/md2 / ext4 defaults,noatime 0 0

Set Your max_allowed_packet

MariaDB handles packets much like MySQL does. Data is split into packets, and the client must be aware of the max_allowed_packet value, which defines the maximum size of a packet that can be sent. The server stores the packet body in a buffer whose maximum size corresponds to that value. Exceed the limit and the socket is closed, which hurts MariaDB performance.

If it's set too low, an error is triggered and the client connection is stopped and closed. That's why you may see errors like ER_NET_PACKET_TOO_LARGE, or find that your connection to the MySQL server was interrupted during the query. So what's an ideal setting? For demanding workloads where large packets need to be processed, we suggest starting out with 512MiB. If the application only has low demand, go with the default value (16MiB since MariaDB 10.2.4) and only raise it (per session) when your data needs grow. Note that if the max_allowed_packet value is too small on a slave, it can also cause the slave to stop the I/O thread.
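
As a hedged sketch, you could set the starting value in the MariaDB server configuration file (commonly /etc/my.cnf or a file under /etc/my.cnf.d/ – the location varies by distribution):

[mysqld]

max_allowed_packet=512M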

Using Threadpool

Using a thread pool isn't right for every workload. Thread pools work best when queries are fairly short and the load is CPU bound, as in OLTP workloads. If the load isn't CPU bound, you may still want to cap the number of threads to conserve memory for the database buffers.

Threadpool is ideally suited to situations where you are looking for ways to reduce context switching on your system and sustain fewer threads than there are clients. But this number also shouldn’t be too low, because we don’t want to limit the use of the available CPUs. Consequently, the ideal for boosting MariaDB performance would be to have one active thread for each CPU.

Setting thread_pool_max_threads and thread_pool_min_threads for the maximum and minimum thread counts is something you can't do in MySQL; it's unique to MariaDB.

The variable thread_handling sets how the server takes care of threads for client connections. As well as threads for client connections, it also applies to some internal server threads, including Galera slave threads.
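
A minimal configuration sketch for enabling the thread pool follows – the numbers are illustrative only and should be sized to your own CPU count and workload:

[mysqld]

thread_handling=pool-of-threads

thread_pool_size=8

thread_pool_max_threads=2000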

Tuning Your Table Cache + max_connections

If you sometimes see events in the process list with Opening tables and Closing tables statuses, it may mean your table cache needs to be increased. You can monitor this from the mysql client prompt by running SHOW GLOBAL STATUS LIKE 'Open%table%'; and watching the status variables. A configuration sketch pulling the settings below together follows this list.
For max_connections, if your application has need of multiple concurrent connections, set this to 500 to start with.
For table_open_cache, go with your total number of tables plus extra ones to account for the temporary tables that may need to be cached as well. So, if you’ve got 100 tables, it makes sense to specify 300.
Set your table_open_cache_instances variable to 8. This can reduce contention among sessions and so improve scalability. You can partition the open tables cache, dividing them into several smaller cache instances using table_open_cache / table_open_cache_instances as your guide.
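
Putting those suggestions together, a hedged configuration sketch for a server with roughly 100 tables might look like this:

[mysqld]

max_connections=500

table_open_cache=300

table_open_cache_instances=8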

For InnoDB, table_definition_cache places a soft limit on the total number of open table instances in the cache of the InnoDB data dictionary. The value that you set will determine how many table definitions can be stored in the definition cache. If you use multiple tables, you can create a sizeable table definition cache to make tables open faster. The table definition cache uses less space and doesn’t use file descriptors, in contrast to the normal table cache. The lowest possible value is 400. The default value is derived from the formula below, and is limited to 2000:

MIN(400 + table_open_cache / 2, 2000)

If the number of open tables is greater than the table_definition_cache value, the LRU mechanism starts earmarking table instances for removal and eventually evicts them from the data dictionary cache. A cache limit like this helps avoid situations where rarely used, memory-hogging tables take up space unnecessarily, and so is key to improving MariaDB performance. Note that there can be more table instances with cached metadata than the table_definition_cache limit allows, because parent and child table instances with foreign key relationships are not placed on the LRU list and so are not liable for eviction from memory. In contrast to the table_open_cache, the table_definition_cache doesn't use file descriptors and is a lot smaller.

Dealing with Query Cache

We think that disabling the query cache is the preferred option for improving MariaDB performance. Make sure that query_cache_type=OFF and query_cache_size=0 so that the query cache is completely disabled. In contrast to MySQL, MariaDB still supports the query cache and doesn't plan to withdraw support for it anytime soon. Some people believe the query cache gives them performance benefits, but as Percona has demonstrated, an enabled query cache adds overhead and reduces server performance.
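
To disable it explicitly in the server configuration, the relevant lines are:

[mysqld]

query_cache_type=OFF

query_cache_size=0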

If you do want to use the query cache, make sure you monitor it by running SHOW GLOBAL STATUS LIKE 'Qcache%';. Qcache_inserts reports how many queries have been added to the query cache, Qcache_hits shows how many have made use of it, and Qcache_lowmem_prunes counts the queries dropped because of insufficient memory. Over time the query cache may become fragmented; a high ratio of Qcache_free_blocks to Qcache_total_blocks can point to fragmentation. To defragment it, run FLUSH QUERY CACHE. This defragments the query cache without dropping any queries.

Always Keep an Eye on Your Servers

Monitoring your MariaDB nodes is critical for maintaining optimum MariaDB performance. If you prefer free and open-source tools, popular options such as Nagios, Zabbix, or PMM are well liked; for an enterprise-level tool, we would point you towards ClusterControl. It not only offers monitoring, it also gives you performance advisories, alarms, and alerts when it thinks you could improve your MariaDB performance.

Conclusion

Improving the performance of MariaDB requires much the same approach as you'd take with MySQL, apart from a few version-specific differences. MariaDB has gone its own way and in doing so has established itself as a trustworthy option in the community, so tuning and optimizing for better MariaDB performance is something more and more people will be interested in.

Linux System Administration – Getting Started

Linux System Administration

If you’re new to Linux system administration this guide offers you some useful tips and an overview of some of the common issues that may cross your path. Whether you’re a relative newcomer or a Linux administration stalwart, we hope that this collection of Linux commands will prove useful.

Basic Configuration

One of your first tasks in the administration of Linux is configuring the system, but it’s a process that often throws up a few hurdles. That’s why we’ve collected some tips to help you ‘jump’ over them. Let’s go through it:

Set the Hostname

Use these commands to set the hostname correctly:

hostname

hostname -f

The first should show your short hostname, while the second should show your FQDN (fully qualified domain name).
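
If either value is wrong, on systemd-based distributions you can usually set the hostname with hostnamectl (the name shown here is only a placeholder – substitute your own):

hostnamectl set-hostname myhostname.example.com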

Setting the Time Zone

In Linux administration, setting your server's time zone to the one most of your users share is something they'll no doubt appreciate. But if they're scattered across continents, it's better to play it safe and go for UTC – Coordinated Universal Time, which is effectively the same as GMT (Greenwich Mean Time).

Operating systems all have their own ways of letting you switch time zones:

Setting the Time Zone in Ubuntu or Debian

Type this next command and answer the questions that pop up when prompted:

dpkg-reconfigure tzdata

Setting the Time Zone in Arch Linux or CentOS 7

  1. See the list of time zones that are available:

     timedatectl list-timezones

Use the Up, Down, Page Up and Page Down keys to select the one you’re after, then either copy it or write it down. Hit q to exit.

  2. Set the time zone (change Europe/London to the correct zone):

     timedatectl set-timezone 'Europe/London'

Manually set the Time Zone – Linux System Administration

Locate the correct zone file in /usr/share/zoneinfo/ and link it to /etc/localtime. Here are some examples:

Universal Coordinated Time:

ln -sf /usr/share/zoneinfo/UTC /etc/localtime

Eastern Standard Time:

ln -sf /usr/share/zoneinfo/EST /etc/localtime

American Central Time (including Daylight Savings Time):

ln -sf /usr/share/zoneinfo/US/Central /etc/localtime

American Eastern Time (including Daylight Savings Time):

ln -sf /usr/share/zoneinfo/US/Eastern /etc/localtime

Configure the /etc/hosts File

In Linux system administration the /etc/hosts file holds a list of IP addresses and their matching hostnames. This lets you set hostnames for an IP address in one place on the local machine, and then have many applications refer to outside resources by hostname. The hosts file mechanism predates DNS, and hosts file entries are always checked before a DNS query is made. This means /etc/hosts can help you maintain small "internal" networks, which you might want for development or for managing clusters.

It’s a requirement of some applications that the machine identifies itself properly in the /etc/hosts file. Because of this, we strongly suggest you configure the /etc/hosts file not long after deployment.

127.0.0.1   localhost.localdomain   localhost

103.0.113.11    username.example.com   username

Each line starts with a single IP address followed by one or more hostnames separated by spaces. In the example above, swap out 103.0.113.11 for the IP address of your machine. Consider some extra /etc/hosts entries:

198.51.100.20   example.com

192.168.1.1     stick.example.com

Here, every request for the example.com domain or hostname is going to resolve to the IP address 198.51.100.20, which circumvents the DNS records for example.com and returns an alternative website.

The second line tells the system to look to 192.168.1.1 for the domain stick.example.com. These types of host entries make administration of Linux easier – they are helpful for using "back channel" or "private" networks to reach other servers in a cluster without routing traffic over the public network.

Network Diagnostics

Now let's look at some simple Linux commands that are useful for assessing and diagnosing network problems. If you think you might be having connection problems, you can add the output from the appropriate commands to your support ticket; this will help staff resolve your issue. It's especially helpful if your network problems only happen intermittently.

The ping Command

The ping command lets you test the quality of the connection between the local machine and an external machine or address. These commands “ping” google.com and 215.48.207.120:

ping google.com

ping 215.48.207.120

They send an ICMP packet – a small amount of data – to the remote host, then await a response. If the system can make a connection, it reports the "round trip time" for each packet. Here's what that looks like for a few pings to google.com:

PING google.com (216.58.217.110): 56 data bytes

64 bytes from 216.58.217.110: icmp_seq=0 ttl=54 time=17.721 ms

64 bytes from 216.58.217.110: icmp_seq=1 ttl=54 time=15.374 ms

64 bytes from 216.58.217.110: icmp_seq=2 ttl=54 time=15.538 ms

The time field tells you how long each individual packet took to complete the round trip in milliseconds. In Linux Administration, when you’ve got all the information you want, you can interrupt the process using Control+C. It will then show you some statistics that look like this:

--- google.com ping statistics ---

4 packets transmitted, 4 received, 0% packet loss, time 3007ms

rtt min/avg/max/mdev = 34.880/41.243/52.180/7.479 ms

These are the ones you should take note of:

  • Packet loss: the difference between how many packets were sent and how many came back, expressed as a percentage.
  • Round trip time (rtt) summarizes the ping responses. "min" is the fastest round trip, here 34.88 milliseconds. "avg" is the average round trip, 41.243 milliseconds. "max" is the longest a packet took, 52.18 milliseconds. "mdev" is one standard deviation unit, which for these four packets was 7.479 milliseconds.

In your administration of Linux, the ping command is useful for giving you a rough measure of point-to-point network latency, and if you want to establish that you definitely are connected to a remote server then this is the tool that can tell you.

The traceroute Command

The traceroute command tells you a bit more than ping. It traces a packet's journey from the local machine to the remote machine and reports each hop (each intermediate server) along the way. This is useful when you're investigating a network issue, because packet loss in the first few hops suggests the problem lies with the user's Internet service provider (ISP) or local area network (LAN) rather than with your own systems. Packet loss near the end of the route, on the other hand, can indicate a problem with the server's connection.

This is what the output of a command like traceroute google.com typically looks like:

traceroute to google.com (74.125.53.100), 30 hops max, 40 byte packets

1 207.192.75.2 (207.192.75.2) 0.414 ms 0.428 ms 0.509 ms

2 vlan804.tbr2.mmu.nac.net (209.123.10.13) 0.287 ms 0.324 ms 0.397 ms

3 0.e1-1.tbr2.tl9.nac.net (209.123.10.78) 1.331 ms 1.402 ms 1.477 ms

4 core1-0-2-0.lga.net.google.com (198.32.160.130) 1.514 ms 1.497 ms 1.519 ms

5 209.85.255.68 (209.85.255.68) 1.702 ms 72.14.238.232 (72.14.238.232) 1.731 ms 21.031 ms

6 209.85.251.233 (209.85.251.233) 26.111 ms 216.239.46.14 (216.239.46.14) 23.582 ms 23.468 ms

7 216.239.43.80 (216.239.43.80) 123.668 ms 209.85.249.19 (209.85.249.19) 47.228 ms 47.250 ms

8 209.85.241.211 (209.85.241.211) 76.733 ms 216.239.43.80 (216.239.43.80) 73.582 ms 73.570 ms

9 209.85.250.144 (209.85.250.144) 86.025 ms 86.151 ms 86.136 ms

10 64.233.174.131 (64.233.174.131) 80.877 ms 216.239.48.34 (216.239.48.34) 76.212 ms 64.233.174.131 (64.233.174.131) 80.884 ms

The hostnames and IP addresses either side of a failed hop can help you determine whose machine is involved in the routing error. Lines of three asterisks (* * *) indicate failed hops.

If you’re trying to fix network issues or someone like your ISP is looking into it for you then traceroute output can help track down the problem, and recording traceroute information can really help when the issue only happens infrequently.

The mtr Command

As with the traceroute tool, the mtr command is important in Linux System Administration. It can tell you about the route that internet traffic takes between the local system and a remote host. However, mtr also gives you extra information about the round-trip time for the packet, too. Think of mtr as a bit like a mixture of traceroute and ping.

An output from an mtr command might look like this:

HOST: username.example.com              Loss%   Snt     Last    Avg     Best    Wrst    StDev

  1. 256.129.75.4                    0.0%    10      0.4     0.4     0.3     0.6     0.1
  2. vlan804.tbr2.mmu.nac.net        0.0%    10      0.3     0.4     0.3     0.7     0.1
  3. 0.e1-1.tbr2.tl9.nac.net         0.0%    10      4.3     4.4     1.3     11.4    4.1
  4. core1-0-2-0.lga.net.google.com  0.0%    10      64.9    11.7    1.5     64.9    21.2
  5. 209.85.255.68                   0.0%    10      1.7     4.5     1.7     29.3    8.7
  6. 209.85.251.9                    0.0%    10      23.1    35.9    22.6    95.2    27.6
  7. 72.14.239.127                   0.0%    10      24.2    24.8    23.7    26.1    1.0
  8. 209.85.255.190                  0.0%    10      27.0    27.3    23.9    37.9    4.2
  9. gw-in-f100.1e100.net            0.0%    10      24.1    24.4    24.0    26.5    0.7

As with the ping command, mtr is great for Linux administration: it reports connection quality in real time. Use Control+C to stop it manually, or use the --report flag to make it stop automatically after 10 packets and produce a report, like this:

mtr --report google.com

Don’t be surprised when it pauses while it’s producing the output. This is perfectly normal.

Linux System Diagnostics

If you’re having trouble with your system and it’s not related to networking or some other application problem, it might be useful to rule out hardware and issues at the operating system level. These tools can help you diagnose and fix such problems.

If you discover a problem with memory usage, you can use these tools and methods to find out exactly what’s causing it.

Check Level of Current Memory Use

Use this command:

free -m

Possible output should look like this:

            total       used       free     shared    buffers     cached

Mem:          1997        898       1104        105         34        699

-/+ buffers/cache:        216       1782

Swap:          255          0        255

Output like this needs some close reading. It says the system is using 898 megabytes of memory (RAM) out of a total of 1997 megabytes, and that 1104 megabytes are free. There is also 699 megabytes of stale data buffered and held in the cache. The operating system will empty its caches if more space is required, but holds onto cached data as long as no other process needs the memory. A Linux system will usually leave old data sitting in RAM until it's needed for something else, so don't worry if it looks like there is very little free memory.

In the example above, the "-/+ buffers/cache" line shows 1782MB available once the cache is discounted, which is what any additional processes will have to work with.

Use vmstat to Monitor I/O Usage

The vmstat tool tells you about memory, swap utilization, I/O wait, and system activity. It’s especially good for the diagnosis of I/O-type difficulties. Here’s an example:

vmstat 1 20

This runs a vmstat every second for twenty seconds, so it will pick up a sample of the current system state. Here’s how the output will typically look:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----

 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa

 0  0      4  32652  47888 110824    0    0 0     2   15   15  0  0 100  0

 0  0      4  32644  47888 110896    0    0 0     4  106  123  0  0 100  0

 0  0      4  32644  47888 110912    0    0 0     0   70  112  0  0 100  0

 0  0      4  32644  47888 110912    0    0 0     0   92  121  0  0 100  0

 0  0      4  32644  47888 110912    0    0 0    36   97  136  0  0 100  0

 0  0      4  32644  47888 110912    0    0 0     0   96  119  0  0 100  0

 0  0      4  32892  47888 110912    0    0 0     4   96  125  0  0 100  0

 0  0      4  32892  47888 110912    0    0 0     0   70  105  0  0 100  0

 0  0      4  32892  47888 110912    0    0 0     0   97  119  0  0 100  0

 0  0      4  32892  47888 110912    0    0 0    32   95  135  0  0 100  0


The memory and swap columns give you the same kind of information as the “free -m” command, although in a format that’s a little more difficult to comprehend. The last column in most installations provides the most relevant information—the wa column. It shows how long the CPU spends idling while it waits for I/O operations to be completed.

If the number there is frequently well above 0, it points to an I/O usage issue; if your output looks like the example above, with wa sitting at 0, I/O wait is not your problem.

Intermittent issues are common in Linux administration, so run vmstat while the problem is happening to diagnose it correctly, or at least rule out an I/O issue. Any support staff helping you will welcome vmstat output when diagnosing problems.

Monitor Processes, Memory, and CPU Usage with htop

You can get a more ordered view of your system’s state in real time by using htop. You’ll have to add it to most systems yourself, and, depending on your distribution, you’ll use one of these commands to do so:

apt-get install htop

yum install htop

pacman -S htop

emerge sys-process/htop

To start it, type:

htop

Press the F10 or Q keys at any time when you want to quit. Some htop behaviors may seem hard to fathom to start with, so be aware of the following:

  • The memory utilization graph shows cached memory, used memory and buffered memory, while the numbers displayed at the end of it indicate the total amount that’s available and the total amount installed as reported by the kernel.
  • The htop default configuration shows all application threads as separate processes, which might not be obvious if you weren’t aware of it. If you prefer to disable this then select the “setup” option with F2, then “Display Options,” and then toggle “Hide userland threads”.

The F5 key lets you toggle a “Tree” view that arranges the processes in a hierarchy. This is handy because it lets you see which processes were spawned by other processes and it shows it in an organized way. This can help you diagnose an issue when it’s hard to tell one process from another.

File System Management

The FTP protocol has long been used by web developers and editors to manage and transfer files on a remote system. The problem with FTP is that it's insecure, and it isn't an efficient way of managing the files on your system when you already have SSH access.

If you're new to Linux systems administration, you might want to use an SSH-based client such as WinSCP instead, or use rsync to synchronize files over SSH from the terminal.

Uploading Files to a Remote Server

If you have used an FTP client before, OpenSSH's file transfer feature will feel similar – it runs over the SSH protocol and is dubbed "SFTP." Numerous clients support this protocol, including WinSCP for Windows, Cyberduck for Mac OS X, and Filezilla for Linux, OS X, and Windows desktops.

If you’re familiar with FTP, then you’ll be comfortable with SFTP. If you’ve got access to a file system at the command line then you’ll automatically have the same access over SFTP, so bear this in mind when you set up user access.

You can also use Unix utilities such as scp and rsync to securely transfer your files. A command to copy team-info.tar.gz from a local machine to a remote server would look like this:

scp team-info.tar.gz username@hostname.example.com:/home/username/backups/

After the scp command comes the path of the file on the local file system that you want to transfer, followed by the username and hostname of the remote machine separated by an “@” symbol. Use a colon (:) after the hostname and then put the path on the remote server where the file will be uploaded to. Here’s a less specific example:

scp [/path/to/local/file] [remote-username]@[remote-hostname]:[/path/to/remote/file]

OS X and Linux machines make this command available by default. It’s useful for copying files between remote servers in Linux Administration. If you use SSH keys, you can use the scp command without needing a password for each transfer.

The syntax of scp follows the form scp [source] [destination]. To do the reverse operation and copy files from a remote host to your local machine, simply swap the source and the destination.
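
For example, a sketch of pulling the same archive back from the remote host into the current local directory (the hostname is a placeholder):

scp username@hostname.example.com:/home/username/backups/team-info.tar.gz .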

Protecting Files on a Remote Server

As someone involved with Linux administration, it's important to maintain file security when a number of users have access to your network-accessible servers.

Best practices for security include:

  • Only giving users the minimum permissions required for whatever tasks they need to complete.
  • Only running services on public interfaces that are in active use. A frequent source of security vulnerabilities comes from unused daemons that have been left running, and this holds equally true for database servers, HTTP development servers, and FTP servers, too.
  • When you can, use SSH connections to encrypt any sensitive information that you want to transfer.

Symbolic Links

Symbolic linking, often referred to as “symlinking”, lets you create objects in your file system that can point to other objects. This is useful in the Administration of Linux if you want to let users and applications access particular files and directories without having to reorganize all your folders. This approach lets users have restricted access to your web-accessible directories without moving your DocumentRoot to their home directories.

Type a command in the following format to set up a symbolic link:

ln -s /home/username/config-git/etc-hosts /etc/hosts

This creates a link of the file etc-hosts at the location of the system’s /etc/hosts file. More generically:

ln -s [/path/to/target/file] [/path/to/location/of/sym/link]

Here are some features of the link command to be aware of:

  • The location of the link, which is the last term, can be left out, and if you do that, then one with the same name as the file you’re linking to will be created in the current directory.
  • When specifying the link location, make sure that the path doesn’t have a slash at the end. You can produce a symlink that targets a directory, but make sure that it doesn’t end with a slash.
  • Removing a symbolic link does not affect the target file.
  • When you create a link, you can use relative or absolute paths.

Managing Files on a Linux System

If you’re new to handling files via the terminal interface as part of your Linux system administration role, here’s a list of basic commands to help you.

To copy files:

cp /home/username/todo.txt /home/username/archive/todo.01.txt

This copies todo.txt into an archive folder with a number appended to the file name. If you want to recursively copy every file and subdirectory in one directory into another, use -R like this:

cp -R /home/username/archive/ /srv/backup/username.01/

To move a file or directory:

mv /home/username/archive/ /srv/backup/username.02/

You can also rename a file using the mv command.

To delete a file:

rm scratch.txt

This deletes the scratch.txt file from the current directory.

Package Management

Administration of Linux is made much easier by the package management tools that come with the majority of Linux systems. These make it simple to centrally install and maintain your system’s software. Installing your software manually makes it harder to manage dependencies and keep your system up to date. Package management tools help keep you on top of the majority of such tasks, so here are some basic package management tasks for use in Linux administration.

Track Down Packages Installed on Your System

Packages are easy to install, but they often pull in multiple dependencies that are easy to lose sight of. These commands list all the packages installed on your system:

On Debian and Ubuntu systems:

dpkg -l

This example shows the first few lines of the output of this command on a production Debian Lenny system.

||/ Name                         Version                      Description

+++-============================-============================-===============================

ii  adduser                      3.110                        add and remove users and groups

ii  apache2-mpm-itk              2.2.6-02-1+lenny2            multiuser MPM for Apache 2.2

ii  apache2-utils                2.2.9-10+lenny4              utility programs for webservers

ii  apache2.2-common             2.2.9-10+lenny4              Apache HTTP Server common files

ii  apt                          0.7.20.2+lenny1              Advanced front-end for dpkg

ii  apt-utils                    0.7.20.2+lenny1              APT utility programs

ii  bash                         3.2-4                        The GNU Bourne Again SHell

On CentOS and Fedora systems:

yum list installed

This example shows a few lines of the output from this command:

MAKEDEV.i386                 3.23-1.2                  installed

SysVinit.i386                2.86-15.el5               installed

CentOS and Fedora systems show the name of the package (SysVinit), the architecture it was compiled for (i386), and the build version installed on the system (2.86-15.el5).

For Arch Linux systems:

pacman -Q

This command pulls up a complete list of the packages installed on the system. Arch also lets you filter the results so that it only shows those packages that were explicitly installed (with the -Qe option) or that were installed automatically as dependencies (with the -Qd option). The command above is actually a combination of the output of two commands:

pacman -Qe

pacman -Qd

Here’s an example of the output:

perl-www-mechanize 1.60-

perl-yaml 0.70-1

pkgconfig 0.23-1

procmail 3.22-2

python 2.6.4-1

rsync 3.0.6-1

On Gentoo Linux systems:

emerge -evp --deep world

Here’s an example of this output:

These are the packages that would be merged, in order:

Calculating dependencies... done!

   [ebuild   R   ] sys-libs/ncurses-5.6-r2  USE="unicode -debug -doc -gpm -minimal -nocxx -profile -trace" 0 kB

   [ebuild   R   ] virtual/libintl-0  0 kB

   [ebuild   R   ] sys-libs/zlib-1.2.3-r1  0 kB

Because most systems have so many packages installed, these commands can produce a lot of output, so you can use tools like grep and less to narrow the results. For example:

dpkg -l | grep "python"

This will pull up a list of all packages where the name or description features the word “python.” You can also use less in a similar way:

dpkg -l | less

This gives you the same list as the basic "dpkg -l", but the results appear in the less pager, which lets you search and scroll more easily.

Adding | grep “[string]” to these commands will let you filter package list results, or with all distributions you can add | less to show the results in a pager.

Finding Package Names and Information

The name of the package isn’t always intuitive, because it doesn’t always look like the name of the software. That’s why many package management tools exist to help you search the package database. Such tools are great for finding a particular piece of software when you don’t know its name and they make Linux Administration a lot easier.

For Debian and Ubuntu systems:

apt-cache search [package-name]

This searches the local package database for a particular term and then produces a list with descriptions. Here's some of the output for apt-cache search python:

txt2regex - A Regular Expression "wizard", all written with bash2 builtins

vim-nox - Vi IMproved - enhanced vi editor

vim-python - Vi IMproved - enhanced vi editor (transitional package)

vtk-examples - C++, Tcl and Python example programs/scripts for VTK

zope-plone3 - content management system based on zope and cmf

zorp - An advanced protocol analyzing firewall

groovy - Agile dynamic language for the Java Virtual Machine

python-django - A high-level Python Web framework

python-pygresql-dbg - PostgreSQL module for Python (debug extension)

python-samba - Python bindings that allow access to various aspects of Samba

Be aware that apt-cache search queries all the records relating to every package and not just the titles and the descriptions shown here, which is why vim-nox and groovy are included, as both mention python in their descriptions. To view the complete record on a package use:

apt-cache show [package-name]

This will tell you about the maintainer, the dependencies, the size, the upstream project’s homepage, and the software’s description.

On CentOS and Fedora systems:

yum search [package-name]

This creates a list of all the packages in the database matching the given term. Here’s what the output of yum search wget typically looks like:

Loaded plugins: fastestmirror

 Loading mirror speeds from cached hostfile

  * addons: centos.secsup.org

  * base: centos.secsup.org

  * extras: centos.secsup.org

  * updates: styx.biochem.wfubmc.edu

 ================================ Matched: wget =================================

 wget.i386 : A utility for retrieving files using the HTTP or FTP protocols.

The package management tools can tell you more about any individual package. To get the complete record from the package database, use this command:

yum info [package-name]

This output will give you more detailed information about the package, its purpose, origins and dependencies.

On Arch Linux systems:

pacman -Ss [package-name]

This will search the local package database. Here’s a snippet from the results that a search for “python” would bring up:

extra/twisted 8.2.0-1

Asynchronous networking framework written in Python.

community/emacs-python-mode 5.1.0-1

    Python mode for Emacs

The terms "extra" and "community" tell you which repository the package lives in. To get additional information about a particular package, set the command out like this:

pacman -Si [package-name]

If you run pacman with the -Si option, it will get the record for the package from the database that includes a brief description, package size and dependencies.

For Gentoo Linux systems:

emerge --search [package-name]

emerge --searchdoc [package-name]

The first command will just look for package names in the database. The second one will search for both names and descriptions. These commands will let you search your local package tree (i.e., portage) for a particular package name or term. The output of either command will look similar to the example below.

Searching...

 [ Results for search key : wget ]

 [ Applications found : 4 ]

 *  app-emacs/emacs-wget

       Latest version available: 0.5.0

       Latest version installed: [ Not Installed ]

       Size of files: 36 kB

       Homepage:      http://pop-club.hp.infoseek.co.jp/emacs/emacs-wget/

       Description:   Wget interface for Emacs

       License:       GPL-2

Since the output from the emerge --search command is already quite verbose, there isn't a separate tool to show you more information, unlike in some of the other distributions. If you want to narrow your search results further, you can use regular expressions with the emerge --search command.

In Linux administration, package searches produce a lot of text, so tools like grep and less are very useful for making the results easier to scroll through. For example:

apt-cache search python | grep "xml"

This will bring up all those packages that matched for the search term “python” and that also have “xml” somewhere in their name or description. In the same way:

apt-cache search python | less

This will give you the same list as the simple apt-cache search python but the results will be displayed in the less pager. This makes it easier to search and scroll.

If you add | grep “[string]” to these commands it will filter package search results, or you can use | less to show the results in the less pager. This works across all distributions.

Text Manipulation

On Linux and UNIX-like systems, the vast majority of system configuration information is held in plain text format, so next up are some basic Linux commands and tools for working with text files.

Search for a String in Files with grep

In Linux system administration the grep tool lets you search for a term or regex pattern within a stream of text, like a file or the output from a command.

Let’s look at how to use the grep tool:

grep "^Subject:.*HELP.*" /home/username/mbox

This searches the mbox file for Subject headers that contain the word "HELP" in capital letters, with any number of characters before and after it, and prints the matching lines in the terminal.

grep has some useful extra options: -C 2, for example, returns two lines of context around each match. With -n, grep prints the line number of each match. With -H, grep prints the file name of each match, which is handy when you grep a group of files or grep recursively through a file system (using -r). Type grep --help for more options.

To grep a collection of files, you can specify the file using a wildcard:

grep -i "jones" ~/org/*.txt

This returns every line where the word "jones" shows up; case is ignored because of the -i option. grep searches all files in the ~/org/ directory that have a .txt extension.

You can use it to filter the results from a different command that sends output to standard out (stdout). It manages this by “piping” the output of one command into grep. For example:

ls /home/username/data | grep "7521"

In this example, we assume that there are a lot of files with a UNIX timestamp in their file names in the /home/username/data directory. The command will filter the output so it only shows files with the digits “7521” in their file names. In these cases, grep only filters the output of ls and doesn’t check the contents of the file itself.

Search and Replace In a Group of Files

The sed tool, or Stream EDitor, can search for a regex pattern and replace it with another string. Use it alongside the grep tool, which is strong at filtering text with regular expressions but not as good at editing a file or otherwise manipulating text.

Do be warned that sed is powerful enough to do a lot of damage if you don’t know how to wield it safely, so we suggest that you make backups so you can test your sed commands in safety before you run them. Here’s a simple sed one-liner, to demonstrate its syntax:

sed -i 's/^good/BAD/' singularity.txt

This replaces any appearance of the word "good" at the beginning of a line (noted by the ^) with the string "BAD" in the file singularity.txt. The -i option tells sed to do the replacements "in place." The sed command can produce backups of the files it edits if you include a suffix after the -i option, as in -i.BAK. In the above example, it would back up the original file as singularity.txt.BAK before making changes.
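For example, the backup-producing variant of the command above would look like this, leaving the original content in singularity.txt.BAK:

sed -i.BAK 's/^good/BAD/' singularity.txt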

A sed statement is generally formatted to look like:

's/[regex]/[replacement]/'

To match literal slashes (/), you must escape them with a backslash (\), which is to say that if you want to match a / character you need to use \/ in the sed expression. When searching for a string that contains a number of slashes, you can instead swap the delimiter for a different character. For example:

's|r/e/g/e/x|regex|'

This would remove the slashes from the string r/e/g/e/x so that it would become regex after the sed command was run on the file that contains the string.

This example searches and replaces one IP address with another. In this case, 97.22.58.33 is replaced with 87.65.33.31:

sed -i 's/97\.22\.58\.33/87\.65\.33\.31/' [filename]

Here, the period characters are escaped as \. because, in regular expressions, an unescaped full-stop (period) matches any character.
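Because this section is about working on a group of files, here is one hedged way to apply that same substitution to every .conf file under a directory (the path is just an example; test on copies first):

find ~/configs -name "*.conf" -exec sed -i 's/97\.22\.58\.33/87\.65\.33\.31/' {} +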

Edit Text

You'll often need to use a text editor to edit the contents of a file, and most distributions include the vi/vim and nano text editors. Both are small yet powerful tools that are at home manipulating text in the terminal environment.

Other options are available too, including emacs and zile. Use your operating system's package manager to install these programs if you want. Be sure to search your package database so you can install a version that has been compiled without GUI components (i.e., X11).
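For example, on Debian-based systems a terminal-only build of emacs is usually packaged separately (the exact package name can vary between distributions and releases):

apt-get install emacs-nox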

To open a file, type a command that begins with the name of the editor you would like to run then the name of the file you want to edit. Here are some examples of commands that open the /etc/hosts file:

nano /etc/hosts

vi /etc/hosts

emacs /etc/hosts

zile /etc/hosts

Once you've edited a file, save and exit the editor to get back to the prompt. The actual procedure is a bit different in each editor. In emacs and zile it's the same key sequence: hit Control-x followed by Control-s to save, usually written as "C-x C-s", and then "C-x C-c" to close the editor. In nano, use Control-O (written as ^O) and confirm the file name to write the file, then hit Control-X to exit.

For administration of Linux it helps to know that vi and vim are modal editors, and the way they work is a little more complicated. After you open a file in vi, press the "i" key to switch to insert mode, which will allow you to edit text in the usual way. To save the file, you need to go back into "normal" mode, so press the escape key (Control-[ also works), then type :wq to write the file and exit the program.

This is just a brief introduction to using these text editors in Linux system administration, but there are many resources available online that will help you go from beginner to expert.

Webservers and HTTP Issues

It’s best to install and configure your webserver in a way that best suits your application or website. Let’s go over a number of basic webserver tasks and functions and offer some advice for beginners.

Serve Websites

Webservers work by listening on a TCP port, usually port 80 for HTTP and port 443 for HTTPS. When a visitor requests content, the servers respond by delivering it. Resources are usually specified with a URL that has the protocol, http or https; a colon and two slashes, ://; hostname or domain, www.example.com or username.example.com; followed by a file path, /images/avatar.jpg, or index.html. A complete URL would look something like: http://www.example.com/images/avatar.jpg.

To offer these resources to visitors, your system must be running a webserver. There are lots of different HTTP servers and endless configurations to support various web development frameworks. The three recommended webservers for general use are Apache HTTP server, Lighttpd, and Nginx. There are pluses and minuses for all of them, and the one you choose will largely depend on a combination of your needs and your experience.

Once you've decided which webserver to go for, you need to decide what (if any) scripting support you need to install. Scripting support lets your webserver serve dynamic content generated by server-side scripts written in languages like Python, PHP, Ruby, and Perl.
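As an example, on a Debian-based system you might add PHP support to Apache with something like the following (package names differ between releases and distributions):

apt-get install libapache2-mod-php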

How to Choose a Webserver

Most visitors don’t know which webserver you use so the one you choose really comes down to your own requirements and preferences. This can make Linux system administration a challenge for anyone new to it, so let’s consider some of your choices.

The Apache HTTP Server is thought by many to be the ideal webserver. It’s the open-source option that’s used more than any other, its configuration interface has enjoyed many years of stability and its modular architecture suits all kinds of deployments. Apache is the basis of the LAMP stack, and it helps to integrate dynamic server-side apps into the webserver.

The thing with webservers like Lighttpd and nginx is that they’re more weighted towards serving static content efficiently. If you’re dealing with high demand and limited server resources then one of these servers might be the better option. Lighttpd and nginx offer stability and functionality and they don’t strain system resources, but on the downside, they can be harder to configure when you want to integrate dynamic content interpreters.

So, choose your webserver according to your needs, taking into account factors like the type of content you'll be serving, how in-demand it will be, and how comfortable you are managing Linux system administration with that software.

Apache Logs

With Apache, webserver problems can be difficult to troubleshoot, but there are known common issues that will give you clues about where to start. When things get a little trickier in Linux administration, you might need to look through the Apache error logs.

These are located in the /var/log/apache2/error.log file by default (on Debian-based distributions). You can track or “tail” this log with this command:

tail -F /var/log/apache2/error.log

We suggest you add a custom log setting to your Apache virtual host configuration:

ErrorLog /var/www/html/example.com/logs/error.log
CustomLog /var/www/html/example.com/logs/access.log combined
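These directives normally sit inside the virtual host definition; a minimal sketch of such a block (the ServerName and paths are only examples) might look like:

<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/html/example.com/public_html
    ErrorLog /var/www/html/example.com/logs/error.log
    CustomLog /var/www/html/example.com/logs/access.log combined
</VirtualHost>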

Here example.com is a stand-in for the name of your virtual host and the place where its resources are kept. Apache creates two log files with logged information relating to that virtual host, making administration of Linux easier as you troubleshoot errors on specific virtual hosts. To track or tail the error log:

tail -F /var/www/html/example.com/logs/error.log

This displays new error messages when they appear. You can take specific parts of an error message from an Apache log and do a web search to diagnose problems. Common ones include:

  • Missing files, or mistakes in file names
  • Permissions errors
  • Configuration errors
  • Dynamic code execution or interpretation errors

DNS Servers and Domain Names

DNS stands for Domain Name System, and it’s the service used by the Internet to link the difficult-to-remember chain of numbers in IP addresses with more memorable domain names. This section will look at some DNS-type tasks.

Redirect DNS Queries using CNAMEs

Using CNAME DNS records makes it possible to redirect requests for one hostname or domain to a different hostname or domain. This helps when you need to reroute requests for one domain to a different one, thus avoiding the need to set up a webserver to handle such requests.

CNAMEs only work in relation to redirecting from one domain to another. If you need to point a full URL somewhere else, you’ll have to set up a webserver and do some server-level redirection configuration and/or web hosting. CNAMEs let you redirect subdomains, like team.example.com, to other ones, like jill.example.org. CNAMEs have to point to a valid domain with a valid A Record, or to another CNAME.
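In BIND-style zone file notation, a CNAME record for the example above might look something like this:

team.example.com.    IN    CNAME    jill.example.org.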

Despite some limitations, CNAMEs can occasionally be quite helpful in the administration of Linux, particularly if you need to switch a machine's hostname.

Setting Up Subdomains

A name that appears in front of a first-level domain is a subdomain. In team.example.com, team is a subdomain of the root domain example.com.

Follow these steps to create and host a sub-domain:

  1. First, create an A Record for the new subdomain in the domain's DNS zone, using your DNS provider's management interface (see the sketch after this list). You can host the DNS for your domain with the provider of your choice.
  2. Set up a server to respond to requests sent to this subdomain. For webservers like Apache, you'll need to configure a new virtual host. For XMPP servers, configure an additional host to accept requests for this subdomain. For more information, consult the Linux system administration documentation for the particular server you want to deploy.
  3. Configured subdomains work almost like root domains on your server. You can set up HTTP redirection for the new subdomain if you need to.
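As a sketch of step 1, the zone entry for the subdomain could look something like the following (203.0.113.10 is a placeholder address from the documentation range; use your server's real IP):

team.example.com.    86400    IN    A    203.0.113.10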

SMTP Servers and Email Issues

In this section, we’ll be looking at setting up email to suit your requirements and configuring your system to send email.

Which Email Solution?

Email functionality in Linux administration hinges on two major components. The SMTP server, or "Mail Transfer Agent," is the most significant one. The MTA, as it's known, sends mail between servers. The second part of the system is the server that delivers mail to the user's own machine. These servers often use a protocol like POP3 or IMAP to give remote access to the mailbox.

The email server tool chain can also feature other components, which you might have access to depending on your deployment. They include filtering and delivery tools such as procmail, anti-virus filters such as ClamAV, mailing list managers like MailMan, and spam filters like SpamAssassin. These components work independently of the MTA and remote mailbox server.

The most widely used SMTP servers or MTAs in the UNIX-like arena are Postfix, Exim, and Sendmail. Sendmail is the oldest and lots of Linux administration professionals know it well. Postfix is modern and robust, and it slots into many different configurations. Exim is the standard MTA on Debian systems, and many feel that it's easier to use for basic tasks. Servers like Courier and Dovecot are also popular for remote mailbox access.

If you’re looking for an email solution that is easy to install, you could take a look at Citadel groupware server. Citadel offers an integrated “turnkey” solution that comes with an SMTP server, remote mailbox access, real time collaboration tools including XMPP, and a shared calendar interface.

If you’re looking for a simpler and modular email stack, it’s worth taking a look at Postfix SMTP server.

Sending Email From Your Server

For simple configurations, you might not need a full email stack, but applications running on that server will still need to be able to send mail for notifications and to meet other day-to-day needs.

We can't go into configuring applications to send notifications and alerts in this guide, but the majority of applications come with a simple "sendmail" interface, which is provided by several common mail tools, including Postfix and msmtp.

To install Postfix on Debian and Ubuntu systems:

apt-get install postfix

On CentOS and Fedora systems:

yum install postfix

When you’ve installed Postfix, your applications should be able to access the sendmail interface, which can be found at /usr/sbin/sendmail. The majority of applications running on your system should be capable of sending mail with this setup.
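As a quick, hedged test of that interface, you can feed a short message to the sendmail binary on standard input (the recipient address is a placeholder):

/usr/sbin/sendmail user@example.com <<EOF
Subject: Test message

If you can read this, the sendmail interface is working.
EOF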

If you need to use your server to send email through an external SMTP server, you might want to think about a simpler tool like msmtp because it’s included in the majority of distributions, and it can be installed using the appropriate command:

apt-get install msmtp

yum install msmtp

pacman -S msmtp

Use type msmtp or which msmtp to find where msmtp is on your system (usually /usr/bin/msmtp). You can set authentication credentials with command line arguments or by declaring SMTP credentials in a configuration file.
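As a rough sketch, a minimal ~/.msmtprc could look something like the following (the host, user, and addresses are placeholders; check the msmtp documentation for the options your version supports). Note that msmtp will refuse to read the file unless its permissions are restricted, for example with chmod 600 ~/.msmtprc.

defaults
auth           on
tls            on
logfile        ~/.msmtp.log

account        example
host           smtp.example.com
port           587
from           user@example.com
user           user@example.com
password       secret

account default : example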

Linux vs Unix – What’s the Difference?

Linux vs Unix

Until fairly recently, most software developers under the age of 40 wouldn't have given Unix vs Linux much thought. They will have mostly known Linux as the dominant operating system, particularly in the data center, where it might now be the OS of choice as much as 70% of the time (although this is an estimate; it's hard to get a definitive figure), with Windows variants accounting for most of the other 30%. Developers who use any major public cloud can assume the target system will run Linux, and the explosion of Android and Linux-based systems in smartphones, TVs, automobiles, and other devices also points to its prevalence, so the question of Linux vs Unix appears decided.

Despite this, even those software developers who have only known the rise of Linux will have at least heard that Unix exists, even if it’s only on those occasions when they’ve heard Linux being described as a “Unix-like” operating system. Some may still wonder which is best; who would win in a Linux vs Unix contest? That’s a good question, but we really need to explore the history of these two contenders in order to answer it conclusively.

So, what's Unix? The caricatures that swirl around the creation of this OS feature bearded men with elbow patches and sandals writing C code and shell scripts on green screens back in the 1970s. But Unix actually has a much richer history than such easy stereotypes would suggest, and we'll attempt to lay out some of it, along with the differences between Linux and Unix, in this article.

The beginnings of Unix

A small team of programmers at AT&T Bell Labs in the late 1960s wanted to create a multi-user, multi-tasking operating system for a machine called the PDP-7, and two of the team's most notable members were Ken Thompson and Dennis Ritchie. A lot of Unix's concepts were carried over from its predecessor, Multics, but Unix itself was rewritten by the team in the 1970s using the C language, and that's what set it apart from all the rest.

Back then, it wasn't common for operating systems to be portable. Because they were written in low-level source languages, operating systems were tied to the hardware platform they had been written for. But because Unix was written in C, it could be ported to other hardware architectures.

As well as its portable nature—which assisted Unix’s quick adoption in other research, academic, and commercial settings—some of the operating system’s core design concepts made it attractive to programmers and users. Ken Thompson’s Unix philosophy was geared towards modular software design, the idea of which was that small, purpose-built programs could be combined to tackle large and complicated tasks. Because Unix had been designed around pipes and files, this approach to “piping” the inputs and outputs of programs together into a direct set of operations on the input is still popular today. In fact, the present cloud functions-as-a-service (FaaS)/serverless paradigm has its origins in the Unix way of thinking.

Quick growth and competition

Unix grew in popularity through the 1970s and 1980s, expanding into research, academia, and commercial business, but Unix wasn't open source software. This meant that anyone wanting to use it needed to buy a licence from AT&T, which owned it. The University of Illinois bought the first known software licence in 1975.

Thanks to Ken Thompson's sabbatical at the University of California, Berkeley, in the 1970s, a lot of Unix activity got underway there, resulting in the creation of the Berkeley Software Distribution, or BSD. At first, BSD wasn't a competitor to AT&T's Unix, just an add-on with some extra software and capabilities. When 2BSD (the Second Berkeley Software Distribution) came along in 1979, Bill Joy, a Berkeley graduate student, had added more programs such as vi and the C shell (/bin/csh).

Along with BSD, which enjoyed enduring popularity as a member of the Unix family, commercial Unix offerings became prevalent throughout the 1980s and early 1990s thanks to names such as HP-UX, IBM’s AIX, Sun’s Solaris, Sequent, and Xenix. As different branches of the Unix family tree took shape, the “Unix wars” ensued, and the community was now focused on standardization. The results came in 1988 with the POSIX standard, and The Open Group added more follow-on standards in the 1990s.

This period saw Sun and AT&T release System V Release 4 (SVR4), which many commercial vendors were quick to pick up. The BSD group of operating systems had been busy growing too, resulting in various open source versions released under the BSD license, including FreeBSD, OpenBSD, and NetBSD. Many of these variants of Unix are still used today, although a lot have seen their share of the server market decline, with many managing no better than single-digit popularity. BSD might currently have more installations than any other modern Unix system. BSD can also be found in every Apple Mac sold recently, as the OS X (now macOS) operating system it uses is derived from BSD.

We could say a lot more about Unix’s history but that’s beyond the scope (and the length) of this piece. We are more interested in talking about Unix vs Linux, so let’s look at how Linux got started.

Linux Appears

The Linux operating system is the descendant of two projects, one begun in the 1980s and one in the early 1990s. Richard Stallman wanted to build a free and open source alternative to Unix. He called the program GNU, a recursive acronym meaning "GNU's not Unix!" A kernel project got going, but progress was difficult, and without a kernel, any hopes of a free and open source operating system would be in vain. Then came Linus Torvalds with a working kernel named Linux that completed the project. Linus used a number of GNU tools (like the GNU Compiler Collection, or GCC) and they proved to be a perfect match for the Linux kernel.

Linux distributions appeared thanks to GNU components, the Linux kernel, MIT's X Window System GUI, and other components permitted under the open source BSD license. Slackware and then Red Hat distributions were popular because they enabled the typical 1990s PC user to run the Linux operating system. For many this complemented the proprietary Unix systems they used in their working or academic lives, so out of Linux and Unix, Linux offered clear appeal.

Because of the free and open source nature of Linux components, anyone was allowed to create a Linux distribution, and soon there were hundreds of distros. Distrowatch.com currently lists 312 unique Linux distributions. Naturally, numerous developers make use of Linux either via popular free distributions like Fedora, Canonical’s Ubuntu, Debian, Arch Linux, Gentoo, and many other variants, or through cloud providers. Commercial Linux offerings, which offer support in addition to the free and open source components, achieved viability when numerous enterprises—and IBM was among them—moved from Unix and its proprietary model to supplying middleware and software solutions for Linux. Red Hat Enterprise Linux was built on the basis of a proprietary model of commercial support. German provider SUSE followed suit with SUSE Linux Enterprise Server (SLES).

Unix vs Linux

Up to now, we’ve had a brief overview of the history of Linux and Unix and the GNU/Free Software Foundation underpinnings of a free and open source alternative to Unix. Let’s take a look at what’s different about Linux vs Unix, two operating systems that share similar histories and aspirations.

There aren't many obvious differences between Linux and Unix from the user's point of view. A lot of Linux's appeal came from the fact that it worked on different architecture types (the modern PC included) and its tools were familiar to Unix users and system administrators.

Compliance with POSIX standards made it possible to compile software written on Unix for a Linux operating system without too much difficulty. In a lot of cases, shell scripts could be used directly on Linux. Some tools in Unix and Linux may have had slight differences in flag/command-line options, but many worked in the same way on either system.

It's worth noting here that macOS hardware and its operating system became popular as a Linux development platform because a lot of Linux tools and scripts also work in the macOS terminal. Tools like Homebrew make a lot of open source software components available on macOS.

The other differences between Linux and Unix mostly relate to licensing. Linux vs Unix is largely a contest of free vs. licensed software. Alongside this, the fact that Unix distributions lack a common kernel affects software and hardware vendors, too. With Linux, a vendor can create a device driver for a particular hardware device with the reasonable expectation that it will work fine across the majority of distributions. But with Unix having commercial and academic branches to cater to, it might be necessary to release different drivers for all the Unix variants. There were also licensing problems, and other worries related to access to an SDK or a distribution model for the software as a binary device driver across multiple versions of Unix. Linux and Unix are clearly different.

Many of the advances seen in Linux have been mirrored by Unix, which shows that Linux and Unix developers keep a close eye on each other. A lot of GNU utilities became available as add-ons for Unix systems on occasions when developers wanted features from GNU programs that were not part of Unix. For instance, IBM’s AIX had AIX Toolbox for Linux Applications which contained hundreds of GNU software packages (like Bash, GCC, OpenLDAP, among numerous others) that could be added to an AIX installation to make transitioning between Linux and Unix-based AIX systems go more smoothly.

Proprietary Unix still exists, and many major vendors promise to support their current releases for several more years yet, so Unix won't be disappearing from sight any time soon. Also, the BSD branch of Unix is open source, and NetBSD, OpenBSD, and FreeBSD all boast strong user bases and open source communities. They might not be as vocal as their Linux equivalents, but their numbers continue to outstrip those of proprietary Unix in areas like the web server arena.

Linux has become ubiquitous across a multitude of hardware platforms and devices. Linux drives the Raspberry Pi, which has become hugely popular among enthusiasts. It’s a platform that has ushered in a whole array of IoT devices running Linux. We’ve already mentioned Linux’s prevalence in Android devices, cars, and smart TVs. Every cloud service features virtual servers running Linux, and a lot of the most popular cloud-native stacks are based on Linux, whether that means container runtimes or Kubernetes or the sundry serverless platforms that are coming to the fore.

Finally, it's interesting to note that Microsoft's creation of the Windows Subsystem for Linux (WSL), along with the Windows port of Docker, including LCOW (Linux containers on Windows) support, would have been unthinkable only a few years earlier. They point clearly to the fact that Linux vs Unix is a contest that's been pretty much decided.