For beginning system administrators

Started by chirkovmisha, Aug 14, 2022, 05:20 AM


chirkovmisha (topic starter)

Newcomers to system administration tend to run into the same set of problems. To offer some solutions, I wrote this series of lectures. Parts of it are specific to hosting technical support, but much of it should be useful, if not to everyone, then to many. I have therefore adapted the text of the lecture to share here.

It does not matter what your position is called - what matters is that you are, in fact, doing administration. So let's start with what a system administrator should do. The main task is to establish order, maintain order, and prepare for future growth of that order. Without a system administrator, a server descends into a mess: logs are not written, or the wrong things are written to them; resources are distributed suboptimally; the disk fills up with all sorts of garbage; and the system slowly buckles under the chaos. Calm down! The system administrator - that's you - steps in to sort out the issues and clean up the mess!

Pillars of system administration

However, before you start solving problems, it is worth getting acquainted with the four main pillars of administration:

    Documentation
    Templatization
    Optimization
    Automation

This is the foundation of foundations. If you do not build your workflow on these principles, it will be inefficient, unproductive, and will generally bear little resemblance to real administration. Let's deal with each separately.


Documentation means not only reading documentation (though you can't do without that), but also maintaining your own.

How to keep documentation:

    Faced with a new issue you have never seen before? Write down the main symptoms, diagnostic methods, and principles of resolution.
    Came up with a new, elegant solution to a typical problem? Write it down so you don't have to reinvent it in a month.
    Were you helped with a question you did not understand at all? Write down the main points and concepts; draw yourself a diagram.

The main idea: you should not completely trust your own memory when mastering and applying new things.

What format you use is up to you: a note-taking system, a personal blog, a text file, a physical notepad. The main thing is that your records meet the following requirements:

    Don't make them unnecessarily long. Highlight the main ideas, methods, and tools. If understanding an issue requires diving into the low-level mechanics of, say, memory allocation in Linux, don't rewrite the article you learned it from - link to it.
    The entries should be understandable to you. If the line race cond.lockup does not immediately remind you of what you meant by it, explain it. Good documentation doesn't take half an hour to decipher.
    Searchability is a very valuable feature. If you're blogging, add tags; if you use a physical notebook, stick on small post-its with descriptions. Documentation loses much of its point if finding an answer in it takes as long as solving the question from scratch.

Not only will you never have to look for the same answer twice: documentation is also a great help when learning new topics (notes, yes!), it sharpens your spider sense (the ability to diagnose a complex problem at a superficial glance), and it adds structure to your actions. If the documentation is available to your colleagues, it will let them figure out what you built and how while you are away.
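For plain-text notes, the searchability requirement above can be met with nothing more than grep. A minimal sketch (the directory layout and the note's contents are made up for illustration):

```shell
# One plain-text file per note, each starting with a "tags:" line.
mkdir -p /tmp/notes
cat > /tmp/notes/oom-killer.txt <<'EOF'
tags: linux memory oom
Symptom: processes die at random; dmesg shows "Out of memory".
Diagnosis: dmesg | grep -i 'killed process'
Fix: find the leaking process, add swap, or tune overcommit.
EOF

# List every note tagged "memory":
grep -ril '^tags:.*memory' /tmp/notes
```

Keeping the tags on the first line makes lookups fast and predictable; the same trick works with any notes system that stores plain text.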


Templatization is the creation and use of templates. For most typical questions, it is worth creating a specific action template. Most problems should be diagnosed with a standardized workflow. When you have fixed/installed/optimized something, its operation should be verified against standardized checklists.

Templates are the best way to organize your workflow. By using standard procedures to solve the most common problems, you get a lot of benefits. For instance, checklists let you diagnose every function essential for operation and skip diagnosing unimportant functionality. Standardized procedures minimize needless thrashing and reduce the chance of error.

The first essential point: procedures and checklists also need to be documented. If you rely on memory alone, you may skip some really important check or operation and ruin everything. The second essential point: any template practice can and should be modified when the situation requires it. There are no ideal, absolutely universal templates. If there is an issue but the template check did not reveal it, that does not mean there is no problem. Still, before testing some unlikely hypothesis, it's always worth running a quick template check first.
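A documented checklist can live directly in a script, so that no check depends on memory. A minimal sketch - the checks themselves are illustrative, not a complete list:

```shell
# Run each check, print PASS/FAIL, and keep going, so the whole
# checklist is always executed even when one item fails.
check() {
    desc=$1; shift
    if "$@" >/dev/null 2>&1; then
        printf 'PASS %s\n' "$desc"
    else
        printf 'FAIL %s\n' "$desc"
    fi
}

check "root filesystem under 95% full" \
    sh -c 'df -P / | awk "NR==2 {exit (\$5+0 >= 95)}"'
check "/tmp is writable" test -w /tmp
check "localhost resolves" getent hosts localhost
```

Because every check is just a command with an exit status, adding a new item to the checklist is one line.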


Optimization speaks for itself. The workflow should be optimized as much as possible in terms of time and effort. There are countless options: learn hotkeys, abbreviations, regular expressions, and the available tools. Look for more practical ways to use those tools. If you call a command 100 times a day, bind it to a keyboard shortcut.
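In a shell, "bind it to a shortcut" usually starts with an alias or a small function in ~/.bashrc. A sketch (the names here are invented for illustration):

```shell
# Short aliases for commands typed dozens of times a day:
alias ll='ls -lah'
alias gs='git status'

# A tiny function beats retyping the same pipeline:
biggest() {   # show the 10 largest entries under a directory
    du -sh "${1:-.}"/* 2>/dev/null | sort -rh | head -n 10
}
```

bash's readline can go further and attach a whole command to a key chord via the bind builtin, which is closer to a true keyboard shortcut.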

Familiarize yourself with the alternatives to your current tools - perhaps there is a more convenient terminal client, DE, clipboard manager, browser, email client, or operating system. Find out what tools your colleagues and acquaintances use - they may have chosen them for a reason. Once you have picked your tools, learn to use them well: the keys, the abbreviations, the tips and tricks.

Make the most of the standard tools - coreutils, vim, regular expressions, bash. For the last three there is a huge number of wonderful manuals and documentation. With their help, you can quickly go from "I feel like a monkey cracking nuts with a laptop" to "I'm a monkey who uses a laptop to order a nutcracker."


Automation transfers heavy operations from our tired hands to the tireless hands of machines. If some standard procedure takes five commands of the same type, why not wrap those commands in a file and call one command that downloads and executes it?

Automation itself is 85% writing and optimizing your own tools (and another 15% trying to make them work as they should). A tool can be a simple advanced one-liner or a huge all-powerful system with a web interface and an API. The main criterion: creating a tool should take no more time and effort than it will save you. If you spend five hours writing a script you will never need again, for a task that would take an hour or two to solve without it, that is very poor workflow optimization. You can afford five hours on a tool only when the number and type of tasks - and the time available - justify it, which is rare.

Automation does not necessarily mean full-fledged scripts. For instance, to create a bunch of objects of the same type from a list, a clever one-liner is enough - it automatically does what you would otherwise do by hand, switching between windows with heaps of copy-paste.
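That "bunch of same-type objects from a list" case can be sketched as a simple loop (the paths and names are made up for illustration):

```shell
# Create a directory plus a skeleton config per name in the list,
# instead of copy-pasting between windows.
for name in alice bob carol; do
    mkdir -p "/tmp/demo-vhosts/$name"
    printf 'server_name %s.example.com;\n' "$name" \
        > "/tmp/demo-vhosts/$name/site.conf"
done
```

With the list in a file instead, the same loop becomes `while read -r name; do ...; done < names.txt` - still a one-liner's worth of effort.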

So: if you build your administration process on these four pillars, you can quickly increase your efficiency, productivity, and qualifications. However, the list needs one more item, without which work in IT is almost impossible - self-education.

Self-education of the system administrator

To be even slightly competent in this area, you need to constantly study and learn new things. If you have not the slightest desire to face the unknown and figure it out, you will very quickly fall behind. All sorts of new solutions, technologies, and methods constantly appear in IT, and if you do not study them at least superficially, you are on the way to losing out. Many areas of information technology rest on a very complex and voluminous foundation. Networking, for instance. Networks and the Internet are everywhere; you encounter them every day; but dig into the technologies behind them and you will find a huge and very complex discipline whose study is never a walk in the park.

I did not include this item in the list because it is key for IT in general, not just for system administration. Naturally, you won't be able to learn absolutely everything right away - there is simply not enough time. So when educating yourself, keep the necessary levels of abstraction in mind.

You don't have to learn immediately how each individual utility's internal memory management works and how it interacts with Linux memory management, but it's good to know, schematically, what RAM is and why it is needed. You don't need to know how TCP and UDP headers differ structurally, but it would be nice to understand the basic differences between the protocols in practice.
You don't need to learn the physics of optical attenuation, but it would be nice to know why signal loss accumulates across nodes. There is nothing wrong with knowing how things work at a certain level of abstraction without dismantling absolutely every level down to where there is no abstraction at all (you will just go crazy).

However, in your own field, reasoning at the abstraction level of "well, it's a thing that shows websites" is not good enough. The following lectures will survey the main areas a system administrator deals with, at lower levels of abstraction. I will try to keep the material to the minimum necessary level of abstraction.

10 commandments of system administration

So, we have covered the four main pillars and the foundation. Can we start solving problems now? Not yet. Before that, it is worth getting familiar with the so-called best practices and good manners. Without them, there is a chance you will do more harm than good. So, let's begin:

    Some of my colleagues believe the very first rule is "do no harm". I tend to disagree: when you try never to do harm, you can't do anything - too many actions are potentially destructive. I think the most essential rule is "make a backup". Even if you do harm, you can always roll back, and things will not be so bad.

    A backup should be made whenever time and space allow. Back up whatever you are going to change and whatever you risk losing through a potentially destructive action. It is advisable to check the backup for integrity and for the presence of all necessary data. Do not delete the backup immediately after you have verified everything, unless you need to free up disk space; if space is tight, copy it off to your personal server and delete it after a week.
    The second most important rule (which I often break myself) is "do not hide things." If you have made a backup, write down where, so your colleagues don't have to hunt for it. If you have done something non-obvious or complex, write it down: you will go home, the issue may recur or hit someone else, and your solution can then be found by keywords. Even if you are doing something you know well, your colleagues may not know it.
    The third rule needs no explanation: "never do anything whose consequences you do not know, cannot imagine, or do not understand." Don't copy commands from the internet if you don't know what they do - call man and parse them first. Don't use ready-made solutions you cannot understand. Keep execution of obfuscated code to an absolute minimum. If there is no time to understand it, you are doing something wrong - see the next point.
    "Test." New scripts, tools, one-liners, and commands should be tested in a controlled environment, not on the client machine, if there is at least a minimal potential for destructive actions. Even if you backed up everything (and you did), downtime is not the coolest thing. Get a separate server/virtual/chroot for this business and test there. Nothing broken? Then you can run on the "combat".

"Control". Minimize all transactions that you do not control. One crooked dependency on a package can take half the system down with it, and the -y flag for yum remove gives you the opportunity to practice your system recovery skills from scratch. If the action has no uncontrolled alternatives - the next item and a ready backup.
    "Check." Check the consequences of your actions and whether you need to roll back to a backup. Check if the problem is really solved. Check if the error is reproduced and under what conditions. Check what you can break with your actions. It is superfluous to trust in our work, but never to verify.
    "Communicate." If you can't solve the issue, ask your colleagues if they have experienced this. If you want to apply a controversial decision - find out the opinion of colleagues. Perhaps they will come up with a better solution. There is no confidence in your actions - discuss them with colleagues. Even if this is your area of expertise, a fresh look at the situation can clear up a lot. Don't be ashamed of your own ignorance. It is better to ask a stupid question, look like a fool and get an answer than not ask this question, not get an answer and be a fool.
    "Don't refuse help unreasonably." This point is the reverse of the previous one. If you are asked a stupid question, clarify and explain. They ask for the impossible - explain that it is impossible and why, offer alternatives. If there is no time (really no time, not desire) - say that you have an urgent issue / a large amount of work, but you will figure it out later. If colleagues do not have urgent tasks, offer to contact them and delegate the issue.
    "Let's give feedback." Have a colleague started using a new technique or a new script, and you are facing the negative consequences of this decision? Report it. Perhaps the issue is solved in three lines of code or five minutes of refining the technique. Found a bug in the software? Report a bug. If it plays or doesn't need to be played, it will likely get fixed. Voice wishes, suggestions and constructive criticism, bring questions for discussion if it seems that they are relevant.
    "Ask for feedback." We are all imperfect, and so are our decisions, and the best way to check if your decision is correct is to bring it up for discussion. We optimized something at the client - ask to follow the work, maybe the "bottleneck" of the system is not where you were looking for. We wrote a help script - show your colleagues, maybe they will find a way to improve it.

If you apply these practices constantly, most problems cease to be problems: not only will you reduce your own mistakes and slip-ups to a minimum, you will also retain the ability to correct mistakes (in the form of backups, and colleagues who will remind you to make them). From here on, it's all technical details - in which, as you know, the devil lies.
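The first commandment - back up, then verify the backup - can be sketched in shell (all paths here are illustrative):

```shell
# Back up a directory before a risky change, then verify the
# archive before touching the original.
src=/tmp/demo-etc
mkdir -p "$src" && echo 'key=value' > "$src/app.conf"

backup="/tmp/backup-$(date +%Y%m%d).tar.gz"
tar -czf "$backup" -C "$(dirname "$src")" "$(basename "$src")"

# Integrity check plus a spot check that the data is really inside:
gzip -t "$backup" && tar -tzf "$backup" | grep -q 'app.conf' \
    && echo "backup OK: $backup"
```

The verification step is the point: an unverified backup is a hope, not a backup.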

The main tools you will be working with more than 50% of the time are grep and vim. What could be simpler? Text search and text editing. However, both grep and vim are powerful, multifunctional multi-tools that let you search and edit text efficiently. Where a Windows notepad lets you simply write or delete a line, vim lets you do almost anything with text. If you don't believe me, run the vimtutor command in a terminal and start learning.
As for grep, its main strength is regular expressions. Yes, the tool itself lets you set search conditions and output options quite flexibly, but without regexes it doesn't make much sense. And you need to know regular expressions! At least at a basic level. To get started, I would advise watching this video: it covers the very basics of regular expressions and their use together with grep. Oh, and combined with vim, you get the ULTIMATE POWER to do things to text that would have to be labeled 18+.
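A small taste of grep with a regular expression (the log lines below are invented):

```shell
printf '%s\n' \
  '192.168.0.7 - GET /index.html 200' \
  '10.0.0.3 - POST /login 401' \
  '192.168.0.9 - GET /admin 403' > /tmp/demo.log

# -E enables extended regexes: requests from 192.168.0.x
# that ended with a 4xx status.
grep -E '^192\.168\.0\.[0-9]+ .* 4[0-9]{2}$' /tmp/demo.log
# → 192.168.0.9 - GET /admin 403
```

Note the escaped dots: an unescaped `.` matches any character, which is exactly the kind of subtlety that makes regexes worth learning properly.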

Of the remaining 50%, some 40% falls to the coreutils toolkit. The list of coreutils is on Wikipedia, and the manual for the whole set is on the GNU website. Whatever this set doesn't cover is in the POSIX utilities. You don't have to memorize all of it with every flag, but it is useful to know roughly what the main tools can do. You don't have to reinvent the wheel out of crutches. I once had to replace line breaks with spaces in the output of some utility, and my sick brain gave birth to a construction like sed ':a;N;$!ba;s/\n/ /g'. A colleague who walked by chased me away from the console with a broom, then solved the issue by writing tr '\n' ' '.
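Both commands from that story, side by side - same job, very different effort:

```shell
# Replace newlines with spaces in multi-line output.
printf 'one\ntwo\nthree\n' | sed ':a;N;$!ba;s/\n/ /g'   # the hard way
printf 'one\ntwo\nthree\n' | tr '\n' ' '                # the simple way
# Note: tr also replaces the final newline with a space.
```

The sed version slurps the whole input into the pattern space via a loop before substituting; tr just transliterates one character into another, which is all the task actually needed.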

I would advise remembering what each particular tool does and the flags of the most frequently used commands; for everything else there is man. Don't hesitate to call man whenever you are in doubt. And be sure to read man man itself - it contains essential information about the manuals themselves.

Knowing these tools, you will be able to solve effectively a significant share of the issues you encounter in practice. In the following lectures, we'll look at when to apply these tools, and at the underlying services and applications they apply to.


I would add a few points.

Strive to keep everything as simple as possible. Before pulling in an extra service with a load of dependencies for a primitive operation, think: in 96% of cases there is a simpler way, and in 99.99% of cases it will be many times faster.

Try to use the standard tools as much as possible - convenience is good, but the chance of landing in a situation where your usual tools are not at hand is far from zero.


And I would add one more thing:
show firmness of character and don't let yourself be turned into a stopgap for every task. It's better to quarrel with the head of the department once than to be regularly distracted by every bit of nonsense he hangs on you.

Everything written here applies mainly to large companies.
With a staff of 1-3 people, all of the above is bureaucracy and unattainable prospects.
For small firms, a person who can analyze the market and program is more important.


From the point of view of an ordinary person, an administrator is a computer generalist:

he makes sure all the company's computers turn on and work as they should;
updates the software on them when the time comes;
sets up the network within the company;
monitors the antivirus software and updates it regularly;
sets up PCs for new employees;
fixes computers that break;
when necessary, sets up access within the company, runs the mail server and internal storage, and monitors the operation of the whole network.
Generally, for the general public, "sysadmin" is a synonym for "programmer".