上QQ阅读APP看书,第一时间看更新
Disadvantages of IT automation
As with any other technology, IT automation does come with some disadvantages. From my point of view, these are the biggest disadvantages:
- Automating all of the small tasks that were once used to train new system administrators.
- If an error is performed, it will be propagated everywhere.
The consequence of the first is that new ways to train junior system administrators will need to be implemented.
The second one is trickier. There are a lot of ways to limit this kind of damage, but none of those will prevent it completely. The following mitigation options are available:
- Always have backups: Backups will not prevent you from nuking your machine – they will only make the restore process possible.
- Always test your infrastructure code (playbooks/roles) in a non-production environment: Companies have developed different pipelines to deploy code, and those usually include environments such as dev, test, staging, and production. Use the same pipeline to test your infrastructure code. If a buggy application reaches the production environment it could be a problem. If a buggy playbook reaches the production environment, it can be catastrophic.
- Always peer-review your infrastructure code: Some companies have already introduced peer-reviews for the application code, but very few have introduced it for the infrastructure code. As I was saying in the previous point, I think that infrastructure code is way more critical than application code, so you should always peer-review your infrastructure code, whether you do it for your application code or not.
- Enable SELinux: SELinux is a security kernel module that is available on all Linux distributions (it is installed by default on Fedora, Red Hat Enterprise Linux, CentOS, Scientific Linux, and Unbreakable Linux). It allows you to limit users and process powers in a very granular way. I suggest using SELinux instead of other similar modules (such as AppArmor) because it is able to handle more situations and permissions. SELinux will prevent a huge amount of damage because, if correctly configured, it will prevent many dangerous commands from being executed.
- Run the playbooks from a limited account: Even though user and privilege escalation schemes have been in Unix code for more than 40 years, it seems as if not many companies use them. Using a limited user for all your playbooks, and escalating privileges only for commands that need higher privileges, will help prevent you from nuking a machine while trying to clean an application temporary folder.
- Use horizontal privilege escalation: The sudo command is a well known, but is often used in its more dangerous form. The sudo command supports the -u parameter that will allow you to specify a user that you want to impersonate. If you have to change a file that is owned by another user, please do not escalate to root to do so, just escalate to that user. In Ansible, you can use the become_user parameter to achieve this.
- When possible, don't run a playbook on all your machines at the same time: Staged deployments can help you detect a problem before it's too late. There are many problems that are not detectable in dev, test, staging, and QA environments. The majority of them are related to a load that is hard to emulate properly in those non-production environments. A new configuration you have just added to your Apache HTTPd or MySQL servers could be perfectly OK from a syntax point of view, but disastrous for your specific application under your production load. A staged deployment will allow you to test your new configuration on your actual load without risking downtime in case something was wrong.
- Avoid guessing commands and modifiers: A lot of system administrators will try to remember the right parameter, and try to guess if they don't remember it exactly. I've done it too, a lot of times, but this is very risky. Checking the man page or the online documentation will usually take you less than two minutes, and often, by reading the manual, you'll find interesting notes you did not know. Guessing modifiers is dangerous because you could be fooled by a non-standard modifier (that is, -v is not a verbose mode for grep, and -h is not a help command for the MySQL CLI).
- Avoid error-prone commands: Not all commands have been created equally. Some commands are (way) more dangerous than others. If you can assume a cat-based command safe, you have to assume that a dd-based command is dangerous, since it performs copies and conversion of files and volumes. I've seen people using dd in scripts to transform DOS files to Unix (instead of dos2unix) and many other, very dangerous, examples. Please, avoid such commands, because they could result in a huge disaster if something goes wrong.
- Avoid unnecessary modifiers: If you need to delete a simple file, use rm ${file}, not rm -rf ${file}. The latter is often performed by users that have learned to be sure, always use rm -rf, because at some time in their past, they have had to delete a folder. This will prevent you from deleting an entire folder if the ${file} variable is set wrongly.
- Always check what could happen if a variable is not set: If you want to delete the contents of a folder and you use the rm -rf ${folder}/* command, you are looking for trouble. In case the ${folder} variable is not set for some reason, the shell will read a rm -rf /* command, which is deadly (considering the fact that the rm -rf / command will not work on the majority of current OSes because it requires a --no-preserve-root option, while the rm -rf /* will work as expected). I'm using this specific command as an example because I have seen such situations: the variable was pulled from a database which, due to some maintenance work, was down, and an empty string was assigned to that variable. What happened next is probably easy to guess. In case you cannot prevent using variables in dangerous places, at least check them to see whether they are empty or not before using them. This will not save you from every problem, but may catch some of the most common ones.
- Double-check your redirections: Redirections (along with pipes) are the most powerful elements of Unix shells. They could also be very dangerous: a cat /dev/rand > /dev/sda can destroy a disk even if a cat-based command is usually overlooked because it's not usually dangerous. Always double-check all commands that include a redirection.
- Use specific modules wherever possible: In this list, I've used shell commands because many people will try to use Ansible as if it's just a way to distribute them: it's not. Ansible provides a lot of modules and we'll see them in this book. They will help you create more readable, portable, and safe playbooks.