Top text processing and filtering commands in Linux


This article covers some of the most useful text processing and filtering commands that every Linux user should know. If you are starting a career in Linux or preparing for a job in this field, you should be familiar with these commands and tools.

Introduction

Text processing and filtering commands are used to manipulate or modify text data. They are powerful tools built to handle text-based information efficiently. I'll be using the GUI version of Linux Mint 21.2 for the demonstrations. These commands are available on practically every distribution and behave the same way, so you can follow along on any Linux distribution.

These tools are used mostly on servers and other CLI-only installations of Linux, where there is no Graphical User Interface to interact with the Operating System. Below are some of the best and most used text processing commands.

To follow along, use the command below to create a new file with some text in it:
$
echo -e "Two legends rise, in dark and gleam,\nTheir quests for justice, a timeless theme.\nOne cloaked in night, the other in steel,\nTheir valorous spirits, a heroic seal.\n\nIn worlds apart, their stories told,\nYet both wear masks, a tale unfolds.\nBatman and Ironman, in their own way,\nShield the world, where heroes sway." > heroes.txt

Here I'm using the echo command to write some text into a file named heroes.txt. The -e option makes echo interpret each \n as a line break. We'll use this file for the operations throughout the article.
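
If echo -e is not available in your shell, a here-document is one alternative way to create the same file. The sketch below is only an illustration: the temporary directory and the line_total variable are my own additions, not part of the original steps.

```shell
# Work in a throwaway directory so nothing in the current directory is overwritten.
cd "$(mktemp -d)" || exit 1

# Everything between the two EOF markers is written to heroes.txt as-is.
cat > heroes.txt <<'EOF'
Two legends rise, in dark and gleam,
Their quests for justice, a timeless theme.
One cloaked in night, the other in steel,
Their valorous spirits, a heroic seal.

In worlds apart, their stories told,
Yet both wear masks, a tale unfolds.
Batman and Ironman, in their own way,
Shield the world, where heroes sway.
EOF

# Quick sanity check: count the lines that were written (the blank line counts too).
line_total=$(wc -l < heroes.txt)
echo "heroes.txt has $line_total lines"
```

The file should contain 9 lines, the same as the one produced by the echo command.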

grep command

The grep command is used to search text. The search can be based on a specific string or a pattern. grep comes in handy when you need to find some text in a file, or even in the output of another command.

Let's say we want to search for the word bat in this file. We'll use the grep command like this:

$
grep -i "bat" heroes.txt

$
grep -io "bat" heroes.txt

Both commands look for the string bat in the heroes.txt file. The first prints every line that contains it; the second prints only the matched text.

As the output of the first command shows, only one line contains the specified string.
  • Option -i ignores letter case while searching.
  • Option -o in the second command prints only the part of the line that matches, instead of the whole line.

Now here comes the real thing. Let's say you have a file with 1000 lines of text and you want to find and count the occurrences of a specific keyword. This is where grep becomes really useful: after filtering for the word, we can further process the output produced by grep.

Use the command below to count the occurrences of the specified word:
$
grep -io "in" heroes.txt|wc -l

In the above command, the wc command further processes the text output produced by grep. Because -o prints each match on its own line, wc -l counts the matches by counting the output lines.


The result is 5: the string in appears 5 times in the heroes.txt file. This way we can use grep to filter text and then process that output further with other commands. Use grep --help to learn more about grep and the options it accepts.
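
Counting with grep can mean three slightly different things: matching lines, individual matches, or whole-word matches. The sketch below compares them on a small made-up sample (not heroes.txt), chosen so that the three numbers differ. The variable names are only illustrative.

```shell
# Hypothetical sample text, picked so that the three counts come out different.
sample='Inside, in the rain
no match here
in or out'

# -c counts matching LINES, not individual matches.
line_count=$(printf '%s\n' "$sample" | grep -ic "in")

# -o prints every match on its own line, so wc -l counts OCCURRENCES.
match_count=$(printf '%s\n' "$sample" | grep -io "in" | wc -l)

# -w restricts matches to whole words, skipping "Inside" and "rain".
word_count=$(printf '%s\n' "$sample" | grep -iow "in" | wc -l)

echo "lines=$line_count matches=$match_count words=$word_count"
```

For this sample the counts are 2 lines, 4 matches and 2 whole words. In heroes.txt every match of in happens to be a whole word, so -w would not change the result of 5 there.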

awk command

awk is a text processing language used for recognizing patterns and processing text. It extracts and manipulates text from files and can also handle formatting and calculations. Its primary function is to break lines down into fields and then process those fields with various operations.

It can arrange and print text output based on a specified delimiter. Let's say we want to print the first field of every line in our file. awk can do it like this:

$
awk '{print$1}' heroes.txt

awk prints the first field of every line. Fields are separated by a delimiter, and awk uses whitespace as the default delimiter to separate fields.

$
awk '{print$1,$3}' heroes.txt

The above command prints the first and third fields of each line. The comma between $1 and $3 makes awk put a space between them in the output.

If you want to use another delimiter instead of default whitespace, use the following option with awk command to achieve this:

$
awk -F',' '{print$1}' heroes.txt

Here I used the -F option to specify , as the delimiter. Fields are now separated by , instead of whitespace, so this command prints everything before the first comma on each line.

Using a custom delimiter can be very useful when the text content of a file is formatted with other delimiters instead of whitespace.
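
As a rough illustration of a custom delimiter on structured data, the sketch below runs awk on two made-up comma-separated records (name,city,role). The records and variable names are assumptions for the example only. NF and $NF are standard awk built-ins for the field count and the last field.

```shell
# Hypothetical comma-separated records: name,city,role
data='bruce,gotham,batman
tony,new york,ironman'

# With -F',' the comma is the separator, so "new york" stays one field.
cities=$(printf '%s\n' "$data" | awk -F',' '{ print $2 }')

# NF holds the number of fields on the current line; $NF is the last field.
last_fields=$(printf '%s\n' "$data" | awk -F',' '{ print NF, $NF }')

printf '%s\n' "$cities" "$last_fields"
```

With the default whitespace delimiter, new york would have been split into two fields, which is the kind of case where -F matters.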

You can combine the awk and grep commands for further text processing.

$
grep -i 'bat' heroes.txt|awk '{print$3}'

This command uses grep and awk together. First, grep filters the specified file down to the lines that contain bat, and then awk prints the third field of each of those lines, using the default delimiter.

Using grep and awk together is a great combination to process the output text. You can execute awk --help or man awk commands to know more about this tool.
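
awk can also do the filtering step by itself, because an awk rule may start with a pattern. The sketch below compares the grep and awk pipeline with a single awk call on one line taken from heroes.txt. Using tolower() for the case-insensitive match is my own choice here, not something the steps above prescribe.

```shell
# One line from heroes.txt, supplied inline so the example is self-contained.
line='Batman and Ironman, in their own way,'

# Two-step pipeline: grep filters, awk picks the third field.
piped=$(printf '%s\n' "$line" | grep -i 'bat' | awk '{ print $3 }')

# Single awk call: tolower($0) ~ /bat/ matches the whole line case-insensitively.
single=$(printf '%s\n' "$line" | awk 'tolower($0) ~ /bat/ { print $3 }')

echo "$piped"
echo "$single"
```

Both variants print Ironman, (with the trailing comma, since whitespace is the delimiter). Which one reads better is largely a matter of taste.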

cut command

The cut command extracts a particular section or field from a file or from the output of other commands. It also works on the basis of a delimiter. Its main use cases are processing output text further and printing parts of a file that contains structured, delimiter-separated data.

The cut command comes in handy when you need precise control over which part of each line is printed. Use the below command:

$
cut -c1-3 heroes.txt

In this command, -c1-3 tells cut to print only characters 1 through 3 of each line.

As you can see, the command only printed the first three characters of each line from the file, as specified in the command. Just remember that whitespace is also counted as a character when working with the cut command.
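
The -c option accepts more than a simple range. The sketch below shows three common forms on a short made-up string. The variable names are illustrative only.

```shell
line='Batman and Ironman'

first_three=$(printf '%s\n' "$line" | cut -c1-3)   # characters 1 through 3
from_eight=$(printf '%s\n' "$line" | cut -c8-)     # character 8 to the end of the line
picked=$(printf '%s\n' "$line" | cut -c1,8,12)     # only characters 1, 8 and 12

printf '%s\n' "$first_three" "$from_eight" "$picked"
```

This prints Bat, then and Ironman, then BaI. The spaces at positions 7 and 11 count as characters, which is why position 8 is the a of and.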

Now we know that the cut command also works with a delimiter, so let's try it. Unlike awk, cut does not split on whitespace by default: its default delimiter is the tab character. For any other delimiter, you have to specify it yourself with -d. Use command:

$
cut -d, -f2 heroes.txt

In this command, option -d specifies , as the preferred delimiter, and option -f2 selects the second field.

As specified in the command, each line is split at , and the second field is printed as output. We can further process the output text by using the grep or awk command if required.

Using the grep, awk, and cut commands together is very practical and more common than you might think. Linux experts often combine several text-processing commands to produce the desired output. Use the below command:
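
Two details of field mode are easy to miss: without -d, cut splits on tab characters, and a line that does not contain the delimiter is printed whole unless -s is given. A minimal sketch on made-up input (the sample strings and variable names are assumptions):

```shell
# Without -d, cut splits on TAB characters.
tab_field=$(printf 'batman\tgotham\n' | cut -f2)

# A line that lacks the delimiter is printed whole by default...
default_out=$(printf 'a,b\nno delimiter here\n' | cut -d, -f2)

# ...unless -s is given, which suppresses such lines.
strict_out=$(printf 'a,b\nno delimiter here\n' | cut -s -d, -f2)

printf '%s\n' "$tab_field" "$default_out" "$strict_out"
```

In heroes.txt every non-empty line contains a comma, so the difference does not show up there, but it often matters with real data.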

$
awk '{print$1,$3}' heroes.txt|grep -i "batman"|cut -d, -f1

This command is composed of the awk, grep and cut commands. Its purpose is to combine these tools to produce output that would otherwise require manually opening and editing the file.

Let's break down what each command means:
  • awk '{print$1,$3}' heroes.txt prints the first and third fields of every line, split on whitespace (the default delimiter in awk).
  • grep -i "batman" further processes the output produced by awk and keeps only the lines that contain the keyword batman.
  • cut -d, -f1 further processes the output produced by grep. It splits each line at the , delimiter and prints the first field as the final output.

This way you can use multiple text processing commands together to serve the purpose. Any output produced by any other Linux command can be further processed by these commands.
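
To see what each stage of such a pipeline contributes, it can help to capture the intermediate results. The sketch below does that for the single Batman line of heroes.txt, supplied inline. The stage1 to stage3 variables are illustrative names, not part of the original command.

```shell
# The only line of heroes.txt that survives the grep stage, supplied inline.
line='Batman and Ironman, in their own way,'

stage1=$(printf '%s\n' "$line" | awk '{ print $1, $3 }')   # first and third field
stage2=$(printf '%s\n' "$stage1" | grep -i "batman")       # keep lines containing batman
stage3=$(printf '%s\n' "$stage2" | cut -d, -f1)            # drop everything from the comma on

printf 'after awk:  %s\nafter grep: %s\nafter cut:  %s\n' "$stage1" "$stage2" "$stage3"
```

After awk and grep the text is Batman Ironman, (with the comma); cut then removes the comma, leaving Batman Ironman.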

sort command

sort command is used to sort the lines of a file or text output in alphabetical or numerical order. The purpose of this command is to arrange the data in a specific order. Use the below command:

$
sort heroes.txt

The sort command arranged the lines in alphabetical order. Just remember that an empty line always appears first in alphabetical order.

Just like the other text processing commands, sort can also process the output produced by other commands. Use the command below to combine it with grep:

$
grep -i 'in' heroes.txt|sort

Here the output produced by grep is further processed by sort to arrange the lines in alphabetical order. Use sort --help to learn about the options that can be used with this command.
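
The difference between alphabetical and numerical order is easiest to see on numbers. The sketch below uses made-up input to compare the default sort with -n (numeric), -r (reverse), and -t with -k (sort by a chosen field). The sample data and variable names are assumptions.

```shell
nums='10
9
100'

lexical=$(printf '%s\n' "$nums" | sort)       # compares character by character
numeric=$(printf '%s\n' "$nums" | sort -n)    # compares numeric values
reversed=$(printf '%s\n' "$nums" | sort -rn)  # numeric, largest first

# -t sets the field separator, -k2,2n sorts numerically on the second field only.
by_count=$(printf 'ironman,3\nbatman,12\n' | sort -t, -k2,2n)

printf '%s\n' "$lexical" "---" "$numeric" "---" "$reversed" "---" "$by_count"
```

The default sort puts 100 before 9 because it compares the text one character at a time, while -n orders the values as 9, 10, 100.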

uniq command

As the name suggests, the uniq command filters duplicate lines out of a file or the output of other commands. It is most useful when the same line appears multiple times in a file or data stream and you want to drop the duplicates while still learning which lines were duplicated. uniq only compares adjacent lines, which is why the input is passed through sort first. Use command:

$
echo -e "batman\nironman\nbatman" | sort | uniq

The input contains the word batman twice, but the output contains it only once because the duplicate entry has been dropped by the uniq command.

If you want to know which lines were duplicated, add the -c option, which prefixes each output line with the number of times it occurred. Run the below command:

$
echo -e "batman\nironman\nbatman" | sort | uniq -c

Here you can see that the word batman appeared on 2 lines. The duplicate line has been removed from the output, but the count still tells the user which word was duplicated. Use uniq --help to learn more about the options that can be used with this command.
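
The reason sort appears in front of uniq in these commands is that uniq only compares neighbouring lines. The sketch below reuses the same three words to show what happens without sort, and adds two common variations: -d to list only repeated lines, and a frequency table built with uniq -c. The trailing awk call is just my way of trimming the padding that uniq -c puts in front of the counts.

```shell
names='batman
ironman
batman'

# Unsorted input: the two "batman" lines are not adjacent, so both survive.
unsorted=$(printf '%s\n' "$names" | uniq)

# Sorting first puts duplicates next to each other.
sorted=$(printf '%s\n' "$names" | sort | uniq)

# -d prints only the lines that were repeated.
dupes=$(printf '%s\n' "$names" | sort | uniq -d)

# Frequency table, most frequent first; awk strips the leading padding.
counts=$(printf '%s\n' "$names" | sort | uniq -c | sort -rn | awk '{ print $1, $2 }')

printf '%s\n' "$unsorted" "---" "$sorted" "---" "$dupes" "---" "$counts"
```

The sort | uniq -c | sort -rn combination is a common way to rank lines by how often they occur.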

tr command

tr stands for translate. This command replaces or deletes characters in the text it reads from standard input. It does not take a file name as an argument, so file contents have to be piped or redirected into it. Use the below commands for better understanding:

$
cat heroes.txt|tr 'a' 'e'

Here I'm printing the content of our text file with the cat command and passing it as standard input to tr, which replaces every character a with e and writes the result to standard output.

tr can also be used to remove characters. Use the below command:

$
cat heroes.txt|tr -d 'a'

In the above command, I'm using tr to remove the character a from the standard input. In the output, a has been removed everywhere it occurred. Use tr --help to explore more options of this command.
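
tr works on sets of characters, not only on single ones. The sketch below uses a short made-up string to show a range-to-range translation, the -s option (squeeze repeated characters) and deleting several characters at once. The variable names are illustrative.

```shell
text='Batman   and   Ironman'

upper=$(printf '%s\n' "$text" | tr 'a-z' 'A-Z')      # map every lowercase letter to uppercase
squeezed=$(printf '%s\n' "$text" | tr -s ' ')        # collapse runs of spaces into one
no_vowels=$(printf '%s\n' "$text" | tr -d 'aeiou')   # delete several characters at once

printf '%s\n' "$upper" "$squeezed" "$no_vowels"
```

Since tr reads standard input, tr 'a' 'e' < heroes.txt gives the same result as the cat pipeline shown earlier, without the extra cat process.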


Author: Harpreet Singh

Created: Tue 21 Nov 2023

Updated: 11 months, 2 weeks ago
