How To Play with Word and Character Counts in Linux
The wc (word count) command prints the newline, word, and byte counts for each file it is given. This article shows how to go further and analyze word and character frequencies from the Linux terminal.
To analyze a text file
Let's take the Samba configuration file smb.conf as the test file.
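As a quick illustration of the three counts, here is a minimal sketch that runs wc on a small temporary file. The file and its contents are made up for the demonstration and are not part of the article's test setup.

```shell
# Hypothetical two-line sample file, created only for this demonstration.
sample=$(mktemp)
printf 'hello world\nlinux help\n' > "$sample"

# Reading from stdin keeps the file name out of the output;
# tr strips the padding that some wc implementations add.
lines=$(wc -l < "$sample" | tr -d ' ')   # newline count
words=$(wc -w < "$sample" | tr -d ' ')   # word count
bytes=$(wc -c < "$sample" | tr -d ' ')   # byte count

echo "lines=$lines words=$words bytes=$bytes"
rm -f "$sample"
```

For this sample the script prints lines=2 words=4 bytes=23.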
[root@linuxhelp ~]# cd /etc/samba/
[root@linuxhelp samba]# ls
lmhosts smb.conf
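To view the most frequent words in the smb.conf file along with their counts, run the following command. It puts each word on its own line, converts everything to lowercase, removes punctuation, drops entries containing anything other than letters, and then counts and ranks the results.
[root@linuxhelp samba]# cat smb.conf | tr ' ' '\012' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | grep -v '[^a-z]' | sort | uniq -c | sort -rn | head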
363
86 the
66 to
30 a
22 samba
21 on
21 for
20 yes
20 is
18 this
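The same idea can be wrapped in a small shell function so it works on any input. This is only a sketch: the name word_freq is hypothetical, and it splits on any whitespace and discards empty entries, which differs slightly from the command shown above.

```shell
# word_freq is a hypothetical helper name, not a standard command.
# It reads text on stdin and prints "count word" pairs, most frequent first.
word_freq() {
    tr -s '[:space:]' '\n' |      # one word per line
    tr '[:upper:]' '[:lower:]' |  # fold case so "The" and "the" match
    tr -d '[:punct:]' |           # drop punctuation
    grep -v '^$' |                # discard lines left empty
    sort | uniq -c | sort -rn
}

# Made-up sample text; "the" occurs three times.
out=$(printf 'The cat and the dog.\nThe end\n' | word_freq | head -n 1)
echo "$out"
```

To apply it to the test file, one would run word_freq < smb.conf | head.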
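The same kind of analysis works on any text file. For example, with the manual page of the man command saved in a text file named man.txt, the following command lists the 20 most frequent characters in it.
$ fold -w1 < man.txt | tr '[:lower:]' '[:upper:]' | sort | tr -d '[:punct:]' | uniq -c | sort -rn | head -20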
The fold command can break a word into individual characters, one per line.
[root@linuxhelp samba]# echo 'linuxhelp' | fold -w1
l
i
n
u
x
h
e
l
p
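The -w1 option sets the line width to one character, so every character ends up on its own line.
To count how often each character occurs and list the most frequent ones first, use the following command.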
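To see how the width value affects the result, here is a small sketch that folds the same word at four characters per line instead of one. The variable name chunks is just for illustration.

```shell
# Fold the word into pieces of at most 4 characters.
chunks=$(printf 'linuxhelp\n' | fold -w4)
printf '%s\n' "$chunks"
```

This prints linu, xhel, and p on three separate lines.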
[root@linuxhelp samba]# fold -w1 < smb.conf | sort | uniq -c | sort -rn | head
1636
887 e
682 o
663 t
646 s
615 a
531 -
523 i
519 r
496 n
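The pipeline is easier to follow on a tiny input. The sketch below counts the characters of a made-up word; the awk step is an addition that only normalizes the padded output of uniq -c into count:character pairs.

```shell
# Count each character of a sample word, most frequent first.
freq=$(printf 'banana\n' | fold -w1 | sort | uniq -c | sort -rn |
       awk '{print $1 ":" $2}')   # strip the padding uniq -c adds
printf '%s\n' "$freq"
```

The result is 3:a, 2:n, and 1:b, one pair per line.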
To count uppercase and lowercase letters together, convert everything to uppercase with tr before sorting, so that identical letters are adjacent when uniq counts them.
[root@linuxhelp samba]# fold -w1 < smb.conf | tr '[:lower:]' '[:upper:]' | sort | uniq -c | sort -rn | head -20
1636
903 E
714 S
702 O
699 T
620 A
545 N
539 I
533 R
531 -
386 L
285 M
276 D
260 H
259 C
238 U
234 P
224 =
211 B
210 #
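The position of tr relative to sort can matter, because uniq only merges adjacent identical lines. The sketch below uses a made-up five-letter string and forces the C locale on sort to make the effect reproducible; in other locales the two orderings may happen to give the same result.

```shell
text='aAbBa'

# sort first, then tr: in the C locale uppercase sorts before lowercase,
# so after conversion equal letters are no longer adjacent.
split=$(printf '%s\n' "$text" | fold -w1 | LC_ALL=C sort |
        tr '[:lower:]' '[:upper:]' | uniq -c | wc -l | tr -d ' ')

# tr first, then sort: equal letters are adjacent and merge correctly.
merged=$(printf '%s\n' "$text" | fold -w1 | tr '[:lower:]' '[:upper:]' |
         LC_ALL=C sort | uniq -c | wc -l | tr -d ' ')

echo "groups with sort first: $split, with tr first: $merged"
```

Here the first ordering reports 4 groups while the second reports the expected 2 (A and B).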
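To strip out punctuation, add tr -d '[:punct:]' to the pipeline. The counts shown without a visible character in the output are whitespace, including the empty lines that remain where punctuation was deleted.
[root@linuxhelp samba]# fold -w1 < smb.conf | tr '[:lower:]' '[:upper:]' | sort | tr -d '[:punct:]' | uniq -c | sort -rn | head -20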
1636
1221
903 E
714 S
702 O
699 T
620 A
545 N
539 I
533 R
386 L
285 M
276 D
261
260 H
259 C
238 U
234 P
211 B
140 W
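If the entries without a visible character are not wanted, one possible variant deletes punctuation before sorting and filters out the empty lines that are left. This is a sketch on a made-up string, not the command used for the output above; the awk step only tidies the uniq -c padding.

```shell
# Delete punctuation first, drop the resulting empty lines, then count.
clean=$(printf 'a-b=a#\n' | fold -w1 | tr -d '[:punct:]' | grep -v '^$' |
        sort | uniq -c | sort -rn | awk '{print $1 ":" $2}')
printf '%s\n' "$clean"
```

Only the letters remain in the result: 2:a followed by 1:b.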
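The same approach also works on whole lines instead of single words. The output below corresponds to a pipeline that does not split on spaces: it converts each line to lowercase, removes punctuation and digits, counts identical lines, sorts the counts in ascending order, and keeps only entries at least 18 characters long.
[root@linuxhelp samba]# cat smb.conf | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | tr -d '[0-9]' | sort | uniq -c | sort -n | grep -E '..................' | head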
1 add group script usrsbingroupadd g
1 add machine script usrsbinuseradd n c workstation u m d nohome s binfalse u
1 add user script usrsbinuseradd u n g users
1 and groupadd family of binaries run the following command as the root user to
1 a pershare basis
1 apply the correct selinux labels to these files
1 a publicly accessible directory that is read only except for users in the
1 argument list can include mypdcname mybdcname and mynextbdcname
1 boolean on
1 browser control options
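The run of 18 dots in the grep pattern matches any 18 characters, so only sufficiently long entries pass. An equivalent and arguably more readable form uses an interval expression. The sketch below applies it to two made-up lines.

```shell
# '.{18,}' means "at least 18 of any character", same as 18 literal dots.
long=$(printf 'short line\nthis line is long enough\n' | grep -E '.{18,}')
printf '%s\n' "$long"
```

Only the second line, which is 24 characters long, is printed.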
Q
What is the use of word count command in Linux?
A
The wc (word count) command prints the newline, word, and byte counts for each file.
Q
How to display the last 100 lines of a file?
A
Use the tail command with the -n option, for example "tail -n 100 /path/to/file".
Q
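How to find the most frequently used words in a file in Linux?
A
Put each word on its own line with tr, then pipe the result through "sort | uniq -c | sort -rn | head" to list the most frequent words with their counts. Pressing Ctrl+R only searches the shell's command history; it does not count words.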
Q
How to set the line width when splitting text with fold?
A
Use the "-w" option of fold, for example "fold -w1" for one character per line.
Q
How to append one file to another in Linux?
A
Use the ">>" redirection operator with the cat command, for example "cat file1 >> file2", which adds the contents of file1 to the end of file2.
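A minimal sketch of appending with ">>", using two temporary files whose names and contents are made up for the demonstration:

```shell
# Two hypothetical temporary files for the demonstration.
src=$(mktemp)
dst=$(mktemp)
printf 'first\n' > "$dst"    # ">" overwrites the target
printf 'second\n' > "$src"

cat "$src" >> "$dst"         # ">>" appends to the end of the target
result=$(cat "$dst")
printf '%s\n' "$result"
rm -f "$src" "$dst"
```

The target file ends up containing first followed by second, showing that the existing content is preserved.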