How To Play with Word and Character Counts in Linux

The wc (word count) command prints the newline, word, and byte counts for a file. This article shows how to go further and analyze word and character frequencies in the Linux terminal with tr, fold, sort, and uniq.
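As a quick illustration of what wc reports, the sketch below builds a small throwaway file (its name and contents are made up for this example) and reads the line, word, and byte counts with the -l, -w, and -c options.

```shell
# Create a throwaway sample file (hypothetical content, not taken from smb.conf).
sample=$(mktemp)
printf 'hello world\nsecond line here\n' > "$sample"

# Reading from stdin keeps the file name out of wc's output.
lines=$(wc -l < "$sample")   # newline count
words=$(wc -w < "$sample")   # word count
bytes=$(wc -c < "$sample")   # byte count

echo "lines=$lines words=$words bytes=$bytes"
rm -f "$sample"
```

Running wc with no options prints all three numbers followed by the file name.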

To analyze a text file

Let's take the Samba configuration file smb.conf as the test file.

[root@linuxhelp ~]# cd /etc/samba/
[root@linuxhelp samba]# ls
lmhosts  smb.conf

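To list the most frequent words in smb.conf together with how often each one occurs, run the following command. It puts each word on its own line, converts everything to lowercase, strips punctuation, drops entries that contain anything other than letters, and then counts the duplicates. The first entry in the output, which has no word next to it, is the count of empty lines.

[root@linuxhelp samba]# cat smb.conf | tr ' ' '\012' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | grep -v '[^a-z]' | sort | uniq -c | sort -rn | head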
    363 
     86 the
     66 to
     30 a
     22 samba
     21 on
     21 for
     20 yes
     20 is
     18 this
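The same pipeline can be wrapped in a small function so it works on any file. This is only a sketch: the name word_freq and its default of 10 results are invented here, and it differs from the command above in that it squeezes runs of whitespace and discards empty lines so that blanks are not counted.

```shell
# word_freq FILE [N]: print the N most frequent words in FILE (default 10).
# The function name and defaults are illustrative, not part of any standard tool.
word_freq() {
    file=$1
    top=${2:-10}
    tr -s '[:space:]' '\n' < "$file" |   # one word per line, squeezing runs of whitespace
        tr '[:upper:]' '[:lower:]' |     # treat "The" and "the" as the same word
        tr -d '[:punct:]' |              # drop punctuation
        grep -v '[^a-z]' |               # keep purely alphabetic entries
        grep . |                         # discard lines left empty
        sort | uniq -c | sort -rn | head -n "$top"
}

# Usage with a small made-up sample file:
sample=$(mktemp)
printf 'The cat and the dog.\nThe end\n' > "$sample"
word_freq "$sample" 3
rm -f "$sample"
```

On a real system it could be called as word_freq /etc/samba/smb.conf 10.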

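The same analysis works on any text file. The following command assumes a file man.txt that holds the manual page of the man command (for example, created beforehand with man man > man.txt) and prints its 20 most frequent characters, ignoring case.

$ fold -w1 < man.txt | tr '[:lower:]' '[:upper:]' | sort | tr -d '[:punct:]' | uniq -c | sort -rn | head -20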

The following command breaks a word into its individual characters, one per line.

[root@linuxhelp samba]# echo 'linuxhelp' | fold -w1
l
i
n
u
x
h
e
l
p

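The -w option sets the output width. With -w1 each line is one character wide, so every character is printed on its own line.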
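To make the effect of the width more concrete, here is a short sketch that folds the same word at two different widths and then counts the resulting lines with wc -l.

```shell
# Width 1: one character per line.
echo 'linuxhelp' | fold -w1

# Width 3: the same word in chunks of three characters.
echo 'linuxhelp' | fold -w3

# Counting the folded lines gives the number of characters in the word.
echo 'linuxhelp' | fold -w1 | wc -l
```

Because fold only inserts line breaks, the text itself is unchanged; only the line length differs.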

To count how often each character occurs and list the most frequent ones first, use the following command.

[root@linuxhelp samba]# fold -w1 < smb.conf | sort | uniq -c | sort -rn | head
   1636  
    887 e
    682 o
    663 t
    646 s
    615 a
    531 -
    523 i
    519 r
    496 n

To count uppercase and lowercase letters together, convert everything to uppercase with tr before sorting, so that identical letters end up next to each other for uniq.

[root@linuxhelp samba]# fold -w1 < smb.conf | tr '[:lower:]' '[:upper:]' | sort | uniq -c | sort -rn | head -20
   1636  
    903 E
    714 S
    702 O
    699 T
    620 A
    545 N
    539 I
    533 R
    531 -
    386 L
    285 M
    276 D
    260 H
    259 C
    238 U
    234 P
    224 =
    211 B
    210 #

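To strip out punctuation, add tr -d '[:punct:]' to the pipeline. Because the deletion happens after fold has already put each character on its own line, every removed punctuation character leaves an empty line behind, which is why blank entries appear in the output.

[root@linuxhelp samba]# fold -w1 < smb.conf | tr '[:lower:]' '[:upper:]' | sort | tr -d '[:punct:]' | uniq -c | sort -rn | head -20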
   1636  
   1221 
    903 E
    714 S
    702 O
    699 T
    620 A
    545 N
    539 I
    533 R
    386 L
    285 M
    276 D
    261 
    260 H
    259 C
    238 U
    234 P
    211 B
    140 W
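One way to avoid the blank entries is to delete punctuation and whitespace before the text is folded. The sketch below does that; the helper name char_freq and its default of 20 results are made up for this example and are not part of the original commands.

```shell
# char_freq FILE [N]: N most frequent letters/digits in FILE, case-insensitive.
# Hypothetical helper name; defaults to the top 20.
char_freq() {
    tr -d '[:punct:][:space:]' < "$1" |  # remove punctuation and whitespace first
        tr '[:lower:]' '[:upper:]' |     # count 'e' and 'E' together
        fold -w1 |                       # one character per line
        sort | uniq -c | sort -rn | head -n "${2:-20}"
}

# Usage with a small made-up sample file:
sample=$(mktemp)
printf 'Hello, World!\n' > "$sample"
char_freq "$sample" 2
rm -f "$sample"
```

Since nothing is deleted after the fold step, every line that reaches uniq holds exactly one character.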

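The same steps can be combined in one line that works on whole lines instead of single words. The following command converts the file to lowercase, strips punctuation and digits, counts identical lines, sorts them by count in ascending order, and keeps only the longer lines (each dot in the grep -E pattern matches one character).

[root@linuxhelp samba]# cat smb.conf | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | tr -d '[0-9]' | sort | uniq -c | sort -n | grep -E '..................' | head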
      1     add group script  usrsbingroupadd g
      1     add machine script  usrsbinuseradd n c workstation u m d nohome s binfalse u
      1     add user script  usrsbinuseradd u n g users
      1  and groupadd family of binaries run the following command as the root user to
      1  a pershare basis
      1  apply the correct selinux labels to these files
      1  a publicly accessible directory that is read only except for users in the
      1  argument list can include mypdcname mybdcname and mynextbdcname
      1  boolean on
      1    browser control options
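Typing a long row of dots is easy to get wrong. As an alternative sketch, grep -E also accepts an interval expression such as .{18,}, meaning "at least 18 characters". The helper name long_lines and the sample text below are invented for illustration.

```shell
# long_lines FILE MIN: print lines of FILE that are at least MIN characters long.
# Hypothetical helper; it builds the interval expression .{MIN,} for grep -E.
long_lines() {
    grep -E ".{$2,}" "$1"
}

# Usage with a small made-up sample file:
sample=$(mktemp)
printf 'short\na considerably longer line\n' > "$sample"
long_lines "$sample" 18
rm -f "$sample"
```

The minimum length is then a single number that can be changed without recounting dots.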
FAQ
Q
What is the use of the word count command in Linux?
A
The wc (word count) command prints the newline, word, and byte counts for a file.
Q
How to display and see the last 100 lines of a file?
A
Use the tail command with the -n option, for example "tail -n 100 /path/to/file".
Q
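How to find the most frequently used words in a file in Linux?
A
Put each word on its own line with tr ' ' '\012', then pipe the result through sort | uniq -c | sort -rn | head. Pressing Ctrl + R only searches the shell's command history; it does not count words.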
Q
How to set the width with fold?
A
Use the "-w" option of fold followed by the width, for example "fold -w1" for one character per line.
Q
How to append one file directly to another in Linux?
A
Use the ">>" redirection operator with the cat command, for example "cat file1 >> file2", so the contents of file1 are appended to the end of file2.
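The tail and ">>" answers above can be tried out safely on temporary files. The following is a minimal sketch; the file contents are generated with seq and echo purely for demonstration.

```shell
log=$(mktemp)      # throwaway file standing in for a real log
extra=$(mktemp)    # second throwaway file to append from

seq 1 150 > "$log"                       # 150 numbered lines
first=$(tail -n 100 "$log" | head -n 1)  # first of the last 100 lines
echo "last 100 lines start at: $first"

echo 'appended line' > "$extra"
cat "$extra" >> "$log"                   # append extra's contents to the end of log
last=$(tail -n 1 "$log")
echo "last line is now: $last"

rm -f "$log" "$extra"
```

Note that ">>" appends, while a single ">" would overwrite the target file.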