Toolbox IV

In this section, we represent our syntactic patterns graphically. The tool used for this is patron2graphe, which is run from the command line and expects certain arguments. Let's look at the command before explaining its details:

  1. ./patron2graphe.exe "utf-8" cultNa.txt motif-iso-8859-1.txt

The command takes three arguments. The first is the encoding, which must always be specified ("utf-8" in the command above; for Latin-1 files it would be "iso-8859-1"). The second is the file containing the sequences, and the third is the file containing the pattern.
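The argument order can be sketched in Python as follows. This is only an illustration: the helper names build_command and run_patron2graphe are hypothetical, and the sketch assumes the executable sits in the current directory, as in the command above.

```python
import subprocess

def build_command(encoding, sequences_file, pattern_file,
                  exe="./patron2graphe.exe"):
    """Return the argument list in the order described in the text:
    encoding first, then the sequences file, then the pattern file."""
    return [exe, encoding, sequences_file, pattern_file]

def run_patron2graphe(cmd):
    """Launch the tool (hypothetical wrapper; call it only where the
    executable is actually available)."""
    return subprocess.run(cmd, check=True)

cmd = build_command("utf-8", "cultNa.txt", "motif-iso-8859-1.txt")
print(" ".join(cmd))
```

Building the list separately from running it makes it easy to check the argument order before launching the programme.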

If no pattern is specified, the programme considers every word of the selected file, and the resulting display is unreadable. To avoid this, we define the specific pattern we are looking for in the file motif-iso-8859-1.txt, which contains MOTIF=\bsocial (note that if your documents are encoded in UTF-8, the pattern would be written MOTIF=social).
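As a rough sketch, the pattern file could be generated from Python rather than typed by hand. The helper write_pattern_file is hypothetical, and the sketch assumes the file holds a single MOTIF=... line followed by a newline; the text only shows the MOTIF= line itself. The demonstration writes into a temporary directory so that no existing file is overwritten.

```python
import tempfile
from pathlib import Path

def write_pattern_file(path, regex, encoding):
    """Write a one-line pattern file of the form MOTIF=<regex>.
    Assumption: the tool reads a single MOTIF= line."""
    line = f"MOTIF={regex}\n"
    Path(path).write_text(line, encoding=encoding)
    return line

# Demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "motif-iso-8859-1.txt"
    write_pattern_file(target, r"\bsocial", "iso-8859-1")
    print(target.read_text(encoding="iso-8859-1"), end="")
```

Passing the encoding explicitly keeps the pattern file consistent with the encoding given to the programme.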

For example, we define the following pattern by writing a regular expression: [Ss]ocial

The regular expression also matches both singular and plural forms. As a result, the graph is divided into two parts: the NOM ADJ pattern in the singular and in the plural. Both sides of the graph share the word "social", even though the sequences containing it differ.
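To see which word forms such an expression picks up, it can be tried on a small word list. The sketch below uses Python's re module, which may not behave exactly like the regular-expression engine inside patron2graphe, and the word list is purely illustrative.

```python
import re

# Illustrative word forms (not taken from the corpus).
words = ["social", "Social", "sociale", "sociales", "sociaux", "asocial"]

def matching(pattern, candidates):
    """Return the candidates in which the pattern is found anywhere."""
    rx = re.compile(pattern)
    return [w for w in candidates if rx.search(w)]

print(matching(r"[Ss]ocial", words))
print(matching(r"\bsocial", words))
```

With this list, [Ss]ocial finds every form except "sociaux", whose stem differs, while \bsocial additionally excludes "Social" (capital letter) and "asocial" (no word boundary before "social").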

Here is the display of the graph for the word "bio".

And here is the graph for the word "ministre".

We can search for many sequences in this manner by taking into account the files that we obtained in .