La vie multilingue du mot "fête" (The multilingual life of the word "fête")

WORK PLAN


We worked on the project in two phases. In the first phase, we created a blog describing each stage of our work and the problems we encountered, prepared the work environment, built our corpus, wrote our script and analysed the results.

BLOG

First, we created a blog in which we described our work on the project step by step. We also did some practice exercises to train ourselves. Our blog, پلوریتال, is hosted on Google Blogger.

WORK ENVIRONMENT

1. URLS : Location of our three URL files (english.txt, frech.txt and urdu.txt).

2. PROGRAMMES : Location of our script (script.fahad1.sh) and of the input.txt file, which contains the paths to the program's input files (./URLS) and to its output (./TABLEAUX/Tableaux.html).

3. PAGES-ASPIREES : Location of the web pages downloaded from our URLs.

4. DUMP-TEXT : Location of the text content extracted from the downloaded pages, saved as text files.

5. CONTEXTES : Folders containing the contexts in which our word appears.

6. INDEX : Folder containing the dictionary of each dump file.

7. TABLEAUX : Location of our HTML table.

8. FICHIERGLOBAUX : Location of the files that concatenate all our dump, context and index files.
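The folders listed above imply a processing chain: download each URL, dump the page to text, extract the contexts of the search word, build an index, then concatenate the results. Below is a minimal Bash sketch of such a chain. It is not the project's script.fahad1.sh. It assumes curl for downloading and lynx for text extraction. The function names, the two lines of context, the tokenization rule and file names such as urdu-3.txt are all hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the pipeline implied by the folders above; it is NOT
# the project's script.fahad1.sh. Assumes curl (download) and lynx (text dump).
set -euo pipefail

# Create the work-environment folders (no-op for folders that already exist).
setup_env() {
  local root="$1"
  mkdir -p "$root"/{URLS,PROGRAMMES,PAGES-ASPIREES,DUMP-TEXT,CONTEXTES,INDEX,TABLEAUX,FICHIERGLOBAUX}
}

# PAGES-ASPIREES: download one URL, following redirects; fails on HTTP errors.
fetch_page() {
  local url="$1" out="$2"
  curl -sSL --fail -o "$out" "$url"
}

# DUMP-TEXT: render a saved HTML page as plain text, without the link list.
dump_page() {
  lynx -dump -nolist "$1"
}

# CONTEXTES: print the lines containing the word, with 2 lines of context on
# each side. -i ignores case (for non-ASCII letters this depends on the locale).
# grep exits with 1 when nothing matches, which is not an error here.
extract_contexts() {
  local dump="$1" word="$2"
  grep -i -C 2 -- "$word" "$dump" || [ $? -eq 1 ]
}

# INDEX: frequency list of the tokens of a dump, most frequent first.
# Under LC_ALL=C, tr works byte by byte: it splits on ASCII spaces and
# punctuation and lowercases ASCII letters only, so accented letters and
# Urdu script pass through unchanged.
build_index() (
  export LC_ALL=C
  tr -s '[:space:][:punct:]' '\n' < "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed '/^$/d' \
    | sort | uniq -c | sort -k1,1nr -k2,2
)

# FICHIERGLOBAUX: concatenate all dump, context and index files.
concat_globals() {
  local root="$1" dir
  for dir in DUMP-TEXT CONTEXTES INDEX; do
    cat "$root/$dir"/*.txt > "$root/FICHIERGLOBAUX/$dir-global.txt"
  done
}

# Run the chain for one URL file (one URL per line) and one search word.
# Output files are named <language file>-<URL number>, e.g. urdu-3.html, urdu-3.txt.
process_url_file() {
  local root="$1" urlfile="$2" word="$3"
  local lang id url n=0
  lang=$(basename "$urlfile" .txt)
  while IFS= read -r url || [ -n "$url" ]; do
    if [ -z "$url" ]; then continue; fi
    n=$((n + 1))
    id="$lang-$n"
    if ! fetch_page "$url" "$root/PAGES-ASPIREES/$id.html"; then
      echo "download failed, skipping: $url" >&2
      continue
    fi
    dump_page "$root/PAGES-ASPIREES/$id.html" > "$root/DUMP-TEXT/$id.txt"
    extract_contexts "$root/DUMP-TEXT/$id.txt" "$word" > "$root/CONTEXTES/$id.txt"
    build_index "$root/DUMP-TEXT/$id.txt" > "$root/INDEX/$id.txt"
  done < "$urlfile"
}
```

For example, after `setup_env .`, a call such as `process_url_file . URLS/urdu.txt جشن` would fill the page, dump, context and index folders for the Urdu URLs. Running `concat_globals .` afterwards would build the global files.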

CORPUS

We built a corpus of 150 URLs, 50 for each language.
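The 50-per-language split can be checked by counting the non-empty lines of each URL file. Below is a minimal sketch, assuming one URL per line. The helpers count_urls and check_corpus are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical corpus check: one URL per line is assumed.
set -euo pipefail

# Print the number of non-blank lines in a URL file.
# grep -c prints 0 and exits with 1 when no line matches; that is not an error here.
count_urls() {
  grep -c -v '^[[:space:]]*$' "$1" || [ $? -eq 1 ]
}

# Print "<file><TAB><count>" for each file and a final total line;
# return 1 if any file does not contain exactly the expected number of URLs.
check_corpus() {
  local expected="$1"
  shift
  local f n total=0 status=0
  for f in "$@"; do
    n=$(count_urls "$f")
    total=$((total + n))
    printf '%s\t%s\n' "$(basename "$f")" "$n"
    if [ "$n" -ne "$expected" ]; then
      status=1
    fi
  done
  printf 'total\t%s\n' "$total"
  return "$status"
}
```

For instance, `check_corpus 50 URLS/english.txt URLS/frech.txt URLS/urdu.txt` would print one count per file plus the total. It returns a non-zero status if any file does not hold exactly 50 URLs.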

To gather these URLs, we searched for the keywords "celebration" (English), "fête" (French) and "جشن" (jashn, Urdu).

These words appeared mostly in the press and in some forums. This choice let us see on which kinds of occasions these words are used.