Data TABLES

Table 1

File: url-mot-chinois.txt
URL no. | URL | Fetched page | curl return code | Initial dump (non-UTF-8) | UTF-8 dump
1 | http://www.huiyi99.com/resource/samples.html | 1.html | 0 | - | 1-utf8.txt
2 | http://www.chinanews.com/shipin/2011/09-08/news44025.html | 2.html | 0 | 2.txt (euc-cn) | 2-utf8.txt
3 | http://roll.sohu.com/20110828/n317579384.shtml | 3.html | 0 | 3.txt (euc-cn) | 3-utf8.txt
4 | http://news.zbinfo.net/gj/1092792.shtml | 4.html | 0 | 4.txt (euc-cn) | 4-utf8.txt
5 | http://www.cqcb.com/cbnews/instant/20111016/19930.html | 5.html | 0 | 5.txt (euc-cn) | 5-utf8.txt
6 | http://roll.sohu.com/20111015/n322244679.shtml | 6.html | 0 | 6.txt (euc-cn) | 6-utf8.txt
7 | http://www.hottx.net/history/hyl/ | 7.html | 0 | 7.txt (euc-cn) | 7-utf8.txt
8 | http://news.sina.com.cn/c/2011-10-25/183023360769.shtml | 8.html | 0 | 8.txt (euc-cn) | 8-utf8.txt
9 | http://roll.sohu.com/20111031/n324050821.shtml | 9.html | 0 | 9.txt (euc-cn) | 9-utf8.txt
10 | http://news.163.com/11/1030/09/7HJSIDNK00014JB6.html | 10.html | 0 | 10.txt (euc-cn) | 10-utf8.txt
11 | http://www.17u.com/blog/article/606622.html | 11.html | 0 | - | 11-utf8.txt
12 | http://www.997788.com/member/detail.asp?daihao=%BB%D8%D2%E4%C0%FA%CA%B7 | 12.html | 0 | 12.txt (euc-cn) | 12-utf8.txt
13 | http://news.sohu.com/20111011/n321743397.shtml | 13.html | 0 | 13.txt (euc-cn) | 13-utf8.txt
14 | http://trans.wenweipo.com/gb/paper.wenweipo.com/2011/06/17/ED1106170002.htm | 14.html | 0 | 14.txt (euc-cn) | 14-utf8.txt
15 | http://mil.cnwest.com/content/2010-11/24/content_3793522.htm | 15.html | 0 | - | 15-utf8.txt
16 | http://ent.cn.yahoo.com/ypen/20110303/238382.html | 16.html | 0 | - | 16-utf8.txt
17 | http://d.lotour.com/henansbwg/20110318/589294.shtml | 17.html | 0 | 17.txt (euc-cn) | 17-utf8.txt
18 | http://city.ifeng.com/cshm/20110615/104269.shtml | 18.html | 0 | - | 18-utf8.txt
19 | http://www.olympic.cn/news/olympic_comm/2010-09-29/2078547.html | 19.html | 0 | 19.txt (euc-cn) | 19-utf8.txt
20 | http://roll.sohu.com/20111031/n324056806.shtml | 20.html | 0 | 20.txt (euc-cn) | 20-utf8.txt
21 | http://roll.sohu.com/20111031/n324042273.shtml | 21.html | 0 | 21.txt (euc-cn) | 21-utf8.txt
22 | http://news.hexun.com/2011-10-25/134534592.html | 22.html | 0 | 22.txt (euc-cn) | 22-utf8.txt
23 | http://www.literature.org.cn/Article.aspx?id=45276 | 23.html | 0 | 23.txt (euc-cn) | 23-utf8.txt
24 | http://www.duwenzhang.com/huati/huiyi/index1.html | 24.html | 0 | 24.txt (euc-cn) | 24-utf8.txt
25 | http://blog.zol.com.cn/3156/article_3155756.html | 25.html | 0 | 25.txt (euc-cn) | 25-utf8.txt
26 | http://tt.mop.com/read_9284578_1_0.html | 26.html | 0 | 26.txt (euc-cn) | 26-utf8.txt
27 | http://www.523333.com/xiaoyuan/10470.html | 27.html | 0 | 27.txt (euc-cn) | 27-utf8.txt
28 | http://www.china5080.com/memory/134388.html | 28.html | 0 | - | 28-utf8.txt
29 | http://bbs.zjol.com.cn/viewthread.php?tid=4741221 | 29.html | 0 | 29.txt (ISO-8859-15) | 29-utf8.txt
30 | http://www.jiangshanren.com/thread-32346-1-1.html | 30.html | 0 | 30.txt (euc-cn) | 30-utf8.txt
31 | http://ent.ynet.com/3.1/1110/31/6406148.html | 31.html | 0 | 31.txt (euc-cn) | 31-utf8.txt
32 | http://yule.sohu.com/20111026/n323476861.shtml | 32.html | 0 | 32.txt (euc-cn) | 32-utf8.txt
33 | http://ent.163.com/11/1028/20/7HFRC5UR00032DGD.html | 33.html | 0 | 33.txt (euc-cn) | 33-utf8.txt
34 | http://joy.online.sh.cn/joys/gb/content/2011-10/27/content_4914524.htm | 34.html | 0 | - | 34-utf8.txt
35 | http://www.morningpost.com.cn/wenti/wyxw/2011-10-18/225048.shtml | 35.html | 0 | - | 35-utf8.txt
36 | http://movie.runsky.com/RKnews/htmlnews/20111023/27037011211.0.html | 36.html | 0 | 36.txt (euc-cn) | 36-utf8.txt
37 | http://www.cnbeta.com/articles/157555.htm | 37.html | 0 | 37.txt (euc-cn) | 37-utf8.txt
38 | http://www.chinanews.com/cul/2011/08-13/3255744.shtml | 38.html | 0 | 38.txt (euc-cn) | 38-utf8.txt
39 | http://gb.cri.cn/27564/2011/08/31/1326s3355743.htm | 39.html | 0 | 39.txt (euc-cn) | 39-utf8.txt
40 | http://ent.oeeee.com/a/20110828/486242.html | 40.html | 0 | 40.txt (euc-cn) | 40-utf8.txt
41 | http://pc.pcgames.com.cn/pcnews/1110/2350197.html | 41.html | 0 | 41.txt (euc-cn) | 41-utf8.txt
42 | http://www.izhufu.com/meishi/1313466644.shtml | 42.html | 0 | 42.txt (euc-cn) | 42-utf8.txt
43 | http://news.66163.com/2011-09-04/548336.shtml | 43.html | 0 | 43.txt (euc-cn) | 43-utf8.txt
44 | http://edu.zjol.com.cn/05edu/system/2011/10/30/017955574.shtml | 44.html | 0 | 44.txt (euc-cn) | 44-utf8.txt
45 | http://lzcb.gansudaily.com.cn/system/2011/09/01/012157096.shtml | 45.html | 0 | 45.txt (euc-cn) | 45-utf8.txt
46 | http://tv.sohu.com/20110815/n316332228.shtml | 46.html | 0 | 46.txt (euc-cn) | 46-utf8.txt
47 | http://www.ce.cn/xwzx/shgj/gdxw/201109/28/t20110928_22728355.shtml | 47.html | 0 | 47.txt (euc-cn) | 47-utf8.txt
48 | http://society.yunnan.cn/html/2011-08/14/content_1773709.htm | 48.html | 0 | - | 48-utf8.txt
49 | http://www.chuhoi.com/news/hotspot/2011-09-15/13004.html | 49.html | 0 | 49.txt (euc-cn) | 49-utf8.txt
50 | http://jx.zjol.com.cn/05jx/system/2011/05/24/017545472.shtml | 50.html | 0 | 50.txt (euc-cn) | 50-utf8.txt


Table 2

File: url-mot-français.txt
URL no. | URL | Fetched page | curl return code | Initial dump (non-UTF-8) | UTF-8 dump
1 | http://www.desordre.net/memory/perec/index.htm | 1.html | 0 | 1.txt (iso-8859-1) | 1-utf8.txt
2 | http://lejardin.forumactif.com/t2773-vous-en-souvenez-vous | 2.html | 0 | 2.txt (windows-1252) | 2-utf8.txt
3 | http://forum.doctissimo.fr/psychologie/memoire/souvenez-votre-enfance-sujet_133695_1.htm | 3.html | 0 | - | 3-utf8.txt
4 | http://www.evene.fr/citations/mot.php?mot=souviens | 4.html | 0 | - | 4-utf8.txt
5 | http://www.evene.fr/citations/mot.php?mot=souviens&p=2 | 5.html | 0 | - | 5-utf8.txt
6 | http://www.evene.fr/citations/mot.php?mot=souviens&p=3 | 6.html | 0 | - | 6-utf8.txt
7 | http://www.dicocitations.com/citation.php?mot=souvenez_vous | 7.html | 0 | 7.txt (ISO-8859-1) | 7-utf8.txt
8 | http://www.dicocitations.com/citation/souvenez_vous/1/10.php | 8.html | 0 | 8.txt (ISO-8859-1) | 8-utf8.txt
9 | http://www.dicocitations.com/citation/souvenez_vous/1/20.php | 9.html | 0 | 9.txt (ISO-8859-1) | 9-utf8.txt
10 | http://www.e-sante.fr/perte-memoire/symptome-maladie/1524 | 10.html | 0 | - | 10-utf8.txt
11 | http://www.lepoint.fr/editos-du-point/patrick-besson/je-ne-me-souviens-pas-07-04-2011-1316617_71.php | 11.html | 0 | - | 11-utf8.txt
12 | http://francais.agonia.net/index.php/essay/1834014/Le_m%C3%A9canisme_de_la_m%C3%A9moire_chez_Proust | 12.html | 0 | 12.txt (windows-1252) | 12-utf8.txt
13 | http://remacle.org/bloodwolf/philosophes/Aristote/memoire.htm | 13.html | 0 | - | 13-utf8.txt
14 | http://www.pseudo-sciences.org/spip.php?article1049 | 14.html | 0 | - | 14-utf8.txt
15 | http://www.scienceshumaines.com/se-souvenir-pour-anticiper_fr_15055.html | 15.html | 0 | 15.txt (iso-8859-1) | 15-utf8.txt
16 | http://www.etudes-litteraires.com/dissertation-biographique.php | Page not fetched | 22 | Page not fetched | Page not fetched
17 | http://help.yahoo.com/l/fr/yahoo/mail/yahoomail/account/account-10.html | 17.html | 0 | - | 17-utf8.txt
18 | http://www.psychologies.com/Therapies/Psychanalyse/Reves/Articles-et-Dossiers/Je-ne-me-souviens-jamais-de-mes-reves | 18.html | 0 | - | 18-utf8.txt
19 | http://www.commentfaiton.com/fiche/voir/11484/comment-se-souvenir-de-ses-reves | 19.html | 0 | - | 19-utf8.txt
20 | http://www.atoute.org/n/forum/showthread.php?t=131800 | 20.html | 0 | 20.txt (ISO-8859-1) | 20-utf8.txt
21 | http://www.je-me-souviens.fr/page/4/ | 21.html | 0 | - | 21-utf8.txt
22 | http://www.pasteurweb.org/Etudes/Encouragements/SouviensToi.htm | 22.html | 0 | 22.txt (windows-1252) | 22-utf8.txt
23 | http://latabledemk30.canalblog.com/archives/2011/10/27/22423582.html | 23.html | 0 | - | 23-utf8.txt
24 | http://www.familles.com/v4/forums/forums-familiaux-papotages-autres-sujets-de-quoi-vous-souvenez-vous-sur-ce-forum-t826602.html | 24.html | 0 | 24.txt (iso-8859-1) | 24-utf8.txt
25 | http://www.web-libre.org/breves/reveil,17460.html | 25.html | 0 | 25.txt (iso-8859-1) | 25-utf8.txt
26 | http://politproductions.com/content/souvenez-vous-jmg-le-cl%C3%A9zio | 26.html | 0 | - | 26-utf8.txt
27 | http://arnaudh-s.blogspot.com/2008/07/je-me-souviens-encore.html | 27.html | 0 | - | 27-utf8.txt
28 | http://pititeshistoiressansconckence.blogspot.com/2006/09/je-me-souviens-de-toi.html | 28.html | 0 | - | 28-utf8.txt
29 | http://coeurs2soeurs.skyrock.com/ | 29.html | 0 | 29.txt (ISO-8859-15) | 29-utf8.txt
30 | http://prettywords.skyrock.com/ | 30.html | 0 | 30.txt (ISO-8859-15) | 30-utf8.txt
31 | http://xrapee-me.skyrock.com/ | 31.html | 0 | 31.txt (ISO-8859-15) | 31-utf8.txt
32 | http://www.cairn.info/article.php?ID_ARTICLE=DIO_201_0127 | 32.html | 0 | - | 32-utf8.txt
33 | http://www.cairn.info/revue-cahiers-de-psychologie-clinique-2001-2-page-81.htm | 33.html | 0 | - | 33-utf8.txt
34 | http://www.bienpublic.com/grand-dijon/2011/10/31/la-toussaint-perd-du-terrain | 34.html | 0 | Encoding not detected | Encoding not detected
35 | http://www.ledevoir.com/international/europe/334673/le-triomphe-de-merkel | 35.html | 0 | - | 35-utf8.txt
36 | http://mesetudesfrancaises.blogspot.com/2011/08/les-souvenirs.html | 36.html | 0 | - | 36-utf8.txt
37 | http://tvanouvelles.ca/lcn/infos/lemonde/archives/2011/10/20111026-165416.html | 37.html | 0 | 37.txt (iso-8859-1) | 37-utf8.txt
38 | http://www.femmeactuelle.fr/actu/dossiers-d-actualite/elles-se-souviennent-du-11-septembre-2001-00905 | 38.html | 0 | - | 38-utf8.txt
39 | http://www.charles-de-gaulle.org/pages/l-homme/dossiers-thematiques/1940-1944-la-seconde-guerre-mondiale/les-forces-francaises-libres/temoignages/fred-moore-les-engages-des-ffl-de-1940-se-souviennent.php | 39.html | 0 | - | 39-utf8.txt
40 | http://www.lest-eclair.fr/article/a-la-une/les-habitants-du-nogentais-se-souviennent | 40.html | 0 | - | 40-utf8.txt
41 | http://news.fr.msn.com/election-presidentielle-2012-et-vous/photo.aspx?cp-documentid=157386092 | 41.html | 0 | - | 41-utf8.txt
42 | http://www.tuesmarocain.com/tu-te-souviens-besoin-decodeur-2m/ | 42.html | 0 | - | 42-utf8.txt
43 | http://www.trombi.com/magazine/Jaime-aller-voir-les-professeurs-pour-voir-sils-se-souviennent-de-moi | 43.html | 0 | 43.txt (iso-8859-1) | 43-utf8.txt
44 | http://outre-vie.forumactif.com/t3325-vous-souvenez-vous-de-vies-anterieures-extra-terrestre | 44.html | 0 | 44.txt (windows-1252) | 44-utf8.txt
45 | http://elc95.blogspot.com/2011/08/vous-souvenez-vous-de-tang.html | 45.html | 0 | - | 45-utf8.txt
46 | http://lambercier.unblog.fr/brassees-de-souvenirs-la-honte-de-la-deculottee-publique/ | 46.html | 0 | - | 46-utf8.txt
47 | http://sante-medecine.commentcamarche.net/forum/affich-992843-vous-souvenez-vous-de-votre-premiere-rentree-scolaire | 47.html | 0 | 47.txt (iso-8859-1) | 47-utf8.txt
48 | http://www.tierslivre.net/spip/spip.php?article2316 | 48.html | 0 | - | 48-utf8.txt
49 | http://www.rue89.com/2010/08/03/la-sexomnie-inquietante-sexualite-somnambule-160846?page=1 | 49.html | 0 | - | 49-utf8.txt
50 | http://www.cine3mondes.com/vente-dvd/asie,3,oncle-boonmee-celui-qui-se-souvient-de-ses-vies-anterieures,255 | 50.html | 0 | 50.txt (iso-8859-1) | 50-utf8.txt
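
The two dump columns in the tables above pair a text dump in the page's original charset (euc-cn, iso-8859-1, windows-1252, ISO-8859-15) with a UTF-8 version of the same text. The tables do not say which tool performed that conversion, so the following is only a minimal Python sketch of the idea, not the method actually used here. The function name `to_utf8` and the choice to return `None` when the charset is missing or unusable (as in the "encoding not detected" row) are assumptions made for illustration.

```python
from typing import Optional


def to_utf8(raw: bytes, charset: Optional[str]) -> Optional[bytes]:
    """Re-encode raw bytes from the given charset to UTF-8.

    Hypothetical handling: returns None when the charset is missing,
    unknown to Python, or does not match the bytes.
    """
    if not charset:
        return None
    try:
        # Charset labels such as "euc-cn" or "ISO-8859-15" are resolved
        # through Python's built-in codec aliases.
        text = raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        return None
    return text.encode("utf-8")


if __name__ == "__main__":
    # Illustrative input only: a short string encoded as GB2312 (euc-cn).
    sample = "回忆".encode("gb2312")
    converted = to_utf8(sample, "euc-cn")
    print(converted.decode("utf-8") if converted else "encoding not detected")
```

Returning `None` instead of raising keeps one undetected or mislabeled charset from stopping a run over all fifty URLs; the caller can then record the failure in the corresponding table cell.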