Data TABLES

Table 1

File: url-mot-chinois.txt
| No. | URL | Downloaded page | curl return code | Initial dump (non-UTF-8) | UTF-8 dump | Contexts | Contexts (HTML) | Occurrences |
|---|---|---|---|---|---|---|---|---|
| 1 | http://www.huiyi99.com/resource/samples.html | 1.html | 0 | - | 1-utf8.txt | 1-utf8.txt | 1-utf8.html | 0 |
| 2 | http://www.chinanews.com/shipin/2011/09-08/news44025.html | 2.html | 0 | 2.txt (iso-8859-1) | 2-utf8.txt | 2-utf8.txt | 2-utf8.html | 0 |
| 3 | http://roll.sohu.com/20110828/n317579384.shtml | 3.html | 0 | 3.txt (iso-8859-1) | 3-utf8.txt | 3-utf8.txt | 3-utf8.html | 0 |
| 4 | http://news.zbinfo.net/gj/1092792.shtml | 4.html | 0 | 4.txt (gb2312) | 4-utf8.txt | 4-utf8.txt | 4-utf8.html | 0 |
| 5 | http://www.cqcb.com/cbnews/instant/20111016/19930.html | 5.html | 0 | 5.txt (iso-8859-1) | 5-utf8.txt | 5-utf8.txt | 5-utf8.html | 0 |
| 6 | http://roll.sohu.com/20111015/n322244679.shtml | 6.html | 0 | 6.txt (iso-8859-1) | 6-utf8.txt | 6-utf8.txt | 6-utf8.html | 0 |
| 7 | http://www.hottx.net/history/hyl/ | 7.html | 0 | 7.txt (iso-8859-1) | 7-utf8.txt | 7-utf8.txt | 7-utf8.html | 0 |
| 8 | http://news.sina.com.cn/c/2011-10-25/183023360769.shtml | 8.html | 0 | 8.txt (iso-8859-1) | 8-utf8.txt | 8-utf8.txt | 8-utf8.html | 0 |
| 9 | http://roll.sohu.com/20111031/n324050821.shtml | 9.html | 0 | 9.txt (iso-8859-1) | 9-utf8.txt | 9-utf8.txt | 9-utf8.html | 0 |
| 10 | http://news.163.com/11/1030/09/7HJSIDNK00014JB6.html | 10.html | 0 | 10.txt (iso-8859-1) | 10-utf8.txt | 10-utf8.txt | 10-utf8.html | 0 |
| 11 | http://www.17u.com/blog/article/606622.html | 11.html | 0 | - | 11-utf8.txt | 11-utf8.txt | 11-utf8.html | 0 |
| 12 | http://www.997788.com/member/detail.asp?daihao=%BB%D8%D2%E4%C0%FA%CA%B7 | 12.html | 0 | 12.txt (iso-8859-1) | 12-utf8.txt | 12-utf8.txt | 12-utf8.html | 0 |
| 13 | http://news.sohu.com/20111011/n321743397.shtml | 13.html | 0 | 13.txt (gbk) | 13-utf8.txt | 13-utf8.txt | 13-utf8.html | 0 |
| 14 | http://trans.wenweipo.com/gb/paper.wenweipo.com/2011/06/17/ED1106170002.htm | 14.html | 0 | 14.txt (iso-8859-1) | 14-utf8.txt | 14-utf8.txt | 14-utf8.html | 0 |
| 15 | http://mil.cnwest.com/content/2010-11/24/content_3793522.htm | 15.html | 0 | - | 15-utf8.txt | 15-utf8.txt | 15-utf8.html | 0 |
| 16 | http://ent.cn.yahoo.com/ypen/20110303/238382.html | 16.html | 0 | - | 16-utf8.txt | 16-utf8.txt | 16-utf8.html | 0 |
| 17 | http://d.lotour.com/henansbwg/20110318/589294.shtml | 17.html | 0 | 17.txt (iso-8859-1) | 17-utf8.txt | 17-utf8.txt | 17-utf8.html | 0 |
| 18 | http://city.ifeng.com/cshm/20110615/104269.shtml | 18.html | 0 | - | 18-utf8.txt | 18-utf8.txt | 18-utf8.html | 0 |
| 19 | http://www.olympic.cn/news/olympic_comm/2010-09-29/2078547.html | 19.html | 0 | 19.txt (iso-8859-1) | 19-utf8.txt | 19-utf8.txt | 19-utf8.html | 0 |
| 20 | http://roll.sohu.com/20111031/n324056806.shtml | 20.html | 0 | 20.txt (iso-8859-1) | 20-utf8.txt | 20-utf8.txt | 20-utf8.html | 0 |
| 21 | http://roll.sohu.com/20111031/n324042273.shtml | 21.html | 0 | 21.txt (iso-8859-1) | 21-utf8.txt | 21-utf8.txt | 21-utf8.html | 0 |
| 22 | http://news.hexun.com/2011-10-25/134534592.html | 22.html | 0 | 22.txt (iso-8859-1) | 22-utf8.txt | 22-utf8.txt | 22-utf8.html | 0 |
| 23 | http://www.literature.org.cn/Article.aspx?id=45276 | 23.html | 0 | Encoding not detected | Encoding not detected | - | - | - |
| 24 | http://www.duwenzhang.com/huati/huiyi/index1.html | 24.html | 0 | 24.txt (iso-8859-1) | 24-utf8.txt | 24-utf8.txt | 24-utf8.html | 0 |
| 25 | http://blog.zol.com.cn/3156/article_3155756.html | 25.html | 0 | 25.txt (iso-8859-1) | 25-utf8.txt | 25-utf8.txt | 25-utf8.html | 0 |
| 26 | http://tt.mop.com/read_9284578_1_0.html | 26.html | 0 | 26.txt (gbk) | 26-utf8.txt | 26-utf8.txt | 26-utf8.html | 0 |
| 27 | http://www.523333.com/xiaoyuan/10470.html | 27.html | 0 | 27.txt (iso-8859-1) | 27-utf8.txt | 27-utf8.txt | 27-utf8.html | 0 |
| 28 | http://www.china5080.com/memory/134388.html | 28.html | 0 | - | 28-utf8.txt | 28-utf8.txt | 28-utf8.html | 7 |
| 29 | http://bbs.zjol.com.cn/viewthread.php?tid=4741221 | 29.html | 0 | 29.txt (iso-8859-1) | 29-utf8.txt | 29-utf8.txt | 29-utf8.html | 2 |
| 30 | http://www.jiangshanren.com/thread-32346-1-1.html | 30.html | 0 | 30.txt (iso-8859-1) | 30-utf8.txt | 30-utf8.txt | 30-utf8.html | 0 |
| 31 | http://ent.ynet.com/3.1/1110/31/6406148.html | 31.html | 0 | 31.txt (gb2312) | 31-utf8.txt | 31-utf8.txt | 31-utf8.html | 0 |
| 32 | http://yule.sohu.com/20111026/n323476861.shtml | 32.html | 0 | 32.txt (gbk) | 32-utf8.txt | 32-utf8.txt | 32-utf8.html | 0 |
| 33 | http://ent.163.com/11/1028/20/7HFRC5UR00032DGD.html | 33.html | 0 | 33.txt (iso-8859-1) | 33-utf8.txt | 33-utf8.txt | 33-utf8.html | 0 |
| 34 | http://joy.online.sh.cn/joys/gb/content/2011-10/27/content_4914524.htm | 34.html | 0 | - | 34-utf8.txt | 34-utf8.txt | 34-utf8.html | 3 |
| 35 | http://www.morningpost.com.cn/wenti/wyxw/2011-10-18/225048.shtml | 35.html | 0 | - | 35-utf8.txt | 35-utf8.txt | 35-utf8.html | 1 |
| 36 | http://movie.runsky.com/RKnews/htmlnews/20111023/27037011211.0.html | 36.html | 0 | 36.txt (gb2312) | 36-utf8.txt | 36-utf8.txt | 36-utf8.html | 0 |
| 37 | http://www.cnbeta.com/articles/157555.htm | 37.html | 0 | 37.txt (iso-8859-1) | 37-utf8.txt | 37-utf8.txt | 37-utf8.html | 0 |
| 38 | http://www.chinanews.com/cul/2011/08-13/3255744.shtml | 38.html | 0 | 38.txt (iso-8859-1) | 38-utf8.txt | 38-utf8.txt | 38-utf8.html | 0 |
| 39 | http://gb.cri.cn/27564/2011/08/31/1326s3355743.htm | 39.html | 0 | 39.txt (gb2312) | 39-utf8.txt | 39-utf8.txt | 39-utf8.html | 0 |
| 40 | http://ent.oeeee.com/a/20110828/486242.html | 40.html | 0 | 40.txt (iso-8859-1) | 40-utf8.txt | 40-utf8.txt | 40-utf8.html | 0 |
| 41 | http://pc.pcgames.com.cn/pcnews/1110/2350197.html | 41.html | 0 | 41.txt (gb2312) | 41-utf8.txt | 41-utf8.txt | 41-utf8.html | 1 |
| 42 | http://www.izhufu.com/meishi/1313466644.shtml | 42.html | 0 | 42.txt (iso-8859-1) | 42-utf8.txt | 42-utf8.txt | 42-utf8.html | 0 |
| 43 | http://news.66163.com/2011-09-04/548336.shtml | 43.html | 0 | 43.txt (iso-8859-1) | 43-utf8.txt | 43-utf8.txt | 43-utf8.html | 0 |
| 44 | http://edu.zjol.com.cn/05edu/system/2011/10/30/017955574.shtml | 44.html | 0 | 44.txt (iso-8859-1) | 44-utf8.txt | 44-utf8.txt | 44-utf8.html | 0 |
| 45 | http://lzcb.gansudaily.com.cn/system/2011/09/01/012157096.shtml | 45.html | 0 | 45.txt (iso-8859-1) | 45-utf8.txt | 45-utf8.txt | 45-utf8.html | 0 |
| 46 | http://tv.sohu.com/20110815/n316332228.shtml | 46.html | 0 | 46.txt (iso-8859-1) | 46-utf8.txt | 46-utf8.txt | 46-utf8.html | 0 |
| 47 | http://www.ce.cn/xwzx/shgj/gdxw/201109/28/t20110928_22728355.shtml | 47.html | 0 | 47.txt (iso-8859-1) | 47-utf8.txt | 47-utf8.txt | 47-utf8.html | 0 |
| 48 | http://society.yunnan.cn/html/2011-08/14/content_1773709.htm | 48.html | 0 | - | 48-utf8.txt | 48-utf8.txt | 48-utf8.html | 0 |
| 49 | http://www.chuhoi.com/news/hotspot/2011-09-15/13004.html | 49.html | 0 | 49.txt (iso-8859-1) | 49-utf8.txt | 49-utf8.txt | 49-utf8.html | 0 |
| 50 | http://jx.zjol.com.cn/05jx/system/2011/05/24/017545472.shtml | 50.html | 0 | 50.txt (iso-8859-1) | 50-utf8.txt | 50-utf8.txt | 50-utf8.html | 0 |
Global DUMP file: 49 file(s)
Global CONTEXTS file: 49 file(s)
 


Table 2

File: url-mot-français.txt
| No. | URL | Downloaded page | curl return code | Initial dump (non-UTF-8) | UTF-8 dump | Contexts | Contexts (HTML) | Occurrences |
|---|---|---|---|---|---|---|---|---|
| 1 | http://www.desordre.net/memory/perec/index.htm | 1.html | 0 | 1.txt (us-ascii) | 1-utf8.txt | 1-utf8.txt | 1-utf8.html | 54 |
| 2 | http://lejardin.forumactif.com/t2773-vous-en-souvenez-vous | 2.html | 0 | 2.txt (iso-8859-1) | 2-utf8.txt | 2-utf8.txt | 2-utf8.html | 2 |
| 3 | http://forum.doctissimo.fr/psychologie/memoire/souvenez-votre-enfance-sujet_133695_1.htm | 3.html | 0 | - | 3-utf8.txt | 3-utf8.txt | 3-utf8.html | 21 |
| 4 | http://www.evene.fr/citations/mot.php?mot=souviens | 4.html | 0 | - | 4-utf8.txt | 4-utf8.txt | 4-utf8.html | 24 |
| 5 | http://www.evene.fr/citations/mot.php?mot=souviens&p=2 | 5.html | 0 | - | 5-utf8.txt | 5-utf8.txt | 5-utf8.html | 22 |
| 6 | http://www.evene.fr/citations/mot.php?mot=souviens&p=3 | 6.html | 0 | - | 6-utf8.txt | 6-utf8.txt | 6-utf8.html | 6 |
| 7 | http://www.dicocitations.com/citation.php?mot=souvenez_vous | 7.html | 0 | 7.txt (iso-8859-1) | 7-utf8.txt | 7-utf8.txt | 7-utf8.html | 0 |
| 8 | http://www.dicocitations.com/citation/souvenez_vous/1/10.php | 8.html | 0 | 8.txt (iso-8859-1) | 8-utf8.txt | 8-utf8.txt | 8-utf8.html | 0 |
| 9 | http://www.dicocitations.com/citation/souvenez_vous/1/20.php | 9.html | 0 | 9.txt (iso-8859-1) | 9-utf8.txt | 9-utf8.txt | 9-utf8.html | 0 |
| 10 | http://www.e-sante.fr/perte-memoire/symptome-maladie/1524 | 10.html | 0 | - | 10-utf8.txt | 10-utf8.txt | 10-utf8.html | 0 |
| 11 | http://www.lepoint.fr/editos-du-point/patrick-besson/je-ne-me-souviens-pas-07-04-2011-1316617_71.php | 11.html | 0 | - | 11-utf8.txt | 11-utf8.txt | 11-utf8.html | 38 |
| 12 | http://francais.agonia.net/index.php/essay/1834014/Le_m%C3%A9canisme_de_la_m%C3%A9moire_chez_Proust | 12.html | 0 | 12.txt (windows-1252) | 12-utf8.txt | 12-utf8.txt | 12-utf8.html | 0 |
| 13 | http://remacle.org/bloodwolf/philosophes/Aristote/memoire.htm | 13.html | 0 | - | 13-utf8.txt | 13-utf8.txt | 13-utf8.html | 13 |
| 14 | http://www.pseudo-sciences.org/spip.php?article1049 | 14.html | 0 | - | 14-utf8.txt | 14-utf8.txt | 14-utf8.html | 0 |
| 15 | http://www.scienceshumaines.com/se-souvenir-pour-anticiper_fr_15055.html | 15.html | 0 | 15.txt (iso-8859-1) | 15-utf8.txt | 15-utf8.txt | 15-utf8.html | 0 |
| 16 | http://www.etudes-litteraires.com/dissertation-biographique.php | - | 0 BAD | - | - | - | - | - |
| 17 | http://help.yahoo.com/l/fr/yahoo/mail/yahoomail/account/account-10.html | 17.html | 0 | - | 17-utf8.txt | 17-utf8.txt | 17-utf8.html | 2 |
| 18 | http://www.psychologies.com/Therapies/Psychanalyse/Reves/Articles-et-Dossiers/Je-ne-me-souviens-jamais-de-mes-reves | 18.html | 0 | - | 18-utf8.txt | 18-utf8.txt | 18-utf8.html | 4 |
| 19 | http://www.commentfaiton.com/fiche/voir/11484/comment-se-souvenir-de-ses-reves | 19.html | 0 | - | 19-utf8.txt | 19-utf8.txt | 19-utf8.html | 0 |
| 20 | http://www.atoute.org/n/forum/showthread.php?t=131800 | 20.html | 0 | 20.txt (iso-8859-1) | 20-utf8.txt | 20-utf8.txt | 20-utf8.html | 2 |
| 21 | http://www.je-me-souviens.fr/page/4/ | 21.html | 0 | - | 21-utf8.txt | 21-utf8.txt | 21-utf8.html | 18 |
| 22 | http://www.pasteurweb.org/Etudes/Encouragements/SouviensToi.htm | 22.html | 0 | 22.txt (windows-1252) | 22-utf8.txt | 22-utf8.txt | 22-utf8.html | 10 |
| 23 | http://latabledemk30.canalblog.com/archives/2011/10/27/22423582.html | 23.html | 0 | - | 23-utf8.txt | 23-utf8.txt | 23-utf8.html | 0 |
| 24 | http://www.familles.com/v4/forums/forums-familiaux-papotages-autres-sujets-de-quoi-vous-souvenez-vous-sur-ce-forum-t826602.html | 24.html | 0 | 24.txt (iso-8859-1) | 24-utf8.txt | 24-utf8.txt | 24-utf8.html | 10 |
| 25 | http://www.web-libre.org/breves/reveil,17460.html | 25.html | 0 | 25.txt (iso-8859-1) | 25-utf8.txt | 25-utf8.txt | 25-utf8.html | 2 |
| 26 | http://politproductions.com/content/souvenez-vous-jmg-le-cl%C3%A9zio | 26.html | 0 | - | 26-utf8.txt | 26-utf8.txt | 26-utf8.html | 0 |
| 27 | http://arnaudh-s.blogspot.com/2008/07/je-me-souviens-encore.html | 27.html | 0 | - | 27-utf8.txt | 27-utf8.txt | 27-utf8.html | 5 |
| 28 | http://pititeshistoiressansconckence.blogspot.com/2006/09/je-me-souviens-de-toi.html | 28.html | 0 | - | 28-utf8.txt | 28-utf8.txt | 28-utf8.html | 7 |
| 29 | http://coeurs2soeurs.skyrock.com/ | 29.html | 0 | 29.txt (iso-8859-1) | 29-utf8.txt | 29-utf8.txt | 29-utf8.html | 2 |
| 30 | http://prettywords.skyrock.com/ | 30.html | 0 | 30.txt (iso-8859-1) | 30-utf8.txt | 30-utf8.txt | 30-utf8.html | 6 |
| 31 | http://xrapee-me.skyrock.com/ | 31.html | 0 | 31.txt (iso-8859-1) | 31-utf8.txt | 31-utf8.txt | 31-utf8.html | 14 |
| 32 | http://www.cairn.info/article.php?ID_ARTICLE=DIO_201_0127 | 32.html | 0 | - | 32-utf8.txt | 32-utf8.txt | 32-utf8.html | 3 |
| 33 | http://www.cairn.info/revue-cahiers-de-psychologie-clinique-2001-2-page-81.htm | 33.html | 0 | - | 33-utf8.txt | 33-utf8.txt | 33-utf8.html | 2 |
| 34 | http://www.bienpublic.com/grand-dijon/2011/10/31/la-toussaint-perd-du-terrain | 34.html | 0 | 34.txt (us-ascii) | 34-utf8.txt | 34-utf8.txt | 34-utf8.html | 1 |
| 35 | http://www.ledevoir.com/international/europe/334673/le-triomphe-de-merkel | 35.html | 0 | - | 35-utf8.txt | 35-utf8.txt | 35-utf8.html | 1 |
| 36 | http://mesetudesfrancaises.blogspot.com/2011/08/les-souvenirs.html | 36.html | 0 | - | 36-utf8.txt | 36-utf8.txt | 36-utf8.html | 1 |
| 37 | http://tvanouvelles.ca/lcn/infos/lemonde/archives/2011/10/20111026-165416.html | 37.html | 0 | 37.txt (iso-8859-1) | 37-utf8.txt | 37-utf8.txt | 37-utf8.html | 0 |
| 38 | http://www.femmeactuelle.fr/actu/dossiers-d-actualite/elles-se-souviennent-du-11-septembre-2001-00905 | 38.html | 0 | - | 38-utf8.txt | 38-utf8.txt | 38-utf8.html | 6 |
| 39 | http://www.charles-de-gaulle.org/pages/l-homme/dossiers-thematiques/1940-1944-la-seconde-guerre-mondiale/les-forces-francaises-libres/temoignages/fred-moore-les-engages-des-ffl-de-1940-se-souviennent.php | 39.html | 0 | - | 39-utf8.txt | 39-utf8.txt | 39-utf8.html | 7 |
| 40 | http://www.lest-eclair.fr/article/a-la-une/les-habitants-du-nogentais-se-souviennent | 40.html | 0 | - | 40-utf8.txt | 40-utf8.txt | 40-utf8.html | 3 |
| 41 | http://news.fr.msn.com/election-presidentielle-2012-et-vous/photo.aspx?cp-documentid=157386092 | 41.html | 0 | - | 41-utf8.txt | 41-utf8.txt | 41-utf8.html | 3 |
| 42 | http://www.tuesmarocain.com/tu-te-souviens-besoin-decodeur-2m/ | 42.html | 0 | - | 42-utf8.txt | 42-utf8.txt | 42-utf8.html | 6 |
| 43 | http://www.trombi.com/magazine/Jaime-aller-voir-les-professeurs-pour-voir-sils-se-souviennent-de-moi | 43.html | 0 | 43.txt (iso-8859-1) | 43-utf8.txt | 43-utf8.txt | 43-utf8.html | 5 |
| 44 | http://outre-vie.forumactif.com/t3325-vous-souvenez-vous-de-vies-anterieures-extra-terrestre | 44.html | 0 | 44.txt (iso-8859-1) | 44-utf8.txt | 44-utf8.txt | 44-utf8.html | 4 |
| 45 | http://elc95.blogspot.com/2011/08/vous-souvenez-vous-de-tang.html | 45.html | 0 | - | 45-utf8.txt | 45-utf8.txt | 45-utf8.html | 3 |
| 46 | http://lambercier.unblog.fr/brassees-de-souvenirs-la-honte-de-la-deculottee-publique/ | 46.html | 0 | - | 46-utf8.txt | 46-utf8.txt | 46-utf8.html | 3 |
| 47 | http://sante-medecine.commentcamarche.net/forum/affich-992843-vous-souvenez-vous-de-votre-premiere-rentree-scolaire | 47.html | 0 | 47.txt (iso-8859-1) | 47-utf8.txt | 47-utf8.txt | 47-utf8.html | 2 |
| 48 | http://www.tierslivre.net/spip/spip.php?article2316 | 48.html | 0 | - | 48-utf8.txt | 48-utf8.txt | 48-utf8.html | 1 |
| 49 | http://www.rue89.com/2010/08/03/la-sexomnie-inquietante-sexualite-somnambule-160846?page=1 | 49.html | 0 | - | 49-utf8.txt | 49-utf8.txt | 49-utf8.html | 2 |
| 50 | http://www.cine3mondes.com/vente-dvd/asie,3,oncle-boonmee-celui-qui-se-souvient-de-ses-vies-anterieures,255 | 50.html | 0 | 50.txt (iso-8859-1) | 50-utf8.txt | 50-utf8.txt | 50-utf8.html | 5 |
Global DUMP file: 49 file(s)
Global CONTEXTS file: 49 file(s)