The Effectiveness of Automatically Structured Queries in Digital Libraries

Marcos André Gonçalves*, Edward A. Fox*, Aaron Krowne†, Pável Calado‡, Alberto H. F. Laender‡, Altigran S. da Silva§, Berthier Ribeiro-Neto‡¶

* Dept. of Computer Science, Virginia Tech, Blacksburg, VA, USA ({mgoncalv,fox}@vt.edu)
† Library Systems, Emory University General Libraries, Atlanta, GA, USA (akrowne@emory.edu)
‡ Dept. of Computer Science, Federal University of Minas Gerais, Belo Horizonte, MG, Brazil ({pavel,laender}@dcc.ufmg.br)
§ Dept. of Computer Science, Federal University of Amazonas, Manaus, AM, Brazil (alti@dcc.fua.br)
¶ Akwan Information Technologies, Belo Horizonte, MG, Brazil, www.akwan.com.br (berthier@akwan.com.br)
ABSTRACT
Structured or fielded metadata is the basis for many digital library services, including searching and browsing. Yet, little is known about the impact of using structure on the effectiveness of such services. In this paper, we investigate a key research question: do structured queries improve effectiveness in DL searching? To answer this question, we empirically compared the use of unstructured queries to the use of structured queries. We then tested the capability of a simple Bayesian network system, built on top of a DL retrieval engine, to infer the best structured queries from the keywords entered by the user. Experiments performed with 20 subjects working with a DL containing a large collection of computer science literature clearly indicate that structured queries, either manually constructed or automatically generated, perform better than their unstructured counterparts in the majority of cases. Also, automatic structuring of queries appears to be an effective and viable alternative to manual structuring that may significantly reduce the burden on users.
Categories and Subject Descriptors
H.3.7 [Information Systems]: Information Storage and Retrieval—Digital Libraries

General Terms
Experimentation, Human Factors
Keywords
Digital Libraries, Structured Queries, Bayesian Networks
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
JCDL'04, June 7-11, 2004, Tucson, Arizona, USA.
Copyright 2004 ACM 1-58113-832-6/04/0006 ...$5.00.

1. INTRODUCTION
Ensuring the high quality of Digital Library (DL) services is key to guaranteeing DL usefulness and patrons' satisfaction. Largely because of this concern for quality, metadata, and more specifically structured or fielded metadata, has historically been the basis for many digital library services, including basic ones such as searching and browsing. Yet, regarding the effectiveness of such services, little is known about the impact of using structure. Moreover, while a few DL services try to utilize this information through the use of advanced interfaces (see, for example, http://www.acm.org/dl or http://www.informatik.uni-trier.de/~ley/db/indices/query.html), experience has shown that users rarely make use of these features, most probably due to the complexity of user interfaces and lack of knowledge of internal DL structures.
In this paper, we investigate a key research question: do structured queries improve search effectiveness in DLs? To answer this question, we empirically compared the use of unstructured queries to the use of structured queries. Since users are often unwilling, or unable, to manually structure their queries, we also provide a simple system that tries to close the gap between the user's information need and the DL content. This experimental engine, built on top of a Bayesian network model and a retrieval system optimized for DLs, tries to infer the best structured queries from the keywords entered by the user, based on knowledge of DL structures and collection statistics. A very simple text box user interface guarantees the simplicity of the process. To ensure proper treatment of their information need, users simply have to choose from an automatically produced ranked list of structured queries.
To test our hypotheses and methods, we performed a series of experiments with 20 subjects (graduate students and researchers) using CITIDEL (Computing and Information Technology Interactive Digital Educational Library, http://www.citidel.org/), a DL containing a large collection of computer science literature, including metadata from the ACM Digital Library, the DBLP collection, NDLTD-Computing (the computing subset of the Networked Digital Library of Theses and Dissertations, http://www.ndltd.org/), and other sources. Results, using three different information retrieval (IR) measures, indicate that structured queries, either manually constructed or automatically generated, perform better than their unstructured counterparts in the majority of cases. Also, automatic structuring of queries appears to be a viable alternative to manual structuring, since it reduces work for users while boosting effectiveness.
This paper is organized as follows. Section 2 explains the underlying models and context of the work. Section 3 describes ESSEX, a retrieval system optimized for DLs, which provides the basic retrieval capabilities and supports the structuring process. Section 4 details the query structuring process, including the Bayesian network model and the query ranking schemes. Section 5 discusses the experimental setup and results. Section 6 presents related work, and Section 7 concludes the paper, also including plans for future work.
2. CONTEXT AND DEFINITIONS
In this work, we adopt a simplified view of the structured metadata that describes the contents of a DL. According to this view, each document or digital object do_i stored in the DL is described by at least one metadata specification. The j-th metadata specification for object do_i is defined as a set of pairs:
    ms_ji = {A_1 : v_1ji, ..., A_n : v_nji},   n_ji >= 1
where each A_k is an attribute (or metadata field) and each v_kji is a value belonging to the domain of A_k. We note that the attributes do not necessarily need to be the same for all metadata specifications.
For some attributes, instead of a single value, we may have a set or list of values. For instance, in a metadata specification describing a paper, the attribute author might be a list of names. To represent this using our notation, we allow a same attribute to appear several times, here called a value list. Thus, if attribute A_p, in the metadata specification ms_ji, has n different values, we can represent metadata specification ms_ji as:
    ms_ji = {..., A_p : v_p1, A_p : v_p2, ..., A_p : v_pn, ...}

We define the metadata schema of a DL as the set of all attributes that compose any of the metadata specifications of that DL. Thus, the metadata schema of a DL D is defined as:
    S_D = { A | A is an attribute of some metadata specification in D }    (1)
We define an unstructured query U as a set of keywords (or terms):
    U = {t_1, t_2, ..., t_k}
As for a metadata specification, a structured query Q is defined as a set of pairs:
    Q = {A_1 : v_1q, ..., A_n : v_nq},   n_q >= 1,
where each A_k is an attribute (or metadata field) and each v_kq a value belonging to the domain of A_k.
This simplified set of definitions allows us to ignore the details of how metadata is actually represented in the DL, since it can be mapped from any actual representation format.
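To make these definitions concrete, the sketch below shows one possible in-memory realization of metadata specifications, the metadata schema of Eq. (1), and unstructured/structured queries. It is only an illustration of the definitions above, not the representation used by CITIDEL or ESSEX; all names and sample values are ours.

    # Illustrative only: one way to realize the definitions of Section 2 in Python.
    from typing import Dict, List, Set, Tuple

    # A metadata specification ms_ji is a set of (attribute, value) pairs; a list of
    # tuples lets the same attribute appear several times (a value list).
    MetadataSpec = List[Tuple[str, str]]

    ms_example: MetadataSpec = [
        ("title", "A hypothetical paper on query structuring"),
        ("author", "Jones"),          # value list: the attribute is repeated
        ("author", "Smith"),
        ("publication", "Some Conference"),
    ]

    def metadata_schema(specs: List[MetadataSpec]) -> Set[str]:
        """Eq. (1): S_D is the set of all attributes appearing in any specification."""
        return {attr for spec in specs for (attr, _value) in spec}

    # An unstructured query U is just a set of terms.
    U: Set[str] = {"jones", "algorithm"}

    # A structured query Q pairs each term with a field of the schema.
    Q: List[Tuple[str, str]] = [("author", "jones"), ("title", "algorithm")]

    print(metadata_schema([ms_example]))   # {'title', 'author', 'publication'}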
3. THE RETRIEVAL SYSTEM: ESSEX
ESSEX is a vector-space IR system optimized for the digital library setting. It is designed to be light and fast, and to make few demands on the architecture of the rest of the DL system. It achieves these objectives through an optimized C++ implementation, an entirely in-memory index, and a background daemon model using socket communication with the DL application.
In addition to these architectural provisions, ESSEX has a number of query language features that make it well suited to digital libraries. Besides basic features such as the force/forbid ("+" and "-") term operators, ESSEX supports field filters and adjustable field weightings. (Field weightings allow the DL provider and the user to change the contribution of the various metadata fields to the final results ranking; for more information on this and other features of ESSEX, and how they make it a good choice for a digital library search engine, see http://br.endernet.org/~akrowne/elaine/essex/index.html.)
Field filters have the syntax "field:term", where "field" is an indexed metadata field and "term" is the query term. A field filter modifies the behavior of the search such that matches will only be made with term occurrences within the specified field.
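As a rough illustration of what a field filter does semantically (not of ESSEX's actual implementation, whose index structures are not described here), the sketch below restricts matching to postings recorded for a single field. The index layout, document ids, and function names are ours.

    # Hypothetical sketch of field-filtered matching over a per-field inverted index.
    from typing import Dict, Optional, Set, Tuple

    # (field, term) -> ids of documents in which 'term' occurs within 'field'.
    index: Dict[Tuple[str, str], Set[int]] = {
        ("title", "algorithm"): {1, 2, 5},
        ("abstract", "algorithm"): {1, 3, 4, 5},
        ("author", "jones"): {2, 5},
    }

    def match(term: str, field: Optional[str] = None) -> Set[int]:
        """Without a filter, a term matches occurrences in any field; with a field
        filter ("field:term"), only occurrences within that field are considered."""
        if field is not None:
            return set(index.get((field, term), set()))
        hits: Set[int] = set()
        for (f, t), docs in index.items():
            if t == term:
                hits |= docs
        return hits

    print(match("algorithm"))                 # {1, 2, 3, 4, 5}  (term in any field)
    print(match("algorithm", field="title"))  # {1, 2, 5}        (like "title:algorithm")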
ESSEX was developed primarily for CITIDEL and currently serves as the search engine for CITIDEL and PlanetMath (http://planetmath.org/). Our familiarity with the code made it a natural choice as a test-bed for the experimental query structuring system discussed in this paper. In addition, ESSEX's field filtering capability served as the core of the query structuring engine. We also utilized ESSEX's support for the "+" operator, and may use its field weighting support in the future. Details on how some of these features were used are explained in the following sections.
4. RANKING QUERIES: THE BAYESIAN NETWORK MODEL
This section presents an overview of the automatic query structuring approach. We start by describing the general querying process and then explain how user queries are structured automatically and ranked according to the likelihood that they will satisfy the users' needs.
4.1 The Query Structuring Process
In our ESSEX query structure inference system, query structuring consists of: (1) collecting the unstructured user query, (2) building a set of candidate structured queries, and (3) ranking the candidate queries according to the probability of best representing the user's needs, as proposed in [8, 15].
To explain these steps in detail, assume that the objects in our digital library have fields author and title. Let U = {t_1, t_2, t_3} be the initial, unstructured query entered by the user, where t_1, t_2, and t_3 are three distinct terms. To create the candidate queries, ESSEX simply builds all possible combinations of field-term pairs, using the fields in the metadata schema of the DL and the terms entered by the user, as sketched below.
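A minimal sketch of this combination step follows. It assumes a field-term pair is kept only when the term occurs at least N times in that field (the frequency threshold discussed next), and that a candidate query assigns each term to exactly one field; the function, variable names, and sample frequencies are ours, not ESSEX's.

    # Illustrative sketch of step (2): building candidate structured queries.
    # freq[(field, term)] is how often 'term' occurs in 'field' in the sample database.
    from itertools import product
    from typing import Dict, List, Tuple

    def candidate_queries(terms: List[str],
                          freq: Dict[Tuple[str, str], int],
                          fields: List[str],
                          N: int = 1) -> List[List[Tuple[str, str]]]:
        options = []
        for t in terms:
            # Keep only field:term pairs whose frequency is at least N.
            pairs = [(f, t) for f in fields if freq.get((f, t), 0) >= N]
            if not pairs:
                pairs = [(None, t)]   # term unseen in the sample database (illustrative choice)
            options.append(pairs)
        # Candidates are all combinations that assign each term to one field.
        return [list(combo) for combo in product(*options)]

    fields = ["title", "abstract", "author", "publication"]
    freq = {("title", "jones"): 2, ("author", "jones"): 40,
            ("title", "algorithm"): 300, ("abstract", "algorithm"): 900}
    for q in candidate_queries(["jones", "algorithm"], freq, fields):
        print(q)
    # Four candidates, e.g. [('title', 'jones'), ('title', 'algorithm')],
    #                       [('author', 'jones'), ('abstract', 'algorithm')], ...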
To illustrate, if term t_1 occurs both in the title and in the author fields of the objects in the DL, the pairs title:t_1 and author:t_1 are both created.

The creation of the field-term pairs can be further restricted by considering a minimum frequency of occurrence of a term in the field values of the digital objects in the DL. Thus, if, say, term t_1 occurs fewer than N times in the author field, the pair author:t_1 would not be created. The value of N can be used both to increase efficiency, by reducing the number of candidate queries, and to filter out spurious terms that may occur in a field due to errors in the data. In our experiments the value of N was set to 1, since this filtering process proved unnecessary.

Once the set of candidate queries is created, each query is evaluated and ranked according to the probability of fitting the data in the DL. This is accomplished through the use of the Bayesian network model first proposed by Calado et al. in [8], as explained in the following section.

Figure 1 shows the architecture of the query structuring process in ESSEX. Evaluation of a structured query takes place in two phases. The first is the evaluation of the individual query terms (with field filters). For this phase, each term is sent to the search engine and a result set is received. The ranks of the result set documents are combined into a score for the query term. Because many structured query terms occur numerous times over the entire set of candidate structured queries, they are cached in a hash table which maps them to their corresponding fused scores. In our experiments, we found that this caching sped up the entire structuring process by more than a factor of 3.

In the second phase, the scores of the structured query terms are combined into a final score for the entire structured query. This score is then used to generate the ranks for the set of all potential structured queries.

Figure 1: Architecture of the query structuring system.

In cases where the full digital library content is not accessible, or in order to improve efficiency, not all of the DL objects are used in the query structuring process. Instead, only a subset of the DL is considered for building the candidate structured queries and for deriving the necessary statistics for the Bayesian network model. This subset is called the sample database and is generally built by taking a sample of objects from the DL that are representative of the whole DL content. A more detailed discussion of how a sample database is built can be found in [15].

We now present a brief explanation of the Bayesian network model used as a basis for this implementation, and emphasize the changes needed to adapt it to our experimental collection.

4.2 Finding the Best Structured Queries

The ranking of candidate structured queries is accomplished through the use of the Bayesian network model proposed in [8]. For clarity, the model is presented in Figure 2. We note that although the network can be easily expanded to model any metadata schema, for simplicity, here we show only two fields, A_1 and A_2.

Figure 2: Bayesian network model for ranking structured queries.

The network in Figure 2 consists of a set of nodes, each representing a piece of information. With each node in the network is associated a binary random variable, which takes the value 1 to indicate that the corresponding information will be accounted for in the ranking computation. In this case, we say that the information was observed. In the network, the DL is represented by node O, each node A_i represents a field, each node A_ij represents the j-th value of field A_i, each node a_ij represents a term in the value of field A_i,
each node Q_i represents a structured query to be ranked, and each node Q_ij represents the portion of the structured query Q_i that corresponds to the field A_j. Vectors a_1 and a_2 each represent a possible state of the variables associated with nodes a_1i and a_2i, respectively.

As reflected by the edges in the network, field values A_ij are composed of terms a_ij, a field A_i is composed of all its possible values, and the DL O is composed of all its fields. Thus, the observation of a set of terms will influence the observation of a value, the observation of a set of values will influence the observation of a field, and the observation of a field will influence the observation of the DL. Similarly, a query Q_i is composed of field values Q_ij that are also composed of terms a_ij. The likelihood of a candidate structured query Q_i fitting the DL O can be seen as the probability of observing Q_i, given that O was observed, i.e., P(Q_i | O). By appropriately defining the conditional probabilities described by the network in Figure 2, we obtain the following equation:

    P(Q_i | O) = η × (1/2) × [ (1 − ∏_{j=1..n_1} (1 − cos(A_1j, a_1)))
                             + (1 − ∏_{j=1..n_2} (1 − cos(A_2j, a_2))) ]    (2)

where a_1 and a_2 are the states in which only the query terms referring to fields A_1 and A_2, respectively, are observed; n_1 and n_2 are the total numbers of values for fields A_1 and A_2 in the DL; and η accounts for the constants 1/P(O), P(a_1), and P(a_2). The function cos(A_ij, a_i) represents the similarity between the field value A_ij and the terms in the candidate query being ranked. It is defined as the traditional vector space cosine similarity [31] between the vector of terms representing the field value A_ij and vector a_i, which represents the terms in the query.

We can interpret Eq. (2) as stating that the probability of the candidate query fitting the DL depends on the probability of each of its attribute values fitting the corresponding attribute values in the DL: the more all query attribute values are similar to the values in the DL, the higher the probability. The similarity of an individual query attribute is represented by a disjunction, meaning that, for the attribute to be considered a match, it is enough that one value in the DL equals the value in the query. This equality between values in the query and values in the DL is determined through the cosine similarity function.

It is important to note that, although in [8] disjunctive and conjunctive operators were suggested for the final combination function, empirical tests with the collection used in our experiments indicated that using disjunctive operators for the probability P(A_i | A_ij) and a mean for the probability P(O | A_i) yielded the best results. For further details on the derivation of Eq. (2), refer to [8, 28].

To compute the cosine similarity, the value A_ij is seen as a vector of k_i terms. To each term t in A_ij, we assign a weight w_it that reflects the importance of the term for field A_i:

    w_it = tf_j(t) · ftf_i(t) · fidf(t)    (3)

where tf_j(t) is the term frequency of term t in document j, i.e., the number of times term t appears in document j; ftf_i(t) is the field term frequency, i.e., the number of times term t occurs in field i; and fidf(t) is the inverse field document frequency, i.e., the inverse of the number of fields term t appears in.

The first factor in Eq. (3), tf_j(t), is very common in vector-space IR. It indicates that the more times term t appears in document j, the more representative t is of document j. Although it is also common in IR to have an idf, or "inverse document frequency", factor, we leave this out for reasons discussed below.

The other two factors are novel in our work. On the one hand, ftf_i(t) indicates that the more times term t appears in a field i, the more representative t is of field i. On the other hand, if term t appears in many fields, the factor fidf(t) indicates that it is probably too generic to be useful. We call these "field tf" and "field idf", respectively, as they are analogous to the standard tf and idf described previously. The difference is that they reflect term distributions relative to fields rather than documents. These term weighting functions differ from those used in [8] because CITIDEL, like many other digital libraries, contains collections of scientific papers and, therefore, many textual metadata fields, making it very different from the collections used in [8], which contained mostly information on commercial products extracted from Web databases. In the context of query structuring, one of the main differences in the CITIDEL case is that there is a large overlap between the vocabulary in the objects' fields, such as titles, abstracts, publications, and authors.

The net effect of these weightings is to value terms: (1) strongly to the extent that they occur many times in the specified metadata field, (2) strongly to the extent that they occur in the most common field for the term, and (3) weakly to the extent that they are "diluted" by appearing in many metadata fields.

Let us consider an example of how this is useful. Assume that the unstructured query is given as "jones algorithm", and that the word "algorithm" appears evenly in the title and abstract fields, and a handful of times in the publication field. Also, assume the word "jones" appears a few times in the abstract field, but much more frequently in the author field. With the weightings described above, occurrences of "algorithm" will have similar value in either the title or abstract fields. However, occurrences of "jones" in author will be worth much more than occurrences in abstract. Finally, occurrences of "algorithm" will be worth less than occurrences of "jones", because "algorithm" appears in three fields, while "jones" appears in only two.

Given this weighting scheme, the cosine of the angle between vector A_ij and vector a_i is defined as:

    cos(A_ij, a_i) = ( Σ_{t ∈ T_i} w_it · g_t(a_i) ) / sqrt( Σ_{t ∈ T_i} w_it² )    (4)

where g_t(a_i) gives the value of the t-th variable of the vector a_i, and T_i is the set of all terms in the values of field A_i. We now can rank all the structured queries by computing P(Q_i | O) for each of them. The user then can select one query for processing from among the top ranked ones, or the system can simply process the first query.
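The sketch below ties Eqs. (2)-(4) together for a two-field toy example, under several simplifying assumptions: frequencies are read from small in-memory tables, the constant η is dropped (it does not affect the ranking order), g_t(a_i) is 1 for query terms and 0 otherwise, and per-field term scores are memoized in a dictionary in the spirit of the hash-table cache of Section 4.1. All data structures and names are ours; this is not the ESSEX implementation.

    # Illustrative scoring of candidate structured queries per Eqs. (2)-(4); eta omitted.
    from math import prod, sqrt
    from typing import Dict, List, Tuple

    # field_values[field] = list of field values, each a bag of term frequencies (tf_j).
    field_values: Dict[str, List[Dict[str, int]]] = {
        "title":  [{"fast": 1, "algorithm": 1}, {"jones": 1, "polynomials": 1}],
        "author": [{"jones": 1}, {"smith": 1}],
    }
    ftf = {("title", "algorithm"): 300, ("title", "jones"): 2, ("author", "jones"): 40}
    fidf = {"algorithm": 1 / 3, "jones": 1 / 2}   # 1 / number of fields the term appears in

    def weight(field: str, value: Dict[str, int], t: str) -> float:
        """Eq. (3): w_it = tf_j(t) * ftf_i(t) * fidf(t)."""
        return value.get(t, 0) * ftf.get((field, t), 0) * fidf.get(t, 0.0)

    def cos_sim(field: str, value: Dict[str, int], query_terms: List[str]) -> float:
        """Eq. (4): g_t(a_i) = 1 for query terms; terms outside the value contribute 0."""
        num = sum(weight(field, value, t) for t in query_terms)
        den = sqrt(sum(weight(field, value, t) ** 2 for t in value))
        return num / den if den else 0.0

    _cache: Dict[Tuple[str, Tuple[str, ...]], float] = {}   # memoized per-field scores

    def field_score(field: str, terms: Tuple[str, ...]) -> float:
        """Disjunction over the values of one field: 1 - prod(1 - cos)."""
        if (field, terms) not in _cache:
            _cache[(field, terms)] = 1 - prod(1 - cos_sim(field, v, list(terms))
                                              for v in field_values[field])
        return _cache[(field, terms)]

    def query_score(query: List[Tuple[str, str]]) -> float:
        """Eq. (2) without eta: mean over all fields; fields with no query terms add 0."""
        per_field: Dict[str, List[str]] = {}
        for field, term in query:
            per_field.setdefault(field, []).append(term)
        return sum(field_score(f, tuple(ts)) for f, ts in per_field.items()) / len(field_values)

    print(query_score([("author", "jones"), ("title", "algorithm")]))   # 1.0
    print(query_score([("title", "jones"), ("title", "algorithm")]))    # 0.5

With this toy data the candidate that places "jones" in the author field scores higher, matching the intuition of the "jones algorithm" example above.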
5. EXPERIMENTS

To test our research questions, namely, (1) whether structured queries are better than unstructured ones and (2) whether automatically structured queries can perform as well as (or better than) their manually constructed counterparts, we conducted a series of experiments with real users and the structuring Bayesian network described in Section 4, implemented on top of the ESSEX search engine.

5.1 Experimental Setup and Design

Experiments were performed on the CITIDEL collection, which contains metadata from the ACM Digital Library, the DBLP collection, NDLTD-Computing, and other sources, totaling more than 440,000 metadata records. Only a subset of the ACM Digital Library, with approximately 98,000 metadata records, was used as a sample database for query structuring. This means that all information used by the Bayesian network model to rank the structured queries was taken only from this subset of CITIDEL. The ACM DL subset was chosen as the sample database since it contained the greatest breadth and depth of metadata, hence providing a comprehensive amount of metadata and content widely representative of the metadata and content of the whole collection. The set of metadata fields considered in the experiment was S_CITIDEL = {title, abstract, author, publication}, where publication means the name of the conference or journal where a paper was published.
Our experiments involved 20 subjects, among researchers in the Virginia Tech Digital Library Research Lab and graduate students from a Digital Library graduate course. The process is illustrated in Figure 3. Each subject was instructed to issue five searches for items of their own interest in the CITIDEL collection and provide relevance judgments (as relevant or non-relevant) for the items returned. Subjects were divided into two groups: G1 and G2. Subjects in group G1 were not aware of the possibility of structuring queries with field information. They issued unstructured queries, which were then automatically structured using the Bayesian network model. Subjects in group G2 were required to issue manually structured queries. For comparison, an unstructured version of each manually structured query was created by removing all field structure information from these queries, which were then re-structured using the Bayesian network model.

Figure 3: Experimental process and evaluation.

All queries, i.e., the unstructured query (Q0), the best of the top 5 automatically structured queries (Q1-Q5), and the manually structured query (QM), were sent to the ESSEX search engine, and the top 25 items returned by each were merged (with removal of duplicates). The resulting union set (L∪) was completely shuffled (L∪R) and presented to the users for relevance judgments.

All relevant and non-relevant items returned for each query were used to compute precision, recall, and F1 values. Precision (P) is the percentage of retrieved items that are relevant. It is useful as an indication of how accurate the system is when retrieving the answers to the user's question. Recall (R) is the percentage of all the relevant items that were retrieved. It indicates whether the system is able to retrieve all of the relevant items. High recall is especially useful when the user needs to be certain that all relevant information will be found. Specifically, in our case, we used relative recall with respect to the pooled set of relevant documents from all of the queries. Finally, F1 combines precision and recall with equal weights and is defined as F1 = 2PR/(P+R). The F1 measure combines precision and recall into a single value, providing a simple way of evaluating the system's overall performance.

To present the results, we also consider two forms of precision: 10-precision and R-precision. The 10-precision measure indicates the precision for the first 10 items retrieved by the system. This measure is important in practice since it is known that users tend to only look at the top results in a ranked answer set. The R-precision measure indicates the precision at the point when all relevant documents have been retrieved. It is a measure of how many spurious results the user has to look at before all relevant results are seen. Both measures are useful in determining not only whether the system is able to show relevant results at the top of the list of retrieved items, but also whether it can discover all relevant information while still keeping the noise level to a minimum.
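For reference, a minimal sketch of these measures as they might be computed from a ranked result list and a set of relevance judgments is given below; it is our own illustrative code, not the evaluation scripts used in the experiments. The r_precision function follows the description above (precision at the rank where the last retrieved relevant item appears); the common precision-at-R variant would instead cut the list at the number of relevant items.

    # Illustrative computation of precision, relative recall, F1, 10-precision, R-precision.
    from typing import List, Set

    def precision(retrieved: List[int], relevant: Set[int]) -> float:
        return sum(1 for d in retrieved if d in relevant) / len(retrieved) if retrieved else 0.0

    def relative_recall(retrieved: List[int], pooled_relevant: Set[int]) -> float:
        # Recall relative to the pooled set of relevant items from all queries (Section 5.1).
        return (sum(1 for d in retrieved if d in pooled_relevant) / len(pooled_relevant)
                if pooled_relevant else 0.0)

    def f1(p: float, r: float) -> float:
        return 2 * p * r / (p + r) if (p + r) else 0.0

    def precision_at_10(ranked: List[int], relevant: Set[int]) -> float:
        return precision(ranked[:10], relevant)

    def r_precision(ranked: List[int], relevant: Set[int]) -> float:
        # Precision at the rank of the last retrieved relevant item (as described above).
        last = max((i for i, d in enumerate(ranked, 1) if d in relevant), default=0)
        return precision(ranked[:last], relevant) if last else 0.0

    ranked = [3, 7, 1, 9, 4]            # hypothetical ranked result list (document ids)
    relevant = {1, 3, 4, 8}             # hypothetical pooled relevance judgments
    p, r = precision(ranked, relevant), relative_recall(ranked, relevant)
    print(p, r, f1(p, r), precision_at_10(ranked, relevant), r_precision(ranked, relevant))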
5.2 Results

Before the experiments, all test subjects answered a short questionnaire regarding their background and knowledge in their area of interest in computer science. Among other questions, users were asked to cite five researchers and three publications that they would consider of importance in their selected research area. As expected, queries were generally very short, averaging 2.59 terms per query, independently of being manually structured or not.

The average number of items indicated as relevant by subjects in group G1 was slightly higher than for group G2 (18.79 vs. 14.26 records), with a higher median (12 vs. 8) and standard deviation (19.46 vs. 14.33). This may be explained by the fact that when users are forced to use field structure information, queries tend to be more focused and so naturally tend to retrieve a smaller number of relevant items. This supports the assumption that structured queries are more precision-oriented.

Figure 4: Distribution of fields and combinations of fields in the manually structured queries.

Figure 4 shows the distribution of fields and combinations of fields used by subjects in G2 in their manually structured queries. It is worth noticing that 63% of the queries contain only one field, with no queries using only publication, only one using three fields, and none using all four fields. This distribution may again be explained by the lack of user knowledge about publications and the difficulty of creating structured queries manually. We now examine the impact of query structuring, manual and automatic, on the quality of the retrieved results.

5.2.1 Unstructured vs. Structured Queries

Tables 1, 2, 3, and 4 show a comparison between the unstructured query (Q0), the top ranked automatically structured query (Q1), the best of the top 5 structured queries (Q1-Q5), and the manually structured query (QM).

Table 1: Percentage of times query Q1 is better than or equal to query Q0, considering the F1, 10-precision, and R-precision measures, in groups G1 and G2.

    Q1 vs. Q0    F1       10-precision    R-precision
    G1           64.5%    83.3%           81.2%
    G2           73.4%    89.3%           85.7%

Table 2: Average F1, 10-precision, and R-precision values for all the Q0 and Q1 queries, in groups G1 and G2 together.

    Average        F1      10-precision    R-precision
    Q0 (G1+G2)     28.9    31.1            29.4
    Q1 (G1+G2)     36.4    51.4            49.7

Table 3: Percentage of times the best (manually or automatically) structured query is better than or equal to query Q0, considering the F1, 10-precision, and R-precision measures, in groups G1 and G2.

    Best (Q1-Q5, QM) vs. Q0    F1       10-precision    R-precision
    G1                         81.2%    100%            100%
    G2                         85.1%    97.9%           97.9%

Table 4: Average F1, 10-precision, and R-precision values for the best (manually or automatically) structured query and query Q0, in groups G1 and G2 together.

    Average                     F1      10-precision    R-precision
    Q0 (G1+G2)                  28.9    31.1            29.4
    Best (Q1-Q5, QM) (G1+G2)    62.2    84.5            84.7

For the 97 queries (raw data for three queries was lost), the top ranked structured query Q1 performed better than the unstructured query Q0 in most of the cases. Considering the F1 measure, results for Q1 were equal to or better than results for Q0 in an average of 69% of the searches, in both groups. In terms of 10-precision, Q1 was better than or equal to Q0 in an average of 86.3% of the searches. In terms of R-precision, Q1 was better than or equal to Q0 in 83.4% of the searches. We can conclude that, without the need for user intervention (apart from entering the query keywords), the system is able to automatically find a structured query at the top of the ranked list that outperforms a simple keyword-based search in the majority of cases. In fact, as shown in Table 2, the average F1, 10-precision, and R-precision values for Q0 were 28.9, 31.1, and 29.4, while for Q1 the corresponding values were 36.4, 51.4, and 49.7.

When comparing the best of the Q1 through Q5 and QM queries with Q0, it is clear that using structured queries is better than a simple keyword-based search. The best structured query showed results better than or equal to Q0 in 83.7% of the searches, considering the F1 measure, and in 98.9% of the searches considering both 10-precision and R-precision. The best query average values were 62.2 for F1, 84.5 for 10-precision, and 84.7 for R-precision.

These results clearly indicate that structured queries, either manually constructed or automatically generated, perform better than their unstructured counterparts in the majority of cases. The situations in which the unstructured query Q0 performed better can be classified into four major cases:

1. Insufficient or outdated sampling data. The most common reason for Q0 to outperform the structured queries was the absence of support in the sample database for very specific queries. This was especially evident in queries concerning new trends in computer science research (e.g., "peer-to-peer computing", "cognitive affordance", "multi-modal presentation", or "discourse processing"). One obvious solution would be to use the whole collection as the sample database, although this could have a negative impact on performance. Another possibility is to use better sampling strategies, which can guarantee high-quality coverage using the minimum possible data, and good policies for updates.
2. Very specific queries and strict separation between title and abstract. It was noticed that, in almost all cases, subjects did not care in which field the relevant concepts in the query appeared. For instance, in very specific or short queries like "kerberos", users do not care whether the word "kerberos" appears in the title or the abstract. Since the query structuring process must choose one field in which to place the word "kerberos", say, the title field, the structured query becomes too specific, thus not retrieving all relevant items, despite having good precision. This suggests that some kind of combination (for instance, a Boolean OR) of fields with a large overlap in vocabulary, such as titles and abstracts, in the automatically structured query may be beneficial.

3. The "+" constraint is too restrictive. Another assumption in this work was that structured queries are, by nature, more focused and therefore more precision-oriented. This led to the design choice of forcing the words in the structured queries to appear in the respective fields by using the "+" operator. In a few cases, this assumption proved too restrictive, mainly in long queries (e.g., "parallel all pairs shortest path") or in queries with two or more embedded concepts that never occur together in the collection (e.g., "peer-to-peer comparison systems"). In these cases the "+" constraint became too restrictive and returned few or no results. Subjects preferred to mark at least a few records as relevant, even if not all relevant concepts in the query appeared together in the document, rather than mark no result at all. One obvious solution is to relax the "+" constraint, but initial tests showed that performance would be degraded. A better choice could be to identify these extreme cases and only then relax the constraint or apply query splitting techniques [22].

4. Failure of the model. In a very few cases the network model was unable to correctly rank the structured queries, even when there was support from the sample database. The main reasons for this problem are ties in the ranking and skewed keyword distributions. Ties occur when several of the structured queries get the same score and, thus, their ranking order becomes arbitrary. Keyword distributions in the collection are skewed because some fields tend to contain more words than others. For instance, the abstract field is larger than most others and, thus, contains the majority of words in the collection. For this reason, the Bayesian network model tends to assign higher probabilities to queries that contain abstracts. One possible solution to both problems is to assign different weights to each field, according to their relative importance in the collection. Preliminary experiments have shown that this strategy may be beneficial, but the proper choice of weights for all fields is hard to obtain and will require further experimentation.

5.2.2 Manually Structured vs. Automatically Structured Queries

We next compare the performance of the best automatically structured query to the manually structured query. As shown in Table 5, the best automatically generated query tied with or outperformed query QM in 91.8% of the searches, considering the F1 measure. Considering 10-precision and R-precision, the best structured query equaled or outperformed QM in 97.9% of the searches. The average F1, 10-precision, and R-precision values for query QM are 55.6, 72.5, and 70.2, respectively, while for the best structured query the values are 59.8, 83.4, and 84.7, respectively, as shown in Table 6.

We note that, in most cases, one of the top five automatically structured queries precisely matched the query manually structured by the user. Also, even when not completely correct semantically, automatically structured queries generally outperformed manually structured queries.
Table 5: Percentage of times the best automatically structured query is better than or equal to query QM, considering the F1, 10-precision, and R-precision measures, in group G2.

    Best (Q1-Q5) vs. QM    F1       10-precision    R-precision
    G2                     91.8%    97.9%           97.9%

Table 6: Average F1, 10-precision, and R-precision values for the best automatically structured query and query QM, in group G2.

    Average         F1      10-precision    R-precision
    Manual (QM)     55.6    72.5            70.2
    Best (Q1-Q5)    59.8    83.4            84.7

These results can be explained. It was observed during the experiments that, when judging the returned items' relevance, test subjects tended to focus mostly on the title field. Thus, in cases where the automatically structured query contains the relevant concepts in the title field, users almost always considered the returned items as relevant. On the other hand, items that contained relevant concepts in other fields, such as the abstract or publication, were very often ignored. Partially for this reason, the automatic structuring model was able to outperform the results of the manually structured queries, in which the abstract field was often the user's choice.

The few cases where QM performed better than the best of the five structured queries were due to one of the four cases described in the previous section, in particular Case 1, in which there was insufficient or outdated information in the sampling data. These results show that automatic structuring of queries is a viable alternative to manual structuring, and one that significantly reduces the burden on the users while still yielding good performance.

6. RELATED WORK

Few works have explored user interfaces that facilitate the search process in digital libraries. However, there are notable exceptions: the DLITE project [13], SenseMaker [5], and the query synthesizers described in [4]. In most cases, DL search services are limited to simple keyword-based query formulation, a rather common resource in all types of information retrieval systems [3]. More recently, keyword-based queries also have been introduced to structured databases [2, 16, 20]. Furthermore, there is a long history of work in the information retrieval community on (semi)automatic generation of queries [6, 21, 25, 30, 38], but it generally did not focus on structuring opportunities.

In this work, keyword-based queries formulated by the user are given structure through the use of a Bayesian network model. This is somewhat similar to the work of Croft et al. [14], where Boolean queries are derived from a user-given natural language query and then improved with automatically inferred phrases. Bayesian network models were first used in IR problems by Turtle and Croft [36] and later by Ribeiro-Neto and Muntz [29] (upon whose work our model is based). More recently, Acid et al. [1] further refined such models so that exact propagation algorithms can be used to efficiently compute probabilities. Bayesian networks also have been applied to other IR problems besides ranking, for example, relevance feedback [24], automatic construction of hypertexts [33], query expansion [17], information filtering [9], ranking fusion [37], and document clustering and classification [7, 18]. Nevertheless, no other work has yet applied Bayesian networks to the problem of structuring user queries.

Recent research has proposed several models and languages for retrieval of structured documents [11, 23, 26, 27, 32, 35]. Again, differently, our work focuses on structured metadata and mainly on the inference of the "best" structured queries based on a belief network model and the collection's structures and content.
Pooling methods similar to the one employed in this work have been used before to assess retrieval effectiveness in large and dynamic collections such as the Web. Pooling has been used extensively in the annual TREC conferences [39]. The effect of the pool depth (i.e., the number of documents taken from each returned set) has been studied in works such as [40] and [12]. Silva et al. [34] used a pool of the 10 top ranked Web pages returned by 6 different types of belief networks and concluded that the combination of hub, authority, and content-based evidential information provided substantial gains in precision in a search engine for the Brazilian Web. Can et al. [10] used the top 200 pages returned by 8 search engines to build a test set for an automatic search engine evaluation process and compared it with human judgments.

Finally, the work presented here is based on the model first proposed in [8]. It differs, however, in two main points. First, it provides a user-based evaluation of the query structuring framework, which confirms the results obtained in [8] with an artificial query log. Second, it empirically demonstrates the usefulness of structured queries in the context of digital libraries and shows that these can be obtained with minimum user effort.

7. CONCLUSIONS

In this paper, through a number of user experiments, we have shown that: (1) structured queries perform better than pure keyword-based queries in DL searching services based on fielded metadata; and (2) a system can be used to automatically add structure to the users' queries, thus providing a viable alternative to manual structuring that significantly reduces the burden on the users while still yielding good performance. The experiments performed confirm that, in the majority of cases, better results are achieved by structured queries than by unstructured queries. Also, using the Bayesian network model proposed in [8] and an appropriate term weighting scheme, automatically structured queries outperform not only the unstructured queries but also the query manually structured by the user.

We may conclude that a system such as the one described in this work can be effectively used to improve DL search services. We envision a search system that is able to suggest a few alternative structured queries to the user. According to a semi-automatic scenario, these structured queries can be presented together with the results of the initial unstructured query. By clicking on one of these candidates, the user could get the corresponding structured search results, a refinement of the initial list of items retrieved. Alternatively, according to a fully automatic scenario, the system can simply submit the highest ranked structured query and provide the corresponding results without user intervention.

Further improvements on the models used in this work are possible. For instance, if the list of candidate structured queries is too long, too much time would be spent by the user in selecting the most appropriate candidate. Thus, it would be important to guarantee that, in the majority of cases, the best structured query is one of the top two candidates. We believe that such a level of performance is ultimately attainable with minor adjustments to our model and implementation. Combining other sources of evidence, such as past queries, also could be applied to alleviate this problem.
Besides testing the system in production mode in CITIDEL and other DLs hosted in the Digital Library Research Laboratory, future work will continue in a number of directions. First, we want to investigate different and more effective sampling strategies that minimize the discovered problems of incompleteness and outdated information, including good policies for updates. Second, we will investigate automatic field weighting based on the relative or perceived importance of the fields, in order to increase the accuracy of our model. Third, we intend to investigate new models that can combine fields with large vocabulary overlap (e.g., titles and abstracts) in the query, and to study possible ways to relax the "+" constraint without reducing effectiveness. Further, we plan to investigate the effect of the "+" constraint by itself in keyword-based queries. Fourth, we plan to incorporate relevance feedback and personalized ranking strategies into our belief network models [19]. Finally, we intend to work on new approaches to improve system performance that go beyond our simple caching strategy.

Acknowledgments

This research work was funded in part by NSF, grants DUE0136690, DUE0121679, IIS0086227, and ITR0325579, by the I3DL project, grant 680154/01-9, by the GERINDO project, grant MCT/CNPq/CT-INFO 552.087/02-5, by individual grants MCT/FCT SFRH/BD/4662/2001 (Pável Calado) and CNPq 3040/02-5 (Alberto H. F. Laender), by the SiteFix project, grant MCT-CNPQ-CT-INFO 55.2197/02-5, by a PHILIPS MDS Manaus R&D sponsorship (Altigran S. da Silva), and by a fellowship from AOL (Marcos A. Gonçalves).

REFERENCES

[1] S. Acid, L. M. de Campos, J. M. Fernández-Luna, and J. F. Huete. An information retrieval model based on simple Bayesian networks. International Journal of Intelligent Systems, 18(2):251–265, January 2003.
[2] S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: A system for keyword-based search over relational databases. In Proceedings of the 18th International Conference on Data Engineering, pages 5–16, San Jose, CA, USA, February 2002.
[3] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley, New York, NY, USA, 1999.
[4] M. Baldonado, S. Katz, A. Paepcke, C.-C. K. Chang, H. Garcia-Molina, and T. Winograd. An extensible constructor tool for the rapid, interactive design of query synthesizers. In DL'98: Proceedings of the 3rd ACM International Conference on Digital Libraries, pages 19–28, Pittsburgh, PA, USA, June 1998.
[5] M. Baldonado and T. Winograd. SenseMaker: An information-exploration interface supporting the contextual evolution of a user's interests. In Proceedings of the ACM CHI 97 Conference on Human Factors in Computing Systems, pages 11–18, Atlanta, GA, USA, March 1997.
[6] D. Cai, C. J. Van Rijsbergen, and J. M. Jose. Automatic query expansion based on divergence. In Proceedings of the 10th International Conference on Information and Knowledge Management CIKM'01, pages 419–426, New York, November 2001.
[7] P. Calado, M. Cristo, E. Moura, N. Ziviani, B. Ribeiro-Neto, and M. A. Gonçalves. Combining link-based and content-based methods for web document classification. In Proceedings of the 12th International Conference on Information and Knowledge Management, pages 394–401, New Orleans, LA, USA, 2003.
[8] P. Calado, A. S. da Silva, R. C. Vieira, A. H. F. Laender, and B. A. Ribeiro-Neto. Searching web databases by structuring keyword-based queries. In Proceedings of the 11th International Conference on Information and Knowledge Management, pages 26–33, McLean, VA, USA, 2002. ACM Press.
[9] J. P. Callan. Document filtering with inference networks. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 262–269, Zurich, Switzerland, August 1996.
[10] F. Can, R. Nuray, and A. B. Sevdik. Automatic performance evaluation of Web search engines. Information Processing and Management, 2004. In press.
[11] T. T. Chinenyanga and N. Kushmerick. Expressive retrieval from XML documents. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 163–171, New Orleans, Louisiana, USA, September 2001.
[12] G. V. Cormack, C. R. Palmer, and C. L. A. Clarke. Efficient construction of large test collections. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 282–2, Melbourne, Australia, August 1998.
[13] S. B. Cousins, A. Paepcke, T. Winograd, E. A. Bier, and K. Pier. The digital library integrated task environment (DLITE). In DL'97: Proceedings of the 2nd ACM International Conference on Digital Libraries, pages 142–151, Philadelphia, PA, USA, July 1997.
[14] W. B. Croft, H. R. Turtle, and D. D. Lewis. The use of phrases and structured queries in information retrieval. In Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 32–45, Chicago, IL, USA, October 1991.
[15] A. S. da Silva, P. Calado, R. C. Vieira, A. H. F. Laender, and B. A. Ribeiro-Neto. Effective Databases for Text & Document Management, chapter Keyword-based Queries over Web Databases, pages 74–92. Idea Group Publishing, Hershey, PA, USA, 2003.
[16] S. Dar, G. Entin, S. Geva, and E. Palmon. DTL's DataSpot: Database exploration using plain language. In Proceedings of the 24th International Conference on Very Large Data Bases VLDB'98, pages 5–9, New York, NY, USA, August 1998.
[17] L. M. de Campos, J. M. Fernández-Luna, and J. F. Huete. Query expansion in information retrieval systems using a Bayesian network-based thesaurus. In Proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 53–60, San Francisco, CA, July 1998.
[18] S. T. Dumais, J. Platt, D. Heckerman, and M. Sahami. Inductive learning algorithms and representations for text categorization. In Proceedings of the 7th International Conference on Information and Knowledge Management CIKM'98, pages 148–155, Bethesda, Maryland, USA, November 1998.
[19] W. Fan, M. D. Gordon, and P. Pathak. Discovery of context-specific ranking functions for effective information retrieval using genetic programming. IEEE Transactions on Knowledge and Data Engineering, 16(4):523–527, 2003.
[20] D. Florescu, D. Kossmann, and I. Manolescu. Integrating keyword search into XML query processing. WWW9/Computer Networks, 33(1–6):119–135, 2000.
[21] E. A. Fox. Relational Models of the Lexicon: Representing Knowledge in Semantic Networks, chapter Improved Retrieval Using a Relational Thesaurus for Automatic Expansion of Boolean Logic Queries, pages 199–210. Cambridge University Press, 1988.
[22] E. A. Fox and F. D. Neves. Extending retrieval with stepping stones and pathways. NSF proposal (funded), 2003.
[23] N. Fuhr and K. Gross. XIRQL: a query language for information retrieval in XML documents. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 172–180, New Orleans, Louisiana, USA, September 2001.
[24] D. Haines and W. B. Croft. Relevance feedback and inference networks. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2–11, Pittsburgh, PA, USA, June 1993.
[25] M. Mitra, A. Singhal, and C. Buckley. Improving automatic query expansion. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 206–214, Melbourne, Australia, August 1998.
[26] S. H. Myaeng, D.-H. Jang, M.-S. Kim, and Z.-C. Zhoo. A flexible model for retrieval of SGML documents. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 138–145, Melbourne, Australia, August 1998.
[27] G. Navarro and R. Baeza-Yates. Proximal nodes: A model to query document databases by content and structure. ACM Transactions on Information Systems, 15(4):400–435, October 1997.
[28] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, San Mateo, California, 2nd edition, 1988.
[29] B. Ribeiro-Neto and R. Muntz. A belief network model for IR. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 253–260, Zurich, Switzerland, August 1996.
[30] G. Salton, C. Buckley, and E. A. Fox. Automatic query formulations in information retrieval. Journal of the American Society for Information Science, 34(4):262–280, July 1983.
[31] G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, Tokyo, 1983.
[32] T. Schlieder and H. Meuss. Querying and ranking XML documents. JASIST, 53(6):4–503, 2002.
[33] D. Shin, S. Nam, and M. Kim. Hypertext construction using statistical and semantic similarity. In DL'97: Proceedings of the 2nd ACM International Conference on Digital Libraries, pages 57–63, Philadelphia, PA, USA, July 1997.
[34] I. Silva, B. Ribeiro-Neto, P. Calado, E. Moura, and N. Ziviani. Link-based and content-based evidential information in a belief network model. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 96–103, Athens, Greece, July 2000.
[35] A. Theobald and G. Weikum. Adding relevance to XML. In Proceedings of the International Workshop on the Web and Databases (WebDB), Dallas, TX, May 2000.
[36] H. R. Turtle and W. B. Croft. Inference networks for document retrieval. In Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1–24, Brussels, Belgium, September 1990.
[37] R. F. Valle, B. A. Ribeiro-Neto, L. R. S. de Lima, A. H. F. Laender, and H. R. Freitas-Junior. Improving text retrieval in medical collections through automatic categorization. In Proceedings of the 10th International Symposium on String Processing and Information Retrieval SPIRE 2003, pages 197–210, Manaus, Brazil, October 2003.
[38] E. M. Voorhees. Query expansion using lexical-semantic relations. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 61–69, Dublin, Ireland, July 1994.
[39] E. M. Voorhees and D. Harman. Overview of the Sixth Text REtrieval Conference (TREC-6). November 1997.
[40] J. Zobel. How reliable are the results of large-scale information retrieval experiments? In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 307–314, Melbourne, Australia, August 1998.