Jorge Miguel Calha Rainho Machado

Role: Professor

Number: 20128

Institutional email: jmachado@estgp.pt

Courses he teaches

teacher in charge

Work Performed and Contributions

theses

others

  • Title Architecture and user interface for a geo-temporal search service (Unpublished) -  
    year 2009
    Institution European Conference on Digital Libraries 2009
    Description / Summary This paper describes the architecture and user interface for a digital library where resources have geographic and temporal information. We discuss the importance of separating these two dimensions from the textual one. We detail the service components, emphasizing the search engine component and the support services. We present a case study intended to rebuild the DIGMAP Search Service architecture. DIGMAP is a co-funded European Union project on old digitized maps. We discuss the overall architecture of the search service, summarizing the support components: a metadata repository (Repox), a text parsing tool (Geoparser) and the Gazetteer. We follow a mashup approach, which in this scope means quickly creating systems from existing components to provide new functionality over the Web. Our search service is based on the Mitra system, summarized in this paper, which is a search engine platform for indexing spaces with both structured and unstructured information. We also detail how it was upgraded to use the geographic and temporal data provided by Geoparser. Finally, we detail the new search engine user interface, built to take advantage of both the existing services and the new ones provided.
    Electronic file machadoECDL2009.pdf (292 Kb)
  • Title User interface for a geo-temporal search service using DIGMAP components -  
    year 2009
    Institution DEMONSTRATION in ECDL 2009
    Description / Summary This demo presents a user interface for a Geo-Temporal search service built as a follow-up to the DIGMAP project. DIGMAP was a co-funded European Union project on old digitized maps, and deals with resources rich in geographic and temporal information. This search interface followed a mashup approach using existing DIGMAP components: a metadata repository, a text mining tool, a Gazetteer, and a service to generate geographic contextual thumbnails. The Google Maps API is used to provide a friendly and interactive user interface. This demo will present the functionalities of the resulting geo-temporal search engine, whose interface uses Web 2.0 capabilities to provide contextualization in time and space and text clustering.
    Electronic file ecdl2009VersõesFinais.zip (9562 Kb)

articles

  • Title A Micro-analysis of Topic Variation for a Geotemporal Query -  
    year 2013
    Conference / Workshop / Magazine Institution INESC-ID, National Institute of Electroniques and Computer Systems, Lisbon, PORTUGAL
    Description / Summary Bias introduced in question wording is a well-known problem in political attitude survey polling. For example, the question "The President believes our military mission in Afghanistan is a vital national interest -- agree/disagree?" is quite different from the question: "Do you believe that a military mission in Afghanistan is in the USA’s vital national interest?" Response variation according to different question wording has been studied by researchers in survey methodology. However, the influence of variations in topic wording on search results has not been examined for geotemporal information retrieval. For the GeoTime evaluation in NTCIR Workshop 9, the organizers decided to run an experiment in query variability in order to study variability of performance. We took a single information need and expressed it in three different ways: 1) as a single event question, 2) as a question which would yield an open-ended list (e.g. the classic “which countries did the Pope visit in the last three years”), or 3) as a reformulation of the single event question as a location (latitude/longitude) and time inquiry. This paper reports the results of this micro-analysis of variation effects upon a single query expressed in different formats, as well as the degree to which we achieved (or did not achieve) our explicit goal of being able to distinguish performance outcomes for the different formulations.
    Electronic file 02-EVIA2011-GeyF.pdf (907 Kb)
  • Title Geo-Temporal retrieval filtering versus answer resolution using Wikipedia -  
    year 2011
    Conference / Workshop / Magazine Institution INESC-ID, Lisbon
    Description / Summary We describe an evaluation experiment on GeoTemporal Document Retrieval created for the GeoTime evaluation task of NTCIR 2011. This work describes the retrieval techniques developed to accomplish this task. We describe the collections used in the workshop, detailing their composition in terms of geographic and temporal expressions. The first contribution of this work is the collections’ statistics, which by themselves reveal the relevance of this subject. Our parsing techniques found millions of references related to the two dimensions of relevance, time and space. Those references were used to index the documents in order to score them in those dimensions. We also introduce a technique to find extra references in Wikipedia using Google Search Service and the same parsers used on the collections. Those references were used in four different scenarios depending on the queries. First, we used the references found in topics to filter out documents without geographic or temporal expressions, and used pseudo relevance feedback to expand topics with no references using the indexes created for places and dates. In another approach, we used the Wikipedia references to filter documents from the result set. In a third approach, we expanded all topics with the Wikipedia references. Finally, we used another technique based on metric distances calculated from coordinates (latitudes and longitudes) and dates in order to create a scope for documents and topics, and to rank them according to the distance between each other.
    Electronic file 06-NTCIR9-GEOTIME-MachadoJ-2011.pdf (1034 Kb)
  • Title NTCIR9-GeoTime Overview - Evaluating Geographic and Temporal Search: Round 2 -  
    year 2011
    Conference / Workshop / Magazine Institution INESC-ID, National Institute of Electroniques and Computer Systems, Lisbon, PORTUGAL
    Description / Summary GeoTime for the NTCIR Workshop 9 is the second evaluation of Geographic and Temporal Information Retrieval called “NTCIR GeoTime”. The focus of this task is on search with Geographic and Temporal constraints. This overview describes the data collections (Japanese and English news stories), topic development, assessment results and lessons learned from this second NTCIR GeoTime task, which combines GIR with time-based search to find specific events in a multilingual collection. Six teams submitted Japanese runs and nine teams submitted English runs. Three teams participated in both Japanese and English.
    Electronic file 01-NTCIR9-OV-GEOTIME-GeyF-2011.pdf (958 Kb)
  • Title LGTE: Sistema aberto de Recuperação de Informação Textual, Geográfica e Temporal. -  
    year 2010
    Conference / Workshop / Magazine Institution II JORNADAS SASIG, Évora, 2-4 November 2009
    Description / Summary This paper presents LGTE (Lucene Geo-Temporal Extensions), a textual, geographic and temporal Information Retrieval (IR) system that extends the open-source Lucene system, a text indexing engine written in Java. LGTE is the engine behind the DIGMAP search service. LGTE can index collections of XML documents and includes a set of utilities for common search engine features, which can easily be tried out in a demo that ships with the package. LGTE also includes a component for building IR evaluation experiments that uses the CLEF/TREC format and provides several utilities, such as linguistic and n-gram stemmers, query expansion models, geographic ranking models [1], and textual ranking models such as Okapi BM25, the language model from the Lucene-LM system, Lucene's Vector Space Model, and the divergence from randomness models from the Terrier system. This paper presents the LGTE architecture and a tutorial on using the tool.
    Electronic file machadoPosterLGTE.pdf (2240 Kb)
  • Title GEOTIME: Experiments with Geo-Temporal Expressions Filtering and Query Expansion at Document and Phrase Context Resolution. -  
    year 2010
    Conference / Workshop / Magazine Institution Proceedings of NTCIR-8 Workshop Meeting, June 15–18, 2010, Tokyo, Japan
    Description / Summary We describe an evaluation experiment on GeoTemporal Document Retrieval created for the GeoTime evaluation task of NTCIR 2010. GeoTemporal Retrieval aims to improve retrieval results using Geographic and Temporal dimensions of relevance. To accomplish that task, systems need to extract geographic and temporal information from the documents, and then explore semantic relations among those dimensions within the documents. Since this is the first time the task is taking place, our aim is to evaluate some basic techniques in order to set research directions for our work. We aim to understand the relevance of temporal and geographic expressions for filtering purposes. The geographic expressions were extracted with Yahoo PlaceMaker, and for temporal expressions we used the TIMEXTAG system. We experimented with techniques at both the overall document and the sentence resolution, as well as one mixed approach. We also used a query expansion mechanism for topics with no filters defined. We used BM25 as the retrieval model and preprocessed the topics with a semi-automatic methodology to create structures that let us build our filters and expansions. We learned that the sentence level is not a very good approach (although we found clues that the paragraph context resolution could probably improve the results) and that filters based on geographic and temporal expressions showed good performance.
  • Title NTCIR-GeoTime Overview: Evaluating Geographic and Temporal Search -  
    year 2010
    Conference / Workshop / Magazine Institution Proceedings of NTCIR-8 Workshop Meeting, June 15–18, 2010, Tokyo, Japan
    Description / Summary For the NTCIR Workshop 8 we organized a Geographic and Temporal Information Retrieval Task called “NTCIR GeoTime”. The focus of this task is on search with Geographic and Temporal constraints. This overview describes the data collections (Japanese and English news stories), topic development, assessment results and lessons learned from the NTCIR GeoTime task, which combines GIR with time-based search to find specific events in a multilingual collection. Eight teams submitted Japanese runs (including three unofficial teams who provided runs to expand the pools) and six teams submitted English runs. One team participated in both Japanese and English.
    Electronic file overviewFredGey.pdf (155 Kb)
  • Title LGTE: Lucene Extensions for Geo-Temporal Information Retrieval -  
    year 2009
    Conference / Workshop / Magazine Institution ECIR/WGII, Toulouse, 2009
    Description / Summary This paper presents LGTE, a set of geo-temporal extensions to the Lucene information retrieval framework, initially developed as part of the DIGMAP project. This paper gives an overview of the functionalities available in LGTE and evaluates the ranking mechanisms proposed for geo-temporal retrieval. That evaluation focuses only on the geographic and text models and was carried out against the GeoCLEF corpus with the 2008 English topics. We assigned to each document a specific geographic region using a geoparser (a text mining tool) and a gazetteer to disambiguate locations (both tools were also developed in the DIGMAP project). We compared different approaches for geographic information retrieval and concluded that the best performance was achieved by a linear combination of a language model together with a custom function for estimating geospatial similarity. We provide the details of our linear parametric model to maximize the results.
    Electronic file machadoECIR.pdf (299 Kb)
  • Title User interface for a geo-temporal search service using DIGMAP components -  
    year 2009
    Conference / Workshop / Magazine Institution ECDL 2009
    Description / Summary This demo presents a user interface for a Geo-Temporal search service built as a follow-up to the DIGMAP project. DIGMAP was a co-funded European Union project on old digitized maps, and deals with resources rich in geographic and temporal information. This search interface followed a mashup approach using existing DIGMAP components: a metadata repository, a text mining tool, a Gazetteer, and a service to generate geographic contextual thumbnails. The Google Maps API is used to provide a friendly and interactive user interface. This demo will present the functionalities of the resulting geo-temporal search engine, whose interface uses Web 2.0 capabilities to provide contextualization in time and space and text clustering.
    Electronic file ECDL2009-poster-LGTE.ppt (4237 Kb)
  • Title Experiments with N-Gram Prefixes on a Multinomial Language Model versus Lucene’s off-the-shelf ranking scheme and Rocchio Query Expansion (TEL@CLEF Monolingual Task) -  
    year 2009
    Conference / Workshop / Magazine Institution Cross Language Evaluation Forum
    Description / Summary We describe our participation in the TEL@CLEF task of the CLEF 2009 ad-hoc track, where we measured the retrieval performance of LGTE, an index engine for Geo-Temporal collections which is mostly based on Lucene, together with extensions for query expansion and multinomial language modelling. We experimented with an N-Gram stemming model to improve on last year's experiments, which consisted of combinations of query expansion, Lucene’s off-the-shelf ranking scheme and the ranking scheme based on multinomial language modelling. The N-Gram stemming model was based on a linear combination of N-Grams, with N between 2 and 5, using weight factors learned from last year's topics and assessments. The Rocchio ranking function was also adapted to implement this N-Gram model. Results show that this stemming technique, together with query expansion and multinomial language modelling, results in increased performance.
    Electronic file machadoTelClef2009Springer.pdf (87 Kb)
  • Title Definição de Pontos de Vista Arquitecturais: um caso de estudo -  
    year 2009
    Conference / Workshop / Magazine Institution 9ª Conferência da Associação Portuguesa de Sistemas de Informação, 28-30 October 2009
    Description / Summary Quality management at the Instituto Politécnico de Portalegre (IPP) is a continuous improvement process involving groups of professionals from all of the institute's schools. The analysis groups designed and manage a common enterprise architecture consisting of the process models and the information they require, with the goal of feeding the organizational performance indicators defined in the Balanced Scorecard. However, the enterprise architecture, divided into information architecture, processes and indicators, is structured in text documents that lack detail and are misaligned. These deficiencies make any automatic extraction of views impossible. A view is a representation of the organization that captures and presents the concerns of a stakeholder. In this sense, views ease the process of analysing and updating the architecture, which should increase the institution's performance. This paper presents, first, the problems in the current IPP architecture; second, the process proposed for reformulating the enterprise architecture and aligning the specifications with the reality of the IPP; third, a UML model to represent the reformulated architecture; and fourth, a mechanism for creating viewpoints, defined together with those stakeholders, from the UML model and a set of XQuery libraries.
    Electronic file machadoCAPSI2009final.pdf (561 Kb)
  • Title Experiments on a Multinomial Language Model versus Lucene’s off-the-shelf ranking scheme and Rocchio Query Expansion (TEL@CLEF Monolingual Task) -  
    year 2008
    Conference / Workshop / Magazine Institution ECDL/CLEF, Aarhus, in Springer LNCS proceedings, 2008
    Description / Summary We describe our participation in the TEL@CLEF task of the CLEF 2008 ad-hoc track, where we measured the retrieval performance of the IR service that is currently under development as part of the DIGMAP project. DIGMAP’s IR service is mostly based on Lucene, together with extensions for using query expansion and multinomial language modelling. In our runs, we experimented with combinations of query expansion, Lucene’s off-the-shelf ranking scheme and the ranking scheme based on multinomial language modelling. Results show that query expansion and multinomial language modelling both result in increased performance.
    Electronic file machadoTelClef2008Springer.pdf (99 Kb)
  • Title MITRA: Uma Solução para Serviços de Pesquisa em Intranets. -  
    year 2007
    Conference / Workshop / Magazine Institution XATA 2007, FCUL, Lisbon, 15-16 February 2007.
    Description / Summary This paper describes the MITRA system, a solution for indexing online content and complementary descriptive metadata encoded in any XML schema. This capability makes the system an ideal solution for specialized search services in intranets. MITRA is based on a five-layer architecture. The first layer harvests content and can be implemented by external systems or by specialized resource transfer systems, such as structured local archives. The second layer builds inverted indexes of the harvested content and metadata (using the LUCENE system). The third layer manages the semantic relations and the associations between metadata and resources. A very recent fourth layer enables the implementation of a domain analysis methodology. The last layer, the presentation layer, receives structured searches in HTTP requests and responds in XML or HTML, depending on whether XSLs are used. The internal metadata representation schema is Dublin Core, which lets MITRA naturally provide an SRU/SRW interface, but other schemas can also be configured. MITRA thus combines the power of free content indexing with the power of structured metadata processing, offering the best of both worlds. The solution supports several services in production, reported in the text.
    Electronic file 10.pdf (693 Kb)
  • Title SPEAK: SEARCH PROCESS OF ENGINEERING FOR ASSIMILATION OF KNOWLEDGE A GENERIC FRAMEWORK TO SEARCH IN METADATA -  
    year 2006
    Conference / Workshop / Magazine Institution ISBN: 972-8924-13-5 at IADIS Virtual Multi Conference 2006
    Description / Summary
    Electronic file 200603L023.pdf (409 Kb)
  • Title Project Markup Language (PML) Schema Proposal -  
    year 2006
    Conference / Workshop / Magazine Institution XATA 2006, Portalegre
    Description / Summary In this paper we present the steps followed to make a proposal for a Project Markup Language (PML). PML is intended for use in project management solutions, like GPRM (Global Project for Research Management) [1]. PML is a markup language for Project Management Servers (like Microsoft Project Server/EPM Servers [10], Global Project Management/GPM Servers or GPRM Server [1]). The main purpose of PML is to establish a standard model for project information that can be used across the various Project Management Applications and Servers. With it, search and retrieval index engines (like SPEAK) can be used to communicate freely between different Project Servers and Applications. This paper focuses on the language features and presentation scheme designed for Project Management.
    Electronic file 54.pdf (345 Kb)
  • Title DEPTAL a Framework for Institutional Repositories -  
    year 2005
    Conference / Workshop / Magazine Institution DELOS Workshop. Heraklion, Greece, 2005.
    Description / Summary This paper describes DEPTAL, an open and flexible framework for institutional repositories that reuses open-source technology. DEPTAL is a collection-centric system that manages collections of documents in multiple copies and types. It can also manage users and groups of users, it supports authority control (subjects, authors, etc.), and it can interoperate with other systems through interfaces such as OAI-PMH, Z39.50, SRU, web services, etc. DEPTAL recognizes descriptive metadata such as UNIMARC and Dublin Core, and organizes the information objects as HTML sites, with descriptions in the METS structural schema, making it very easy to back up and export those objects. For searching, it interoperates with MITRA, a search engine based on LUCENE, which was extended with new features to index not only the full content but also the structured metadata.
    Electronic file borbinha.pdf (129 Kb)