
The Laboratory for Information Analysis and Knowledge Mobilization

 

Frontiers of Information Analysis

Opinion mining on healthcare topics from open sources
Computational Study of Natural Language
Understanding and Applying Open Data
Advancements in Intelligent Proactive Systems
Primary care information systems with Chinese language support
Questi: a declarative, pattern-based query language

Opinion mining on healthcare topics from open sources

Researchers:    Stan Matwin, University of Ottawa
                Aijun An, York University

 

Opinions relevant to health can be found almost everywhere on the Web: news feeds (Google, Reuters, Yahoo), social networks (Twitter, Facebook, LinkedIn), newspaper sites, blogs, etc. We can now collect and analyze the opinions of others at a scale hitherto unachievable. However, because such reviews exist in very large quantities, it is costly, or even impossible, for users to read and analyze them on their own [KBA11]. There is therefore a need for automatic sentiment analysis (also known as opinion mining): a computational technique that seeks to understand and explain opinion and sentiment by analyzing large amounts of opinion data efficiently enough to assist human decision making [KBA11][L10]. Notably, little opinion mining has been done with health-related data, although there is a social need for such information about treatments, healthcare providers (e.g. Rate My Physician), drugs, etc.

One of the crucial aspects of mining opinions and reviews is the ability to synthesize automated results concerning lexical information with structural ones. On the one hand, evaluating opinions and their reliability depends on linguistic information, such as the style of language used, the sophistication of the vocabulary, and linguistic correctness. On the other hand, the structural information in which opinion reliability is grounded concerns, e.g., references from other reviews, the popularity/credibility of the opinion author, his/her authority and influence on the community, and the number of visits and readings of a particular opinion. The next step in mining health-related reviews may be adapting the collected and processed information of these two kinds (lexical and structural) to the needs and preferences of a specific user.

Another requirement in the analysis of reviews is to rank the opinions (with regard to their scores and the level of confidence a specific user places in the opinion giver). Furthermore, opinion ranks and importance may vary with particular users and their tastes (personalization). Recommendation systems are important tools that use (to some extent) consolidated data; they also try to exploit knowledge about the specific user, his/her tastes, and measures of similarity between users.

In general, every opinion-giving post (text) consists of objective facts about the target and holder of the opinion (usually also the time the opinion was expressed), as well as the subjective opinions/emotions expressed about the targeted objects, their components and attributes (their features) [L10].

Sentiment mining used for monitoring purposes usually relies on superficial methods, and the results are correspondingly superficial, serving mainly statistics and shallow business analysis (e.g. product reviews). Nonetheless, there is growing interest in exhaustive, deep mining of opinion texts, which gives insight into the nature of emotions and their context, or extended knowledge about the holders of opinion and their reasons for giving particular opinions (actionable insights) [CC09][SC08]. Accordingly, the data expressed in different kinds of opinions may be divided into subjective and objective information carriers, and opinion-giving phrases may carry different intensities of positive or negative tone [BES10][W10]. There is some work on automatic and semi-automatic generation and scoring of sentiment lexicons, e.g. [Liu09][BES10].
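To make the lexicon idea concrete, here is a minimal sketch of lexicon-based scoring with graded intensities, in the spirit of SentiWordNet [BES10]; the lexicon entries and scores below are invented for illustration and are not taken from any of the cited resources.

```python
# Minimal lexicon-based sentiment scorer with graded intensities,
# in the spirit of SentiWordNet [BES10]. The lexicon below is a toy
# stand-in; a real system would load a scored resource.

TOY_LEXICON = {          # word -> (positive, negative) scores in [0, 1]
    "effective": (0.8, 0.0),
    "helpful":   (0.6, 0.0),
    "painful":   (0.0, 0.7),
    "useless":   (0.0, 0.9),
}

def score_review(text: str) -> float:
    """Return an overall polarity in [-1, 1]: mean of (pos - neg) over hits."""
    hits = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    if not hits:
        return 0.0  # no opinion-bearing words found: treat as neutral
    return sum(p - n for p, n in hits) / len(hits)

print(score_review("The treatment was effective but recovery was painful"))
# -> 0.05 (weak positive: +0.8 and -0.7 average out)
```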

Such deep knowledge can be captured using semantic technologies, particularly by modeling the domain with ontologies. Specifically for health-related data, some relevant ontologies (discussed elsewhere in this application) are available.

The following steps describe our approach and method (a compact sketch of the trust-weighted scoring and ranking at the core of steps 5-10 follows the list):

I. OPINION MINING module:

1)  OPINION HARVESTING
2)  LINGUISTIC processing
3)  Extracting OBJECTIVE and SUBJECTIVE information: OBJECTIVE information concerns medical conditions, their described features and the opinion givers (identified with classifiers using external resources, e.g. PubMed, Wikipedia, ontologies, etc.); SUBJECTIVE information is the opinions themselves
4)  SENTIMENT ANALYSIS / VALUATION of the gathered information

II. STRUCTURAL TRUST MEASUREMENT module:

5)  HARVESTING expertise and reliability MEASURES using external resources, e.g. ranks of opinion givers from Web portals (possibly using crowdsourcing methods to collect ranks of opinion givers, as in [BGC10])
6)  COLLECTING STRUCTURAL INFORMATION for trust measurement, e.g. linkage in social networks, the opinion-citation network (references from other reviews), the popularity of the opinion author and his/her authority and influence on the community, and the number of visits and readings of a particular opinion
7)  PROCESSING the collected STRUCTURAL INFORMATION for trust measurement

III. OPINION INFORMATION SYNTHESIS & PERSONALIZATION module:

8)  PROFILING USERS - measuring and modeling users and their preferences, e.g. by comparison with friends in discussion networks
9)  TRUST MEASURING - assessing trust with respect to particular user profiles, and lexical information for trust measurement
10) SCORING AND RANKING
    a. Weighting opinions by trust, and scoring the opined-about entities (e.g. treatments, providers) with regard to user profiles
    b. Ranking by scores and by the level of confidence a specific user places in the opinion givers
11) OPINION DYNAMICS AND INFLUENCERS
    a. Discovering the leading influencers for particular opinions and trends - measuring opinion givers' impact on a particular opinion and score (simply put, who generates many opinions on particular subjects)
    b. Measuring the dynamics of reviews of a particular entity: their attention and the evolution of the expressed opinion over time (e.g. as in [YL11])
12) RECOMMENDATIONS based on a user's automated profile and history
13) VISUALIZATION - presenting information, reviews, ranks and statistics, and visualizing how opinions change over time (e.g. tag clouds for reviews of a particular entity, charts showing overall sentiment)
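As a compact illustration of how steps 5 through 10 fit together, the following sketch weights each opinion's sentiment by a structural trust score for its author and ranks the opined-about entities. All data, field names and weights are hypothetical.

```python
# Hedged sketch of steps 5-10: weight each opinion's sentiment by a
# structural trust score for its author, then rank the opined-about
# entities. All numbers and names here are invented.

from collections import defaultdict

opinions = [  # (entity, author, sentiment in [-1, 1])
    ("clinic_A", "alice", 0.9),
    ("clinic_A", "bob",  -0.4),
    ("clinic_B", "carol", 0.6),
]

# Structural trust, e.g. derived from citation links, author authority,
# and read counts (step 7); here simply given directly.
trust = {"alice": 0.9, "bob": 0.2, "carol": 0.7}

def rank_entities(opinions, trust):
    """Trust-weighted mean sentiment per entity, highest first."""
    num, den = defaultdict(float), defaultdict(float)
    for entity, author, sentiment in opinions:
        w = trust.get(author, 0.1)  # default trust for unknown authors
        num[entity] += w * sentiment
        den[entity] += w
    scores = {e: num[e] / den[e] for e in num}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_entities(opinions, trust))
# -> [('clinic_A', ~0.66), ('clinic_B', 0.6)]
```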

References

[BGC10] A. Brew, D. Greene, P. Cunningham. Using Crowdsourcing and Active Learning to Track Sentiment in Online Media. In H. Coelho, R. Studer, and M. Wooldridge, editors, ECAI 2010 - 19th European Conference on Artificial Intelligence, pages 145-150, IOS Press, 2010
 
[BES10] Stefano Baccianella, Andrea Esuli and Fabrizio Sebastiani. SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. In Proceedings of LREC 2010, 7th Conference on Language Resources and Evaluation, Valletta, MT, 2010, pages 2200-2204.

 
[CC09] Yejin Choi and Claire Cardie. Adapting a Polarity Lexicon Using Integer Linear Programming for Domain-Specific Sentiment Classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2009.

 
[KBA11] Zied Kechaou, Mohamed Ben Ammar and Adel M. Alimi. Improving e-learning with sentiment analysis of users' opinions. In 2011 IEEE Global Engineering Education Conference (EDUCON) - "Learning Environments and Ecosystems in Engineering Education", pp. 1032-1038.

 
[L10] B. Liu. Sentiment Analysis and Subjectivity. In Handbook of Natural Language Processing, 2nd Edition, N. Indurkhya and F.J. Damerau, eds., 2010.

[Liu09] Jingjing Liu and Stephanie Seneff. Review Sentiment Scoring via a Parse-and-Paraphrase Paradigm. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing.

 
[SC08] Veselin Stoyanov and Claire Cardie. Topic Identification for Fine-Grained Opinion Analysis. In Proceedings of the Conference on Computational Linguistics (COLING 2008), 2008.

 
[W10] Aleksander Wawer. Is Sentiment a Property of Synsets? Evaluating Resources for Sentiment Classification using Machine Learning. In Proceedings of LREC 2010, pp. 1101-1104.

 
[ZC09] Taras Zagibalov and John Carroll. Multilingual opinion holder and target extraction using knowledge-poor techniques. In Proceedings of the Language and Technology Conference, 2009.

 

Computational Study of Natural Language

Researchers:    Sheila Embleton, Distinguished Research Professor of Linguistics, York University
                Dorin Uritescu, French Studies, Glendon College, York University
                Eric S. Wheeler, Adjunct Professor, School of Information Technology and Department of Linguistics, York University

In the computational approach to the study of human languages, we use information technology on the one hand for its power in processing data with sophistication and speed, and on the other for its logical consistency and ease of repetition in modelling language. In different sub-projects, we have digitized large sets of dialect data and applied sophisticated statistical methods to uncover new understanding of how dialects vary; we have also created mathematical models of language change and of the semantics of language, and implemented them as computer programs.

Online Dialect Atlas

RODA: The SSHRC-funded Romanian Online Dialect Atlas (RODA) project has digitized three of the five volumes of a hardcopy atlas of dialects in north-west Romania, and has developed a sophisticated tool for accessing the data, presenting user-selected subsets of it, along with user interpretations, as dynamically created maps.

Currently, we plan to expand the RODA interface to permit even more sophisticated searches, and to digitize additional data. In addition to Romanian, we have worked on extensive collections of Finnish and English dialect data, and we have shared our technology with other groups working on Romanian, Finnish and other languages.
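Our published analyses rely on multidimensional scaling over site-by-site dissimilarities (see the Embleton, Uritescu and Wheeler papers below). The sketch that follows illustrates that style of analysis on invented data; the binary feature coding and Hamming distance are simplifying assumptions, not the actual RODA feature set.

```python
# Hedged sketch of the dialectometric approach: compute a pairwise
# dissimilarity matrix over survey sites, then project it to 2-D with
# multidimensional scaling. Feature data here are invented.

import numpy as np
from sklearn.manifold import MDS

# Rows: survey sites; columns: binary dialect features (1 = variant present).
sites = ["site1", "site2", "site3", "site4"]
features = np.array([
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
])

# Relative Hamming distance between each pair of sites.
n = len(sites)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        dist[i, j] = np.mean(features[i] != features[j])

coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)
for name, (x, y) in zip(sites, coords):
    print(f"{name}: ({x:+.2f}, {y:+.2f})")  # nearby points = similar dialects
```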

This project is the subject of an SSHRC Insight Grant application for 2012-2015, and has identified milestones in that time frame for delivering new technology and data, and for communicating it broadly to both professional and general audiences.

Embleton, Sheila, Dorin Uritescu and Eric Wheeler. 2008c. Identifying Dialect Regions: Specific Features vs. Overall Measures Using the Romanian Online Dialect Atlas and Multidimensional Scaling. Methods XIII Conference, Leeds, UK, August 2008. In Barry Heselwood and Clive Upton, eds. 2009. Proceedings of Methods XIII: Papers from the Thirteenth International Conference on Methods in Dialectology, 2008. Frankfurt am Main: Peter Lang, pp. 79-90.
Embleton, Sheila, Dorin Uritescu and Eric Wheeler. 2011b. Defining Dialect Regions with Interpretations: Advancing the Multidimensional Scaling Approach. Presented at the Methods in Dialectology 14 Conference, London, Canada, August 2011; to be published in the conference proceedings.

FODA: Beyond the scope of the RODA project, we hope to extend the technology to an extensive set of Finnish dialect data (the Finnish Online Dialect Atlas, or FODA). While the data has been digitized, some final editing remains, along with adaptation to the RODA technology and publication of both the data and the associated findings. This project has had SSHRC funding but is currently unfunded.

Embleton, Sheila and Eric S. Wheeler. 2000. Computerized Dialect Atlas of Finnish: Dealing with Ambiguity. Journal of Quantitative Linguistics 7.3, pp. 227-231.

Mathematical Modeling

Wheeler has created a mathematical model of language change in a communications network and produced some theoretical results. The model now needs to be applied to specific data situations; there are possibilities to test it on existing data sets for Acadian French, and the next step is to explore these. This project is currently not funded.
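The following toy simulation is illustrative only and is not Wheeler's model; it merely shows the kind of question such a model addresses: whether a variant spreads or dies out depending on the structure of the communication network.

```python
# Illustrative only -- not Wheeler's actual model. A toy simulation of a
# linguistic variant spreading over a communication network: at each step,
# a random speaker adopts the majority variant among their contacts.

import random

random.seed(1)
edges = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
variant = {0: "new", 1: "old", 2: "old", 3: "old", 4: "old"}

for step in range(50):
    speaker = random.choice(list(edges))
    contacts = edges[speaker]
    new_share = sum(variant[c] == "new" for c in contacts) / len(contacts)
    if new_share > 0.5:            # adopt the contacts' majority form
        variant[speaker] = "new"
    elif new_share < 0.5:
        variant[speaker] = "old"

print(variant)  # network structure decides whether "new" spreads or dies out
```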

Wheeler, Eric S. 2007. Language Change in a Communication Network. In Exact Methods in the Study of Language and Text (Quantitative Linguistics, 62), Dedicated to Gabriel Altmann on the Occasion of His 75th Birthday, Peter Grzybek and Reinhard Köhler, eds. Berlin and New York: Mouton de Gruyter, pp. 689-698.

Computational Modeling

Theatre of the Mind is a system to display an animated interpretation of a natural language text on a virtual stage: given "Jack and Jill went up the hill...", the system shows animated characters going up a hill. To be general, the system needs a robust natural language processor that integrates a wide range of linguistic theories, from basic syntax to semantics, pragmatics and discourse analysis.

A prototype system is operational. The next milestone is to create a thousand-word lexicon and grammar with a suitable semantics. The focus here is on developing linguistic sophistication rather than animation techniques. The project is currently active, but not externally funded.
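As a hedged sketch of the idea (not the prototype's actual representation), the following maps an already-parsed semantic frame for "Jack and Jill went up the hill" to simple stage directions; the frame format and command vocabulary are invented for illustration.

```python
# Hedged sketch of the Theatre-of-the-Mind pipeline's final step: map a
# (pre-parsed) semantic frame to stage directions. Real parsing, semantics
# and animation are out of scope; this frame structure is assumed, not the
# project's actual representation.

clause = {"agents": ["Jack", "Jill"], "action": "go", "path": "up", "goal": "hill"}

def stage_directions(clause):
    """Turn a semantic frame into a list of animation commands."""
    commands = [f"PLACE {agent} AT stage-left" for agent in clause["agents"]]
    for agent in clause["agents"]:
        commands.append(f"MOVE {agent} {clause['path'].upper()} TO {clause['goal']}")
    return commands

for cmd in stage_directions(clause):
    print(cmd)
# PLACE Jack AT stage-left ... MOVE Jill UP TO hill
```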

Wheeler, Eric S. 2009. Theatre of the Mind: A Project to Animate the Language of Thought and Communication. e-Learning 6.3, Special Edition, September 2009.
Wheeler, Eric S. 2009. Visualizing Language. Presentation to the International Linguistic Association, New York, April 2009.

Understanding and Applying Open Data

Researchers:    Sara Diamond, OCAD University, and Barbara Crowe, York University
                Fanny Chevalier, Patricio Davila, Martha Ladly, Vera Roberts, Jutta Treviranus, OCAD University
                Michael Longford, York University

Partners:            Mozilla Foundation, IBM, ReSRC, Google

In creating a major centre for data analysis, it will be necessary to undertake research on the open data movement, appropriate open data policy, and the value and limits of open data and its applications. Our research will approach this complex phenomenon from social science, humanities and computational perspectives. The movement towards "open data" has accelerated over the last two decades, amplified by the rise of social media, and has gained momentum with governments, some members of the research community, some business advocates, large organizations such as Mozilla[1] and Google, and citizens' movements.[2] Advocates of open data propose that restrictions on access to data hold back discovery, while opponents suggest that competition is dampened when all developers have access to the same databases.[3] The meaning of "open data" is loose, usually implying access to all manner of raw data sets for the benefit of public information, democratic engagement or discovery. The data sets in question range from municipal, provincial and federal data sets on the economy, urban conditions and districts, to non-textual data such as maps, genomes, formulae, and medical or biological data. In the context of politics, the open data movement focuses on publicly held databases; this detail of everyday life is argued to enable transparent government, while also encouraging individuals and companies to build applications, services and experiences using the data.

Governments around the world are increasingly providing access to their data. The federal government's Digital Economy Strategy, released May 10, 2010, included the statement, "Governments can help by making publicly-funded research data more readily accessible to Canadian researchers and businesses."[4] A Canadian open data 12-month pilot launched on March 17, 2011.[5] It made geospatial data from Natural Resources Canada available at no cost, as well as data collections from Environment Canada. The Ministry of Research and Innovation of Ontario funded the Regional Strategic Resource Centre Program (ReSRC), an open data portal at MaRS Innovation whose goal is to strengthen innovation in the region. As stated: "By sharing and integrating disparate sets of data – often collected in institutional silos – from government, academia as well as the private and non-profit sectors, we will better understand the unique strengths, opportunities and needs of our communities and can more effectively work together to build vibrant, productive regional innovation economies."[6] LIAKM will collaborate with ReSRC to provide overall capacity in data extraction and analysis. We will link the LIAKM provincial focus with comprehensive research on open data.

Perhaps the governments most engaged in open data are municipalities, with many cities running Open Data Day projects and encouraging developers to use their data to create all manner of applications. Toronto, Edmonton, Ottawa and Vancouver have joined forces to develop an open data framework. Open data available through the City of Toronto includes ward profiles, GIS data, TTC data, employment districts, committee of adjustment decisions, water use and hydrant placement.[7] Innovative partnerships have emerged to develop structures for the use of open data. For example, FutureEverything, an art and technology hub, has been funded to "lead the city of Manchester's transition to an Open Data framework, a major policy initiative which in most cities is led by the mayor's office".[8] Their initiative has included significant investment in a framework, application development and a series of design and art projects that make use of Manchester's data in concert with the larger community – providing a model for the research that LIAKM will undertake. Open data projects at times go hand in hand with crowdsourcing. Rachel Sterne, the City of New York's Chief Digital Officer, has created a bureau that uses all manner of social media, whether Facebook, Twitter, QR codes or mobile applications, to crowdsource information on the city's challenges and find solutions to them.[9] European governments have asked designers, developers, journalists, researchers and the general public to develop ideas, applications and visualizations, as well as to contribute additional relevant data sets, through the European Open Data Challenge.[10]

LIAKM researchers have a track record of investigating the possibilities of open data and developing prescient applications for such data. The Mobile Digital Commons Network, co-led by Michael Longford (York University) and Sara Diamond (OCAD University) and including researchers Barbara Crowe (York University) and Martha Ladly (OCAD University),[11] developed strategies to open mobile networks and data for public access and use, along with a wide-ranging series of demonstration applications. Recent research entitled Taking Ontario Mobile, led by Sara Diamond and Vera Roberts, has included an investigation of the future of mobility and its reliance on access to open data.[12] Sara Diamond has investigated the historical debates around, and the emergence of, the open data movement.[13] LIAKM research will consider the following:

  1. What are appropriate protocols for open data access?

  2. What databases, finding aids and algorithms are necessary to access open data?

  3. What are effective means of engaging publics in analysis and application of open data?

  4. What policies provide appropriate tools for access and protection for open data?

  5. How can open data be used to stimulate entrepreneurial activity?

  6. What role can data visualization play in providing tools to analyze open data?

  7. How can art and design engage open data to provide new understandings of the phenomena, processes and structures represented by open data?

[2] How Can We Build a City that Thinks Like a Web? Sara Diamond, Cory Doctorow, Mark Surman, Dan Misener. Subtle Technologies, 2011. http://www.subtletechnologies.com/wp-content/uploads/2011/05/Full-Festival-Program-2011-part-2.pdf

[5] http://data.gc.ca

[10] http://www.Opendatachallenge.org

[11]  See Mobile Nation: Creating Methodologies for Mobile Platforms, Edited by Martha Ladly and Philip Beesley, Waterloo: Riverside Architectural Press, 2008 and  The Wireless Spectrum: The Politics, Practices and Poetics of Mobile Communications, edited with Michael Longford and Kim Sawchuk, Toronto: University of Toronto Press, 2010.

[12] Taking Ontario Mobile, Sara Diamond and Vera Roberts, Toronto: OCAD University, (in press).

[13] Euphoria and Dystopia: The Banff New Media Institute Dialogues, 1995 – 2005. Edited by Sarah Cook and Sara Diamond. Banff Centre Press and Riverside Architectural Press, Banff/Toronto: 2011.

Advancements in Intelligent Proactive Systems

Researchers:    Ken Ono, NexJ Systems
                Nick Cercone, Zhenmei Gu, York University
                Diane Inkpen, University of Ottawa

Partners:       NexJ Systems

Large repositories of unstructured textual data and structured relational data exist in many businesses, for example in integrated CRM (Customer Relationship Management) systems. Examples of such textual data include emails, chat history, internal chats and messages about a customer, call-centre records, pathology reports and doctors' notes. External unstructured data is also available from business reports, medical reports and social media feeds. Examples of structured data include activities (e.g. appointment types, task types, dates, times), financial holdings, medical test results (blood sugar levels, blood pressure, cholesterol, etc.), and fitness information (steps, distance, calories).

Free-form notes are frequently used because they can easily capture many types of information without prior knowledge of the subject. Although these textual data likely carry much useful information, their use is limited as long as the data remain unstructured, and such potentially informative text is difficult to exploit with traditional Business Intelligence (BI) techniques like data mining. Attention has therefore been drawn to the need for text mining to improve BI [1]. Technology from Natural Language Processing (NLP), especially Information Extraction (IE), helps here by automatically processing texts, extracting specific information from them, and making that information easily accessible (usually in a structured format) and ready to integrate into an existing BI system.
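As a minimal illustration of the kind of extraction meant here, the sketch below pulls a few structured fields out of a free-form clinical note with regular expressions; the note and patterns are invented, and a production system would use a full NLP/IE pipeline rather than hand-written regexes.

```python
# Minimal, regex-based sketch of information extraction: pull structured
# fields out of a free-form clinical note. The note and patterns are
# invented for illustration only.

import re

note = "Pt seen 2012-03-14. BP 140/90, blood sugar 7.8 mmol/L. Follow up in 2 weeks."

patterns = {
    "date":           r"(\d{4}-\d{2}-\d{2})",
    "blood_pressure": r"BP\s+(\d{2,3}/\d{2,3})",
    "blood_sugar":    r"blood sugar\s+([\d.]+)\s*mmol/L",
}

record = {}
for field, pattern in patterns.items():
    m = re.search(pattern, note)
    if m:
        record[field] = m.group(1)

print(record)
# {'date': '2012-03-14', 'blood_pressure': '140/90', 'blood_sugar': '7.8'}
```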

There are three primary goals of the proposed research:

  • Find better ways to convey large amounts of free-form and structured information about a person (or group of people) to a user of a CRM or patient-centered system

  • Derive structure from free-form information and integrate it with structured data, to prepare it for machine learning tools and automated proactive interactions or interaction suggestions

  • Use techniques such as data mining, text mining, rules and self-learning agents to determine optimal next best actions.

In current systems, users need to browse through many entries to find specific customer information buried in the texts and structured data. Even if all of a person's health records were available to a doctor, it would take too long to understand the key issues by reading each individual journal entry, test result and message. In the financial services realm, the abstract problem is the same, but the subject matter changes to emails, call notes and financial holdings.

With the amount of available data growing, the problem of information overload must be ameliorated.

A key focus of this proposal is to find better ways to extract and summarize information from the textual data. This capability will be leveraged to improve how such texts are conveyed to end users. In the simplest form, the extracted and summarized text could be displayed as text (i.e. displaying summaries instead of raw text), but we also seek graphical ways of conveying information.
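A hedged sketch of the simplest form of such summarization follows: score each sentence by the frequency of its content words and keep the top-scoring ones. The stop-word list and sample note are invented; real systems would add far more linguistic signal.

```python
# Hedged sketch of frequency-based extractive summarization: score each
# sentence by the frequency of its content words, keep the top ones.

import re
from collections import Counter

STOP = {"the", "a", "of", "and", "to", "was", "is", "in", "for"}

def summarize(text: str, n_sentences: int = 1) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]
    freq = Counter(words)
    def score(s):
        toks = [w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOP]
        return sum(freq[t] for t in toks) / (len(toks) or 1)
    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return " ".join(s for s in sentences if s in top)  # keep original order

print(summarize("The client called about fees. Fees were raised last month. "
                "The advisor promised a fee review next week."))
# -> "The client called about fees." (highest average word frequency)
```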

It is also important to integrate textual and structured information. A goal of the system is to convey or act on the most important information regardless of its source. For example, in a financial context, if a customer's holdings dropped sharply in value, or an email arrived notifying the advisor of a divorce or death, this information would need to be highlighted above other, less relevant information. A simple rules-based system may create too much clutter.

The integrated information from text and relational sources can also be used to create predictive models and detect relevant situations. These models and situation detectors can be used to compute next best actions. In a financial context, this could relate to product suggestions (e.g. detecting a young-family situation and recommending an RESP over all other possibilities). In a medical scenario, it could suggest the next step in a prescribed care plan or warn of a diversion from known best practices.
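To illustrate, with entirely invented rules and data, how weighted triggers over mixed structured and text-derived signals could rank next best actions without the clutter of a flat rule list:

```python
# Hedged sketch of rule-scored "next best action": combine structured
# signals and text-derived flags into weighted triggers, so the strongest
# actions surface first. All rules and fields here are invented.

customer = {
    "portfolio_change_pct": -12.0,        # structured: holdings down sharply
    "text_flags": {"divorce_mentioned"},  # extracted from recent emails
    "has_resp": False,
    "children_under_18": 2,
}

RULES = [  # (condition, action, weight)
    (lambda c: c["portfolio_change_pct"] < -10, "call about portfolio drop", 0.9),
    (lambda c: "divorce_mentioned" in c["text_flags"], "review beneficiary info", 0.8),
    (lambda c: c["children_under_18"] > 0 and not c["has_resp"], "suggest RESP", 0.5),
]

def next_best_actions(customer, threshold=0.4):
    fired = [(w, a) for cond, a, w in RULES if cond(customer)]
    return [a for w, a in sorted(fired, reverse=True) if w >= threshold]

print(next_best_actions(customer))
# ['call about portfolio drop', 'review beneficiary info', 'suggest RESP']
```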

Furthermore, quantified sentiment of the text could be conveyed to end users or used in predictive models.

In the shorter term, we propose a series of research investigations, the combination of which will support achievement of the primary goals. These shorter-term research investigations will guide future steps based on their estimated impact, market differentiators and estimated commercialization costs.

Primary care information systems with Chinese language support

Researchers:    Nick Cercone, Zhenmei Gu, York University
                Weidong Yu, Empress

Partners:       Empress Systems, Inc.

The primary care information system will provide operational, clinical, and research capabilities for the physicians and staff who use it. Operationally, it will provide registration, appointment booking, billing, consultant referrals, and various reporting functions. Clinically, it will provide a full cumulative patient profile, which will include ongoing conditions, treatment regimen, history, allergies, consultant lists, personal and family data, pediatric prevention, adult prevention, lab results, and various disease-specific modules. In addition, it will provide prescription ordering, which can identify drug/drug interactions in real time based on the drug in question and the treatment regimen of the patient. To improve recordkeeping, this clinical record will also allow the physician to enter progress notes, either handwritten or dictated through voice input, directly into the record. On the research side, the system will allow aggregate and specific analysis of all operational and clinical data to satisfy the needs of the physicians in question. Again, this will be done by means of Empress.
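As an illustration of the real-time interaction check described above, the following sketch compares a newly ordered drug against the patient's current regimen; the interaction table is a tiny invented stand-in for a clinical knowledge base.

```python
# Hedged sketch of a real-time drug/drug interaction check: before a new
# prescription is ordered, compare it against the patient's regimen.
# The interaction table below is invented for illustration.

INTERACTIONS = {  # unordered drug pairs known to interact
    frozenset({"warfarin", "aspirin"}): "increased bleeding risk",
    frozenset({"metformin", "contrast_dye"}): "lactic acidosis risk",
}

def check_new_prescription(new_drug, current_regimen):
    """Return warnings for the new drug against each drug already taken."""
    warnings = []
    for drug in current_regimen:
        risk = INTERACTIONS.get(frozenset({new_drug, drug}))
        if risk:
            warnings.append(f"{new_drug} + {drug}: {risk}")
    return warnings

print(check_new_prescription("aspirin", ["warfarin", "metformin"]))
# ['aspirin + warfarin: increased bleeding risk']
```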

The research component is especially important, since these are academic health science centres. Examples would be the tracking of treatments for diabetic as well as hypertensive patients.

 

Questi: a declarative, pattern-based query language

Researchers:      Parke Godfrey, Jarek Gryz, Xiaohua Yu, York University  

Partners:            IBM CAS

We propose to develop a declarative, pattern-based query language, Questi (Italian for "these"), for exploration and extraction over XML collections. Our aim is that Questi queries and transformations be easy for lay users to compose and refine, yet be formally understood, with a clear semantics. A Questi evaluation would comb large, heterogeneous XML collections to transform them into increasingly structured, schema-uniform tables. In the limit, the transformations lead to lists of "answers", query exemplars.

Support technologies for XML have matured well, including formal query languages such as XQuery and XSLT, and bridge languages for relational databases such as SQL/XML within SQL. For information retrieval, there has been a wide research effort on keyword search over XML collections, to identify and rank best-matching twigs (see [1]). Little has been done, however, on simpler, more flexible pattern-based query languages (along the lines of UnQL [2] and Xcerpt [3]) that could be used to explore large collections iteratively and interactively. We feel the need for such tools is keen, and that XML technology is mature enough to pursue this with success.

We propose to develop a formal transformation language over SQL/XML, called Balance, as the core of Questi. Balance is to operate over relational tables with XML columns. At one extreme is a simple table of two columns: an ID column, and an XML column which, in aggregate, stores a collection of XML documents. At the other extreme are tables with many columns, but for which the XML column's values consist only of leaf nodes with no deeper structure. Balance queries will offer transformations over such tables, both to "schematize" the XML data (by extracting parts into columns) and to "de-serialize" (by folding columns back into the XML structure). We will develop an algebra for Balance that preserves lossless transformations.
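Since Balance does not yet exist, the following is only an illustrative sketch of the two transformation directions, using plain Python over an (id, XML) table: "schematize" extracts leaf values into columns, and "de-serialize" folds them back, round-tripping losslessly on this simple example.

```python
# Illustrative sketch (not Balance itself) of the two directions described
# above: "schematize" pulls XML leaf values out into named columns, and
# "de-serialize" folds columns back into XML.

import xml.etree.ElementTree as ET

table = [  # the simple extreme: (id, xml_document) rows
    (1, "<patient><name>Ann</name><age>54</age></patient>"),
    (2, "<patient><name>Bo</name><age>61</age></patient>"),
]

def schematize(table, paths):
    """Extract the given child paths into columns."""
    out = []
    for row_id, doc in table:
        root = ET.fromstring(doc)
        cols = []
        for path in paths:
            node = root.find(path)
            cols.append(node.text if node is not None else None)
        out.append((row_id, *cols))
    return out

def deserialize(rows, paths, root_tag="patient"):
    """Fold the columns back into an XML document per row."""
    out = []
    for row_id, *values in rows:
        root = ET.Element(root_tag)
        for path, value in zip(paths, values):
            ET.SubElement(root, path).text = value
        out.append((row_id, ET.tostring(root, encoding="unicode")))
    return out

flat = schematize(table, ["name", "age"])
print(flat)                                # [(1, 'Ann', '54'), (2, 'Bo', '61')]
print(deserialize(flat, ["name", "age"]))  # round-trips back to XML
```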

Approaches that bridge the preliminary work done on pattern-based query languages, traditional path-based XML query languages, and information retrieval techniques for keyword search over XML could accomplish this. Our efforts will be case driven: by working with other projects in the LIAKM, we will come to understand their challenges with data curating, transformation, and exploration.

Stages of the work are to be as follows.

  I. The Questi Language
     A. Design the language.
     B. Research operators that should be supported, such as grouping and categorical aggregation.
     C. Design Balance, a transformational language on SQL/XML.
     D. Implement, compiling Questi queries into Balance (SQL/XML) in the back end.

 II. The Questi Cache
     A. Cache resulting Questi evaluations (as Balance transformations) into a local relational-XML hybrid system, to facilitate efficient drill-down, refinement, and transformation.
     B. Research how to index Balance tables for query optimization (see [4] as an example).

III. The Questi System
     A. Design and implement a query-by-example engine using Questi: use positive and negative instances to generate Questi patterns, which, in turn, collect more answers (a small sketch of this loop follows the outline).
     B. Support interactive refinement of Questi queries as part of interactive exploration.
     C. Build in traceability and explanation, so users can easily interpret and validate Questi answers, and trace them back into the source data.
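As a much-simplified sketch of the query-by-example loop in stage III.A (with assumed representations throughout; this is not the planned Questi engine): induce a tag path from a positive example node, then use it as a pattern to collect further answers.

```python
# Hedged sketch of the query-by-example loop: from a positive example
# node, induce the common tag path and match it to collect more answers.
# Far simpler than Questi is meant to be; data and paths are invented.

import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<records>"
    "  <visit><doctor>Smith</doctor><note>flu</note></visit>"
    "  <visit><doctor>Jones</doctor><note>sprain</note></visit>"
    "  <referral><doctor>Lee</doctor></referral>"
    "</records>"
)

def path_of(root, target):
    """Tag path from root to the target element, e.g. 'visit/doctor'."""
    for parent in root.iter():
        for child in parent:
            if child is target:
                prefix = path_of(root, parent)
                return f"{prefix}/{child.tag}" if prefix else child.tag
    return ""

positives = doc.findall("visit/doctor")[:1]       # user marks one answer
pattern = path_of(doc, positives[0])              # induce: 'visit/doctor'
answers = [e.text for e in doc.findall(pattern)]  # match: all such nodes
print(pattern, answers)                           # visit/doctor ['Smith', 'Jones']
```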

We draw on the strengths we have at York University in core database research. Later stages of this work would involve collaboration with industry, such as the IBM Toronto Laboratory, where the DB2 database system and WebSphere are developed. The researchers are in active research collaboration with IBM through various projects supported by IBM's Centre for Advanced Studies.

References

[1] S. Amer-Yahia, R. Baeza-Yates, M.P. Consens, M. Lalmas. XML Retrieval: DB/IR in Theory, Web in Practice (tutorial). In Proceedings of the International Conference on Very Large Data Bases, pp. 1437-1438, 2007.
[2] P. Buneman, M. Fernandez, and D. Suciu. UnQL: A Query Language and Algebra for Semi-structured Data Based on Structural Recursion. The VLDB Journal, 9(1):76-110, 1999.
[3] F. Bry and S. Schaffert. The XML Query Language Xcerpt: Design Principles, Examples, and Semantics. In Proceedings of the 2002 Workshops on Web, Web-Services, and Database Systems. Springer, LNCS 2593, pp. 295-310, 2003.
[4] P. Godfrey, J. Gryz, A. Hoppe, W. Ma, C. Zuzarte. Query Rewrites with Views for XML in DB2. In Proceedings of the IEEE International Conference on Data Engineering, pp. 1339-1350, 2009.
