The Laboratory for Information Analysis and Knowledge Mobilization
Applications-Oriented Informatics Development
Assistive technologies for people with disabilities
Researchers: Nick Cercone, Melanie Baljko, Ron Owston, Deborah Davidson, York University
Assistive technologies (AT) are used by people with disabilities to help them accomplish tasks that they cannot do easily, or at all. Our research focuses on technologies for persons with different types of impairments, including vision, hearing, motor, and mild cognitive impairment, and targets the support of learning and communication in academic learning environments.
Using our early-2000s machine translation research platform, the “generate and repair machine translation” (GRMT) paradigm, we have developed IT3STL, software for text-to-sign-language translation and language learning, as well as SignMT, which translates sign language into text, for the hearing impaired. For visually impaired students we are developing MathReader, software that enables such students to study and practice mathematics independently; MathReader automatically reads aloud equations and formulae containing mathematical symbols, inserting correct prosody where needed to disambiguate math expressions. We are also developing MathMaster to help students with learning disabilities gradually develop the skills required to solve math word problems. For students with moderate-to-severe motor impairment, we have developed a novel Switch-Activated Writing System [9,10,11], which affords learners with severe vision and motor impairment a way to complete written schoolwork. We are developing Software-Control Motor Skills Profiling, a complementary software platform and framework for user profiling as it relates to the operation of input devices and time-sensitive software control.
Thus, our research into AT concentrates on software development for specific goals, with an emphasis on generalizable principles for generating correct and reusable software components for diverse and heterogeneous user populations. Our broad goals for this research are fourfold: (1) give students opportunities to learn and help them learn more effectively; (2) increase the enjoyment of learning; (3) make teaching more enjoyable and rewarding; and (4) make the enabling technologies available, through novel channels, to the students who are the intended beneficiaries. For the first three goals, we will explore techniques and develop advanced software prototypes to aid people with disabilities, focusing on the user populations identified above. For the last goal, we will develop a deployment mechanism whereby innovative techniques from research labs (such as Baljko’s) can be deployed in clinical settings using modes of knowledge mobilization that complement tech transfer and commercialization. There are large diseconomies of scale in commercial AT development that we hypothesize can be circumvented by novel modes of delivery if an “open science” ethos were to be embraced. The project includes significant analysis and innovation relative to Ontario's Assistive Devices Program (ADP) and the existing clinic-client pipelines. Another project of this nature is the EU OATS project, http://www.oatsoft.org/. The analysis work relates to Steinstra's analysis of Canadian ITC as it relates to AT. We will target the Software-Control Motor Skills Profiling project as the pilot project for the knowledge mobilization mode of deployment.
Brief Review & Methods: We will investigate and extend the methods for AT outlined in [2-6, 10-11]. In earlier work we proposed MathReader, a tool that enables visually impaired students to study and practise mathematics independently by automatically reading equations aloud. The initial design requires augmentation to accommodate all forms of mathematical expressions, and several interface issues require solutions, for example how to insert prosody correctly in the generated speech so that math expressions are spoken unambiguously. We will be guided by our research into focus, with an emphasis on tone analysis for prosodic generation. As another example, our language/sign language translation systems [2,3] can be extended to other (sign) languages to promote literacy in the deaf and hearing-impaired community and to narrow communication gaps between the deaf and others. We also developed the Switch-Activated Writing System and conducted an extensive, longitudinal case study. We will extend the system to further exploit lexical and syntactic linguistic properties, as well as to accommodate other types of motor impairment. In addition, the user profiling module will be extended as the basis for advanced adaptive techniques.
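To illustrate why prosody matters when equations are read aloud, the following minimal sketch turns a small expression tree into spoken text and marks pauses around grouped sub-expressions. It is not MathReader's implementation: the `Atom` and `BinOp` classes, the word table, and the `<pause>` marker are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Atom:
    """A variable or number, spoken as-is (hypothetical node type)."""
    value: str


@dataclass
class BinOp:
    """A binary operation over two sub-expressions (hypothetical node type)."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Atom, BinOp]

# Assumed spoken forms for operators and an assumed pause marker.
WORDS = {"+": "plus", "-": "minus", "*": "times", "/": "over"}
PAUSE = "<pause>"


def speak(expr: Expr) -> str:
    """Render an expression as text, with pauses marking grouped operands."""
    if isinstance(expr, Atom):
        return expr.value
    left = speak(expr.left)
    right = speak(expr.right)
    # A compound operand is separated from the operator by a pause,
    # so the listener can hear where the group ends or begins.
    if isinstance(expr.left, BinOp):
        left = f"{left} {PAUSE}"
    if isinstance(expr.right, BinOp):
        right = f"{PAUSE} {right}"
    return f"{left} {WORDS[expr.op]} {right}"


# (a + b) / c  versus  a + (b / c): same words, different pause placement.
grouped_numerator = BinOp("/", BinOp("+", Atom("a"), Atom("b")), Atom("c"))
grouped_fraction = BinOp("+", Atom("a"), BinOp("/", Atom("b"), Atom("c")))
print(speak(grouped_numerator))
print(speak(grouped_fraction))
```

Without the pause markers both expressions would be read as "a plus b over c"; a speech synthesizer could map the marker to a short silence or a pitch change.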
Evaluation plays a crucial role in AT. Three distinct types of evaluation, appropriate to three different goals, will be studied. Adequacy Evaluation determines the system's fitness for purpose; it may be comparative and may require considerable work to identify a user’s needs. Diagnostic Evaluation identifies limitations, errors and deficiencies, which may be corrected or improved by researchers, and whose findings may be offered to end-users as well. Performance Evaluation measures system performance in one or more specific areas. We utilize all three evaluations in our research.
K., and Cercone, N. (2002). Generate and Repair Machine Translation. Computational Intelligence, 18(3), 254-270.
Application of data mining and KDD techniques to environmental impact assessment
Researchers: M. Erechtchoukova, P. Khaiter, York University
Objectives: Environmental impact assessment (EIA) requires predictions of future states of the environment under proposed anthropogenic activities. Traditionally, these predictions have been generated by simulation models. However, the complexity of modern projects calls for very sophisticated, non-linear, high-dimensional simulation models, which creates additional technical obstacles in applying a model and interpreting its results. Innovative methods of data analysis are appealing for EIA problems because they aim to reveal non-trivial patterns in data that can be used to predict future environmental states efficiently. The main goal of the project is to develop a framework for applying data mining and knowledge discovery (KDD) techniques to the quantitative assessment of potential anthropogenic impact on the environment.
Subject matter: The applicability of data mining algorithms for predicting the levels of physico-chemical characteristics in an aquatic environment is investigated. The analysis is based on a data set collected by the Toronto and Region Conservation Authority (TRCA) over a period of 20 years (subject to TRCA’s approval) and on additional data sets collected on different water bodies. Data mining approaches are classified and mapped to the EIA tasks. The challenge lies in developing the criteria for using these techniques in specific problem domains (Gibert et al., 2008). The recommendations on the application of innovative methods of data analysis to EIA problems include a comparison of these methods with the traditional quantitative methods used in EIA, based on the quality of their outcomes and on the information and data sets they require (Erechtchoukova and Khaiter, 2010).
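The kind of comparison described above can be sketched as follows: a simple data-driven predictor (k-nearest-neighbour regression) is scored against a trivial baseline on held-out samples. This is only an illustration under stated assumptions. The two features, the synthetic linear relationship, and the choice of k-NN are invented for the example and do not come from the TRCA data or from the project's actual methods.

```python
import math


def knn_predict(train, query, k=3):
    """Predict a target as the mean of the k nearest training targets.

    train is a list of (features, target) pairs; features are tuples of floats.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def rmse(pairs):
    """Root-mean-square error over (predicted, observed) pairs."""
    return math.sqrt(sum((p - y) ** 2 for p, y in pairs) / len(pairs))


# Synthetic samples: (hypothetical temperature, hypothetical flow) -> concentration.
samples = [
    ((float(t), float(f)), 2.0 + 0.5 * t - 0.1 * f)
    for t in range(0, 20, 2)
    for f in range(0, 50, 10)
]
train, test = samples[::2], samples[1::2]  # simple hold-out split

# Baseline: always predict the mean concentration seen in training.
baseline = sum(y for _, y in train) / len(train)

knn_error = rmse([(knn_predict(train, x), y) for x, y in test])
baseline_error = rmse([(baseline, y) for _, y in test])
print(f"k-NN RMSE: {knn_error:.2f}, baseline RMSE: {baseline_error:.2f}")
```

In practice the features would need scaling and the split would respect time order; the point here is only that a common error criterion lets different methods be compared on the same data.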
Importance of the project: There is growing interest in using data mining algorithms to solve various environmental problems. For example, Artificial Neural Networks have been applied successfully to predict hydrological and weather conditions for many years. At the same time, evaluation and projection of the concentrations of chemical constituents in aquatic and other environments still rely mainly on process-based models, which require a large number of parameters to be identified (Khaiter and Erechtchoukova, 2010). Innovative methods of data analysis have so far been applied to environmental problems on an ad-hoc basis. Systematic application of data mining and KDD techniques to the environmental domain is novel and may have a significant impact on environmental decision making. Choosing the right method for data analysis is a critical step in the KDD process, one that affects both the efficiency of EIA and the quality of its outcomes.
Khaiter P.A., 2010. “Efficiency Criteria for Water Quality Monitoring”. In David A. Swayne, Wanhong Yang, A. A. Voinov, A. Rizzoli, T. Filatova (Eds.), 2010 International Congress on Environmental Modelling and Software: Modelling for Environment’s Sake, Fifth Biennial Meeting, 5 – July 2010, Ottawa, Canada. pp. 1380-1389. ISBN: 978-88-9035-741-1.
Early Stage Breast Cancer Detection
Researchers: Amir Asif, York University
Breast cancer is the second leading cause of cancer death among women, after lung cancer. X-ray mammography is the preferred method for detecting early-stage breast cancer. Although mammography provides high-quality images at low radiation doses in the majority of patients, its inherent limitations are well recognized. Over the years, research has shown that microwave breast cancer imaging has the potential to become a successful clinical complement to conventional X-ray mammography. It is based on the difference between the electromagnetic (EM) properties of malignant tumor tissues and those of normal fatty tissues, a ratio that can reach one order of magnitude. As further measurements aiming to describe the electromagnetic properties of the breast continue, microwave imaging systems, which focus on detecting and localizing tumors in the breast, are being developed. In active microwave imaging, several microwave emitters illuminate the breast (Array A in Fig. 1 (a)). The resulting scattered field is measured at multiple detectors (Array B in Fig. 1 (a)). Cancerous tumors produce a stronger backscattered electromagnetic energy return than normal tissues. However, unlike X-rays, which are non-diffractive and travel in straight lines, microwave propagation in breast tissues is characterized by refraction and multipath effects, i.e., the backscattered cancer signature signal reaches the detector via two or more paths. As a result, standard information processing algorithms do not perform well and cannot locate cancer tumours with sufficiently fine resolution. In this proposal, we aim to develop a different backscatter-imaging paradigm based on time reversal (TR) processing that uses multipath to its advantage.
TR represents a powerful adaptive focusing technique in complex and heterogeneous media, and it has shown promising results in biomedical applications [5, 6]. TR focusing through inhomogeneous media requires three steps: (i) transmitting a wave front through the inhomogeneous medium from the array (forward probing); (ii) recording the backscattered field at the array; and (iii) re-transmitting the time-reversed version of the recorded field back into the medium (TR probing), which finally focuses on the target. Unlike conventional array imaging systems, which are generally based on forward probing alone, TR includes an additional stage (performed either physically or computationally) in which the signals reflected from the target (tumour) are time reversed and retransmitted back into the breast. TR array imaging algorithms are based on the second backscatter observations.
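The three TR steps can be illustrated with a one-dimensional toy model in which the medium is represented by a multipath impulse response. This is a minimal sketch, not the proposed TRCD imager: the channel taps are invented, the medium is assumed to be linear and reciprocal (the same response applies in both directions), and noise is ignored.

```python
def convolve(x, h):
    """Discrete linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


# Hypothetical multipath channel: a direct path plus two delayed echoes.
channel = [1.0, 0.0, 0.0, 0.6, 0.0, -0.4]
pulse = [1.0]  # short probing pulse

# Steps (i) and (ii): probe the medium and record the multipath response.
recorded = convolve(pulse, channel)

# Step (iii): reverse the recording in time and send it back through
# the same medium (reciprocity assumed).
time_reversed = recorded[::-1]
refocused = convolve(time_reversed, channel)

# The result is the autocorrelation of the channel: energy from all
# paths adds coherently at a single instant.
peak_index = max(range(len(refocused)), key=lambda n: abs(refocused[n]))
print(peak_index, round(refocused[peak_index], 2))
```

Each echo contributes to the central peak instead of smearing the response, which is the sense in which TR turns multipath into an advantage; the side lobes around the peak correspond to the cross terms between different paths.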
The proposed research will develop advanced, computer-aided information processing techniques for the early detection of breast tumours. We will test our proposed TR beamforming imager (TRCD), which builds on our initial results in TR signal processing, to enhance anonymized breast MRIs provided by the Southlake hospital. The objectives of this research are as follows:
Figure 1: (a) FDTD simulation run with two antenna arrays (Arrays A and B) using a breast model derived from an MRI. (b) Standard imaging algorithm (Direct Subtraction Beamforming). (c) Detection results using TRCD. The symbol o represents the centroid of the tumour and × represents the output of the simulation, with the dark red region showing the high-confidence area.
Methodology and novelty of approach and/or application: Although a number of TR microwave imaging algorithms have been reported in the literature [7, 8], we believe that many research questions related to TR array imaging for breast cancer detection are still open. The performance of TR algorithms for breast cancer detection has so far been investigated only in laboratory setups, and a thorough experimental verification on real datasets is missing. For breast cancer detection specifically, isolating the target response (the tumor backscatter) from the overall response is a challenging problem. Our hypothesis in this research is that our proposed TR imager successfully exploits the multipath EM scattering caused by the inhomogeneous breast tissues to detect and locate small cancer tumors with higher sensitivity than other existing TR imaging algorithms. In our initial investigations, the performance (resolution) of our proposed TRCD algorithm was tested on a simplified two-dimensional breast model and geometries using the FDTD method and, more recently, on a realistic breast model based on MRI-measured data. Fig. 1 illustrates the kind of results that are possible. These results show that if the tumor response can be estimated accurately, then the proposed algorithm can resolve small, 3-mm-diameter tumor-like scatterers. A more realistic situation, in which the tumor response is not known and can only be estimated from the total reflected signal (clutter + tumor response), needs more investigation. We note that our proposed scheme is noninvasive, does not require any additional data acquisition, and is based on the FDTD technique using already available MRI datasets. The results of this development will demonstrate that TR adaptive focusing significantly improves image resolution, at very low cost, to a level equivalent to that of a deployed antenna array with a much larger aperture.
Finally, as the implementation of the proposed technique is based on the use of signal processing and TR operations, it can be implemented in real time.
Our partner organization in this research is the Southlake hospital, located in Newmarket. Southlake hospital will provide approximately 100 MRIs of breasts with malignant and benign pathologies, normal breast MRI studies, radiologist reports of findings on all MRIs, any post-surgical and biopsy pathologist reports listing the presence of tumours and their locations, follow-up data, and some basic age, health and racial data on the patients. All data will be de-identified by Southlake staff by deleting all identifying data and labeling each MRI and pathology report with a numerical identifier unique to the patient. Feedback from Southlake will be sought to improve the accuracy of our results. We will collaborate with the pathologists at the hospital to better understand the scope of the problem and the kind of results that physicians are interested in. In this research, we will develop a quantitative understanding of the benefits and limitations of this method by applying our TR imaging algorithms to a set of MRIs of actual breast tumours provided by Southlake hospital. As is generally the case, access to a larger number of data sets will provide more statistically significant results.
“Mammography and beyond: Developing for the early detection of breast cancer”, Institute of Medicine, National Academy Press, Washington D.C., 2000.
Enriching ways to research Athenian data
Researchers: Nick Cercone, York University
The Athenians Project is a multi-year, ongoing project of compiling, computerizing and studying data about the persons of ancient Athens. By applying modern technology to ancient data, over 100,000 entries have been digitized and maintained in an Empress Embedded Database for over 30 years. The project is headed by Professor John S. Traill of the Classics Department at the University of Toronto.
The Athenians Project began in the 1970s to preserve and make searchable the age-faded, handwritten card files of Dr. Benjamin Dean Meritt. Meritt had recorded information about the persons of ancient Athens and accumulated the card files over the preceding 40 years. These included data collected by Johannes Kirchner, who contributed to the epigraphy and prosopography of ancient Athens with his book “Prosopographia Attica”. Published in 1901, Kirchner's “Prosopographia Attica” had 15,588 entries, was limited to the pre-Augustan period, and contained only registered Athenian citizens.
Athenians Project data is available in two main parts. The first is a set of hardbound printed volumes titled “Persons of Ancient Athens”, containing more than 100,000 entries and typeset in ancient Greek. The second is a relational database of Athenians data, which can be searched by computer in a variety of ways for further study. The Athenians prosopography project includes Athenian citizens at home and abroad, slaves, resident aliens, and foreigners honored at Athens: all the known men and women of Athens from the beginning of alphabetic writing to the Byzantine period.
Part of the data is made available to anyone via the Website Attica. Website Attica is designed to be complementary to the published volumes of “Persons of Ancient Athens” (PAA). There are currently 18 published volumes of PAA and two more are scheduled to be published within the next two years. The Athenians Project is Toronto’s designedly complete database of all “Persons of Ancient Athens”.
Searches in the Website Attica may be made on about 10,000 names, all within half of volume four, the entirety of volume five, and the first third of volume six of “Persons of Ancient Athens”, i.e. names beginning with the letters beta through delta. The possible searches range from selecting every person in a particular Deme or of a specified profession to more sophisticated searches, e.g. to find all Athenians who lived between specified years and/or are related to a certain person and/or are attested in a class of document, etc.
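The kinds of searches described above map naturally onto relational queries. The sketch below is purely illustrative: it uses Python's built-in `sqlite3` as a stand-in for the Empress database, and the `persons` table, its columns, and the placeholder rows are assumptions invented for the example, not the project's schema or data. Years are stored as signed integers, with negative values for BCE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE persons (
           id INTEGER PRIMARY KEY,
           name TEXT, deme TEXT, profession TEXT,
           year_from INTEGER, year_to INTEGER)"""
)
# Placeholder rows only; these are not real prosopographical records.
conn.executemany(
    "INSERT INTO persons (name, deme, profession, year_from, year_to) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("Placeholder A", "Deme-1", "potter", -450, -400),
        ("Placeholder B", "Deme-1", "archon", -350, -300),
        ("Placeholder C", "Deme-2", "potter", -420, -380),
    ],
)


def search(conn, deme=None, profession=None, active_between=None):
    """Return names matching every criterion that is supplied."""
    clauses, params = [], []
    if deme is not None:
        clauses.append("deme = ?")
        params.append(deme)
    if profession is not None:
        clauses.append("profession = ?")
        params.append(profession)
    if active_between is not None:
        start, end = active_between
        # Keep persons whose attested span overlaps the interval [start, end].
        clauses.append("year_from <= ? AND year_to >= ?")
        params.extend([end, start])
    where = " AND ".join(clauses) or "1 = 1"
    sql = f"SELECT name FROM persons WHERE {where} ORDER BY name"
    return [row[0] for row in conn.execute(sql, params)]


print(search(conn, deme="Deme-1"))
print(search(conn, profession="potter", active_between=(-430, -410)))
```

Searches for family relationships or classes of document would need additional tables (for example, one linking persons to each other and one linking persons to attestations), which are omitted here.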
This project provides software and database research and development to the Athenians Project. The Athenians Project has been using Empress Database technology for over 30 years to store and retrieve data about the citizens of ancient Athens. A good part of the data is stored as Ancient Greek characters in an encoded form. There are many research areas that could be pursued to help the Athenians Project while also providing the basis for innovative research in database management.
1. Traill, John. “Classics: Athenians Research Project”, Faculty of Arts and Science, University of Toronto, Volume 10, May 2003, retrieved November 3, 2010
Prototyping session-based data-driven DB replication
Researchers: Jimmy Huang, York University
Nowadays, data seems to be everywhere. Users have been asking for immediate access to accurate and reliable data anytime and anywhere. The proliferation of mobile devices has emphasized this problem even more.
Database replication is a technology often used to satisfy this need. Database replication is the act of copying data between different databases and encompasses analysis, administration and monitoring for data consistency across multiple databases. To address the ever-growing need for data replication in heterogeneous database mobile environments, a prototype of session based data-driven database replication that allows two-way synchronization of distributed data in heterogeneous mobile environments is proposed.
The objective is to start with the current Empress DBMS replication mechanism and prototype two-way replication with performance constraints. The prototype can then be modified to include another database management system, and further modified to include several other database management systems with operations in a “cloud” environment. The current replication allows updates only on the Master database. This proposal allows any database to be updated, whether it is the master or a replicate, hence allowing for two-way replication.
We propose a phased approach to design and implement a session-based synchronization system that allows two-way synchronization between a main database, called the master database, and many remote databases, called replicate databases. The synchronization between them can be partial or full replication.
In the first phase, the current Empress DBMS replication model will be extended to support two-way replication and tested in mobile environments such as Android tablets. User-defined conflict rules will be implemented.
In the second phase, the model will be extended to support foreign DBMS (e.g. Oracle, DB2, Microsoft SQL Server) as a Master database and Empress DBMS as a replicate database.
In the third phase, the model will be extended to support multiple foreign DBMS and multiple Empress DBMS in the network. The combined DBMS system could be operating in a Cloud, with mobile and non-mobile nodes. The Empress replication module will be responsible for storing all meta-data about synchronization that takes place across the network.
Challenge #1: Provide methodology to support heterogeneous database environments
Data-driven replication is proposed to unify the way data is replicated over heterogeneous database environments. This presents a challenge, since log-based replication may be common to many DBMS implementations, such as Oracle DBMS.
Other heterogeneity problems include: different database models, syntactically and semantically different DBMS, different locking schemas, different transactional models, etc.
Challenge #2: Frequent disconnection of mobile devices
Mobile devices often get disconnected from the network due to various factors like power failure or loss of their access to the network. Furthermore, some mobile users switch their units on and off regularly to save power, causing additional network disconnections. In the presence of frequent disconnection, this method of replicating data needs to maintain data consistency over heterogeneous database environments.
Challenge #3: Support user defined conflict rules
Predefined default conflict rules may not be adequate for every situation. User defined conflict rules may be needed to achieve the best possible resolution for each conflict.
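A minimal sketch of how a synchronization session with pluggable conflict rules might look is given below. It is not the Empress replication design: the `Version` record, the use of a logical timestamp to detect changes since the last session, and the two example rules are assumptions made for illustration, and deletions are not handled.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Version:
    """One row value plus bookkeeping (hypothetical structure)."""
    value: str
    timestamp: int  # logical clock of the last update
    site: str       # where the update was made


# A conflict rule receives the master and replicate versions and picks one.
ConflictRule = Callable[[Version, Version], Version]


def latest_wins(master: Version, replica: Version) -> Version:
    return replica if replica.timestamp > master.timestamp else master


def master_wins(master: Version, replica: Version) -> Version:
    return master


def synchronize(master: Dict[str, Version], replica: Dict[str, Version],
                last_sync: int, rule: ConflictRule) -> None:
    """Run one session: exchange rows changed since last_sync, in place."""
    for key in set(master) | set(replica):
        m, r = master.get(key), replica.get(key)
        m_changed = m is not None and m.timestamp > last_sync
        r_changed = r is not None and r.timestamp > last_sync
        if m_changed and r_changed:
            winner = rule(m, r)   # both sides updated: apply the user's rule
        elif m_changed:
            winner = m
        elif r_changed:
            winner = r
        else:
            continue              # unchanged since the previous session
        master[key] = replica[key] = winner


master_db = {"a": Version("m1", 5, "master"), "b": Version("old", 1, "master")}
tablet_db = {"a": Version("r1", 7, "tablet"), "b": Version("new", 6, "tablet")}
synchronize(master_db, tablet_db, last_sync=2, rule=latest_wins)
print(master_db["a"].value, master_db["b"].value)
```

Because the rule is passed in as a function, an application can supply its own policy (for example, merging field by field) without changing the session logic. A device that was disconnected simply runs the session later with its older `last_sync` value.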
Phase 1: End of the first year
In the first phase, the current Empress DBMS replication will be prototyped to support two-way replication (Fig. 1). It will be tested in mobile environments such as Android tablets. A methodology for user-defined conflict rules will be prototyped.
Figure 1. Empress DBMS replication model will be prototyped to support two-way replication
Phase 2: End of the second year
In the second phase, the prototype will be extended to support foreign DBMS (e.g. Oracle, DB2, Microsoft SQL Server) as a Master database and Empress DBMS as a replicate database.
Figure 2. A prototype of a session based data-driven replication that allows two-way synchronization of distributed data with a Foreign Master database/Empress Replicate database
Phase 3: End of the third year
In the third phase, the prototype will be extended to support multiple foreign DBMS and multiple Empress DBMS in the network. The combined DBMS system could be operating in a Cloud, with mobile or non-mobile nodes. The Empress replication module will be responsible for storing all meta-data about the synchronization that takes place across the network.
Figure 3. A prototype of a session based data-driven replication that allows two-way synchronization of distributed data in the heterogeneous database mobile environments
Geospatial urban information & communication technologies for real-time applications
Researchers: Gunho Sohn, Costas Armenakis, York University
Part I: Project Description
In recent years, urban ICT (Information and Communication Technology), such as digital media, ubiquitous computing, urban screens, mobile and wireless communication technologies and the internet, has been rapidly modifying city living. Geospatial information has become central as a growing number of location-based services have been rapidly accepted for commerce, entertainment, social networking and public purposes. The augmented city is a modern definition of urban spaces in which “virtual” and “physical” spaces are no longer two separate dimensions, but parts of a continuum, of a whole (Aurigi and De Cindio, 2008). The “virtual” urban space (digitally enhanced space) is a digital environment in which urban ICT is connected to the “physical” urban space. The “virtual” urban space is often represented by a three-dimensional (3D) cityscape model. These 3D city models are virtual representations of the “physical” urban spaces that digitally reproduce all the urban objects as a semantically enriched polyhedral world. People's physical-digital intersection exists through the awareness of their locations (a sense of belonging to a place) on the global level, but also on a very local scale. This location awareness means not only knowing “where we are”, but also being able to perceive or be conscious of objects, events and patterns in the surrounding environment with respect to “where we are” (Wang et al., 2011).
The proliferation of data collection tools, from satellite sensors and GPS receivers to sensor webs and camera-equipped mobile phones and vehicles, and the resulting availability of huge amounts of geo-spatial data have created a rapidly increasing demand for timely and accurate geo-spatial information. Wireless location-based services and applications are accessed by mobile users on the basis of their location. Mobile communications and mobile computing are becoming dependent on geo-spatial information because of their motion in space. Conditions, events and phenomena change continuously in space and time in many time-critical situation awareness applications, such as emergency response, fire fighting, traffic and challenging environments. The data and information needed to support real-time, knowledge-based decision-making must be delivered on time. Methods and services are needed for data exploration and analysis that interpret and evaluate the patterns in the geo-spatial data. Spatial data are characterized by their spatial complexity, their spatial relationships and their neighbouring correlations. Spatial patterns can be discovered and extracted from spatial databases using spatial data mining approaches. These patterns could reveal spatial clustering, spatial neighbouring associations, spatial trends, and spatial characterization and classification.
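As one concrete instance of spatial clustering, the sketch below implements a small density-based clustering routine in the style of DBSCAN, which groups points that are densely packed and flags isolated points as noise. The coordinates, `eps` radius and `min_pts` threshold are hypothetical values chosen for illustration; the text above does not prescribe a particular algorithm.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = region_query(points, j, eps)
            if len(more) >= min_pts:  # core point: keep growing the cluster
                queue.extend(more)
    return labels


# Hypothetical local coordinates in metres: two dense groups and one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

This brute-force version compares every pair of points; for large geo-spatial datasets a spatial index (grid or tree) would normally replace `region_query`.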
In this proposal, we propose to develop advanced real-time geospatial data exploitation technology for time-critical applications such as search & rescue and tracking for emergency responders (e.g., firefighters in complex urban or forest environments) and public safety in a 3D virtually augmented world. Geo-spatial semantic information is communicated, discovered and visualized through the proposed virtual reality model. The proposed project will exploit cutting-edge Geospatial Urban ICT technologies in the fields of: 1) seamless integration of indoor and outdoor geospatial models, databases and 3D geo-sensor networks; 2) reality-augmented virtual tracking and geo-spatialization of emergency responders in real time; 3) indoor/outdoor data collection and mapping using Unmanned Ground Vehicles (UGV). This will be accomplished by developing an augmented virtual reality in which the location of the emergency responders is seamlessly tracked in 3D, indoors and outdoors, in real time using mobile devices such as smart phones. A brief milestone of the proposed project is as follows:
Aurigi, A. and De Cindio, F., 2008. Augmented urban spaces: articulating the physical and electronic city.
|Copyright ©2012 LIAKM, Toronto, Canada|