Track B: Data and application integration

THIS TRACK IS NOT AVAILABLE FOR TI STUDENTS IN QUARTILE 4


Introduction
Today's computer applications rely heavily on interconnected information systems. In the early days of computing, programs were "stand-alone". In the networked world of the 21st century, most applications exchange data with other applications, and new services are built on top of existing, heterogeneous applications. The integration of data and applications raises several issues, each of which can be studied further in the track on Data and Application Integration.
 
(1) Data semantics. To exchange data, it is not enough to define a specific syntax; one also has to agree on the meaning of the data. XML and RDF (semantic web) are the current buzzwords in this area. With XML and RDF it is possible to define ontologies and specify the semantics of data.
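
As a rough illustration of why shared meaning matters, the sketch below mimics the RDF data model with plain Python tuples of the form (subject, predicate, object). It is not real RDF tooling, and every URI in it is a hypothetical name under the reserved example.org domain. The point is only that two sources which use the same predicate URI agree on what a value means, so their data can be combined without guessing.

```python
# Triples are (subject, predicate, object), mimicking the RDF data model.
# All URIs are hypothetical and live under the reserved example.org domain.
EX = "http://example.org/vocab#"
PATIENT_17 = "http://example.org/patient/17"
PATIENT_42 = "http://example.org/patient/42"

source_a = {
    (PATIENT_17, EX + "bodyTemperatureCelsius", 38.2),
}
source_b = {
    (PATIENT_17, EX + "bodyTemperatureCelsius", 37.1),
    (PATIENT_42, EX + "bodyTemperatureCelsius", 36.9),
}


def objects(graph, subject, predicate):
    """Return all objects recorded for a subject/predicate pair, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


# Both sources use the same predicate URI, so merging is a plain set union.
merged = source_a | source_b
readings = objects(merged, PATIENT_17, EX + "bodyTemperatureCelsius")
```

Had one source stored its readings under a bare field name such as "temp", nothing in the data would say whether the unit is Celsius or Fahrenheit; the agreed predicate is what carries that meaning.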
 
(2) Data integration. When data from different sources are brought together, and the semantics of each data source is defined by an XML schema, the schemata are typically not entirely compatible. How, then, do we integrate these data? This is in fact an old problem, very similar to database schema integration, which was studied in the 1980s. There are several approaches to integrating conflicting schemata, but there is no fail-safe algorithm that does it automatically. Furthermore, the values in the fields/attributes often need to be converted, and the data often suffer from quality problems such as incompleteness, ambiguity, errors, and imprecision. How, then, do we properly convert, complete, and clean the data?
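
To make the problem concrete, here is a minimal sketch of a hand-written schema mapping with value conversion. The two sources ("clinic_a", "clinic_b"), their field names, and the target schema are all invented for illustration. The mapping is specified manually, which is exactly the step that cannot be fully automated.

```python
def inches_to_cm(value):
    """Convert a height given in inches (as text) to centimetres."""
    return round(float(value) * 2.54, 1)


# Hypothetical mappings: source field -> (target field, converter).
MAPPINGS = {
    "clinic_a": {
        "name": ("full_name", str.strip),
        "height_cm": ("height_cm", float),
    },
    "clinic_b": {
        "patient": ("full_name", str.strip),
        "height_in": ("height_cm", inches_to_cm),
    },
}
TARGET_FIELDS = ("full_name", "height_cm")


def integrate(record, source):
    """Map one source record onto the target schema.

    Returns the target record and the list of target fields that could
    not be filled because the source value was missing or unparsable.
    """
    target = dict.fromkeys(TARGET_FIELDS)
    for field, (target_field, convert) in MAPPINGS[source].items():
        raw = record.get(field)
        if raw in (None, ""):
            continue  # incomplete source data: leave the field as None
        try:
            target[target_field] = convert(raw)
        except ValueError:
            pass  # erroneous value: leave the field as None
    missing = [f for f in TARGET_FIELDS if target[f] is None]
    return target, missing


record_a, missing_a = integrate({"name": " Ann Smith ", "height_cm": "170"}, "clinic_a")
record_b, missing_b = integrate({"patient": "Bob Jones", "height_in": "n/a"}, "clinic_b")
```

The sketch only reports what it could not convert; deciding how to complete or repair such values is the open data quality question raised above.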
 
(3) Information extraction. Often, data is not available in a nicely structured form such as a database; instead, the information sits on a web page or in a text document, message, or text field. Extracting the target information and storing it in a structured form, usually for the purpose of data integration, is called "information extraction". For web pages, one also speaks of web harvesting, as HTML data can be viewed as semi-structured. In the other cases, the algorithms need to interpret natural language ... but wait, computers can't read, or at least can't understand what they read ... or can they? Furthermore, whatever an algorithm can extract under these unfavorable circumstances will almost certainly be imperfect, so the data quality problems mentioned under (2) are even more severe here.
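
The semi-structured case can be sketched with Python's standard html.parser module. The example below harvests the data cells of an HTML table into rows; the page content and the class name TableHarvester are made up for illustration, and real pages are far messier than this one.

```python
from html.parser import HTMLParser


class TableHarvester(HTMLParser):
    """Collect the text of each <td> cell, grouped per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row being read, or None
        self._in_cell = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without data cells, e.g. header rows
                self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data  # nested markup such as <b> is flattened


def harvest(html):
    """Return the table's data rows as lists of stripped cell texts."""
    parser = TableHarvester()
    parser.feed(html)
    parser.close()
    return [[cell.strip() for cell in row] for row in parser.rows]


PAGE = """
<table>
  <tr><th>Name</th><th>City</th></tr>
  <tr><td>Ann <b>Smith</b></td><td>Enschede</td></tr>
  <tr><td>Bob Jones</td><td></td></tr>
</table>
"""
records = harvest(PAGE)
```

Note that the second record comes back with an empty city: even on a well-formed page, the harvested data is incomplete and needs the cleaning discussed under (2).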
 
(4) Data exchange. Once we know which data to exchange and how to interpret them, there is the technical issue of interconnecting heterogeneous systems. Web services are a relatively new technique for this, in which applications call each other by means of WWW protocols.
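
The idea of one application calling another over a WWW protocol can be sketched with the Python standard library alone. In the example below, a plain HTTP exchange of JSON stands in for a web service; it is not SOAP or any particular web service standard. The service, its URL layout, and the record it returns are all hypothetical.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class PatientService(BaseHTTPRequestHandler):
    """Hypothetical service: GET /patients/<id> returns a JSON record."""

    RECORDS = {"17": {"full_name": "Ann Smith", "height_cm": 170.0}}

    def do_GET(self):
        record = self.RECORDS.get(self.path.rsplit("/", 1)[-1])
        payload = record if record else {"error": "not found"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200 if record else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_patient(base_url, patient_id):
    """Client side: call the service over HTTP and decode the JSON reply."""
    opener = build_opener(ProxyHandler({}))  # bypass any configured proxy
    with opener.open(f"{base_url}/patients/{patient_id}") as response:
        return json.load(response)


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), PatientService)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    base_url = f"http://127.0.0.1:{server.server_address[1]}"
    patient = fetch_patient(base_url, "17")
finally:
    server.shutdown()
    server.server_close()
```

Client and server here share only the protocol and the data format, not code or platform, which is what makes this style of exchange suitable for heterogeneous systems.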
 
Suggested Topics
Data and Application Integration can be addressed in the context of specific application areas, such as hospital information systems. There are also research issues connected to the technical topics mentioned above, e.g., privacy and security issues.
For the course Bachelorreferaat, any topic that fits the theme of data and application integration is acceptable.
 
Suggested Reading
* T. Berners-Lee, J. Hendler, O. Lassila (2001). The Semantic Web. Scientific American 284(5), 34-43.
* C.-H. Chang, M. Kayed, M.R. Girgis, K.F. Shaalan (2006). A Survey of Web Information Extraction Systems. IEEE Transactions on Knowledge and Data Engineering 18(10), 1411-1428. doi: 10.1109/TKDE.2006.152.
* J. Cowie, W. Lehnert (1996). Information Extraction. Communications of the ACM 39(1), 80-91. doi: 10.1145/234173.234209.
* C.C. Marshall & F.M. Shipman (2003). Which Semantic Web? Proc. 14th ACM Conf. on Hypertext and Hypermedia.
* E. Rahm and P.A. Bernstein (2001). A Survey of Approaches to Automatic Schema Matching. The VLDB Journal 10, 334-350.
* T.F. Stafford (Ed.) (2003). Special issue on E-services and Web Services, Communications of the ACM 46(6), 26-67.
* M.P. Papazoglou and D. Georgakopoulos (Eds.) (2003). Special issue on Service-Oriented Computing, Communications of the ACM 46(10), 24-89.
 
Further Information
For further information on the content of this track, you may contact the track chair, Maurice van Keulen, at M.vanKeulen@utwente.nl. For any information on the conference organisation, please contact the conference chair at h.koppelman@utwente.nl.