Research Interests

I find most theory in computer studies boring, lifeless, and misleading, mostly because the assumptions backing it are seldom close to the real world. Improper assumptions cause much theory to mislead one into believing that something relatively simple is difficult or impossible. This usually comes from the finite nature of real computation as opposed to the infinite domain postulated by most theorists. I prefer to keep my research grounded in the finite world, using theory to help, not hinder, the creation of ideas.

My main research interests are listed below:

    Reliable Representation of Knowledge
    Hashing, Random Numbers and Encryption
    Human-Computer Interface
    Graphics and Animation
    Data Structures and Data Retrieval Algorithms


Reliable Representation of Knowledge

I began work on a system to help one create reliable knowledge in 1985. Reliable knowledge is a relative notion, rather than an absolute one. One gains confidence in data as progressive checks are applied to it: a series of consistency checks, both internal and external, provides more and more confidence in the data. As the data is filtered by each consistency check and errors are discovered and corrected, the data itself becomes more reliable and trustworthy. The invariants associated with a consistency check become characteristics of the data, allowing reasoning within the domain of that check.

While complete consistency checking is not possible, this successive partial checking against independent models is an important way to validate the data. The more models that can be applied to the data, the more reliable the data can be made. The same idea could be applied to computer programs as well, to help ensure their correctness.

A flexible form for recording unqualified facts has been defined. Facts look like written English; each is an example of a particular relationship in an entity/relationship model, with a subject entity related to zero or more object entities by some relation.

For example: `Ophelia is a female.' Here `Ophelia' is the subject entity and there are no object entities. Relations which don't involve object entities thus provide attributes for the subject.

Another example: `Ophelia gave her flowers to Hamlet.' `Ophelia' is the subject entity, `her flowers' and `Hamlet' are object entities, and `gave ... to' is the relation.

A series of Unix filters for manipulating and checking facts written in this form has been created; the system is called Factotum. Using it, researchers can create their own formal models of data by specifying the syntax of relations, the types of entities, and the meaning of relations in terms of pre-existing paradigms. The formal model is itself described using Factotum facts, in a standard form, which are interpreted by the filters that check the data.
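
To make the fact form and the model-driven checking concrete, here is a minimal sketch in C. It is not Factotum source; the struct layout, names, and toy relation model are my own assumptions.

    /* A sketch only: not Factotum source.  A fact is split into a
     * subject entity, a relation, and zero or more object entities,
     * and one consistency check verifies that the number of object
     * entities matches the arity declared for the relation in a tiny
     * formal model. */
    #include <stdio.h>
    #include <string.h>

    #define MAX_OBJECTS 4

    struct fact {
        const char *subject;              /* e.g. "Ophelia"                    */
        const char *relation;             /* e.g. "is a female"                */
        const char *object[MAX_OBJECTS];  /* object entities, if any           */
        int         nobject;              /* 0 => the relation is an attribute */
    };

    /* A toy model: each relation and how many object entities it expects. */
    struct relation_decl { const char *name; int arity; };

    static const struct relation_decl model[] = {
        { "is a female", 0 },   /* attribute: no object entities       */
        { "gave ... to", 2 },   /* subject gave <object1> to <object2> */
    };

    /* Consistency check: the relation must be declared in the model and
     * its declared arity must match the fact's object count. */
    static int check_fact(const struct fact *f)
    {
        size_t i;
        for (i = 0; i < sizeof model / sizeof model[0]; i++)
            if (strcmp(model[i].name, f->relation) == 0)
                return model[i].arity == f->nobject;
        return 0;  /* unknown relation: the fact fails the check */
    }

    int main(void)
    {
        struct fact a = { "Ophelia", "is a female", { 0 }, 0 };
        struct fact b = { "Ophelia", "gave ... to", { "her flowers", "Hamlet" }, 2 };

        printf("attribute fact passes: %d\n", check_fact(&a));
        printf("relation fact passes:  %d\n", check_fact(&b));
        return 0;
    }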

In the years since 1985, Factotum has gained in power and flexibility. It is still an experimental system in the sense that it has no unified user interface, the couplings between filters are ad hoc, and it only works on Unix systems.

Current plans include the following:



Hashing, Random Numbers and Encryption

In building the Factotum tools I discovered that I needed a better hash function than the one I was using. To my surprise it is possible to build a general hash function: one that works on any domain and produces a uniform random distribution of hash values. Theory says this is impossible. However, it is clear that noise is a relative notion; an algorithm that mixes in random bits which are independent of the keys being hashed adds enough information to mask all the information contained in the keys, and the result is a uniform random distribution.
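
To make the idea concrete, the following minimal C sketch mixes a table of key-independent random words into the hash one byte at a time. It is written in the spirit of buzhash, but the rotate-and-XOR combining step and the table seeding are illustrative assumptions rather than the published code.

    /* A sketch in the spirit of buzhash, not the published source.
     * A table of random 32-bit words, chosen independently of any key,
     * is mixed into the hash one key byte at a time, masking the
     * structure of the keys. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static uint32_t rtab[256];   /* one random word per possible key byte */

    /* Fill the table with pseudo-random words; any source of random bits
     * independent of the keys would do. */
    static void init_rtab(unsigned seed)
    {
        int i;
        srand(seed);
        for (i = 0; i < 256; i++)
            rtab[i] = ((uint32_t)rand() << 16) ^ (uint32_t)rand();
    }

    /* Hash an arbitrary byte string: rotate the running hash one bit and
     * XOR in the random word selected by the next key byte. */
    static uint32_t hash_bytes(const unsigned char *key, size_t len)
    {
        uint32_t h = 0;
        size_t i;
        for (i = 0; i < len; i++)
            h = ((h << 1) | (h >> 31)) ^ rtab[key[i]];
        return h;
    }

    int main(void)
    {
        const char *k = "Ophelia";
        init_rtab(12345u);
        printf("hash(\"%s\") = %08lx\n", k,
               (unsigned long)hash_bytes((const unsigned char *)k, strlen(k)));
        return 0;
    }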

There is no theory to predict how well this will work, or how many keys can be hashed using this technique before the information in the keys begins to show through. I currently have a PhD student working on building a model to quantify this performance.

Empirically, however, the performance of the hash function is impressive and independent of domain. The C source code for `buzhash' for use on a 32-bit computer is small and suitable for use in a library. There is also a 50-page technical report from the University of Hong Kong called `General Hashing' which describes the development of the buzhash function and gives empirical results; it comes as a very large PostScript file, so don't download it unless you really want it. There is also a 20-page technical report from the University of Auckland describing `Hashing Myths'.



Human-Computer Interface

Programming languages were the human-computer interface of the 1950s, 60s, and 70s. The experimental principles tested in those years provided some successes and many failures. In the 1980s and 90s graphical user interfaces have become common, and new principles need to be developed. This new development should not take place without looking at the lessons of the past.

Much of the recent research into human-computer interfaces is led by psychologists who have become interested in computer science and are trying to apply what they learned in psychology to a new area. They are not well grounded in computer science formalisms and principles, and often they approach the problem more as a pseudo-science than as a science. Generally this psychology-based computer science research has resulted in failure and intellectual fatuity. This area of research is essential to computer progress, and good clean academic work needs to permeate it.

At UCLA during the period 1979 to 1981 my research group made a concerted effort to identify, classify, and develop methods for presenting high-level language diagnostic information. As a test bed the group instrumented several compilers so that we could determine how programmers responded to different kinds of diagnostic messages. We instrumented Calgol, PL/C, and other compilers.

The analysis of the results was discouraging. We discovered that after a short period of adjustment the form of a diagnostic had little effect on the number of program submittals necessary to accomplish a programming task. This was because humans quickly learn to understand even the most misleading diagnostics and quickly solve most bugs.

Almost every program contained a persistent error: an error which recurred, job submittal after job submittal. Examination of these persistent errors yielded no single cause, nor were they associated with any single type of diagnostic. Rather, the problem usually came about because of a misconception in the programmer's interpretation of the semantics of the program he had written. Until the misconception was resolved he continued to try random changes to the code, even when the error was isolated to a single statement. Sometimes the programmer would read his own code not as it was written, but as he expected it to be written, so that even knowing exactly where the error was it took a long time to find.

This study convinced me that debugging systems do little to improve productivity. It also made clear the necessity of extensive practice in programming if one is to become a productive programmer. This practice is something a university education never encourages or even offers as an alternative. Universities promote ideas and understanding, not expertise.

Applied to human-interface problems today, the research we did with programming language diagnostics tells me that fuzzy systems are liable to cause more disasters than they ease life for users. Good simple semantics that a user can assimilate, and that will lead him through the computer interaction, will finally be seen as essential to good design.



Graphics and Animation

In 1981, at UCLA, I designed and, over the next three years, built a system for realistic computer animation called Glinda. Glinda was written in C under UNIX. It used new multi-rendering techniques to create pictures, distributed the rendering across a large number of networked mini-processors, and reassembled the results into an animated picture.

In 1984 Glinda was used to create a five-minute section of animation for a documentary film about Mexican culture. This involved creating a 3D model of Mayan/Aztec temple sites near Mexico City, recreating their appearance as it was half a millennium ago.



Data Structures and Data Retrieval Algorithms

From the very beginning of my involvement with computing I have been interested in data structures and algorithms for the fast retrieval of data. In 1970 I began work on designing efficient retrieval algorithms for data structures in two-level memory, that is, primary and secondary storage. This led to several papers with Richard Muntz and Rob Matthijsen. The 1970 Muntz/Uzgalis article on how to store binary trees in two-level memory preceded B-trees by many years and for many applications provides better overall performance.

Later in the 1970s I developed a method for searching linked lists in square-root time by making a trivial change to the node allocation method. These are called wholesale linked lists. In 1995 Matthew Tong simulated the method and, with me, wrote it up in a technical report for the Computer Science Department at the University of Auckland.
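
The C sketch below shows only the general flavour of square-root searching under my own assumptions, since the details of the wholesale construction are in the technical report: nodes of a sorted list are allocated from one contiguous arena, and a random sample of about sqrt(n) arena slots picks a good starting node before the list pointers are followed. The published method may differ in detail.

    /* A hedged sketch of square-root search on a sorted linked list,
     * not the published wholesale linked-list code.  The assumed
     * "trivial change" is that all nodes come from one contiguous
     * arena, so any node can also be reached by its arena index. */
    #include <limits.h>
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct node {
        int key;
        int next;               /* arena index of the next node, -1 at end */
    };

    static struct node arena[1024];
    static int nnodes = 0;
    static int head = -1;

    /* Insert a key, keeping the list sorted by key; nodes are handed out
     * from the arena in allocation order, not key order. */
    static void insert(int key)
    {
        int n = nnodes++, p = head, prev = -1;
        arena[n].key = key;
        while (p != -1 && arena[p].key < key) {
            prev = p;
            p = arena[p].next;
        }
        arena[n].next = p;
        if (prev == -1) head = n; else arena[prev].next = n;
    }

    /* Expected O(sqrt(n)) search: probe about sqrt(n) random arena slots,
     * keep the probed node with the largest key <= target, then follow
     * the list pointers forward from that node. */
    static int search(int target)
    {
        int start = head, best = INT_MIN, i, probes;

        probes = (int)sqrt((double)nnodes) + 1;
        for (i = 0; i < probes && nnodes > 0; i++) {
            int j = rand() % nnodes;
            if (arena[j].key <= target && arena[j].key > best) {
                best = arena[j].key;
                start = j;
            }
        }
        for (i = start; i != -1; i = arena[i].next) {
            if (arena[i].key == target)
                return i;           /* found: return the arena index */
            if (arena[i].key > target)
                break;
        }
        return -1;                  /* not present */
    }

    int main(void)
    {
        int i, keys[] = { 42, 7, 99, 23, 64, 15, 88, 3 };

        for (i = 0; i < 8; i++)
            insert(keys[i]);
        printf("arena index of 23: %d\n", search(23));
        printf("arena index of 50: %d\n", search(50));
        return 0;
    }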

With the work on general hashing, Matthew and I are close to achieving an ideal primary/secondary storage retrieval system. However, the understanding and modeling of general data structures in secondary storage is still in its infancy.

© Copyright 1996 Robert Uzgalis. All Rights Reserved.