Project Proposals for William Flynn Scholarship:
Project Number: 5
Project Title: Database Design in Bioinformatics
Project Supervisor: Mr John Mc Gregor plus another member of staff
The development of fully automated DNA sequencing technologies
has resulted in an information deluge in the field of molecular
biology. The so-called 'sequence structure deficit' - the exponentially
increasing availability of sequence data, which far outstrips the resultant
information relating to actual three-dimensional structures - represents
a very significant and arguably worsening problem, which can only
be addressed using ever more sophisticated computing technology.
This general field has become known as Bioinformatics.
The formulation of hypotheses as to (for example) protein structure
depends on ready access to sequence data presented in a useful way.
Two distinct analytical approaches are now common. The first approach
involves the use of pattern recognition techniques to detect similarities
between sequences and thus to deduce related structure and function;
the second attempts to predict 3D structure directly from the linear
structure (the sequence itself) and from that to infer function.
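As a rough illustration of the first approach, the sketch below scores the similarity of two sequences by the overlap of their short substrings (k-mers). This is only one simple stand-in for the pattern recognition techniques mentioned above; the function names, the choice of k = 3 and the example sequences are assumptions made for illustration and are not part of the proposal.

```python
def kmers(seq, k=3):
    """Return the set of all overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a, b, k=3):
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0).

    Hypothetical helper for illustration: a crude, alignment-free measure,
    not a substitute for a proper alignment algorithm.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        return 0.0  # both sequences shorter than k: nothing to compare
    return len(ka & kb) / len(ka | kb)


if __name__ == "__main__":
    # Two made-up DNA fragments that differ only in their last base.
    print(kmer_similarity("ACGTA", "ACGTT"))
```

A score near 1.0 suggests that two sequences share most of their local patterns, which is the kind of evidence from which related structure and function might then be hypothesised.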
Both of these approaches, and others, require sensible data organisation.
As more is discovered about sequence data, and about the genetic
information encapsulated therein, the underlying structure of the
databases that hold it must in turn be retooled.
In addition, as geneticists and other researchers refine the nature
of their interactions with the data, this too has implications for
database design.
As well as databases of sequence data there have evolved secondary
databases containing metadata. These have arisen because multiple
alignments have been found to contain regions which vary little
between constituent sequences. These regions constitute
identifying motifs which have some specific biological function and
which can be classified. The structures of these databases have
evolved in markedly different directions, posing yet further computational
challenges.
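The idea of a conserved region can be sketched in a few lines. The example below scans a multiple alignment column by column and reports runs of columns in which every sequence carries the same residue. It assumes strict identity rather than 'little variation', treats '-' as a gap character, and uses a made-up alignment; the function name and the min_length parameter are likewise hypothetical.

```python
def conserved_regions(alignment, min_length=3):
    """Find runs of fully conserved, gap-free columns in an alignment.

    alignment: list of equal-length strings (one aligned sequence each).
    Returns a list of (start, end, motif) tuples, 0-based, end exclusive.
    Assumption for illustration: a column counts as conserved only if
    all sequences agree exactly; real motif databases tolerate variation.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    regions = []
    run_start = None
    # Iterate one position past the end so a trailing run is flushed.
    for col in range(width + 1):
        conserved = (
            col < width
            and alignment[0][col] != "-"
            and all(row[col] == alignment[0][col] for row in alignment)
        )
        if conserved:
            if run_start is None:
                run_start = col
        else:
            if run_start is not None and col - run_start >= min_length:
                regions.append((run_start, col, alignment[0][run_start:col]))
            run_start = None
    return regions


if __name__ == "__main__":
    # Made-up protein alignment; '-' marks a gap.
    example = ["MKTAYIAKQR", "MKTGYIAKQL", "MKT-YIAKQV"]
    for start, end, motif in conserved_regions(example):
        print(start, end, motif)
```

A secondary database would store records derived from regions like these (for instance the motif, its position and its classification) rather than the raw sequences themselves.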
The ever increasing quantities of data, and the wide variety of
data enquiry systems and algorithmic techniques, have inevitably
dictated that information is accessed and delivered
over the internet. Web technology, distributed database systems,
object orientation, and intelligent interfaces represent just some
of the areas to be looked at, linked by the common theme of enhancing
understanding of genetic sequence data.
Hand in hand with database issues will come new algorithms and methods
for computational biology, especially those aimed at addressing
efficiency, scalability, and cost issues associated with high-performance
computing. Areas such as sequence analysis, structure and function
prediction, neural information theory, whole genome analysis, pharmacogenomics,
expression microarrays, and large structure and in-vivo imaging will
benefit as the appropriate database design becomes better understood.
This project will examine the database structures used for
these large datasets and attempt (based on analysis and experiment)
to propose some sensible future developments.
[1] Attwood, TK, Parry-Smith, DJ, Introduction to Bioinformatics,
Prentice Hall, 1999.
[2] Schulze-Kremer, S, Molecular Bioinformatics: Algorithms and
Applications, Walter de Gruyter, 1996.
If you are interested in being considered for a studentship please
contact the Group Director, Professor T.M. McGinnity, by email:
tm.mcginnity@ulst.ac.uk
or telephone: +44-(0)28-71375417.
See the current research section of this website
for details on research projects pursued by existing PhD students.