# Lecture Notes in Computer Science

Edited by G. Goos and J. Hartmanis

# 457

H. Burkhart (Ed.)

# CONPAR 90-VAPP IV

Joint International Conference on Vector and Parallel Processing

Zurich, Switzerland, September 10–13, 1990

Proceedings



Springer-Verlag

Berlin Heidelberg New York London Paris Tokyo Hong Kong Barcelona

#### **Editorial Board**

D. Barstow W. Brauer P. Brinch Hansen D. Gries D. Luckham C. Moler A. Pnueli G. Seegmüller J. Stoer N. Wirth

Editor: Helmar Burkhart, Institut für Informatik, Universität Basel, Mittlere Straße 142, CH-4056 Basel, Switzerland


CR Subject Classification (1987): C.1, J.2

ISBN 3-540-53065-7 Springer-Verlag Berlin Heidelberg New York ISBN 0-387-53065-7 Springer-Verlag New York Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data banks. Duplication of this publication or parts thereof is only permitted under the provisions of the German Copyright Law of September 9, 1965, in its version of June 24, 1985, and a copyright fee must always be paid. Violations fall under the prosecution act of the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1990 Printed in Germany

Printing and binding: Druckhaus Beltz, Hemsbach/Bergstr. 2145/3140-543210 - Printed on acid-free paper

# Preface

While parallel architectures were pure research vehicles some years ago, this situation has changed substantially. There are many commercial systems available now that compete for market segments in scientific computing. The 1990s are likely to become the decade of parallel processing.

The past decade has already seen the emergence of two highly successful conference series on parallel processing, CONPAR and VAPP. The Vector and Parallel Processors in Computational Science (VAPP) meetings were held in Chester (1981), Oxford (1984) and Liverpool (1987). The International Conferences on Parallel Processing (CONPAR) took place in Erlangen (1981), Aachen (1986) and Manchester (1988). Despite the importance of parallel architectures and parallel computing, the Standing Committees of both conference series had the impression that there were too many conferences, workshops, summer schools, and exhibitions, and the idea of a joint conference came up. In one respect we succeeded: for the first time, these conferences are being held together. During the preparations for this conference, however, several new meetings appeared, with the result that there is a tremendous number of events this year. The organizers of CONPAR 90 - VAPP IV are thus satisfied to see that their conference series is already quite mature: we received so many good, well-written papers that we had to reduce the number of published papers considerably. Whether CONPAR and VAPP will continue as joint conferences in the future is still open. Hopefully this joint conference series can develop into the major European event in parallel processing.

This is the first time such a big conference on parallel processing is taking place in Switzerland. Returning home from the Frankfurt meeting in February 1989, where the final vote was given for a Zurich conference, I was greatly relieved to receive so much support from colleagues. I would like to express my special thanks to Ernst Rothauser, who took on the heavy load of coordinating all local arrangements and the organization.

I would also like to thank the other members of the Steering and Organizing Committees for their continuous help. Special thanks go to Peter Arbenz, Armin Friedli, Walter Gander, Hans-Jürgen Halin and Richard Wait. I wish to extend my sincere thanks to the members of the Program Committee for their contributions to shaping the conference program and for their help in reviewing papers. I also express my gratitude to all other referees for their assistance in this process. The idea of the Computation Race came up over an early lunch with Jürg Nievergelt. I would like to thank him, as well as the Awards Committee, for this and other enrichments of the conference program.

Two prominent computer architects offered their help in organizing this event. Professor Speiser will act as the Honorary Chairman and will address the conference with his keynote "Digital Electronics for 50 Years: No Limits to Growth?". Professor Händler, the founder of the CONPAR series, is acting as the Chairman of the Standing Committee. We are indebted to him for his continual advice on, and confidence in, our Zurich conference.

The preparation of the technical program was a time-consuming process. I would not have managed to meet all the deadlines without my assistants Stephan Gutzwiller and Stephan Waser, who carefully co-ordinated all steps and many times suffered with me. The secretaries, Mrs. A. Mathys and Mrs. Rothauser, did much to ease our job. Let me thank them all.

No conference preparations can be made without initial funding. The Swiss Informatics Society/Swiss Chapter of the ACM and IEEE Switzerland Section provided this help without hesitation. GI-PARS, BCS-PPSG, and IEEE CS later co-operated.

We would like to thank ETH Zürich for acting as the host site, as it provides such a pleasant conference environment. Last but not least I would like to thank the University of Basel for providing an infrastructure which enabled us to organize the conference from a distance.

The proceedings in hand start with the keynote address given by the Honorary Chairman. Next come two papers by the invited speakers, V. Bhatkar and E. Odijk. The main part of the proceedings consists of 77 papers written by authors from 20 different countries. These contributed papers were selected by an international program committee. The topics of the papers are manifold; please note that we have grouped the table of contents according to the session titles. We have also added the rules for the Computation Race for future reference. The results of this competition will be presented at the conference and possibly published later.

Now it is up to you, the conference participant and reader of these proceedings, to make the final assessment.

Basel, June 1990

H. Burkhart

# Contents

| Keynote Address |
|---|
| A. P. Speiser<br>Digital Electronics for 50 Years: No Limits to Growth? |
| Invited Presentations |
| V. P. Bhatkar<br>Parallel Computing: An Indian Perspective |
| New Models of Computation |
| P. Evripidou and J-L. Gaudiot<br>A Decoupled Data-Driven Architecture with Vectors and Macro Actors |
| R. W. Hartenstein, A. G. Hirschbiel and M. Weber<br>A Novel Paradigm of Parallel Computation and its Use to Implement Simple High<br>Performance Hardware |
| H. Kikuchi, T. Yukawa, K. Matsuzawa and T. Ishikawa<br>Presto: A Bus-Connected Multiprocessor for a Rete-Based Production System |
| Performance Prediction, Analysis, and Measurement |
| A. Basu, S. Srinivas, K.G. Kumar and A. Paulraj<br>A Model for Performance Prediction of Message Passing Multiprocessors Achieving<br>Concurrency by Domain Decomposition |
| G. Lyon and R. D. Snelick<br>Workloads, Observables, Benchmarks and Instrumentation |
| F. Sötz<br>A Method for Performance Prediction of Parallel Programs |

# Logic Programming

| Gao Yaoqing, Sun Chengzheng and Hu Shouren<br>Study of a Parallel Inference Machine for Parallel Execution of Logic Programs |
|----------------------------------------------------------------------------------------------------------------------------------|
| A. Gupta, A. Banerjea, V. Jha, V. Bafna and P.C.P. Bhatt<br>Parallel Implementation of Logic Languages |
| P. Kacsuk<br>Prolog Implementations on Parallel Computers                                                                        |
| Performance Monitoring and Debugging                                                                                             |
| B. Mohr<br>Performance Evaluation of Parallel Programs in Parallel and Distributed Systems |
| M. Moser<br>The ELAN Performance Analysis Environment                                                                            |
| M. Zitterbart<br>Monitoring and Debugging Transputer-Networks with NETMON-II                                                     |
| Algorithms for Matrix Factorization                                                                                              |
| Ch. H. Bischof and Ph. G. Lacroute<br>An Adaptive Blocking Strategy for Matrix Factorizations                                    |
| J. Du Croz, P. Mayes and G. Radicati<br>Factorizations of Band Matrices Using Level 3 BLAS                                       |
| M. Hegland<br>On the Computation of Breeding Values                                                                              |
| Large-Grain Data Flow                                                                                                            |
| Kechang Dai<br>Code Parallelization for the LGDG Large-Grain Dataflow Computation                                                |
| D. C. DiNucci and R. G. Babb II<br>Development of Portable Parallel Programs with Large-Grain Data Flow |
| O. C. Maquelin<br>ADAM: A Coarse-Grain Dataflow Architecture that Addresses the Load Balancing<br>and Throttling Problems        |
| S. B. Murer<br>A Latency Tolerant Code Generation Algorithm for a Coarse Grain Dataflow<br>Machine                               |
| Compile-Time Analysis and Restructurers                                                                                          |
| R. Eigenmann, J. Hoeflinger, G. Jaxon and D. Padua<br>Cedar Fortran and Its Compiler |
| H. M. Gerndt and H. P. Zima<br>Optimizing Communication in Superb |
| Sang Lyul Min, Yarsun Hsu and Hyoung-Joo Kim<br>A Design of Performance-optimized Control-based Synchronization |
| K. L. Spier and B. K. Szymanski<br>Interprocess Analysis and Optimization in the Equational Language Compiler |
| Architectures and Algorithms for Image Processing |
| B. Chardonnens, R. D. Hersch and O. Kölbl<br>Transputer Based Distributed Cartographic Image Processing |
| Parallel Implementation of the Convolution Method in Image Reconstruction |
| D. Stokar, A. Gunzinger, W. Guggenbühl, E. Hiltebrand, S. Mathis, P. Schaeren, B. Schneuwly and M. Zeltner<br>SYDAMA II: A Heterogeneous Multiprocessor System for Real Time Image Processing |

# Interconnection Networks

| A. Harissis, C. Jam and A. Ambler<br>Analysis and Design of Circuit Switching Interconnection Networks Using 4x4<br>Nodes |
|---|
| R. Holzner and S. Tomann<br>Design and Simulation of a Multistage Interconnection Network |
| R. J. Richter<br>A Reconfigurable Interconnection Network for Flexible Pipelining |
| Load Balancing and the Mapping Problem |
| J. E. Boillat and P. G. Kropf<br>A Fast Distributed Mapping Algorithm |
| F. Dehne and M. Gastaldo<br>A Note on the Load Balancing Problem for Coarse Grained Hypercube Dictionary<br>Machines |
| P. Eklund and M. Kaufmann<br>Hierarchical Wiring in Multigrids |

# Efficient Use of Vector Processors

| O. Haan and W. Waelde<br>FFTVPLIB, a Collection of Fast Fourier Transforms for Vectorprocessors                             |
|-----------------------------------------------------------------------------------------------------------------------------|
| H. Weberpals<br>Improving the Vector Performance via Algorithmic Domain Decomposition                                       |
| Communication Issues                                                                                                        |
| J-Y. Blanc and D. Trystram<br>Implementation of Parallel Numerical Routines Using Broadcast Communication<br>Schemes        |
| P. Istavrinos and L. Borrmann<br>A Process and Memory Model for a Parallel Distributed-Memory Machine                       |
| L. Mugwaneza, T. Muntean and I. Sakho<br>A Deadlock Free Routing Algorithm with Network Size Independent Buffering<br>Space |
| Process Partitioning and Work Distribution                                                                                  |
| R. Beccard and W. Ameling<br>From Object-Oriented Programming to Automatic Load Distribution                                |
| F. Bieler<br>Partitioning Programs into Processes                                                                           |
| R. Jakob and H. F. Jordan<br>An MIMD Execution Environment with a Fixed Number of Processes                                 |
| Performance Considerations                                                                                                  |
| B. A. W. Baugstø, J. F. Greipsland and J. Kamerbeek<br>Sorting Large Data Files on POOMA                                    |
| R. Knecht<br>Parallelizing Divide-and-Conquer Algorithms - Microtasking versus Autotasking |
| E. Schnepf<br>The Performance of Linear Algebra Subprograms on the Siemens S Series                                         |
| Reconfigurable and Scalable Systems                                                                                         |
| K. Boyanov and K. Yanev<br>A Family of Highly Parallel Computers                                                            |
| F. Höpfl, J. Schirrmacher and M. Trent<br>A Distributed Shared Memory Multiprocessor Kit with Scalable Local Complexity |
| M. Thapar and B. Delagi<br>Scalable Cache Coherence for Large Shared Memory Multiprocessors                                 |

# Concurrency Control

| V. Issarny<br>Design and Implementation of an Exception Handling Mechanism for<br>Communicating Sequential Processes                                                          |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| H-J. Plewan and P. Schlenk<br>Creating and Controlling Concurrency in Object Oriented Systems - A Case Study |
| J. Rost and E. Maehle<br>A Distributed Algorithm for Dynamic Task Scheduling                                                                                                  |
| Transputer Tools and Applications                                                                                                                                             |
| J-M. Adamo and Ch. Bonello<br>TéNOR++: A Dynamic Configurer for SuperNode Machines |
| G. W. Chege, R. W. Taylor and J. M. Tealby<br>Parallel Modelling of Electromagnetic Field Scattering: A New Approach Using the<br>Edinburgh Concurrent Supercomputer Facility |
| G. J. Shaw, A. Stewart and L. C. Waring<br>3D Multigrid Correction Methods for Transputer Networks                                                                            |
| Cellular/Systolic Architectures and Algorithms                                                                                                                                |
| J. H. Gonçalves Romero<br>A Comparative Study of Two Wavefront Implementations of a LU Solver<br>Algorithm                                                                    |
| S. G. Sedukhin<br>Systolic Array Architecture for Two-Dimensional Discrete Fourier Transform |
| A. Zsótér, T. Legendi and G. Balázs<br>Design and Implementation of M1 Cellprocessor                                                                                          |
| Implementation Issues for Novel Architectures and Languages                                                                                                                   |
| H. Garsden and A. L. Wendelborn<br>A Comparison of Microtasking Implementations of the Applicative Language<br>SISAL                                                          |
| Guang R. Gao, H. H. J. Hum and Yue-Bong Wong<br>An Efficient Scheme for Fine-Grain Software Pipelining                                                                        |
| D. H. Grit<br>Sisal on a Message Passing Architecture                                                                                                                         |
| The TOPSYS Tool Environment                                                                                                                                                   |
| Th. Bemmerl<br>The TOPSYS Architecture |
| Th. Bemmerl and Th. Ludwig<br>MMK - A Distributed Operating System Kernel with Integrated Dynamic<br>Loadbalancing |
| Th. Bemmerl, R. Lindhof and Th. Treml<br>The Distributed Monitor System of TOPSYS                                                                                        |
| Array Processors and Applications                                                                                                                                        |
| M. Clint, J. S. Weston and C. W. Bleakney<br>Hybrid Algorithms for the Eigensolution of Large Sparse Symmetric Matrices<br>on the AMT DAP 510                            |
| P. Flanders<br>Virtual Systems Architecture on the AMT DAP                                                                                                               |
| M. Schäfer<br>Numerical Simulation of Thermal Convection on SIMD Computers                                                                                               |
| High-Performance Systems and Accelerators                                                                                                                                |
| M. Makhaniok, V. Cherniavsky, R. Männer and O. Stucky<br>Massively Parallel Realization of Logical Operations in Distributed Parallel<br>Systems                         |
| N. N. Mirenkov<br>High-Performance Computer System "SIBERIA"                                                                                                             |
| M. Ward, P. Townsend and G. Watzlawik<br>EDS Hardware Architecture                                                                                                       |
| Visualization and Runtime Analysis                                                                                                                                       |
| F. Abstreiter<br>Visualizing and Analysing the Runtime Behavior of Parallel Programs                                                                                     |
| Th. Bemmerl, O. Hansen and Th. Ludwig<br>PATOP for Performance Tuning of Parallel Programs                                                                               |
| S. Sharma<br>Real-Time Visualization of Concurrent Processes                                                                                                             |
| Algorithmic Studies for Hypercube-Type Systems                                                                                                                           |
| M. Cosnard and J-L. Philippe<br>Achieving Superlinear Speedups for the Multiple Polynomial Quadratic Sieve<br>Factoring Algorithm on a Distributed Memory Multiprocessor |
| M. Cosnard and P. Fraigniaud<br>A Performance Analysis of Network Topologies in Finding the Roots of a<br>Polynomial                                                     |
| M. Vajteršic<br>Parallel Multigrid Algorithms for some Specialized Computer Systems |
| Computation Race |
| Authors Index |


# Committees

# STANDING COMMITTEE

| W. Händler   | Univ. Erlangen (FRG)<br>Chairman |
|--------------|----------------------------------|
| P.C.P. Bhatt | IIT Delhi (IND)                  |
| K. Boyanov   | IMIS Sofia (BG)                  |
| H. Burkhart  | Univ. Basle (CH)                 |
| M. Cosnard   | ENS Lyon (F)                     |
| L.M. Delves  | Univ. Liverpool (UK)             |
| I. Plander   | Ac. Bratislava (CSFR)            |

# STEERING COMMITTEE

| H. Burkhart  | Univ. Basle (CH)<br>General Chairman |
|--------------|---------------------|
| A.P. Speiser | ABB Baden (CH)<br>Honorary Chairman |
| W. Gander    | ETH Zurich (CH)     |
| M. Gutknecht | ETH Zurich (CH)     |
| A. Kündig    | ETH Zurich (CH)     |
| E. Rothauser | IBM Ruschlikon (CH) |

# PROGRAM COMMITTEE

| H. Burkhart     | Univ. Basle (CH)<br>Chairman |
|-----------------|------------------------------|
| H. Aiso         | Keio Univ. (J)               |
| R.G. Babb II    | Oregon Univ. (USA)           |
| V. P. Bhatkar   | CDAC Pune (IND)              |
| P.C.P. Bhatt    | IIT Delhi (IND)              |
| K.C. Dai        | GMD Berlin (FRG)             |
| L.M. Delves     | Univ. Liverpool (UK)         |
| R. Eigenmann    | Univ. Illinois (USA)         |
| Ph. de Forcrand | ETH Zurich (CH)              |
| W. Gander       | ETH Zurich (CH)              |
| R. Gruber       | EPF Lausanne (CH)            |
| D. Haupt        | RWTH Aachen (FRG)            |
| Ch. Jesshope    | Univ. Southampton (UK)       |
| G. Joubert      | Philips Eindhoven (NL)       |
| M. Kaiserswerth | IBM Ruschlikon (CH)          |
| A. Kündig       | ETH Zurich (CH)              |
| T. Lake         | Glossa Reading (UK)          |
| O. Lange        | TU Hamburg (FRG)             |
| T. Legendi      | Cellware Budapest (H)        |
| H. Liddell      | Queen Mary Coll. (UK)        |
| P. Meier        | Univ. Zurich (CH)            |
| T. Muntean      | Univ. Grenoble (F)           |
| J.D. Nicoud       | EPF Lausanne (CH)      |
| E. Odijk          | Philips Eindhoven (NL) |
| J. Pachl          | IBM Ruschlikon (CH)    |
| D.A. Padua        | Univ. Illinois (USA)   |
| D. Parkinson      | Queen Mary Coll. (UK)  |
| R.H. Perrott      | Univ. Belfast (UK)     |
| B. Quatember      | Univ. Innsbruck (A)    |
| J.K. Reid         | Harwell Lab. (UK)      |
| L. Richter        | Univ. Zurich (CH)      |
| B. Sendov         | Academy Sofia (BUL)    |
| D. Sorensen       | Univ. Houston (USA)    |
| O. Sykora         | Ac. Bratislava (CSFR)  |
| M. Vajtersic      | Ac. Bratislava (CSFR)  |
| M. Vanneschi      | Univ. Pisa (I)         |
| A. J. Vasconcelos | Unipede Brussels (B)   |
| R. Wait           | Univ. Liverpool (UK)   |
| T. Yuba           | Tsukuba-shi (J)        |
| C. Yen            | Beijing Polytech. (PRC) |

# AWARDS COMMITTEE

| J. Nievergelt | ETH Zurich (CH)<br>Chairman |
|---------------|------------------------|
| M. Annaratone | ETH Zurich (CH)        |
| J. Dongarra   | Univ. of Tennessee (USA) |
| I. Duff       | Harwell Lab. (UK)      |
| W. Händler    | Univ. Erlangen (FRG)   |
| H. Jordan     | Univ. Colorado (USA)   |
| P. Kropf      | Univ. Berne (CH)       |
| E. Rothauser  | IBM Ruschlikon (CH)    |
| H. Simon      | NASA Ames (USA)        |
| J. Staunstrup | Univ. Lyngby (DK)      |
| P. Stucki     | Univ. Zurich (CH)      |

# ORGANIZING COMMITTEE

| E. Rothauser  | IBM Ruschlikon (CH)<br>Chairman |
|---------------|-----------------------|
| P. Arbenz     | ETH Zurich (CH)       |
| A. Friedli    | ETH Zurich (CH)       |
| J. Halin      | ETH Zurich (CH)       |
| R. Henzi      | Sulzer Winterthur (CH) |
| W. Juling     | RWTH Aachen (FRG)     |
| H. Liddell    | Queen Mary Coll. (UK) |
| K.D. Reinartz | Univ. Erlangen (FRG)  |
| R. Wait       | Univ. Liverpool (UK)  |

# REFEREES

| M. Annaratone | ETH Zurich (CH) |
|---|---|
| P. Arbenz | ETH Zurich (CH) |
| R.G. Babb II | Oregon Univ. (USA) |
| V. P. Bhatkar | CDAC Pune (IND) |
| P.C.P. Bhatt | IIT Delhi (IND) |
| V.C. Bhavsar | CDAC Pune (IND) |
| H. Bieri | Univ. Berne (CH) |
| A. Bode | TU Munich (FRG) |
| J. Boillat | Univ. Berne (CH) |
| H. Burkhart | Univ. Basle (CH) |
| A. Coen | Politech. di Milano (I) |
| M. Cosnard | ENS Lyon (F) |
| K.C. Dai | GMD Berlin (FRG) |
| M. Dal Cin | Univ. Erlangen (FRG) |
| K. Decker | Univ. Berne (CH) |
| L.M. Delves | Univ. Liverpool (UK) |
| J. Dongarra | Univ. of Tennessee (USA) |
| I. Duff | Harwell Lab. (UK) |
| R. Eigenmann | Univ. Illinois (USA) |
| W. Erhard | Univ. Erlangen (FRG) |
| C. Falcó-Korn | Univ. Basle (CH) |
| B. Faltings | EPF Lausanne (CH) |
| Flück | EPF Lausanne (CH) |
| Ph. de Forcrand | ETH Zurich (CH) |
| A. Friedli | ETH Zurich (CH) |
| W. Gentzsch | FH Regensburg (FRG) |
| R. Gruber | EPF Lausanne (CH) |
| D.W. Gruntz | ETH Zurich (CH) |
| A. Gunzinger | ETH Zurich (CH) |
| M. Gutknecht | ETH Zurich (CH) |
| Gutzmann | Univ. Erlangen (FRG) |
| St. Gutzwiller | Univ. Basle (CH) |
| A. Hagerer | Univ. Passau (FRG) |
| W. Händler | Univ. Erlangen (FRG) |
| T. Härder | Univ. Kaisersl. (FRG) |
| D. Haupt | RWTH Aachen (FRG) |
| R. Henzi | Sulzer Informatik (CH) |
| R. Herbin | EPF Lausanne (CH) |
| D. Hogreve | Univ. Berne (CH) |
| F. Hosfeld | KFA (FRG) |
| Ch. Jesshope | Univ. Southampton (UK) |
| H. Jordan | Univ. of Colorado (USA) |
| G. Joubert | Philips Eindhoven (NL) |
| W. Juling | RWTH Aachen (FRG) |
| M. Kaiserswerth | IBM Ruschlikon (CH) |
| J.P. Katoens | Philips Eindhoven (NL) |
| H. Kirrmann | ABB (CH) |
| R. Klar | Univ. Erlangen (FRG) |
| P. Kropf | Univ. Berne (CH) |
| A. Kündig | ETH Zurich (CH) |
| T. Lake | Glossa Reading (UK) |
| O. Lange | TU Hamburg (FRG) |
| H. Liddell | Queen Mary College (UK) |
| B.B. Madan | IIT Delhi (IND) |
| E. Maehle | Univ. Paderborn (FRG) |
| R. Männer | Univ. Heidelberg (FRG) |
| P. Meier | Univ. Zurich (CH) |
| Ma. Miyakawa | Tsukuba-shi (J) |
| M. Moser | ETH Zurich (CH) |
| T. Muntean | Univ. Grenoble (F) |
| H.H. Nägeli | Univ. Neuchâtel (CH) |
| J.D. Nicoud | EPF Lausanne (CH) |
| J. Nievergelt | ETH Zurich (CH) |
| E. Odijk | Philips Eindhoven (NL) |
| K. Ohmaki | Tsukuba-shi (J) |
| J. Pachl | IBM Ruschlikon (CH) |
| D.A. Padua | Univ. Illinois (USA) |
| D. Parkinson | Queen Mary College (UK) |
| R.H. Perrott | Univ. Belfast (UK) |
| W.P. Petersen | IPS Zurich (CH) |
| B. Quatember | Univ. Innsbruck (A) |
| J.K. Reid | Harwell Lab. (UK) |
| K.D. Reinartz | Univ. Erlangen (FRG) |
| R. Reith | Univ. Basle (CH) |
| L. Richter | Univ. Zurich (CH) |
| M.G. Sami | Politech. di Milano (I) |
| B. Sanders | ETH Zurich (CH) |
| H. Schmeck | Univ. Kiel (FRG) |
| H. Scholian | ETH Zurich (CH) |
| P. Schorn | ETH Zurich (CH) |
| D. Sehr | Univ. of Illinois (USA) |
| J. Seib | Univ. Mannheim (FRG) |
| S. Sekiguchi | Tsukuba-shi (J) |
| B. Sendov | Academy Sofia (BUL) |
| H. Simon | NASA Ames (USA) |
| F. Sötz | Univ. Erlangen (FRG) |
| J. Staunstrup | TU of Denmark (DK) |
| D. Stokar | ETH Zurich (CH) |
| A. Strey | Univ. Erlangen (FRG) |
| P. Stucki | Univ. Zurich (CH) |
| O. Sykora | Ac. Bratislava (CSFR) |
| C. Szyperski | ETH Zurich (CH) |
| H. Thoma | Ciba-Geigy Basle (CH) |
| J. Tusk | Philips Eindhoven (NL) |
| Ch. Ullrich | Univ. Basle (CH) |
| M. Ulot | Philips Eindhoven (NL) |
| M. Vajtersic | Ac. Bratislava (CSFR) |
| M. Vanneschi | Univ. Pisa (I) |
| A.J. Vasconcelos | Unipede Brussels (B) |
| U. von Matt | ETH Zurich (CH) |
| R. Wait | Univ. Liverpool (UK) |
| St. Waser | Univ. Basle (CH) |
| D. Würtz | ETH Zurich (CH) |
| C. Yen | Beijing Polytechnic (PRC) |
| T. Yuba | Tsukuba-shi (J) |

# Keynote Speech by Professor A. P. Speiser, Honorary Chairman

#### **Digital Electronics for 50 Years: No Limits to Growth?**

Talking about the past of digital electronics and about the limits to its growth certainly presents a challenge to someone who has been associated with digital electronics research for his entire professional life. Looking back over the five decades of its history, meditating on what might have or ought to have been done differently, and trying to draw conclusions in order to predict the future is a temptation that is hard to resist. But not all of this is suitable for a keynote lecture. Some of the thoughts that come to mind could, if misunderstood, be taken as criticism by colleagues who are still on the scene and who could feel offended. Other thoughts might contain ideas about the future that could turn out to be incorrect.

But besides this, my assignment has other pitfalls, especially when it comes to judging what is happening to-day. It is a general fact that we humans understand least the times in which we ourselves live. Understanding comes later, when the facts can be viewed in historical perspective and when it is easier to separate significant events from unimportant ones, or, to put it differently, when one can see what has been signal and what has been noise. Thus, current events must be interpreted with caution.

#### The Start

What do we mean by digital electronics? Two elements characterize digital electronics:

- The use of vacuum tubes or transistors as switches, in other words in a mode where only the states "on" and "off" are of importance.
- The use of a multiplicity of such switches to represent numerals and letters by means of a suitable code: "On" means "one", "off" means "zero".

Separately, these two elements had been known for a while. The flip-flop, a basic circuit, was invented in 1919 by Eccles and Jordan in England, and electromechanical relays were used to perform calculations both in the USA and in Germany in the late 30's. But the combination of the two had to wait for the 40's, and it was realised on a truly grand scale. The word ENIAC is known to every computer scientist; it stands for "Electronic Numerical Integrator and Calculator", a huge machine with no fewer than 18 000 tubes which opened the age of digital electronics. The acronym ENIAC was followed by countless others ending in -AC for Automatic Calculator, such as EDSAC, EDVAC, SEAC, SWAC and UNIVAC. (Only Howard Aiken at Harvard University detested acronyms; he called his machines Mark I through Mark IV.)

Until the early 50's, all computers were built at universities or government establishments; in the beginning, industry took a wait-and-see attitude. But then the industrial companies stepped in. And yet, a vacuum tube machine of the size of ENIAC was never built again, because the limited reliability of components would have prevented satisfactory performance in a customer environment. Large machines had 2 000 or at most 4 000 tubes. But within this size limitation, electronic computers became not only a technological but also a commercial success, one which far exceeded all predictions.

But then came an event which opened new dimensions and which demonstrated to the world that the spread of computers, which had been regarded as sensational, was in reality only a small rehearsal for what was to come: the transistor, invented in 1948, had reached maturity and could be used in computers beginning about 1957. Now it became possible to use thousands and even tens of thousands of active elements routinely in one machine. Computers experienced a further, enormous growth. And yet, the main event was still to come: integrated circuits became available for industrial use after about 1963. For the second time, an astonished world realised that what it had thought was the beginning of the computer age was in reality only a minor rehearsal for the real performance.

What has happened since then is well known to all of you: the subject of our conference and the substance of the papers we are hearing are an impressive demonstration of what has resulted from ENIAC.

The key events which have opened the doors to to-day's digital electronics are:

| Event              | Year    | People                              | Place                   |
|--------------------|---------|-------------------------------------|-------------------------|
| ENIAC              | 1943-46 | Eckert, Mauchly                     | Philadelphia            |
| Stored Program     | 1944    | von Neumann                         | Princeton, New Jersey   |
| Transistor         | 1948    | Bardeen, Brattain, Shockley         | Murray Hill, New Jersey |
| Integrated Circuit | 1958    | Simultaneously at several locations |                         |

#### Who Invented Digital Electronics?

Success has a thousand fathers; failure is an orphan. Digital electronics has been enormously successful, and it comes as no surprise that numerous persons are credited as its fathers. The early history of this endeavour is well documented to-day. There is general agreement that the two leading figures of ENIAC, J. P. Eckert and J. W. Mauchly, opened the path for digital electronics. They set down their development in a voluminous patent application, which led to an issued patent. But then something very unexpected happened: attention was drawn to the previously unknown name of J. V. Atanasoff, a physics professor at Iowa State College, born in 1903. In the years from 1939 to 1942 he conceived and built a complete electronic computer, which is well documented to-day. There is no doubt that Atanasoff had invented and implemented all essential elements of digital electronics. His computer was completed, but it was never operated productively; Atanasoff was assigned other work by the Armed Forces. His results fell into oblivion, and Atanasoff did not influence the course of technology. Still, in view of his work it is justified to say that digital electronics is 50 years old to-day.

The late recognition of Atanasoff's computer has had a legal epilogue with considerable financial consequences: at the end of a lengthy and costly lawsuit, the court declared the ENIAC patent invalid, on the grounds that all its essential elements had been recorded in, and were accessible from, Atanasoff's documents before the date of filing. As a consequence, large sums that would otherwise have been owed as licence fees did not become payable.

To-day it is a subject of intense debate among historians whether Eckert and Mauchly had known Atanasoff's notes and whether they used this information in their work and in the patent. Both denied it, during the lawsuit as well as later in interviews with historians. But the possibility cannot be excluded that time had blurred their recollection of the real course of events. From a legal standpoint this question is irrelevant, since the patent was invalid in any case. But it is of considerable historical interest. The full truth will probably never come to light. Eckert and Mauchly, and not Atanasoff, are regarded as the principal computer pioneers. The sequence of events confirms a fundamental rule of life: in technology, recognition does not go to the person who first had the idea, but to the one who convinced the world and thus paved the way to realization.

#### Economic and Socio-Cultural Impact

Digital electronics is not merely a technological development; it has had, and is still having, major economic and socio-cultural impact. The proliferation of professions and occupational disciplines relating to computers is a fact known to everyone to-day. In the early days it was predicted that the computer would create unemployment. A glance at the job advertising sections of our newspapers is evidence, if any were needed, that the contrary has occurred: there is a severe shortage of computer-related manpower.

But there is also a global aspect to digital electronics. In the USA, an influential group of experts recently sent a memorandum to President Bush with some alarming observations: if the US semiconductor industry were to lose ground in the near future at the same rate as in the past, the nation would have to pay a price measured in millions of lost jobs, and technical leadership in key areas such as communication and computers would be threatened, with serious consequences not only for economic stability but also for national security.

A judgement of this kind vividly illustrates the validity of the statement "silicon is the new steel": steel and its use in the construction of buildings, railroads and machines was the motor of economic growth 100 years ago. To-day silicon, the raw material of electronics, has assumed a comparable role. I can think of no example which better illustrates the new dimensions opened up by digital electronics than the fly-by of the space probe Voyager past the planet Neptune. This is perhaps the greatest achievement of mankind in any of its technological endeavours. Voyager was sent on its journey 12 years ago; accordingly, it is equipped with systems that are far behind to-day's state of the art. But the ground installations were continually updated to include the latest means and methods. Digital electronics has played a key role in this process.

#### Limits to Growth

The year 1969 was a historic year in more than one respect. It marks the moon landing, the first time humans set foot on a celestial body other than the earth. It is also the year in which the Club of Rome commissioned a group at MIT to prepare the report which later resulted in the famous book "The Limits to Growth" by Dennis Meadows. The report showed that Planet Earth is a closed system, that its ability to supply resources and to accept wastes is accordingly limited, and that we are currently much closer to these limits than we had believed. It expressed something that is quite common knowledge to-day. But at the time it came as a real shock to most people.

The book does not address itself to electronics and information processing. It does address itself, among other subjects, to energy. As we all know, progress in energy processes and energy systems has always been slow. The reason is that these systems operate at a point which is very close to the limits set by the basic laws. As an extreme example of how close one can get to the ultimate limits, I would like to mention electric generators. The efficiency of a large generator to-day is 99 %. At this point, further progress is almost impossible, and also the rewards are rapidly diminishing, because, as you all know, 100 % can never be attained, let alone exceeded.

How about the limits of growth in digital electronics? Take the well-known curve of Fig. 1, which shows the number of components on an IC. It has grown about 70 % per year for decades. Every growth of course has its limits. But the limits set by the fundamental laws are still so far away that as yet they do not seem to impose any restrictions.

How about the resources? Let us look at silicon, the basic material of our semiconductors. It is well known that 28 % of the earth's crust consists of silicon. A shortage thus is not in sight, even if we make many more circuits than we now do! In the light of all the alarming reports about shrinking resources it is good to remember that not only silicon, but some other raw materials also are still abundantly available.

Here are the elements that make up the earth's crust:

| Element                              | Share of the earth's crust |
|--------------------------------------|----------------------------|
| Oxygen                               | 47 %                       |
| Iron                                 | 5 %                        |
| Aluminum                             | 8 %                        |
| Silicon                              | 28 %                       |
| *Subtotal of the four elements above* | 88 %                       |
| All others                           | 12 %                       |
| Total                                | 100 %                      |

88 % of the earth's crust is made up of the four elements oxygen, iron, aluminum and silicon. You see how well Nature has provided for modern society: oxygen for people to breathe, iron for the automobiles, aluminum for the airplanes, and silicon for the transistors! From this, it is obvious that as far as resources are concerned the limits to growth in electronics are very, very far away.

#### Other Limits

But growth is not only limited by resources; other limits are set by the basic laws of nature, and these cannot be surpassed. In energy conversion and energy handling, to-day's machines are quite close to these limits, so progress will remain slow. Where are the limits in digital electronics?

Fig. 1 shows the number of components on one chip of an integrated circuit. It has grown exponentially ever since ICs were introduced, at the enormous rate of 70 % per year. This is more than a hundredfold in a decade. I know of no other parameter in technology that has grown at such a rate over more than three decades. Where are the limits? Every exponential growth must come to a stop, and when the limits make themselves felt, the growth starts to taper off. This tapering was repeatedly predicted in the 80's. So far there is no sign of it. The limit posed by the wavelength of visible light, which is needed in wafer fabrication and which was earlier thought to be the eventual barrier, is no longer an obstacle: light waves are being replaced by X-rays. The ultimate limits come from elsewhere: no structure can be finer than the diameter of the constituent atoms. And quantum mechanics, together with the basic laws of Shannon's theory of communication, tells us how many electrons or photons are needed to represent and transmit one bit of information with a specified error rate. But the barriers set by these laws leave room for at least a factor of 100 compared with to-day's technology.
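The growth figures above can be checked with a few lines of compound-growth arithmetic. The sketch below is only that arithmetic: the 70 % annual rate is the figure quoted in the talk, and the function name is a hypothetical illustration.

```python
# Illustrative sketch; the helper name is hypothetical.
ANNUAL_GROWTH = 0.70  # "about 70 % per year", as quoted in the talk

def growth_factor(rate: float, years: float) -> float:
    """Total multiplication factor after `years` of compound growth at `rate`."""
    return (1.0 + rate) ** years

for years in (1, 10, 30):
    print(f"after {years:2d} years: x{growth_factor(ANNUAL_GROWTH, years):,.1f}")
```

At 70 % per year the factor after ten years is about 200 (1.7^10 is roughly 201.6), which agrees with "more than a hundredfold in a decade".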

The limits to growth are not an issue.

Or are they? They are. But they come from an entirely different corner than was anticipated. They reside in software, not in hardware.

Let me look back to the early 50's, in other words to a period when vacuum tubes, not transistors, were the active elements in digital electronics. A considerable number of computer projects were under way, most of them with up to 2 000 or at most 4 000 tubes. All of them, without exception, were plagued by major hardware problems. Not only tubes but also condensers and even resistors had a mean time between failures which seemed quite good for an individual element but which became simply unacceptable when thousands of them were combined. Hardware problems were the main issue throughout. How about software problems? They did not exist. There was more than enough software (good software, to be sure) at hand, waiting to be handled by the unreliable hardware. The idea that enough hardware might some day be available to use all the software sounded like a dream!
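The reliability argument above follows the arithmetic of a series system. If the machine stops whenever any single component fails, the component failure rates add up. The following is a minimal sketch that assumes independent components with constant failure rates. The 50 000-hour tube MTBF is a hypothetical figure chosen only for illustration; the talk does not give one.

```python
# Illustrative sketch; names and the tube MTBF value are hypothetical.

def system_mtbf(component_mtbf_hours: float, n_components: int) -> float:
    """MTBF of a machine that stops when any one of n identical components fails.

    Assumes independent components with constant failure rates, so the
    individual failure rates (1 / MTBF) simply add up."""
    system_failure_rate = n_components / component_mtbf_hours
    return 1.0 / system_failure_rate

TUBE_MTBF_HOURS = 50_000.0  # hypothetical value, for illustration only
for tubes in (1, 2_000, 4_000, 18_000):
    print(f"{tubes:>6} tubes -> machine MTBF {system_mtbf(TUBE_MTBF_HOURS, tubes):9.1f} hours")
```

Under these assumptions, a tube that lasts more than five years on average still leaves a 4 000-tube machine with only about 12.5 hours between failures.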

We all know that to-day's landscape is different. Despite an enormous increase in software-producing manpower, the production of software is the great bottleneck. Moreover, it seems that the limits to growth are not only in sight; they are right in front of us. Experts say that the number of lines of code that can be generated in one man-year when producing a large and complex software system lies somewhere between 1 000 and 3 000. And despite the considerable progress that has been made in programming environments, and that is still expected in the future, the experts predict that this figure will not increase significantly before the end of the century. *The limits of growth are here*. We have to face the fact that there are limits to what can be done with digital electronics. Their source lies not in the capabilities of the electronic circuits but in the capabilities of humans to intelligently configure components with ever-increasing performance into large systems and to program them.
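To put the productivity figure in perspective, the quoted range of 1 000 to 3 000 lines per man-year converts directly into effort. The one-million-line system size below is a hypothetical example, not a figure from the talk.

```python
# Illustrative sketch; names and the system size are hypothetical.
LINES_PER_MAN_YEAR = (1_000, 3_000)  # range quoted in the talk

def man_years(lines_of_code: int, lines_per_man_year: int) -> float:
    """Effort needed to write `lines_of_code` at a given productivity."""
    return lines_of_code / lines_per_man_year

system_size = 1_000_000  # hypothetical system size
best = man_years(system_size, max(LINES_PER_MAN_YEAR))
worst = man_years(system_size, min(LINES_PER_MAN_YEAR))
print(f"{system_size:,} lines: about {best:.0f} to {worst:.0f} man-years")
```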

#### New Science?

In conclusion, let me leave software and return to hardware. What will the future bring us? Will silicon remain the raw material of electronics? How about biotechnology? When will the biological computer replace our silicon chips? This is a difficult question, and a loaded one. Biotechnology as a science is making rapid progress, and new discoveries come at a fast pace. Events are moving fast, particularly at the interface between biological manufacturing processes, for example in the food or pharmaceutical domains, and computer-based process control. The enzyme sensor is becoming a reality. But speculation goes further. There is a vision that circuits could be grown through biological processes on the basis of information coded in the same way as the genetic code in animals and humans. I tentatively call such a circuit the "biodigital chip". Signals would no longer be carried by moving electrons, but by carbon-based molecules called neurotransmitters. When will this become a reality? There is always a risk in predicting that something will not occur. The idea of the biodigital chip does not violate fundamental principles. After all, nature has implemented it in countless different forms; many of them are right in this room to-day, as parts of our brains. It is obvious that biology has the potential for systems that are vastly more complex than to-day's silicon systems, and nature has exploited this potential to the full. But nature does not easily lend itself to being copied. The difficulties in reproducibility, in interfacing and in long-term stability would be formidable. I do not believe that the biodigital chip in the true sense of the word will become commercial within the next few decades.

But biotechnology is not the only source of innovation that should be considered. Perhaps something unexpected could come from basically new science. Such events have happened in the past: the transistor is an example, and so is the laser. What should be expected from basically new science? Is basically new science needed? One thing can be said for sure: the present scientific structure still leaves room for much technological progress. The potential in silicon integrated circuit technology is still enormous; the present scientific structure is not overloaded, not saturated, so to speak. There is no pressing *need* for new science. But it would be careless to say there will be no *use* for it when it comes. Certainly the von Neumann machine was not predicted. But when it arrived it was immediately put to work. Personally, I hope for new basic science, but nobody can say where and when it will occur. Great scientific discoveries are unexpected. They come from truly gifted individuals, and they are a wonderful but also a rare gift of nature. New science will never come from an Artificial Intelligence machine or from a 5th Generation (or any other Generation) computer. The creative process at the highest level is inherently human, and I am quite certain that it will remain so.



Fig. 1 Number of components on a chip in an integrated circuit

# PARALLEL COMPUTING: AN INDIAN PERSPECTIVE

Vijay P. Bhatkar

Centre for Development of Advanced Computing, Poona University Campus, Pune, India; email: uunet!shakti!parcom!bhatkar!

#### ABSTRACT

The launch of the Centre for Development of Advanced Computing (C-DAC) in August '88 marked the beginning of high-performance computing in India. The charter of C-DAC is to design, develop and bring into commercial production, in a time-bound manner, state-of-the-art and internationally competitive parallel computers. Several compute-intensive applications in science and engineering are to be demonstrated on the C-DAC target machines. The third component of the project is the spawning of advanced research. The paper summarises the progress made in the C-DAC project and also briefly presents an overview of other notable research projects in parallel computing that are currently under way in various academic and research institutions, including those in the Indian computer industry. A future perspective in advanced computing, proposed for implementation during the 1990-95 period, is also sketched.

# I. Background

India has taken a major national initiative in parallel computing through the launch of the Centre for Development of Advanced Computing (C-DAC) as a 3-year time-bound project mission with an outlay of about Rs 300 million (US\$ 20 million). The mission goal is to design, develop and bring into commercial production internationally competitive high-performance parallel computers with a peak computing power exceeding 1,000 Mflops, and to demonstrate applications of national importance on the target machines. The project has three main components, namely **Technology**, **Applications** and **Research**, which represent respectively the "height", "width" and "depth" of the project implementation.

The launch of C-DAC (and of other R&D projects in parallel computing) marks the beginning of a new phase in the development of science and technology in India. Several national R&D projects and industry operations would benefit directly from the C-DAC project implementation. In recent years, India has launched many strategic projects, such as space missions, remote sensing, oil exploration, enhanced oil recovery, medium-range weather forecasting, biotechnology, semiconductor manufacture, superconductivity research, advanced materials development, high-energy accelerators, giant metre-wave radio astronomy, VLSI design and artificial intelligence. These research projects, spanning several leading institutions and universities in the country, could be significantly enhanced and accelerated by leveraging simulation through high-performance computing.

It is felt that the development of supercomputers will have a direct impact on indigenous development capability in hardware, system software, applications and packaging. Another motivation for launching the project is to harness the export potential. High-performance computing based on parallel processing is a fast-growing industry all over the world, and India can be a major player in this advanced technology, as it can leverage its relatively low-cost, sustainable intellectual resources in hardware design, in creating programming environments and in parallelising a large body of application software, besides contributing to the development of *algorithms, modelling and simulation*.

# II. C-DAC Parallel Computer

#### Architecture

Right from the beginning, it was clear that the destiny of supercomputing in India would be carved out along the parallel processing route, based on advanced but standard and commercially available microprocessors. The theme of the project is to bring affordable supercomputing to the Indian scientific community and also to design superior parallel computer products through better hardware design, enhanced parallel programming environments and a rich collection of application-specific software, including some unique parallel processing tools, conforming as far as possible to industry standards for compatibility and portability.

Further, several factors had to be considered before finalising the architecture of the target machine. These included the use of industry-standard hosts, the provision of incremental computing power, the availability of critical components, the availability of development platforms for application development from the very beginning, and predictable, reasonable performance for a range of scientific and engineering applications.

Keeping in view the above factors, the design of the target machine, based on the Communicating Sequential Processes (CSP) paradigm and incorporating 256/512 processors, has been completed. The architecture can provide seamless and scalable peak performance of over 1 Gflops and 3000 MIPS. Outstanding features of the machine are industry-standard hosts as front-ends, seamless back-end computing power from 1 Mflops to over 1000 Mflops, user-configurable topology, partitionability for multi-user support, a high-capacity, high-bandwidth concurrent file system, an orthogonal supervisory and control bus, unique engineering for ease of production and maintenance, forced-air cooling, and an advanced parallel programming environment for applications development.

#### Hardware

The target machine has been named *Model 90*. The system is based on a modular, scalable architecture, organised into clusters, switching networks and a filing system. A cluster can have up to 64 processing nodes with a built-in switch network. Multiple clusters with additional switching networks are used to realise configurations of Model 90 with varying degrees of performance. A concurrent filing system can be added to complete these configurations.

The system is hosted on a wide variety of machines: IBM PC AT/XT, DEC MicroVAX II, SUN Workstations, ECIL Medha (Cyber 170/830/930) and other popular VME or Multibus II machines with UNIX / XENIX environments.

The Model 90 series machines will be fully reconfigurable. Users can define, in software, the actual connections between processing nodes, thereby optimising the interconnection network to suit the problem to be solved. Static configuration is implicit. However, to provide maximum flexibility, dynamic reconfiguration is also provided through dedicated hardware supported by software.

The system will allow multiple users to use the processing nodes with good isolation between users. The switching network has been specially designed to allocate processing nodes to multiple users and to create, for each user, any topology in which every node has at most four connections.
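The allocation rule described above gives each user a disjoint set of nodes wired into a topology with at most four links per node, and it can be expressed as a simple validity check. The sketch below is a hypothetical illustration, not C-DAC's switch-manager software. The helper names `torus` and `validate` and the example partitions are assumptions; only the four-link limit comes from the text.

```python
# Illustrative sketch; all names and partitions here are hypothetical.
from collections import defaultdict

MAX_LINKS = 4  # at most four connections per node, as stated in the text

def torus(rows: int, cols: int, first_node: int = 0) -> set:
    """Links of a rows x cols 2-D torus on consecutive node ids."""
    links = set()
    for r in range(rows):
        for c in range(cols):
            node = first_node + r * cols + c
            right = first_node + r * cols + (c + 1) % cols
            down = first_node + ((r + 1) % rows) * cols + c
            for other in (right, down):
                if other != node:  # a 1-wide dimension would create a self-loop
                    links.add(frozenset((node, other)))  # undirected link
    return links

def validate(partitions: dict) -> bool:
    """Check that users get disjoint nodes, links stay inside a partition,
    and no node needs more than MAX_LINKS links."""
    owner = {}
    for user, (nodes, links) in partitions.items():
        for n in nodes:
            if n in owner:
                raise ValueError(f"node {n} given to both {owner[n]} and {user}")
            owner[n] = user
        degree = defaultdict(int)
        for link in links:
            if not link <= nodes:
                raise ValueError(f"{user}: link {sorted(link)} leaves the partition")
            for n in link:
                degree[n] += 1
        worst = max(degree.values(), default=0)
        if worst > MAX_LINKS:
            raise ValueError(f"{user}: a node needs {worst} links, limit is {MAX_LINKS}")
    return True

# Two hypothetical users sharing a 64-node cluster.
partitions = {
    "user_a": (set(range(0, 16)), torus(4, 4, first_node=0)),
    "user_b": (set(range(16, 24)), torus(2, 4, first_node=16)),
}
print(validate(partitions))
```

A 4 x 4 torus uses exactly four links per node, so it just fits the limit. A request in which one node needs five neighbours would be rejected.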

The high computational power of the system is complemented by a filing system with very high I/O bandwidth and large storage capacity. The Concurrent Filing System (CFS) would be a specially designed mass storage system in which a large number of disks are pooled to obtain the required capacity, and special techniques are employed to meet the high bandwidth requirement. The CFS would also be scalable to match the computing power proportionately. In addition to the CFS, the system would provide I/O support through dedicated I/O nodes for graphics, communication or application-specific I/O.
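The text does not say which "special techniques" the CFS uses to reach its bandwidth. One common way to pool disks is round-robin striping: consecutive blocks of a file are spread across the disks so that a large read keeps all of them busy at once. The sketch below assumes that technique. The function names and the `stripe_unit` parameter are illustrative assumptions, not part of the CFS design.

```python
# Illustrative sketch of round-robin striping; all names are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockLocation:
    disk: int           # which disk in the pool holds the block
    block_on_disk: int  # position of the block on that disk

def locate(logical_block: int, n_disks: int, stripe_unit: int = 1) -> BlockLocation:
    """Map a file's logical block number to a disk under round-robin striping.

    `stripe_unit` consecutive blocks go to one disk before moving to the next."""
    stripe, within = divmod(logical_block, stripe_unit)
    disk = stripe % n_disks
    row = stripe // n_disks  # number of full passes over the pool before this stripe
    return BlockLocation(disk, row * stripe_unit + within)

def disks_touched(first_block: int, count: int, n_disks: int, stripe_unit: int = 1) -> set:
    """Disks involved in reading `count` consecutive blocks (they can work in parallel)."""
    return {locate(b, n_disks, stripe_unit).disk for b in range(first_block, first_block + count)}

print(locate(10, n_disks=4, stripe_unit=2))
print(sorted(disks_touched(0, 8, n_disks=8)))
```

With eight disks and a one-block stripe unit, a read of eight consecutive blocks touches every disk once, so the aggregate bandwidth grows with the number of disks in the pool.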

The user interface to the system will be through the *Advanced Parallel Program development Environment (APEX)*. APEX supports FORTRAN, C and OCCAM2, besides a host of utilities.

The hardware development of the 64-node cluster is complete. The entire system has been tested along with basic system software and is being used by the Applications Development Group for porting and parallelising end-user codes.

In order to bring parallel processing power to personal computers and standard workstations, it was decided to develop several add-in boards with a view to catalysing applications. Towards this objective, several PC add-in accelerators with a scalable memory interface have been designed and fabricated, and have undergone field-testing and test-marketing.

These boards operate on the same paradigm as Model 90, and users can easily migrate to larger machines for scale-up, speed-up and better response time. These board-level products are supported by software environments similar to that of the Model 90 system.

#### System Software

The System Software Development program is aimed at creating superior parallel programming environments and tools for C-DAC target machines.

At the very beginning, problems were faced in obtaining hardware for studying, learning and catalysing parallel processing applications. To circumvent this problem, development of a product called 'CODE' was launched. CODE, the C-DAC OCCAM Development Environment, is an interactive tool for migrating to parallel programming based on the CSP paradigm, with OCCAM as the language. CODE is a fully integrated environment with an editor, translator, interpreter and symbolic debugger, which runs on a PC without any extra hardware. CODE, comprising 40,000 lines of C code, has already been released for the domestic and international markets.

In order to provide multi-user support for PC ATs equipped with transputer hardware, the INMOS Transputer Development System (TDS), which runs under DOS, was ported to run under SCO XENIX. TDS under XENIX is an OCCAM 2 development system with a fully integrated editor that runs on a transputer add-in card. The work was further extended to provide this facility on PC UNIX, VAX VMS and SUN UNIX platforms. A graphics server was developed to extend the graphics facilities of the host to C-DAC's parallel processing platforms. Programming in Professional Graphics Language (PGL) and GKS-3D is supported. These products are currently in the field-testing and test-marketing phase and will be released for distribution very soon.

C-DAC has already made the INMOS software layer available on its parallel computer, with some additional extensions. However, some major limitations of this layer have been identified: support only for near-neighbour communication, which makes it difficult to use for certain applications; poor disk I/O support, with very low bandwidth and disk I/O possible only from the root processor; weak debugging support; and the absence of profiler support. Keeping these limitations in view, development of the Advanced Parallel programming Environments (APEX) was launched for the C-DAC target computer.
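The first limitation listed above matters because, with only near-neighbour communication, the application itself must relay a message between two distant nodes through every node in between. The sketch below finds such a relay path with a breadth-first search. The 8-node chain is a hypothetical topology chosen for illustration, and the code does not represent the INMOS software layer itself.

```python
# Illustrative sketch; the function name and topology are hypothetical.
from collections import deque

def route(links: dict, src: int, dst: int) -> list:
    """Shortest relay path from src to dst by breadth-first search over direct links."""
    previous = {src: None}  # node -> node it was reached from
    frontier = deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            break
        for neighbour in links[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                frontier.append(neighbour)
    if dst not in previous:
        raise ValueError(f"no path from {src} to {dst}")
    path, node = [], dst
    while node is not None:  # walk back from dst to src
        path.append(node)
        node = previous[node]
    return path[::-1]

# Hypothetical 8-node chain: each node is wired only to its immediate neighbours.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
path = route(chain, 0, 7)
print(path, "relays:", len(path) - 2)
```

On this chain, a message from node 0 to node 7 must be received and forwarded by six intermediate nodes. Without a routing layer, each application has to program this bookkeeping itself.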

APEX is a software environment for message-passing MIMD machines designed for use as back-end compute engines attached to workstation hosts. APEX runs partly on the back-end and partly on the host and, together with the host software, provides a seamless and easy-to-use parallel programming environment.

APEX supports all phases of application development. It includes an algorithm prototyping tool for performance prediction, compilers for ANSI C and ANSI FORTRAN, placement and configuration tools, a run-time performance profiler, advanced debuggers, and comprehensive run-time kernel library support for interprocess communication, the concurrent file system and several other utilities.

APEX is designed to be a portable environment, though it is currently being targeted at the transputer network being developed by C-DAC. APEX aims to provide a standard *Application Programming Interface (API)* environment across a class of parallel machines. Retargeting APEX to other machines will require some amount of recoding.

The Algorithm Prototyping Tool, with a powerful graphics front-end, allows rapid prototyping of an algorithm to study its efficiency, speedups and load-balancing performance. The parallel FORTRAN and C compilers of APEX meet the ANSI specification, with several parallel extensions in the form of libraries. Advanced debugging support is currently provided in the form of an *Interactive Debugger*. Trace-based debugging is also planned, since bugs related to non-determinism are better handled in this mode.
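The quantities the Algorithm Prototyping Tool is said to report can be made concrete. Speedup is the single-processor time divided by the p-processor time, and efficiency is speedup divided by p. To predict both before anything is run, the sketch below uses Amdahl's law as a stand-in model. The text does not say which model the tool uses, and the 99 % parallel fraction and 100-second run time are hypothetical.

```python
# Illustrative sketch; Amdahl's law is a swapped-in model and all values are hypothetical.

def speedup(t_serial: float, t_parallel: float) -> float:
    """S = T1 / Tp."""
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """E = S / p; 1.0 means perfectly linear speedup."""
    return speedup(t_serial, t_parallel) / processors

def amdahl_time(t_serial: float, parallel_fraction: float, processors: int) -> float:
    """Predicted time on `processors` nodes when only `parallel_fraction`
    of the work can be divided among them (Amdahl's law)."""
    return t_serial * ((1.0 - parallel_fraction) + parallel_fraction / processors)

T1 = 100.0  # hypothetical single-node run time in seconds
for p in (1, 4, 16, 64):
    tp = amdahl_time(T1, parallel_fraction=0.99, processors=p)
    print(f"p={p:3d}  time={tp:6.2f}s  S={speedup(T1, tp):5.2f}  E={efficiency(T1, tp, p):.2f}")
```

Under this model, even with 99 % of the work parallel, efficiency falls to about 0.61 on 64 nodes. That is the kind of result a prediction tool can show before a node count is chosen.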

The *Profiling Tool* will provide a graphical representation of processor load, processor utilisation and a communication event trace, and will help the user identify performance bottlenecks and fine-tune the program partitioning to avoid hotspots. The *Static Load Balancing Tool* is being designed to suggest an optimum placement of processes among processors, so as to distribute the load evenly and minimize communication delays, and also to suggest the interconnection topology among processors if this is not provided by the user. The Kernel library consists of several sub-libraries needed to support communication, local bus and host interfaces, file system I/O and graphics. The File System Interface provides access to the host file system in the conventional fashion. In addition, the *Concurrent File System* provides back-end, high-bandwidth secondary storage support with Unix-like file operations.
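A static load balancer of the kind described needs some placement heuristic. The sketch below uses the classic greedy "longest processing time first" rule: the heaviest remaining process goes to the least-loaded processor. This is a swapped-in illustration, not the algorithm of the C-DAC tool. It also ignores the communication-delay objective the text mentions. The process names and load estimates are hypothetical.

```python
# Illustrative sketch of greedy LPT placement; names and loads are hypothetical.
import heapq

def place_processes(loads: dict, processors: int) -> dict:
    """Greedy LPT placement: heaviest process first, onto the least-loaded processor."""
    heap = [(0.0, p) for p in range(processors)]  # (current load, processor id)
    heapq.heapify(heap)
    placement = {p: [] for p in range(processors)}
    for name, load in sorted(loads.items(), key=lambda item: item[1], reverse=True):
        total, proc = heapq.heappop(heap)  # least-loaded processor so far
        placement[proc].append(name)
        heapq.heappush(heap, (total + load, proc))
    return placement

def processor_loads(placement: dict, loads: dict) -> dict:
    """Total estimated load assigned to each processor."""
    return {p: sum(loads[name] for name in names) for p, names in placement.items()}

# Hypothetical load estimates (e.g. seconds of compute per iteration).
loads = {"a": 7, "b": 5, "c": 4, "d": 3, "e": 3, "f": 2}
placement = place_processes(loads, processors=3)
print(placement)
print(processor_loads(placement, loads))
```

Here the total load of 24 units ends up split 9/8/7 across three processors, close to the ideal of 8 each. A real tool would also weigh which processes communicate heavily and try to keep them on neighbouring nodes.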

As regards the implementation of APEX, the development of the switch manager and configurer utilities is complete, and the toolset has been ported to Model 90. The development of the other components of APEX is proceeding on schedule.

# III. Applications Development

As stated in the project mission, application development is the major driving force of the C-DAC project. It is also essential to create a large pool of user institutions and to assist them with the techniques of parallel programming. To this end, interactions with the prime user agencies were vigorously pursued from day one. In collaboration with the user agencies, compute-intensive segments of applications were re-coded and test-run on C-DAC's PC-based parallel processing platforms, parallel processing workstations and 64-node parallel computer. Substantial speed-ups were achieved, which clearly demonstrated the feasibility of the target machine for a range of science and engineering problems.

Major accomplishments on development of application-specific software in various areas are as follows:

#### Image Processing / Remote Sensing

Image processing is one of the foremost applications and has held the Centre's attention right from the beginning, not only because of its growing applications and markets but also because of immediate user requirements, namely those of the Space Application Centre (SAC), National Remote Sensing Agency (NRSA), Survey of India (SOI), National Crime Records Bureau (NCRB), Vikram Sarabhai Space Centre (VSSC) and the industry.

Parallel processing is vital for handling larger data volumes, analysing complex images, supporting compute-intensive algorithms and processing in real time. To meet general-purpose requirements, a Parallel Image Processing System (PIPS) is being developed as an intermediate product. Towards this objective, several algorithms covering image enhancement, analysis, compression and restoration have already been parallelised and implemented on a 32-node machine. In most cases near-linear speedups have been obtained.
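Strip (geometric) decomposition is one common way such image operations are spread across nodes. The sketch below is not PIPS code; it assumes a simple linear contrast stretch on an image stored as nested lists, and the helper names are hypothetical. It shows why near-linear speedup is plausible: each node works on its own horizontal strip, and the only global step is combining the per-strip minima and maxima.

```python
from typing import List

Image = List[List[int]]


def split_rows(img: Image, n_nodes: int) -> List[Image]:
    """Divide the image into n_nodes horizontal strips of near-equal height."""
    h = len(img)
    bounds = [h * i // n_nodes for i in range(n_nodes + 1)]
    return [img[bounds[i]:bounds[i + 1]] for i in range(n_nodes)]


def contrast_stretch_parallel(img: Image, n_nodes: int, out_max: int = 255) -> Image:
    """Linear contrast stretch computed strip by strip, as if on n_nodes nodes."""
    strips = split_rows(img, n_nodes)
    # Phase 1: each node reduces its own strip to a local (min, max).
    local = [(min(min(r) for r in s), max(max(r) for r in s)) for s in strips if s]
    # Combine the partial results (e.g. on the host): the only global step.
    lo = min(m for m, _ in local)
    hi = max(m for _, m in local)
    scale = out_max / (hi - lo) if hi > lo else 0.0
    # Phase 2: each node rescales its strip independently, with no further communication.
    out: Image = []
    for s in strips:
        out.extend([[round((v - lo) * scale) for v in row] for row in s])
    return out
```

The result does not depend on the number of nodes, since only the work distribution changes.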

Anticipating the huge computational requirement for processing remotely sensed data from the European ERS-1, the Indian airborne SAR and, in the near future, IRS-1C/D at ISRO/DOS, a joint project was launched in June '89 to develop application software, with the objective of replacing the attached array processors with C-DAC's parallel computer. Efforts are under way to port the current version of ISROVision to C-DAC's 4-node parallel processing platform, which will give 2.5 times the performance of a VAX 780. Recently a major project has been launched for image processing based on Synthetic Aperture Radar (SAR) data acquisition.

#### **Computational Fluid Dynamics**

Modelling and simulation in CFD, including turbulence, separation, vortex dynamics and hypersonic flows, is considered critical for future space missions and for the design of new aircraft. In this context it will significantly aid the PSLV and re-entry vehicle development programmes. C-DAC has taken up CFD as a large-scale application for its parallel computer. As a first step, the unsteady, compressible Navier-Stokes equations were solved using an explicit finite difference method. For a mesh of 2000 points, a transonic flow past a blunt-nosed cone was computed on C-DAC's parallel computer using 15 nodes, and a near-linear speedup was obtained; the 15-node system gave an order-of-magnitude performance improvement over the Cyber 170/730 mainframe. Grid generation software was developed alongside for this problem. Solution of a large-size problem on the 64-node parallel computer is under way.
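Explicit schemes parallelise well because each new value depends only on its neighbours at the previous time step, so a node owning a block of the mesh needs just one layer of boundary ("halo") values from each neighbour per step. The sketch below illustrates this with a 1-D diffusion equation as a stand-in for the Navier-Stokes solver. The equation, the forward-time centred-space scheme and the function names are assumptions, and the decomposition is simulated sequentially.

```python
from typing import List


def step_global(u: List[float], r: float) -> List[float]:
    """One explicit (FTCS) step of u_t = u_xx on the whole grid; r = dt / dx**2.

    The two end values are fixed (Dirichlet boundaries).
    """
    interior = [u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1]) for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def step_decomposed(u: List[float], r: float, n_nodes: int) -> List[float]:
    """The same step, computed block by block as if on n_nodes separate nodes."""
    n = len(u)
    bounds = [n * k // n_nodes for k in range(n_nodes + 1)]
    new: List[float] = []
    for k in range(n_nodes):
        lo, hi = bounds[k], bounds[k + 1]
        local = u[lo:hi]                              # values owned by node k
        # Halo exchange: one ghost value received from each neighbouring node.
        left_halo = u[lo - 1] if lo > 0 else None
        right_halo = u[hi] if hi < n else None
        for j, v in enumerate(local):
            i = lo + j                                # global index
            if i == 0 or i == n - 1:
                new.append(v)                         # fixed boundary value
                continue
            west = local[j - 1] if j > 0 else left_halo
            east = local[j + 1] if j < len(local) - 1 else right_halo
            new.append(v + r * (west - 2 * v + east))
    return new
```

Because each update reads the same values in the same order, the decomposed step reproduces the global step exactly; the only extra cost on a real machine is the halo exchange.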

#### Finite Element Methods

One of the large-scale problems undertaken is the parallelisation of FEAST, a general-purpose FEM package developed by VSSC. The package, which currently runs on Cyber 170/730 and VAX computers, has a 16-element library and 10 analysis capabilities, including static and dynamic stability, optimisation, visco-elastic, dynamic response and fracture analysis. FEAST is currently being parallelised and ported to C-DAC's parallel computer.

C-DAC has already ported SAP-4 to its parallel platform. Efforts are also under way to develop good pre- and post-processors for effective utilisation of computed data. To augment these capabilities, collaboration has begun with IIT Bombay and IIT Delhi on parallel FEM solutions.

#### **Oil Reservoir Modelling**

C-DAC and the Institute of Reservoir Studies (IRS) have started working together on parallelising Black Oil Reservoir Simulator (BOS) packages. The first objective of this joint project is to demonstrate a tenfold speedup in the overall performance of the BOS packages on the C-DAC machine compared with a VAX 11/780. While achieving this, the overall structure and maintainability of the package will be retained, and scalability for further speedup on the 64-node machine will be built in. Algorithmic leverage has been employed to obtain the desired speedups on the parallel machine. The parallelised ORM package will be used by the Oil and Natural Gas Commission (ONGC) for Enhanced Oil Recovery (EOR) from available fields.

#### Seismic Data Processing

Advanced processing of seismic data is essential for harnessing India's hydrocarbon resources. C-DAC's parallel computer could provide the computational power and data management capabilities required for seismic data processing (SDP) involving full 3D pre-stack migration techniques, yielding high-quality 3D images of the target zone in realistic processing times. A detailed feasibility report on parallelising and porting a seismic software system has been submitted for consideration to the Geodata Processing and Interpretation Centre (GEOPIC) of ONGC. C-DAC's three-phase proposal covers development of Basic, Extended and Advanced Seismic Software Systems.

#### Molecular Modelling

Some important molecular modelling packages have been identified for parallelising and porting. Work has already started, in collaboration with the Biotechnology Department of the University of Poona, on parallelising and porting AMBER, a package for modelling and simulating macromolecular systems that is typically applied to geometry refinement using molecular mechanics.

#### Signal Processing

C-DAC has undertaken, in association with IIT Delhi, development of a general-purpose, interactive signal processing package on its parallel computer, performing a wide variety of DSP functions including signal generation and simulation, filter design, spectral estimation and array processing. Some spectral estimation algorithms have already been parallelised and run on C-DAC's parallel processing platforms. A general framework with a suitable user interface has been developed for building comprehensive parallelised signal processing code.

#### **Circuit Simulation**

C-DAC has taken up SPICE for porting to its parallel machine. A single-node implementation has already been done, and efforts are under way on a multi-transputer implementation. Further enhancements to SPICE, such as incorporating more device models, are under consideration. Implementing SPICE on C-DAC's parallel computer will pave the way for porting VLSI design packages such as IDEAS of Semiconductor Complex Limited (SCL), VINYAS of Indian Telephone Industries (ITI) and other industry-standard packages.

#### Speech Recognition

C-DAC's project in this area is to develop a parallel implementation of the auditory model developed by the Tata Institute of Fundamental Research (TIFR), so that it can run on C-DAC's parallel machine with a significant computational speed-up over the current TIFR implementation on a VAX. The auditory model was parallelised using algorithmic parallelisation, and test runs on a four-node parallel platform have shown near-linear speed-ups. Future plans include providing the parallel computer as a platform for the neural network simulation for speech processing being done at TIFR under the Knowledge Based Computer Systems (KBCS) project. Neural network simulation models have already been developed for KBCS projects at C-DAC.

#### **Computational Physics**

C-DAC's current projects in computational physics at the Department of Physics of Poona University relate to two specific problems in condensed matter physics, namely the electronic structure of solids and molecular dynamics. A linear combination of Gaussian orbitals method has been used to calculate the electronic structure; the existing code, comprising 15,000 lines of FORTRAN source, was parallelised and run on a 4-node configuration. Future plans are to develop a pseudo-potential method with a Gaussian basis in order to handle many atoms per unit cell, and to incorporate the molecular dynamics approach to simulate the evolution of structure in solids. A molecular dynamics algorithm has been implemented on 1, 4 and 32 nodes using different topologies, of which the ring topology was found to be the most convenient. Typical results are very encouraging.
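The text does not say how the molecular dynamics algorithm was mapped onto the ring. One standard ring scheme for all-pairs interactions, sketched below as an assumption, has each node keep its own block of particles while a travelling copy of every block is passed one step around the ring per round; after p rounds each node has seen every block. The pair force is a toy linear spring, not a physical potential, the positions are 1-D for brevity, and the ring is simulated sequentially.

```python
from typing import List, Tuple


def pair_force(xi: float, xj: float) -> float:
    """Toy pair force on particle i from particle j (a linear spring), for illustration only."""
    return xj - xi


def forces_direct(x: List[float]) -> List[float]:
    """Reference O(n^2) computation on a single processor."""
    return [sum(pair_force(xi, xj) for j, xj in enumerate(x) if j != i)
            for i, xi in enumerate(x)]


def forces_ring(x: List[float], p: int) -> List[float]:
    """All-pairs forces on a simulated ring of p nodes."""
    n = len(x)
    bounds = [n * k // p for k in range(p + 1)]
    home = [x[bounds[k]:bounds[k + 1]] for k in range(p)]       # block owned by node k
    acc = [[0.0] * len(block) for block in home]               # forces accumulated on node k
    # Each node starts with a travelling copy of its own block, tagged with its origin node.
    visiting: List[Tuple[int, List[float]]] = [(k, list(home[k])) for k in range(p)]
    for _ in range(p):
        for k in range(p):
            origin, block = visiting[k]
            for a, xa in enumerate(home[k]):
                for b, xb in enumerate(block):
                    if origin == k and a == b:
                        continue                                # skip self-interaction
                    acc[k][a] += pair_force(xa, xb)
        # Ring shift: node k receives the block previously held by node k - 1.
        visiting = [visiting[(k - 1) % p] for k in range(p)]
    return [f for block in acc for f in block]
```

Each round needs only nearest-neighbour communication on the ring, which is one plausible reason a ring is convenient for this kind of computation.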

#### Materials Science

The scope of the project in materials science, supported by C-DAC at the Department of Physics of Poona University, covers crystal growth from the vapour phase. III-V semiconductors, the leading material system, are being investigated using relaxation Monte Carlo and molecular dynamics methods. In the first phase, existing FORTRAN codes for calculating energetics and atomic displacements were converted into sequential OCCAM code. In the second phase, the sequential codes were parallelised to run on a 32-node machine, yielding a significant speed-up. The molecular dynamics code has also been parallelised using algorithmic parallelisation, and geometric parallelisation is being tried to improve efficiency.

#### **Computational Chemistry**

A noteworthy contribution has been made in the field of computational chemistry in C-DAC's project at the Department of Chemistry of Poona University, made possible by the use of rigorous bounds on the electron repulsion and molecular electrostatic potential (MEP) integrals in Gaussian-based molecular calculations. MEPs have been widely used in chemistry for the study of toxic and explosive materials as well as of drug-receptor interactions. A parallel MEP code in OCCAM and the corresponding visualisation have been developed through ingenious use of these bounds. Partial parallelisation of MICROMOL has been implemented, and a project on developing a one- and two-electron integral package (IND-MOL) has made significant progress.

#### **Computational Mathematics**

Very little software is currently available internationally for the parallel solution of problems in computational mathematics, particularly those involving differential equations. ODE 2 PAR, developed by C-DAC, is a user-friendly package, with visualisation software, for the parallel solution of systems of ordinary differential equations. It is capable of solving a wide range of differential equations over a range of local tolerances.

This package has been tested on satellite transient thermal analysis problems for the thermal design of INSAT 2. With ODE 2 PAR running on a 4-node PC add-in accelerator, a transient analysis over one hour at a tolerance of 1.0E-6 can be done in about one-sixth of the time taken by the UNIVAC 1100/70 mainframe. The same package can be used for a variety of mathematical problems involving ordinary differential equations.

C-DAC has parallelised multigrid methods for problems involving partial differential equations, using a combination of domain decomposition and parallelised smoothing, together with simulation software for fine-grained parallelisation. Work is currently under way to fit a multigrid solver to FEM software. Using these methods, parallel solvers will be developed for elliptic and hyperbolic partial differential equations.
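The text does not state how the smoothing step was parallelised. One standard approach, sketched here as an assumption, is red-black Gauss-Seidel ordering: points of one colour depend only on points of the other colour, so all updates of one colour can proceed concurrently. The example smooths the 1-D model problem -u'' = f with fixed end values; `red_black_sweep` is a hypothetical name.

```python
from typing import List


def red_black_sweep(u: List[float], f: List[float], h: float) -> None:
    """One red-black Gauss-Seidel sweep for -u'' = f on a uniform grid of spacing h.

    u is updated in place; u[0] and u[-1] are fixed (Dirichlet) boundary values.
    The update u[i] = (u[i-1] + u[i+1] + h^2 f[i]) / 2 comes from the centred
    difference -(u[i-1] - 2 u[i] + u[i+1]) / h^2 = f[i].
    """
    n = len(u)
    for start in (1, 2):             # odd-indexed ("red") points, then even ("black")
        # Every point of one colour depends only on points of the other colour,
        # so the updates in this loop are independent and could run concurrently.
        for i in range(start, n - 1, 2):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
```

In a multigrid cycle only a few such sweeps are applied per level to damp high-frequency error; iterating to full convergence, as in a standalone solver, is shown here only to check the update.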

#### Scientific Visualisation

C-DAC's parallel computer will be supported with scientific visualisation software for effective analysis of computed data. A layered visualisation system is being designed to support database/data-source management, geometric modelling, surface rendering and display. A scheme for storage and retrieval of geometric and image databases is under development. An image synthesis system is being developed to render surfaces and to combine different images into a single image. The display layer involves development of both hardware and software. A library of rasterisation algorithms in OCCAM has been developed. Parallelisation of the viewing pipeline, and its implementation as a geometric engine using transputers, is near completion. A single-transputer version of GKS-3D is complete and available for use. Scientific visualisation software configured around a 386 PC is under development.

# IV. Other Projects in Parallel Computing

Complementing C-DAC's project, there are other projects in parallel computing which began more or less concurrently with it. Notable among them are the Parallel Processing System (PPS) by the Centre for Development of Telematics (C-DOT), Bangalore; Flosolver by the National Aeronautical Laboratory (NAL), Bangalore; PACE by the Advanced Numerical Research and Analysis Group (ANURAG), Hyderabad; MULTIMICRO by the Indian Institute of Science (IISc), Bangalore; the Array Processor by CMC Ltd.; MACH and the Sparse Matrix Computation Architecture at the Indian Institute of Technology (IIT), Bombay; and the Static Dataflow Multiprocessor, Broadcast Cube Multiprocessor and Dataflow Multiprocessor under the KBCS projects at IISc, Bangalore. Several leading academic institutions have now introduced education and research programmes in parallel processing architectures, programming environments and algorithms, and some leading computer companies have also announced shared memory multiprocessors. Many of the above-mentioned projects are funded by C-DAC. C-DAC has also farmed out a number of system software projects to several computer and software companies. Some notable developments are as follows:

#### C-DOT

In February 1988, a commercial contract was signed between DST and C-DOT, under which C-DOT will design and build a 640 MFLOPS, 1000 MIPS (peak) parallel computer. C-DOT set a target of 200 MFLOPS sustained performance for weather forecasting and image processing for radio astronomy applications.

The C-DOT Parallel Processing System (PPS) is based on a novel SAMD (Single Algorithm Multiple Data) architecture. Up to 256 Processing Elements and 256 banks of Multi-Dimensional Access Memory (MDAM), the global memory, are connected through a 512 x 512 logical crossbar, time-space-time (TST) switch. Each processing element has its own local memory, and no computation is performed on the MDA memory directly. There are two copies of the MDA memory: while data for computation is read from one copy, the results are written into the other. Each MDAM copy holds 16 megawords (64-bit). The switch is implemented as a non-blocking 256 x 256 switch, but it can also operate as a 512 x 512 switch with 'essentially no blocking' for linear algorithms that have a regular structure. The switch provides PE-MDAM bank connections as well as PE-PE connections. The switch and the MDAM are controlled by a UNIX-based Main Controller, which is the overall master of the system.
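The two-copy MDAM scheme is a form of double buffering (ping-pong buffering). The minimal model below is a hypothetical illustration, not C-DOT's software: every pass reads only from one copy and writes only to the other, then the roles swap, so no PE can observe a partially updated value. The class, method and kernel names are invented for the sketch.

```python
from typing import Callable, List

Kernel = Callable[[List[float], int], float]


class DoubleBufferedMemory:
    """Hypothetical model of two memory copies whose read/write roles alternate."""

    def __init__(self, data: List[float]) -> None:
        self.copies = [list(data), [0.0] * len(data)]
        self.read_idx = 0                         # copy currently holding valid input

    def run_pass(self, kernel: Kernel) -> None:
        src = self.copies[self.read_idx]
        dst = self.copies[1 - self.read_idx]
        for i in range(len(src)):                 # each i could be handled by a different PE
            dst[i] = kernel(src, i)               # reads only src, writes only dst
        self.read_idx = 1 - self.read_idx         # this pass's output is the next pass's input

    @property
    def current(self) -> List[float]:
        return self.copies[self.read_idx]


def neighbour_sum(src: List[float], i: int) -> float:
    """Example kernel: sum of the two cyclic neighbours of element i."""
    return src[i - 1] + src[(i + 1) % len(src)]
```

Starting from `[0, 0, 4, 0, 0]`, one pass of `neighbour_sum` gives `[0, 4, 0, 4, 0]`. Updating a single array in place with the same kernel would let later elements see already-updated neighbours and give a different result.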

The software is split into three levels: system software, system library routines and application software. The system library routines are modules called by user programmes to configure the switch and MDAM to suit the algorithm being executed; they are used as a preamble and postamble to the actual number-crunching algorithm. The application software itself is split into a number of modules, e.g. FFT, matrix inversion and correlation, which are then called by a main programme.

A 16-node, 25 MFLOPS system was demonstrated in August 1989, with processing elements built around a T800 transputer, a 68010 and a communication controller. Integration of a 128-node system is in advanced stages of completion. Application development is going on in collaboration with IIT Delhi and GMRT/TIFR.

#### NAL

In February 1986, work began at NAL on project Flosolver to design, develop, fabricate and use a suitable parallel processing computer for computational fluid dynamics and aerodynamics problems. NAL is collaborating with WIPRO, a leading computer company, on the hardware development of Flosolver.

Following the successful development of the 4-processor prototype MK1 in October '87, the MK-2, based on 80386 processors, was released by July 1988. The MK-2 delivers around 1.3 MFLOPS, roughly four times the performance of the Univac 1170 mainframe at NAL. MK-2 supports four sets of memory modules connected by a standard IEEE P796 bus; the CPU modules employ the Intel 80386 processor and the 80387 numeric data processor. The system comprises two nodes of four processors each. In each node one of the processors acts as the host, running Intel's iRMX real-time operating system. Inter-processor communication and synchronisation within a node are handled through the shared memory on the system bus, while inter-node communication uses the parallel ports on Intel memory cards.

Many CFD codes of practical importance have been made operational on these machines. Solution of the 2-D compressible Euler and Reynolds-averaged Navier-Stokes equations for practical aerospace configurations has been successfully attempted. CFD problems solved on the Flosolver include Laplace equations, transonic small-perturbation equations, Navier-Stokes equations (2-D), Euler equations (2-D), a Monsoon Model (2-D) and panel codes (3-D). The overall efficiency of a single-node Flosolver in running these codes is in the range of 96% to 98%.
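Assuming "efficiency" has its usual meaning of speedup per processor, with $T_1$ the single-processor run time and $T_p$ the run time on $p$ processors, and that a single node here is the four-processor node described above, the reported range corresponds to the following speedups:

```latex
S_p = \frac{T_1}{T_p}, \qquad
E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}
\quad\Longrightarrow\quad
S_4 = 4E_4 \in [\,4 \times 0.96,\; 4 \times 0.98\,] = [\,3.84,\; 3.92\,]
```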

At present three activities are going on at NAL: first, development of the MK-X; second, applications development on a 16-node transputer machine; and third, simulation of architectures on the Univac mainframe. Under the MK-X version, a 16-processor prototype is ready, delivering about 5 to 8 MFLOPS at 95% efficiency on a range of CFD problems. Solution of the 3-D compressible Euler equations for flows past realistic aircraft configurations, with millions of grid points, is being attempted using the domain decomposition technique, with a view to keeping execution time under an hour. As a long-term goal it is proposed to build a 64-processor system based on 16 MK-2 nodes.

#### ANURAG

The Advanced Numerical Research and Analysis Group (ANURAG), located at Hyderabad, is a unit created within the DRDO to handle various computer-oriented projects; it is executing the PACE parallel computer project in collaboration with the Electronics Corporation of India Ltd. (ECIL). One of the computationally intensive problems PACE is aimed at is Computational Fluid Dynamics (CFD). The three-year PACE project began in August 1988.

PACE is based on the hypercube architecture. At the end of the three-year period, project PACE plans to produce a prototype 128 processing element (PE) system, a 7-cube in binary cube architecture terminology. A four-node system, the Pilot Test Vehicle (PTV), which is to be put to stringent tests, was to be developed by the first half of 1989; it will then be replicated to build the full 128-node parallel computer.
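In a d-dimensional binary hypercube each node label is a d-bit number and two nodes are linked when their labels differ in exactly one bit, so a 7-cube has 2^7 = 128 nodes, each with 7 links. The sketch below shows neighbour computation and dimension-order (e-cube) routing, a standard hypercube routing scheme; whether PACE routes messages this way is not stated in the text, and the function names are hypothetical.

```python
from typing import List

DIM = 7                      # a 7-cube, as in the planned 128-PE PACE system
N_NODES = 1 << DIM           # 2**7 = 128 processing elements


def neighbours(node: int, dim: int = DIM) -> List[int]:
    """Nodes directly linked to `node`: labels differing in exactly one bit."""
    return [node ^ (1 << d) for d in range(dim)]


def ecube_route(src: int, dst: int, dim: int = DIM) -> List[int]:
    """Dimension-order (e-cube) route: correct differing bits from lowest to highest."""
    path = [src]
    cur = src
    for d in range(dim):
        if (cur ^ dst) & (1 << d):       # labels still differ in dimension d
            cur ^= 1 << d                # hop across the dimension-d link
            path.append(cur)
    return path
```

The number of hops equals the number of differing bits, so no route in a 7-cube is longer than 7 hops.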

Initially, 68020-based single-board computers being developed by ECIL were to be used. The coprocessor for the 4-node PTV will be based on the WEITEK 1164/65 chips, but the final 128-node system is to use an indigenous coprocessor chip set. The logic simulator for the chip design has already been worked out.

#### IISc

A four-year project sponsored by the Department of Electronics, Government of India, was initiated in March 1986 in the Department of Electrical Engineering of IISc to develop a parallel processor, together with system and application software, for high-speed power system computations.

A seven-processor multiprocessor called MultiMicro, with 2 MBytes of shared memory and specialized interfaces for targeting parallel programmes from a host processor, has been developed. Each processing element is built around the 80286/80287 processor pair, and the processors communicate via a high-bandwidth parallel bus.

User programmes can be written in FORTRAN 77. Special system calls provided in the indigenously developed parallel operating system permit the user to write parallel FORTRAN programmes or translate existing sequential programmes very conveniently for execution on the MultiMicro. Tests have shown that for matrix computations (the most crucial of computations in power system analysis, control systems, optimization, etc.) the MultiMicro provides near linear speedups.
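As a rough illustration of how a matrix computation can be shared out on a shared-memory machine like the MultiMicro, the sketch below partitions the rows of a matrix-vector product among seven workers, each writing only its own slice of a shared result vector. It uses Python threads purely to show the partitioning (CPython threads do not execute Python code in parallel), and it does not reflect the MultiMicro's actual system calls, which the text does not describe; `matvec_shared` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def matvec_shared(a: List[List[float]], x: List[float], n_procs: int = 7) -> List[float]:
    """y = A x with the rows of A split into n_procs contiguous blocks."""
    n = len(a)
    y = [0.0] * n                                   # result vector in "shared memory"
    bounds = [n * k // n_procs for k in range(n_procs + 1)]

    def worker(k: int) -> None:
        # Each worker writes only its own block of rows, so no locking is needed.
        for i in range(bounds[k], bounds[k + 1]):
            y[i] = sum(aij * xj for aij, xj in zip(a[i], x))

    with ThreadPoolExecutor(max_workers=n_procs) as pool:
        list(pool.map(worker, range(n_procs)))      # wait for all workers; re-raise errors
    return y
```

Because the row blocks are disjoint and the input is only read, the workers never contend for the same memory location, which is what makes near-linear speedup possible on such a machine.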

A number of new parallel algorithms have been developed for various power system studies like load flow, fault analysis, transient analysis, and stability. The system is also being used for weather modelling and crystallography.

A tenfold speed enhancement, and support for a number of higher-level languages, is under way.

Under the UNDP-assisted KBCS Project of the Department of Electronics, three different architectures based on 80386 boards have been designed at IISc: an 8-processor coarse-grain static dataflow multiprocessor, an 8-processor 3-D broadcast cube multiprocessor and a 16-processor tree architecture system. All of these architectures are based on the message passing paradigm. Each system consists of two sub-systems, namely a processor sub-system and an interconnection sub-system. High-level programming language environments have been provided on these machines for software development.

#### IIT Bombay

The design and development of a multiprocessor operating system has been under way at the Department of Computer Science, IIT Bombay, since May 1988. The project is being sponsored by several industries and research organisations, including C-DAC. The project envisages development of a composite UNIX + MACH (Multiple Asynchronously Communicating Hosts) operating system.

Research on a parallel architecture optimized for sparse matrix computation is being done in collaboration with AT&T Bell Laboratories. The architecture uses a novel interconnection scheme based on finite geometry to connect partitioned shared memory to processors, together with a compiler that seeks to reduce communication cost rather than arithmetic cost by rearranging the data flow graph representing the computation. The proposed machine is meant to be used as a co-processor to a general-purpose machine or host processor. Theoretical results on the proposed architecture are encouraging, and a preliminary compiler has already been developed to evaluate the efficiency of the proposed concept.

#### **Computer Industries**

Parallel processing is taking firm root in the fast-growing Indian computer industry. Some leading computer companies have already announced board-level and workstation products based on parallel processing technology. A notable development is from CMC, which has completed the design of a 24 MFLOPS add-in array processor along with a vectorising Fortran compiler. CMC has already taken up production of its array processor and has embarked on new enhancements. A major application being pursued is fingerprint identification for India's Crime Records Bureau.

In 1988, HCL announced its shared memory multiprocessor MAGNUM, which supports up to six 68030 CPUs on a VME and memory bus. With an extended AT&T UNIX operating system and a Fortran compiler, the machine has been released for national and international markets. A similar shared memory system, Landmark II, based on multiple 80386 processors and Multibus II, was announced by WIPRO in 1989. WIPRO is also the recipient of the MultiMicro know-how from IISc and is assisting NAL with the Flosolver hardware development. Several companies are working on i860-based multiprocessors, and Godrej has recently announced an i860-based high-performance workstation. TEC R&D has developed several application-specific multiprocessor embedded systems, mainly for industrial and defence simulators, and has initiated joint development of a real-time i860-based parallel processing system with a US company. TEC R&D, ESSEN and WIPRO have also brought out transputer-based board-level products.

C-DAC will soon be transferring its technology for board-level parallel processing products, parallel processing workstations and the 64-node MODEL 90 machine to the Indian computer industry, which will herald an era of parallel computing in India.

# V. Future

Based on the progress realised so far, C-DAC has prepared an Agenda for a Second Mission, for implementation during the 8th Five Year Plan period (1990-95). The Agenda for the Second Mission includes development and delivery of a 20 Gigaflops distributed-memory, message-passing, replicated scalar processor supercomputer, aimed not only at a large class of scientific and engineering applications but also at real-time, database, OLTP and symbolic processing applications. Also proposed is the development of a distributed-memory, message-passing, replicated vector processor machine with 20 Gigaflops peak performance for scientific and engineering applications. A high-performance shared-memory multiprocessor machine is to be developed in collaboration with the computer industry.

These machines will be provided with an advanced parallel programming environment, APEX II. It will be a UNIX-based environment supporting multi-user program development in a familiar workstation setting, while providing a set of state-of-the-art tools for developing and debugging concurrent applications. In addition to the Node Operating System (NOS), the software will include the popular languages FORTRAN and C with vectorising and parallelising compiler support, logic programming languages such as Prolog, functional languages such as LISP and MIRANDA, data-flow languages such as STRAND and object-oriented languages such as C++, as well as concurrent debuggers, performance monitoring tools, simulators, scientific libraries and backend graphics.

In the second mission, the applications development programme will be further amplified and vigorously pursued on the second-generation parallel computers. Extracting sustained gigaflops performance in realistic applications will be a major challenge. Parallelising and porting industry-standard application software used worldwide will be taken up as a major activity. Simultaneously, scientific visualisation will be launched as a new initiative.

Advanced research programme will include building proof-of-concept systems of VLSI array processors, artificial neural networks and optical computing. Advanced research will be spawned in parallel computing paradigms, distributed operating systems, advanced compilers and program restructurers, applicative languages and parallel algorithms.

It is proposed to develop a multimedia DVI workstation and the ASICs required for the hardware sub-systems of the parallel computers. An upgrade of the general technological infrastructure for the computer industry is also planned.

Simultaneously, a national programme called the Advanced Computing Education and Research Initiative (AERI) has been prepared for implementation during the 8th Five Year Plan. The prime objective of AERI is to bring about a major transformation in advanced education and research by installing 15 Gigaflops of computing power in select academic institutions over the next four years, compared with about 100 MFLOPS at present.