CIS 4307: First Lecture: Almost for Fun ..

One thing certain about your future is the extent of change you will see.

That is so because we live a long time! It will be surprising if you will work less than 50 years (check about life expectancy and Social Security costs). Examining the past, ask your parents (grand parents) about the changes that have taken place since they were young.
That is so because in the future change will be more rapid than in the past (measured by worldwide GDP, patents given, products brought to market, etc. - Ray Kurzweil believes that the rate of progress doubles every 10 years!).

The extent of change requires that we learn fast, be able to act and react rapidly. Frequently we need to put aside much of what we know and move to new domains, carrying over into the new domain only problem solving and knowledge structuring skills. Yet when we know something we need to know it expertly.

In the particular case of information technology (IT), how does change express itself? Let's start with a famous "experimental" law, Moore's Law stating that the density of transistors in semiconductors doubles every 18 months. In another form it says that computing power doubles every 18 months [or more probably every two years] (hence increases by a factor of 10 in 5 years). This law concerning computing power is complemented by experimental laws reflecting progress in communications technologies and storage:

Transmission capacity of fiber doubles each year. Also total internet traffic doubles (at least) each year: Gilder's Law (Metcalfe suggested it - though Metcalfe's Law is something else "A network's value grows proportionally to the square of the number of its users").
Density of magnetic disk storage doubles about each year. Also total magnetic storage capacity increases by a factor of 10 each 4 years ( Shugart's Law - imagine what this means: if today I have a 100GB drive, in 4 years I will have 1TB, and in 12 years I will be able to store all the NetFlix movies on my drive!).

As repeatedly remarked these laws are experimental, made possible (for a few more years) by the physical properties of materials and the technical, manufacturing, and economic efforts of humans. After about 2015 new science and technology will be required to maintain the current rates of progress in the physical infrastructure of information technology. [Actually Moore's Law is already being limited: chip speed is reaching a limit, 4..6 Ghz, due to power consumption and heat. Intel and AMD are moving to multi-core chips, whose use requires reprogramming for concurrency, and even that is not likely to be effective for too long, if the memory speed bottleneck continues.] Interesting developments are with the increased use of Graphic Processing Units in computations. An example is the CUDA, a general purpose parallel computing architecture developed by NVIDIA, which represents a proof that the effective use of multiprocessing systems is a current central problem.

These growth laws on computing, storage, and communications are reflected by price: it is estimated that the cost of processing power halves each year. Thus IT is becoming a commodity. [But the cost of semiconductor plants, the tools for making chips, is going up in accordance with Rock's Law: "The cost of semiconductor tools doubles every 4 years"] For a few years more the limiting factors on IT are less in its technological foundations and more in our ability to conceive new uses for it, our ability to program and maintain its infrastructure, our economic ability and political willingness to share its fruits. [Why am I stressing ideas on the evolution of technology? because the decisions we make often have consequences for years to come, thus it is helpful to have good guesses about future conditions - (I should have bought Microsoft, CISCO, .. stock in the '80s). And yes, the laws I have mentioned are experimental, uncertain - yet they improve our capability to understand our environment and better predict future evolutions.]

As the IT infrastructure improves rapidly, so does our ability to build applications to use it, though at a slower rate. The programming tools that are now available enable us to implement standard components and applications very rapidly. Protocols have been agreed for describing components, for supporting their life-cycle, for accessing and composing their behavior. We are getting better at assembling systems from "small" components, such Java classes, and from "large" components, such as XML processors, DBMSs, web servers. We know how to extend the functionality of running systems (DLLs, plugins). Frameworks like .NET and J2EE also facilitate reliability and security when we build and integrate components. Whole problem domains in the business and medical field are carefully analysed into components for some of which are provided generic solutions. Access to information has become very easy: we can pick up needed information by exploring the web, querying news groups and chat rooms.

As the IT infrastructure evolves at the rapid rate we have described, the complexity of the systems that use it has increased apace. Think of ATMs that safely give us money whether we are in our hometown, or in China, or in Spain. Think of Google that using hundreds of thousand of computers intelligently indexes the web, integrating implicit feedback from millions of users. Or think of Amazon, where customers are integrated as critics and evaluators of the merchandises being sold. Or think of CERN's Large Hadron Collider that processes 40 million events per second to recognise among them 100 significant events per second and record 1 megabyte per significant event, for a total of 10 petabytes per year - it will use a computing grid involving about 500 institutions. Google uses an infrastructure that uses over one million computers.
New technologies like Virtualisation (it is old but has become of general use only recently), and what goes by the name of Cloud Computing, or "Computing as a Utility", are making IT more easy to use and manage and often more secure.

A consequence of the improvement in software development tools, computing infrastructure, and information support mechanisms, is that a lot of development work has become easier and/or decomposable into parts that can be developed separately. Because the hardware is ever more powerful, often solutions need not be optimised. The expertise required for common development has been reduced. And pay has been reduced in correspondence. As in the case of mechanical/electrical manufacturing that even while becoming more productive it has moved abroad, also in the IT world outsourcing, often offshoring, facilitated by the IT infrastructure, has become extensive, especially for low level tasks.

At present enterprises incur substantial costs maintaining and upgrading their IT infrastructure. In the future, as much of IT becomes standardised and commoditised, it is likely that the number of positions in maintenance and service will be reduced, outsourced, paid less. Thus a very common current job destination for IT graduates is likely to be reduced or made less economically rewarding.

This description of the evolution of the IT market seems very pessimistic for IT professionals. Yet I firmly believe that the future of our profession remains promising. Certainly the demand and the rewards for technical wizards who can work at the technological frontier of our hardware/software world are large and increasing. The same can be said for people who can build sophisticated or innovative applications. Also, since computing is moving from mainframes, to personal computers, and now to cellphones, personal digital assistants, embedded systems, hardware constraints continue to be relevant, and with it the need for people with systems/programming cleverness. The arrival of ubiquitous computing, the secure adoption of RFID-like technology, ubiquity of networked sensors and effectors, the assimilation of multimedia in the digital world, the recognition and generation of speech, will open up, like the Internet did - and even more, a large number of potential applications. But the largest market is for people who will be able to integrate expertise in IT with good and agile understanding of problem domains and of business. For example in the pharmaceutical and financial industries there is need not just of developers, but of individuals who understand the processes specific to those industries and see how to use IT to make it more productive, to make better information available in more timely fashion, and to use it more effectively, that is, to add value to the business. There is need for individuals who can identify novel IT based processes for satisfying traditional needs (think for example of Customer Relations Management, of FedEx, of Amazon, of Google).

As an educator it is hard to know how best to prepare you for the future. CIS4307 is only one course within the CS program of study. It gives you a specific, limited, technical perspective on distributed systems. We hope that our program in its totality offers you many of the conceptual tools and experiences you will need to deal with the changes and complexities you will face in your long professional lives. But your personal efforts and contributions while you are in the program are fundamental for your future success: you cannot passively wait for us to give you all you need, you must actively look for experiences and interactions that will strenghten you.

As in the future you will have to face changed applications, technologies, environments, and the possibilities are too many and we have no cristal ball,this course will try to help you understand what will not change and remain essential: fundamental issues of concurrency, distributed systems, and networks. We will often look at general principles and issues through small examples. You will need to pay attention and learn details. In time I hope you will be able to remember and use the principles even when forgetting the details. Issues we will confront are:

Concurrency: two or more activities taking place at the same time. Since the order of operations may impact the results, what can we do to obtain only the desired outcomes. The issues change fundamentally when we move from systems where the concurrent entities share a memory, to the case where they can communicate only through messages, without knowledge of a shared state or a common clock. Concurrency is becoming ever more relevant as in new chips performance is improving not by increasing GHz (sequential performance) but creating multiple cores (concurrent performance). Yet writing concurrent programs at the level of threads and locks is hard and error prone. Alternatives are becoming available, with Transactional Memories and Google's MapReduce paradigm. At a much fine computation grain we have mentioned CUDA earlier.
Communication and Networks: how we progress from signals to data, and the infrastructure that has evolved to make possible usable, reliable, and plentiful communication facilities.
Reliability: Failures can take place in processing nodes, in storage, in communication. What can we do to obtain fault-tolerant system from unreliable components, and at what price. In limited domains we can find very good solutions. For example transactions when using data bases. Or for many internet applications, the Map-Reduce paradigm introduced by Google.
Security: How to maintain secrecy, prevent data tampering, authenticate agents. In general, how to achieve distributed systems where resources can be shared safely.
Performance Evaluation: What are the features of systems that are significant for determining their performance. How can we evaluate these features and relate them to cost.
Scalability: do the solutions and systems that we study remain effective as the number of their users, and the amount of resources they use, increases?

At a more applied level we need to understand:

How problems and their solutions become more complex as we move from isolated systems to distributed systems, and how infrastructure (middleware) and tools facilitate our task.
We must recognize that distributed solutions are decomposed carefully, often with replication and physical distribution, thus improving reliability, latency, and maintenability.
We must understand some common components (such as clients, servers, caches, proxies, firewalls) and architectures (such as client-server, n-tier, peer-to-peer, distributed objects) used in internet applications.

Our laboratories involve programming. Our programming will be using low level facilities: the C programming language and the Unix system call interface. Of consequence the problems we will be able to tackle are of limited scale since we will not be as productive as if using Java, Python, or Basic .NET.

Let me give just one example of how less productive is C than Java. Suppose that you need to write code that returns the concatenation of two strings. In Java it is easy. If s1 and s2 are strings, s1+s2 is their concatenation, without worrying about storage allocation or, later, about reclaiming the storage associated to the string. In C life is not easy. We need to allocate the appropriate storage (1+strlen(s1)+strlen(s2) bytes, without worrying about Unicode), copy s1 to it, then use strcat to append s2 to it. Usually allocation is in the heap (at times static allocation may be sufficient, seldom stack allocation will be appropriate) and later we will need to deallocate the storage to avoid memory leaks. Careful understanding is required and pitfalls abound, later revealed by puzzling segmentation faults. If we consider how much easier is networking in Java (Perl, Python, Basic .NET, ..) we wonder about the sanity of the choice of C: yet the fact that C (and at times machine language) is low level forces us to worry about details of how systems operate. This is the source of insights essential to an IT professional especially when not previously assimilated. However at least one assignment will be in Java, to focus more on global systems issues, and to introduce the reality that most future systems development will be in languages such as Java (or higher level than Java).

The same can be said about the inconvenience of the Unix system call interface relative to adopting a framework such as J2EE, for example, using Enterprise JavaBeans. Using JavaBeans (think of a bean as an object, an instance of a class, which can be accessed remotely, with additional service) we can (if we have the appropriate software) create an object that we can access securely (authenticating caller and callee and providing secrecy) and transactionally (operations are atomic) with almost no development effort. If instead we use Unix system calls (without remote procedure calls) we need to worry about a myriad of details, from how to serialize objects over a network, to how to persist state, to how to achieve transactionality (for example, 2-phase commit protocol), to how to use security mechanisms such as encryption and digests. Again the goal is to acquire insights on how systems work, how design alternatives are evaluated, not to learn to use rapidly evolving high-level development tools and frameworks. The downside of this approach is that in the process of programming and learning you become enmeshed in a number of details that are not frequently encountered in higher level development (they are hidden).

Other courses, your independent learning, experience in internships and in other activities will give you a broader, less detailed, more applied understanding of distributed systems and applications than you acquire in this course. In addition to formal courses you will have to seek out other experiences and knowledge, especially about how business/society operates and how IT can add value to it. The future is likely to reward individuals with high technical knowledge, and/or business knowledge, and even more creative individuals with agility in their knowledge, ability to apply their knowledge within organizations, with courage to take initiatives and make decisions, with strong, reliable characters.