T-106.5250 Distributed systems

Frankentent of exams from 2006.12 - 2008.12


1. (12p) Give a short answer to following questions (max. 100 words per 
question) [6 questions/exam]

a) What is an open distributed system?
    A distributed system whose key interfaces are openly published, enabling 
    implementing and expansion by third parties using heterogenous hardware and
    software.

b) Give a definition of distributed system.
    A system running on a variety of different hardware and software subsystems 
    without a common clock or memory, communicating only through message passing
    over communication channels between subsystems.

c) Why do we need middleware?
    So that software can utilize communication channels and other resources 
    without knowing specific information about the system or the netwrok it is
    connected to, or information related to the resource. Middleware also 
    enables combining of different system components, such as running same 
    software on different hardware or software platforms. Masks heterogenity of
    systems, by bringing interfaces to a higher level of abstraction.

d) How can we sign digital documents?
    Using cryptographic techniques such as public-key authentication. A digest 
    computed of the entire message (including headers) is encrypted with the 
    senders private key, to verify the identity of the sender the receiver can 
    decrypt the encrypted digest using the senders public-key and compare it to 
    the digest calculated form the received message. 

e) For what purposes is public key crypthography useful in distributed 
   systems?
    For authenticating client and servers in secure communications; signing of
    digital documents to ensure integrity; for encrypting sent messages to 
    ensure privacy.

f) What are logical clocks and why are they needed?
    Clocks that only count the relative order of events, not the actual time 
    passed between them. Used in distributed systems to determine in which order
    events have passed.

g) How do synchronous and asychronous distributed systems differ from each 
   other?
    Synchronous distributed systems set bounds on the process execution speeds, 
    message transmission delays and clock drift rates, while asynchronous 
    distributed systems do not.

h) Which two interaction models the course text book presents and how do
   they differ from each other?
    Synchronous and Asynchronous interaction models. The synchronous interaction
    model sets boundaries on  the process execution speeds, message transmission 
    delays and clock drift rates, while asynchronous distributed systems do not.

i) What are distributed transactions? Describe some essential problem that 
   distribution causes to transactions and how it can be solved.
    Distributed transactions are transaction that involve more that one server.
    A essential problem in distributed transactions is atomicity, that is that
    either all involved servers commit the transaction, or none does. This can
    be solved by issuing a vote by the involved servers where they decide on 
    wether to commit or abort, all servers then carry out the joint decision.
    This is called two-phase commit protocol. Here a vote on committing binds 
    the server to a promise to be able to commit even if crashes occur.

j) What are the ACID properties of transactions?
    Atomicity - a transaction must be all or nothing.
    Consistency - a transaction takes the system from one consistent state to
                  another.
    Isolation - all transactions execute without interference form other 
                transactions.
    Durability - after a transaction is successfull, all its effects are stored
                 in permanent storage.

k) What problems does the use of locking cause in transaction processing?
    The possibilty of deadlocks or starvation. Deadlock meaning that two or more
    processes are in a waiting state, waiting for locks accquired by other
    processes also in the same deadlock. Starvation meaning that one or more 
    processes never get a lock they are waiting for since some other process
    always gets it first.

l) What are volatile systems?
    Generally any system where components and communication channels commonly
    break or shut down. Usually mobile systems where the user moves out of 
    range or is forced to shut down.

m) In what circumstances can the Byzantine generals problem be solved?
    Can bee guaranteed to solve only if the system is synchronous, or known to 
    never have any failures, and the amount of faulty processes are less than a 
    third of all processes.

n) What are the pros and cons of using threads instead of multiple processes.
    Threads do not need to build up an executeion environment to run 
    concurrently, enabling multiple threads to start faster and access shared
    memory. Since threads run in the same execution environment possible context
    switches between threads are faster than between processes. On the other 
    hand threads within a process are not protected from each other and can 
    therefore more easily interfere with each other and run malicious code on 
    other threads.

o) What is peer-to-peer architecture and why is it useful?
    A 'serverless' architecture where all clients using the systems also 
    contribute to storing, distribution and processing of information. Useful 
    since the more users in the system, the more bandwidth and other resources 
    are available keeping the cost nearly constant.

p) What is navigation in DNS and in what main ways can it be implemented?
    Navigation is the process of locating naming data from two or more name
    servers. DNS uses iterateive navigation where the client asks name servers
    in order until the needed data is found. Multicast navigation can also be 
    used where the client multicasts the query to all name servers, and the only
    the server witht the required information responds to the query. 
    Additionally non-recursive server-controlled navigation can be used, where 
    the server contacted by the client performs iterative navigation. Or
    recursice server-controlled, where each server contacts only one server 
    creating a chain of connections unitl the required data is found.


2. Essays [3 essays/exam]

Multicast in distributed systems.
    - One sender, one message, multiple receivers
    - Only one copy of data over each communication link
    - Data multiplicated by routers
    - Clients subscribe to multicast messages
    - Reliability difficult to implement, but possible
    - Ordering of messages isn't either taken for granted

Peer-to-peer systems.
    - No servers, resources distributed to clients.
    - Exploits unused resources of client computers
    - Scales in constant resource need, since every new client contributes
    - Scales globally
    - Support costs independent of number of clients
    - Does not support highly mutable data
    - Does not guarantee persistance of data
    - Does not provide anonymity of users

Implementation of the request-reply protocol.
    - Server-Client communication
    - Client sends doOperation() to server, and waits for reply message
    - Server receives with getRequest()
    - After processing the request the server send sendReply()
    - Failures include: Timeouts, Duplicates.
    - Cacheing can speed up processing
    - Exchange protocols: Request, Request-Reply, Request-Reply-Acknowledge

Implementation of remote method invocation (RMI)
    - Method invocation between objects in different processes
    - Proxy: Makes RMI transparent to client, local object representation of 
             remote object, handles the modules below for remote object.
    - Communications module: send messages through lower communications channel
                             using request-reply protocol
    - Remote reference module: Stores mapping between local and remote 
                               references
    - Dispatcher: Sends incoming events to right method in the skeleton
    - Skeleton: Remote side object handling unmarshalling of incoming events
                and sending the reply back after method completed
    - Servant: Executes the actual action by calling the method

Naming and name services in distributed systems.
    - Names are userfriendly
    - Addresses are computerfriendly
    - Conversion between them needed, hence name services
    - Maps names to addresses, and sometimes back
    - Needs to handle huge amounts of mappings
    - Long lifetime: interfaces need to hold
    - High availability: Many system depend on NS
    - Fault isolation: Parts may fail, but service may not. Redundancy!
    - Tolerance of mistrust: Not everyone can be trusted
    - Name spaces
    - Navigation: Iterative, multicast, server-controlled [recursive]

Failures in distributed systems
    - Parts may fail, but should not take down the whole system
    - Types of failures:
        + Timing failures: Timeouts, connectoin brakedowns
        + Byzantine failures: Process responds in nonstandard way
        + Omission failures: Process or communication channel drops
    - Maskig failures, using redundancy or resending of messages
    - Failures can be detected and circumvented
    - Failure tolerance can be increased using redundancy

Clocks and time in distributed systems.
    - Clocks drift
    - Can be synchronized using NTP
    - NTP not accurate enough for ordering
    - Logical time can be used to order events
    - Logical time measures order, no relation to passed time
    - Logical clocks and vector clocks measure relative order of events by
      counting the emount of events occuring in processes, and updating 
      counts on interprocess events (messages)

The main classes of encryption algorithms and the use of cryptography in 
                                                            distributed systems.
    - Shared secret key (symmetric cryptography):
        + Both parties have the same key
        + Encryption and decruption uses same key
        + Faster than asymmetric
    
    - Public/private key pairs (asymmetric cryptography):
        + Sender has a private key and a public key
        + Private key is a secret, and public key is not
        + Private key is used for encrypting data
        + Public key is used for decrypting data
        + Slower than shared secret cryptography
    
    - Cobination:
        + Common in real-world applications
        + Public-key used to share a generated shared key
        + Generated shared key used for bulk of encryption
        + Exploits speed of symmetric cryptography
        + Exploits security and flexibilty of asymmetric cryptography
    
    - Use of cryptography:
        + Secrey and integrity: Making data unreadable to third parties
        + Authentication: Validating the identity of a client and server
        + Signatures: Validating the identity of a message sender