Q1.
(a) Explain data-ink maximization and give an example of useful redundant data-ink. [1.5 points]
Data-ink maximization is the principle that the ink (pixels) representing data should be maximized, within reason, while the ink spent on other visual components (frames, grids, decoration) should be reduced. The principle rests on the idea that the primary goal of a figure is to present data. An example of useful redundant data-ink is a cyclical time series, such as a timetable: duplicating part of the cycle makes the figure easier to read, because patterns that wrap around the end of the cycle stay continuous.
(b) Explain chartjunk. [1 point]
Chartjunk refers to visual elements in a figure that do not convey data, for example 3D decorations or unnecessary images. It is almost always a good idea to remove chartjunk, as it draws attention away from the presented data.
(c) Compute the lie factors and redesign the figure. [0.75+0.75+2 points]
The lie factor is the size of the effect shown in the graphic divided by the size of the effect in the data; LF = 1 means the figure is faithful.
(i) LF = [(14^2 - 2^2) / 2^2] / [(466 - 10) / 10] = 48 / 45.6 = 1.05. The figure exaggerates slightly, but this may be due to ignoring the rounded corners or to rounding errors. It is also acceptable to use the percentages, 50 and 1, instead of the raw counts, 466 and 10. This gives LF = [(14^2 - 2^2) / 2^2] / [(50 - 1) / 1] = 48 / 49 = 0.98; the difference is due to rounding.
(ii) LF = [(14 - 2) / 2] / [(466 - 10) / 10] = 6 / 45.6 = 0.13. The figure understates the variation in the data significantly.
(iii) A simple bar chart is the best redesign, since lengths on a common baseline are easy to compare; a stacked bar chart or a pie chart is also acceptable. Avoid presenting 1D data with areas, and use significantly less chartjunk.
Q2.
2a) What are visual acuities? Briefly explain two acuities. [1 point]
Visual acuities are measurements of our ability to see detail; they indicate the limits on the information density we can perceive. A list of acuities and their meanings is given in the lecture slides (Lecture 4, slides 65-67).
2b) Compare and contrast the two receptor cells in the retina. What is the fovea and why is it important?
[2 points]
The two types of receptor cells are rods and cones. Rods detect shades of grey (black/white) but not colour and little detail, function best in dim light, are located mostly around the edges of the retina, and number about 120 million in each eye. Cones detect fine detail and colour, function best in bright light, are densely packed in the fovea (the centre of the retina), and number about 5 million in each eye (Lecture 5, slide 8). The fovea is a small area in the centre of the retina packed with cones, where vision is sharpest and where most detail is perceived (see also Lecture 4, slide 64).
2c) Give one preattentive feature and one non-preattentive feature. Briefly explain why these features are preattentive or non-preattentive. [1 point]
A preattentive feature is one that the visual system processes rapidly and in parallel, before conscious attention. Such features pop out from their surroundings and attract our attention. If a feature does not pop out from its surroundings and has to be found by scanning the items one by one, it is non-preattentive. Examples of preattentive and non-preattentive features are given in the lecture slides (Lecture 6, slides 34-37).
2d) Briefly explain the Gestalt laws of (i) proximity, (ii) connectedness, and (iii) similarity, and the design principles each one conveys when visualizing information. Which of these grouping principles is most powerful? [2 points]
Proximity states that objects that are close to each other are perceptually grouped together. Thus, symbols and glyphs representing related information should be placed close to one another. Connectedness states that connected objects appear to be related in some way. Thus, objects should be linked with lines or ribbons of colour to show the relationships between them. Similarity states that similar objects appear to be grouped together. Thus, when designing, for example, a grid layout of a data set, rows and/or columns should be coded using low-level visual channel properties, such as colour and texture.
Connectedness is more powerful than proximity and similarity (Lecture 6, slides 55-58).
Q3.
3a) Explain the data abstraction level of Munzner’s model of visualization design. Mention three major dataset types. [1 point]
The data abstraction level describes what type of data needs to be visualized. Three major dataset types are tables, networks and spatial datasets (Lecture 7, slides 41, 44-51).
3b) Explain the task abstraction level of Munzner’s model of visualization design. What does the task {action, target} pair refer to? Choose a classic visualization, such as Nightingale’s diagram on the causes of mortality in the army or Snow’s map of cholera deaths, and briefly explain a task {action, target} pair that this visualization tried to achieve. [2 points]
The task abstraction level describes why the information needs to be visualized, i.e., which tasks the visualization should support. A task can be described as a pair made up of an action (what the user wants to do, e.g., compare) and a target (which aspect of the data the action concerns, e.g., distributions). For example, Nightingale’s diagrams aimed at comparing distributions, while Snow’s map produced (i.e., annotated, recorded and derived) distributions (Lecture 7, slides 41, 52-60).
3c) Which visual variable can accurately communicate quantitative, ordinal and nominal data? Give an example of a visual variable that should be avoided when graphically encoding quantitative or ordinal data. [1 point]
Position can accurately communicate quantitative, ordinal and nominal data. Shape should be avoided, as it is the least accurate channel for communicating quantitative and ordinal data (Lecture 7, slide 67).
3d) Choose a visual idiom and indicate: (i) the marks and visual variables it uses; (ii) the data type and tasks it is most appropriate for. [2 points]
The marks, visual variables, data types and tasks of various visual idioms are listed in Lecture 7, slides 75-80.
Q4.
(a) List four aesthetic criteria that make a graph layout easier to understand.
[1 point]
For example:
- minimize the number of edge crossings
- uniform flow direction for directed graphs
- maximize the smallest angle between edges
- minimize edge bends
(b) List two positive aspects and two negative aspects of force-directed layout. [1 point]
For example:
+ easy to implement
+ flexible (can incorporate other cost functions)
- may get stuck in a local minimum
- may produce edge crossings, even for trees
(c) Describe briefly the layered layout. [1.5 points]
Nodes are assigned to layers so that the edges flow as uniformly as possible (e.g., all directed edges point downward). Within each layer, the nodes are then ordered to minimize edge crossings.
(d) When is a linear layout / adjacency matrix / layered layout better than a force-directed layout? [1.5 points]
linear layout -> the nodes have a natural linear order (time, location in a text)
adjacency matrix -> dense graphs
layered layout -> directed graphs with a clear layered structure
(e) Describe briefly focus+context and fisheye distortion in the context of graph navigation. [1 point]
Focus+context is an approach where the goal is to show a specific part of the graph in detail (the focus) while keeping the surrounding graph (the context) visible. Fisheye distortion is a focus+context technique that magnifies the region around the focus and progressively compresses the rest of the graph, so that the whole graph still fits in the view.
Q5.
(a) Define PCA and explain how it is computed. [3 points]
Let D be the centred data matrix, with one data point per row. The k-th principal component is the projection y_k = D v_k onto a unit vector v_k chosen so that the variance of y_k is maximized, subject to v_k being orthogonal to v_j for all j < k. To compute v_k, construct the covariance matrix C = D^T D / (n - 1), where n is the number of data points, and take v_k as the eigenvector of C belonging to its k-th largest eigenvalue.
(b) Describe briefly a scenario where PCA is not a suitable choice. [1 point]
PCA is not suitable if the data lie on a strongly curved (nonlinear) manifold, such as a Swiss roll, and we want points that are close in the projected plane to also be close in the original data. Being a linear projection, PCA can map distant parts of the manifold on top of each other.
(c) Describe briefly the connection between Torgerson scaling (classic MDS) and PCA. [1 point]
Torgerson scaling is essentially PCA computed from the distance matrix alone. For Euclidean distances, double-centring the matrix of squared distances recovers the inner-product (Gram) matrix of the centred data, and its leading eigenvectors, scaled by the square roots of the eigenvalues, give the same coordinates as PCA up to sign.
(d) What is the difference between Sammon mapping and metric MDS?
What is the difference between metric MDS and non-metric MDS? [1 point]
Metric MDS weights the errors of all pairwise distances equally, whereas Sammon mapping divides each error by the actual distance: it favours small errors for small distances at the expense of larger errors for large distances. Non-metric MDS (NMDS) is only interested in the rank order of the distances; in the ideal case, d_actual(a, b) < d_actual(a, c) if and only if d_projected(a, b) < d_projected(a, c).
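The three objectives in 5(d) can be made concrete by writing them as cost functions over pairwise distances. The sketch below is a minimal plain-Python illustration, not a full MDS optimizer, and the function names are hypothetical. raw_stress is the unnormalized squared-error stress of metric MDS, sammon_stress follows Sammon's normalization (it assumes all actual distances are non-zero), and preserves_order only tests the ideal-case rank condition stated above rather than computing a non-metric stress such as Kruskal's.

```python
import itertools
import math


def distance_matrix(points):
    """Euclidean distances between all pairs of points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def raw_stress(actual, projected):
    """Metric MDS (raw stress): every pair's squared error counts equally."""
    n = len(actual)
    return sum(
        (actual[i][j] - projected[i][j]) ** 2
        for i, j in itertools.combinations(range(n), 2)
    )


def sammon_stress(actual, projected):
    """Sammon stress: each squared error is divided by the actual distance,
    so the same absolute error costs more on a short distance.
    Assumes all actual distances are non-zero (no duplicate points)."""
    n = len(actual)
    pairs = list(itertools.combinations(range(n), 2))
    total = sum(actual[i][j] for i, j in pairs)
    weighted = sum(
        (actual[i][j] - projected[i][j]) ** 2 / actual[i][j] for i, j in pairs
    )
    return weighted / total


def preserves_order(actual, projected):
    """Non-metric ideal case: for all distinct a, b, c,
    actual(a, b) < actual(a, c) iff projected(a, b) < projected(a, c)."""
    n = len(actual)
    return all(
        (actual[a][b] < actual[a][c]) == (projected[a][b] < projected[a][c])
        for a, b, c in itertools.permutations(range(n), 3)
    )


if __name__ == "__main__":
    # Hypothetical 1D example: the actual distances are 1, 11 and 10.
    actual = distance_matrix([(0,), (1,), (11,)])
    short_error = distance_matrix([(0,), (2,), (11,)])  # errors on distances 1 and 10
    long_error = distance_matrix([(0,), (1,), (12,)])   # errors on distances 11 and 10
    scaled = distance_matrix([(0,), (2,), (22,)])       # every distance doubled

    print(raw_stress(actual, short_error), raw_stress(actual, long_error))  # 2.0 2.0
    print(sammon_stress(actual, short_error) > sammon_stress(actual, long_error))  # True
    print(preserves_order(actual, scaled), raw_stress(actual, scaled))  # True 222.0
```

Both perturbed layouts have the same raw stress, so metric MDS has no preference between them, whereas the Sammon stress is several times lower for the layout whose errors sit on the longer distances. The doubled layout has a large raw stress but preserves every distance ranking, which is all that non-metric MDS asks for.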