SEIS 632: Analytics and Visualization

analytics
- Importance of analytics
- Big V's
- Data units
- DIKW pyramid
- Analytics process
- Missing values, data types, tabular data model
- Analytics vs. databases
- Analytics types
- Analytics examples
- SAS EM
- Reading: Fayyad, Piatetsky-Shapiro, and Smyth 1996

analytics -> prediction (supervised)
- Tabular data format with targets/labels column
- Regression vs. classification
- Performance metrics: misclassification and average squared error
- Data partitioning: training, validation/development, test sets
- Optimizing complexity

prediction (supervised) -> numeric (regression)
- Regression formula
- Matrix formulation
- Solving for w's
- Linear regression vs. logistic regression
- Logistic function
- Missing values
- One-hot encoding
- Input selection: forward, backward, and stepwise
- Polynomial regression
- Normalization
- Log transformation

prediction (supervised) -> categorical (classification)
- Decision trees, logistic regression
- Root, internal, and leaf nodes
- Splitting
- Logworth and other splitting metrics
- Optimizing complexity, pruning
- Assessing decision trees

prediction (supervised) -> structured prediction

analytics -> pattern discovery (unsupervised)
- Clustering, density estimation, dependency modeling, outlier and change detection, dimensionality reduction
- Applications

pattern discovery (unsupervised) -> clustering
- K-Means algorithm
- Distance metrics, Euclidean distance
- Normalizations
- Outliers
- Sensitivity to initial seeds
- Need to determine k
- Elbow method

analytics -> reinforcement learning (semi-supervised)

visualization
- Conveys information/knowledge/wisdom through graphical representation of data
- Visualization goals:
  - Record, communicate, solve problems
- Interaction, technology, tool use, augment human capabilities
- Reading: Structure of the Information Visualization Design Space

visualization -> design principles
- Design
  - Data-to-ink ratio: data ink to non-data ink
  - Chart junk: non-data ink and redundant data ink
  - Data density: amount of data shown in a specific area
  - Layering and separation: focus, multiple views, how the eye scans
- Integrity
  - Show data variation, not design variation
  - Lie factor
  - Clear labeling and appropriate axes
  - Show context
- Subjective dimensions: aesthetics, style, playfulness, vividness

visualization -> visualization process
- Target: domain problem characterization
- Translate: domain problem to abstract tasks
- Design: variable encodings and interaction
- Implement: program the visualization and interface
- Validate: at all stages, before vs. after

visualization -> visual variables
- Map from data variables to visual encoding
- Low-level encoding, data types, semantics, perceptual properties
- Bertin's visual variables
- Stevens' Power Law
- Comparisons of visual variables
- Expressiveness vs. effectiveness

visualization -> high dimensional data
- Linked views
- Multivariate plots
- Heatmaps
- Small multiples
- Scatterplot matrices
- Sparklines
- Parallel coordinates
- Brushing and linking
- Glyphs
  - Star/radar plots
  - Chernoff faces

visualization -> color
- Color in the physical world
- Color in the eye
- Color representation in a computer
  - RGB vs. HSV/HSL
- Color in visualizations
  - Nominal/categorical
  - Scales/numeric
  - Focus/highlighting/selection
- Illusions
- Colorblindness
- Dual encodings
- Pitfalls

visualization -> maps
- Map characteristics
- Choropleth maps
- Isopleth maps
- Cartograms
- Proportional symbol maps
- Flow maps
- Time-varying maps
- Thematic maps

visualization -> statistical graphs
- Comparisons
  - Bar charts, waterfall charts, bullet charts, bars vs. lines
- Relationships
  - Scatterplots, overplotting, transparency, jitter, trend lines, quadrants
- Proportions
  - Pie charts, donuts, stacked bar/area, waterfall
- Distributions
  - Histograms, demographic pyramids, bin widths, density plots/bandwidth, box-and-whiskers, violin plots, cumulative density, Q-Q plots

visualization -> dimensionality reduction
- PCA: project data onto axes of highest variance
  - Unsupervised dimensionality reduction for numerical data (numeric input, numeric output)
- K-Means Clustering
  - Unsupervised dimensionality reduction for numerical data (numeric input, categorical output)
- Fisher's Linear Discriminant Analysis
  - Supervised (categorical) dimensionality reduction for numerical data (numeric data & categorical labels input, numeric output)
- Latent Dirichlet Allocation
  - Unsupervised dimensionality reduction for categorical/text data (categorical input, categorical output)
- Multidimensional Scaling
  - Unsupervised dimensionality reduction for a dissimilarity matrix (instance dissimilarity matrix input, numeric output)
- Linear Methods vs. Non-Linear Methods
  - Non-linear methods are fairly recent methods for non-linear, manifold, and graph/network data

visualization -> storytelling
- Reading: Narrative Visualization: Telling Stories With Data
  - Genre: magazine, annotated chart, partitioned poster, flow chart, comic strip, slide show, video/animation
  - Visual Narrative: visual structuring, highlighting, transition guidance
  - Narrative Structure: ordering, interactivity, messaging
- Author-driven vs. reader-driven
- Martini glass vs. interactive slideshow vs. drill-down story

visualization -> interactivity
- Overview and detail
  - Shneiderman's Mantra
  - Multiple views
- Focus and context
  - Same view
  - Fisheye
- Brushing and linking
- Filtering
- Animation
  - Encoding time on paper

visualization -> text and document visualization
- Types of text documents
- Preprocessing: bag of words, document vector
- Word clouds
- Word graphs
- PCA

visualization -> trees and networks
- Applications: tournaments, org charts, genealogy, flow charts, interactions
- Nodes, edges
- Adjacency matrices, bottlenecks between components
- Treemaps
- Indentation
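
The clustering entries above (K-Means algorithm, Euclidean distance, sensitivity to initial seeds, need to determine k, elbow method) can be illustrated with a minimal sketch. This is not taken from the course materials: the function names `euclidean` and `kmeans`, the toy `points`, and the choice of the first k points as initial seeds are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, seeds, max_iter=100):
    """Plain K-Means (Lloyd's iterations) starting from the given seeds.

    Returns (centroids, labels, sse), where sse is the within-cluster
    sum of squared distances that the elbow method plots against k.
    """
    centroids = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(euclidean(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, sse

# Hypothetical toy data: two visibly separated groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]

# Elbow method: compute the SSE for several k and look for the bend.
# Seeding with the first k points is an arbitrary, illustrative choice.
sse_by_k = {k: kmeans(points, points[:k])[2] for k in (1, 2, 3)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 3))
```

On this toy data the SSE drops sharply from k=1 to k=2 and only slightly from k=2 to k=3, which is the bend the elbow method looks for. Different seeds can lead to different final clusters, and features on very different scales would normally be normalized before distances are computed.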
