SEIS 632: Analytics and Visualization

Analytics
- Importance of analytics
- Big V's
- Data units
- DIKW pyramid
- Analytics process
- Missing values, data types, tabular data model
- Analytics vs. databases
- Analytics types
- Analytics examples
- SAS EM
- Reading: Fayyad, Piatetsky-Shapiro, and Smyth 1996

Analytics -> Prediction (supervised)
- Tabular data format with a targets/labels column
- Regression vs. classification
- Performance metrics: misclassification and average squared error
- Data partitioning: training, validation/development, and test sets
- Optimizing complexity

Prediction (supervised) -> Numeric (regression)
- Regression formula
- Matrix formulation
- Solving for the w's
- Linear regression vs. logistic regression
- Logistic function
- Missing values
- One-hot encoding
- Input selection: forward, backward, and stepwise
- Polynomial regression
- Normalization
- Log transformation

Prediction (supervised) -> Categorical (classification)
- Decision trees, logistic regression
- Root, internal, and leaf nodes
- Splitting
- Logworth and other splitting metrics
- Optimizing complexity, pruning
- Assessing decision trees

Analytics -> Pattern discovery (unsupervised)
- Clustering, density estimation, dependency modeling, outlier and change detection, dimensionality reduction
- Applications

Pattern discovery (unsupervised) -> Clustering
- K-Means algorithm
- Distance metrics, Euclidean distance
- Normalizations
- Outliers
- Sensitivity to initial seeds
- Need to determine k
- Elbow method

Analytics -> Reinforcement learning (semi-supervised)
- Structured prediction

Visualization
- Conveys information/knowledge/wisdom through graphical representation of data
- Visualization goals:
  - Record, communicate, solve problems
  - Interaction, technology, tool use, augment human capabilities
- Reading: Structure of the Information Visualization Design Space

Visualization -> Design principles
- Design
  - Data-ink ratio: data ink to non-data ink
  - Chart junk: non-data ink and redundant data ink
  - Data density: amount of data shown in a specific area
  - Layering and separation: focus, multiple views, how the eye scans
- Integrity
  - Show data variation, not design variation
  - Lie factor
  - Clear labeling and appropriate axes
  - Show context
- Subjective dimensions: aesthetics, style, playfulness, vividness

Visualization -> Visualization process
- Target: domain problem characterization
- Translate: domain problem to abstract tasks
- Design: variable encodings and interaction
- Implement: program the visualization and interface
- Validate: at all stages, before vs. after

Visualization -> Visual variables
- Map from data variables to visual encoding
- Low-level encoding, data types, semantics, perceptual properties
- Bertin's visual variables
- Stevens' Power Law
- Comparisons of visual variables
- Expressiveness vs. effectiveness

Visualization -> High-dimensional data
- Linked views
- Multivariate plots
- Heatmaps
- Small multiples
- Scatterplot matrices
- Sparklines
- Parallel coordinates
- Brushing and linking
- Glyphs
  - Star/radar plots
  - Chernoff faces

Visualization -> Color
- Color in the physical world
- Color in the eye
- Color representation in a computer
  - RGB vs. HSV/HSL
- Color in visualizations
  - Nominal/categorical
  - Scales/numeric
  - Focus/highlighting/selection
- Illusions
- Colorblindness
- Dual encodings
- Pitfalls

Visualization -> Maps
- Map characteristics
- Choropleth maps
- Isopleth maps
- Cartograms
- Proportional symbol maps
- Flow maps
- Time-varying maps
- Thematic maps

Visualization -> Statistical graphs
- Comparisons
  - Bar charts, waterfall charts, bullet charts, bars vs. lines
- Relationships
  - Scatterplots, overplotting, transparency, jitter, trend lines, quadrants
- Proportions
  - Pie charts, donuts, stacked bar/area, waterfall
- Distributions
  - Histograms, demographic pyramids, bin widths, density plots/bandwidth, box-and-whiskers, violin plots, cumulative density, Q-Q plots

Visualization -> Dimensionality reduction
- PCA: project data onto the axes of highest variance
  - Unsupervised dimensionality reduction for numerical data (numeric input, numeric output)
- K-Means Clustering
  - Unsupervised dimensionality reduction for numerical data (numeric input, categorical output)
- Fisher's Linear Discriminant Analysis
  - Supervised (categorical) dimensionality reduction for numerical data (numeric data and categorical labels input, numeric output)
- Latent Dirichlet Allocation
  - Unsupervised dimensionality reduction for categorical/text data (categorical input, categorical output)
- Multidimensional Scaling
  - Unsupervised dimensionality reduction for a dissimilarity matrix (instance dissimilarity matrix input, numeric output)
- Linear methods vs. non-linear methods
  - Non-linear methods are fairly recent methods for non-linear, manifold, and graph/network data

Visualization -> Storytelling
- Reading: Narrative Visualization: Telling Stories With Data
  - Genre: magazine, annotated chart, partitioned poster, flow chart, comic strip, slide show, video/animation
  - Visual narrative: visual structuring, highlighting, transition guidance
  - Narrative structure: ordering, interactivity, messaging
- Author-driven vs. reader-driven
- Martini glass vs. interactive slideshow vs. drill-down story

Visualization -> Interactivity
- Overview and detail
  - Shneiderman's Mantra
  - Multiple views
- Focus and context
  - Same view
  - Fisheye
- Brushing and linking
- Filtering
- Animation
  - Encoding time on paper

Visualization -> Text and document visualization
- Types of text documents
- Preprocessing: bag of words, document vector
- Word clouds
- Word graphs
- PCA

Visualization -> Trees and networks
- Applications: tournaments, org charts, genealogy, flow charts, interactions
- Nodes, edges
- Adjacency matrices, bottlenecks between components
- Treemaps
- Indentation
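The PCA entry above (project data onto the axes of highest variance, numeric input and numeric output) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the course material: the helper names `center`, `covariance`, `first_component`, and `project`, the five sample points, and the choice of power iteration to find the leading axis.

```python
import math

def center(rows):
    """Subtract each column's mean so every variable has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Power iteration: approaches the unit eigenvector with the largest
    eigenvalue, i.e. the axis of highest variance."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(centered, axis):
    """One numeric score per row: its coordinate along the given axis."""
    return [sum(x * a for x, a in zip(r, axis)) for r in centered]

# Hypothetical 2-D data lying roughly along the diagonal.
points = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.0]]
centered = center(points)
axis = first_component(covariance(centered))
scores = project(centered, axis)  # 2-D input reduced to 1-D output
```

For this sample the scores vary more than either original column does, which is the sense in which the first axis captures the highest variance. A library routine would normally replace the hand-written power iteration.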
  1. Overview
  2. Visualization
  3. Visualization intro
  4. Design principles
  5. Design principles
  6. Visualization process
  7. Visualization process
  8. Visual variables
  9. Visual variables
  10. High-dimensional data
  11. High-dimensional data
  12. Color
  13. Color
  14. Maps
  15. Maps
  16. Statistical graphs
  17. Statistical graphs
  18. Dimensionality reduction
  19. Dimensionality reduction
  20. Storytelling
  21. Storytelling
  22. Interactivity
  23. Interactivity
  24. Text and document visualization
  25. Text and document visualization
  26. Trees and networks
  27. Trees and networks
  28. Overview