Analyzing and Visualizing Data
Chapter 4
Working With Data
Data Assets and Tabulation Types
Two main categories
Data that exist in tables (datasets)
Data that exist as isolated values
Data Types
Levels of data (scales of measurement) influence:
Type of exploratory data analysis you can undertake
Editorial thinking you establish
Specific chart types you might use
Color choices and layout decisions around composition
Data Assets and Tabulation Types cont.
Textual (Qualitative)
Unstructured streams of words
Descriptive details of a weather forecast for a given city
The full title of an academic research project
The description of a product on Amazon
Data Assets and Tabulation Types cont.
Ordinal (Qualitative)
Like nominal data, ordinal data is categorical and qualitative in nature
Unlike nominal data, its values have characteristics of order
The response to a survey question on a scale of 1 (unhappy) to 5 (very happy)
The general weather forecast: expressed as Very Hot, Hot, Mild, Cold, Freezing
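A minimal Python sketch of why ordinal values need an explicit order, assuming the five forecast labels above. Sorted as plain text they lose their meaning; sorted by a declared rank they keep it. The `Forecast` enum and `to_forecast` helper are hypothetical names for illustration, and the numbers in the enum encode rank only, not distance between categories.

```python
from enum import IntEnum


class Forecast(IntEnum):
    """Ordinal forecast categories; the numbers encode rank only, not distance."""
    FREEZING = 1
    COLD = 2
    MILD = 3
    HOT = 4
    VERY_HOT = 5


def to_forecast(label: str) -> Forecast:
    """Map a label such as 'Very Hot' to its Forecast member."""
    return Forecast[label.upper().replace(" ", "_")]


forecasts = ["Hot", "Freezing", "Very Hot", "Mild", "Cold"]

# Alphabetical sorting treats the labels as plain text and loses their order
alphabetical = sorted(forecasts)

# Sorting by the declared rank respects the ordinal scale
by_rank = sorted(forecasts, key=to_forecast)

print(alphabetical)  # ['Cold', 'Freezing', 'Hot', 'Mild', 'Very Hot']
print(by_rank)       # ['Freezing', 'Cold', 'Mild', 'Hot', 'Very Hot']
```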
Data Assets and Tabulation Types cont.
Interval (Quantitative)
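Interval data is the less common form of quantitative data
Numeric measurements with equal intervals between values but no true zero
Example: temperature in degrees Celsius (0 °C does not mean an absence of temperature)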
Data Assets and Tabulation Types cont.
Ratio (Quantitative)
Most common quantitative variable
Age of a survey participant in years
Forecasted amount of rainfall in millimetres
Unlike interval data, ratio data has a true zero: 0 means an absence of the quantity (0 mm means no rainfall), so ratios such as ‘twice as much’ are meaningful
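A small Python sketch contrasting the two quantitative levels. As an illustration not taken from the slides, it converts Celsius to Kelvin, a temperature scale whose zero is absolute zero, to show that the naive Celsius ratio overstates the real difference. The helper `celsius_to_kelvin` is a hypothetical name.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Kelvin starts at absolute zero, so it behaves as a ratio scale."""
    return celsius + 273.15


# Interval scale: differences are meaningful...
difference_c = 20 - 10          # 10 degrees warmer

# ...but ratios are not, because 0 °C is not 'no temperature'
naive_ratio = 20 / 10           # 2.0, yet 20 °C is not twice as hot as 10 °C
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)  # about 1.035

# Ratio scale: 0 mm means no rain, so 20 mm really is twice 10 mm
rainfall_ratio = 20 / 10        # 2.0

print(difference_c, naive_ratio, round(kelvin_ratio, 3), rainfall_ratio)
```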
Data Assets and Tabulation Types cont.
Temporal Data
Time-based data, which can be expressed at different levels of measurement
Textual: ‘Four o’clock in the afternoon on Saturday, 12 March 2016’
Ordinal: ‘PM’, ‘Afternoon’, ‘March’, ‘Q1’
Interval: ‘12’, ‘12/03/2016’, ‘2016’
Ratio: ‘16:00’
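A Python sketch, using only the standard `datetime` module, of deriving several of these levels from one timestamp (16:00 on 12 March 2016). Treating ‘16:00’ as hours elapsed since midnight is an assumed reading of why it counts as ratio data; the variable names are illustrative only.

```python
from datetime import datetime

# The moment used in the examples: 16:00 on 12 March 2016
moment = datetime(2016, 3, 12, 16, 0)

# Ordinal: ordered categories derived from the timestamp
meridiem = "PM" if moment.hour >= 12 else "AM"
month = moment.strftime("%B")                  # 'March' in the default C locale
quarter = f"Q{(moment.month - 1) // 3 + 1}"    # calendar quarter, here 'Q1'

# Interval: day, date and year are equally spaced but have no true zero
day = moment.day                               # 12
date_text = moment.strftime("%d/%m/%Y")        # '12/03/2016'
year = moment.year                             # 2016

# Ratio (assumed reading): hours elapsed since midnight, which has a true zero
midnight = moment.replace(hour=0, minute=0, second=0, microsecond=0)
hours_since_midnight = (moment - midnight).total_seconds() / 3600   # 16.0
```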
Data Assets and Tabulation Types cont.
Discrete
No ‘in-between’ state
Days of the week
Heads or tails for a coin toss
1, 2, 3, 4, 5, 6, etc.
Continuous
Has in-between states
Height and weight
Temperature
Time
1.1, 1.2, 1.3, 1.4, 1.5, etc.
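One practical consequence of the discrete/continuous distinction, sketched in Python with made-up values: discrete values can be counted directly, while continuous values usually need grouping into bins before a frequency distribution makes sense. The 10 cm bin width and the `bin_start` helper are hypothetical choices for illustration.

```python
from collections import Counter

# Discrete: die rolls can only take whole values, so count each value directly
rolls = [1, 3, 3, 6, 2, 3, 5, 6]
roll_counts = Counter(rolls)

# Continuous: heights can take any value, so group them into bins first
heights_cm = [158.2, 162.7, 171.0, 169.4, 175.9, 181.3, 166.6]
BIN_WIDTH = 10  # hypothetical bin width in cm


def bin_start(value: float, width: float) -> float:
    """Lower edge of the bin that contains `value`."""
    return (value // width) * width


height_bins = Counter(bin_start(h, BIN_WIDTH) for h in heights_cm)

print(roll_counts[3])               # 3
print(sorted(height_bins.items()))  # [(150.0, 1), (160.0, 3), (170.0, 2), (180.0, 1)]
```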
Data Acquisition
What data do you need and why?
From where, how, and by whom will the data be acquired?
When can you obtain it?
Data Acquisition cont.
Curated by You
Primary data collection
Manual collection and data foraging
Extraction from PDF files
Web scraping (also known as web harvesting)
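A minimal sketch of the parsing half of web scraping, using Python's standard `html.parser` on a static, made-up HTML fragment. A real scraper would first download the page (for example with `urllib.request.urlopen`); that step is omitted here so the example runs offline. The `TableCellParser` class is a hypothetical name.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of <td> cells, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


# A static fragment standing in for a downloaded page (made-up values)
page = """
<table>
  <tr><th>City</th><th>Rainfall (mm)</th></tr>
  <tr><td>City A</td><td>12</td></tr>
  <tr><td>City B</td><td>8</td></tr>
</table>
"""

parser = TableCellParser()
parser.feed(page)
parser.close()

# The header row uses <th>, so it yields an empty list; keep only data rows
data_rows = [row for row in parser.rows if row]
print(data_rows)  # [['City A', '12'], ['City B', '8']]
```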
Data Acquisition cont.
Curated by Others
Issued to you
Download from the Web
System report or export
Third-party services
API
Data Examination
Data Properties
Data types
Size
Condition
Missing values
Erroneous values
Inconsistencies
Duplicate records
Out of date
Uncommon system characters or line breaks
Leading or trailing spaces
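A Python sketch of checking a few of these condition problems on hypothetical records: missing values, erroneous values, leading or trailing spaces and duplicate records. The rule that a negative rainfall is erroneous, and the `examine` function itself, are assumptions made for illustration.

```python
# Hypothetical records with some typical condition problems
records = [
    {"id": 1, "city": "City A", "rainfall_mm": 12.0},
    {"id": 2, "city": " City B", "rainfall_mm": None},   # leading space, missing value
    {"id": 3, "city": "City C", "rainfall_mm": -4.0},    # erroneous (negative) value
    {"id": 1, "city": "City A", "rainfall_mm": 12.0},    # duplicate record
]


def examine(rows):
    """Return the row indexes that fail each simple condition check."""
    report = {"missing": [], "erroneous": [], "padded": [], "duplicates": []}
    seen = set()
    for i, row in enumerate(rows):
        if any(value is None for value in row.values()):
            report["missing"].append(i)
        rain = row["rainfall_mm"]
        if rain is not None and rain < 0:   # assumed rule: rainfall cannot be negative
            report["erroneous"].append(i)
        if any(isinstance(v, str) and v != v.strip() for v in row.values()):
            report["padded"].append(i)
        key = tuple(sorted(row.items()))    # hashable fingerprint of the whole record
        if key in seen:
            report["duplicates"].append(i)
        seen.add(key)
    return report


report = examine(records)
print(report)
```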
Data Examination cont.
How to Approach This?
Inspect and scan
Data operations
Statistical methods
Frequency counts
Frequency distribution
Measures of central tendency
Measures of spread
Maximum, minimum and range
Percentiles
Standard deviation
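The statistical methods listed above map directly onto Python's standard `statistics` and `collections` modules. This sketch applies them to a hypothetical list of participant ages; `statistics.quantiles` uses its default ‘exclusive’ method here, and other tools may compute percentiles slightly differently.

```python
import statistics
from collections import Counter

# Hypothetical ages of survey participants (ratio data)
ages = [23, 35, 35, 41, 29, 52, 35, 47, 29, 60]

# Frequency counts: how many times each value occurs
counts = Counter(ages)

# Central tendency
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
mode_age = statistics.mode(ages)

# Spread: maximum, minimum and range
oldest, youngest = max(ages), min(ages)
age_range = oldest - youngest

# Percentiles: the quartile cut points (25th, 50th, 75th)
quartiles = statistics.quantiles(ages, n=4)

# Sample standard deviation
sd_age = statistics.stdev(ages)

print(mean_age, median_age, mode_age, age_range, quartiles, round(sd_age, 2))
```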
Influence on Process
Moving forward
‘Tone’ on the purpose map
Editorial angles
Physical properties influence scale
Data Transformation
Potential Activities
Transform to clean
Transform to convert
Transform to create
Transform to consolidate
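A compact Python sketch of the four transformation activities applied to hypothetical raw rows: cleaning padded text, converting text to dates and inches to millimetres, creating a derived ‘wet day’ flag, and consolidating with a lookup table. The 1 mm wet-day threshold and the region lookup are assumptions for illustration only.

```python
from datetime import datetime

MM_PER_INCH = 25.4   # exact conversion factor
WET_DAY_MM = 1.0     # hypothetical threshold for the derived 'wet day' flag

# Hypothetical raw rows: (city, date as text, rainfall in inches as text)
raw_rows = [
    ("  City A ", "12/03/2016", "0.02"),
    ("City B", "12/03/2016", "1.0"),
]

# Hypothetical lookup table used to consolidate extra attributes
regions = {"City A": "Region 1", "City B": "Region 2"}

transformed = []
for city, date_text, rain_text in raw_rows:
    city = city.strip()                                      # clean: trim padding
    date = datetime.strptime(date_text, "%d/%m/%Y").date()   # convert: text to date
    rain_mm = float(rain_text) * MM_PER_INCH                 # convert: inches to mm
    transformed.append({
        "city": city,
        "date": date,
        "rainfall_mm": round(rain_mm, 2),
        "wet_day": rain_mm >= WET_DAY_MM,                    # create: derived variable
        "region": regions.get(city, "Unknown"),              # consolidate: join lookup
    })

for row in transformed:
    print(row)
```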
Data Exploration
Exploratory Data Analysis
Instinct of the analyst
Reasoning
Deductive
Inductive
Chart types
Research
Statistical methods
Finding ‘nothing’
Not always needed
Questions?