# Turning Data Into Information

**Having framed the problem/opportunity, formulated a testable hypothesis, and gathered and organized key data, you are ready to continue your analysis by developing a data story that can be shared with others. To get started, download and review the “Types of Data Analysis” guide from our Week 4 readings, above.**

**Apply one or more of the following analytical tools to your dataset:**

- **Correlation**
- **Regression**
- **Grouping and Visualization**
- **Variance**
- **Standard Deviation**

**Explain whether your analysis of the data confirmed or refuted your testable hypothesis.**

JWI 599: Business Analytics and Capstone

Types of Data Analysis

© Strayer University. All Rights Reserved. This document contains Strayer University confidential and proprietary information and may not be copied, further distributed, or otherwise disclosed, in whole or in part, without the expressed written permission of Strayer University. This document is subject to change based on the needs of the class.

JWMI 599 – Types of Data Analysis (1188) Page 1 of 4

Reference Sheet: Types of Data Analysis

Data analysts use many different types of analysis when looking for patterns, correlations, and causations in data sets. Each type of analysis serves a different purpose. It is therefore important to select the most useful option(s) based on your organization; the issue, problem, or opportunity; and the particular data sets that you have collected. This reference sheet is intended as a resource to help you decide which type(s) of data analysis are most useful to apply to your work as you prepare for your Capstone project.


Grouping and Visualizing

Definition: Grouping and visualizing means organizing quantitative data and metrics into a limited set of clearly defined variables. Those variables are then matched with the type of graphic illustration and visualization medium that most effectively communicates the analytical “story” to the targeted stakeholders.

Purpose: The purpose of grouping values in a selected data set is to create categories for analyses based on the defined analytical problem or opportunity.

Recommended use: Well-designed data visualizations communicate quantitative data and metrics to stakeholders in a more “user-friendly” way than tables of raw numbers.

How To Steps:

1. Group the raw data into categories.
2. Identify and define 2 or 3 variables you want to measure.
3. Create a visual illustration to show your selected categories (e.g., bar chart, histogram, line graph, or pie chart).
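The three steps above can be sketched in Python. This is a minimal illustration and is not part of the original guide. The region and sales figures are hypothetical. The two variables are assumed to be region (the category) and sales (the metric). The helper names `group_totals` and `text_bar_chart` were made up for this sketch. A text bar chart keeps the example free of dependencies; in practice you would more likely draw the chart in a spreadsheet or with a plotting library.

```python
from collections import defaultdict

# Hypothetical raw data: (region, monthly sales) records, for illustration only.
raw_sales = [
    ("North", 120), ("South", 80), ("North", 95),
    ("East", 60), ("South", 110), ("East", 75), ("North", 40),
]


def group_totals(records):
    """Step 1: group raw (category, value) records and total each category."""
    totals = defaultdict(int)
    for category, value in records:
        totals[category] += value
    return dict(totals)


def text_bar_chart(totals, scale=10):
    """Step 3: render a simple text bar chart, one '#' per `scale` units."""
    lines = []
    # Largest category first so the ranking is obvious.
    for category, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        lines.append(f"{category:<6} {'#' * (total // scale)} {total}")
    return "\n".join(lines)


totals = group_totals(raw_sales)
print(text_bar_chart(totals))
```

Sorting the bars from largest to smallest is a design choice: it lets stakeholders read the ranking of categories at a glance.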

Cluster Analysis

Definition:

Cluster analysis is the process of grouping a set of data so that the data in each cluster or group are more similar to each other than to the data in other clusters or groups.

Purpose:

Cluster analysis is a simple exploratory statistical procedure. It sorts observations into a smaller number of homogeneous, more meaningful groups for analysis.

Recommended use:

Cluster analysis is used to identify groups within a database that were not previously known.

How To Steps:

1. Identify and select a database.
2. Identify the number of clusters in advance.
3. Select a cluster analysis methodology (K-Means Cluster Analysis, Hierarchical Cluster Analysis, or Two-Step Cluster Analysis) to group each observation in the selected database.
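Of the three methods listed, K-means is the easiest to sketch. The code below is a minimal illustration under stated assumptions, not a production implementation:

- The customer data (annual visits, average spend) is hypothetical.
- The number of clusters `k` is fixed in advance, as step 2 requires.
- The centroids are simply seeded with the first `k` points. Library implementations use smarter seeding and multiple restarts.

```python
import math

# Hypothetical customers: (annual visits, average spend in $100s).
customers = [(1, 2), (1.5, 1.8), (1, 0.6), (8, 8), (9, 11), (8, 9)]


def kmeans(points, k, iterations=100):
    """Minimal K-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, until nothing changes."""
    # Assumption: seed the centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                # Mean of each coordinate across the cluster's members.
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments are stable: converged
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = kmeans(customers, k=2)  # step 2: k chosen in advance
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

The result depends on the starting centroids. Before drawing conclusions, it is worth rerunning with different starting points or using an established implementation (for example, scikit-learn's `KMeans`). Hierarchical and two-step clustering work differently and are not shown here.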

Chi-Square

Definition:

A chi-square test of independence is a statistical test that determines whether there is a significant association between two categorical variables.

Purpose: The purpose of a chi-square or “goodness of fit” test is to determine whether there is a significant difference between the observed values and the expected values.

Recommended use:

A chi-square statistical test is applied when you have two categorical variables from a single population.

How To Steps: The formula for the chi-square statistic is:

$$\chi^2_c = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where:

1. $c$ is the number of degrees of freedom
2. $O_i$ represents the observed value
3. $E_i$ represents the expected value
4. “DF” stands for degrees of freedom
5. “N” is the number of observations

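NOTE: “Degrees of freedom” represent how many independent values involved in an analytical calculation are free to vary. For a single sample of N observations, DF = N-1. For a chi-square test of independence, DF = (number of rows - 1) × (number of columns - 1) of the contingency table.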
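The function below is a worked sketch of the chi-square formula for a test of independence. It derives each expected count from the table's totals as row total × column total ÷ grand total. The contingency table is hypothetical, and the function name is illustrative. The critical value 3.841 is the standard chi-square table value for 1 degree of freedom at the 0.05 significance level.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical survey: rows = customer segment (A, B), columns = renewed (yes, no).
observed = [[20, 30],
            [30, 20]]
chi2, dof = chi_square_independence(observed)
print(chi2, dof)  # 4.0 1

CRITICAL_VALUE = 3.841  # chi-square table value for df = 1 at the 0.05 level
if chi2 > CRITICAL_VALUE:
    print("Reject independence: segment and renewal appear to be associated.")
else:
    print("No significant association detected.")
```

You can also use SciPy's `scipy.stats.chi2_contingency`. By default it applies Yates' continuity correction to 2 × 2 tables, so its statistic will differ from this uncorrected value unless you pass `correction=False`.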

Measures of Central Tendency

Definition:

A simple mathematical technique used to identify the location of the center of a quantitative distribution.

Purpose:

- MEAN: the arithmetic average of a distribution.
- MEDIAN: the middle value of a distribution.
- MODE: the most frequent value in a distribution.

Recommended use:

- For NOMINAL data, use the MODE.
- For ORDINAL data, use the MEDIAN.
- For INTERVAL/RATIO data that is not skewed, use the MEAN.
- For INTERVAL/RATIO data that is skewed, use the MEDIAN.

How To Steps:

1. Arrange your data set from smallest to largest values.
2. Determine which measure of central tendency to use in the analysis.
3. Calculate the selected measure of central tendency (mean, median, or mode).
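The decision rule and steps above could be sketched with Python's standard `statistics` module. The `level` labels and the `skewed` flag are hypothetical inputs for this sketch; deciding whether data is skewed is left to the analyst. For ordinal data, the sketch uses `median_low`. That function always returns an actual observed value rather than averaging the two middle ranks.

```python
import statistics


def central_tendency(values, level, skewed=False):
    """Return (measure, value) using the guide's rule of thumb.

    `level` is a hypothetical label: "nominal", "ordinal", or "interval/ratio".
    `skewed` only matters for interval/ratio data.
    """
    if level == "nominal":
        return "mode", statistics.mode(values)
    ordered = sorted(values)  # step 1: arrange from smallest to largest
    if level == "ordinal":
        # median_low returns an observed value instead of averaging two ranks
        return "median", statistics.median_low(ordered)
    if skewed:
        return "median", statistics.median(ordered)
    return "mean", statistics.mean(ordered)


order_sizes = [7, 3, 10, 2, 3, 5]  # hypothetical interval/ratio data
print(central_tendency(order_sizes, "interval/ratio"))
print(central_tendency(order_sizes, "interval/ratio", skewed=True))
print(central_tendency(["card", "cash", "card"], "nominal"))
```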

Ranges

Definition: The range is a descriptive statistic that measures the difference between the lowest and highest values in a data set.

Purpose: The purpose of the range is to show how well the measure of central tendency represents the values in a selected data set.

Recommended use: Ranges are used to show how spread out the values are in a selected data set.

How To Steps:

1. List the elements of the data set.
2. Identify the highest and lowest numbers in the data set.
3. Subtract the smallest number in the data set from the largest number in the data set.
4. Label the range.
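Here is a minimal sketch of the four steps, using hypothetical delivery times. `data_range` is an illustrative name, not a library function.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    lowest, highest = min(values), max(values)  # step 2
    return highest - lowest                     # step 3


delivery_days = [4, 9, 2, 7, 5]  # step 1: hypothetical data set
print(f"Range: {data_range(delivery_days)} days")  # step 4: label the range
```

The range depends only on the two extreme values, so a single outlier can change it dramatically. This is why the variance and standard deviation are often reported alongside it.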

Variance

Definition: Statistical variance measures how far the data values are spread out from the mean (or expected value).

Purpose: The variance is used to measure the spread of probability distributions. For example, the variance can help determine the risk an investor might take on when purchasing a specific security in the market.

Recommended use: The range only looks at the extremes. The variance, by contrast, looks at all the data points or observations and measures how widely they are distributed around the mean.

How To Steps:

1. Select a data set and calculate the MEAN.
2. For each number or observation in the data set, subtract the MEAN and square the result (the squared differences).
3. Calculate the average of the squared differences.
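The three steps can be sketched directly in Python and cross-checked against the standard library. The data is hypothetical. These steps divide by N, which gives the population variance (`statistics.pvariance`). The sample variance divides by n - 1 instead (`statistics.variance`).

```python
import statistics


def population_variance(values):
    """Variance following the three steps above (divides by N)."""
    mean = sum(values) / len(values)                   # step 1
    squared_diffs = [(x - mean) ** 2 for x in values]  # step 2
    return sum(squared_diffs) / len(squared_diffs)     # step 3


daily_returns = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(population_variance(daily_returns))   # 4.0
print(statistics.pvariance(daily_returns))  # standard-library cross-check
```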

Standard Deviation

Definition: A standard deviation (SD) is a statistical measure that is used to quantify the amount of variation or dispersion of a set of data values.

Purpose: A standard deviation assesses how far the values are spread above or below the mean of a selected population or sample data set.

Recommended use: A high standard deviation shows that the data are widely spread, so the mean is a less reliable summary. A low standard deviation shows that the data are clustered closely around the mean, so the mean is a more reliable summary.

How To Steps:

1. Select a sample data set.
2. Calculate the mean of the sample.
3. Subtract the mean from each data value.
4. Square each result.
5. Find the sum of the squared values.
6. Divide by n-1, where n is the number of data points (this gives the sample variance).
7. Take the square root of the result to obtain the standard deviation.
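The sketch below follows the standard deviation steps using hypothetical sample data and checks the result against `statistics.stdev`. Dividing by n - 1 yields the sample variance. The final square root then converts that variance into a standard deviation, expressed in the original units.

```python
import math
import statistics


def sample_std_dev(values):
    """Sample standard deviation: sum of squared deviations / (n - 1), then square root."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(values) / n                        # step 2
    squared = [(x - mean) ** 2 for x in values]   # steps 3 and 4
    sample_variance = sum(squared) / (n - 1)      # steps 5 and 6
    return math.sqrt(sample_variance)             # square root gives the SD


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # step 1: hypothetical sample
print(round(sample_std_dev(sample), 4))    # 2.1381
print(round(statistics.stdev(sample), 4))  # standard-library cross-check
```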

Confidence Intervals

Definition:

A statistical method that uses sample data to estimate a range of values that is likely to contain a population measurement, such as the population mean.

Purpose:

Confidence intervals are an easy way to express the amount of uncertainty in a sample estimate of a population value.

Recommended use:

Confidence intervals are used to draw inferences on population values from one or more samples.

How To Steps:

To calculate a confidence interval from a sample mean, first choose a confidence level, typically 95% or greater. The confidence level describes the reliability of the sampling method. If the same sampling method were repeated many times, about 95% (or more) of the intervals constructed this way would contain the true population value, and 5% (or less) would not. The interval itself is the sample mean plus or minus a margin of error. The margin of error is the critical value for the chosen confidence level multiplied by the standard error of the mean.
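Here is a minimal sketch of a 95% confidence interval for a mean. It assumes a normal approximation, taking the z critical value from `statistics.NormalDist`. That approximation is reasonable for larger samples; for small samples, a t critical value is usually more appropriate. The satisfaction scores are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return mean - margin, mean + margin


# Hypothetical customer-satisfaction scores from one sample.
scores = [72, 85, 90, 66, 78, 88, 95, 70, 81, 77]
low, high = mean_confidence_interval(scores)
print(f"95% CI for the mean score: {low:.1f} to {high:.1f}")
```

Raising the confidence level (for example, to 0.99) widens the interval. That is the trade-off between confidence and precision.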

Correlation

Definition: A statistical measure that indicates either a positive or negative relationship between two or more variables.

Purpose: The purpose of correlation in analytics is to determine which variables are connected.

Recommended use: Correlations test the strength of the relationship between variables.

How To Steps:

1. If there is no association between the selected variables tested, then there is no causal connection between these variables.
2. If there is an association between the selected variables tested, then report a correlation coefficient (r) in a statistical table to represent the estimated strength of the linear relationship between these variables.
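The sketch below computes Pearson's correlation coefficient r, the most common measure of the strength of a linear relationship. The advertising and sales figures are hypothetical. On Python 3.10 or later, `statistics.correlation` computes the same coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


ad_spend = [1, 2, 3, 4, 5]      # hypothetical, $1,000s per week
weekly_sales = [3, 5, 4, 6, 8]  # hypothetical, $10,000s per week
print(f"r = {pearson_r(ad_spend, weekly_sales):.2f}")
```

Values of r near +1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one. On its own, r does not say which variable drives the other.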

Regression

Definition: A statistical measure that attempts to determine the strength of a relationship between one dependent variable and one or more independent or control variable(s).

Purpose: Regression statistics estimate the value of the dependent variable when the values of the independent (predictor) variable(s) are known.

Recommended use: Regression statistics are used to examine the relationship between one dependent variable and either one independent variable (simple regression) or several independent variables (multiple regression).

How To Steps:

1. Does a set of predictor variables do a good job of predicting the outcome of a dependent variable?
2. Which variables in particular are significant predictors of the dependent variable?
3. In what way do the predictor variables impact the dependent variable?
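The sketch below fits a simple linear regression (one predictor) by ordinary least squares, using hypothetical advertising and sales data. R-squared speaks to the first question above: how well the predictor explains the outcome. The sign and size of the slope speak to the third. The second question, whether a predictor is significant, requires standard errors and p-values, which this sketch does not compute. On Python 3.10 or later, `statistics.linear_regression` returns the same slope and intercept.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("the predictor must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in y explained by the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    if ss_tot == 0:
        raise ValueError("the dependent variable must vary")
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


ad_spend = [1, 2, 3, 4, 5]      # hypothetical predictor (independent variable)
weekly_sales = [3, 5, 4, 6, 8]  # hypothetical outcome (dependent variable)
slope, intercept, r2 = simple_linear_regression(ad_spend, weekly_sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * ad_spend  (R^2 = {r2:.2f})")
print(f"Predicted sales at ad_spend = 6: {intercept + slope * 6:.2f}")
```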