
Exploratory Data Analysis (EDA)

9. Case Studies: Introduction

In this diverse collection of case studies, the power of Exploratory Data Analysis (EDA) shines as a critical tool for understanding and extracting insights from datasets across different domains. Each case study focuses on a specific problem domain, ranging from e-commerce customer behavior analysis to predictive maintenance in manufacturing. The primary goal of these analyses is to leverage EDA techniques to unravel hidden patterns, relationships, and trends within the data, leading to data-driven decisions and optimized strategies.

Throughout these case studies, diverse datasets play a pivotal role in providing a deep understanding of the subject matter. These datasets encompass online retail transaction records, electronic health records, credit card transaction data, environmental sensor readings, marketing campaign metrics, social media sentiments, GPS data from vehicles, student academic records, agricultural data, and manufacturing equipment sensor readings. Armed with these diverse datasets, analysts embark on an EDA journey, employing tools like Python, R, Pandas, Matplotlib, Seaborn, Plotly, Geopandas, and more.

The EDA process unfolds through several key steps, including data cleaning and preprocessing to ensure data quality, exploration of variables and patterns, and compelling visualizations that bring insights to life. The results of EDA reveal essential facets of each domain, such as best-selling products, customer segmentation, healthcare outcomes, fraudulent transactions, pollution hotspots, successful marketing campaigns, sentiment analysis, optimized transportation routes, student academic performance factors, and predictive equipment maintenance.

These case studies illustrate the indispensable role of Exploratory Data Analysis in empowering decision-makers across industries. By unlocking the valuable insights buried within vast datasets, EDA empowers businesses and organizations to optimize their strategies, enhance customer experiences, improve healthcare quality, prevent fraud, protect the environment, target marketing efforts, and optimize logistics. As a foundational step in the data analysis journey, EDA serves as a powerful bridge between raw data and actionable knowledge, opening up a world of possibilities for data-driven innovation and problem-solving.

Case Studies

E-Commerce Customer Behavior Analysis:

Description: This case study aims to understand customer behavior in an online retail business to improve marketing and product strategies.

Dataset: Online retail dataset containing transactional records, customer IDs, product details, timestamps, and order quantities.

Tools: Python with Pandas for data manipulation, Matplotlib and Seaborn for data visualization.

Steps using EDA:

Data cleaning and preprocessing to handle missing values and remove duplicates.

Exploring product popularity, customer purchase patterns, and customer segmentation.

Visualizing purchase trends, seasonal patterns, and revenue growth.

Results: Identifying best-selling products, peak shopping hours, customer segments, and trends in revenue growth.
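As a sketch of these steps, the cleaning and aggregation can be done with Pandas. The toy records and column names below are illustrative assumptions, not the actual dataset:

```python
import pandas as pd

# Toy transaction records standing in for an online retail dataset;
# the column names are illustrative assumptions.
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3, 3],
    "product": ["mug", "mug", "pen", "mug", "pen", "pen", "book"],
    "quantity": [2, 1, 5, 1, 3, 2, 1],
    "price": [4.0, 4.0, 1.5, 4.0, 1.5, 1.5, 10.0],
})

# Cleaning: drop exact duplicate rows and rows with missing values.
df = df.drop_duplicates().dropna()

# Product popularity: total units sold per product.
units_by_product = df.groupby("product")["quantity"].sum().sort_values(ascending=False)

# Revenue per product, a stand-in for the revenue-growth visualizations.
df["revenue"] = df["quantity"] * df["price"]
revenue_by_product = df.groupby("product")["revenue"].sum()

print(units_by_product.idxmax())  # best-selling product by units
```

The same groupby pattern extends to hourly timestamps (peak shopping hours) and to customer IDs (a first step toward segmentation).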

Healthcare Patient Outcomes Analysis:

Description: This case study focuses on analyzing patient outcomes based on electronic health records (EHR) to improve healthcare quality.

Dataset: Electronic health records (EHR) with patient demographics, medical history, diagnoses, treatments, and patient outcomes.

Tools: R with dplyr and ggplot2 for data wrangling and visualization.

Steps using EDA:

Data preprocessing and cleaning to handle missing values and outliers.

Exploring patient demographics, disease prevalence, and treatment efficacy.

Visualizing readmission rates, mortality rates, and correlations between variables.

Results: Identifying factors influencing patient outcomes, trends in readmission rates, and potential areas for healthcare improvement.

Financial Fraud Detection:

Description: This case study aims to detect fraudulent transactions in credit card data to enhance fraud prevention systems.

Dataset: Credit card transaction data with details such as transaction amounts, locations, timestamps, and customer IDs.

Tools: Python with Pandas for data preprocessing, Matplotlib and Seaborn for visualization, and machine learning algorithms for fraud detection.

Steps using EDA:

Data cleaning and preprocessing to handle imbalanced classes and outliers.

Exploring transaction patterns, correlations, and the frequency of fraud cases.

Visualizing transaction amounts, fraudulent vs. non-fraudulent transactions, and identifying potential fraud hotspots.

Results: Identifying unusual spending patterns, high-risk transactions, and improving fraud detection accuracy.
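A minimal sketch of the class-imbalance and outlier checks, using made-up transactions (real fraud rates are far below the 20% shown here):

```python
import pandas as pd

# Toy labelled transactions; in real credit card data the fraud class
# is far rarer than in this invented sample.
tx = pd.DataFrame({
    "amount": [12.5, 9.9, 250.0, 8.0, 15.0, 4300.0, 11.0, 7.5, 13.2, 9900.0],
    "is_fraud": [0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
})

# Class balance: the first thing to check before any modelling.
fraud_rate = tx["is_fraud"].mean()

# Compare amount distributions between the two classes.
mean_by_class = tx.groupby("is_fraud")["amount"].mean()

# Crude outlier flag: amounts above mean + 2 standard deviations.
threshold = tx["amount"].mean() + 2 * tx["amount"].std()
outliers = tx[tx["amount"] > threshold]
```

A strong class imbalance like this is why fraud models are usually evaluated on precision and recall rather than raw accuracy.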

Environmental Sensor Data Analysis:

Description: This case study involves analyzing environmental sensor data to understand air quality trends and pollution sources.

Dataset: Air quality sensor data with measurements of pollutants like CO2, PM2.5, and ozone at various locations and timestamps.

Tools: Python with Pandas for data cleaning, Plotly for interactive visualizations, and geographical libraries for mapping.

Steps using EDA:

Data preprocessing to handle missing values and outliers in sensor readings.

Exploring pollutant levels, spatial distributions, and temporal trends.

Visualizing pollution hotspots and correlations between pollutants.

Results: Identifying areas with poor air quality, trends in pollutant levels, and potential pollution sources.
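A hedged sketch of the hotspot step with Pandas; the stations and readings are invented, and the 15 µg/m³ cut-off is only an illustrative threshold:

```python
import pandas as pd

# Toy PM2.5 readings in µg/m³ (stations and values are invented).
readings = pd.DataFrame({
    "station": ["north", "north", "centre", "centre", "south", "south"],
    "pm25": [12.0, 18.0, 55.0, 61.0, 20.0, 24.0],
})

# Spatial summary: mean PM2.5 per station.
mean_pm25 = readings.groupby("station")["pm25"].mean()

# Flag hotspots above an illustrative 15 µg/m³ cut-off.
hotspots = mean_pm25[mean_pm25 > 15].index.tolist()
print(hotspots)
```

In a full analysis the flagged stations would then be mapped with Plotly or Geopandas rather than printed.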

Marketing Campaign Performance Analysis:

Description: This case study involves analyzing the performance of marketing campaigns to optimize marketing strategies.

Dataset: Marketing campaign data with details of campaigns, customer responses, conversions, and costs.

Tools: R with tidyverse for data manipulation, ggplot2 for visualization, and A/B testing tools for campaign performance analysis.

Steps using EDA:

Data cleaning and preprocessing to handle missing data and inconsistencies.

Exploring campaign performance metrics, customer response rates, and conversion rates.

Visualizing campaign effectiveness, customer segmentation, and A/B test results.

Results: Identifying successful marketing campaigns, high-converting strategies, and customer segments with the best response rates.

Social Media Sentiment Analysis:

Description: This case study aims to analyze social media data to gauge public sentiment about products, brands, or events.

Dataset: Twitter or Facebook data with text posts, timestamps, and user engagement metrics.

Tools: Python with TextBlob or NLTK for sentiment analysis, WordCloud for word visualization, and Matplotlib for plotting.

Steps using EDA:

Text preprocessing to handle stopwords and special characters and to convert text to lowercase.

Analyzing sentiment scores, word frequencies, and trending topics.

Visualizing word clouds to highlight positive and negative sentiment words.

Results: Identifying overall sentiment towards products or brands, popular topics, and public perception trends.
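The preprocessing and scoring steps can be illustrated without any NLP library; the tiny lexicon and stopword list below are invented stand-ins for what TextBlob or NLTK provide out of the box:

```python
import re

# Tiny sentiment lexicon and stopword list; both are illustrative
# stand-ins for a real library's resources.
LEXICON = {"love": 1, "great": 1, "good": 1, "bad": -1, "terrible": -1, "hate": -1}
STOPWORDS = {"the", "is", "a", "this", "i"}

def sentiment_score(post: str) -> int:
    # Preprocessing: lowercase, strip non-letters, drop stopwords.
    tokens = re.findall(r"[a-z]+", post.lower())
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Score: sum of lexicon values for the words we recognise.
    return sum(LEXICON.get(t, 0) for t in tokens)

posts = ["I love this great product!", "This is terrible, I hate it.", "The box is blue."]
scores = [sentiment_score(p) for p in posts]
print(scores)  # positive, negative, neutral
```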

Transportation and Logistics Optimization:

Description: This case study involves optimizing transportation and logistics operations to improve efficiency and reduce costs.

Dataset: GPS data from vehicles, delivery records, traffic information, and location details.

Tools: Python with Geopandas for geospatial analysis, NetworkX for route optimization, and visualization libraries for maps.

Steps using EDA:

Data preprocessing to handle GPS data, normalize timestamps, and clean location data.

Exploring traffic patterns, congestion points, and delivery routes.

Visualizing optimized routes and delivery efficiency.

Results: Identifying bottlenecks, optimizing delivery schedules, and reducing transportation costs.

Education Performance Analysis:

Description: This case study focuses on analyzing student performance data to understand factors influencing academic outcomes.

Dataset: Student academic records with grades, attendance, test scores, and demographics.

Tools: R with tidyr and dplyr for data tidying, ggplot2 for visualizations, and machine learning models for performance prediction.

Steps using EDA:

Data cleaning and preprocessing to handle missing grades and attendance records.

Exploring student demographics, grade distributions, and attendance patterns.

Visualizing performance trends and correlations between variables, and predicting academic performance.

Results: Identifying factors affecting student academic performance, predicting at-risk students, and designing targeted interventions.

Agricultural Yield Prediction:

Description: This case study aims to predict crop yields based on agricultural data to optimize planting strategies.

Dataset: Agricultural data with historical weather data, soil characteristics, crop details, and yields.

Tools: Python with NumPy and Pandas for data manipulation, Scikit-learn for regression models, and visualization libraries for plotting.

Steps using EDA:

Data preprocessing to handle missing weather data and crop details.

Exploring weather patterns, correlations between weather variables, and crop yields.

Visualizing yield predictions and comparing them with actual yields.

Results: Identifying the correlation between weather patterns and crop yields, optimizing planting schedules, and predicting future harvest outcomes.
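A minimal regression sketch with NumPy; the case study suggests Scikit-learn, but `np.polyfit` keeps the example self-contained, and all values are invented:

```python
import numpy as np

# Toy history: seasonal rainfall (mm) vs crop yield (t/ha); all values invented.
rainfall = np.array([300.0, 450.0, 500.0, 620.0, 700.0])
yields = np.array([2.1, 2.9, 3.2, 3.9, 4.4])

# Baseline linear fit: yield = a * rainfall + b.
a, b = np.polyfit(rainfall, yields, deg=1)

# Predict the yield for a season with 550 mm of rain.
predicted = a * 550.0 + b
```

The same fit generalizes to multiple weather and soil features with `sklearn.linear_model.LinearRegression`.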

Predictive Maintenance in Manufacturing:

Description: This case study focuses on predictive maintenance in manufacturing industries to reduce downtime and improve productivity.

Dataset: Sensor data from manufacturing equipment, including temperature, vibration, and other performance indicators.

Tools: Python with Pandas for data preprocessing, Plotly for visualization, and machine learning algorithms for predictive maintenance.

Steps using EDA:

Data cleaning and preprocessing to handle missing sensor readings and outliers.

Exploring sensor data patterns, correlations between sensor variables, and anomalies.

Visualizing maintenance predictions and comparing them with actual breakdowns.

Results: Identifying early signs of equipment failure, scheduling maintenance proactively, and minimizing unplanned downtime.
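One simple EDA-style anomaly check for such sensor data is a rolling z-score; the readings below are invented, and the threshold of 2 is an illustrative choice:

```python
import pandas as pd

# Toy vibration readings from one machine; the spike at the end mimics
# an early failure signature (all values are invented).
vib = pd.Series([1.0, 1.1, 0.9, 1.0, 1.2, 1.1, 1.0, 0.9, 1.1, 3.5])

# Rolling z-score against the PREVIOUS window, so the spike itself
# does not inflate the baseline it is compared to.
window = 5
baseline_mean = vib.shift(1).rolling(window).mean()
baseline_std = vib.shift(1).rolling(window).std()
zscore = (vib - baseline_mean) / baseline_std

# Flag readings more than 2 rolling standard deviations above baseline.
anomalies = vib[zscore > 2].index.tolist()
print(anomalies)  # the spike at index 9 is flagged
```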

In each case study, the Exploratory Data Analysis (EDA) process plays a crucial role in uncovering insights, trends, and relationships within the data. By using various data cleaning, exploration, and visualization techniques, analysts can gain valuable insights to make data-driven decisions and optimize processes in different domains. The results obtained through EDA inform subsequent analyses, help refine strategies, and lead to improvements in various aspects of the business or domain being studied.


Exploratory Data Analysis – Retail


Performing exploratory data analysis on the “Superstore” dataset, a task provided by The Sparks Foundation for its Graduate Rotational Internship Program.

Using the dataset provided, the goal is to find the weak areas where the company is underperforming, so that the business manager can focus on those areas to make more profit in the future, and to draw the best recommendations from the analysis.

For this analysis, we will answer the following questions:

Which states are selling lower quantities and are less profitable, and which states are doing well?

What insights can we find by analyzing “Quantity” and “Profit” by “Region” and “Category”?

Do “Ship Mode” and “Discount” have an impact on the profit and the quantity sold?

After analyzing the data to answer the above questions, what recommendations can we draw to help the less profitable categories and regions do better?

4. Exploratory Analysis

We will begin the analysis process by summarizing our data, which includes calculating the average and sum of sales, quantity, and profit, grouped by region and category.

First, we plot a graph to analyze profit by region and category.

Observation

→ In all categories, the West region is performing well.

→ In the Furniture category, we can see a loss in the Central region.

→ Among the three categories, Office Supplies is the most profitable, followed by Technology.

→ Furniture is the least profitable category in all regions.

Now let’s see whether “Quantity” and “Profit” have a positive correlation.

As we can see from the above plot, there is no positive correlation between the quantity sold and the profit earned, so we can safely say that quantity sold is not what drives profitability across categories and regions.
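A correlation check like the one behind this plot takes a few lines of Pandas; the numbers below are made up for illustration and are not the Superstore data:

```python
import pandas as pd

# Toy order lines mirroring the "Quantity" and "Profit" columns used in
# the analysis; the numbers themselves are invented.
orders = pd.DataFrame({
    "Quantity": [2, 5, 3, 8, 1, 6, 4, 7],
    "Profit": [30.0, -12.0, 45.0, 5.0, 20.0, -8.0, 60.0, 2.0],
})

# Pearson correlation between quantity sold and profit earned; a value
# near zero or below it says quantity alone does not drive profit.
r = orders["Quantity"].corr(orders["Profit"])
```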

4.2 Percentage

→ For a better visualization, let’s plot a pie chart.

From the above pie chart we can see that:

→ 60% of total sales are in the Office Supplies category,

→ 32% of the total sales is in the West region.

Now let’s see which states, subcategories, and ship modes are more profitable.

From the above graph we can see that:

→ “Copiers” is the most profitable subcategory, followed by “Phones”,

→ Three subcategories, “Bookcases”, “Supplies”, and “Tables”, are noticeably lower in profit and are running at a loss.

Profitable States

The top 5 states by profitability are:

Shipping mode percentage and Profit

As we can see from the above chart, Standard Class is the most profitable shipping mode, followed by Second Class.

Shipping mode can be one of the factors in profitability, as we can see a trend in the above visualization.

Discount impact on average profit

It shows that the higher the discount, the lower the profit, so discounting can be another factor behind losses and poor profitability.
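The discount analysis boils down to a groupby on the discount level; the toy numbers below only mimic the downward trend described, they are not the Superstore figures:

```python
import pandas as pd

# Toy order lines: profit at different discount levels (values invented,
# shaped to mimic the downward trend described in the text).
orders = pd.DataFrame({
    "Discount": [0.0, 0.0, 0.1, 0.1, 0.2, 0.2, 0.4, 0.4],
    "Profit": [50.0, 42.0, 30.0, 28.0, 12.0, 9.0, -15.0, -22.0],
})

# Average profit at each discount level; in the Superstore data this
# curve slopes downward, turning negative at heavy discounts.
avg_profit = orders.groupby("Discount")["Profit"].mean()
```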

To conclude our analysis:

The West region shows good performance in sales and profit, accounting for 32% of total sales; on the other hand, the Central region is the worst performing.

Office Supplies is the most profitable category, accounting for 60% of total sales, followed by Technology.

Furniture is the least profitable category in all regions.

Quantity and profit do not have a positive correlation.

“Copiers” is the most profitable subcategory, followed by “Phones”.

There is a loss in the “Bookcases”, “Supplies”, and “Tables” subcategories.

The top five performing states are California, Michigan, New York, Virginia, and Washington.

Recommendation

1. For the states and subcategories that are already profitable, we need to focus more on promoting sales and distribution of the products.

2. For the items that incur losses, we need to remove discounts and consider which shipping mode would be more cost effective based on the region the item is shipped to.

3. For the majority of the more profitable categories, the Standard Class ship mode is used, so it would be more ideal to charge an extra fee to customers who prefer another shipping mode.

4. Discounts are more appropriate when customers order in bulk and when they agree to use the most cost-effective shipping mode available.

Exploratory Data Analysis: A Comprehensive Guide to Make Insightful Decisions

Today, data has become ubiquitous, affecting every aspect of our lives. From social media to financial transactions to healthcare, the generated data is abundant, which has ignited a data-driven revolution in every industry. 

As data sets grew, the complexity of understanding them increased, and that’s where exploratory data analysis came in. The data analysis process has many stages, like cleansing, transforming, and analyzing the data to build models with relevant insights.

Exploratory Data Analysis is all about knowing your data in depth, including its significant correlations and repeating patterns, so you can select the proper analysis methods. In this article, you will learn about Exploratory Data Analysis, its key features, how to perform EDA, its types, the integration challenges, and how to overcome them.

What is Exploratory Data Analysis?

Understanding your data in depth before performing any analysis on it is essential. You need to know the patterns, the variables, and how those variables relate to each other, among other things. EDA (exploratory data analysis) is the process that examines, summarizes, visualizes, and understands your data to generate data-driven conclusions and further study of the information. 

Let’s look at some of the key attributes of EDA:

Summarizing the Data

EDA allows data summarization using descriptive statistics, which helps in understanding the central tendency, spread, and distribution of the values.

Data Visualization

By implementing EDA, you can utilize varied visualization techniques, such as histograms, line charts, scatter plots, etc., to explore patterns, relationships, or trends within the data set.

Data Cleansing

Data cleansing is crucial for driving meaningful insights and ensuring accuracy. EDA helps you clean your data by identifying errors, inconsistencies, duplicates, or missing values.

What are the Types of Exploratory Data Analysis?

Understanding exploratory data analysis techniques helps with efficient data analysis and meaningful decisions. Let’s look at three primary exploratory data analysis types.

1. Univariate Exploratory Data Analysis

It is a type of analysis in which you study one variable at a time. This helps you understand the characteristics of a particular variable without getting distracted by other factors. For example, focusing on product sales to know which product is doing better.

There are two types of Univariate Analysis: 

  • Graphical Methods: Graphical methods use visual tools to understand the characteristics of a single variable. These visual tools may include histograms, box plots, density plots, etc.
  • Non-Graphical Methods: In non-graphical methods, numerical measures are used to study the characteristics of a single variable. These include descriptive statistics such as the mean, median, mode, measures of dispersion, or percentiles.
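A minimal non-graphical univariate summary, using Python's standard library on invented monthly sales figures:

```python
import statistics

# One variable at a time: monthly sales of a single product (invented).
sales = [120, 135, 128, 150, 480, 131, 140, 125]

# Non-graphical univariate summaries.
mean = statistics.mean(sales)
median = statistics.median(sales)
spread = statistics.stdev(sales)

# The single outlier (480) drags the mean well above the median,
# one reason univariate EDA reports both.
print(mean, median)
```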

2. Bivariate Exploratory Data Analysis

Bivariate analysis focuses on studying a relationship between two variables. It examines how a change in one variable can impact another. For example, the relationship between customer age and product category can be analyzed to determine whether age influences buying preferences. There are multiple ways to examine this relationship, both graphically and non-graphically.

The graphical methods include scatter plots, line charts, and bubble charts to visualize the relationship between two numeric variables. Non-graphical methods examine the relationship between two variables through correlation, regression analysis, or a chi-square test.
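A minimal non-graphical bivariate check, computing Pearson's correlation with NumPy on invented values:

```python
import numpy as np

# Two variables: hours of study vs exam score (invented values).
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
scores = np.array([52.0, 55.0, 61.0, 64.0, 70.0, 74.0])

# Pearson correlation coefficient, the standard non-graphical measure
# of a linear bivariate relationship; 1.0 means perfectly linear.
r = np.corrcoef(hours, scores)[0, 1]
```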

3. Multivariate Exploratory Data Analysis

Multivariate analysis helps to analyze and understand the relationship between two or more variables simultaneously. It helps unveil more complex associations and patterns within the data. For example, it explores the relationship between a person's height, weight, and age.

There are two types of Multivariate Analysis:

  • Graphical Methods: These methods help analyze patterns or associations through visualization, such as scatter plots and heat maps.
  • Non-Graphical Methods: These statistical techniques are used to make predictions, test a hypothesis, or draw conclusions. They include methods such as multiple regression analysis, factor analysis, cluster analysis, etc.

How to Perform Exploratory Data Analysis?

Let’s look at the Exploratory data analysis steps:

1. Data Collection

Recognize the need for data, then collect the relevant data from various sources according to the purpose of your analysis.

2. Inspecting the Data Variables

As the information collected is vast, you must first identify the critical variables that will affect your outcome or impact your decision.

3. Data Cleansing

In this step, you need to clean your data by identifying errors, missing values, inconsistencies, or duplicate values.

4. Identifying Patterns and Correlations

Visualize your data sets using different data visualization tools to understand the patterns and relationships between variables.

5. Performing Descriptive Statistics

Calculate a statistical summary of your data using statistical methods or formulas.

6. Perform Advanced Analysis

Conduct an advanced analysis beyond descriptive statistics by defining objectives and specifying the questions you want answered. This allows you to gain deeper insights into your data and identify complex relationships.

7. Interpret Data

Interpret the results of your analysis and generate insights and conclusions based on the relationships, patterns, and trends.

8. Document and Report

Document the results of your analysis, including all the steps and techniques you have used. Mention your key findings and summarize the data in the report, which you can use to share insights with your stakeholders.

Exploratory Data Analysis Tools You Can Use

EDA can be conducted using different tools or software platforms. Let’s look at some of the popular tools used: 

Python

Python’s simplicity makes it a preferable choice for data analysis. Its rich library ecosystem includes libraries like NumPy, Pandas, and Scikit-learn for numerical tasks, data manipulation, and machine learning. These libraries provide versatile toolsets for EDA and visualization. With Python, you can effectively handle missing data and uncover valuable insights from large datasets.

R

R is a programming language designed for statistical computing and graphics. With R, you can handle and manipulate complex statistical datasets. It provides a rich set of tools for data cleaning, transformation, and analysis, allowing you to prepare data for further exploration. Additionally, R provides built-in statistical functionalities for performing detailed EDA and uncovering valuable insights from your data.

MATLAB

MATLAB stands for Matrix Laboratory. It is a high-level programming language and an interactive environment designed for numerical computation, data analysis, and visualization. MATLAB is widely used for problem solving and numerical simulation in mathematics, physics, finance, and engineering.

Jupyter Notebooks

Jupyter is an interactive computing environment that allows you to create and share documents containing texts, equations, live code, visualizations, and more. It supports multiple programming languages, which makes it a versatile data analysis and research tool. Jupyter Notebooks' flexible environment provides an interactive workspace for working on EDA projects.

What are the Challenges of EDA?

Before implementing EDA, you must know the challenges you might face ahead of time. Let’s look at some of them: 

Data Unification

One of the most significant hurdles is data unification. Combining data from multiple sources, such as APIs, cloud platforms, or databases, before analysis can be a complex task. The challenge lies in the differences in data format or structure across these sources. For instance, customer data from a CRM can be structured differently than website traffic data from an analytics application.

Data Quality

Performing EDA involves ensuring the quality of the data. Data is sourced from multiple places, and these data sets may have inconsistencies, missing values, or errors, which may lead to incorrect conclusions.

Data Security

When handling large amounts of sensitive data, you might encounter security issues, where unauthorized access or data breaches can become a crucial challenge.

Data Consistency

It is essential to reflect updates or modifications across all related data sets, which helps maintain data consistency and keeps your integration process synchronized.

Addressing these challenges might require a robust solution that combines best practices for smooth data integration.

Simplifying the EDA Process with Airbyte

EDA is a crucial step in any data science project. However, the initial stages of EDA can be bogged down by challenges related to data preparation and consolidation. Here, Airbyte simplifies these challenges and empowers you to focus on the crucial task—extracting insights from your data.

Here’s how Airbyte helps you streamline the EDA process:

Airbyte

Airbyte is a data integration and replication platform with an extensive library of 350+ pre-built connectors. The library covers a varied range of APIs, databases, and flat files, making the integration process smooth. Through the Connector Development Kit (CDK), you can also build customized connectors according to your specific needs.

For complex transformations, you can seamlessly integrate Airbyte with dbt (data build tool). This powerful combination lets you apply dbt’s functionalities, like data standardization, cleansing, and mapping, to the data you move into your target destination.

What’s more, Airbyte adheres to industry standards such as GDPR, SOC 2, and ISO certifications, providing data security and compliance.

Use Cases & Examples of Exploratory Data Analysis 

Exploratory data analysis is essential for understanding the trends and patterns among the data and using the information to derive insightful conclusions. Let’s look at exploratory data analysis examples and use cases.

EDA in Retail

EDA in retail can be performed to understand a particular product’s sales patterns. Retailers can use EDA tools to improve their sales by studying how different variables, such as price, discount, and demographics, contribute to increases or decreases in sales. They can also learn which products are doing best and in which regions the sales of a particular product are high.

EDA in Healthcare

EDA can be used to analyze clinical trial data and study the effectiveness of a particular drug or treatment. It helps you analyze patients' data, how they respond to a certain treatment, risk factors, etc. Insights from EDA can help healthcare professionals make more informed decisions about patients' care, treatment strategies, and resource allocation.

EDA isn’t just about examining data. It’s a robust technique that unlocks meaningful insights from even complex data structures. By employing EDA, you can gain a deeper understanding of your data, enabling you to identify areas for improvement and make strategic data-driven decisions for your business. 



Association Analysis – Retail Case Study Example (Part 4)

Edward Scissorhands - by Roopam

This is a continuation of the case study example of marketing analytics we have been discussing for the last few articles. You can find the previous parts at the following links (Part 1, Part 2, and Part 3). In the last part, we discussed exploratory data analysis (EDA: Part 3). In this article, we will talk about association analysis, a helpful technique to mine interesting patterns in customers’ transaction data. Association analysis can be used as a handy tool for extended exploratory data analysis. By the way, association analysis is also the core of market basket analysis and sequence analysis. Later in the article, we will use association analysis in our case study example to design effective offer catalogs for campaigns and to improve online store (website) design.

Scissorhands

I must have been 9 or 10 years old when we had our first craft lecture in school. Craft lectures are called SUPW in India; it’s an abbreviation for ‘Socially Useful Productive Work’. As part of the first lecture, each student was provided with an A4-sized sheet of colored paper and a pair of scissors. In that first lecture, excited kids with no direction discovered that they could cut a sheet in a virtually infinite number of ways. It was neither socially useful nor productive work, and it created a lot of wasted paper. A more apt long form of SUPW in this case is ‘Some Useful Paper Wasted’. Later, with a more directed effort, we discovered that there are so many cool shapes hidden in a piece of paper as long as the scissors are used wisely.

This is precisely the kind of experience many analysts have when they come across customers’ transaction data in companies. There is a wealth of information about customer behavior hidden in this data, but it is hard to figure out where to start. Transaction data can be sliced, diced, and grouped in infinitely many ways, similar to a piece of paper dissected with scissors. The key in both cases is direction.

Hollywood Image of Data Analysis


The point I am trying to drive at here is that data analysis is a highly planned activity. As an analyst, never touch your data before you have a proper plan of action (hypotheses, etc.) in place. Having said this, there are always going to be times, as an analyst, when you have to enter uncharted territories of data to find patterns. In these cases, I recommend you rely on machine learning algorithms or create your own modified algorithms specific to your requirements. In my opinion, machines are any day better than us humans at this task. Association analysis, powered by the Apriori algorithm, is one such technique to mine transaction data. Let’s explore association analysis in the next section.

Association Analysis

Association analysis, as you will discover soon, is primarily frequency analysis performed on a large dataset. Since datasets for most practical problems are large, you need clever algorithms like Apriori to manage association analysis. Let’s consider a much smaller transaction dataset to learn about association analysis. Here, each row or transaction number represents a customer’s market basket. In the subsequent product columns, 1 represents ‘bought the product in that transaction’, whereas 0 stands for ‘did not buy’.

Transaction #   Shirts   Trousers   Ties
001             1        1          1
002             0        1          0
003             1        0          1
004             1        0          1
005             1        1          0

There are a few association analysis metrics (i.e., support, confidence, and lift) that are really helpful in deciphering the information hidden in this kind of dataset. Let us explore these metrics and understand their usage. Support for the purchase of shirts and ties together is defined as:

\textup{Support}=\frac{\textup{transactions with both shirts and ties}}{\textup{total number of transactions}}=P(\textup{shirts}\cap\textup{ties})

For our data, there are 3 transactions with both shirts and ties (shirts ∩ ties) out of a total of 5 transactions.

\textup{Support}=\frac{3}{5}=0.6\ \textup{or}\ 60\%

60% is a fairly high value for support, and you will rarely find such high values in real-world examples. For real-world problems with several product groups, a support of 1%, or at times even lower depending on the nature of your problem, can also be useful.

Confidence for association is calculated using the following formula:

\textup{Confidence}=\frac{\textup{transactions with both shirts and ties}}{\textup{transactions with shirts}}=P(\textup{ties}\mid\textup{shirts})

In our dataset, there are 3 transactions with both shirts and ties out of 4 transactions with shirts. The calculation of confidence for our dataset is:

\textup{Confidence}=\frac{3}{4}=0.75\ \textup{or}\ 75\%

Again, you will rarely find such a high value of confidence in most real-world problems unless there are appealing combo offers on the two products. A good value of confidence is, again, problem specific.

A third useful metric for association analysis is lift; it is defined as:

\textup{Lift}=\frac{\textup{confidence}}{\textup{expected confidence}}=\frac{P(\textup{ties}\mid\textup{shirts})}{P(\textup{ties})}

Expected confidence in the above formula is the overall presence of ties in the dataset, i.e. there are 3 instances of ties purchases out of 5 transactions.

\text{Lift}=\frac{3/4}{3/5}=\frac{15}{12}=1.25\ \text{or}\ 125\%

The value for lift, 125%, shows that purchases of ties improve when customers buy shirts. The question you are asking here is: if a customer buys a shirt, does their chance of buying a tie go up, i.e. is the value of lift above 100%? Let us now use our knowledge of association analysis on the case study example we have been working on.
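To make the arithmetic concrete, here is a minimal pure-Python sketch that recomputes all three metrics from the five toy transactions in the table above (the basket sets simply re-encode the table; no library is assumed):

```python
# The five toy market baskets from the table above, encoded as sets of items.
transactions = [
    {"Shirts", "Trousers", "Ties"},   # 001
    {"Trousers"},                     # 002
    {"Shirts", "Ties"},               # 003
    {"Shirts", "Ties"},               # 004
    {"Shirts", "Trousers"},           # 005
]

n = len(transactions)
both = sum(1 for t in transactions if {"Shirts", "Ties"} <= t)  # shirts AND ties
shirts = sum(1 for t in transactions if "Shirts" in t)          # any shirts
ties = sum(1 for t in transactions if "Ties" in t)              # any ties

support = both / n                  # P(shirts and ties) = 3/5 = 0.60
confidence = both / shirts          # P(ties | shirts)   = 3/4 = 0.75
lift = confidence / (ties / n)      # 0.75 / 0.60        = 1.25

print(support, confidence, lift)  # 0.6 0.75 1.25
```

Note that the lift denominator, ties / n, is the "expected confidence" from the formula: how often ties are bought regardless of shirts.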

Retail Case Study Example – Association Analysis


With your data for formal shirts and ties, extending the example we explored above, you got a support of 0.2% with a confidence of 12% and a lift of 509%. This implies that although only a small percentage of transactions contain both ties and shirts, once a customer buys a formal shirt his chances of buying a tie go up five-fold.

DresSMart provides its customers the option to return an undamaged product within 30 days for a full refund. On further investigation of customers who buy ties along with shirts, you found that the product return rate for ties in these transactions is also 3 times higher than other return rates. This is an indicator that customers are struggling to choose matching ties while placing orders online along with shirts, and that this process on the company's website needs improvement. The idea is to reduce the product return rate while exploiting the full opportunity for cross-selling ties with shirts.

You have found some good clues for improving your company's profitability through exploratory data analysis tools. Now you want to prepare to address the original objectives ( Part 2 ) to improve the profitability of campaign efforts. You will delve into serious modeling for this task next time around.

Sign-off Note

Hope you enjoy being Edward Scissorhands with your data! See you soon with the next part of this case study example, where we will explore more about decision tree algorithms.

18 thoughts on "Association Analysis – Retail Case Study Example (Part 4)"

Liked the article. I wanted to know how feasible it is to use association analysis for online path analysis and clickstream data. For example, how two different page URLs are used, and so on. How can I use the Apriori algorithm to improve the model?

Thanks in advance. Poonam

Thanks Poonam, I am glad you enjoyed this article.

The way you have described your problem, I don't see a reason why association/sequence analysis won't work. The only thing you may have to keep in mind is that if the number of page URLs is too high (say 500 or more), then you may want to categorize them into broader categories before doing your analysis.

For the Apriori algorithm you can use the 'arules' package in R. Association analysis is not so much a model as a method for creating simple rules using frequency and basic probability analysis. Hence, the Apriori algorithm is not meant to improve any models but to find these rules efficiently. I hope this helped; let me know if you need any further help.
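For readers working in Python rather than R, the core idea behind Apriori (finding all itemsets whose support clears a minimum threshold) can be sketched with a naive brute-force search. The real Apriori algorithm avoids enumerating every candidate by only extending itemsets that are already frequent; this sketch is for illustration on small data only:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Naive frequent-itemset search: what Apriori computes, minus the
    candidate pruning that makes it scale to large datasets."""
    items = sorted({item for t in transactions for item in t})
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Support = fraction of transactions containing every candidate item.
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count / n >= min_support:
                result[candidate] = count / n
    return result

# The five toy baskets from the article.
transactions = [
    {"Shirts", "Trousers", "Ties"},
    {"Trousers"},
    {"Shirts", "Ties"},
    {"Shirts", "Ties"},
    {"Shirts", "Trousers"},
]
print(frequent_itemsets(transactions, min_support=0.6))
# {('Shirts',): 0.8, ('Ties',): 0.6, ('Trousers',): 0.6, ('Shirts', 'Ties'): 0.6}
```

Association rules (with confidence and lift) are then read off the frequent itemsets, which is exactly what arules' `apriori` function automates.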

You are awesome, an innovative and resourceful teacher. Thank you for your wonderful articles.

I have read almost all of your articles. But I didn't find any article on the Maximum Likelihood Estimator (MLE). Request you to write an article on OLS vs MLE vs Bayesian estimators, with examples.

Thanks Rajanna for the kind words. I am really happy you are enjoying the articles. Will discuss Maximum Likelihood and other techniques in some later articles.

Could you please list here the URLs of the previous three parts of this blog?

Here you go: Part 1 , Part 2 , and Part 3

Roopam, thanks for presenting these articles. I came across the following situation while doing association rules. Please let me know how to select the best rule in this situation.

support  confidence  lift  rule
0.5      0.6         2.86  lhs=Rin     rhs=surf excel
0.5      0.6         2.86  lhs=Rin     rhs=dettol
0.4      0.5         2.86  lhs=diaper  rhs=surf excel

Good job turning this case study into an interesting story. Like a good book, I can't put it down before I learn how it ends! Thanks for educating the world on how useful, yet not frightening, data analysis can be.

Thank you, I am really happy you are enjoying this case, and learning from it.

Hello Roopam, thanks for publishing such an informative article in simple layman's terms.

Please do let me know if I am missing something here:

Expected confidence, P(ties), should be read as 3/5, as I can see only 3 ties were bought in this dataset; however, you have mentioned 4/5 in your calculation. Please correct me if my observation is wrong.

"Expected confidence in the above formula is presence of ties in the overall dataset i.e. there are 4 instances of ties purchase out of 5"

Thanks Abhinav, that was a typo; I have corrected it.

Excellent material, thanks a lot! 🙂

The links to parts 1, 2 and 3 are dead; it would be nice to read the whole series.

Some glitch with WordPress. Try now.

You could find the whole series at this link : http://ucanalytics.com/blogs/category/marketing-analytics/retail-case-study-example/

Thank you very much for these case studies. I am preparing for my Data Science Consultant interviews these days and these are helping me a lot. The next round in most companies I am interviewing with is an analytical case study. The HR described it as, "they will give a scenario, ask what data you will need, what algorithms you can run, what the risks involved are, etc." Apart from your blog, I haven't found many other good case studies which reflect the scenario I am most likely to get.

I have a question and some requests: 1) How should I come up with risks for any particular scenario? Is there a framework involved? 2) Apart from the case studies that you currently have on the blog, are there any more that you can share? 3) Can you point me to any other blogs/posts/videos/links you have come across which contain similar work?

Thank you very much. This is awesome work and is most likely helping a lot of people.

1) How should I come up with risks for any particular scenario? Is there a framework involved?

Risk is an extremely wide concept, but analytically you can think of it as the probability of things going outside the expected business boundaries. You may find this credit risk case study useful: http://ucanalytics.com/blogs/category/risk-analytics/banking-risk-case-study-example/

There are several great websites with good explanations of statistical & machine learning tools and coding. However, I didn't come across any website focusing completely on creative business problem solving and case studies, the way data science professionals do it in the real world. This was precisely the reason I started this blog. I am glad it helped you. All the best.

Hi, Roopam,

I am new to data analytics and, you could say, have zero experience with algorithms; until now I have basically been instinct-driven. The point you made that data analysis is more planning than instinct is awesome... I hope to learn from your blog, and have decided to go through each of the posts you have published...

Regards, Sushant

You are a really good storyteller (with concepts). I must say I enjoyed each and every line.

thanks a lot


View:

Session 9A: Percolator: Theory and Praxis of Liberatory Justice in Public Service Organizations: Rewards, Challenges, and the Way Forward
08:30 , , and
08:30 important intermediate mechanisms in the causal linkage between PBF and student success. A few studies have examined PBF-driven shifts in spending patterns in public institutions, finding only marginal to null average treatment effects on financial priorities of public four-year institutions (Rabovsky, 2012; Kelchen & Stedrak, 2016; Hu et al., 2022). However, changes in institutional processes often take time and financial priorities of incentivized institutions may evolve over time as institutions learn and adapt to their changing state funding environments (Heinrich & Marschke, 2010; Mizrahi, 2020). This study examines the dynamic shifts in institutional spending in public four-year institutions subject to PBF policies and by minority-serving institution (MSI) status. The study leverages institution-level data from IPEDS and a comprehensive state-level PBF dataset and employs event study analysis. Understanding the dynamic changes in institutional spending over multiple periods may provide information on why PBF policies continue to yield limited improvements in college completion outcomes. Evidence on the dynamic shifts in institutional spending may also enable states to better design and implement performance incentives that induce desirable institutional changes and improve student outcomes ultimately. 

       

 

 

 

08:45 , and
09:00
09:15 and
08:30
08:45 and
09:00 and
09:15 , and

Our analysis employs multivariate regression and other methods, considering variables for major threat categories, consequences of terrorist attacks, and terrorist funding potential through irregular trade. Additional control variables include political, power, and demographic factors. This approach provides a more comprehensive assessment of terrorism risks and funding allocation efficiency.

08:30 and

The latter has become especially salient in public administration, with social equity being elevated as a core public service value and the demonstrated performance benefits of a(n) (effectively managed) diverse workforce. More historically, public organizations have sought to be demographically representative institutions, with recognizable implications for responsiveness among street-level bureaucrats, especially in arenas with administrative discretion (Keiser et al., 2002).

The objective of this paper is to address these policy-salient concerns by examining what qualities of public sector jobs are most attractive across age groups, as well as race and gender. To do so, we utilize a large-scale pre-registered conjoint experiment that allows us to make valid inferences on the impact of our independent variables on job attractiveness.

Our contributions are two-fold: first, we compare the simultaneous effects of a range of variables on job attractiveness whereas previous work has examined them in isolation; and secondly, we devote specific attention to comparing differences in the needs and work values of individuals across age groups. The findings highlight what matters the most in how job seekers self-select into differing organizational/policy domains, professional contexts, as well as job characteristics. The paper ends with a discussion of the findings and future work to advance this area of research.

08:45 and

This study focuses on one such stressed organizational context -child welfare services- and uses the job demands-resource model to unpack the reform needed to motivate and engage child welfare caseworkers. By doing so, it builds on the literature of how work engagement in public sector contexts, especially highly stressed ones, may be differently affected by clusters of job demands and resources. Using an explanatory sequential mixed method approach, the study first identifies the clusters of job demands and job resources that are antecedents of high satisfaction and overall work commitment in child welfare caseworkers. This is done by analyzing secondary survey data from the second cohort of the National Survey of Child and Adolescent Well-Being(NSCAW II).

This analysis is followed by in-depth interviews with current child welfare caseworkers to understand the relative importance of the identified job demand and resource clusters. Additionally, the interviews will add richness to the study by unpacking the personal experiences of caseworkers in the post-pandemic public sector human service work environment. The study, therefore, will provide useful insights to better inform the design and implementation of human resource policy reforms in the public sector.

09:00 and

In this paper, we develop a model for understanding where organizations fall on the continuum of preventing exclusion to promoting inclusion in their DEIA work. Preventing exclusion is associated with legal compliance, internal processes, and diversity inputs while promoting inclusion is associated with creating equitable environments where individuals feel a sense of belonging.

We test this model using survey data from veteran serving organizations (VSOs) participating in 18 AmericaServes networks across the United States (n=1,000) and individual surveys of veterans utilizing services (n=2,731). We propose that how organizations define and do DEIA work has profound impacts for whether historically marginalized groups access and utilize services. We conclude with guidance for organizations to develop and implement substantive and systematic DEIA work.

This work is funded by USAA and done in partnership with the D’Aniello Institute for Veteran and Military Families.

09:15

Specifically, this study explores the impact of organizational inclusion and justice on the behavioral pathways that employees strategically choose in response to harassment experiences and their willingness to report such incidents. The findings reveal diverse effects on behavioral choices: Enhanced justice significantly predicts both the willingness to report incidents and turnover intention, though it is not significantly associated with changes in assignment or transfer. Inclusion, conversely, exhibits nuanced effects across behavioral strategies, significantly predicting the willingness to report but demonstrating positive associations with turnover intention and transfer.

Qualitative data further confirm that organizational inclusion and justice play a crucial role in reshaping policies to protect victims, although mixed perspectives exist among employees regarding their behavioral choices when addressing harassment experiences. The study highlights the substantial impact of organizational inclusion and justice as proactive measures in curbing misconduct within highly bureaucratic settings. However, it underscores the necessity for delicate management strategies to ensure effectiveness in addressing workplace harassment.

08:30 and
08:44 , and

To address the gaps, we employ a quasi-experimental method, regression discontinuity (RD) design, based on school performance data and ratings from New York City public schools from 2007 to 2013. We find that performance signals affect overall turnover, but only at the lower end of performance ratings. Compared with schools earning a C grade, schools earning a D grade have higher levels of teacher turnover. Moreover, teachers from different racial groups respond to low-performance signals differently. Compared to their counterparts in schools that earned a C, white teachers in D schools are more likely to transfer to higher-rated schools. In contrast, Black teachers in D schools are more likely to exit NYC schools to join other districts or leave the profession entirely. This study deepens our understanding of employee turnover under performance regimes and shows an unintended effect of performance management: performance regimes drive minority teachers away and worsen the lack of representation.

08:58 , and

This case study analyzes interviews with 23 CoC representatives, a survey of 114 CoCs (33% response rate), and HUD performance data. We find limited evidence that funding levels are associated with reported measures of performance. Broadly, our data show that governance complexities and environmental constraints violate many of the principal-agent assumptions embodied within performance management doctrine. At the same time, interviews suggest that some CoCs use HUD reporting requirements for varied purposes, including catalytic and discursive capacities (Musso and Weare, 2019; Moynihan, 2008; Nathan 2008). Overall, CoCs are building performance management systems capacities, but still face challenges regarding sustainable organizational culture. Impediments to performance include both internal organizational factors and external factors such as lack of housing, limited funding, and regulatory restrictions. Overall, the evidence supports a more cooperative and discursive model for capacity building rather than a top-down view of performance management governance in networked grant-in-aid systems.

09:12 and

Our paper contributes to the collaborative performance literature. We argue that to understand shared data use during the implementation phase, we need to examine groups’ engagement with performance practices during the earlier planning and coordination phases using a temporal view. We also submit that the three mechanisms constitute broader theoretical streams that call for theorizing about specific causal pathways within them. We identify and examine three lower-level mechanisms that can help explain collective data use: ambiguity reduction, formality-informality complementarity, and identity creation.

To develop and illustrate our arguments, we employ a mechanism-based case study. This approach relies on the use of explanatory narratives, and it is particularly appropriate if the unit of the analysis is a social, interactive process. As our case, we selected the Citizen Security Plan in Jamaica (2020-2023). The Plan is an initiative that aims to combine addressing crime and safety issues with efforts of community development. It was selected because it requires government to collaborate; it relies on the use of goals and data; and it allowed us to observe changes across project phases.

08:30 and

This study conducts a nationwide survey of 50,000+ faculty at public postsecondary education institutions to assess what factors impact their awareness of student homelessness. We will conduct exploratory factor analysis to investigate a myriad of personal backgrounds, professional experience, university engagement, and campus resource item variables. We hypothesize that faculty with personal experience with homelessness, those in human service and social work fields, and those who frequently engage with their university resources are more likely to have increased awareness of student homelessness. Data collection was completed in December 2023.

08:45

To accomplish this, I will employ a two-way fixed-effect model using data sourced from the U.S. Census Bureau, Georgia Department of Education, Governor’s Office of Student Achievement, and National Center for Education Statistics (NCES). The dataset spans the school years from 2011 to 2019, with dependent variables of financial outcomes (total expenditure, instructional expenditure, fixed cost) and student outcomes (Georgia Milestones scores, graduation rate, school safety index).

09:00
09:15
08:30 , and

To study this, we investigate disaster resilience planning in a rural county in the Southeast of the US exposed to several natural disasters, including tornadoes, ice storms, and strong winds. The county is characterized as having a high level of social vulnerability compared to the rest of the US (US Federal Emergency Management Administration, 2023). The empirical base includes data from observations of local government public meetings, content analysis of relevant planning documents, and interviews with collaborative partners. The data are analyzed using social network analysis methods, including descriptive and inferential techniques. The findings have implications for public management theory and practice in resilience planning.

08:45 , and
09:00
09:15 and
08:30

Ordinal representation pertains to altering the order of representation among bureaucrats when cardinal representation cannot be improved. For instance, in an organization with four bureaucrats where two are female and two are male regarding gender representation, cardinal representation cannot be enhanced. To address the question of whether ordinal representation holds significance in coproduction, this study examines the ordinal effects of gender representation on individuals’ decisions to coproduce.

By employing two distinct policy areas—recycling and emergency preparedness—the study randomizes the order of female officials in a setting with two males and two females, where gender representation cannot be enhanced in a cardinal manner. Both experiments failed to consistently identify evidence of the ordinal effects resulting from placing females in different orders on citizens’ overall willingness to coproduce. However, the results revealed a pattern indicating that the gender of the chief leader influences an increase in the willingness of others of the same gender—and simultaneously decreases the willingness of their gender counterparts—to participate in coproduction.

08:45
09:00 , and
08:30
08:45 , and
09:00 and
09:15
08:30 and

This paper addresses the widening academic gap and examines the role of public education in fostering academic equity. This study responds to the call for a more holistic understanding of what perpetuates academically successful youth from historically inequitable backgrounds by linking the individual with their greater environment (McCoy & Bowen, 2015). Specifically, we ask, “what individual and institutional factors promote equitable access to higher education across marginalized student identities?” We propose a two-level, intersectional public education equity framework.

The framework is tested using data from surveys conducted among 1,400+ high school seniors and 50 guidance counselors in ten public high schools in the United States. The findings reveal misalignments between schools and individuals regarding perceptions of protective factors for social equity, indicating significant variations in the factors believed to impact access to higher education. Additionally, the study identifies certain risk factors for academic inequity, such as homelessness, first-generation status, lack of school resources, and financial constraints, which can be mitigated through protective factors such as societal expectations, family support, mentorship programs, and peer norming.

08:45

Research on place-based incentives has primarily focused on single incentive programs, concentrating on property values or job creation as desired outcomes. Few studies have compared multiple place-based investments or evaluated the combination of investments and resulting changes in equitable access to capital for neighborhood residents. This paper contributes to existing research by analyzing several programs—Community Development Block Grant (CDBG), Neighborhood Opportunity Fund (NOF), New Market Tax Credit (NMTC), Property Tax Abatement (PTA), Small Business Improvement Fund (SBIF), and Tax Increment Financing (TIF)—and how the related investments alter the racial composition of neighborhoods as a result of home loan approvals. In doing so, this paper offers a better understanding of and policy prescriptions for enhancing social equity when redeveloping and revitalizing local communities in need.

09:00 , and

We explore these tensions by drawing on quality rating data from England's Care Quality Commission to compare service quality across health and social care organizations that are government-run, CICs "spun-out" of the state, or privately-founded CICs. Specifically, we use ordered logit regression models to compare over 2,000 quality ratings of these three types of providers across five dimensions: safe, effective, caring, responsive, well-led, plus an overall rating. We draw on a 'publicness' theoretical framework to explore whether and to what extent public or private ownership, as well as the loss of public ownership through the ‘spin-out’ of public services into independent social enterprises, impacts quality. Our initial results show that overall, both types of social enterprise CICs performed better than government-run services, whilst non-spin-out CICs performed best on caring and responsive and spin-out CICs performed best on safe and effective dimensions.

09:15 and
08:30 and
08:45 and

Understanding this is important, first, because officials often have a more comprehensive view of local financial health than objective financial indicators can capture alone. Compared to information contained in financial statements, commonly used for indicators, local practitioners possess broader relevant data and a nuanced understanding of what it means in the local context. Second, local government officials, i.e. individuals positioned within a network of government and community actors, ultimately make local investment, policy, and programmatic decisions. As such, when it comes to understanding policy outputs, their perceptions of their municipality’s financial condition arguably matter more than objective measures.

Drawing from open system theory and the literature on perceived organizational outcomes, this research aims to explore whether public managers holding positions in different city departments have systematically different views on financial health. This research examines survey data from city officials in 273 Kansas cities with populations over 500. The survey, conducted between September 2023 and January 2024, targeted professionals in five positions—City Administrator and Directors of Public Works, Planning and Finance. Through descriptive and empirical analysis, this research illuminates how perceived local financial conditions in influence the decisions and fiscal responses across different organizations.

09:00 and
09:15 , , , and
10:15 and

Building upon previous works on policy designs of net energy metering, which have gained less attention despite its prevalence, we explore different tariff designs and create indexes encompassing various NEM tariffs. Using panel data (about 200 investor-owned utilities in 50 states from 2013 to 2021), we evaluate how different tariff designs have affected the penetration of distributed solar. By studying the correlation between policy designs and the adoption of DERs, our study contributes to policy design literature, understanding how various policy designs affect policy outcomes and how to design policies for other distributed resources.

10:30 and

Strategic management is often touted as an approach for integrating strategy formulation and implementation in response to environmental challenges. As one of the popular approaches used by the public sector, strategic management is often touted as a means for effective public service delivery. However, it is unclear whether current strategic management approaches are up to the task of addressing climate-related threats to the sustainability of public services at the local level where problems are fundamentally transboundary and require coordination across typical silos. We address this gap by asking: What manager-led processes drive resource-constrained cities to adapt their capabilities to the accelerating impacts of climate change? Using a novel mixed methods approach combining survey, text analysis of planning documents, and interviews, we examine how resource-constrained cities in Indiana integrate their capabilities and planning in response to climate change in the context of GSI.

10:45
11:00 and

While regulatory competition suggests a state would relax its enforcement on an entity when its corporate siblings (entities that belong to the same company) in other states have been penalized for violations, regulatory learning theory, predicts otherwise. When an entity’s corporate siblings become violators, it tarnishes the reputation of the whole company and indicates possible wrongdoing of the focal entity itself, prompting regulators to increase scrutiny on the focal entity.

We test the two competing theories using a facility-level panel dataset of Clean Air Act enforcement actions. Preliminary results show a mixed pattern. While regulators increase enforcement on a facility when its same-industry siblings located in the same state become high priority violators (regulatory learning dominates), they relax enforcement on the focal facility if the same-industry violator siblings are in competitor states (regulatory competition dominates).

10:15
10:30 , , and
10:45

Public Administration scholars must pay attention to this restructuring and its impacts to agency adjudication practices. This working systematic review of the Federal Administrative Judiciary will analyze distinct approaches employed by legal and public administration scholars to explore the conceptual and very practical tension between judicial independence and bureaucratic discretion. As the first systematic review regarding this topic, I expect to chronicle the development of these positions within the federal government by exploring institutional collaboration and influences. And finally, I hope to identify topics that may bolster comprehension of administrative adjudication in the USA.

This presentation is relevant to the overall theme of “Bringing Theory to Practice”. As a heavily applied social science, public administration scholars focusing in management must attend to the legal discourse, particularly regarding judicialized employees. ALJs are in such a position within an agency to provide a unique bridge between public administration and the legal discipline. With the ongoing restructuring of their position, there are ample opportunities for practice to also inform theoretical innovation.

10:15 and
10:29 , , , and
10:43 , and
10:57 and

The aim of our review is twofold. First, given the potential of relational contracting as an alternative to traditional contracting in complex situations, we aim to examine how relational approaches may or may not be a viable alternative to traditional transactional approaches. Second, we aim to contribute to the existing literature by developing an integrative framework of relational contracting as a way of managing buyer-supplier relationships in public procurement. Using ASReview Lab, an open-source machine learning software, we identify, collect, and assess relevant articles on this topic. Based on the findings, we develop an integrative framework of relational contracting in public procurement and present a research agenda to tackle theoretical and empirical lacunas in research into relational contracting.

10:15 , , , and

While previous literature discusses how politics and power structure shape global public policy and governance transformation, there is a notable gap in understanding grassroots-based practices that explore innovative narratives, actors, and strategies to establish a community of practices for poverty reduction in the global south. To address this research gap, we outline a processual, multilevel, network-centric perspective by investigating two community-based poverty reduction cases in Africa and China.

Our findings reveal that development narratives, actors’ networks, and pragmatically evolutionary practices constitute the three key pillars for building a community of practice focused on poverty reduction in the global south. The paper contributes to the literatures on the role of action research in poverty reduction in the global south, aligning with the first priority of the Sustainable Development Goals (SDGs). Meanwhile, it highlights the significance of knowledge network in the formulation and implementation of public policies. The study also bridges the knowledge gap between development theory and practical applications in poverty reduction in the global south.

10:30
10:45 and
11:00 and

Our mixed method study takes place in the Puget Sound Basin of Washington state, where we gather data on stakeholder perceptions from approximately 48 CGRs working on ecosystem recovery. We use an exploratory sequential design, starting with interviews to generate a list of indicators with which stakeholders evaluate usefulness of scientific information. We then draw on this list to develop a survey sent to approximately 800 stakeholders. Our initial data show that scientific information is considered most useful when it comes from a reputable source and is produced transparently. Unexpectedly, less valuable indicators of usability included peer-review and co-production with information users. Our study contributes to CGR theory on knowledge management, identifying qualities that may enhance likelihood that information influences joint decisions. It also offers policy implications for information producers, suggesting ways to enhance information’s usability for practitioners in ecosystem recovery.

10:15
10:30
10:45

Our quantitative study investigates the link between the learning organization and job satisfaction and the mediating role of psychological safety in a policing context. We use the Dimensions of the Learning Organization Questionnaire (DLOQ) developed by Watkins and Marsick (1997), Edmondson’s (1999) instrument for measuring psychological safety, and the short index of job satisfaction (Sinval & Marôco, 2020). The participants in our study are experienced German police officers selected for future leadership positions.
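The mediation structure implied above (learning-organization climate influencing job satisfaction via psychological safety) can be sketched with simulated data. Everything below (variable names, effect sizes, sample size) is an illustrative assumption for demonstration, not the study's data or instruments.

```python
import numpy as np

# Illustrative simulation of a simple mediation model:
# learning-organization score (x) -> psychological safety (m) -> job satisfaction (y).
# All coefficients are made-up values chosen only to demonstrate the decomposition.
rng = np.random.default_rng(42)
n = 2000
x = rng.normal(size=n)                                 # learning-organization score
m = 0.5 * x + rng.normal(scale=0.8, size=n)            # mediator: psychological safety
y = 0.6 * m + 0.2 * x + rng.normal(scale=0.8, size=n)  # outcome: job satisfaction

def slopes(design, target):
    """OLS slope estimates for target ~ intercept + design."""
    dm = np.column_stack([np.ones(len(target)), design])
    coef, *_ = np.linalg.lstsq(dm, target, rcond=None)
    return coef[1:]  # drop the intercept

a = slopes(x, m)[0]                               # path x -> m
b, c_direct = slopes(np.column_stack([m, x]), y)  # paths m -> y and direct x -> y
indirect = a * b                                  # the mediated (indirect) effect
print(f"a={a:.2f}  b={b:.2f}  direct={c_direct:.2f}  indirect={indirect:.2f}")
```

With this large simulated sample the recovered indirect effect sits near the assumed 0.5 × 0.6 = 0.30; in the actual study the decomposition would be estimated from the DLOQ, psychological-safety, and job-satisfaction scores with appropriate inference, such as bootstrapped confidence intervals.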

11:00

Seeking to identify effective and efficient outreach methods, in 2023 the Northeast Ohio Regional Sewer District conducted a field experiment involving 56,000 households in the Cleveland, Ohio metropolitan area. Using a conjoint design, the experiment randomly assigned households to a control condition or one of up to 56 combinations of treatments. Treatments included black and white postcards, color postcards, letters from the utility, letters from a community organization, English-only messages, bilingual messages, and multiple mailings. Some mailings framed assistance in terms of dollar value, while others expressed benefits as percentage discounts.

Results indicate that direct mail significantly increased CAP inquiries, and that a single, simple black-and-white postcard was the most cost-effective medium. Surprisingly, messaging variables did not drive significantly different response rates. The study is a model of university-government collaboration, and its findings provide unprecedented evidence about direct mail as a means of reducing learning burdens for public assistance programs.
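The assignment scheme described above can be sketched as a crossed-factor randomization. The factor levels below are assumptions reconstructed from the abstract, not the district's actual design, so the cell count here need not come out to exactly 56.

```python
import itertools
import random

# Hypothetical treatment factors reconstructed from the abstract (illustrative only).
FORMATS   = ["bw_postcard", "color_postcard", "utility_letter", "community_letter"]
LANGUAGES = ["english_only", "bilingual"]
FRAMINGS  = ["dollar_value", "percent_discount"]

# One control cell plus every crossed combination of treatment factors.
cells = [("control",)] + list(itertools.product(FORMATS, LANGUAGES, FRAMINGS))

rng = random.Random(2023)
households = [f"hh_{i:05d}" for i in range(56_000)]
assignment = {hh: rng.choice(cells) for hh in households}

print(f"{len(cells)} cells, {len(assignment)} households assigned")
```

Analysis would then compare inquiry rates in each treatment cell against the control group.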

10:15
10:30
10:45

To understand data and data skills in city governments, this research uses survey results from local government chief administrators in the Census West region of the United States. The findings of this exploratory research suggest that 1) a data-skills gap exists in local government, 2) data-skill expertise contributes indirectly to a chief administrator’s satisfaction with their organization’s overall data skills, and 3) data capture, curation, and analysis skills have smaller skill gaps than data communication and application skills. The findings provide important insight into the data-skill needs of local governments and help identify important research questions about local governments’ acquisition of data skills.

11:00

To answer this question, we employ a conjoint experiment on high-level directors in local government to determine their interest in applying to management positions given different job characteristics. Our conjoint survey experiment asks respondents to make four discrete choices between paired job descriptions. These job descriptions vary in characteristics of city managers’ work, including the flexibility of the schedule, after-hours commitments, paid time off, perceived stability of the position, and requirements for public engagement. The data are then analyzed alongside respondent characteristics, including position, mentorship, and family life, to more comprehensively explore the propensity of women to seek out next-level managerial roles based on these job requirements. This paper disentangles the question of whether women would be more interested in applying to city management roles if the position were designed differently. The study offers local governments recommendations for rethinking the nature of the city manager role.
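Forced-choice conjoint data of the kind described above are commonly summarized with marginal means: the share of times profiles carrying a given attribute level are chosen. The sketch below simulates that computation for a single attribute; the attribute, levels, and assumed preference are illustrative stand-ins, not the study's findings.

```python
import math
import random

rng = random.Random(7)
LEVELS = ["flexible_schedule", "fixed_schedule"]
ASSUMED_UTILITY = {"flexible_schedule": 1.0, "fixed_schedule": 0.0}  # hypothetical

records = []  # one row per profile shown: (attribute_level, was_chosen)
for _ in range(5000):  # simulated paired forced-choice tasks
    left, right = rng.choice(LEVELS), rng.choice(LEVELS)
    # Logistic choice rule: the higher-utility profile is chosen more often.
    p_left = 1.0 / (1.0 + math.exp(ASSUMED_UTILITY[right] - ASSUMED_UTILITY[left]))
    left_chosen = rng.random() < p_left
    records.append((left, left_chosen))
    records.append((right, not left_chosen))

def marginal_mean(level):
    """Share of profiles with this attribute level that were chosen."""
    chosen = [c for lvl, c in records if lvl == level]
    return sum(chosen) / len(chosen)

for lvl in LEVELS:
    print(f"{lvl}: {marginal_mean(lvl):.2f}")
```

In a real conjoint, every attribute (schedule flexibility, after-hours commitments, paid time off, stability, public engagement) varies independently, and marginal means or average marginal component effects are estimated per level, typically with respondent-level clustering.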

10:15
10:30
10:45

Our findings indicate that self-motivation emerges as a significant factor positively influencing innovative behavior among national civil servants. Individuals who exhibit a strong internal drive and intrinsic motivation are more likely to engage in innovative practices, contributing to a culture of creativity within the public sector. Peer trust also emerges as a noteworthy factor associated with enhanced innovative behavior.

Surprisingly, institutional support, often considered a key determinant in fostering innovation, was not found to have a significant impact on innovative behavior in our study. Similarly, the presence of competition among organizations within the public sector was not found to significantly influence innovative behavior among national civil servants. This nuanced finding invites a deeper exploration of the nature of competition and its implications for fostering innovation within the unique dynamics of national civil service environments.

The implications of these findings are substantial for public sector leaders and policymakers: recognizing the importance of cultivating self-motivation and fostering peer trust can serve as a strategic approach to promoting innovative behavior among civil servants.

11:00
10:15
10:30
10:45

We argue that team leadership assignments are gendered in ways that will disadvantage women. Whereas men are likely to be given leadership assignments that are conducive to continuing career progress within their organizations, women are likely to be given leadership assignments that hamper their progress. One reason for this is *structural*: Women and men begin their careers in different types of teams, and consequently accumulate early-career experiences that delimit their future leadership opportunities in divergent ways. A second reason is *aspirational*: Women who are candidates for open leadership positions will be inclined to doubt their qualifications, to be skeptical of their leadership capabilities, and to experience anxiety about assuming formal team-level leadership responsibilities. And a third reason is *stereotypical*: Organizational stakeholders who have input into promotion decisions will harbor differing expectations about women's and men's leadership potential, expectations that will tend to be more negative when it comes to women's leadership capacities.

We test these expectations using longitudinal, individual-level personnel data on United States federal employees.

10:15

However, in contrast to employees, politicians can be conceived as organizational outsiders. We therefore argue that political considerations affect how politicians assess and value performance measures. Specifically, we hypothesize that (a) politicians will perceive performance information featuring high and low performance signals differently, but also that (b) political ideology, in terms of being aligned with or opposed to the measured public services, and (c) being affiliated with or in opposition to the ruling political coalition will affect their perceptions.

To test these hypotheses, we conducted a pre-registered survey experiment among political candidates for Danish regional councils charged primarily with governing health care services (n=885). Respondents were randomly exposed to either no information or true performance information (high/low) about their own region’s health care system. They were then asked to evaluate the validity, legitimacy, and usefulness of the information, and whether they wanted to receive additional information. The results have potentially important practical implications concerning when political decision-makers are willing to trust and use performance information and policy evidence.
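A minimal sketch of the three-arm randomization described above, under assumed (not actual) treatment effects: candidates are randomly exposed to no information, high-performance information, or low-performance information, and arm means of an evaluation outcome are compared.

```python
import random
import statistics

rng = random.Random(885)
ARMS = ["no_info", "high_performance", "low_performance"]
# Hypothetical shifts in the evaluation outcome; illustrative only.
ASSUMED_SHIFT = {"no_info": 0.0, "high_performance": 0.4, "low_performance": -0.3}

outcomes = {arm: [] for arm in ARMS}
for _ in range(885):  # sample size matches the abstract's n
    arm = rng.choice(ARMS)  # random exposure
    # Hypothetical 1-5 rating, e.g. of the information's perceived validity.
    rating = 3.0 + ASSUMED_SHIFT[arm] + rng.gauss(0.0, 1.0)
    outcomes[arm].append(rating)

for arm in ARMS:
    print(f"{arm}: mean rating {statistics.mean(outcomes[arm]):.2f}")
```

The actual analysis would compare each treatment arm against the no-information control on validity, legitimacy, usefulness, and demand for additional information, conditioning on ideological alignment and coalition affiliation.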

10:30

The model captures three dimensions along which legitimacy can be created or undermined: the political articulation of public interests (input), the administrative implementation process (throughput), and the results achieved for citizens (output). A comprehensive review of the literature will be structured along the ITO model. Initial findings suggest that results for 1) input and 2) output are mixed, while they are most promising regarding 3) throughput legitimacy.

First, while performance systems can increase political control, their record in strengthening minority interests is modest. The management literature laments that a stronger results focus has not been accompanied by more resource autonomy, but such an increase in control is not a problem from a legitimacy perspective. At the same time, though performance systems can be pluralist in nature, evidence suggests they often reinforce existing power differentials.

Second, research documents that performance systems improve outcomes, but gains are not necessarily equitable. Third, they can enhance the evidence base for decision making, and bias here is less of an issue from a democratic perspective if it reflects political values. Performance systems create process legitimacy if they capture citizen feedback, structure interactions between government and civil society, and increase citizen trust.

10:45

In this paper, we revisit the relationship between politics and administration, emphasizing how politics can influence agency performance even in the most professional and high performing agencies.

We describe the mechanisms by which political alignment or misalignment influences performance. We detail how presidents work to 1) change outputs by directly influencing agency capacity (e.g., budget and personnel levels) and 2) change outputs without directly targeting capacity by using the tools of the administrative presidency to let capacity idle, reorient capacity, or diminish capacity indirectly.

We test these relationships using newly created measures of agency performance for 139 U.S. federal agencies during the 2000-2022 period. The new measures combine dozens of subjective and objective measures of performance that vary across agencies and time. We conclude with the implications of our findings for future research focusing on the intersection of politics and management.

11:00
10:15
10:30
10:45
10:15
10:30
10:45

COMMENTS

  1. Exploratory Data Analysis (EDA)

    Exploratory Data Analysis (EDA) - Retail Case Study Example (Part 3) For the last couple of weeks we have been working on a marketing analytics case study example (read Part 1 and Part 2 ). In the last part ( Part 2) we defined a couple of advanced analytics objectives based on the business problem at an online retail company called DresSmart ...

  2. 9. Case Studies

    Introduction. In this diverse collection of case studies, the power of Exploratory Data Analysis (EDA) shines as a critical tool for understanding and extracting insights from various datasets across different domains. Each case study focuses on a specific problem domain, ranging from e-commerce customer behavior analysis to predictive ...

  3. Exploratory Data Analysis on E-Commerce Data

    You got the prediction — without first understanding the data. This is exactly where the importance of Exploratory Data Analysis (EDA) (as defined by. Jaideep Khare. ) comes in which, unfortunately, is a commonly undervalued step as part of the data science process. EDA is so important for 3 reasons (at least) as stated below:

  4. Exploratory Data Analysis: A case study

    Exploratory data analysis sits at the core of any insightful data work. Performing log analysis in search for threats is no different. Jun 5. Afrifa Kwame. Cyclistic Bike-Share Analysis Case Study ...

  5. Using Exploratory Data Analysis to Improve the Fresh Foods ...

    This case study demonstrates how explorative data analysis and basic statistics helped reduce the inefficiencies in the retail inventory and ordering process of fresh foods within grocery chains. Low shelf life and fluctuating demand had led to the need to hold clearance sales with zero or negative margin or to write off the inventory as ...

  6. Exploratory Data Analysis

    ...Exploratory data analysis is a powerful tool. A diligent EDA is an absolute must to put your advanced business analytics in the right direction.

  7. Exploratory Data Analysis with an Example

    Introduction. Exploratory Data Analysis helps in identifying any outlier data points, understanding the relationships between the various attributes and structure of the data, recognizing the important variables. It helps in framing questions and visualizing the results, paving the way to make an informed choice of the machine learning ...

  8. Exploratory Data Analysis (EDA) using SQL

    The best way to learn a skill is by applying it on hands-on projects, so today we will perform exploratory data analysis using SQL in a retail database to answer the following questions: The first…

  9. Exploratory Data Analysis

    Explore and run machine learning code with Kaggle Notebooks | Using data from Sample super store. code. New Notebook. table_chart. New Dataset. tenancy. New Model. emoji_events. New Competition. corporate_fare. New Organization. ... Exploratory Data Analysis - Retail Python · Sample super store. Exploratory Data Analysis - Retail. Notebook ...

  10. Unlocking Customer Insights: Exploratory Data Analysis in Retail for

    Exploratory Data Analysis (EDA) is an essential phase in the data analysis process. It involves a comprehensive and visual examination of a dataset to discover patterns, trends, and hidden insights.

  11. Exploratory Data Analysis (EDA)

    This article was published as a part of the Data Science Blogathon. Exploratory Data Analysis, or EDA, is an important step in any Data Analysis or Data Science project. EDA is the process of investigating the dataset to discover patterns, and anomalies (outliers), and form hypotheses based on our understanding of the dataset.

  12. PDF Big-Data-Analytics-Retail Case Study

    BrandttitudeTM is a unique analytics platform that executes exploratory data analytics, predictive modeling and self-service analytics to deliver compelling metrics in diverse categories. It provides accurate and actionable insights to. decision-makers across various industries through a unified view of disparate data.

  13. Exploratory Data Analysis-Retail

    Exploratory Data Analysis-Retail Selamawit 2023-05-02. 1. Introduction; 2. Task Summary; 3. Process. ... 1. Introduction. Performing "Exploratory Data Analysis ... There is a loss that we can see in Book Case, Supplies and Table subcategory. Top five States that are performing good are: ...

  14. What is Exploratory Data Analysis: Types, Steps, & Examples

    3. Multivariate Exploratory Data Analysis. Multivariate analysis helps to analyze and understand the relationship between two or more variables simultaneously. It helps unveil more complex associations and patterns within the data. For example, it explores the relationship between a person's height, weight, and age.

  15. Exploratory data analysis using supermarket sales data in Python

    Image by GraphicMama-team from Pixabay. Before venturing on to any data science project it is important to pre-process the data and also to explore the data. Today we will discuss a very basic topic of exploratory data analysis (EDA) using Python and also uncover how simple EDA can be extremely helpful in performing preliminary data analysis.

  16. PDF Chapter 3: Method (Exploratory Case Study)

    Field The method phenomenon a variety used for an exploratory case (Merriam, of a resources, understanding evaluating Yin (2010) of described a qualitative the perceptions analyzing qualitative evaluations of people regarding Qualitative methods and presenting the findings. to produce as collecting a particular.

  17. Exploratory Data Analysis: A Guide with Examples

    In this post, we will provide a step-by-step guide to EDA with some examples. Step 1: Data Collection and Preparation The first step in EDA is to collect the data and prepare it for analysis. This ...

  18. PDF Chapter 4 Exploratory Data Analysis

    Chapter 4Exploratory D. ta AnalysisA rst look at the data.As mentioned in Chapter 1, exploratory data analysis or \EDA" is a critical rst step in an. lyzing the data from an experiment. H. mong the explanatory variables, andassessing the direction and rough size of relationships betwee.

  19. Model Selection

    As you might have noticed, in this case study, we have so far performed 3 tasks from the above list i.e. 'Description' (exploratory data analysis), 'Association' (association analysis), and 'Classification' (decision trees: CART and C4.5).I must say, exploratory data analysis (EDA) is an integral part of every data science project.

  20. Exploratory Data Analysis of a US Retail Sample Store.

    3.0.Quick Statistics. The dataset contains 13 columns and 9,994 rows, all sales records for 18 sub-categories ranging from copiers to tables. Below is a brief statistical summary of the dataset ...

  21. NLP Part 3

    Let's get a better understanding of our newly cleansed dataset. Exploratory Data Analysis (EDA) is the process by which the data analyst becomes acquainted with their data to drive intuition and begin to formulate testable hypotheses. This process typically makes use of descriptive statistics and visualizations.

  22. Exploratory Data Analysis on sample dataset: Sample Superstore

    EDA or exploratory data analysis is a very important step when working with data. This step helps one get familiar with the dataset before performing any analysis or predictions on it. I took the ...

  23. Association Analysis

    This is a continuation of the case study example of marketing analytics we have been discussing for the last few articles. You can find the previous parts at the following links ( Part 1, Part 2, and Part 3).. In the last part, we discussed exploratory data analysis (EDA: Part 3).In this article we will talk about association analysis, a helpful technique to mine interesting patterns in ...

  24. Program for Saturday, June 29th

    The findings of this exploratory research suggest that 1) a data-skills gap exists in local government, 2) data skill expertise contributes indirectly to a chief administrator's satisfaction in their organization's overall data skills, and 3) data capture, curation, and analysis skills have smaller skill gaps compared to data communication ...

  25. Perceived Behavioral Factors and Individual Investor Stock Market

    The partially completed survey forms were eliminated. A sample of 362 respondents was presented for the investigation, which is statistically significant with respect to the literature to proceed for analysis. The data of the current study were collected between January 2021 and June 2021.