Exploratory Data Analysis (EDA)
9. Case Studies
Introduction
In this diverse collection of case studies, the power of Exploratory Data Analysis (EDA) shines as a critical tool for understanding and extracting insights from various datasets across different domains. Each case study focuses on a specific problem domain, ranging from e-commerce customer behavior analysis to predictive maintenance in manufacturing industries. The primary goal of these analyses is to leverage EDA techniques to unravel hidden patterns, relationships, and trends within the data, leading to data-driven decisions and optimized strategies.
Throughout these case studies, diverse datasets play a pivotal role in providing a deep understanding of the subject matter. These datasets encompass online retail transaction records, electronic health records, credit card transaction data, environmental sensor readings, marketing campaign metrics, social media sentiments, GPS data from vehicles, student academic records, agricultural data, and manufacturing equipment sensor readings. Armed with these diverse datasets, analysts embark on an EDA journey, employing tools like Python, R, Pandas, Matplotlib, Seaborn, Plotly, Geopandas, and more.
The EDA process unfolds through several key steps, including data cleaning and preprocessing to ensure data quality, exploration of variables and patterns, and compelling visualizations that bring insights to life. The results of EDA reveal essential facets of each domain, such as best-selling products, customer segmentation, healthcare outcomes, fraudulent transactions, pollution hotspots, successful marketing campaigns, sentiment analysis, optimized transportation routes, student academic performance factors, and predictive equipment maintenance.
These case studies illustrate the indispensable role of Exploratory Data Analysis in empowering decision-makers across industries. By unlocking the valuable insights buried within vast datasets, EDA empowers businesses and organizations to optimize their strategies, enhance customer experiences, improve healthcare quality, prevent fraud, protect the environment, target marketing efforts, and optimize logistics. As a foundational step in the data analysis journey, EDA serves as a powerful bridge between raw data and actionable knowledge, opening up a world of possibilities for data-driven innovation and problem-solving.
Case Studies
E-Commerce Customer Behavior Analysis:
Description: This case study aims to understand customer behavior in an online retail business to improve marketing and product strategies.
Dataset: Online retail dataset containing transactional records, customer IDs, product details, timestamps, and order quantities.
Tools: Python with Pandas for data manipulation, Matplotlib and Seaborn for data visualization.
Steps using EDA:
- Data cleaning and preprocessing to handle missing values and remove duplicates.
- Exploring product popularity, customer purchase patterns, and customer segmentation.
- Visualizing purchase trends, seasonal patterns, and revenue growth.
Results: Identifying best-selling products, peak shopping hours, customer segments, and trends in revenue growth.
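The cleaning and exploration steps of this case study can be sketched with Pandas. This is a minimal illustration under stated assumptions, not the case study's actual code: the rows are made up, and the column names (`customer_id`, `product`, `quantity`, `timestamp`) are hypothetical stand-ins for the fields listed under Dataset.

```python
import pandas as pd

# Hypothetical transaction records; column names are assumptions for illustration.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, None],
    "product": ["mug", "mug", "lamp", "mug", "desk", "lamp"],
    "quantity": [2, 2, 1, 3, 1, 1],
    "timestamp": pd.to_datetime([
        "2023-01-05 10:15", "2023-01-05 10:15", "2023-01-05 14:30",
        "2023-01-06 10:45", "2023-01-06 20:05", "2023-01-07 09:00",
    ]),
})

# Cleaning: drop exact duplicate rows, then rows without a customer ID.
clean = orders.drop_duplicates().dropna(subset=["customer_id"])

# Product popularity: total units sold per product, best seller first.
best_sellers = clean.groupby("product")["quantity"].sum().sort_values(ascending=False)

# Peak shopping hour: the hour of day with the most orders.
peak_hour = clean["timestamp"].dt.hour.value_counts().idxmax()

print(best_sellers)
print("Peak hour:", peak_hour)
```

Whether to drop or impute rows with missing customer IDs depends on the analysis; dropping is used here only to keep the sketch short.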
Healthcare Patient Outcomes Analysis:
Description: This case study focuses on analyzing patient outcomes based on electronic health records (EHR) to improve healthcare quality.
Dataset: Electronic health records (EHR) with patient demographics, medical history, diagnoses, treatments, and patient outcomes.
Tools: R with dplyr and ggplot2 for data wrangling and visualization.
Steps using EDA:
- Data preprocessing and cleaning to handle missing values and outliers.
- Exploring patient demographics, disease prevalence, and treatment efficacy.
- Visualizing readmission rates, mortality rates, and correlations between variables.
Results: Identifying factors influencing patient outcomes, trends in readmission rates, and potential areas for healthcare improvement.
Financial Fraud Detection:
Description: This case study aims to detect fraudulent transactions in credit card data to enhance fraud prevention systems.
Dataset: Credit card transaction data with details such as transaction amounts, locations, timestamps, and customer IDs.
Tools: Python with Pandas for data preprocessing, Matplotlib and Seaborn for visualization, and machine learning algorithms for fraud detection.
Steps using EDA:
- Data cleaning and preprocessing to handle imbalanced classes and outliers.
- Exploring transaction patterns, correlations, and frequency of fraud cases.
- Visualizing transaction amounts, fraudulent vs. non-fraudulent transactions, and potential fraud hotspots.
Results: Identifying unusual spending patterns, high-risk transactions, and improving fraud detection accuracy.
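Two of the checks named in this case study, class imbalance and outlying amounts, can be sketched with the Python standard library. The transactions below are made up, and Tukey's 1.5 × IQR fence is one common outlier rule chosen here for illustration; the case study does not say which rule it used.

```python
from statistics import quantiles

# Hypothetical labelled transactions: (amount, is_fraud). Values are made up.
transactions = [
    (12.5, 0), (40.0, 0), (23.9, 0), (18.0, 0), (35.2, 0),
    (27.4, 0), (950.0, 1), (31.0, 0), (15.5, 0), (1200.0, 1),
]

amounts = [amount for amount, _ in transactions]
labels = [label for _, label in transactions]

# Class balance: the share of transactions labelled as fraud.
fraud_rate = sum(labels) / len(labels)

# Tukey's rule: flag amounts above Q3 + 1.5 * IQR as unusually large.
q1, _, q3 = quantiles(amounts, n=4)
upper_fence = q3 + 1.5 * (q3 - q1)
outliers = [a for a in amounts if a > upper_fence]

print(f"Fraud rate: {fraud_rate:.0%}")
print("Unusually large amounts:", outliers)
```

In real card data the fraud rate is usually far lower than in this toy sample, which is why the imbalance needs to be measured before any model is trained.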
Environmental Sensor Data Analysis:
Description: This case study involves analyzing environmental sensor data to understand air quality trends and pollution sources.
Dataset: Air quality sensor data with measurements of pollutants like CO2, PM2.5, and ozone at various locations and timestamps.
Tools: Python with Pandas for data cleaning, Plotly for interactive visualizations, and geographical libraries for mapping.
Steps using EDA:
- Data preprocessing to handle missing values and outliers in sensor readings.
- Exploring pollutant levels, spatial distributions, and temporal trends.
- Visualizing pollution hotspots and correlations between pollutants.
Results: Identifying areas with poor air quality, trends in pollutant levels, and potential pollution sources.
Marketing Campaign Performance Analysis:
Description: This case study involves analyzing the performance of marketing campaigns to optimize marketing strategies.
Dataset: Marketing campaign data with details of campaigns, customer responses, conversions, and costs.
Tools: R with tidyverse for data manipulation, ggplot2 for visualization, and A/B testing tools for campaign performance analysis.
Steps using EDA:
- Data cleaning and preprocessing to handle missing data and inconsistencies.
- Exploring campaign performance metrics, customer response rates, and conversion rates.
- Visualizing campaign effectiveness, customer segmentation, and A/B test results.
Results: Identifying successful marketing campaigns, high-converting strategies, and customer segments with the best response rates.
Social Media Sentiment Analysis:
Description: This case study aims to analyze social media data to gauge public sentiment about products, brands, or events.
Dataset: Twitter or Facebook data with text posts, timestamps, and user engagement metrics.
Tools: Python with TextBlob or NLTK for sentiment analysis, WordCloud for word visualization, and Matplotlib for plotting.
Steps using EDA:
- Text preprocessing to remove stopwords and special characters and to convert text to lowercase.
- Analyzing sentiment scores, word frequencies, and trending topics.
- Visualizing word clouds to highlight positive and negative sentiment words.
Results: Identifying overall sentiment towards products or brands, popular topics, and public perception trends.
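The preprocessing and scoring steps above can be illustrated without any third-party library. The sketch below is an assumption-laden stand-in, not TextBlob or NLTK: the stopword list and the positive and negative word sets are tiny, hypothetical lexicons, and the score is a crude count of positive minus negative words.

```python
import re
from collections import Counter

# Tiny illustrative word lists (assumptions); real work would use NLTK or TextBlob.
STOPWORDS = {"the", "is", "a", "and", "this", "it", "i", "was", "but"}
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"slow", "broken", "terrible"}

def tokenize(post: str) -> list[str]:
    """Lowercase the text, strip special characters, and drop stopwords."""
    words = re.findall(r"[a-z']+", post.lower())
    return [w for w in words if w not in STOPWORDS]

def sentiment_score(tokens: list[str]) -> int:
    """Positive minus negative word count (a crude lexicon score)."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

posts = [
    "I LOVE this phone, the camera is great!!!",
    "Delivery was slow and the screen is broken :(",
    "Great battery, fast charging.",
]

tokens_per_post = [tokenize(p) for p in posts]
scores = [sentiment_score(t) for t in tokens_per_post]

# Word frequencies across all posts, the raw material for a word cloud.
word_counts = Counter(w for tokens in tokens_per_post for w in tokens)

print(scores)
print(word_counts.most_common(3))
```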
Transportation and Logistics Optimization:
Description: This case study involves optimizing transportation and logistics operations to improve efficiency and reduce costs.
Dataset: GPS data from vehicles, delivery records, traffic information, and location details.
Tools: Python with Geopandas for geospatial analysis, NetworkX for route optimization, and visualization libraries for maps.
Steps using EDA:
- Data preprocessing to handle GPS data, normalize timestamps, and clean location data.
- Exploring traffic patterns, congestion points, and delivery routes.
- Visualizing optimized routes and delivery efficiency.
Results: Identifying bottlenecks, optimizing delivery schedules, and reducing transportation costs.
Education Performance Analysis:
Description: This case study focuses on analyzing student performance data to understand factors influencing academic outcomes.
Dataset: Student academic records with grades, attendance, test scores, and demographics.
Tools: R with tidyr and dplyr for data tidying, ggplot2 for visualizations, and machine learning models for performance prediction.
Steps using EDA:
- Data cleaning and preprocessing to handle missing grades and attendance records.
- Exploring student demographics, grade distributions, and attendance patterns.
- Visualizing performance trends, correlations between variables, and predicted academic performance.
Results: Identifying factors affecting student academic performance, predicting at-risk students, and designing targeted interventions.
Agricultural Yield Prediction:
Description: This case study aims to predict crop yields based on agricultural data to optimize planting strategies.
Dataset: Agricultural data with historical weather data, soil characteristics, crop details, and yields.
Tools: Python with NumPy and Pandas for data manipulation, Scikit-learn for regression models, and visualization libraries for plotting.
Steps using EDA:
- Data preprocessing to handle missing weather data and crop details.
- Exploring weather patterns, correlations between weather variables, and crop yields.
- Visualizing yield predictions and comparing them with actual yields.
Results: Identifying the correlation between weather patterns and crop yields, optimizing planting schedules, and predicting future harvest outcomes.
Predictive Maintenance in Manufacturing:
Description: This case study focuses on predictive maintenance in manufacturing industries to reduce downtime and improve productivity.
Dataset: Sensor data from manufacturing equipment, including temperature, vibration, and other performance indicators.
Tools: Python with Pandas for data preprocessing, Plotly for visualization, and machine learning algorithms for predictive maintenance.
Steps using EDA:
- Data cleaning and preprocessing to handle missing sensor readings and outliers.
- Exploring sensor data patterns, correlations between sensor variables, and anomalies.
- Visualizing predicted failures and comparing them with actual breakdowns.
Results: Identifying early signs of equipment failure, scheduling maintenance proactively, and minimizing unplanned downtime.
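One simple way to look for anomalies in sensor data is a rolling z-score, sketched below with the standard library. This is an illustrative technique chosen here, not the method used in the case study; the function name `rolling_anomalies`, the window size, the threshold, and the vibration readings are all assumptions.

```python
from statistics import mean, stdev

def rolling_anomalies(readings, window=5, threshold=3.0):
    """Return indices whose value deviates from the mean of the preceding
    `window` readings by more than `threshold` sample standard deviations."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged

# Hypothetical vibration readings with one spike.
vibration = [0.50, 0.52, 0.49, 0.51, 0.50, 0.53, 0.51, 1.40, 0.52, 0.50]
anomalies = rolling_anomalies(vibration)
print("Anomalous reading indices:", anomalies)
```

Note that a spike inflates the standard deviation of the windows that follow it, so readings right after an anomaly are harder to flag. A robust statistic such as the median absolute deviation would reduce that effect.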
In each case study, the Exploratory Data Analysis (EDA) process plays a crucial role in uncovering insights, trends, and relationships within the data. By using various data cleaning, exploration, and visualization techniques, analysts can gain valuable insights to make data-driven decisions and optimize processes in different domains. The results obtained through EDA inform subsequent analyses, help refine strategies, and lead to improvements in various aspects of the business or domain being studied.
Exploratory Data Analysis-Retail
1. Introduction
This project performs exploratory data analysis on the “Superstore” dataset, a task provided by The Sparks Foundation for its Graduate Rotational Internship Program.
2. Task Summary
Using the dataset provided, find the weak areas where the company is not performing well, so that the business manager can focus on those areas to increase future profit, and draw recommendations from the analysis.
For this analysis, we will answer the following questions:
Which states sell lower quantities and are less profitable, and which states are doing well?
What insights can we find by analyzing “Quantity” and “Profit” by “Region” and “Category”?
Do “Ship Mode” and “Discount” have an impact on the profit and the quantity sold?
After analyzing the data to answer the above questions, what recommendations can we draw to help the less profitable categories and regions do better?
4. Exploratory Analysis
We will begin the analysis process by summarizing our data which includes calculating average and sum of sales, quantity, profit and grouping by region and categories.
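The aggregation described above might look like the following in Pandas. This is a sketch, not the original notebook's code (which is not shown in the text): the rows are made up, and only the column names `Region`, `Category`, `Sales`, `Quantity`, and `Profit` are taken from the analysis.

```python
import pandas as pd

# A few made-up rows using the Superstore column names mentioned in the text.
superstore = pd.DataFrame({
    "Region":   ["West", "West", "Central", "Central", "East", "East"],
    "Category": ["Furniture", "Technology", "Furniture", "Office Supplies",
                 "Technology", "Office Supplies"],
    "Sales":    [200.0, 500.0, 300.0, 120.0, 450.0, 80.0],
    "Quantity": [2, 3, 4, 6, 2, 5],
    "Profit":   [30.0, 120.0, -45.0, 40.0, 90.0, 25.0],
})

# Sum and average of sales, quantity, and profit per region and category.
summary = (
    superstore
    .groupby(["Region", "Category"])[["Sales", "Quantity", "Profit"]]
    .agg(["sum", "mean"])
)

# Total profit per region as a share of overall profit, in percent.
profit_share = (
    superstore.groupby("Region")["Profit"].sum() / superstore["Profit"].sum() * 100
).round(1)

print(summary)
print(profit_share)
```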
First, we plot a graph to analyze profit by region and category.
Observation
→ The West region performs well in all categories.
→ In the Furniture category, the Central region shows a loss.
→ Among the three categories, Office Supplies is the most profitable, followed by Technology.
→ Furniture is the least profitable category in all regions.
Now let’s see whether “Quantity” and “Profit” have a positive correlation.
As we can see from the above plot, there is no positive correlation between the quantity sold and the profit earned, so we can safely say that the quantity sold is not what drives profitability across categories and regions.
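The visual impression from a scatter plot can be backed up with a correlation coefficient. The sketch below uses made-up order lines (only the column names come from the analysis) and Pandas' `Series.corr`, which computes the Pearson coefficient by default.

```python
import pandas as pd

# Made-up order lines; only the column names come from the text.
orders = pd.DataFrame({
    "Quantity": [1, 2, 3, 4, 5, 6],
    "Profit":   [40.0, -10.0, 55.0, 5.0, -20.0, 30.0],
})

# Pearson correlation: values near 0 indicate no linear relationship,
# values near +1 or -1 a strong positive or negative one.
r = orders["Quantity"].corr(orders["Profit"])
print(f"Pearson r = {r:.2f}")
```

A coefficient near zero only rules out a linear relationship, so it is still worth looking at the plot for non-linear patterns.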
4.2 Percentage
→ For a better visualization, let’s plot a pie chart.
From the above pie chart we can see that:
→ 60% of total sales is in the Office Supplies category,
→ 32% of total sales is in the West region.
Now let’s see which states, subcategories, and ship modes are more profitable.
From the above graph we can see that:
→ “Copiers” is the most profitable subcategory, followed by “Phones”,
→ Three subcategories, “Bookcases”, “Supplies”, and “Tables”, are noticeably lower in profit and show a loss.
Profitable States
The top five states by profitability are California, Michigan, New York, Virginia, and Washington.
Shipping mode percentage and Profit
As we can see from the above chart, Standard Class is the most profitable shipping mode, followed by Second Class.
Shipping mode can be one of the factors in profitability, as the above visualization shows a trend.
Discount impact on average profit
The chart shows that the higher the discount, the lower the profit, so discounting can be another factor behind losses and low profitability.
To conclude our analysis:
The West region shows good performance in sales and profit, accounting for 32% of total sales; on the other hand, the Central region is the least-performing region.
Office Supplies is the most profitable category, with 60% of total sales, followed by Technology.
Furniture is the least profitable category in all regions.
Quantity and profit do not have a positive correlation.
Copiers is the most profitable subcategory, followed by Phones.
The Bookcases, Supplies, and Tables subcategories show a loss.
The top five states that are performing well are California, Michigan, New York, Virginia, and Washington.
Recommendation
1. For the states and subcategories that are already profitable, we need to focus more on promoting sales and distribution of the products.
2. For the items that incur a loss, we need to remove discounts and consider which shipping mode is most cost-effective for the region the item is shipped to.
3. Most of the more profitable categories use the Standard Class ship mode, so it would be better to charge an extra fee to customers who prefer other shipping modes.
4. Discounts are best reserved for customers who order in bulk and agree to use the most cost-effective shipping mode available.
Exploratory Data Analysis: A Comprehensive Guide to Make Insightful Decisions
Today, data has become ubiquitous, affecting every aspect of our lives. From social media to financial transactions to healthcare, the generated data is abundant, which has ignited a data-driven revolution in every industry.
As data sets grew, so did the complexity of understanding them, and exploratory data analysis emerged as a way to manage that complexity. The data analysis process has many stages, such as cleansing, transforming, and analyzing the data to build models with relevant insights.
Exploratory Data Analysis is all about knowing your data in depth, such as significant correlations and repeating patterns, to select the proper analysis methods. In this article, you will learn about Exploratory Data Analysis, its key features, how to perform EDA, its types, the integration challenges, and how to overcome them.
What is Exploratory Data Analysis?
Understanding your data in depth before performing any analysis on it is essential. You need to know the patterns, the variables, and how those variables relate to each other, among other things. EDA (exploratory data analysis) is the process that examines, summarizes, visualizes, and understands your data to generate data-driven conclusions and further study of the information.
Let’s look at some of the key attributes of EDA:
Summarizing the Data
EDA allows data summarization using descriptive statistics, which helps understand the deviation, central tendency, and distribution of the values.
Data Visualizing
By implementing EDA, you can use a variety of visualization techniques, such as histograms, line charts, and scatter plots, to explore patterns, relationships, or trends within the data set.
Data Cleansing
Data cleansing is crucial for driving meaningful insights and ensuring accuracy. EDA helps you clean your data by identifying errors, inconsistencies, duplicates, or missing values.
What are the Types of Exploratory Data Analysis?
Understanding exploratory data analysis techniques helps with efficient data analysis and meaningful decisions. Let’s look at three primary exploratory data analysis types.
1. Univariate Exploratory Data Analysis
It is a type of analysis in which you study one variable at a time. This helps you understand the characteristics of a particular variable without getting distracted by other factors. For example, focusing on product sales to know which product is doing better.
There are two types of Univariate Analysis:
- Graphical Methods: Graphical methods use visual tools to understand the characteristics of a single variable. The visual tools may include histograms, box plots, density plots, etc.
- Non-Graphical Methods: In non-graphical methods, numerical values can be used to study the characteristics of a single variable. These numerical measures include descriptive analysis, such as mean, median, mode, dispersion measures, or percentiles.
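The non-graphical measures listed above can be computed directly with Python's built-in `statistics` module. The daily sales figures below are hypothetical and stand in for the single variable being studied.

```python
import statistics

# Hypothetical daily unit sales for one product.
daily_sales = [12, 15, 15, 18, 20, 22, 22, 22, 30, 45]

summary = {
    "mean": statistics.mean(daily_sales),
    "median": statistics.median(daily_sales),
    "mode": statistics.mode(daily_sales),
    "stdev": statistics.stdev(daily_sales),               # sample standard deviation
    "quartiles": statistics.quantiles(daily_sales, n=4),  # Q1, Q2, Q3
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Here the mean sits above the median because of the single large value (45), a quick hint that the distribution is right-skewed.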
2. Bivariate Exploratory Data Analysis
Bivariate analysis focuses on studying a relationship between two variables. It examines how a change in one variable can impact another. For example, the relationship between customer age and product category can be analyzed to determine whether age influences buying preferences. There are multiple ways to examine this relationship, both graphically and non-graphically.
The graphical methods include scatter plots, line charts, and bubble charts to visualize the relationship between two numeric values. The non-graphical method examines the relationship between two variables through correlation, regression analysis, or chi-square test.
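For two categorical variables, such as age group and product category, the chi-square test mentioned above compares observed counts with the counts expected if the variables were independent. The sketch below computes only the test statistic by hand; the contingency table is made up, and the helper name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(table: list[list[int]]) -> float:
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows are age groups, columns are product categories.
observed = [
    [30, 10],  # under 35: electronics, furniture
    [10, 30],  # 35 and over: electronics, furniture
]
stat = chi_square_statistic(observed)
print(f"chi-square = {stat:.1f}")
```

For a 2×2 table there is one degree of freedom, and the 5% critical value is about 3.84; a statistic well above that suggests age group and category preference are related in this toy sample.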
3. Multivariate Exploratory Data Analysis
Multivariate analysis helps to analyze and understand the relationships among more than two variables simultaneously. It helps unveil more complex associations and patterns within the data. For example, it explores the relationship between a person's height, weight, and age.
There are two types of Multivariate Analysis:
- Graphical Methods: These methods help analyze patterns or associations through visualization, such as scatter plots and heat maps.
- Non-Graphical Methods: These statistical techniques are used to make predictions, test a hypothesis, or draw conclusions. They include methods such as multiple regression analysis, factor analysis, cluster analysis, etc.
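A correlation matrix is a common first step in multivariate analysis and is the table that a heat map usually visualizes. The sketch below applies Pandas' `DataFrame.corr` to the height, weight, and age example; the measurements and column names are made up for illustration.

```python
import pandas as pd

# Hypothetical measurements for six people.
people = pd.DataFrame({
    "height_cm": [150, 160, 165, 172, 180, 185],
    "weight_kg": [50, 58, 63, 70, 80, 86],
    "age_years": [34, 22, 45, 29, 51, 38],
})

# Pairwise Pearson correlations between all numeric columns.
corr_matrix = people.corr()
print(corr_matrix.round(2))
```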
How to Perform Exploratory Data Analysis?
Let’s look at the Exploratory data analysis steps:
1. Data Collection
Collect relevant data from the various sources available, guided by the purpose of your analysis.
2. Inspecting the Data Variables
As the information collected is vast, you must first identify the critical variables that would affect your outcome or impact your decision.
3. Data Cleansing
In this step, you need to clean your data by identifying the errors, missing values, inconsistencies, or duplicate values.
4. Identifying Patterns and Correlations
Visualize your data sets using different data visualization tools to understand the patterns and relationships between variables.
5. Performing Descriptive Statistics
Calculate a statistical summary of your data using statistical methods or formulas.
6. Perform Advanced Analysis
Conduct an advanced analysis beyond descriptive statistics by defining objectives and specifying the questions you want answered. This allows you to gain deeper insights into your data and identify complex relationships.
7. Interpret Data
Interpret the results of your analysis and generate insights and conclusions based on the relationships, patterns, and trends.
8. Document and Report
Document the results of your analysis, including all the steps and techniques you have used. Mention your key findings and summarize the data inside the report, which you can use to share insights with your stakeholders.
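The inspection, cleansing, and descriptive-statistics steps above can be bundled into a small first-pass report. This is a sketch under assumptions: `quick_eda` is a hypothetical helper written for this example, and the data frame is made up.

```python
import pandas as pd

def quick_eda(df: pd.DataFrame) -> dict:
    """Collect a first-pass overview of a data frame."""
    return {
        "shape": df.shape,                                   # rows, columns
        "missing_per_column": df.isna().sum().to_dict(),     # cleansing: missing values
        "duplicate_rows": int(df.duplicated().sum()),        # cleansing: duplicates
        "numeric_summary": df.describe(),                    # descriptive statistics
    }

# Hypothetical data with one missing price and one duplicated row.
df = pd.DataFrame({"price": [9.99, 14.5, None, 14.5], "units": [3, 1, 2, 1]})
report = quick_eda(df)

print(report["shape"])
print(report["missing_per_column"])
print(report["duplicate_rows"])
print(report["numeric_summary"])
```

Saving such a report alongside the analysis also helps with the final step, documenting what the data looked like before any cleaning was applied.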
Exploratory Data Analysis Tools You Can Use
EDA can be conducted using different tools or software platforms. Let’s look at some of the popular tools used:
Python
Python’s simplicity makes it a preferable choice for data analysis. Its rich library ecosystem includes libraries like NumPy, Pandas, and Scikit-learn for numerical tasks, data manipulation, and machine learning. These libraries provide versatile toolsets for EDA and visualization. With Python, you can effectively handle and manage missing data and uncover valuable insights from large datasets.
R
R is a programming language designed for statistical computing and graphics. With R, you can handle and manipulate complex statistical datasets. It provides a rich set of tools for data cleaning, transformation, and analysis, allowing you to prepare data for further exploration. Additionally, R provides built-in analysis functionalities for performing detailed EDA and uncovering valuable insights from your data.
MATLAB
MATLAB stands for Matrix Laboratory. It is a high-level programming language and an interactive environment designed to perform numerical computations, data analysis, and visualizations. MATLAB is widely used to solve problems and run numerical simulations in mathematics, physics, finance, and engineering.
Jupyter Notebooks
Jupyter is an interactive computing environment that allows you to create and share documents containing texts, equations, live code, visualizations, and more. It supports multiple programming languages, which makes it a versatile data analysis and research tool. Jupyter Notebooks' flexible environment provides an interactive workspace for working on EDA projects.
What are the Challenges of EDA?
Before implementing EDA, you must know the challenges you might face ahead of time. Let’s look at some of them:
Data Unification
One of the most significant hurdles is data unification. Combining data from multiple sources, such as APIs, cloud platforms, or databases, before analysis can be a complex task. The challenge lies in the differences in data format or structure across these sources. For instance, customer data from CRM can be structured differently than website traffic data from analytics applications.
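As a small illustration of the unification problem, the sketch below joins a CRM extract and a web analytics extract whose key columns differ in name and type. The column names and values are hypothetical; `DataFrame.merge` with `indicator=True` is used so that records missing from either source stay visible.

```python
import pandas as pd

# Hypothetical CRM extract: integer IDs, CamelCase column names.
crm = pd.DataFrame({
    "CustomerID": [101, 102, 103],
    "FullName": ["Ana Ruiz", "Ben Cole", "Chen Li"],
})

# Hypothetical web analytics extract: string IDs, snake_case column names.
web = pd.DataFrame({
    "customer_id": ["101", "103", "104"],
    "page_views": [12, 7, 3],
})

# Align column names and key types before joining.
web_aligned = web.rename(columns={"customer_id": "CustomerID"})
web_aligned["CustomerID"] = web_aligned["CustomerID"].astype(int)

# Outer join keeps every customer; `_merge` records which source each row came from.
unified = crm.merge(web_aligned, on="CustomerID", how="outer", indicator=True)
print(unified)
```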
Data Quality
Performing EDA involves ensuring the quality of the data. Data drawn from multiple sources may contain inconsistencies, missing values, errors, etc., which can lead to incorrect conclusions.
Data Security
When handling large amounts of sensitive data, you might encounter security issues where unauthorized access or data breaches can become a crucial challenge.
Data Consistency
It is essential to reflect the updates or modifications in all related data sets, which helps maintain data consistency and synchronize your integration process.
Addressing these challenges might require a robust solution combining best practices for smooth data integration.
Simplifying the EDA Process with Airbyte
EDA is a crucial step in any data science project. However, the initial stages of EDA can be bogged down by challenges related to data preparation and consolidation. Here, Airbyte simplifies these challenges and empowers you to focus on the crucial task—extracting insights from your data.
Here’s how Airbyte helps you to streamline the EDA process:
Airbyte is a data integration and replication platform with an extensive library of 350+ pre-built connectors. The library covers a varied range of APIs, databases, and flat files, making the integration process smooth. Through the Connector Development Kit (CDK), you can also build customized connectors for your specific needs.
For complex transformations, you can seamlessly integrate Airbyte with dbt (data build tool). This powerful combination allows you to leverage dbt’s functionalities like data standardization, cleansing, and mapping before loading it into your target destination.
What’s more, Airbyte adheres to standards and regulations such as GDPR, SOC 2, and ISO certifications, supporting data security and compliance.
Use Cases & Examples of Exploratory Data Analysis
Exploratory data analysis is essential for understanding the trends and patterns among the data and using the information to derive insightful conclusions. Let’s look at exploratory data analysis examples and use cases.
EDA in Retail
EDA in retail can be performed to understand a particular product's sales patterns. Retailers can use the EDA tools to improve their sales by studying how different variables such as price, discount, and demographics contribute to increases or decreases in sales. They can also know which products are doing best and in which region the sales of a particular product are high.
EDA in Healthcare
EDA can be used to analyze clinical trial data and study the effectiveness of a particular drug or treatment. It helps you analyze patients' data, how they respond to a certain treatment, risk factors, etc. Insights from EDA can help healthcare professionals make more informed decisions about patients' care, treatment strategies, and resource allocation.
EDA isn’t just about examining data. It’s a robust technique that unlocks meaningful insights from even complex data structures. By employing EDA, you can gain a deeper understanding of your data, enabling you to identify areas for improvement and make strategic data-driven decisions for your business.
Association Analysis – Retail Case Study Example (Part 4)
Edward Scissorhands – by Roopam
This is a continuation of the case study example of marketing analytics we have been discussing for the last few articles. You can find the previous parts at the following links (Part 1, Part 2, and Part 3). In the last part, we discussed exploratory data analysis (EDA: Part 3). In this article we will talk about association analysis, a helpful technique to mine interesting patterns in customers’ transaction data. Association analysis can be used as a handy tool for extended exploratory data analysis. By the way, association analysis is also the core of market basket analysis or sequence analysis. Later in the article, we will use association analysis in our case study example to design effective offer catalogs for campaigns and also online store design (website).
Scissorhands
I must have been 9 or 10 years old when we had our first craft lecture at school. Craft lectures are called SUPW in India, an abbreviation for ‘Socially Useful Productive Work’. As a part of the first lecture, each student was provided with an A4 sized color paper and a pair of scissors. In the first lecture, excited kids with no direction discovered that they could cut a sheet in a virtually infinite number of ways. It was neither socially useful nor productive work, and created a lot of wasted paper. A more apt long form of SUPW in this case is ‘Some Useful Paper Wasted’. Later, with a more directed effort, we discovered that there are so many cool shapes hidden in a piece of paper as long as scissors are used wisely.
This is precisely the kind of experience many analysts have when they come across customers’ transaction data in companies. There is a wealth of information about customer behavior hidden in this data, but it is hard to figure out where to start. Transaction data can be sliced, diced, and grouped in infinitely many ways, similar to a piece of paper dissected with scissors. The key in both these cases is direction.
Hollywood Image of Data Analysis
The point I am trying to drive at here is that data analysis is a highly planned activity. As an analyst, never touch your data before you have a proper plan of action (hypotheses etc.) in place. Having said this, there are always going to be times when, as an analyst, you have to enter uncharted territories of data to find patterns. In these cases, I recommend you rely on machine learning algorithms or create your own modified algorithms specific to your requirements. In my opinion, machines are any day better than us humans at this task. Association analysis powered by the Apriori algorithm is one such technique to mine transaction data. Let’s explore association analysis in the next part.
Association Analysis
Association analysis, as you will discover soon, is primarily frequency analysis performed on a large dataset. Since datasets for most practical problems are large, you need clever algorithms like Apriori to manage association analysis. Let’s consider a much smaller transaction dataset to learn about association analysis. Here, each row or transaction number represents the market basket of a customer. For the subsequent product columns, 1 represents ‘bought the product in that transaction’, whereas 0 stands for ‘did not buy’.
Transaction # | Shirts | Trousers | Ties |
001 | 1 | 1 | 1 |
002 | 0 | 1 | 0 |
003 | 1 | 0 | 1 |
004 | 1 | 0 | 1 |
005 | 1 | 1 | 0 |
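There are a few association analysis metrics (i.e., support, confidence, and lift) that are really helpful in deciphering the information hidden in this kind of dataset. Let us explore these metrics and understand their usage. Support for the purchase of shirts and ties together in association analysis is defined as:
$$\text{Support}(\text{shirts} \cap \text{ties}) = \frac{\text{number of transactions with both shirts and ties}}{\text{total number of transactions}}$$
For our data, there are 3 transactions with both shirts and ties (shirts∩ties) out of a total of 5 transactions:
$$\text{Support}(\text{shirts} \cap \text{ties}) = \frac{3}{5} = 60\%$$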
60% is a fairly high value for support and you will rarely find such high values for support in real world examples. For real world problems with several product groups, support of 1% or at times even lower depending upon the nature of your problem is also useful.
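Confidence for the association is calculated using the following formula:
$$\text{Confidence}(\text{shirts} \rightarrow \text{ties}) = \frac{\text{number of transactions with both shirts and ties}}{\text{number of transactions with shirts}}$$
In our dataset, there are 3 transactions with both shirts and ties together out of 4 transactions with shirts. The calculation of confidence for our dataset is:
$$\text{Confidence}(\text{shirts} \rightarrow \text{ties}) = \frac{3}{4} = 75\%$$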
Again, you will rarely find such a high value of confidence in most real-world problems unless there are appealing combo offers on two products. A good value of confidence is again problem specific.
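A third useful metric for association analysis is lift; it is defined as:
$$\text{Lift}(\text{shirts} \rightarrow \text{ties}) = \frac{\text{Confidence}(\text{shirts} \rightarrow \text{ties})}{\text{Expected confidence}}$$
Expected confidence in the above formula is the presence of ties in the overall dataset, i.e., there are 3 transactions with tie purchases out of 5 (60%):
$$\text{Lift}(\text{shirts} \rightarrow \text{ties}) = \frac{75\%}{60\%} = 125\%$$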
The value for lift, 125%, shows that purchases of ties improve when customers buy shirts. The question you are asking here is: if a customer buys a shirt, does the chance of that customer buying ties go up, i.e., is the value of lift above 100%? Let us use our knowledge about association analysis for the case study example we have been working on.
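The three metrics can be checked against the small transaction table above with a few lines of Python. This is a direct frequency count for illustration, not the Apriori algorithm; the function names are chosen for this sketch.

```python
# The five transactions from the table above, as sets of purchased products.
transactions = {
    "001": {"shirts", "trousers", "ties"},
    "002": {"trousers"},
    "003": {"shirts", "ties"},
    "004": {"shirts", "ties"},
    "005": {"shirts", "trousers"},
}

def support(items: set[str]) -> float:
    """Share of transactions that contain every product in `items`."""
    hits = sum(items <= basket for basket in transactions.values())
    return hits / len(transactions)

def confidence(antecedent: set[str], consequent: set[str]) -> float:
    """Share of transactions with `antecedent` that also contain `consequent`."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent: set[str], consequent: set[str]) -> float:
    """Confidence divided by the expected confidence (support of `consequent`)."""
    return confidence(antecedent, consequent) / support(consequent)

shirts, ties = {"shirts"}, {"ties"}
print(f"support    = {support(shirts | ties):.0%}")
print(f"confidence = {confidence(shirts, ties):.0%}")
print(f"lift       = {lift(shirts, ties):.0%}")
```

Counting every candidate item set this way becomes impractical on large datasets with many products, which is the problem Apriori addresses by pruning infrequent item sets early.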
Retail Case Study Example – Association Analysis
With your data for formal shirts and ties we explored in the above example, you got support of 0.2% with confidence of 12% and lift of 509%. This implies that though there are fewer percentage records of transactions with both ties and shirts, once the customers buys formal shirts his chances of buying a tie goes up five fold.
DresSmart gives its customers the option to return undamaged products within 30 days for a full refund. You investigated customers who buy ties along with shirts further and found that the return rate of ties in these transactions is 3 times higher than other return rates. This is an indicator that customers are struggling to choose matching ties when ordering them online along with shirts. There is a need to improve this process on the company’s website. The idea is to reduce the product return rate while exploiting the full opportunity for cross-selling ties with shirts.
You have found some good clues to improve the profitability of your company through exploratory data analysis tools. Now you want to address the original objectives (Part 2) to improve profitability for campaign efforts. You will delve into serious modeling for this task next time around.
Sign-off Note
Hope you enjoy being Edward Scissorhands with your data! See you soon with the next part of this case study example where we will explore more about decision tree algorithms.
18 thoughts on “Association Analysis – Retail Case Study Example (Part 4)”
Liked the article. I wanted to know how feasible it is to use association analysis for online path analysis and clickstream data, for example to see how two different page URLs are used together. How can I use the Apriori algorithm to improve the model?
Thanks in advance. Poonam
Thanks Poonam, I am glad you enjoyed this article.
The way you have described your problem, I don’t see a reason why association/sequence analysis won’t work. The only thing you may have to keep in mind is that if the number of page URLs is too high (say 500 or more), you may want to group them into broader categories before doing your analysis.
For the Apriori algorithm you can use the ‘arules’ package in R. Association analysis is not so much a model as a method to create simple rules using frequency and basic probability analysis. Hence, the Apriori algorithm is not meant to improve any models but to find these rules efficiently. I hope this helped; let me know if you need any further help.
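To make the idea concrete, here is a minimal pure-Python sketch of the level-wise search that Apriori performs. It is not the ‘arules’ implementation recommended above; the sessions and page category names are hypothetical, and the page URLs are assumed to be already mapped to broad categories as suggested in the reply.

```python
from itertools import combinations

# Hypothetical clickstream sessions, with page URLs already mapped to broad
# categories (the category names are made up for illustration).
sessions = [
    {"home", "shirts", "ties", "cart"},
    {"home", "shirts", "cart"},
    {"home", "ties"},
    {"shirts", "ties", "cart"},
    {"home", "shirts", "ties"},
]

def frequent_itemsets(baskets, min_support):
    """Level-wise (Apriori-style) search for itemsets meeting `min_support`."""
    n = len(baskets)
    items = sorted({item for basket in baskets for item in basket})
    current = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while current:
        # Count each candidate and keep those meeting the support threshold.
        level = {}
        for candidate in current:
            count = sum(candidate <= basket for basket in baskets)
            if count / n >= min_support:
                level[candidate] = count / n
        frequent.update(level)
        # Build candidates one item larger from the survivors. The Apriori
        # property: every subset of a frequent itemset must itself be frequent,
        # so candidates with an infrequent subset are pruned without counting.
        size += 1
        survivors = list(level)
        candidates = {a | b for a in survivors for b in survivors if len(a | b) == size}
        current = [
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, size - 1))
        ]
    return frequent

itemsets = frequent_itemsets(sessions, min_support=0.6)
for itemset, supp in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{supp:.0%}")
```

With this toy data and a 60% threshold, the four single categories and four pairs pass, while no three-category combination does. Rules and their confidence and lift can then be computed from these supports.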
You are an awesome, innovative, resourceful teacher. Thank you for your wonderful articles.
I have read almost all of your articles, but I didn’t find any article on the maximum likelihood estimator (MLE). I request you to write an article on OLS vs MLE vs Bayesian estimators with examples.
Thanks Rajanna for the kind words. I am really happy you are enjoying the articles. Will discuss Maximum Likelihood and other techniques in some later articles.
Could you please list here the URLs of the previous three parts of this blog?
Here you go: Part 1 , Part 2 , and Part 3
Roopam, thanks for presenting these articles. I came across the following situation while working on association rules. Please let me know how to select the best rule in this situation.
support  confidence  lift  rule
0.5      0.6         2.86  lhs = Rin, rhs = surf excel
0.5      0.6         2.86  lhs = Rin, rhs = dettol
0.4      0.5         2.86  lhs = diaper, rhs = surf excel
Good job turning this case study into an interesting story. Like a good book, I can’t put it down before I learn how it ends! Thanks for educating the world on how useful yet not frightening data analysis can be.
Thank you, I am really happy you are enjoying this case, and learning from it.
Hello Roopam, thanks for publishing such an informative article in simple layman’s terms.
Please let me know if I am missing something here:
Expected confidence, P(Ties), should be read as 3/5, as I can see only 3 ties were bought in this dataset; however, you have mentioned 4/5 in your calculation. Please correct me if my observation is wrong.
Expected confidence in the above formula is presence of ties in the overall dataset i.e. there are 4 instances of ties purchase out of 5
Thanks Abhinav, that was a typo; I have corrected it.
Excellent material, thanks a lot! 🙂
The links to parts 1,2 and 3 are dead, would be nice to read the whole series.
Some glitch with WordPress. Try now.
You could find the whole series at this link : http://ucanalytics.com/blogs/category/marketing-analytics/retail-case-study-example/
Thank you very much for these case studies. I am preparing for my Data Science Consultant interviews these days and these are helping me a lot. The next round in most companies I am interviewing with is an analytical case study. The HR described it as, “they will give a scenario, ask what data you will need, what algorithms you can run, what risks are involved, etc.” Apart from your blog, I haven’t found many other good case studies which reflect the scenario I am most likely to get.
I have a question and some requests: 1) How should I come up with risks for any particular scenario? Is there a framework involved? 2) Apart from the Case Studies that you currently have on the blog, are there any more that you can share. 3) Can you point me to any other blogs/posts/videos/links you have come across which contain similar work?
Thank you very much. This is awesome work and is most likely helping a lot of people.
1) How should I come up with risks for any particular scenario? Is there a framework involved?
Risk is an extremely wide concept but analytically think of it as the probability of things going outside the expected business boundaries. You may find this credit risk case study useful http://ucanalytics.com/blogs/category/risk-analytics/banking-risk-case-study-example/
There are several great websites with good explanations of statistical and machine learning tools and coding. However, I didn’t come across any website focusing completely on creative business problem solving and case studies – the way data science professionals do it in the real world. This was precisely the reason I started this blog. I am glad it helped you. All the best.
Hi, Roopam,
I am new to data analytics and have, you could say, zero experience with algorithms. So far my approach has been basically instinct-based. The point you made, that data analysis is more planning than instinct, is awesome… I hope to learn from your blog. I have decided to go through each of the posts you have published…
Regards, Sushant
You are a really good storyteller (with concepts). I must say I enjoyed each and every line.
Thanks a lot.
COMMENTS
Exploratory Data Analysis (EDA) - Retail Case Study Example (Part 3) For the last couple of weeks we have been working on a marketing analytics case study example (read Part 1 and Part 2 ). In the last part ( Part 2) we defined a couple of advanced analytics objectives based on the business problem at an online retail company called DresSmart ...
You got the prediction, without first understanding the data. This is exactly where the importance of Exploratory Data Analysis (EDA) (as defined by Jaideep Khare) comes in, which, unfortunately, is a commonly undervalued step in the data science process. EDA is so important for 3 reasons (at least) as stated below:
Exploratory data analysis sits at the core of any insightful data work. Performing log analysis in search for threats is no different. Jun 5. Afrifa Kwame. Cyclistic Bike-Share Analysis Case Study ...
This case study demonstrates how explorative data analysis and basic statistics helped reduce the inefficiencies in the retail inventory and ordering process of fresh foods within grocery chains. Low shelf life and fluctuating demand had led to the need to hold clearance sales with zero or negative margin or to write off the inventory as ...
...Exploratory data analysis is a powerful tool. A diligent EDA is an absolute must to put your advanced business analytics in the right direction.
Introduction. Exploratory Data Analysis helps in identifying any outlier data points, understanding the relationships between the various attributes and structure of the data, recognizing the important variables. It helps in framing questions and visualizing the results, paving the way to make an informed choice of the machine learning ...
The best way to learn a skill is by applying it on hands-on projects, so today we will perform exploratory data analysis using SQL in a retail database to answer the following questions: The first…
Explore and run machine learning code with Kaggle Notebooks | Using data from Sample super store. ... Exploratory Data Analysis - Retail. Python · Sample super store. Notebook ...
Exploratory Data Analysis (EDA) is an essential phase in the data analysis process. It involves a comprehensive and visual examination of a dataset to discover patterns, trends, and hidden insights.
This article was published as a part of the Data Science Blogathon. Exploratory Data Analysis, or EDA, is an important step in any Data Analysis or Data Science project. EDA is the process of investigating the dataset to discover patterns, and anomalies (outliers), and form hypotheses based on our understanding of the dataset.
Brandttitude™ is a unique analytics platform that executes exploratory data analytics, predictive modeling and self-service analytics to deliver compelling metrics in diverse categories. It provides accurate and actionable insights to decision-makers across various industries through a unified view of disparate data.
Exploratory Data Analysis-Retail Selamawit 2023-05-02. 1. Introduction; 2. Task Summary; 3. Process. ... 1. Introduction. Performing "Exploratory Data Analysis ... There is a loss that we can see in Book Case, Supplies and Table subcategory. Top five States that are performing good are: ...
3. Multivariate Exploratory Data Analysis. Multivariate analysis helps to analyze and understand the relationship between two or more variables simultaneously. It helps unveil more complex associations and patterns within the data. For example, it explores the relationship between a person's height, weight, and age.
Image by GraphicMama-team from Pixabay. Before venturing on to any data science project it is important to pre-process the data and also to explore the data. Today we will discuss a very basic topic of exploratory data analysis (EDA) using Python and also uncover how simple EDA can be extremely helpful in performing preliminary data analysis.
Field The method phenomenon a variety used for an exploratory case (Merriam, of a resources, understanding evaluating Yin (2010) of described a qualitative the perceptions analyzing qualitative evaluations of people regarding Qualitative methods and presenting the findings. to produce as collecting a particular.
In this post, we will provide a step-by-step guide to EDA with some examples. Step 1: Data Collection and Preparation The first step in EDA is to collect the data and prepare it for analysis. This ...
Chapter 4: Exploratory Data Analysis. A first look at the data. As mentioned in Chapter 1, exploratory data analysis or "EDA" is a critical first step in analyzing the data from an experiment. ... among the explanatory variables, and assessing the direction and rough size of relationships between ...
As you might have noticed, in this case study, we have so far performed 3 tasks from the above list i.e. 'Description' (exploratory data analysis), 'Association' (association analysis), and 'Classification' (decision trees: CART and C4.5). I must say, exploratory data analysis (EDA) is an integral part of every data science project.
3.0.Quick Statistics. The dataset contains 13 columns and 9,994 rows, all sales records for 18 sub-categories ranging from copiers to tables. Below is a brief statistical summary of the dataset ...
Let's get a better understanding of our newly cleansed dataset. Exploratory Data Analysis (EDA) is the process by which the data analyst becomes acquainted with their data to drive intuition and begin to formulate testable hypotheses. This process typically makes use of descriptive statistics and visualizations.
EDA or exploratory data analysis is a very important step when working with data. This step helps one get familiar with the dataset before performing any analysis or predictions on it. I took the ...
This is a continuation of the case study example of marketing analytics we have been discussing for the last few articles. You can find the previous parts at the following links (Part 1, Part 2, and Part 3). In the last part, we discussed exploratory data analysis (EDA: Part 3). In this article we will talk about association analysis, a helpful technique to mine interesting patterns in ...
The findings of this exploratory research suggest that 1) a data-skills gap exists in local government, 2) data skill expertise contributes indirectly to a chief administrator's satisfaction in their organization's overall data skills, and 3) data capture, curation, and analysis skills have smaller skill gaps compared to data communication ...
The partially completed survey forms were eliminated. A sample of 362 respondents was presented for the investigation, which is statistically significant with respect to the literature to proceed for analysis. The data of the current study were collected between January 2021 and June 2021.