
Data science master's theses.

The Master of Science in Data Science program requires the successful completion of 12 courses to obtain a degree. These requirements cover six core courses, a leadership or project management course, two required courses corresponding to a declared specialization, two electives, and a capstone project or thesis. This collection contains a selection of master's theses and capstone projects by MSDS graduates.

Master of Science in Engineering Data Science

Data science is poised to play a vital role in research and innovation in the 21st century. Google, Facebook, Amazon, and YouTube are just a few prominent examples that highlight the increasing impact of data science on our day-to-day lives. The so-called data revolution, especially in the field of engineering, is bringing modern society to the cusp of a science-fiction-like transformation.

Engineering Data Science is a broad field that encompasses predictive modeling and data-driven design of engineering systems. Applications range from the health and environmental sciences to materials science, manufacturing, autonomous cars, image processing, and cybersecurity.

The demand for graduates with a data science background is already high and is growing rapidly across a wide range of industries worldwide. Houston, being the energy capital of the world as well as the home of a thriving healthcare industry, is also seeing a persistent demand for a workforce well-trained in data science. To provide state-of-the-art training for a data-centric workforce, the Cullen College of Engineering offers a Master of Science in Engineering Data Science.

The Master of Science in Engineering Data Science at the University of Houston is a 10-course graduate program with both thesis and non-thesis options.

A four-year bachelor's degree in engineering, an engineering-related field, computer science, data science, or statistics is required to apply for the Engineering Data Science program.

The degree plan consists of courses in three primary categories. These courses may be offered online or face-to-face in a classroom setting (a fully online M.S. in Engineering Data Science is not available at this time).

The Master of Science in Engineering Data Science is a Science, Technology, Engineering, and Mathematics (STEM) degree. The STEM OPT extension provides a 24-month period of temporary training for F-1 visa students in an approved STEM field. Please visit this link for more information.

Application

For information on admission requirements and the application process, please click here.

Application Deadlines (International and Domestic)
Fall Semester: March 15 (*Priority), May 15 (Regular)
Spring Semester: September 15

*** This program does not offer summer intake ***

Degree Plan

The MS in Engineering Data Science program requires 30 credit hours (10 courses). Both thesis and non-thesis options are offered.

Core Courses

9 Credit Hours / 3 courses (for both thesis and non-thesis options)

Course Code Course Name Credit Hours
EDS 6333 Probability and Statistics 3
EDS 6340 Introduction to Data Science 3
EDS 6342 Introduction to Machine Learning 3

Note: Students in their first semester of the degree should enroll in these three core courses. 

Prescribed Elective Courses

9 Credit Hours / any 3 of the following courses (for both thesis and non-thesis options)

Course Code Course Name Credit Hours
INDE 7397 Big Data and Analytics or PETR 6397 Big Data Analytics 3
ECE 6364 Digital Image Processing 3
ECE 6397 Signal Processing and Networking for Big Data Applications 3
EDS 6344 AI for Engineers 3
EDS 6346 Data Mining for Engineers 3
EDS 6348 Introduction to Cloud Computing 3
ECE 6342 Digital Signal Processing 3
INDE 7397 Engineering Analytics 3
INDE 6372 Advanced Linear Optimization 3
EDS 6397 Information Visualization 3

Elective Courses 

Non-thesis option: 12 credit hours / any 4 courses of the following
Thesis option: 3 credit hours / any 1 course of the following

Course Code Course Name Credit Hours
BIOE 6305 Brain Machine Interfacing 3
BIOE 6306 Advanced Artificial Neural Networks 3
BIOE 6309 Neural Interfaces 3
BIOE 6340 Quantitative Systems Biology & Disease 3
BIOE 6342 Biomedical Signal Processing 3
BIOE 6346 Advanced Medical Imaging 3
BIOE 6347 Introduction to Optical Sensing and Biophotonics 3
BIOE 6345 Biomedical Informatics 3
BZAN 6354 Database Management for Business Analytics 3
CIVE 6393 Geostatistics 3
CIVE 6380 Introduction to Geomatics/Geosensing 3
CIVE 6382 Lidar Systems and Applications 3
CIS 6397 Python for Data Analytics 3
CHEE 6367 Advanced Proc Control 3
ECE 6376 Digital Pattern Recognition 3
ECE 6397 Sparse Representations in Signal Processing 3
ECE 6337 Stochastic Processes in Signal Processing and Data Science  3
ECE 6378 Power System Analysis 3
ECE 6342 Digital Signal Processing 3
ECE 6333 Signal Detection and Estimation Theory 3
ECE 6315 Neural Computation 3
ECE 6397 GPU Programming 3
ECE 6397 High Performance Computing 3
ECE 6325 State-Space Control Systems 3
INDE 6370 Operation Research-Digital Simulation 3
INDE 6336 Reliability Engineering 3
INDE 7340 Integer Programming 3
INDE 7342 Nonlinear Optimization 3
INDE 6363 Statistical Process Control 3
IEEM 6360 Data Analytics for Engineering Managers 3
MECE 6379 Computer Methods in Mechanical Design 3
MECE 6397 Data Analysis Methods 3

Courses For Thesis Option 

(9 Credit Hours: research / thesis work)

Course Code Course Name Credit Hours
EDS 6398 Research Credit Hours 3
EDS 6399 Thesis Credit Hours 3
ECE 7399 Thesis Credit Hours 3

Note:  The research/thesis credit hours for the thesis option may be taken over two or three semesters. The thesis examination committee must be approved by the Program Director prior to the defense date. The committee must consist of at least three tenure-track faculty members with at least two committee members from within the College of Engineering.

To learn more about the thesis option, or if you have an MS advisor and wish to add the thesis option to your degree plan, please contact the academic advisor at egrhpc [at] uh.edu.

Academic Requirements

Students must have an overall GPA of 3.0 or higher in order to graduate with an MS degree in Engineering Data Science.

Each student is responsible for being familiar with the academic program requirements as stated in the current catalogs of the college and university and on this website.

  • For further information on academic requirements, please review the UH Graduate Catalog
  • For further information on student rules and regulations, please review the UH Student Handbook

Note: Students must receive a grade of C- or above to pass a course. If a student receives a grade of D+ or below, the course will not count towards completion of the degree plan; however, the grade will still be counted in the calculation of the cumulative GPA.

TUITION AND COST

The MS in Engineering Data Science is a 30-credit-hour (10-course) program. Students enrolled full-time typically complete the program in a year and a half to two years. Here is the link to the Graduate Tuition Calculator, which will give you an estimate of the costs.

FREQUENTLY ASKED QUESTIONS

If you have questions, please look at our extensive list of FAQs. If your question is not included here, please contact us at egrhpc [at] uh.edu.

What is the status of my application? When will I hear about the decision? All applications are reviewed close to the application deadline, and you will be informed of the decision around that time.

Can I receive an application fee waiver? Unfortunately, we are not able to offer an application fee waiver for MS students at this time. However, we do offer a competitive Dean's scholarship of $1,000 to qualified students for the first year. If you are awarded this scholarship, you become eligible to pay in-state tuition, which is a significant saving. This scholarship may be renewable if your cumulative GPA is 3.75 or higher after your first academic year. Please visit this link for more information.

How long does it take to complete the degree program? The MS in Engineering Data Science is a 30-credit-hour (10-course) program. Full-time students typically complete the program in a year and a half to two years.

What is full-time enrollment? Students need to enroll in a minimum of 3 courses (9 credit hours) for full-time enrollment. Students must maintain full-time enrollment if they are international students on a visa and/or recipients of a scholarship.

Are there resources for internships and jobs? The Engineering Career Center hosts career fairs every semester for engineering students, as well as for graduates for six months after graduation. You can view the website for more information about employers and partners of the Career Center.

How can I transfer to the Engineering Data Science program? If you are already enrolled in another program at UH, you can petition at the end of the semester to change from your current program to Engineering Data Science. Your petition must be approved by both programs (your current program stating you are in good standing, and Engineering Data Science stating you are admissible). Transfer is not guaranteed; the petition will be reviewed and the decision will be based on the grades and courses taken in your current program and your application package when applying to UH. Petition: Graduate Student General Petition

What is the status of my I-20? I-20s are processed by the Graduate School. If they have questions or need additional documents, they will contact you directly via email. Once your I-20 is complete, you can view and print it for your visa interview at ApplyWeb under your account. Should you need to contact the Graduate School, the email address is gradschool [at] uh.edu. Please always include your full name and PSID# in your emails. Please be advised that the Graduate School handles I-20s for all international students at UH, so processing may take some time. Thank you in advance for your patience.

I was offered admission but with no scholarship. When will I get the scholarship? The Dean's scholarship is very competitive. A student may become eligible for the scholarship in their second year (third semester) if they maintain a GPA of 3.75 or above, subject to the availability of scholarship funds. If you do receive the scholarship as described above, you will be eligible for in-state tuition in that semester.

How can I get TA/RA assistantships? Typically, we do not offer assistantships to MS students; teaching and research assistantships are available to PhD students only. If you are interested in student employment, please visit the University Career Services website, where you can find the types of employment available as well as eligibility requirements.

I have a PDV hold on my account. How can I remove it to enroll in classes? According to the Graduate School's policy, you need to submit your official transcripts/marksheets and degree certificate to the UH Graduate School in a sealed envelope with the university stamp, via a carrier such as the postal service, DHL, or FedEx. Otherwise, the Prior Degree Verification (PDV) hold will not be removed. You may bring those official documents with you and hand-deliver them to the Graduate School office upon your arrival, but you will not be able to register for courses in advance. If you wish to temporarily remove the PDV hold, a petition must be filed with the Graduate School along with an official letter from your university indicating why you cannot obtain your official documents in time. Petition: Graduate Student General Petition

I am transferring from another institution in the US. Can I transfer credits for some courses I have already taken? Per engineering graduate program policies, we can transfer up to 6 credit hours (2 courses). Please send us a list of the courses you took at your previous institution and the grades you received for each course. Based on the course content and its equivalency with courses in our Engineering Data Science degree plan, we will decide whether the course can be transferred. Once a course has been approved, you need to submit a petition to complete the credit transfer. Petition: Graduate Student General Petition

Do I have to take the TOEFL? If you are an international student and English is not your native language, you need to submit proof of English proficiency, such as a TOEFL, IELTS, or Duolingo test result. You can upload the unofficial test result for evaluation first and ask the testing center to send the official result to UH electronically. English Language Proficiency Requirement

Is the GRE required? The GRE is optional for Fall 2023.

How long should my essay be? There is no length limit for the essay; it can be as long as you need to describe your background, interests, experience, and other relevant information. Typically it is 1.5-2 pages.


Top 20 Data Engineering Project Ideas 2024 [With Source Code]


Welcome to the world of data engineering, where the power of big data unfolds. If you're an aspiring data engineer and seeking to showcase your skills or gain hands-on experience, you've landed in the right spot. Get ready to learn the best data engineering project concepts and explore a world of exciting data engineering projects in this article.

Before working on these projects, you should be conversant with the relevant topics and technologies. Companies are constantly seeking experienced data engineers who can create innovative data engineering solutions, so the best thing you can do as a beginner is to work on some real-world data engineering projects. Working on a data engineering project will not only give you a deeper understanding of how data engineering works, but it will also improve your problem-solving skills as you encounter and fix problems within the project. The best Data Science certifications, online or offline, can help you establish a solid foundation for an end-to-end data engineering project.

What are Data Engineering Projects?

If you want to break into the field of data engineering but don't yet have any experience, compiling a portfolio of data engineering projects may help. From EDA and data cleansing to data modeling and visualization, the best data engineering projects demonstrate the whole data process from start to finish.

These projects should demonstrate data pipeline best practices: you should be able to identify potential weak spots in a data pipeline and construct robust solutions to withstand them. Finally, make data visualizations to display your project's results and build a website, whether a portfolio or a personal site, to showcase your work.

The first step in hiring data engineers is reviewing a candidate's résumé. When screening resumes, most hiring managers prioritize candidates who have actual experience working on data engineering projects.

Structure of a Data Engineering Project 

Here is a typical project folder structure for data engineering projects (a small scaffolding sketch follows the list):

  • config/ (Configuration Files)
  • data/ (Data Files)
  • docs/ (Documentation)
  • etl/ (Extract-Transform-Load)
  • pipelines/ (Data Pipeline Orchestration)
  • src/ (Source Code)
  • tests/ (Project Tests)
  • .gitignore (Version Control Exclusion)
  • environment.yml (Conda Environment)
  • README.md (Project Overview)
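
As a quick illustration, here is a minimal Python sketch that scaffolds the layout above; the folder and file names simply mirror the list, and the root directory name is just a placeholder you would replace with your own project name.

```python
from pathlib import Path

# Folders and top-level files that mirror the structure listed above.
FOLDERS = ["config", "data", "docs", "etl", "pipelines", "src", "tests"]
FILES = [".gitignore", "environment.yml", "README.md"]

def scaffold(root: str = "my_data_project") -> None:
    """Create an empty data engineering project skeleton."""
    base = Path(root)
    for folder in FOLDERS:
        (base / folder).mkdir(parents=True, exist_ok=True)
    for file_name in FILES:
        (base / file_name).touch()

if __name__ == "__main__":
    scaffold()
```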

List of Top Data Engineering Projects of 2024

Data engineers make unprocessed data accessible and usable for other data professionals. Multiple types of data exist within organizations, and it is the job of data engineers to standardize them so that data analysts and scientists can use them interchangeably. If data scientists and analysts are pilots, data engineers are aircraft manufacturers: without the latter, the former cannot accomplish their objectives. From analysts to big data engineers, everyone in the field of data science has been discussing data engineering.

Here is the list of data engineering project ideas (for beginners, intermediate learners, and professionals):

  • Stock and Twitter Data Extraction Using Python, Kafka, and Spark
  • Use Python to Scrape Real Estate Listings and Make a Dashboard
  • Use Stack Overflow Data for Analytic Purposes
  • Extracting Inflation Rates from CommonCrawl and Building a Model
  • Realtime Data Analytics
  • Yelp Review Analysis
  • Finnhub API with Kafka for Real-Time Financial Market Data Pipeline
  • Pipeline for Real-Time Data Processing in Music Applications
  • Anomaly Detection in Cloud Servers
  • Smart Cities Using Big Data
  • Tourist Behaviour Analysis
  • Image Caption Generator.

When constructing a data engineering project, you should prioritize the following areas:

  • Multiple sources of data (APIs, websites, CSVs, JSON, etc.)
  • Data consumption
  • Data storage
  • Data visualization (So that you have something to show for your efforts)
  • Utilising multiple tools

Top 4 Data Engineering Project Ideas: Beginner & Final Year Students

Becoming an expert data engineer requires familiarity with the best practices and cutting-edge technologies in the field, and participating in a data engineering project is a great way to learn the ropes. That's why we're going to zero in on the data engineering projects that deserve your attention. If you are struggling with Data Engineering projects for beginners, then a Data Engineer Bootcamp is for you.

Some simple beginner Data Engineer projects that might help you go forward professionally are provided below.

1. Stock and Twitter Data Extraction Using Python, Kafka, and Spark

Project Overview:  The rising and falling of GameStop's stock price and the proliferation of cryptocurrency exchanges have made stocks a topic of widespread attention.

Stock and Twitter Data Extraction Using Python

If you share this enthusiasm for the markets, you may want to consider creating a tool like Cashtag, which was built by a Reddit developer. The goal of that project was a "big data pipeline for user sentiment analysis on the US stock market." In a nutshell, the project uses social media data to provide real-time market sentiment predictions, and its write-up includes a process-flow diagram for the full pipeline.

This project's documentation will serve as a starting point from which you may draw ideas for your own work.

Source Code: Stock and Twitter Data Extraction Using Python, Kafka, and Spark
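
As a rough sketch of the ingestion leg of such a pipeline, the snippet below publishes tweet-like records to a Kafka topic using the kafka-python client. The broker address, topic name, and the stubbed fetch_tweets function are assumptions for illustration; the real project wires in the Twitter API and uses Spark for downstream sentiment analysis.

```python
import json
import time

from kafka import KafkaProducer  # pip install kafka-python

# Broker address and topic name are assumptions for this sketch.
TOPIC = "stock_tweets"

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
)

def fetch_tweets(query: str):
    """Placeholder for a real Twitter/X API call (e.g. via tweepy)."""
    yield {"query": query, "text": "GME to the moon", "ts": time.time()}

for tweet in fetch_tweets("$GME"):
    # Each tweet becomes one Kafka message; a Spark job downstream would
    # consume this topic and run sentiment analysis on the text field.
    producer.send(TOPIC, value=tweet)

producer.flush()
```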

2. Use Python to Scrape Real Estate Listings and Make a Dashboard

Project Overview:  If you're looking to get your hands dirty with some cutting-edge tech and big Data Engineering projects for engineering students, consider something like sspaeti's 20-minute data engineering project. The purpose of this work is to provide a resource that can help you find the best possible home or rental.

Use Python to Scrape Real Estate Listings and Make a Dashboard

Source: Medium

Web scraping libraries like Beautiful Soup and Scrapy are used to gather information for this project. As a data engineer, you should gain experience writing Python programs that process HTML, and web scraping is an excellent way to do so. Delta Lake and Kubernetes are both trending topics, so it's interesting to see them both addressed in this project.

Finally, a well-designed user interface is an essential part of any successful data engineering project. Superset is used for data visualization in this project, while Dagster is used to coordinate the many moving parts. The wide range of methods used in this work makes it an excellent addition to a resume.

Source: Use Python to Scrape Real Estate Listings and Make a Dashboard
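
A minimal scraping sketch in the spirit of this project is shown below, using requests and Beautiful Soup. The URL and CSS selectors are hypothetical, since every listing site needs its own selectors (and its robots.txt and terms of use respected); the original project layers Delta Lake, Dagster, and Superset on top of this step.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# The URL and CSS selectors are placeholders for a real listing site.
URL = "https://example.com/listings"

response = requests.get(URL, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

rows = []
for card in soup.select("div.listing"):  # hypothetical listing card element
    rows.append(
        {
            "title": card.select_one("h2").get_text(strip=True),
            "price": card.select_one("span.price").get_text(strip=True),
        }
    )

# A dashboard layer (Superset in the original project) would read from
# wherever this lands; a CSV keeps the sketch self-contained.
pd.DataFrame(rows).to_csv("listings.csv", index=False)
```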

3. Use Stack Overflow Data for Analytic Purposes

Project Overview:  What if you had access to all or most of the public repos on GitHub? What queries would you run?

In similar research, Felipe Hoffa analysed gigabytes of data from Google's BigQuery public datasets. The abundance of data opens up numerous possibilities for research and analysis. Concepts that Felipe examined include:

  • The Case for Tabs
  • Which languages do programmers spend their weekends working on?
  • Searching for questions and comments in GitHub repos.

Since there are numerous ways to approach this task, it encourages originality in one's approach to data analysis.

2.8 million open-source projects are available for inspection.

Moreover, this project concept should highlight the fact that there are many interesting datasets already available on services like GCP and AWS. Hundreds of datasets are available from these two cloud services, so you may practise your analytical skills without having to scrape data from an API.

Source: Use Stack Overflow Data for Analytic Purposes
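
To give a feel for this kind of analysis, here is a small sketch that queries the public Stack Overflow dataset on BigQuery for the most common tags. It assumes Google Cloud credentials are configured locally, and the table and column names reflect the public dataset as commonly documented.

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

# Assumes Google Cloud credentials are configured, for example via
# `gcloud auth application-default login`.
client = bigquery.Client()

query = """
    SELECT tag, COUNT(*) AS question_count
    FROM `bigquery-public-data.stackoverflow.posts_questions`,
         UNNEST(SPLIT(tags, '|')) AS tag
    GROUP BY tag
    ORDER BY question_count DESC
    LIMIT 10
"""

# Returns the ten most common Stack Overflow tags as a pandas DataFrame.
top_tags = client.query(query).to_dataframe()
print(top_tags)
```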

4. Extracting Inflation Rates from CommonCrawl and Building a Model

Project Overview:  Dr. Usama Hussain worked on another intriguing idea: he estimated the rate of inflation by tracking online price fluctuations for products and services. Given that the United States has had the highest inflation rate since 2008, this is a significant problem.

The author utilised petabytes of website data from the Common Crawl in their effort.

This is another excellent example of assembling and presenting a data engineering project. One of the difficulties I often mention is how hard it can be to demonstrate your data engineering work.

However, Dr. Hussain's project is documented in such a way that it is possible to see what work was done and the skills he possesses without having to dig into all the code.

Dr. Hussain outlines the data flow in the project write-up.

Source Code: Extracting Inflation Rates from CommonCrawl and Building a Model

Top 4 Data Engineering Project Ideas: Intermediate Level

Knowing big data theory alone will not get you very far; you'll need to put your newfound knowledge into action. Working on big data projects lets you put your skills to the test, and they are also excellent for your resume. This section goes over some solid Big Data projects that you can work on to demonstrate your big data expertise and strengthen a Data Engineer resume.

Here are some data engineering project ideas to consider and Data Engineering portfolio project examples to demonstrate practical experience with data engineering problems.

1. Realtime Data Analytics

Project Overview:  Olber, a company that provides taxi services, gathers information about every journey. Two different devices generate data per trip: the taxi meter transmits the duration of each journey, the distance travelled, and the pick-up and drop-off locations, while a smartphone application that processes customer payments provides reliable, easily accessible fare data. To identify patterns among its customers, the taxi firm needs to compute, in real time, the average tip per kilometre travelled in each region.

The reference architecture describes a complete end-to-end stream processing pipeline with four stages: extract, transform, load, and report. The pipeline collects data from the two sources, joins related records from each stream, enriches the output, and finally produces an average. The results are saved for further analysis.

Source Code: Realtime Data Analytics
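
The sketch below is a simplified, single-stream stand-in for that pipeline: it reads already-joined ride records from a Kafka topic and computes the average tip per kilometre per region over five-minute windows with Spark Structured Streaming. The topic name and record schema are assumptions for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import (DoubleType, StringType, StructField,
                               StructType, TimestampType)

# Requires the spark-sql-kafka connector package on the Spark classpath.
spark = SparkSession.builder.appName("taxi-tip-stream").getOrCreate()

# Hypothetical schema for already-joined ride records (meter + fare data).
schema = StructType([
    StructField("region", StringType()),
    StructField("distance_km", DoubleType()),
    StructField("tip", DoubleType()),
    StructField("event_time", TimestampType()),
])

rides = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "rides")  # assumed topic name
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
    .select("r.*")
)

# Average tip per kilometre, per region, over 5-minute event-time windows.
tip_per_km = (
    rides.withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"), "region")
    .agg(F.avg(F.col("tip") / F.col("distance_km")).alias("avg_tip_per_km"))
)

query = tip_per_km.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```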

2. Yelp Review Analysis

Project Overview:  Yelp is a platform that allows people to post reviews and give star ratings to businesses they have visited. Studies have found that a one-star increase can lead to a 5-9 percent gain in revenue for independently owned and operated firms. As a consequence, we think the Yelp dataset has a lot of promise as a resource for gaining valuable insights: Yelp reviews written by customers are a treasure trove waiting to be unearthed.

The primary objective of this project is to carry out in-depth analyses of restaurants across seven cuisine types (Korean, Japanese, Chinese, Vietnamese, Thai, French, and Italian) in order to determine what makes a good restaurant and what concerns customers, and then to make recommendations for improvement and profit growth. The majority of the focus is on analysing customer feedback to figure out why customers like or dislike a business. Using big data, unstructured data such as customer reviews can be transformed into actionable insights, enabling businesses to better understand how and why customers prefer their products or services and to improve their operations as quickly as practically possible.

Source Code: Yelp Review Analysis

3. Finnhub API with Kafka for Real-Time Financial Market Data Pipeline

Project Overview:  The goal of this project is to construct a streaming data pipeline using the real-time financial market data API provided by Finnhub. The architecture is composed of five layers: data ingestion, message broker, stream processing, serving database, and visualisation. The final product is a dashboard that presents the data graphically for in-depth study.

The pipeline consists of several components. A producer retrieves data from Finnhub's API and publishes it to a Kafka topic on a Kafka cluster, which stores and buffers the data. Apache Spark is used for stream processing, and Cassandra stores the real-time financial market data flowing through the pipeline. A dashboard built with Grafana displays real-time charts and graphs based on the data stored in the database, so users can watch the market in real time and detect trends and patterns.

Source Code: Finnhub API with Kafka for Real-Time Financial Market Data Pipeline
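
As a hedged illustration of the serving-database step, the snippet below consumes trade messages from Kafka and writes them to Cassandra using the Python drivers. The topic, keyspace, table, and field names are assumptions, and the project described above performs this step with Spark rather than a plain consumer loop.

```python
import json

from kafka import KafkaConsumer        # pip install kafka-python
from cassandra.cluster import Cluster  # pip install cassandra-driver

# Topic, keyspace, table, and field names are assumptions for this sketch.
consumer = KafkaConsumer(
    "market_trades",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

session = Cluster(["127.0.0.1"]).connect("market_data")
insert = session.prepare(
    "INSERT INTO trades (symbol, price, volume, ts) VALUES (?, ?, ?, ?)"
)

for message in consumer:
    trade = message.value
    # Each consumed trade is written to Cassandra; Grafana reads the table.
    session.execute(insert, (trade["s"], trade["p"], trade["v"], trade["t"]))
```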

4. Pipeline for Real-Time Data Processing in Music Applications

Project Overview:  This project streams events created by a fictitious music streaming service that operates similarly to Spotify, and builds a data pipeline that consumes the real-time data. The incoming data resembles events generated when a person listens to music, navigates around the website, or authenticates. The data is processed in real time and saved to the data lake at regular intervals (every two minutes). An hourly batch job then consumes this data, applies transformations, and creates the tables needed for the dashboard so that analytics can be generated. The analysis covers indicators such as the most played songs, active users, and user demographics.

You can generate a sample dataset for this project using Eventsim and the Million Song Dataset. Apache Kafka and Apache Spark are the streaming technologies used for processing data in real time; Spark's Structured Streaming API processes data in mini-batches, which provides low-latency processing. The processed data is uploaded to Google Cloud Storage and then transformed with dbt, which cleans, converts, and aggregates the data so that it is ready for analysis. The data is then loaded into BigQuery, which serves as the data warehouse, and Data Studio is used to visualise it. Apache Airflow is used for orchestration, while Docker is the tool of choice for containerization.

Source Code: Pipeline for Real-Time Data Processing in Music Applications

Top 4 Data Engineering Project Ideas - Advanced Level

Working through these and adding Data Engineer projects to your resume will likely increase the likelihood of being invited to an interview.

1. Anomaly Detection in Cloud Servers

Source: ResearchGate

Project Overview:  Anomaly detection is a valuable tool for cloud platform administrators who want to monitor and analyse cloud behaviour in order to increase reliability. It helps administrators detect unanticipated system activity so that preventative measures can be taken before a system breakdown or service failure.

This project provides a reference implementation of a Cloud Dataflow streaming pipeline that integrates with BigQuery ML and Cloud AI Platform to detect anomalies. A critical component of the implementation uses Dataflow for feature extraction and real-time outlier detection, and it has been validated on over 20 TB of data.

Source Code: Anomaly Detection in Cloud Servers
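
The reference implementation itself lives in Dataflow and BigQuery ML, but the core idea can be sketched locally: the example below trains an Isolation Forest on synthetic server metrics and flags the injected spikes as anomalies. The metric ranges and contamination rate are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic server metrics: [cpu_percent, memory_percent, latency_ms].
normal = rng.normal(loc=[40, 55, 120], scale=[8, 10, 25], size=(2000, 3))
spikes = rng.normal(loc=[95, 90, 900], scale=[3, 5, 100], size=(20, 3))
metrics = np.vstack([normal, spikes])

# contamination is the expected share of anomalies; tune it per workload.
model = IsolationForest(contamination=0.01, random_state=0).fit(metrics)
labels = model.predict(metrics)  # -1 marks an anomaly

print(f"flagged {int((labels == -1).sum())} anomalous samples")
```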

2. Smart Cities Using Big Data

Project Overview:  A "smart city" is an ultra-modern urban area that gathers data through electronic means, voice activation techniques, and sensors. The data is used to better manage the city's assets, resources, and services, which in turn leads to better citywide operations. In order to keep tabs on and manage things like traffic and transportation systems, power plants, utilities, water supply networks, waste, crime detection, information systems, educational institutions, health care facilities, and more, data is gathered from citizens, devices, buildings, and assets and then processed and analysed. This data is collected by means of big data, and then the complex characteristics of a smart city may be put into effect with the aid of advanced algorithms, smart network infrastructures, and numerous analytics platforms. For traffic or stadium sensing, analytics, and management, this smart city reference pipeline demonstrates how to combine several media building pieces with analytics provided by the OpenVINO Toolkit.

Source Code: Smart Cities Using Big Data

3. Tourist Behaviour Analysis

Project Overview:  Here is one of the most forward-thinking ideas for a big data project. The purpose of this project is to study visitor behaviour in order to determine tourists' preferences and the most-visited locations, as well as to anticipate future tourism demand.

What part does big data play in this project? Because vacationers use the internet and other technologies while they are away from home, they leave digital traces that can be collected at scale. The vast majority of the data comes from external sources such as social media websites. The sheer volume of data is too much for a conventional database to manage, which is why big data analytics is required. The data collected from all of these sources can be used to help companies in the airline, hotel, and tourism sectors expand their client base and market their products and services. It can also help tourism organisations visualise and forecast current and future trends.

Source Code: Tourist Behavior Analysis

4. Image Caption Generator

Project Overview:  With the rise of social media and the importance of digital marketing, businesses must now upload engaging content. Eye-catching visuals are essential, but images must also be accompanied by captions, and using hashtags and attention-grabbing captions can help you reach the intended audience more effectively. This requires managing large datasets of correlated photos and captions. Image processing and deep learning are used to understand the image, and artificial intelligence is used to generate relevant and appealing captions. The Big Data side can be written in Python. Image caption generation is not a beginner's Big Data project and is genuinely difficult. Using a CNN (Convolutional Neural Network) and an RNN (Recurrent Neural Network) with beam search, the project described below employs a neural network to generate captions for an image.

Rich and colourful datasets, such as MSCOCO, Flickr8k, Flickr30k, PASCAL 1K, AI Challenger Dataset, and STAIR Captions, are currently used in the generation of image descriptions and are gradually becoming a topic of discussion. The supplied project employs cutting-edge machine learning and big data algorithms to create an efficient image caption generator.

Source Code: Image Caption Generator

Open-Source Data Engineering Project Ideas: Additional Topics

Below are some Data Engineering project topic examples:

1. Analytics Application
2. Extract, Transform, Load (ETL)
3. Extracting Inflation Data
4. Building Data Pipelines
5. Creating a Data Repository
6. Analyse Security Breach
7. Aviation Data Analysis
8. Shipping and Distribution Demand Forecasting

Why Should You Work on Data Engineering-Based Projects?

In conjunction with machine learning, data engineering enables the development of marketing plans based on forecasts of customer behaviour. Businesses that use big data analytics become more customer-focused.

Learning this skill set, which is in great demand, will allow you to make rapid strides in your professional development. Because of this, the best thing you can do if you're new to big data is to think of some ideas for projects that include big data.

Data engineers are responsible for the construction and administration of computer hardware and software systems that are used for the gathering, formatting, storing, and processing of data. In addition to this, they make sure that the data is always readily accessible to consumers. The end-to-end data process is shown via data engineering projects, which range from exploratory data analysis (EDA) and data cleansing through data modelling and visualization.

Including Data Engineering projects on your resume is quite crucial if you want your application for a job to stand out from the other applicants who have applied for the same position.

Best Platforms to Work on Data Engineering Projects

The following is a list of several platforms that are suitable for use in Data Engineering real time projects -

  • Great Expectations

Google Cloud provides many of the tools that data scientists use to extract value from data, making it one of the top platforms for learning data science and data engineering. Business intelligence solutions such as Power BI, Tableau, and Looker can help companies mitigate operational risk and achieve maximum operational efficiency by supporting data-backed decisions.

Learn Data Engineering the Smart Way!

A few things to keep in mind while studying for data engineering projects and jobs:

  • Learn how to program in languages such as Python and Scala and become an expert in those languages.
  • Scripting and automation are skills you should learn.
  • Gain familiarity with database management, and work on improving your SQL skills.
  • Master data processing methods.
  • Acquire the skill of scheduling your workflows.
  • Gain experience in cloud computing by using services such as Amazon Web Services.
  • Improve your understanding of infrastructure technologies such as Docker and Kubernetes.
  • Maintain a current awareness of the trends in the industry.
Elevate your career with business analyst certificate programs. Establish your expertise and open doors to limitless opportunities!

This article examines some of the best concepts for big data projects. We began with some basic, quick-to-complete projects and added Data Engineering projects with source code.

The optimal project is one that strikes a balance between industry interests and personal interests. Whether you like it or not, your personal interest will come through in the topic you select, so it is essential to choose a topic you enjoy. If you have an interest in equities, real estate, politics, or any other niche category, you can use the projects listed above as a template for your own project. Check out KnowledgeHut's best Data Science certification online for more Data Engineering project ideas.

Frequently Asked Questions (FAQs)

An online portfolio is the best way to showcase your work. Document each project's construction and operation: your blog entries or GitHub repositories can show the problem description, proposed design, data analysis approach, and results. Adding real-world Data Engineering projects is a good way to make that portfolio stand out.

Start with a question. Next, find a relevant dataset. Kaggle, FiveThirtyEight, Google Trends, the Census Bureau, and Data.gov provide free datasets. Use an open API or web scraping tools to get website data. 

Project-worthy topics in data engineering include:

  • Data pipeline development
  • Data warehousing
  • Data modeling
  • Data integration
  • Data migration 

Data engineering creates trustworthy data storage and processing infrastructure, including building and maintaining data pipelines that centralize data sources. Data engineers build and maintain the infrastructure that data scientists and analysts use to work with data.

An example of data engineering is a business wanting to know how its website visitors behave. Web logs, smartphone apps, and social media accounts provide the data, which lives in databases, JSON, and CSV files. This data must be collected, normalized, imported into a central data repository, and examined. Data engineering is fascinating: data engineers take data from multiple sources, convert it to formats such as Parquet or ORC, and load it into a data warehouse such as Amazon Redshift or Google BigQuery, where data scientists and analysts can then study it.


Ritesh Pratap Arjun Singh

RiteshPratap A. Singh is an AI & DeepTech Data Scientist. His research interests include machine vision and cognitive intelligence. He is known for leading innovative AI projects for large corporations and PSUs. Collaborate with him in the fields of AI/ML/DL, machine vision, bioinformatics, molecular genetics, and psychology.



Top 20 Latest Research Problems in Big Data and Data Science

Problem statements in 5 categories, research methodology and research labs to follow.

Dr. Sunil Kumar Vuppala


Towards Data Science

Even though big data has been in the mainstream of operations as of 2020, there are still potential issues and challenges that researchers can address, some of which overlap with the data science field. In this article, the top 20 interesting latest research problems at the intersection of big data and data science are covered, based on my personal experience (with due respect to the Intellectual Property of my organizations) and the latest trends in these domains [1,2]. These problems fall under 5 categories, namely:

  • Core big data area to handle the scale
  • Handling noise and uncertainty in the data
  • Security and privacy aspects
  • Data engineering
  • Intersection of big data and data science

The article also covers a research methodology to solve the specified problems and top research labs working in these areas to follow.

I encourage researchers to solve applied research problems which will have more impact on society at large. The reason to stress this point is that we are hardly analyzing 1% of the available data. On the other hand, we are generating terabytes of data every day. These problems are not very specific to a domain and can be applied across the domains.

Let me first introduce the 8 V's of big data (based on an interesting article from Elena), namely Volume, Value, Veracity, Visualization, Variety, Velocity, Viscosity, and Virality. If we look closely at the questions on the individual V's in Fig 1, they trigger interesting points for researchers. Even though they are business questions, there are underlying research problems. For instance, 02-Value: "Can you find it when you most need it?" amounts to analyzing the available data and giving context-sensitive answers when needed.

Having understood the 8 V's of big data, let us look at the research problems to be addressed in detail. General big data research topics [3] are along the lines of:

  • Scalability — Scalable Architectures for parallel data processing
  • Real-time big data analytics — Stream data processing of text, image, and video
  • Cloud Computing Platforms for Big Data Adoption and Analytics — Reducing the cost of complex analytics in the cloud
  • Security and Privacy issues
  • Efficient storage and transfer
  • How to efficiently model uncertainty
  • Graph databases
  • Quantum computing for Big Data Analytics

Next, let me cover some of the specific research problems across the five categories listed above. The problems related to the core big data area of handling the scale:

1. Scalable architectures for parallel data processing:

Hadoop- or Spark-style environments are used for offline or online processing of data. The industry is looking for scalable architectures to carry out parallel processing of big data. There has been a lot of progress in recent years; however, there is still huge potential to improve performance.

2. Handling real-time video analytics in a distributed cloud:

With increased internet accessibility even in developing countries, video has become a common medium of data exchange. Telecom infrastructure, operators, Internet of Things (IoT) deployments, and CCTVs all play a role in this regard. Can the existing systems be enhanced with lower latency and higher accuracy? Once real-time video data is available, the questions are how the data can be transferred to the cloud and how it can be processed efficiently, both at the edge and in a distributed cloud.

3. Efficient graph processing at scale:

Social media analytics is one such area that demands efficient graph processing. The role of graph databases in big data analytics is covered extensively in the reference article [4]. Handling efficient graph processing at a large scale is still a fascinating problem to work on.

The research problems to handle noise and uncertainty in the data:

4. Identify fake news in near real-time:

Handling fake news in real time and at scale is a pressing issue, as fake news spreads like a virus in a bursty way. The data may come from Twitter, fake URLs, or WhatsApp. Sometimes the source may look authentic but still be fake, which makes the problem more interesting to solve.

5. Dimensionality reduction approaches for large-scale data:

One can extend the existing approaches to dimensionality reduction to handle large-scale data, or propose new approaches. This also includes visualization aspects. One can start from existing open-source contributions and contribute back to open source.
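
As a small sketch of one existing approach that already scales beyond memory, the snippet below uses scikit-learn's IncrementalPCA to reduce data chunk by chunk; the chunk sizes and dimensions are arbitrary stand-ins for a dataset too large to fit in memory.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Incremental PCA fits one mini-batch at a time, so a dataset that does not
# fit in memory can be reduced chunk by chunk (the chunks are simulated here).
ipca = IncrementalPCA(n_components=10)

rng = np.random.default_rng(0)
for _ in range(50):  # 50 chunks of 10,000 samples x 200 features
    chunk = rng.normal(size=(10_000, 200))
    ipca.partial_fit(chunk)

# Project new data into the learned 10-dimensional space.
reduced = ipca.transform(rng.normal(size=(1_000, 200)))
print(reduced.shape)  # (1000, 10)
```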

6. Training / inference in noisy environments and with incomplete data:

Sometimes one may not get the complete distribution of the input data, or data may be lost due to a noisy environment. Can the data be augmented in a meaningful way by oversampling, the Synthetic Minority Oversampling Technique (SMOTE), or Generative Adversarial Networks (GANs)? Can the augmentation help improve performance? How to train and infer under these conditions is the challenge to be addressed.
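
A minimal sketch of the oversampling idea is shown below using SMOTE from imbalanced-learn on a synthetic imbalanced dataset; the class weights are arbitrary, and the same pattern would be the starting point for comparing augmentation strategies such as GAN-based generation.

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE  # pip install imbalanced-learn

# A 95:5 imbalanced toy dataset standing in for incomplete real data.
X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.95, 0.05], random_state=0
)
print("before:", Counter(y))

# SMOTE synthesises new minority-class samples by interpolating neighbours.
X_resampled, y_resampled = SMOTE(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_resampled))
```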

7. Handling uncertainty in big data processing:

There are multiple ways to handle uncertainty in big data processing [4]. This includes sub-topics such as how to learn from low-veracity, incomplete, or imprecise training data, and how to handle uncertainty with unlabeled data when the volume is high. We can try to use active learning, distributed learning, deep learning, and fuzzy logic theory to solve these sets of problems.

The research problems in the security and privacy [5] area:

8. Anomaly Detection in Very Large Scale Systems:

Anomaly detection is a standard problem, but it is not trivial at a large scale in real time. Application domains include healthcare, telecom, and finance.

9. Effective anonymization of sensitive fields in large-scale systems:

Let me take an example from healthcare systems. A chest X-ray image may contain PHR (Personal Health Record) data. How can one anonymize the sensitive fields to preserve privacy in a large-scale system in near real time? This applies to other fields as well, primarily to preserve privacy.
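
A toy sketch of field-level pseudonymization and free-text redaction is shown below; the record, salt, and field names are made up, and real de-identification of clinical data (DICOM headers, text burned into images, and so on) is considerably more involved.

```python
import hashlib
import re

# A toy record standing in for metadata attached to a medical image.
record = {
    "patient_name": "Jane Doe",
    "patient_id": "MRN-0042",
    "report": "Patient Jane Doe, MRN-0042, chest X-ray shows no acute findings.",
}

SENSITIVE_KEYS = {"patient_name", "patient_id"}

def pseudonymize(value: str) -> str:
    """Replace a value with a salted, truncated hash (a one-way pseudonym)."""
    return hashlib.sha256(("demo-salt" + value).encode()).hexdigest()[:12]

anonymized = {
    key: pseudonymize(val) if key in SENSITIVE_KEYS else val
    for key, val in record.items()
}

# Also redact occurrences of the sensitive values inside the free text.
for key in SENSITIVE_KEYS:
    anonymized["report"] = re.sub(
        re.escape(record[key]), "[REDACTED]", anonymized["report"]
    )

print(anonymized)
```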

10. Secure federated learning with real-world applications:

Federated learning enables model training on decentralized data. It can be adopted where data cannot be shared due to regulatory or privacy issues but models still need to be built locally and then shared across boundaries. Making federated learning work at scale, and making it secure with standard software- and hardware-level security, is the next challenge to be addressed. Interested researchers can explore further information from RISELab at UC Berkeley in this regard.

11. Scalable privacy preservation on big data:

Privacy preservation for large-scale data is a challenging research problem to work on, as the range of applications varies from text and images to video. Differences in country- and region-level privacy regulations make the problem even more challenging to handle.

The research problems related to data engineering aspects:

12. Lightweight Big Data analytics as a Service:

Offering everything as a service is a growing trend in the industry, for example Software as a Service (SaaS). Can we work towards providing lightweight big data analytics as a service?

13. Auto conversion of algorithms to MapReduce problems:

MapReduce is a well-known programming model in big data. It is not just the map and reduce functions; it also provides scalability and fault tolerance to applications. However, not many algorithms can be expressed directly in MapReduce form. Can we build a library that automatically converts standard algorithms to MapReduce?
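
For readers new to the model, here is a tiny pure-Python sketch of the classic word count expressed as explicit map, shuffle, and reduce phases; the point is the decomposition itself, which is what an automatic conversion library would have to produce for arbitrary algorithms.

```python
from collections import defaultdict
from itertools import chain

# Classic word count expressed as explicit map, shuffle, and reduce phases.
def map_phase(line: str):
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    return key, sum(values)

lines = ["big data is big", "data engineering handles big data"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 3, 'data': 3, 'is': 1, 'engineering': 1, 'handles': 1}
```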

14. Automated Deployment of Spark Clusters:

A lot of progress has been made in the usage of Spark clusters in recent times, but they are not yet completely ready for automated deployment. This is another challenging problem to explore further.

The research problems at the intersection of big data and data science:

15. Approaches to make models learn from fewer data samples:

Over the last 10 years, the complexity of deep learning models has increased with the availability of more data and compute power. Some researchers proudly claim to have solved a complex problem with a deep network of hundreds of layers; for instance, image segmentation may use a 100-layer network. The recent trend, however, is to ask whether the same problem can be solved with less data and less complexity. The reason behind this thinking is to run models on edge devices, not only in cloud environments using GPUs/TPUs. For instance, deep learning models trained on big data might need to be deployed on CCTVs or drones for real-time usage. This is fundamentally changing the approach to solving complex problems. You may work on challenging problems in this sub-topic.

16. Neural Machine Translation to Local languages:

One can use Google Translate for neural machine translation (NMT). However, there is a lot of research at local universities on neural machine translation into local languages, with support from governments. The latest advances in Bidirectional Encoder Representations from Transformers (BERT) are changing the way these problems are solved. One can collaborate with those efforts to solve real-world problems.

17. Handling Data and Model drift for real-world applications:

Do we need to run the model on inference data if we know that the data pattern is changing and the performance of the model will drop? Can we identify drift in the data distribution even before passing the data to the model? If one can identify the drift, why pass the data to the model for inference and waste compute power? This is a compelling research problem to solve at scale in the real world. Active learning and online learning are some of the approaches to solving the model drift problem.
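
A minimal sketch of one common drift check is shown below: a two-sample Kolmogorov-Smirnov test comparing a training-time feature window against a live window. The synthetic data and the 0.01 threshold are illustrative assumptions; production systems typically monitor many features and combine several tests.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)

# Reference window: the feature distribution the model was trained on.
training_feature = rng.normal(loc=0.0, scale=1.0, size=5000)
# Live window: the same feature observed in production, with a shifted mean.
live_feature = rng.normal(loc=0.4, scale=1.0, size=5000)

statistic, p_value = ks_2samp(training_feature, live_feature)

# A small p-value suggests the live distribution has drifted away from the
# training distribution, so inference results should be treated with caution.
if p_value < 0.01:
    print(f"drift detected (KS statistic={statistic:.3f}, p={p_value:.2e})")
else:
    print("no significant drift detected")
```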

18. Handling interpretability of deep learning models in real-time applications:

Explainable AI is a recent buzzword, and interpretability is a subset of explainability. Machine and deep learning models are no longer pure black boxes. A few models, such as decision trees, are directly interpretable; however, as complexity increases, the base model itself may not be useful for interpreting the results, and we may need to depend on surrogate models such as Local Interpretable Model-agnostic Explanations (LIME) or SHapley Additive exPlanations (SHAP). This can help decision-makers justify the results produced, for instance the rejection of a loan application or the classification of a chest X-ray as COVID-19 positive. Can interpretable models handle large-scale real-time applications?
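
As a small sketch of the surrogate-explanation idea, the snippet below trains a random forest on a standard dataset and computes SHAP values with TreeExplainer; the dataset and model are stand-ins, and the exact output structure depends on the SHAP version installed.

```python
import shap  # pip install shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Train a simple model, then attribute its predictions with SHAP values.
data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(data.data[:50])

# Depending on the SHAP version, shap_values is either a list with one array
# per class or a single multi-dimensional array; each row attributes a
# prediction to the individual input features.
print(len(shap_values), shap_values[0].shape)
```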

19. Building context-sensitive large scale systems:

Building a large-scale context-sensitive system is the latest trend. There are some open-source efforts to kick-start this work. However, it requires a lot of effort to collect the right set of data and to build context-sensitive systems that improve search capability. One can choose a research problem in this topic given a background in search, knowledge graphs, and Natural Language Processing (NLP). This is applicable across domains.

20. Building large scale generative based conversational systems (Chatbot frameworks):

One specific area gaining momentum is building conversational systems, such as Q&A and generative chatbot systems. A lot of chatbot frameworks are available; making them generative and producing summaries of real-time conversations are still challenging problems, and the complexity increases with scale. A lot of research is going on in this area. It requires a good understanding of Natural Language Processing and the latest advances, such as BERT, to expand the scope of what conversational systems can solve at scale.

Research Methodology:

I hope you can frame specific problems from the topics highlighted above using your domain and technical expertise. Let me recommend a methodology for solving any of these problems. Some points may look obvious to researchers; however, let me cover them in the interest of a larger audience:

Identify your core strengths, whether in theory, implementation, tools, security, or a specific domain; other new skills can be acquired while doing the research. Identifying the right research problem with suitable data is roughly half the milestone. The problem may overlap with other technology areas such as the Internet of Things (IoT), Artificial Intelligence (AI), and the cloud. Your passion for research will determine how far you can go in solving that problem. The trend is towards interdisciplinary research problems across departments, so one may choose a specific domain in which to apply big data and data science skills.

Literature survey: I strongly recommend following only authenticated publications such as IEEE, ACM, Springer, Elsevier, and ScienceDirect. Do not fall into the trap of "International journal …" venues that publish without peer review. At the same time, do not limit the literature survey to IEEE/ACM papers only; a lot of interesting papers are available on arxiv.org and Papers with Code. One should also check and follow the top research labs in industry and academia for the shortlisted topic. That gives the latest research updates and helps identify the gaps to fill.

Lab ecosystem: Create a good lab environment in which to carry out strong research. This can be your research lab with professors, post-docs, Ph.D. scholars, and master's and bachelor's students in an academic setup, or with senior and junior researchers in an industry setup. Having the right partnership is the key to collaboration, and you may try virtual groups as well. A good ecosystem boosts results, as members can challenge each other's approaches to improve the results further.

Publish at the right venues: As mentioned in the literature survey, publish research papers in the right forums, where you will receive peer reviews from experts around the world. You may encounter obstacles in this process in the form of rejections; however, as long as you receive constructive feedback, be thankful to the anonymous reviewers. You may also see an opportunity to patent the ideas if the approach is novel, non-obvious, and inventive. The recent trend is to open-source the code while publishing the paper. If your institution permits open-sourcing, you may do so by uploading the relevant code to GitHub with appropriate licensing terms and conditions.

Top Research labs to follow:

Some of these research areas are active in the top research centers around the world. I encourage you to follow them and identify further gaps to continue the work. Here are some of the top research centers to follow in the big data and data science area:

RISE Lab at the University of California, Berkeley, USA

Doctoral Research Centre in Data Science, The University of Edinburgh, United Kingdom

Data Science Institute, Columbia University, USA

The Institute of Data-Intensive Engineering and Science, Johns Hopkins University, USA

Facebook Data Science research

Big Data Institute, University of Oxford, United Kingdom

Center for Big Data Analytics, The University of Texas at Austin, USA

Center for data science and big data analytics, Oakland University, USA

Institute for Machine Learning, ETH Zurich, Switzerland

The Alan Turing Institute, United Kingdom

IISc Computational and Data Sciences Research

Data Lab, Carnegie Mellon University, USA

If you wish to continue your learning in big data, here are my recommendations:

Coursera Big Data Specialization

Big data course from the University of California San Diego

Top 10 books suited to your needs can be picked from the summary article in Analytics India Magazine.

Data Challenges:

In the process of solving real-world problems, one may come across these challenges related to data:

  • What is the relevant data within the available data?
  • The lack of international standards for data privacy regulations
  • Rules like the General Data Protection Regulation (GDPR) differing across countries
  • Federated learning concepts to adhere to the rules: one can build and share the model while the data stays with the country/organization.

Conclusion:

In this article, I briefly introduced big data research issues in general and listed the top 20 latest research problems in big data and data science in 2020. These problems are divided into 5 categories so that researchers can pick a problem based on their interests and skill set. This list is by no means exhaustive, but I hope these inputs can excite some of you to solve real problems in big data and data science. I covered these points, along with some background on big data, in a webinar for your reference [7]. You may also refer to my other article, which lists problems to solve with data science amid COVID-19 [8]. Let us come together to build a better world with technology.

References:

[1] https://www.gartner.com/en/newsroom/press-releases/2019-10-02-gartner-reveals-five-major-trends-shaping-the-evoluti

[2] https://www.forbes.com/sites/louiscolumbus/2019/09/25/whats-new-in-gartners-hype-cycle-for-ai-2019/#d3edc37547bb

[3] https://arxiv.org/ftp/arxiv/papers/1705/1705.04928.pdf

[4] https://www.xenonstack.com/insights/graph-databases-big-data/

[5] https://journalofbigdata.springeropen.com/articles/10.1186/s40537-019-0206-3

[6] https://www.rd-alliance.org/group/big-data-ig-data-security-and-trust-wg/wiki/big-data-security-issues-challenges-tech-concerns

[7] https://www.youtube.com/watch?v=maZonSZorGI

[8] https://medium.com/@sunil.vuppala/ds4covid-19-what-problems-to-solve-with-data-science-amid-covid-19-a997ebaadaa6

Choose the right research problem and apply your skills to solve it. All the very best. Please share your feedback in the comments section. Feel free to add if you come across further topics in this area.

Written by Dr. Sunil Kumar Vuppala

Dr. Sunil is a Director of Data Science at Ericsson, with 16+ years of experience in ML/DL, IoT, and analytics; an inventor, speaker, and thought leader; and one of the top data scientists in India.

Data Engineering

The curriculum of the Data Engineering program provides students with a comprehensive understanding of the big data aspects of data analytics and data science, along with the technological challenges of data acquisition, curation, and management.

The program focuses on the application of computer skills and mathematical knowledge to solve real-world problems in different industries. It appeals to students with a wide range of career goals and backgrounds, and offers four focus tracks: Computer Science, Geo-Informatics, Bio-Informatics, and Business & Supply Chain Engineering.

These tracks prepare students for advanced projects and their master's thesis, giving them hands-on experience in their area of interest. The program is well-rounded, providing students with theoretical knowledge and practical experience, giving them the tools to succeed in this rapidly growing field.

Students

Demand for Data Engineers is massive. Typical fields of work encompass the finance sector, the automotive and health industry, retail, and telecommunications.

An MSc degree in Data Engineering also allows students to advance to a Ph.D. and a career in academia and research institutions.


Today we are “drowning in data and starving for information” while acknowledging that “data is the new gold”. However, deriving value from all the data now available requires a transformation in data analysis, in how we see, maintain, share and understand data.

Data Engineering is an emerging profession concerned with the task of acquiring large collections of data and extracting insights from them. It is driving the next generation of technological innovation and scientific discovery, which is expected to be strongly data-driven.

The program is embedded in the “Mobility” focus area at Constructor University. This focus area investigates the mobility of people, goods, and information. Even though the Data Engineering program is centered in “Mobility”, it includes contributions from and supports applications in the two other research foci: Health (bioactive substances), and Diversity (in modern societies).

Four program tracks allow specialization: The Data Engineering program attracts students with diverse career goals, backgrounds, and prior work experience. Therefore, the program offers four focus tracks within which students can choose to specialize further:

  • Computer Science
  • Geo-Informatics
  • Bio-Informatics
  • Business & Supply Chain Engineering

These tracks are a preparation for the Advanced Projects within the Discovery Area and the Master Thesis.

Computer Science Track: This track provides students with the skills to go beyond merely using existing toolboxes and to develop innovative data analysis techniques of their own design.

Geo-Informatics Track: This track introduces students to Geographic Information System (GIS) techniques, principles of spatial analysis, and data mining with the integration of remote sensing and GPS. It thereby provides early exposure to earth science data and its handling.

Bioinformatics Track and the analysis of biomedical data: Integration and model-based interpretation of high-throughput data are severe bottlenecks in biomedical and pharmaceutical research. Data Engineering prepares students for the novel computational challenges in these fields.

Business & Supply Chain Engineering Track: Students can also choose the specialization track in Business & Supply Chain Engineering. A vast amount of data is collected as part of business processes in particular along supply chains. In this specialization track, students will concentrate on the full data analysis cycle including pre-processing of data, data analysis and deployment of model results within the business process.

DE Study Scheme Fall 2023

The Data Engineering graduate program is composed of foundational lectures, specialized modules, industry seminars and applied project work, leading to a master thesis that can be conducted in research groups at Constructor University, at external research institutes or in close collaboration with a company. The program takes four semesters (two years). The following table shows an overview of the modular structure of the program. The program is sectioned into five areas (Core, Elective, Methods, Discovery, and Career) and the Master Thesis. All credit points (CP) are ECTS (European Credit Transfer System) credit points. In order to graduate, students need to obtain 120 CP.

Elective area (15 CP)

The Data Engineering program attracts students with diverse career goals, backgrounds, and prior work experience. Therefore, modules in this area can be chosen freely by students depending on their prior knowledge and interests.

Students may choose any combination of the modules listed below. Each track may be followed completely and/or complemented with other modules (as necessary in case of the tracks with 10 CP).


Methods area (15 CP)

In the Methods Area advanced concepts, methods and technologies of data engineering are introduced with a view towards industrial applications. Students can choose freely from the modules in this area. To enhance flexibility, students may transfer modules between the Elective and the Methods Areas (except for remedial modules) after consulting their academic advisor.


Within the Methods Area Constructor University offers special remedial modules, which are recommended to refresh knowledge or to fill knowledge gaps, preparing students to successfully take the Data Engineering Core Area modules. Based on a placement test in the orientation week, the academic advisor will propose which of the modules are useful depending on prior knowledge of the student.


Discovery area (15 CP)

This area features in the first semester a Project Seminar introducing the students to Current Topics and Challenges in Data Engineering, which is followed by two advanced projects in Data Engineering in semesters 2 and 3, each worth 5 CP. The projects can be done in the research groups at Constructor University or during internships in companies. The projects are supervised by Constructor University faculty.


Career area (15 CP)

In this area students acquire skills preparing them for a career as data engineers in industry.


Master thesis (30 CP)

In the fourth semester, students conduct research and write a master thesis guided and supported by their academic advisor.


Get in touch - let your future start at Constructor University


Iana Domnina

Recruitment Counselor & Admissions Officer

Student Recruitment Master’s Degrees

Email:  [email protected]

Office: RLH, Room 291

WhatsApp: +49 172 3292745

Book an appointment

Do you have any questions or need consultation? Call us or write us – we are happy to help you with your inquiry.

Graduate FAQ

For PhD Degrees please contact:

Dr. Svenja Frischholz

Head of academic advising services

[email protected]

Phone: +49 421 200 4338

The Data Engineering graduate program is targeted towards students who have completed a BSc in areas such as computer science, physics, applied mathematics, statistics, electrical engineering, communications engineering, or related disciplines, and who want to deepen their knowledge and proceed to research-oriented work towards a master's or ultimately a PhD degree.


The program aims to provide an in-depth understanding of the essential aspects of data-based decision making and the skills required to apply and implement these powerful methods in a successful and responsible manner. Apart from the necessary programming skills, this comprises:

  •     Methods of data acquisition both from the internet and from sensors;
  •     Methods to efficiently store and access data in large and distributed data bases;
  •     Statistical model building including a wide range of data mining methods, signal processing, and machine learning techniques;
  •     Visualization of relevant information;
  •     Construction and use of confidence intervals, hypothesis testing, and sensitivity analyses;
  •     The legal foundations of Data Engineering;
  •     Scientific qualification;
  •     Competence to take up a qualified employment in Data Engineering;
  •     Competence for responsible involvement in society;
  •     Personal growth.

The study program Data Engineering is part of the School of Computer Science & Engineering.

Understanding the worldwide flow of people, goods and information is important in today’s globalized world.

Information influences the life of the individual and the cohesion of societies and cultures in many different ways.

Expertise in different disciplines, such as computer science, communication technology, logistics, mathematics and psychology are brought together in the development of new solutions.


Become part of a global community

Students come from all over the world to live and learn at Constructor University. Our student body represents 120 nations and forms an ambitious campus community whose internationality is unprecedented in Europe. Constructor University's green and tree-shaded 80-acre campus provides much more than buildings for teaching and research.

ScholarsArchive@OSU

Advancing spatial data engineering and analysis: integrative approaches with GIS, statistical modelling, and deep learning.

Northeastern University

Academic Catalog 2024-2025

Data Analytics Engineering, MS

For program contact information, please visit this website . 

The Department of Mechanical and Industrial Engineering offers the Master of Science in Data Analytics Engineering to meet the current and projected workforce demands. This degree program offers students an opportunity to train for industry jobs or to acquire rigorous analytical skills and research experience to prepare for a doctoral program in health, security, and sustainability at Northeastern University. While the core courses for this program are offered by the College of Engineering, students can choose elective courses from diverse disciplines spread across various colleges at Northeastern. The MS degree in data analytics engineering is designed to train students with engineering, science, mathematics, and statistics backgrounds as advanced data analytics professionals and researchers who can transform large streams of data into understandable and actionable information for the purpose of making decisions. The key sectors that require analytics professionals include healthcare, smart manufacturing, supply chain and logistics, national security, defense, banking, finance, marketing, human resources, and sports.

The Master of Science in Data Analytics Engineering program helps students acquire knowledge and skills to:

  • Discover opportunities to improve products, processes, systems, and enterprises through data analytics
  • Apply optimization, statistical, and machine-learning methods to solve complex problems involving large data from multiple sources
  • Process and explore data from a variety of sources, including Internet of Things, an integrated network of devices and sensors, customer touch points, processes, social media, and people
  • Work with technology teams to design and build large and complex SQL and NoSQL databases
  • Use tools and methods for data mining, Big Data processing, and data visualization to generate reports for analysis and decision making
  • Create integrated views of data collected from multiple sources of an enterprise
  • Understand and explain results of data analytics to decision makers
  • Design and develop data analytics projects

This degree program seeks to prepare students for a comprehensive list of tasks including collecting, storing, processing, and analyzing data; reporting descriptive statistics and patterns; performing diagnostic, predictive, and prescriptive analytics; drawing conclusions and insights; making actionable recommendations; and designing and managing data analytics projects. 

General Degree Requirements

To be eligible for admission to any of the MS degree programs, a prospective student must hold a Bachelor of Science degree in engineering, science, mathematics, statistics, or an equivalent field. Students in all master’s degree programs must complete a minimum of 32 semester hours of approved coursework (exclusive of any preparatory courses) with a minimum grade-point average of 3.000. Students can complete a master's degree by pursuing any one of the three tracks: the coursework option, the project option, or the thesis option. Specific degree requirements for each of these tracks can be found under the Program Requirements tab. Students may pursue any program either on a full-time or part-time basis; however, certain restrictions may apply.

Specific Degree Requirements

Core courses for the Master of Science in Data Analytics Engineering provide students with a foundation in algorithms and optimization, statistics, data and knowledge engineering, data mining, and visualization. These courses are designed to provide students with a strong understanding of probability and statistics, statistical learning, optimization methods, data mining, database design, and visualization. Students can select electives from a wide range of fields including business, finance, engineering, healthcare, manufacturing, and urban communities/cities. Elective courses provide students with the knowledge and understanding of descriptive, prescriptive, diagnostic, and predictive analytics as applied to a specific field of interest such as business, healthcare, manufacturing, and urban communities/cities. Alternatively, students can select their electives so that they can prepare for a doctoral program by taking advanced courses in mathematics, statistics, machine learning, natural language processing, and pattern recognition. 

Academic and Research Advisors

All nonthesis students are advised by the faculty advisor designated for their respective concentration or program. Students wishing to pursue the thesis option must first find a research advisor within their first year of study. The research advisor will guide the student's thesis work, and thesis reader(s) may be assigned at the discretion of the research advisor. The research advisor must be a full-time or jointly appointed faculty member in the MIE department. If the research advisor is outside the MIE department, a faculty member with a 51% or greater appointment in the MIE department must be chosen as co-advisor, and a petition must be filed and approved by the co-advisor and the MIE Graduate Affairs Committee before the thesis option can be approved. Thesis option students are advised by the faculty advisor of their concentration until they select their research advisor(s). The research advisor and co-advisor must serve as thesis readers.

Plan of Study and Course Selection

It is recommended that all new students attend orientation sessions held by the MIE department and the Graduate School of Engineering to acquaint themselves with the coursework requirements and research activities of the department as well as with the general policies, procedures, and expectations.

In order to receive proper guidance with their coursework needs, all MS students are strongly encouraged to complete and submit a fully signed Plan of Study to the department before enrolling in second-semester courses. This form not only helps the students manage their coursework but it also helps the department to plan for requested course offerings. The PS form may be modified at any time as the students progress in their degree programs. 

Students pursuing study or research under the guidance of a faculty member can choose the project option by taking  Master’s Project ( IE 7945 ) . An MS project must be petitioned to the MIE Graduate Affairs Committee and approved by both the faculty member (instructor for Master's Project) and the student's academic advisor. The petition must clearly state the reason for taking the project course; a brief description of the goals; as well as the expected outcomes, deliverables, and grading scheme. 

Options for MS Students (Coursework Only, Project, or Thesis)

Students accepted into any of the MS programs in the MIE department can choose one of the three options: coursework only, project, or thesis. Please see the Program Requirements tab on the top menu of this page for more information. MS students who want to pursue project or thesis options must find, within the first year of their study, a faculty member or a research advisor who will be willing to direct and supervise a mutually agreed research project or MS thesis. Moreover, students who receive financial support from the university in the form of a research, teaching, or tuition assistantship must complete the thesis option.

Students who complete the thesis option must make a presentation of their thesis before approval by the department. The MS thesis presentation shall be publicly advertised at least one week in advance and all faculty members and students may attend and participate. If deemed appropriate by the research advisor, other faculty members may be invited to serve as thesis readers to provide technical opinions and judge the quality of the thesis and presentation. 

Change of Program/Concentration

Students enrolled in any of the MIE department programs or concentrations may change their current program or concentration no sooner than the beginning of their second full-time semester of study. In order for the program or concentration change request to be considered by the MIE Graduate Affairs Committee, the student must not be in the first semester of their current program, must have a 3.300 GPA, and have completed at least 8 semester hours of required coursework in their sought program at Northeastern.

Graduate Certificate Options

Students enrolled in a graduate degree program in the College of Engineering are eligible to pursue an engineering graduate certificate in addition to or in combination with the MS degree. For more information please refer to  Graduate Certificate Programs . Please note that students pursuing the Master of Science in Data Analytics Engineering are not eligible for the Graduate Certificate in Data Analytics Engineering.

Gordon Institute of Engineering Leadership

Master's Degree in   Data Analytics Engineering with Graduate Certificate in Engineering Leadership

Students may complete a Master of Science in Data Analytics Engineering in addition to earning a  Graduate Certificate in Engineering Leadership . Students must apply and be admitted to the Gordon Engineering Leadership Program in order to pursue this option. The program requires fulfillment of the 16-semester-hour curriculum required to earn the Graduate Certificate in Engineering Leadership, which includes an industry-based challenge project with multiple mentors. The integrated 32-semester-hour degree and certificate will require 16 semester hours of advisor-approved data analytics technical courses.

Engineering Business

Master's Degree in Data Analytics Engineering with Graduate Certificate in Engineering Business

Students may complete a Master of Science in Data Analytics Engineering in addition to earning a Graduate Certificate in Engineering Business . Students must apply and be admitted to the Galante Engineering Business Program in order to pursue this option. The integrated 32-semester-hour degree and certificate will require 16 semester hours of the data analytics engineering core courses and 16 semester hours from the outlined business-skill curriculum. The coursework, along with participation in cocurricular professional development elements, earn the Graduate Certificate in Engineering Business.

  • Concentrations and course offerings may vary by campus and/or by program modality.  Please consult with your advisor or admissions coach for the course availability each term at your campus or within your program modality.  
  • Certain options within the program may be  required  at certain campuses or for certain program modalities.  Please consult with your advisor or admissions coach for requirements at your campus or for your program modality. 

Complete all courses and requirements listed below unless otherwise indicated. 

Core Requirements

Course List
Code Title Hours
Foundations for Data Analytics Engineering    4
 or Engineering Probability and Statistics
Computation and Visualization for Analytics    4
Data Management for Analytics    4
 or Data Management and Database Design
Data Mining in Engineering    4
Deterministic Operations Research    4
 or Algorithms
Note: , , and are required for students in Vancouver.

Complete one of the following options:

Coursework Option 1

Course List
Code Title Hours
Complete 12 semester hours from the elective course list below.    12

Project Option

Course List
Code Title Hours
Master’s Project    4
Complete 8 semester hours from the elective course list below.*    8
*Students in Vancouver complete and 4 semester hours from the approved electives below.

Thesis Option 2

Course List
Code Title Hours
Master’s Project    4
Thesis    4
Complete 4 semester hours from the elective course list below.**    4
In addition to completing the thesis course, students must successfully complete the thesis submission process, including securing committee and Graduate School of Engineering signatures and submission of an electronic copy of their MS thesis to ProQuest.
*Students in Vancouver complete twice for a total of 8 semester hours.
**Students in Vancouver complete in lieu of an elective.

Optional Co-op Experience

Course List
Code Title Hours
Complete the following. Students must complete to qualify for co-op experience:
Introduction to Cooperative Education    1
Co-op Work Experience    0
or  Co-op Work Experience - Half-Time
or  Co-op Work Experience Abroad - Half-Time
or  Co-op Work Experience Abroad

Elective Course List  

Any course in the following list will serve as an elective course, provided the course is offered and the student satisfies prerequisites and program requirements. Students can take electives outside this list with prior approval from the faculty advisor.

Course List
Code Title Hours
General Engineering
Product Development for Engineers
Civil Engineering and Environmental Engineering
Time Series and Geospatial Data Sciences
Computer Science
Foundations of Artificial Intelligence
Game Artificial Intelligence
Database Management Systems
Computer Graphics
Pattern Recognition and Computer Vision
Robotic Science and Systems
Algorithms
Natural Language Processing
Machine Learning
Information Retrieval
Fundamentals of Cloud Computing
Data Science
Introduction to Programming for Data Science
Introduction to Data Management and Processing
Supervised Machine Learning and Learning Theory
Unsupervised Machine Learning and Data Mining
Electrical and Computer Engineering
Introduction to Machine Learning and Pattern Recognition
Advanced Machine Learning
Engineering Management
Engineering Project Management
Economic Decision Making
Financial Management for Engineers
Health Informatics
Introduction to Health Informatics and Health Information Systems
Data Management in Healthcare
Theoretical Foundations in Personal Health Informatics
Evaluating Health Technologies
Business of Healthcare Informatics
Improving the Patient Experience through Informatics
Management Issues in Healthcare Information Technology
Introduction to Health Data Analytics
Industrial Engineering
Healthcare Systems Modeling and Analysis
Manufacturing Methods and Processes
Human Performance
Supply Chain Engineering
Simulation Analysis
Intelligent Manufacturing
Statistical Methods in Engineering
Statistical Quality Control
Reliability Analysis and Risk Assessment
Applied Reinforcement Learning in Engineering
Statistical Learning for Engineering
Sociotechnical Systems: Computational Models for Design and Policy
Applied Natural Language Processing in Engineering
Neural Networks and Deep Learning
Information Systems
Advances in Data Sciences and Architecture
Mathematics
Introduction to Mathematical Methods and Modeling
Optimization and Complexity
Machine Learning and Statistical Learning Theory 1
Statistics for Bioinformatics
Mathematical Statistics
Applied Statistics
Regression, ANOVA, and Design
Network Science
Network Science 2
Network Economics
Bayesian and Network Statistics
Operations Research
Metaheuristics and Applications
Probabilistic Operation Research
Integer and Nonlinear Optimization
Network Analysis and Advanced Optimization
Convex Optimization and Applications
Logistics, Warehousing, and Scheduling
Physics
Network Science 1
Public Policy and Urban Affairs
Dynamic Modeling for Environmental Decision Making
Big Data for Cities
Geographic Information Systems for Urban and Regional Policy
Advanced Spatial Analysis of Urban Systems

Program Credit/GPA Requirements

32 total semester hours required (33 with optional co-op)
Minimum 3.000 GPA required

Coursework option is not available to students in Vancouver.

A thesis is required for all students who receive financial support from the university in the form of a research, teaching, or tuition assistantship. The thesis topic should cover one or more of the areas from statistics, mathematics, optimization, data mining, machine learning, database design, Big Data, visualization tools, or forecasting methods. The thesis should train students for research in data and operations analytics and/or prepare them for a doctoral program.

Approved elective for students in Vancouver. 


Master of Science in Data Analytics Engineering

Using analytics to foster business success.

Through Northeastern’s interdisciplinary Master of Science (MS) in Data Analytics Engineering program, you will build on your existing engineering or science foundation to gain employment at businesses of all kinds and improve their products, processes, systems, and enterprises, all through the power of data mining, data management, machine learning, and visualization.

You’ll have the flexibility to tailor your degree to your professional goals through a number of electives across Northeastern in areas including business, engineering, healthcare, manufacturing, urban communities, computer science, and information systems.

The master's in data analytics engineering is ideal for those who wish to develop expertise in the analysis and optimization of data to solve problems and support decision-making. For those more interested in the process of generating, collecting, and mining data by developing algorithms and computational tools, Northeastern’s Khoury College of Computer Sciences also offers a Master of Science in Data Science.

Innovative Curriculum - MS in Data Analytics Engineering

Irrespective of your engineering major, you will gain rigorous analytical skills and research experience through technically advanced core courses in operations research, statistics, data mining, database management, and visualization. You can further specialize your degree with flexible electives from diverse disciplines across colleges at Northeastern in areas such as:

  • Smart Manufacturing
  • Healthcare Analytics
  • Network Science
  • Machine Learning
  • Advanced Optimization
  • Business Analytics
  • Internet of Things

Upon graduation with your MS in Data Analytics Engineering degree, you’ll be prepared to take on a data analyst position in any industry or to enter a doctoral program in areas including engineering, healthcare, business, finance, security, or sustainability.

You’ll gain valuable research experience desired by employers by designing and developing analytics projects in both individual and group settings. Some recent research projects have included:

  • Investigating innovative models for tumor response forecasting to personalize pre-surgical treatment for breast cancer patients
  • Developing adaptive maintenance methods, standards, and metrics for monitoring and performance assurance (Mpass) of smart manufacturing equipment
  • Building a robust continuous objective multimodal pain assessment sensing system (COMPASS) using physiological signals and facial expressions

The MS in Data Analytics Engineering is available at Northeastern’s campus in Boston, MA as well as at our campuses in Seattle, WA and Vancouver, Canada .

The research project/thesis course is a requirement for students taking the DAE program in Vancouver.

Note: A subset of program courses are available at campuses outside of Boston.

  • An ability to identify, formulate, and solve complex engineering/scientific/quantitative problems.
  • An ability to explain and apply engineering design principles, as appropriate to the program’s educational objectives.
  • An ability to produce solutions that meet specified end-user needs with consideration of public health, safety, and welfare, as well as global, cultural, social, environmental, and economic factors.
  • Apply optimization, statistical, and machine-learning methods to solve data-driven problems.
  • Use database management methods and tools for data storage and retrieval, data processing, data visualization, and data mining.
  • An ability to apply the data analytics principles/methodologies/tools for transforming streams of data into understandable and actionable information to support decision making.
  • An ability to design and develop data analytics projects.

Over 15 graduate certificates are available to provide students the opportunity to develop a specialization in an area of their choice. Certificates can be taken in addition to or in combination with a master’s degree, or provide a pathway to a master’s degree in Northeastern’s College of Engineering. Master’s programs can also be combined with a Gordon Engineering Leadership certificate. Students should consult with their faculty advisor regarding these options.

Gordon Institute of Engineering Leadership Certificate

Students may complete a Master of Science in Data Analytics Engineering in addition to earning a Graduate Certificate in Engineering Leadership . Students must apply and be admitted to the Gordon Engineering Leadership Program in order to pursue this option. The program requires fulfillment of the 16-semester-hour curriculum required to earn the Graduate Certificate in Engineering Leadership, which includes an industry-based challenge project with multiple mentors. The integrated 32-semester-hour degree and certificate will require 16 semester hours of advisor-approved data analytics technical courses.

Engineering Business Certificate

Students may complete a Master of Science in Data Analytics Engineering in addition to earning a Graduate Certificate in Engineering Business. Students must apply and be admitted to the Galante Engineering Business Program in order to pursue this option. The program requires the applicant to have earned or be in a program to earn a Bachelor of Science in Engineering from Northeastern University. The integrated 32-semester-hour degree and certificate will require 16 semester hours of the data analytics engineering core courses and 16 semester hours from the outlined business-skill curriculum. The coursework, along with participation in co-curricular professional development elements, earn the Graduate Certificate in Engineering Business .

Experiential Learning

Northeastern combines rigorous academics with experiential learning and research to prepare students for real-world engineering challenges. The Cooperative Education Program , also known as a “co-op,” is one of the largest and most innovative in the world, and Northeastern is one of only a few that offers a Co-op Program for graduate students. Through this program students gain professional experience as part of the academic curriculum employed in their field of interest, giving them a competitive advantage upon graduation.

While on co-op, students spend a 4-, 6-, or 8-month placement working in industries ranging from finance and technology to energy and healthcare here in Boston and across the country. Recent MS in Data Analytics Engineering co-op partners include Amazon, Apple, Fidelity Investments, Fraunhofer USA, Wayfair, Roku, IBM, Grantham, Mayo, Van Otterloo & Co, LLC, Commonwealth of Massachusetts, Natixis Investment Managers, State Street Global Services, and McKinsey & Company, Inc.


Program Goals

The Master of Science (MS) in Data Analytics Engineering is designed to help students acquire knowledge and skills to:

  • Discover opportunities to improve systems, processes, and enterprises through data analytics
  • Apply optimization, statistical, and machine-learning methods to solve complex problems involving large data from multiple sources
  • Collect and store data from a variety of sources, including Internet of Things (IoT), an integrated network of devices and sensors, customer touch points, processes, social media, and people
  • Work with technology teams to design and build large and complex SQL databases
  • Use tools and methods for data mining, big-data algorithms, and data visualization to generate reports for analysis and decision making
  • Create integrated views of data collected from multiple sources of an enterprise
  • Understand and explain results of data analytics to decision makers
  • Design and develop analytics projects

This degree program seeks to prepare students for a comprehensive list of tasks including collecting, storing, processing, and analyzing data; reporting statistics and patterns; drawing conclusions and insights; and making actionable recommendations.

The in-demand field of data analytics opens career doors around the globe and across industries, including healthcare, smart manufacturing, supply chain and logistics, national security, defense, banking, finance, marketing, and human resources.

Demand for data professionals has never been higher. Employment of operations research analysts is projected to grow 23 percent from 2021 to 2031, much faster than the average for all occupations, and the median annual wage is $83,000, with salaries reaching upward of $160,000, according to the U.S. Bureau of Labor Statistics (May 2021).

Academic Advising

The Academic Advisors in the Graduate Student Services office can help answer many of your questions and assist with various concerns regarding your program and student record. Use the link below to also determine which questions can be answered by your Faculty Program Advisors and OGS Advisors.

  • Graduate Student Services

Admissions & Aid

Ready to take the next step? Review the degree requirements to see the courses needed to complete this degree. Then, explore ways to fund your education. Finally, review the admissions information to see our deadlines and gather the materials you need to apply.

  • Degree Requirements
  • Admissions Information
  • TUITION & FINANCIAL AID
  • Connect with a Current Student

Student News


Expanding Career Opportunities Through the Galante Engineering Business Program

Through continued interest in the field, Tejas Karwa, E’23, industrial engineering, MS’24, data analytics engineering, joined the Galante Engineering Business Program to expand his knowledge of the engineering industry and broaden his career prospects.


2024 Lux. Veritas. Virtus. Inductees

Twenty-three engineering graduate students were inducted into the newly established Lux. Veritas. Virtus. society, a prestigious honor that recognizes exceptional graduate students who exemplify the university’s mission, ideals, and values.


2024 Huntington 100 COE Recipients

Sixteen students from the College of Engineering were selected as 2024 members of the Northeastern University “Huntington 100,” a group of students selected for outstanding achievements commensurate with the university’s mission, ideals, values, and academic plan.


Creating a Roadmap to Global Opportunities

After completing an undergraduate degree in mathematics in her home of New Delhi, Ishpreet Kaur Sethi, MS’23, data analytics engineering, came to Northeastern’s College of Engineering to earn a master’s degree and maximize the co-op experience by working as a data analyst at the UK-based Alchemab Therapeutics. Now she is poised to pursue global opportunities.

Professorship for Data Engineering, University of Bamberg

If you are interested in writing a Bachelor's or Master's thesis, please send me an e-mail (maximilian.schuele(at)uni-bamberg.de) with your transcript of records attached and your preferred topic.

Topics requiring C++

SQL++: Extending Database Systems by Building Blocks

The idea is to expose building blocks of database systems, for example hash tables, to data mining and machine learning algorithms. The thesis should build upon an existing open-source database system, for example Hyrise, where you first write selected algorithms in SQL. Afterwards, you improve the performance of the algorithms by adding suitable suboperators to the database system. For example, you can perform gradient descent iteratively using recursive CTEs and then create an operator for iterations, for example a trampoline, with lower memory consumption. The building blocks should be accessed through an extension of SQL by user-defined functions (UDFs), called SQL++.
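
As a rough illustration of the SQL starting point, here is a minimal sketch (my own, not from the topic description) that runs gradient descent as a recursive CTE. It uses SQLite via Python's sqlite3 purely for convenience; the thesis would target a system such as Hyrise and replace the CTE with a dedicated iteration suboperator.

```python
# Gradient descent for y = w * x expressed as a recursive CTE (illustrative sketch).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE points(x REAL, y REAL);
INSERT INTO points VALUES (1, 2), (2, 4), (3, 6);  -- underlying relation: y = 2x
""")

# Precompute the sufficient statistics so the recursive part needs no aggregates.
sxx, sxy, n = con.execute("SELECT SUM(x*x), SUM(x*y), COUNT(*) FROM points").fetchone()

query = f"""
WITH RECURSIVE gd(iter, w) AS (
    SELECT 0, 0.0
    UNION ALL
    SELECT iter + 1,
           w - 0.05 * (2.0 / {n}) * (w * {sxx} - {sxy})   -- one MSE gradient step
    FROM gd
    WHERE iter < 100
)
SELECT w FROM gd ORDER BY iter DESC LIMIT 1;
"""
print(con.execute(query).fetchone()[0])  # converges towards 2.0
```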

SQL Compiler for LeanStore

LeanStore is an open-source system for OLTP and OLAP workloads but lacks an SQL interface. The goal of this thesis is to write a query compiler for it in C++.

Code-Generation for GPU Database Systems

Modern database systems generate code instead of interpreting function calls over an operator tree. In this thesis, you will generate code that runs on GPUs and investigate how SIMT (single instruction, multiple threads) execution can accelerate query processing.
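
To illustrate the contrast between interpretation and code generation, here is a toy sketch of my own (not part of the topic, and a real thesis would emit GPU kernels rather than Python): a generated, query-specific loop replaces a generic operator that calls a predicate per row.

```python
# Interpretation vs. code generation for a simple filter query (illustrative sketch).

rows = [(i, i * 2.0) for i in range(10)]   # toy table: (id, price)

# 1) Interpretation: a generic filter operator calls a predicate function per row.
def interpret_filter(rows, predicate):
    return [r for r in rows if predicate(r)]

# 2) Code generation: compile the query condition into a specialized loop.
def compile_filter(condition_src):
    src = (
        "def generated_query(rows):\n"
        "    out = []\n"
        "    for row in rows:\n"
        f"        if {condition_src}:\n"
        "            out.append(row)\n"
        "    return out\n"
    )
    namespace = {}
    exec(src, namespace)                   # stands in for JIT/kernel compilation
    return namespace["generated_query"]

query = compile_filter("row[1] > 10")
assert query(rows) == interpret_filter(rows, lambda r: r[1] > 10)
print(query(rows))
```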

Camera Crowd Detection

You have to implement a protocol to access camera data for crowd detection.

Topics based on SQL

Usage of Higher-Order Lambda Functions

For this topic, you have to elaborate use cases for higher-order lambda functions in SQL. Lambda functions currently exist only as functions to customise database operators, but the plan is to extend database systems with higher-order functions. One use case would be a ModelDB: a relation that stores different models to be executed at runtime. In your thesis, you have to devise use cases for higher-order SQL lambda functions based on existing models for data pipelines.

Write a NumPy-to-SQL compiler based on a relational representation of matrices to compute matrix algebra in database systems.
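
As a hedged sketch of what such a relational representation could look like (my own example, not the chair's specification), a matrix can be stored as (row, column, value) tuples so that NumPy's A @ B maps to a join plus GROUP BY:

```python
# Relational matrix representation and matrix multiplication in SQL (illustrative sketch).
import sqlite3
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

con = sqlite3.connect(":memory:")
for name, M in (("A", A), ("B", B)):
    con.execute(f"CREATE TABLE {name}(i INT, j INT, v REAL)")
    con.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)",
                    [(i, j, float(M[i, j])) for i in range(M.shape[0])
                                            for j in range(M.shape[1])])

# Relational translation of C = A @ B: join on the shared index, then aggregate.
rows = con.execute("""
    SELECT A.i, B.j, SUM(A.v * B.v) AS v
    FROM A JOIN B ON A.j = B.i
    GROUP BY A.i, B.j
    ORDER BY A.i, B.j
""").fetchall()
print(rows)    # [(0, 0, 19.0), (0, 1, 22.0), (1, 0, 43.0), (1, 1, 50.0)]
print(A @ B)   # matches the SQL result
```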

Equation Solving in SQL

Write or generate SQL queries in order to solve mathematical equations in SQL.

AutoDiff2SQL

Generate SQL code that automatically differentiates model functions in order to train neural networks in SQL.
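
A minimal sketch of the idea (my own assumption of the approach, not the chair's tool): differentiate a tiny expression tree symbolically and emit the corresponding SQL expressions for both the model function and its partial derivative.

```python
# Symbolic differentiation that emits SQL expressions (illustrative sketch).

def diff(expr, wrt):
    """Differentiate a nested-tuple expression ('op', left, right) or a leaf."""
    if isinstance(expr, (int, float)):
        return 0
    if isinstance(expr, str):                      # column or parameter name
        return 1 if expr == wrt else 0
    op, a, b = expr
    if op == "+":
        return ("+", diff(a, wrt), diff(b, wrt))
    if op == "*":                                  # product rule
        return ("+", ("*", diff(a, wrt), b), ("*", a, diff(b, wrt)))
    raise ValueError(op)

def to_sql(expr):
    """Render an expression tree as a SQL arithmetic expression."""
    if isinstance(expr, (int, float)):
        return str(expr)
    if isinstance(expr, str):
        return expr
    op, a, b = expr
    return f"({to_sql(a)} {op} {to_sql(b)})"

model = ("+", ("*", "w", "x"), "b")                # the model function w * x + b
print(to_sql(model))                               # ((w * x) + b)
print(to_sql(diff(model, "w")))                    # (((1 * x) + (w * 0)) + 0)
```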


DSpace@MIT: Doctoral Theses

Advanced process data analytics

Bachelor and Master Thesis

We offer a variety of cutting-edge and exciting research topics for Bachelor's and Master's theses. We cover a wide range of topics from Data Science, Natural Language Processing, Argument Mining, the Use of AI in Business, Ethics in AI and Multimodal AI. We are always open to suggestions for your own topics, so please feel free to contact us. We supervise students from all disciplines of business administration, business informatics, computer science and industrial engineering.

Thesis Topics

Example topics could be:

  • Conversational Artificial Intelligence in Insurance and Finance
  • Natural Language Processing for Understanding Financial Narratives: An Overview
  • Ethics at the Intersection of Finance and AI: A Comprehensive Literature Review
  • Explainable Natural Language Processing for Credit Risk Assessment Models: A Literature Review

Thesis Template

  • Latex Template for bachelor and master theses
  • How to use the latex template

Q1: How many pages do I need to write?

A: In general, the number of pages is only a poor indicator of the quality of a thesis. However, as a rule of thumb, bachelor theses should have around 30 pages, while master theses should be around 60 pages of main content (that is, without the appendix and lists of tables, symbols, figures, references etc.).

Q2: How often should I meet with my supervisor?

A: Your supervisors are typically very busy people. However, don't hesitate to ask in case you have questions. For instance, if you are unsure of some requirements, or in case you have methodological problems, it is absolutely necessary to talk to your supervisor. As a rule of thumb, you should meet at least three times (once in the beginning, once in the middle, and once before the submission).

Q3: Am I allowed to use any AI models in the process of writing my thesis?

A: In general, we neither forbid nor recommend the use of AI for writing support. However, if you use AI, please inform your supervisor. Also, you need to adhere to the recommendations on the use of AI writing assistants given by the faculty.

Q4: How much time do I have?

A: The exact timing is dependent on your study program! Thus, please check the examination requirements before the official start of your thesis -- you are responsible for sticking to the rules.

North Carolina A&T University, College of Engineering, Computational Data Science and Engineering

MS in Data Science and Engineering: Degree Requirements

The MS in Data Science and Engineering program offers three options, each requiring a total of 30 credits. The minimum requirements for all options are:

  • 12 credits from the core;
  • 1 credit of Graduate Seminar (CSE 792);
  • Research experience (CSE 797 for the Thesis Option, CSE 795 for the Project Option, or CSE 794 for the Course-Only Option); and
  • The balance of the 30 credit hours, which depends on the chosen option (thesis, project, or course only), of which 6 credit hours are expected to be CSE elective courses.

The four core courses are:

  • CSE 620: Introduction to Computational Software Tools
  • CSE 704: Data Processing and Visualization
  • CSE 708: Data Analytics and Engineering Applications
  • CSE 817: Fundamentals of Big Data Analysis

Course-Only Option

  • Take 12 credit hours to cover the four core courses: CSE 620, CSE 704, CSE 708, and CSE 817.
  • Take 15 credit hours at the graduate level from CDSE, Engineering, Mathematics, Science, or Technology, or other graduate courses approved by the CDSE Department. In general, graduate courses from the STEM disciplines are typically approved.
  • Take 1 credit hour of Graduate Seminar (CSE 792)
  • Take at least 2 credit hours to complete 30 credit hours from CSE courses, approved graduate-level courses (see above), or Supervised research.

Project Option

  • Take 12 credit hours at the graduate level from CDSE, Engineering, Mathematics, Science, or Technology, or other graduate courses approved by the CDSE Department. In general, graduate courses from the STEM disciplines are typically approved.
  • Take 3 credit hours of MS Project, CSE 795
  • Take at least 2 credit hours to complete 30 credit hours from CSE courses, approved graduate-level courses (see above), Supervised teaching or Supervised research, or Thesis continuation CSE 799.

Thesis Option

  • Take 9 credit hours at the graduate level from CDSE, Engineering, Mathematics, Science, or Technology, or other graduate courses approved by the CDSE Department. In general, graduate courses from the STEM disciplines are typically approved.
  • Take 6 credit hours of MS Thesis, CSE 797
  • Take at least 2 credit hours to complete 30 credit hours from CDSE courses, approved graduate-level courses (see above), MS Thesis CSE 792, Supervised teaching or Supervised research, or Thesis continuation CSE 799.

Data Engineering Master's

Why Earn a Degree in Data Engineering?

Considering a career in one of technology’s most in-demand fields? The College of Engineering’s new Master of Science in Data Engineering may be for you. According to the Dice 2020 Tech Job Report, jobs in data engineering are growing by 50% each year, making data engineering the fastest-growing job in technology.

The Master of Science program in Data Engineering allows you to focus on your analytical, programming and engineering skills to:

  • Creatively solve data-related analytical problems
  • Integrate messy data into clean, usable data sets
  • Organize and retrieve large data efficiently

Core courses give students a solid foundation in data engineering and will dive into big data, data analytics, data visualization and database systems.

Concentration areas focus on data engineering and biomedical engineering.

Students who graduate from this program will be able to:

  • Build and maintain data pipeline systems
  • Improve data reliability, efficiency and quality
  • Prepare complex datasets to solve difficult problems
  • Understand efficient algorithms and data structures
  • Apply data engineering skills to their field of study
  • Efficiently visualize data
  • Effectively communicate technical information
  • Quickly adapt to new technologies
  • Collaborate to solve problems
  • Understand and use data software

Data Engineering Master's Highlights

What can you do with a degree in data engineering?

While estimates vary, a recent report from O’Reilly states that companies typically need a minimum of 2-3 data engineers for every data scientist to successfully complete projects, and the current job market is struggling to keep up with this demand. Our M.S. in Data Engineering program prepares students to enter this thriving job market right out of college.

The most in-demand jobs are data engineers, data architects, business intelligence architects, machine learning engineers, and data warehouse engineers/developers, with data engineers working across many different fields.

Doctoral Thesis: From Data, to Models, and Back: Making ML “Predictably Reliable”

By: Andrew Ilyas

Thesis Supervisors: Costis Daskalakis, Aleksander Madry

  • Date: Friday, August 23
  • Time: 2:30 pm - 4:00 pm
  • Category: Thesis Defense
  • Location: Kiva (32-G449)


Abstract: Despite their impressive performance, training and deploying ML models is currently a somewhat messy affair. But does it have to be? In this defense, I’ll discuss some of my research on making ML “predictably reliable”—enabling developers to know when their models will work, when they will fail, and why. To begin, we use a case study of adversarial examples to show that human intuition can be a poor predictor of how ML models operate. Motivated by this, we present a few lines of work that aim to develop a precise understanding of the entire ML pipeline: from how we source data, to the datasets we train on, to the learning algorithms to use.


Writing a thesis is the final step in obtaining a Bachelor or Master degree. A thesis is always coupled to a scientific project in some field of expertise. Candidates who want to write their thesis in the Big Data Analytics group should, therefore, be interested and trained in a field related to our research areas .

A thesis is an independent, scientific and practical work. This means that the thesis and its related project are conducted exclusively by the candidate; the execution follows proper scientific practices; and all necessary artifacts, algorithms and evaluations have been physically implemented and submitted as part of the thesis. A proper way of sharing code and evaluation artifacts is the creation of a public GitHub repository, which can, then, be referenced in the thesis. The thesis serves as a documentation for the project and as scientific analysis and reflection of the gathered insights.

For students interested in a thesis, we offer interesting topics and close, continuous supervision during the entire thesis period. Every thesis is supervised by at least one member of our team, who can give advice and help in critical situations. The condensed results of our best master theses have been published at top scientific venues, such as VLDB, CIKM, and EDBT.

A selection of open thesis topics can be found on this page. We also encourage interested students to suggest their own ideas in the context of our research areas and to contact individual members of the group directly. An ideal thesis topic is connected in some form to the research projects of a group member; that group member will then become the supervisor for the thesis. Hence, taking a look at the personal pages and our current projects is a good starting point for a thesis project. Recent publications at conferences such as VLDB or SIGMOD, or open research challenges on, for example, Kaggle, are good resources for finding interesting thesis ideas.

Organizational information

  • Exposé: Before starting a thesis, Master students have to write a 2-5 page exposé. The exposé is a description of the planned project and includes a motivation for the topic, a literature review on related work, a draft of the research/project idea, and a plan for the final evaluation. Please consider our template with initial instructions when starting your exposé. The exposé can be created in the context of the "Selbstständiges wissenschaftliches Arbeiten" module.
  • Timetable: Once the thesis project is started, it must be finished within six months for Master theses and four months for Bachelor theses. Only special circumstances, such as times of sickness, can extend this period. The thesis period can also be extended if you work a regular job or need to take further courses during your thesis time. A thesis can be started at any time, either aligned with the semester schedule or independently of it.
  • Presentations : The work on a Master thesis requires students to give at least two talks. A mid-term talk serves to get some additional feedback from a larger audience and to practice the final thesis defense; this talk is not graded. The final talk is a proper defense of the thesis and the final results; this talk is graded as one part of the academic performance.

Hints for the thesis

  • Length : A typical thesis is 30-60 pages (Bachelor) and 40-90 pages (Master) long.
  • Language : A thesis can be written in German or English. We recommend English, though.
  • Format : We highly recommend writing a thesis in LaTeX, as in this way many structural defects can easily be avoided.
  • Tips for writing a thesis
  • Tips for writing a paper (short)
  • Tips for writing a paper (long)

Bachelor and Master Theses

  • We aim to translate the batch-processing-based Sindy algorithm for the discovery of inclusion dependencies into a reactive, more efficient data profiling approach using Akka (a minimal unary IND discovery sketch appears after this list).
  • We aim to translate the Many algorithm for inclusion dependency discovery on Web Tables into a partial IND discovery algorithm that is better suited for data integration scenarios.
  • The data profiling language DPQL is a recently developed metadata profiling interface that serves the discovery of complex metadata patterns.
  • We aim to develop efficient profiling approaches that find these metadata patterns as fast as possible.
  • IoT applications, multi-sensor systems and many distributed software systems record time series in different frequencies, temporal alignments, speeds, and formats, which makes their integrated analysis a technically and algorithmically challenging task. We therefore aim to develop a time series engineering library that assists the integration and preparation of time series for analytical tasks, such as anomaly detection, forecasting, clustering etc.
  • As part of the project, we could generate and measure our own time series with different sensors and afterwards aggregate the measurements with the time series library into a single multivariate time series.
  • Based on the movement events of agents in cities, we aim to plan the placement of info-stations, such that these stations inform as many nearby agents as possible in some fixed time period.
  • The project will be conducted in collaboration with the emergenCity project.
  • We will use the streams of movement data and the Lambda engine that is currently in development at the UMR.
  • Keywords: Lambda queries, lattice search
  • Given non-invasive medical sensor measurements, such as heart beats or temperature curves, we aim to find anomalous recordings that may indicate diseases or body malfunctions via modern anomaly detection, clustering and/or prediction techniques for time series.
  • The project will be conducted in collaboration with the VirtualDoc project.
  • Keywords: time series analytics, machine learning
  • In this project, we aim to slice time series into semantically meaningful subsequences. In contrast to traditional sliding or hopping windows, semantic windows should capture variable-length concepts, such as heartbeats in ECG data. These subsequences will then support anomaly detection or clustering algorithms in producing better results.
  • Discovering anomalies in streaming data is a challenging task; hence, we aim to translate batch anomaly detection algorithm(s) into the streaming scenario.
  • Our goal is to discover anomalies as quickly as possible while sacrificing as little precision as possible (see the sliding-window sketch after this list).
  • Keywords: stream processing
  • In film scoring, certain visual scenes are accompanied by appropriate sounds; we plan to automate this process with artificial intelligence.
  • Given a database with already scored films, we first extract the scene-to-sound mappings and, then, train a model to learn the scoring process.
  • The project will be conducted in collaboration with a professional film scorer.
  • Keywords: image processing, machine learning
  • First-Line schema matching produces similarity matrices which indicate how likely two attributes of different schemata represent the same semantic concept.
  • Second-Line schema matching consumes similarity matrices and aims to produce improved similarity matrices.
  • There are two main approaches for second-line matching: 1) similarity matrix boosting and 2) ensemble matching. While the former tries to transform a given similarity matrix into a more valuable one, the latter consumes multiple matrices and combines them to a single new similarity matrix.
  • We aim to improve the Hungarian Method by improving its efficiency in exchange for a bit of fuzziness/approximation (= reduced correctness); see the assignment sketch after this list.
  • Also interesting: Can we allow (to some extent) 1:n and n:m mappings in the attribute matching?
  • Knowledge bases are a valuable source of publicly available data and data integration scenarios. To make these scenarios usable also for relational data integration systems, this project aims to develop a shredding algorithm that translates linked open data into meaningful relational tables for data integration purposes.
  • Data integration test scenarios are very rare, especially if these scenarios should offer special properties, such as join- and unionable tables, unary and complex attributes matches, a broad selection of data types, schema-based and schema-less data, real-world data values and many other properties. This project, therefore, aims to develop a relation decomposer that takes existing, integrated datasets as input and automatically generates different integration scenarios with specific properties from these seed datasets via relational decomposition.
  • The Web Data Commons Crawl is a large dataset of relational tables that stem from crawled HTML Web tables. These tables often store data about the same or similar concepts, but, due to the way they were crawled, they are completely unconnected. Hence, we aim to integrate the WDC corpus in a way that is as meaningful and correct as possible, which is both a technically and conceptually challenging task.
  • Data in data lakes is subject to constant change. Data lakes lack most of the control mechanisms that traditional database systems use to, for example, standardise schemata, maintain indexes, or enforce constraints. In this project, we aim to develop a system named lakehouse that dynamically integrates certain parts of a data lake to serve certain user-defined queries.
  • The National Research Data Infrastructure (NFDI) deploys a meta-search engine for research data. We aim to improve its results by integrating multi-modal and heterogeneous research data.
  • The federated learning technique DataGossip proposes to exchange not only model weights, but also some training data items for better convergence on skewed data distributions; we aim to improve this technique with more intelligent training data selection techniques.
  • Keywords: federated learning, distributed computing
  • The BYTE Challenge is a digital learning platform for computer science that targets children from grade 3 to 13.
  • In this project, we aim to assist the platform development and the assessment and curation of digital learning material, which includes videos, quizzes, papers etc.
  • Efficient Partial Inclusion Dependency Discovery
  • Development of a Chat AI for Data Engineering
  • Image2Surface: Predicting Surface Properties of Workpieces from Laserscan Images
  • Image2Surface: Data Engineering for Visual Analytics
  • Detection of Anomalous Medical Patterns: Analysis of Non-Invasive Medical Data Using Machine Learning (2024)
  • Data Generation and Machine Learning in the Context of Optimizing a Twin Wire Arc Spray Process (2023)
  • A Clustering Approach to Column Type Annotation: Effects of Pre-Clustering (2023)
  • Holistic Integration of Web Data (2023)
  • User-Centric Explainable Deep Reinforcement Learning for Decision Support Systems (2023)
  • Combining Time Series Anomaly Detection Algorithms (2023)
  • DPQLEngine: Processing the Data Profiling Query Language (2023)
  • Aggregating Machine Learning Models for the Energy Consumption Forecast of Heat Generators (2023)
  • Correlation Anomaly Detection in High-Dimensional Time Series (2023)
  • HYPAAD: Hyper Parameter Optimization in Anomaly Detection (2022)
  • Time Series Anomaly Detection: An Aircraft Turbine Case Study (2022)
  • Distributed Duplicate Detection on Streaming Data (2021)
  • UltraMine - Scalable Analytics on Time Series Data (2021)
  • Distributed Graph Based Approximate Nearest Neighbor Search (2020)
  • A2DB: A Reactive Database for Theta-Joins (2020)
  • Distributed Detection of Sequential Anomalies in Time Related Sequences (2020)
  • Efficient Distributed Discovery of Bidirectional Order Dependencies (2020)
  • Distributed Unique Column Combination Discovery (2019)
  • Reactive Inclusion Dependency Discovery (2019)
  • Inclusion Dependency Discovery on Streaming Data (2019)
  • Generating Data for Functional Dependency Profiling (2018)
  • Efficient Detection of Genuine Approximate Functional Dependencies (2018)
  • Efficient Discovery of Matching Dependencies (2017)
  • Discovering Interesting Conditional Functional Dependencies (2017)
  • Multivalued Dependency Detection (2016)
  • DataRefinery - Scalable Offer Processing with Apache Spark (2016)
  • Spinning a Web of Tables through Inclusion Dependencies (2014)
  • Discovery of Conditional Unique Column Combination (2014)
  • Discovering Matching Dependencies (2013)
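
For the inclusion dependency topics above, the following minimal Python sketch shows what (unary) IND discovery means: an inclusion dependency dep ⊆ ref holds if every value of column dep also appears in column ref. The toy tables and column names are illustrative assumptions and are not taken from Sindy or Many, which use far more efficient strategies than this quadratic comparison.

    # Minimal sketch of unary inclusion dependency (IND) discovery.
    from itertools import permutations

    tables = {  # illustrative toy data
        "orders":    {"customer_id": {1, 2, 3}, "amount": {10, 20, 30}},
        "customers": {"id": {1, 2, 3, 4}, "name": {"a", "b", "c", "d"}},
    }

    columns = {(t, c): vals for t, cols in tables.items() for c, vals in cols.items()}

    def unary_inds(columns):
        """Yield all column pairs (dep, ref) where every value of dep also occurs in ref."""
        for (dep, dep_vals), (ref, ref_vals) in permutations(columns.items(), 2):
            if dep_vals <= ref_vals:  # set inclusion is exactly the IND condition
                yield dep, ref

    for (dt, dc), (rt, rc) in unary_inds(columns):
        print(f"{dt}.{dc} ⊆ {rt}.{rc}")  # e.g. orders.customer_id ⊆ customers.id

For the second-line schema matching topic, the Hungarian Method mentioned above can be applied to a similarity matrix with SciPy's linear_sum_assignment to obtain a 1:1 attribute matching. The similarity values below are made up for the example.

    # Minimal sketch: derive a 1:1 attribute matching from a similarity matrix
    # via the Hungarian method.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # rows = attributes of schema A, columns = attributes of schema B (toy similarities)
    sim = np.array([
        [0.9, 0.1, 0.2],
        [0.2, 0.8, 0.3],
        [0.1, 0.4, 0.7],
    ])

    # linear_sum_assignment minimizes total cost, so negate the similarities to maximize them
    rows, cols = linear_sum_assignment(-sim)
    for a, b in zip(rows, cols):
        print(f"A.attr{a} <-> B.attr{b} (similarity {sim[a, b]:.2f})")

And for the streaming anomaly detection topics, a deliberately simple baseline is a sliding-window z-score test; window size and threshold below are illustrative assumptions, not values from the thesis topics.

    # Minimal sketch of streaming anomaly detection via a sliding-window z-score.
    import random
    from collections import deque
    from statistics import mean, stdev

    def zscore_anomalies(stream, window=50, threshold=3.0):
        """Yield (index, value) for points far from the recent window's mean."""
        history = deque(maxlen=window)
        for i, x in enumerate(stream):
            if len(history) == window:
                mu, sigma = mean(history), stdev(history)
                if sigma > 0 and abs(x - mu) / sigma > threshold:
                    yield i, x
            history.append(x)

    # usage on a toy signal with one injected spike
    random.seed(0)
    signal = [random.gauss(0, 1) for _ in range(200)]
    signal[120] += 15.0
    print(list(zscore_anomalies(signal)))  # the injected spike near index 120 is flagged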


Data Engineering for the Modern Enterprise


KDnuggets, a community site for data professionals, ranked “We Don’t Need Data Scientists, We Need Data Engineers,” by Mihail Eric, a venture capitalist, researcher, and educator, as its top story of 2021. This sentiment holds even more true today, especially with the unending rush to leverage both generative and predictive AI within enterprise operations. Without the right kind of data, AI is dead in the water.

Data engineering—which includes not only data engineers by title but also their counterparts in adjacent fields such as database administration, management, architecture, and analysis—will ensure that AI initiatives are kept alive, well, and thriving. Accordingly, data engineers have risen to become the new stars in the AI-driven organization. For the purposes of this article, we assign roles in data engineering across a cluster of data professional categories. Collectively, as part of an overall data engineering team, these professionals are setting the tone and providing the guidance needed for developing fair, accurate, and business-viable AI models.

Everyone wants to embrace AI-related large language models in a big way, which only means more demand for data engineering. As a professional category, data engineers are essential, because in many cases, data scientists have been tasked with vetting and managing data resources, which diverts their time and resources away from building data-driven narratives for their businesses. In addition, since AI algorithms are tremendous data hogs, organizations need vibrant data pipelines to maintain the effectiveness of their AI efforts. 

This is fueling significant shifts in the practice and theory of data engineering. Demand for real-time, AI-ready data is creating new challenges and opportunities for those in data engineering and adjacent fields such as database administration, management, and analysis. In the process, data engineering has entered the spotlight as an enabler of the 21st-century enterprise.

This also requires more business savvy as part of the data engineering skills mix. Conversely, business teams need to have a better understanding of their data, and what it can do for their organizations. “Data practitioners are being asked to expand their knowledge of the business—while functional teams are finding they require their own internal data expertise to leverage their data,” a recent report from MIT Technology Review Insights states.

In essence, organizations are leaning heavily on data-engineering teams to turn their data assets into gold. Considerations such as organizational structure, data platform and architecture, and data governance are all essential to this process, especially as AI gets involved. Data engineers and their related colleagues are the go-to people who can make this happen.

The role of data engineering teams has always been clear: to design, construct, and maintain data architectures and ensure the viability of data moving through the organization’s systems—and this remains the primary mission. This includes ensuring that data is available for applications when and where it is needed by the business.

Helpful practices and technologies have emerged to help data engineering teams deliver on this mission, such as DevOps, DataOps, AIOps, and collaborative pipeline tools. Automation has lifted many of the burdens of database preparation, data modeling, quality assurance, and backup and storage.

As a result, the roles of data engineering teams are being elevated, from backroom maintenance to the forefront of the business. Data engineering is evolving into a role that involves greater strategizing for businesses seeking to monetize data, leverage data to gain advantage in their markets, or boost innovation. This also involves serving as guardians of the data, ensuring compliance, cybersecurity, and privacy. Importantly, data engineering means making sure the data is there and ready whenever it is needed. This new importance has resulted in “staggering growth in data engineering jobs,” the MIT report states.

The evolving nature of data engineering can be seen in recent job descriptions (a minimal ETL sketch follows this list):

  • Ensure data accessibility for all: “Focus on centralizing our existing data into a library to make it more accessible to our teams and departments. Combine traditional and loosely connected intelligence data, developing and maintaining ETL processes, and building custom data solutions. Build and maintain scalable pipelines and infrastructure using AWS Step Functions.”
  • Oversee both data analysis and data governance: “Analyze business processes to identify areas for improvement and optimization. Collaborate with stakeholders to gather and document business requirements. Provide insights and recommendations based on data analysis and business needs. Develop and maintain data governance frameworks. Ensure data quality and integrity across various business processes.”
  • Provide leadership to data teams: “Report to the chief technology officer and focus on driving the data strategy as a connected part of the brand ecosystem. Responsible for leading the development of data assets, privacy frameworks, and data standards and providing input into and managing the data roadmap. Contribute to privacy frameworks, terms, consents, and approaches to ensure we empower consumers, to effectively leverage data for their advantage, to abide by all laws and relevant best practices.”
  • Take the lead on AI initiatives: “Collaborate with data scientists and analysts to understand data requirements and translate them into scalable, high performant data pipeline solutions. Support data discovery and data preparation for model development. Perform detailed analysis of raw data sources by applying business context and collaborate with cross-functional teams to transform raw data into curated and certified data assets to be used for machine learning and business intelligence use cases. Monitor and troubleshoot data pipeline performance, identifying and resolving bottlenecks and issues. Develop, test, and maintain robust tools, frameworks, and libraries that standardize and streamline the data and machine learning lifecycle.”
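
As a concrete, deliberately simplified illustration of the “developing and maintaining ETL processes” responsibility quoted above, the sketch below shows one extract-transform-load step in Python with pandas. The file names and columns are invented for the example and do not come from any of the job postings.

    # Minimal ETL sketch: extract raw records, apply a light transformation,
    # and load the curated result for downstream analytics.
    # Paths and schema are illustrative assumptions.
    import pandas as pd

    def run_etl(source_csv: str, target_csv: str) -> None:
        # Extract: read the raw data
        raw = pd.read_csv(source_csv)

        # Transform: drop incomplete rows and standardize a column name
        curated = (
            raw.dropna(subset=["customer_id", "amount"])
               .rename(columns={"amount": "order_amount"})
        )
        curated["order_amount"] = curated["order_amount"].astype(float)

        # Load: write the curated dataset for BI / ML consumers
        curated.to_csv(target_csv, index=False)

    if __name__ == "__main__":
        run_etl("raw_orders.csv", "curated_orders.csv")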

Key to all data engineering roles is ensuring that the business comes first, and that all activity is directly connected with business requirements. Today’s data engineer needs to be a technologist, leader, facilitator, and troubleshooter.

This calls for close collaboration with data scientists and AI training specialists to ensure their models are receiving the data required to support business decision-makers and decisioning systems. Just as importantly, data engineering teams need to work in tandem with data owners to ensure that the right data sources are being tapped, and with end users to ensure they are working with the best available information. 

The challenge in today’s environment is to remove the obstacles and mitigate the challenges to effective data engineering—to ensure that data pipelines keep flowing, that much of the process is automated and developed and operated collaboratively, and that data insights are on target with business requirements.

Even if your job title is something other than “data engineer,” in many ways, everyone on the data team now has a role to play in ensuring the viability of a data-driven and AI-driven business.


Open Access Theses and Dissertations


Advanced research and scholarship. Theses and dissertations, free to find, free to use.



About OATD.org

OATD.org aims to be the best possible resource for finding open access graduate theses and dissertations published around the world. Metadata (information about the theses) comes from over 1100 colleges, universities, and research institutions. OATD currently indexes 7,202,573 theses and dissertations.

About OATD (our FAQ) .

Visual OATD.org

We’re happy to present several data visualizations to give an overall sense of the OATD.org collection by country of publication, language, and field of study.

You may also want to consult these sites to search for other theses:

  • Google Scholar
  • NDLTD, the Networked Digital Library of Theses and Dissertations. NDLTD provides information and a search engine for electronic theses and dissertations (ETDs), whether they are open access or not.
  • ProQuest Theses and Dissertations (PQDT), a database of dissertations and theses, whether published electronically or in print, and mostly available for purchase. Access to PQDT may be limited; consult your local library for access information.

Duke Electrical & Computer Engineering

Master of Science in ECE

Uniquely interdisciplinary and flexible: coursework-only, project and thesis options.


Program Benefits

The 30-credit Duke Master of Science in Electrical & Computer Engineering degree provides a unique combination of opportunities:

  • World-class research integrated into a project-based learning environment
  • Flexible, individualized curriculum: you choose thesis, project, or coursework-only options
  • Professional development opportunities: take an internship or teaching assistantship
  • Excellent graduate outcomes: enter an elite PhD program or launch an industry career
  • Project MS option: 3 credits of ungraded research may substitute for standard coursework.
  • Thesis MS option: Up to 6 credits of ungraded research may substitute for standard coursework.
  • Responsible Conduct in Research (RCR): 3 training forums
  • ECE Master’s Success Seminar (ECE 701): 0-credit weekly seminar (no tuition), required for students entering Fall 2024 or later
“I was looking for that strong university-industry connection. That, along with the flexibility of the coursework, which gave me a lot more bandwidth for research, made Duke the best fit for me, in the end.” Aniket Dalvi ’21, PhD Candidate at Duke University

Choose Your Study Track


Degree Options & Requirements

  • Only graduate-level courses (500 and above) satisfy MS degree requirements.
  • No more than two ECE 899: Independent Study courses may be taken.
  • English for International Students (EIS) courses (EGR 505, 506, 705, 706) do not count toward the 30 total units required for the MS degree.
  • Students must maintain a 3.0 cumulative GPA to remain in good standing and to graduate.
  • Course selection is formally approved by submitting a Program of Study form.
  • MS students (except Duke 4+1) are required to take at least three full-time semesters to graduate.

Coursework Only

Requirements.

  • 30 units of graduate-level coursework as determined by the curricular track course requirements
  • ECE 701—ECE Master’s Success Seminar (0 credit, tuition-free) Required for students entering Fall 2024 or later.
  • 3 Responsible Conduct in Research (RCR) training forums in order to graduate.

Coursework MS Final Exam

The Graduate School requires a final exam approved by a committee made up of three Graduate Faculty members. The committee must be approved by the Director of Graduate Studies and the Dean of the Graduate School at least one month prior to the examination date. The student is not required to generate a written document for the ECE department, and the format of the exam is determined by the department.

  • 3 units of ungraded research (if desired, to substitute for standard coursework)

Project MS Final Exam

For the project option, a written research report and oral presentation are required to be presented to a committee made up of the student’s advisor and two other members of the graduate faculty, one of whom must be from a department other than ECE or outside the student’s main curricular area. The committee must be approved by the Director of Graduate Studies and the Dean of the Graduate School at least one month prior to the examination date. The formats of the written and oral project reports are determined by the student’s advisor. The project report is not submitted to the Graduate School; however, a final copy must be submitted to the ECE Department.

  • Up to 6 units of ungraded research (if desired, to substitute for standard coursework)

Thesis MS Final Exam

A written thesis must be uploaded by the guidelines presented in the Graduate School’s Guide for the Electronic Submission of Thesis and Dissertation , and the thesis must be defended orally before a committee composed of the faculty member under whose direction the work was done and at least two other members of the graduate faculty, one of whom must be from a department other than ECE or outside the student’s main curricular area. The committee must be approved by the Director of Graduate Studies and the Dean of the Graduate School at least one month prior to the examination date.

Additional Information

  • Complete Degree Requirements (PDF)
  • Admissions Requirements
  • Application Deadlines
  • Tuition & Financial Aid
  • Career Services


Take the Next Step

Want more information? Ready to join our community?

Master’s Contacts


Kevyn Light

Senior Program Coordinator

Matt Novik

Graduate Program Coordinator


Miroslav Pajic

Director of Master’s Studies, Professor in the Department of ECE


Krista Turner

Master’s Program Coordinator

More Options

MEng in Electrical & Computer Engineering, MEng in Photonics & Optical Sciences, Introductory C Programming Specialization (online).

A survey and identification of generative adversarial network technology-based architectural variants and applications in computer vision

  • Original Article
  • Published: 14 August 2024


  • Kirtirajsinh Zala
  • Deep Thumar
  • Hiren Kumar Thakkar (ORCID: orcid.org/0000-0002-4196-7651)
  • Urva Maheshwari
  • Biswaranjan Acharya


The surging popularity of generative adversarial networks (GANs) has ignited a wave of innovation in computer vision, a highly explored subfield of deep learning. GANs are reshaping machine learning because they rely on a game-based training technique, in contrast to traditional approaches that center on feature learning and image generation. Several subfields of computer vision have seen substantial progress thanks to the integration of numerous processing approaches, including image, dynamic, text, audio, and video processing, with generative adversarial networks. Nevertheless, despite this progress, GANs still hold unrealized promise and leave room for further development. GANs have a wide range of applications within computer vision, including data augmentation, displacement recording, dynamic modeling, and image processing. This article examines recent advances made by GAN researchers working on AI-based security and defense and discusses their accomplishments. In particular, we investigate how well image optimization, image processing, and image stabilization are incorporated into GAN-driven image training. Our goal is to provide a complete overview of the present status of GAN research by carefully evaluating peer-reviewed research articles.
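
As a concrete illustration of the game-based training the abstract refers to, here is a minimal, hedged PyTorch sketch of the standard GAN objective: a discriminator learns to separate real from generated samples while a generator learns to fool it. The toy data, network sizes, and hyperparameters are illustrative assumptions and do not come from the surveyed works.

    # Minimal GAN training-loop sketch (illustrative, not from the survey).
    import torch
    import torch.nn as nn

    latent_dim, data_dim = 16, 2  # assumed toy dimensions

    G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
    D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    def real_batch(n=128):
        # toy "real" data: points from a shifted Gaussian
        return torch.randn(n, data_dim) + 3.0

    for step in range(1000):
        # Discriminator step: tell real samples from generated ones
        real = real_batch()
        fake = G(torch.randn(real.size(0), latent_dim)).detach()
        d_loss = bce(D(real), torch.ones(real.size(0), 1)) + \
                 bce(D(fake), torch.zeros(fake.size(0), 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Generator step: fool the discriminator (the adversarial "game")
        fake = G(torch.randn(128, latent_dim))
        g_loss = bce(D(fake), torch.ones(128, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()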



Data availability

Not applicable to the current study.


This research received no external funding.

Author information

Kirtirajsinh Zala, Deep Thumar, Hiren Kumar Thakkar, Urva Maheshwari and Biswaranjan Acharya have contributed equally to this work.

Authors and Affiliations

Department of Information Technology, Marwadi University, Rajkot, Gujarat, 360006, India

Kirtirajsinh Zala

Faculty of Engineering, Marwadi Education Foundations, Rajkot, Gujarat, 360006, India

Deep Thumar & Urva Maheshwari

Department of Computer Science and Engineering, School of Technology, Pandit Deendayal Energy University, Gandhinagar, Gujarat, 382007, India

Hiren Kumar Thakkar

Department of Computer Engineering- AI and BDA, Marwadi University, Rajkot, Gujarat, 360006, India

Biswaranjan Acharya


Contributions

All authors have diligently reviewed and contributed equally to this study.

Corresponding author

Correspondence to Hiren Kumar Thakkar .

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Informed consent

Informed consent procedures were not applicable to this research.

Human participants

This research did not involve any human participants or animals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Zala, K., Thumar, D., Thakkar, H.K. et al. A survey and identification of generative adversarial network technology-based architectural variants and applications in computer vision. Int J Syst Assur Eng Manag (2024). https://doi.org/10.1007/s13198-024-02478-6


Received : 06 February 2024

Revised : 07 July 2024

Accepted : 09 August 2024

Published : 14 August 2024

DOI : https://doi.org/10.1007/s13198-024-02478-6


  • Generative adversarial networks
  • Computer vision
  • Loss-variants
  • Machine learning
  • Intelligent computing
  • Bioinformatics


COMMENTS

  1. Data Science Masters Theses // Arch : Northwestern University

    Data Science Masters Theses. The Master of Science in Data Science program requires the successful completion of 12 courses to obtain a degree. These requirements cover six core courses, a leadership or project management course, two required courses corresponding to a declared specialization, two electives, and a capstone project or thesis.

  2. Master of Science in Engineering Data Science

    The Master of Science in Engineering Data Science at the University of Houston is a 10 course graduate curriculum program with both non-thesis and thesis options. A four-year bachelor's degree in engineering or engineering related fields, or computer science and data science and statistics is required in order to apply for the Engineering Data ...

  3. Top 20 Data Engineering Project Ideas 2024 [With Source Code]

    Discover the top 20 data engineering project ideas of 2024 suitable for all skill levels. Explore with source code to enhance your expertise.

  4. A Guide to Writing a Senior Thesis in Engineering

    John A. Paulson School of Engineering and Applied Sciences. This thesis guide has been put together to help guide students who are writing, or are interested in writing, a thesis in engineering.

  5. PDF Data Science for Engineering: Open thesis topics

    The task in this thesis is to give a detailed overview of physics-informed neural network architectures, and to compare several of them on a standard data set from fluid dynamics. A particular focus will be on reconstructing complex physical data from a small number of realistically measurable data points.

  6. Top 20 Latest Research Problems in Big Data and Data Science

    In this article, I briefly introduced the big data research issues in general and listed Top 20 latest research problems in big data and data science in 2020. These problems are further divided and presented in 5 categories so that the researchers can pick up the problem based on their interests and skill set.

  7. Best Master thesis topic in the field of BI,Data engineering?

    Best Master thesis topic in the field of BI, Data engineering? I am a master's student, doing my master's in Software Engineering. I have an interest in data; my question here would be what would ...

  8. Master thesis in Data Engineering : r/dataengineering

    Master thesis in Data Engineering I am an MSc student in Data Science and Engineering and I am in the first semester of the 2nd year. In a few days I will have a meeting with a really promising company and most likely I will have to propose a good master thesis topic in order to get an internship and the thesis possibility.

  9. Data Engineering

    The curriculum of the Data Engineering program provides students with a comprehensive understanding of the big data aspects of data analytics and data science, with the technological challenges of data acquisition, curation, and management. The program focuses on the application of computer skills and mathematical knowledge to solve real-world ...

  10. Graduate Thesis Or Dissertation

    Geospatial data analysis is a multifaceted discipline encompassing the collection, processing, and visualization of diverse datasets. It models and delineates the interactions of people, objects, a...

  11. 1063 PDFs

    The thesis preparation in the Department of Informatics Universitas Ahmad Dahlan is divided into two areas of interest, namely Intelligent Systems and Software and Data Engineering.

  12. MIT Theses

    MIT's DSpace contains more than 58,000 theses completed at MIT dating as far back as the mid 1800's. Theses in this collection have been scanned by the MIT Libraries or submitted in electronic format by thesis authors. Since 2004 all new Masters and Ph.D. theses are scanned and added to this collection after degrees are awarded.

  13. Master's thesis finished

    Master's thesis finished - Thank you. Personal Project Showcase. Hi everyone! A few months ago I defended my Master Thesis on Big Data and got the maximum grade of 10.0 with honors. I want to thank this subreddit for the help and advice received in one of my previous posts. Also, if you want to build something similar and you think the project ...

  14. Data Analytics Engineering, MS

    The Master of Science in Data Analytics Engineering program helps students acquire knowledge and skills to: Discover opportunities to improve products, processes, systems, and enterprises through data analytics. Apply optimization, statistical, and machine-learning methods to solve complex problems involving large data from multiple sources.

  15. Datawarehouse thesis : r/dataengineering

    Hello friends, for my thesis I need to do research on the most common factors that cause a data warehouse project to fail. Is there anybody…

  16. Data Analytics Engineering, MS, Master's

    The MS in Data Analytics Engineering is available at Northeastern's campus in Boston, MA as well as at our campuses in Seattle, WA and Vancouver, Canada. The research project/thesis course is a requirement for students taking the DAE program in Vancouver. Note: A subset of program courses are available at campuses outside of Boston.

  17. Theses

    The idea is to expose building blocks of database systems, for example hash tables, for data mining and machine learning algorithms. The thesis should build upon an existing open-source database system, for example Hyrise, where you should first write selected algorithms in SQL. Afterwards, you should improve the performance of the algorithms ...
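
    To make the "algorithms in SQL" idea above concrete, here is a hedged, self-contained sketch: a simple linear regression fit expressed entirely as SQL aggregates. The listing names Hyrise as the target system; SQLite (via Python's standard sqlite3 module) is used here only so the example runs anywhere, and the table name and data are illustrative assumptions.

      # Fit y = intercept + slope * x using only SQL aggregate functions.
      import sqlite3

      con = sqlite3.connect(":memory:")
      con.execute("CREATE TABLE points (x REAL, y REAL)")
      con.executemany("INSERT INTO points VALUES (?, ?)",
                      [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)])

      # Closed-form least squares written as a single query over aggregates.
      row = con.execute("""
          SELECT (COUNT(*) * SUM(x * y) - SUM(x) * SUM(y)) /
                 (COUNT(*) * SUM(x * x) - SUM(x) * SUM(x))          AS slope,
                 (SUM(y) - ((COUNT(*) * SUM(x * y) - SUM(x) * SUM(y)) /
                            (COUNT(*) * SUM(x * x) - SUM(x) * SUM(x))) * SUM(x))
                 / COUNT(*)                                          AS intercept
          FROM points
      """).fetchone()
      print("slope=%.3f intercept=%.3f" % row)

    Moving such computations from client code into the database engine, and then optimizing them with the engine's internal building blocks, is essentially the performance work the thesis topic describes.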

  18. Advanced process data analytics

    Process data analytics is the application of statistics and related mathematical tools to data in order to understand, develop, and improve manufacturing processes. There have been growing opportunities in process data analytics because of advances in machine learning and technologies for data collection and storage.

  19. Bachelor and Master Thesis : Professorship of Data Science

    We offer a variety of cutting-edge and exciting research topics for Bachelor's and Master's theses. We cover a wide range of topics from Data Science, Natural Language Processing, Argument Mining, the Use of AI in Business, Ethics in AI, and Multimodal AI. We are always open to suggestions for your own topics, so ...

  20. MS in Data Science and Engineering: Degree Requirements

    The MS in Data Science and Engineering program offers three options, each requiring a total of 30 credits. The minimum requirements for all options are: 12 credits from the core; 1 credit of Graduate Seminar (CSE 792); research experience (CSE 797 for the Thesis Option, CSE 795 for the Project Option, or CSE 794 for the Course-Only Option); and ...

  21. Data Engineering Master's

    The Master of Science program in Data Engineering allows you to focus on your analytical, programming, and engineering skills to creatively solve data-related analytical problems, integrate messy data into clean, usable data sets, and organize and retrieve large data efficiently. Core courses give students a solid foundation in data engineering and ...

  22. Doctoral Thesis: From Data, to Models, and Back: Making ML "Predictably

    Artificial Intelligence and Decision-making combines intellectual traditions from across computer science and electrical engineering to develop techniques for the analysis and synthesis of systems that interact with an external world via perception, communication, and action, while also learning, making decisions, and adapting to a changing environment.

  23. Theses

    Writing a thesis is the final step in obtaining a Bachelor's or Master's degree. A thesis is always coupled to a scientific project in some field of expertise. Candidates who want to write their thesis in the Big Data Analytics group should, therefore, be interested and trained in a field related to our research areas.

  24. Data Engineering for the Modern Enterprise

    Data engineering—which includes not only data engineers by title but also their counterparts in adjacent fields such as database administration, management, architecture, and analysis—will ensure that AI initiatives are kept alive, well, and thriving. Accordingly, data engineers have risen to become the new stars in the AI-driven organization.

  25. OATD

    Advanced research and scholarship. Theses and dissertations, free to find, free to use.

  26. Super Micro Computer (NASDAQ:SMCI): A Bull Case Theory

    Super Micro Computer shares were trading at $69 when this thesis was published, vs. a closing price of $624.65 on Aug 2. ...

  27. Master of Science in ECE

    The 30-credit Duke Master of Science in Electrical & Computer Engineering degree provides a unique combination of opportunities: world-class research integrated into a project-based learning environment; a flexible, individualized curriculum in which you choose among Thesis, Project, or Coursework-only options; and professional development opportunities.

  28. A survey and identification of generative adversarial network

    The surging popularity of generative adversarial networks (GANs) has ignited a wave of innovation in computer vision, a highly explored subfield of deep learning. GANs are revolutionizing machine learning because they use a game-based training technique, in contrast to traditional approaches to machine learning, which center on feature learning and picture ...
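
    As a hedged sketch of the "game-based training" the survey refers to, the snippet below pits a tiny generator against a tiny discriminator over a toy 1-D Gaussian using PyTorch. The architectures, data, and hyperparameters are illustrative assumptions, not anything drawn from the survey itself.

      # Two-player GAN training loop: the discriminator learns to separate real
      # from generated samples; the generator learns to fool it.
      import torch

      torch.manual_seed(0)
      G = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
      D = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
      opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
      opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
      bce = torch.nn.BCEWithLogitsLoss()

      for step in range(2000):
          real = torch.randn(64, 1) * 0.5 + 3.0      # samples from the "true" data
          fake = G(torch.randn(64, 8))               # generator's attempt

          # Discriminator turn: label real as 1, fake as 0.
          opt_d.zero_grad()
          loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
          loss_d.backward()
          opt_d.step()

          # Generator turn: try to make the discriminator call fakes real.
          opt_g.zero_grad()
          loss_g = bce(D(fake), torch.ones(64, 1))
          loss_g.backward()
          opt_g.step()

      # If training succeeds, generated samples should cluster around the true mean of 3.
      print("mean of generated samples:", G(torch.randn(1000, 8)).mean().item())

    The alternating objectives are what distinguishes this adversarial setup from a single supervised loss; the image-oriented GAN variants the survey catalogues keep the same game structure with larger networks.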