BigData Tutor – PySpark Workflow Editor

Full-stack educational platform that converts natural-language instructions and visual workflows into PySpark code using an LLM assistant.

React, Flask, MongoDB, JWT, PySpark, LLM API Integration

Problem

  • PySpark has a steep learning curve: learners must pick up Spark concepts, Python syntax, and transformation logic all at once.
  • Beginners struggle to translate data logic into correct, properly ordered PySpark code.

Solution

  • Users build workflows visually and describe steps in English.
  • The backend uses an LLM to generate ordered PySpark transformations aligned to workflow nodes.
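
One way the backend could turn a visual workflow into an ordered list of steps before calling the LLM is a topological sort over the node graph. The sketch below is illustrative only: the node ids, the `edges` layout, and the helper names `order_nodes` and `numbered_steps` are assumptions, not the project's actual data model.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow state: node id -> English description from the user.
nodes = {
    "load": "Read orders.csv into a DataFrame",
    "filter": "Keep orders with amount above 100",
    "agg": "Total amount per customer",
}
# Hypothetical edge list: (upstream node, downstream node).
edges = [("load", "filter"), ("filter", "agg")]


def order_nodes(nodes, edges):
    """Return node ids so that every node comes after its upstream nodes."""
    # Start with every node having no predecessors, then register the edges.
    sorter = TopologicalSorter({node_id: set() for node_id in nodes})
    for upstream, downstream in edges:
        sorter.add(downstream, upstream)
    # static_order() raises CycleError if the workflow contains a loop.
    return list(sorter.static_order())


def numbered_steps(nodes, edges):
    """Build numbered, node-tagged step lines that a prompt could include."""
    return [
        f"Step {i} [{node_id}]: {nodes[node_id]}"
        for i, node_id in enumerate(order_nodes(nodes, edges), start=1)
    ]


if __name__ == "__main__":
    for line in numbered_steps(nodes, edges):
        print(line)
```

Tagging each step with its node id lets the generated code be matched back to the node that produced it, and a workflow with a cycle is rejected before any LLM call is made.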

Key Features

  • Drag-and-drop workflow builder
  • Natural language → ordered PySpark code generation
  • LLM-powered chatbot assistance
  • JWT authentication and MongoDB persistence
  • Clean separation between UI, API, AI layer, and storage

Screenshots

Homepage
Login
Workflow editor + chat

Technical Deep Dive

Biggest challenge

Translating ambiguous natural-language data processing instructions into logically correct, step-ordered PySpark code while keeping it understandable for beginners.

Structured output

Prompt constraints, combined with the current workflow state, tie each generated code step to a specific workflow node and keep the steps in order.
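
As a rough illustration of this idea, the sketch below builds a constrained prompt from ordered workflow steps and then checks that a reply covers exactly those nodes in the same order. The JSON reply shape, the prompt wording, and the function names `build_prompt` and `parse_reply` are assumptions made for the example; the project's real prompt and response schema are not shown in this document. No LLM is called here: `sample_reply` is a hand-written stand-in for a model response.

```python
import json


def build_prompt(steps):
    """Build a prompt from an ordered list of (node_id, instruction) pairs."""
    lines = [
        "Generate PySpark code for each workflow step, in order.",
        'Reply with JSON only: [{"node_id": "...", "code": "..."}, ...].',
        "Use exactly one object per step and keep the step order.",
        "",
    ]
    lines += [
        f"{i}. [{node_id}] {text}"
        for i, (node_id, text) in enumerate(steps, start=1)
    ]
    return "\n".join(lines)


def parse_reply(reply, steps):
    """Parse the JSON reply and check it matches the workflow nodes in order."""
    items = json.loads(reply)
    expected = [node_id for node_id, _ in steps]
    got = [item["node_id"] for item in items]
    if got != expected:
        raise ValueError(f"expected nodes {expected}, got {got}")
    return {item["node_id"]: item["code"] for item in items}


# Hypothetical ordered workflow steps.
steps = [
    ("load", "Read orders.csv into a DataFrame"),
    ("filter", "Keep orders with amount above 100"),
]

# Hand-written stand-in for an LLM response in the assumed JSON shape.
sample_reply = json.dumps([
    {"node_id": "load",
     "code": 'df = spark.read.csv("orders.csv", header=True, inferSchema=True)'},
    {"node_id": "filter",
     "code": "df = df.filter(df.amount > 100)"},
])

if __name__ == "__main__":
    print(build_prompt(steps))
    for node_id, code in parse_reply(sample_reply, steps).items():
        print(f"# {node_id}\n{code}")
```

Rejecting a reply whose node ids are missing, extra, or reordered gives the backend a simple point at which to retry the request instead of showing misaligned code to a beginner.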

Next Improvements

  • Execute workflows on a real Spark cluster
  • Instructor review and feedback features