BigData Tutor – PySpark Workflow Editor
Full-stack educational platform that converts natural-language instructions and visual workflows into PySpark code using an LLM assistant.
React · Flask · MongoDB · JWT · PySpark · LLM API Integration
Problem
- PySpark has a steep learning curve: beginners must absorb Spark concepts, Python syntax, and transformation logic all at once.
- Beginners struggle to translate data-processing logic into correct PySpark code with the steps in the right order.
Solution
- Users build workflows visually and describe steps in English.
- The backend uses an LLM to generate ordered PySpark transformations aligned to workflow nodes.
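The page does not say how the step order is derived from the visual workflow. One plausible approach, assumed here, is to treat the workflow as a directed graph and topologically sort it (Kahn's algorithm) so that every node comes after the nodes that feed it. The `Node` class and its fields are hypothetical stand-ins for whatever the React canvas actually sends to the Flask backend. Requires Python 3.9+.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical workflow node: an id, the user's English instruction, and upstream ids."""
    node_id: str
    instruction: str
    inputs: list[str] = field(default_factory=list)


def order_nodes(nodes: list[Node]) -> list[Node]:
    """Return nodes in dependency order using Kahn's topological sort.

    Raises ValueError for unknown inputs or if the workflow graph contains a cycle.
    """
    by_id = {n.node_id: n for n in nodes}
    indegree = {n.node_id: 0 for n in nodes}
    children: dict[str, list[str]] = {n.node_id: [] for n in nodes}
    for n in nodes:
        for parent in n.inputs:
            if parent not in by_id:
                raise ValueError(f"unknown input {parent!r} on node {n.node_id!r}")
            indegree[n.node_id] += 1
            children[parent].append(n.node_id)

    # Start from source nodes (no inputs); ties keep the order they were listed in.
    queue = deque(n.node_id for n in nodes if indegree[n.node_id] == 0)
    ordered: list[Node] = []
    while queue:
        current = queue.popleft()
        ordered.append(by_id[current])
        for child in children[current]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Any node never reaching indegree 0 sits on a cycle.
    if len(ordered) != len(nodes):
        raise ValueError("workflow contains a cycle")
    return ordered


if __name__ == "__main__":
    workflow = [
        Node("filter", "keep rows where age > 30", inputs=["load"]),
        Node("load", "read people.csv with a header"),
        Node("agg", "count people per city", inputs=["filter"]),
    ]
    print([n.node_id for n in order_nodes(workflow)])  # ['load', 'filter', 'agg']
```

Sorting before calling the LLM means the model only has to write code for each step, not infer the execution order itself, and a cyclic workflow can be rejected with a clear error before any tokens are spent.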
Key Features
- Drag-and-drop workflow builder
- Natural language → ordered PySpark code generation
- LLM-powered chatbot assistance
- JWT authentication and MongoDB persistence
- Clean separation between UI, API, AI layer, and storage
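The page lists JWT authentication but does not name the library used. In a Flask backend this is normally delegated to a JWT library; the standard-library sketch below only illustrates what an HS256 token carries and how the API layer could verify it before touching MongoDB. The function names, the `sub`/`exp` claims chosen, and the one-hour default lifetime are assumptions for illustration, not the project's code.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use base64url encoding with the "=" padding stripped.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url removed before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(user_id: str, secret: bytes, ttl_seconds: int = 3600) -> str:
    """Create an HS256-signed JWT carrying the user id (sub) and an expiry time (exp)."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes) -> dict:
    """Return the payload if the HS256 signature is valid and the token has not expired."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    # compare_digest runs in constant time, so timing does not leak how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] < time.time():
        raise ValueError("token expired")
    return payload
```

The server always verifies with HS256 regardless of what the token's header claims, which avoids algorithm-confusion attacks; a maintained JWT library handles this and further claim checks for you.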
Technical Deep Dive
Biggest challenge
Translating ambiguous natural-language data-processing instructions into logically correct, step-ordered PySpark code while keeping that code understandable for beginners.
Structured output
Prompt constraints combined with the current workflow state tie each generated code snippet to a specific workflow node, so the transformations come back as ordered steps.
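The actual prompt wording and response format are not shown here. As a minimal sketch, assume the backend lists the ordered workflow steps in the prompt, asks the model for a JSON array with one `{"node_id", "code"}` object per step, and rejects any reply that skips, reorders, or adds nodes. The helper names, the JSON shape, and the convention that a DataFrame named `df` flows between steps are all hypothetical; no LLM API is called.

```python
import json


def build_prompt(steps: list[tuple[str, str]]) -> str:
    """Build a constrained prompt from ordered (node_id, instruction) pairs in the workflow state."""
    lines = [
        "You are generating PySpark code for a beginner.",
        "Return ONLY a JSON array. Each element must be an object with keys",
        '"node_id" and "code": one element per step, in exactly the order listed.',
        "Assume a DataFrame named df flows from each step to the next.",
        "",
        "Steps:",
    ]
    for i, (node_id, instruction) in enumerate(steps, start=1):
        lines.append(f"{i}. [{node_id}] {instruction}")
    return "\n".join(lines)


def parse_response(raw: str, steps: list[tuple[str, str]]) -> list[dict]:
    """Check that the model's JSON covers every node exactly once, in workflow order."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON replies
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    expected_ids = [node_id for node_id, _ in steps]
    got_ids = [item.get("node_id") for item in data if isinstance(item, dict)]
    # The length check also catches non-object elements that were filtered out above.
    if got_ids != expected_ids or len(data) != len(expected_ids):
        raise ValueError(f"node order mismatch: expected {expected_ids}, got {got_ids}")
    if not all(isinstance(item.get("code"), str) and item["code"].strip() for item in data):
        raise ValueError("every step needs non-empty code")
    return data
```

Because every failure surfaces as a `ValueError`, the caller can catch it and re-prompt the model with the error message, rather than showing a beginner code that is out of step with their workflow.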
Next Improvements
- Execute workflows on a real Spark cluster
- Instructor review and feedback features