Technical deep-dives into my AI and machine learning projects, focusing on implementation details and development insights.
Developed an AI note-taking app using Next.js, Whisper (via Groq), and the Gemini API. Built the full stack with Firebase integration, focusing on secure authentication and real-time data syncing. Currently in closed beta, with user feedback being actively incorporated.
Built an AI system that uses Google's Gemini for technical content conversion, with PDF processing for research papers and Kokoro TTS for natural voice synthesis. Features dynamic content scaling and multi-voice simulation with AI-generated hosts.
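As a rough illustration of the multi-voice idea, the sketch below splits a generated two-host script into per-voice segments that could then be passed to a TTS engine one at a time. The speaker labels (`HOST_A`, `HOST_B`), the voice ids, and the `split_script` helper are all hypothetical and not taken from the project; no Gemini or Kokoro calls are shown.

```python
from typing import Dict, List, Tuple


def split_script(script: str, voices: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn 'SPEAKER: text' lines into (voice_id, text) segments.

    Lines without a known speaker label are treated as a continuation of
    the previous segment; if there is no previous segment they are dropped.
    """
    segments: List[Tuple[str, str]] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        speaker, sep, text = line.partition(":")
        if sep and speaker.strip() in voices:
            segments.append((voices[speaker.strip()], text.strip()))
        elif segments:
            voice, previous = segments[-1]
            segments[-1] = (voice, f"{previous} {line}")
    return segments


# Hypothetical voice ids; a real TTS engine would define its own names.
VOICES = {"HOST_A": "voice_a", "HOST_B": "voice_b"}

SCRIPT = """HOST_A: Welcome to the show.
HOST_B: Today we cover attention.
It is a key idea.
"""

segments = split_script(SCRIPT, VOICES)
```

Each segment can then be synthesized with its own voice and the audio clips concatenated in order.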
Developed a multi-agent system using Planner, Writer, and Editor agents powered by the Mistral Small API. Implemented an automated research and writing pipeline with SEO optimization. Built an interactive CLI workflow for topic input and markdown output generation.
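The Planner, Writer, and Editor stages can be pictured as three chained model calls, each consuming the previous stage's output. This is a minimal sketch under assumptions: the prompts, the `Article` container, and the `run_pipeline` function are illustrative, and the model is passed in as a plain `prompt -> text` callable so that no specific Mistral client API is implied.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: any function that maps a prompt string to generated text.
LLM = Callable[[str], str]

PLANNER_PROMPT = "You are the Planner. Outline a blog post about: {topic}"
WRITER_PROMPT = (
    "You are the Writer. Write a markdown post about {topic} "
    "following this outline:\n{outline}"
)
EDITOR_PROMPT = (
    "You are the Editor. Improve clarity and SEO of this draft, "
    "returning markdown only:\n{draft}"
)


@dataclass
class Article:
    outline: str
    draft: str
    final: str


def run_pipeline(topic: str, llm: LLM) -> Article:
    """Run the three agents in sequence, feeding each output forward."""
    outline = llm(PLANNER_PROMPT.format(topic=topic))
    draft = llm(WRITER_PROMPT.format(topic=topic, outline=outline))
    final = llm(EDITOR_PROMPT.format(draft=draft))
    return Article(outline=outline, draft=draft, final=final)
```

Passing the model in as a callable keeps the pipeline testable with a stub; a CLI wrapper would read the topic from the user and write `Article.final` to a `.md` file.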
Fine-tuned the LLaMA 3.2 (3B) model on the GSM8K dataset to strengthen mathematical and logical reasoning. Leveraged Unsloth for efficient fine-tuning and faster inference, improving performance on multi-step problem-solving tasks.
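GSM8K reference solutions end with a line of the form `#### <number>`, which makes it straightforward to build training prompts and to score model outputs by comparing final answers. The sketch below shows one way to do that; the prompt template and the helper names are assumptions for illustration, not the format used in this project, and no Unsloth calls are shown.

```python
import re
from typing import Optional

# Hypothetical prompt template; the project's actual format is not stated.
PROMPT_TEMPLATE = "### Question:\n{question}\n\n### Answer:\n{answer}"

FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_final_answer(text: str) -> Optional[str]:
    """Return the number after '####' with thousands separators removed."""
    match = FINAL_ANSWER.search(text)
    if match is None:
        return None
    return match.group(1).replace(",", "")


def format_example(question: str, answer: str) -> str:
    """Build one supervised fine-tuning example from a GSM8K record."""
    return PROMPT_TEMPLATE.format(question=question.strip(), answer=answer.strip())


def is_correct(prediction: str, reference: str) -> bool:
    """Exact-match scoring on the extracted final answers."""
    predicted = extract_final_answer(prediction)
    expected = extract_final_answer(reference)
    return predicted is not None and predicted == expected
```

Exact match on the extracted number is a simple scoring rule; it only works if the fine-tuned model is trained to emit the same `####` marker.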
Created a hand gesture-based calculator using computer vision for finger movement tracking and the Gemini API for real-time computations. Implemented a touch-free interaction system with live display updates and intuitive gesture recognition.
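One common building block for gesture recognition is deciding which fingers are raised from tracked hand landmarks. The sketch below assumes a 21-point hand layout indexed as in MediaPipe Hands (fingertips at 8, 12, 16, 20 and the matching PIP joints at 6, 10, 14, 18) and image coordinates where y grows downward; the source does not say which tracker the project uses, so treat the layout and the function names as assumptions. The thumb is skipped because it needs a horizontal rather than vertical test.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y grows downward

# Assumed MediaPipe-style indices for index, middle, ring, and pinky fingers.
FINGER_TIPS = (8, 12, 16, 20)
FINGER_PIPS = (6, 10, 14, 18)


def raised_fingers(landmarks: Sequence[Point]) -> List[bool]:
    """Return one flag per finger: True if the tip is above its PIP joint."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    return [
        landmarks[tip][1] < landmarks[pip][1]
        for tip, pip in zip(FINGER_TIPS, FINGER_PIPS)
    ]


def count_raised(landmarks: Sequence[Point]) -> int:
    """Count raised fingers (thumb excluded), for an upright hand."""
    return sum(raised_fingers(landmarks))
```

A pattern such as "index finger only" could then map to drawing mode and "all four raised" to submitting the expression, though the actual gesture vocabulary is up to the application.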
Built an AI-powered recipe generator using Gemini Pro Vision for ingredient recognition from images. Integrated the Google Generative AI API for personalized cooking instructions and implemented natural language understanding for detailed recipe steps.