diff --git a/README.md b/README.md index f8b6795..42ac2aa 100644 --- a/README.md +++ b/README.md @@ -1,27 +1,41 @@ # screenshot-to-code -This simple app converts a screenshot to code (HTML/Tailwind CSS, or React or Bootstrap or Vue). It uses GPT-4 Vision (or Claude 3) to generate the code and DALL-E 3 to generate similar-looking images. You can now also enter a URL to clone a live website. - -馃啎 Now, supporting Claude 3! +A simple tool to convert screenshots, mockups and Figma designs into clean, functional code using AI. https://github.com/abi/screenshot-to-code/assets/23818/6cebadae-2fe3-4986-ac6a-8fb9db030045 +Supported stacks: + +- HTML + Tailwind +- React + Tailwind +- Vue + Tailwind +- Bootstrap +- Ionic + Tailwind +- SVG + +Supported AI models: + +- GPT-4 Vision +- Claude 3 Sonnet (faster, and on par or better than GPT-4 vision for many inputs) +- DALL-E 3 for image generation + See the [Examples](#-examples) section below for more demos. +We also just added experimental support for taking a video/screen recording of a website in action and turning that into a functional prototype. + +![google in app quick 3](https://github.com/abi/screenshot-to-code/assets/23818/8758ffa4-9483-4b9b-bb66-abd6d1594c33) + +[Learn more about video here](https://github.com/abi/screenshot-to-code/wiki/Screen-Recording-to-Code). + [Follow me on Twitter for updates](https://twitter.com/_abi_). ## 馃殌 Try It Out! -馃啎 [Try it here](https://screenshottocode.com) (bring your own OpenAI key - **your key must have access to GPT-4 Vision. See [FAQ](#%EF%B8%8F-faqs) section below for details**). Or see [Getting Started](#-getting-started) below for local install instructions. - -## 馃専 Recent Updates - -- Mar 8 - 馃敟馃帀馃巵 Video-to-app: turn videos/screen recordings into functional apps -- Mar 5 - Added support for Claude Sonnet 3 (as capable as or better than GPT-4 Vision, and faster!) +馃啎 [Try it live on the hosted version](https://screenshottocode.com) (bring your own OpenAI key - **your key must have access to GPT-4 Vision. See [FAQ](#%EF%B8%8F-faqs) section below for details**). Or see [Getting Started](#-getting-started) below for local install instructions. ## 馃洜 Getting Started -The app has a React/Vite frontend and a FastAPI backend. You will need an OpenAI API key with access to the GPT-4 Vision API. +The app has a React/Vite frontend and a FastAPI backend. You will need an OpenAI API key with access to the GPT-4 Vision API or an Anthropic key if you want to use Claude Sonnet, or for experimental video support. Run the backend (I use Poetry for package management - `pip install poetry` if you don't have it): @@ -33,6 +47,8 @@ poetry shell poetry run uvicorn main:app --reload --port 7001 ``` +If you want to use Anthropic, add the `ANTHROPIC_API_KEY` to `backend/.env` with your API key from Anthropic. + Run the frontend: ```bash @@ -51,25 +67,6 @@ For debugging purposes, if you don't want to waste GPT4-Vision credits, you can MOCK=true poetry run uvicorn main:app --reload --port 7001 ``` -## Video to app (experimental) - -https://github.com/abi/screenshot-to-code/assets/23818/1468bef4-164f-4046-a6c8-4cfc40a5cdff - -Record yourself using any website or app or even a Figma prototype, drag & drop in a video and in a few minutes, get a functional, similar-looking app. - -[You need an Anthropic API key for this functionality. Follow instructions here.](https://github.com/abi/screenshot-to-code/blob/main/blog/video-to-app.md) - -## Configuration - -- You can configure the OpenAI base URL if you need to use a proxy: Set OPENAI_BASE_URL in the `backend/.env` or directly in the UI in the settings dialog - -## Using Claude 3 - -We recently added support for Claude 3 Sonnet. It performs well, on par or better than GPT-4 vision for many inputs, and it tends to be faster. - -1. Add an env var `ANTHROPIC_API_KEY` to `backend/.env` with your API key from Anthropic -2. When using the front-end, select "Claude 3 Sonnet" from the model dropdown - ## Docker If you have Docker installed on your system, in the root directory, run: @@ -85,6 +82,8 @@ The app will be up and running at http://localhost:5173. Note that you can't dev - **I'm running into an error when setting up the backend. How can I fix it?** [Try this](https://github.com/abi/screenshot-to-code/issues/3#issuecomment-1814777959). If that still doesn't work, open an issue. - **How do I get an OpenAI API key?** See https://github.com/abi/screenshot-to-code/blob/main/Troubleshooting.md +- **How can I configure an OpenAI proxy?** - you can configure the OpenAI base URL if you need to use a proxy: Set OPENAI_BASE_URL in the `backend/.env` or directly in the UI in the settings dialog +- **How can I update the backend host that my front-end connects to?** - Configure VITE_HTTP_BACKEND_URL and VITE_WS_BACKEND_URL in front/.env.local For example, set VITE_HTTP_BACKEND_URL=http://124.10.20.1:7001 - **How can I provide feedback?** For feedback, feature requests and bug reports, open an issue or ping me on [Twitter](https://twitter.com/_abi_). ## 馃摎 Examples diff --git a/backend/config.py b/backend/config.py index f12c969..8c190b9 100644 --- a/backend/config.py +++ b/backend/config.py @@ -5,7 +5,11 @@ import os ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY", None) +# Debugging-related + SHOULD_MOCK_AI_RESPONSE = bool(os.environ.get("MOCK", False)) +IS_DEBUG_ENABLED = bool(os.environ.get("IS_DEBUG_ENABLED", False)) +DEBUG_DIR = os.environ.get("DEBUG_DIR", "") # Set to True when running in production (on the hosted version) # Used as a feature flag to enable or disable certain features diff --git a/backend/debug/DebugFileWriter.py b/backend/debug/DebugFileWriter.py new file mode 100644 index 0000000..bbcd77b --- /dev/null +++ b/backend/debug/DebugFileWriter.py @@ -0,0 +1,30 @@ +import os +import logging +import uuid + +from config import DEBUG_DIR, IS_DEBUG_ENABLED + + +class DebugFileWriter: + def __init__(self): + if not IS_DEBUG_ENABLED: + return + + try: + self.debug_artifacts_path = os.path.expanduser( + f"{DEBUG_DIR}/{str(uuid.uuid4())}" + ) + os.makedirs(self.debug_artifacts_path, exist_ok=True) + print(f"Debugging artifacts will be stored in: {self.debug_artifacts_path}") + except: + logging.error("Failed to create debug directory") + + def write_to_file(self, filename: str, content: str) -> None: + try: + with open(os.path.join(self.debug_artifacts_path, filename), "w") as file: + file.write(content) + except Exception as e: + logging.error(f"Failed to write to file: {e}") + + def extract_html_content(self, text: str) -> str: + return str(text.split("")[-1].rsplit("", 1)[0] + "") diff --git a/backend/debug/__init__.py b/backend/debug/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/backend/llm.py b/backend/llm.py index 5b0d81e..2a19e9b 100644 --- a/backend/llm.py +++ b/backend/llm.py @@ -3,6 +3,8 @@ from typing import Any, Awaitable, Callable, List, cast from anthropic import AsyncAnthropic from openai import AsyncOpenAI from openai.types.chat import ChatCompletionMessageParam, ChatCompletionChunk +from config import IS_DEBUG_ENABLED +from debug.DebugFileWriter import DebugFileWriter from utils import pprint_prompt @@ -143,6 +145,10 @@ async def stream_claude_response_native( prefix = "" response = None + # For debugging + full_stream = "" + debug_file_writer = DebugFileWriter() + while current_pass_num <= max_passes: current_pass_num += 1 @@ -164,10 +170,22 @@ async def stream_claude_response_native( ) as stream: async for text in stream.text_stream: print(text, end="", flush=True) + full_stream += text await callback(text) - # Return final message response = await stream.get_final_message() + response_text = response.content[0].text + + # Write each pass's code to .html file and thinking to .txt file + if IS_DEBUG_ENABLED: + debug_file_writer.write_to_file( + f"pass_{current_pass_num - 1}.html", + debug_file_writer.extract_html_content(response_text), + ) + debug_file_writer.write_to_file( + f"thinking_pass_{current_pass_num - 1}.txt", + response_text.split("")[0], + ) # Set up messages array for next pass messages += [ @@ -185,6 +203,9 @@ async def stream_claude_response_native( # Close the Anthropic client await client.close() + if IS_DEBUG_ENABLED: + debug_file_writer.write_to_file("full_stream.txt", full_stream) + if not response: raise Exception("No HTML response found in AI response") else: diff --git a/backend/mock_llm.py b/backend/mock_llm.py index 0b903b7..b85b1b1 100644 --- a/backend/mock_llm.py +++ b/backend/mock_llm.py @@ -4,14 +4,14 @@ from typing import Awaitable, Callable from custom_types import InputMode -STREAM_CHUNK_SIZE = 5 +STREAM_CHUNK_SIZE = 20 async def mock_completion( process_chunk: Callable[[str], Awaitable[None]], input_mode: InputMode ) -> str: code_to_return = ( - GOOGLE_FORM_VIDEO_PROMPT_MOCK + TALLY_FORM_VIDEO_PROMPT_MOCK if input_mode == "video" else NO_IMAGES_NYTIMES_MOCK_CODE ) @@ -670,394 +670,460 @@ $(document).ready(function() { """ GOOGLE_FORM_VIDEO_PROMPT_MOCK = """ - -To build this: -- Create a search bar that allows typing and shows placeholder text -- Implement search suggestions that update as the user types -- Allow selecting a suggestion to perform that search -- Show search results with the query and an AI-powered overview -- Have filter tabs for different search verticals -- Allow clicking filter tabs to add/remove them, updating the URL -- Ensure the UI closely matches the Google style and colors + +User flow: +1. User starts on the Google search page and types in "times" in the search bar +2. As the user types, Google provides autocomplete suggestions related to "times" +3. User selects the "times" suggestion from the autocomplete dropdown +4. The search results page for "times" loads, showing various results related to The New York Times newspaper +5. User clicks the "Generate" button under "Get an AI-powered overview for this search?" +6. An AI-generated overview about The New York Times loads on the right side of the search results + +Code structure: +- HTML structure with header, search bar, autocomplete dropdown, search button +- Search results area to display search results +- Sidebar area to display the AI-generated overview +- Use Tailwind CSS utility classes for styling +- Use jQuery to handle user interactions: + - Typing in search bar to show/filter autocomplete suggestions + - Selecting autocomplete suggestion to populate search bar + - Clicking search button to display search results + - Clicking "Generate" button to display AI overview +- Hardcode search results and AI overview content for demo purposes - - - + + -
-
Gmail
-
Images
- User avatar -
- -
- Google logo +
+
+ Google logo +
+ + +
+ + Profile picture of the user +
-
- - -
+
+ +
+
The Crossword
+
Play the Daily New York Times Crossword puzzle edited by Will ...
+
-