Merge branch 'main' into hosted

This commit is contained in:
Abi Raja 2024-03-18 16:41:14 -04:00
commit 71dfde3892
43 changed files with 3425 additions and 282 deletions

View File

@ -1,3 +1,5 @@
{
"python.analysis.typeCheckingMode": "strict"
"python.analysis.typeCheckingMode": "strict",
"python.analysis.extraPaths": ["./backend"],
"python.autoComplete.extraPaths": ["./backend"]
}

19
Evaluation.md Normal file
View File

@ -0,0 +1,19 @@
## Evaluating models and prompts
Evaluation dataset consists of 16 screenshots. A Python script for running screenshot-to-code on the dataset and a UI for rating outputs is included. With this set up, we can compare and evaluate various models and prompts.
### Running evals
- Input screenshots should be located at `backend/evals_data/inputs` and the outputs will be `backend/evals_data/outputs`. If you want to modify this, modify `EVALS_DIR` in `backend/evals/config.py`. You can download the input screenshot dataset here: TODO.
- Set a stack (`STACK` var) in `backend/run_evals.py`
- Run `python backend/run_evals.py` - this runs the screenshot-to-code on the input dataset in parallel but it will still take a few minutes to complete.
- Once the script is done, you can find the outputs in `backend/evals_data/outputs`.
### Rating evals
In order to view and rate the outputs, visit your front-end at `/evals`.
- Rate each output on a scale of 1-4
- You can also print the page as PDF to share your results with others.
Generally, I run three tests for each model/prompt + stack combo and take the average score out of those tests to evaluate.

View File

@ -1,28 +1,23 @@
# screenshot-to-code
This simple app converts a screenshot to code (HTML/Tailwind CSS, or React or Bootstrap or Vue). It uses GPT-4 Vision to generate the code and DALL-E 3 to generate similar-looking images. You can now also enter a URL to clone a live website!
This simple app converts a screenshot to code (HTML/Tailwind CSS, or React or Bootstrap or Vue). It uses GPT-4 Vision (or Claude 3) to generate the code and DALL-E 3 to generate similar-looking images. You can now also enter a URL to clone a live website.
🆕 Now, supporting Claude 3!
https://github.com/abi/screenshot-to-code/assets/23818/6cebadae-2fe3-4986-ac6a-8fb9db030045
See the [Examples](#-examples) section below for more demos.
[Follow me on Twitter for updates](https://twitter.com/_abi_).
## 🚀 Try It Out!
🆕 [Try it here](https://screenshottocode.com) (bring your own OpenAI key - **your key must have access to GPT-4 Vision. See [FAQ](#%EF%B8%8F-faqs) section below for details**). Or see [Getting Started](#-getting-started) below for local install instructions.
## 🌟 Recent Updates
- Dec 11 - Start a new project from existing code (allows you to come back to an older project)
- Dec 7 - 🔥 🔥 🔥 View a history of your edits, and branch off them
- Nov 30 - Dark mode, output code in Ionic (thanks [@dialmedu](https://github.com/dialmedu)), set OpenAI base URL
- Nov 28 - 🔥 🔥 🔥 Customize your stack: React or Bootstrap or TailwindCSS
- Nov 23 - Send in a screenshot of the current replicated version (sometimes improves quality of subsequent generations)
- Nov 21 - Edit code in the code editor and preview changes live thanks to [@clean99](https://github.com/clean99)
- Nov 20 - Paste in a URL to screenshot and clone (requires [ScreenshotOne free API key](https://screenshotone.com?via=screenshot-to-code))
- Nov 19 - Support for dark/light code editor theme - thanks [@kachbit](https://github.com/kachbit)
- Nov 16 - Added a setting to disable DALL-E image generation if you don't need that
- Nov 16 - View code directly within the app
- Nov 15 - You can now instruct the AI to update the code as you wish. It is helpful if the AI messed up some styles or missed a section.
- Mar 8 - 🔥🎉🎁 Video-to-app: turn videos/screen recordings into functional apps
- Mar 5 - Added support for Claude Sonnet 3 (as capable as or better than GPT-4 Vision, and faster!)
## 🛠 Getting Started
@ -38,12 +33,6 @@ poetry shell
poetry run uvicorn main:app --reload --port 7001
```
You can also run the backend (when you're in `backend`):
```bash
poetry run pyright
```
Run the frontend:
```bash
@ -62,10 +51,25 @@ For debugging purposes, if you don't want to waste GPT4-Vision credits, you can
MOCK=true poetry run uvicorn main:app --reload --port 7001
```
## Video to app (experimental)
https://github.com/abi/screenshot-to-code/assets/23818/1468bef4-164f-4046-a6c8-4cfc40a5cdff
Record yourself using any website or app or even a Figma prototype, drag & drop in a video and in a few minutes, get a functional, similar-looking app.
[You need an Anthropic API key for this functionality. Follow instructions here.](https://github.com/abi/screenshot-to-code/blob/main/blog/video-to-app.md)
## Configuration
- You can configure the OpenAI base URL if you need to use a proxy: Set OPENAI_BASE_URL in the `backend/.env` or directly in the UI in the settings dialog
## Using Claude 3
We recently added support for Claude 3 Sonnet. It performs well, on par or better than GPT-4 vision for many inputs, and it tends to be faster.
1. Add an env var `ANTHROPIC_API_KEY` to `backend/.env` with your API key from Anthropic
2. When using the front-end, select "Claude 3 Sonnet" from the model dropdown
## Docker
If you have Docker installed on your system, in the root directory, run:

View File

@ -5,11 +5,12 @@ You don't need a ChatGPT Pro account. Screenshot to code uses API keys from your
1. Open [OpenAI Dashboard](https://platform.openai.com/)
1. Go to Settings > Billing
1. Click at the Add payment details
<img width="1030" alt="285636868-c80deb92-ab47-45cd-988f-deee67fbd44d" src="https://github.com/abi/screenshot-to-code/assets/23818/4e0f4b77-9578-4f9a-803c-c12b1502f3d7">
4. You have to buy some credits. The minimum is $5.
<img width="900" alt="285636868-c80deb92-ab47-45cd-988f-deee67fbd44d" src="https://github.com/abi/screenshot-to-code/assets/23818/4e0f4b77-9578-4f9a-803c-c12b1502f3d7">
4. You have to buy some credits. The minimum is $5.
5. Go to Settings > Limits and check at the bottom of the page, your current tier has to be "Tier 1" to have GPT4 access
<img width="785" alt="285636973-da38bd4d-8a78-4904-8027-ca67d729b933" src="https://github.com/abi/screenshot-to-code/assets/23818/8d07cd84-0cf9-4f88-bc00-80eba492eadf">
<img width="900" alt="285636973-da38bd4d-8a78-4904-8027-ca67d729b933" src="https://github.com/abi/screenshot-to-code/assets/23818/8d07cd84-0cf9-4f88-bc00-80eba492eadf">
6. Go to Screenshot to code and paste it in the Settings dialog under OpenAI key (gear icon). Your key is only stored in your browser. Never stored on our servers.
Some users have also reported that it can take upto 30 minutes after your credit purchase for the GPT4 vision model to be activated.

4
backend/.gitignore vendored
View File

@ -154,3 +154,7 @@ cython_debug/
# Temporary eval output
evals_data
# Temporary video evals (Remove before merge)
video_evals

View File

@ -0,0 +1,25 @@
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v3.2.0
hooks:
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files
- repo: local
hooks:
- id: poetry-pytest
name: Run pytest with Poetry
entry: poetry run --directory backend pytest
language: system
pass_filenames: false
always_run: true
files: ^backend/
# - id: poetry-pyright
# name: Run pyright with Poetry
# entry: poetry run --directory backend pyright
# language: system
# pass_filenames: false
# always_run: true
# files: ^backend/

View File

@ -3,6 +3,7 @@
# TODO: Should only be set to true when value is 'True', not any abitrary truthy value
import os
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY", None)
SHOULD_MOCK_AI_RESPONSE = bool(os.environ.get("MOCK", False))

7
backend/custom_types.py Normal file
View File

@ -0,0 +1,7 @@
from typing import Literal
InputMode = Literal[
"image",
"video",
]

View File

@ -1,29 +1,40 @@
import os
from config import ANTHROPIC_API_KEY
from llm import stream_openai_response
from llm import stream_claude_response, stream_openai_response
from prompts import assemble_prompt
from prompts.types import Stack
from utils import pprint_prompt
async def generate_code_core(image_url: str, stack: Stack) -> str:
model = "CLAUDE"
prompt_messages = assemble_prompt(image_url, stack)
openai_api_key = os.environ.get("OPENAI_API_KEY")
anthropic_api_key = ANTHROPIC_API_KEY
openai_base_url = None
pprint_prompt(prompt_messages)
async def process_chunk(content: str):
pass
if not openai_api_key:
raise Exception("OpenAI API key not found")
if model == "CLAUDE":
if not anthropic_api_key:
raise Exception("Anthropic API key not found")
completion = await stream_openai_response(
prompt_messages,
api_key=openai_api_key,
base_url=openai_base_url,
callback=lambda x: process_chunk(x),
)
completion = await stream_claude_response(
prompt_messages,
api_key=anthropic_api_key,
callback=lambda x: process_chunk(x),
)
else:
if not openai_api_key:
raise Exception("OpenAI API key not found")
completion = await stream_openai_response(
prompt_messages,
api_key=openai_api_key,
base_url=openai_base_url,
callback=lambda x: process_chunk(x),
)
return completion

View File

@ -15,7 +15,7 @@ async def process_tasks(prompts: List[str], api_key: str, base_url: str):
print(f"An exception occurred: {result}")
processed_results.append(None)
else:
processed_results.append(result)
processed_results.append(result) # type: ignore
return processed_results
@ -30,7 +30,7 @@ async def generate_image(prompt: str, api_key: str, base_url: str):
"size": "1024x1024",
"prompt": prompt,
}
res = await client.images.generate(**image_params)
res = await client.images.generate(**image_params) # type: ignore
await client.close()
return res.data[0].url
@ -77,26 +77,26 @@ async def generate_images(
img["src"].startswith("https://placehold.co")
and image_cache.get(img.get("alt")) is None
):
alts.append(img.get("alt", None))
alts.append(img.get("alt", None)) # type: ignore
# Exclude images with no alt text
alts = [alt for alt in alts if alt is not None]
alts = [alt for alt in alts if alt is not None] # type: ignore
# Remove duplicates
prompts = list(set(alts))
prompts = list(set(alts)) # type: ignore
# Return early if there are no images to replace
if len(prompts) == 0:
if len(prompts) == 0: # type: ignore
return code
# Generate images
results = await process_tasks(prompts, api_key, base_url)
results = await process_tasks(prompts, api_key, base_url) # type: ignore
# Create a dict mapping alt text to image URL
mapped_image_urls = dict(zip(prompts, results))
mapped_image_urls = dict(zip(prompts, results)) # type: ignore
# Merge with image_cache
mapped_image_urls = {**mapped_image_urls, **image_cache}
mapped_image_urls = {**mapped_image_urls, **image_cache} # type: ignore
# Replace old image URLs with the generated URLs
for img in images:

View File

@ -1,8 +1,21 @@
from typing import Awaitable, Callable, List
from typing import Any, Awaitable, Callable, List, cast
from anthropic import AsyncAnthropic
from openai import AsyncOpenAI
from openai.types.chat import ChatCompletionMessageParam, ChatCompletionChunk
from utils import pprint_prompt
MODEL_GPT_4_VISION = "gpt-4-vision-preview"
MODEL_CLAUDE_SONNET = "claude-3-sonnet-20240229"
MODEL_CLAUDE_OPUS = "claude-3-opus-20240229"
MODEL_CLAUDE_HAIKU = "claude-3-haiku-20240307"
# Keep in sync with frontend (lib/models.ts)
CODE_GENERATION_MODELS = [
"gpt_4_vision",
"claude_3_sonnet",
]
async def stream_openai_response(
@ -34,3 +47,126 @@ async def stream_openai_response(
await client.close()
return full_response
# TODO: Have a seperate function that translates OpenAI messages to Claude messages
async def stream_claude_response(
messages: List[ChatCompletionMessageParam],
api_key: str,
callback: Callable[[str], Awaitable[None]],
) -> str:
client = AsyncAnthropic(api_key=api_key)
# Base parameters
model = MODEL_CLAUDE_SONNET
max_tokens = 4096
temperature = 0.0
# Translate OpenAI messages to Claude messages
system_prompt = cast(str, messages[0]["content"])
claude_messages = [dict(message) for message in messages[1:]]
for message in claude_messages:
if not isinstance(message["content"], list):
continue
for content in message["content"]: # type: ignore
if content["type"] == "image_url":
content["type"] = "image"
# Extract base64 data and media type from data URL
# Example base64 data URL: data:image/png;base64,iVBOR...
image_data_url = cast(str, content["image_url"]["url"])
media_type = image_data_url.split(";")[0].split(":")[1]
base64_data = image_data_url.split(",")[1]
# Remove OpenAI parameter
del content["image_url"]
content["source"] = {
"type": "base64",
"media_type": media_type,
"data": base64_data,
}
# Stream Claude response
async with client.messages.stream(
model=model,
max_tokens=max_tokens,
temperature=temperature,
system=system_prompt,
messages=claude_messages, # type: ignore
) as stream:
async for text in stream.text_stream:
await callback(text)
# Return final message
response = await stream.get_final_message()
return response.content[0].text
async def stream_claude_response_native(
system_prompt: str,
messages: list[Any],
api_key: str,
callback: Callable[[str], Awaitable[None]],
include_thinking: bool = False,
model: str = MODEL_CLAUDE_OPUS,
) -> str:
client = AsyncAnthropic(api_key=api_key)
# Base model parameters
max_tokens = 4096
temperature = 0.0
# Multi-pass flow
current_pass_num = 1
max_passes = 2
prefix = "<thinking>"
response = None
while current_pass_num <= max_passes:
current_pass_num += 1
# Set up message depending on whether we have a <thinking> prefix
messages_to_send = (
messages + [{"role": "assistant", "content": prefix}]
if include_thinking
else messages
)
pprint_prompt(messages_to_send)
async with client.messages.stream(
model=model,
max_tokens=max_tokens,
temperature=temperature,
system=system_prompt,
messages=messages_to_send, # type: ignore
) as stream:
async for text in stream.text_stream:
print(text, end="", flush=True)
await callback(text)
# Return final message
response = await stream.get_final_message()
# Set up messages array for next pass
messages += [
{"role": "assistant", "content": str(prefix) + response.content[0].text},
{
"role": "user",
"content": "You've done a good job with a first draft. Improve this further based on the original instructions so that the app is fully functional and looks like the original video of the app we're trying to replicate.",
},
]
print(
f"Token usage: Input Tokens: {response.usage.input_tokens}, Output Tokens: {response.usage.output_tokens}"
)
if not response:
raise Exception("No HTML response found in AI response")
else:
return response.content[0].text

File diff suppressed because it is too large Load Diff

955
backend/poetry.lock generated

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,114 @@
# Not used yet
# References:
# https://github.com/hundredblocks/transcription_demo
# https://docs.anthropic.com/claude/docs/prompt-engineering
# https://github.com/anthropics/anthropic-cookbook/blob/main/multimodal/best_practices_for_vision.ipynb
VIDEO_PROMPT = """
You are an expert at building single page, funtional apps using HTML, Jquery and Tailwind CSS.
You also have perfect vision and pay great attention to detail.
You will be given screenshots in order at consistent intervals from a video of a user interacting with a web app. You need to re-create the same app exactly such that the same user interactions will produce the same results in the app you build.
- Make sure the app looks exactly like the screenshot.
- Pay close attention to background color, text color, font size, font family,
padding, margin, border, etc. Match the colors and sizes exactly.
- For images, use placeholder images from https://placehold.co and include a detailed description of the image in the alt text so that an image generation AI can generate the image later.
- If some fuctionality requires a backend call, just mock the data instead.
- MAKE THE APP FUNCTIONAL using Javascript. Allow the user to interact with the app and get the same behavior as the video.
In terms of libraries,
- Use this script to include Tailwind: <script src="https://cdn.tailwindcss.com"></script>
- You can use Google Fonts
- Font Awesome for icons: <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.3/css/all.min.css"></link>
- Use jQuery: <script src="https://code.jquery.com/jquery-3.7.1.min.js"></script>
Before generating the code for the app, think step-by-step: first, about the user flow depicated in the video and then about you how would you build it and how you would structure the code. Do the thinking within <thinking></thinking> tags. Then, provide your code within <html></html> tags.
"""
VIDEO_PROMPT_ALPINE_JS = """
You are an expert at building single page, funtional apps using HTML, Alpine.js and Tailwind CSS.
You also have perfect vision and pay great attention to detail.
You will be given screenshots in order at consistent intervals from a video of a user interacting with a web app. You need to re-create the same app exactly such that the same user interactions will produce the same results in the app you build.
- Make sure the app looks exactly like the screenshot.
- Pay close attention to background color, text color, font size, font family,
padding, margin, border, etc. Match the colors and sizes exactly.
- For images, use placeholder images from https://placehold.co and include a detailed description of the image in the alt text so that an image generation AI can generate the image later.
- If some fuctionality requires a backend call, just mock the data instead.
- MAKE THE APP FUNCTIONAL using Javascript. Allow the user to interact with the app and get the same behavior as the video.
In terms of libraries,
- Use this script to include Tailwind: <script src="https://cdn.tailwindcss.com"></script>
- You can use Google Fonts
- Font Awesome for icons: <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.3/css/all.min.css"></link>
- Use Alpine.js: <script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
Before generating the code for the app, think step-by-step: first, about the user flow depicated in the video and then about you how would you build it and how you would structure the code. Do the thinking within <thinking></thinking> tags. Then, provide your code within <html></html> tags.
"""
HTML_TAILWIND_CLAUDE_SYSTEM_PROMPT = """
You have perfect vision and pay great attention to detail which makes you an expert at building single page apps using Tailwind, HTML and JS.
You take screenshots of a reference web page from the user, and then build single page apps
using Tailwind, HTML and JS.
You might also be given a screenshot (The second image) of a web page that you have already built, and asked to
update it to look more like the reference image(The first image).
- Make sure the app looks exactly like the screenshot.
- Do not leave out smaller UI elements. Make sure to include every single thing in the screenshot.
- Pay close attention to background color, text color, font size, font family,
padding, margin, border, etc. Match the colors and sizes exactly.
- In particular, pay attention to background color and overall color scheme.
- Use the exact text from the screenshot.
- Do not add comments in the code such as "<!-- Add other navigation links as needed -->" and "<!-- ... other news items ... -->" in place of writing the full code. WRITE THE FULL CODE.
- Make sure to always get the layout right (if things are arranged in a row in the screenshot, they should be in a row in the app as well)
- Repeat elements as needed to match the screenshot. For example, if there are 15 items, the code should have 15 items. DO NOT LEAVE comments like "<!-- Repeat for each news item -->" or bad things will happen.
- For images, use placeholder images from https://placehold.co and include a detailed description of the image in the alt text so that an image generation AI can generate the image later.
In terms of libraries,
- Use this script to include Tailwind: <script src="https://cdn.tailwindcss.com"></script>
- You can use Google Fonts
- Font Awesome for icons: <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.3/css/all.min.css"></link>
Return only the full code in <html></html> tags.
Do not include markdown "```" or "```html" at the start or end.
"""
#
REACT_TAILWIND_CLAUDE_SYSTEM_PROMPT = """
You have perfect vision and pay great attention to detail which makes you an expert at building single page apps using React/Tailwind.
You take screenshots of a reference web page from the user, and then build single page apps
using React and Tailwind CSS.
You might also be given a screenshot (The second image) of a web page that you have already built, and asked to
update it to look more like the reference image(The first image).
- Make sure the app looks exactly like the screenshot.
- Do not leave out smaller UI elements. Make sure to include every single thing in the screenshot.
- Pay close attention to background color, text color, font size, font family,
padding, margin, border, etc. Match the colors and sizes exactly.
- In particular, pay attention to background color and overall color scheme.
- Use the exact text from the screenshot.
- Do not add comments in the code such as "<!-- Add other navigation links as needed -->" and "<!-- ... other news items ... -->" in place of writing the full code. WRITE THE FULL CODE.
- Make sure to always get the layout right (if things are arranged in a row in the screenshot, they should be in a row in the app as well)
- CREATE REUSABLE COMPONENTS FOR REPEATING ELEMENTS. For example, if there are 15 similar items in the screenshot, your code should include a reusable component that generates these items. and use loops to instantiate these components as needed.
- For images, use placeholder images from https://placehold.co and include a detailed description of the image in the alt text so that an image generation AI can generate the image later.
In terms of libraries,
- Use these script to include React so that it can run on a standalone page:
<script src="https://unpkg.com/react/umd/react.development.js"></script>
<script src="https://unpkg.com/react-dom/umd/react-dom.development.js"></script>
<script src="https://unpkg.com/@babel/standalone/babel.js"></script>
- Use this script to include Tailwind: <script src="https://cdn.tailwindcss.com"></script>
- You can use Google Fonts
- Font Awesome for icons: <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.3/css/all.min.css"></link>
Return only the full code in <html></html> tags.
Do not include markdown "```" or "```html" at the start or end.
"""

View File

@ -14,11 +14,14 @@ openai = "^1.2.4"
python-dotenv = "^1.0.0"
beautifulsoup4 = "^4.12.2"
httpx = "^0.25.1"
pre-commit = "^3.6.2"
anthropic = "^0.18.0"
moviepy = "^1.0.3"
sentry-sdk = {extras = ["fastapi"], version = "^1.38.0"}
[tool.poetry.group.dev.dependencies]
pytest = "^7.4.3"
pyright = "^1.1.345"
pyright = "^1.1.352"
[build-system]
requires = ["poetry-core"]

View File

@ -2,8 +2,15 @@ import os
import traceback
from fastapi import APIRouter, WebSocket
import openai
from config import IS_PROD, SHOULD_MOCK_AI_RESPONSE
from llm import stream_openai_response
from config import ANTHROPIC_API_KEY, IS_PROD, SHOULD_MOCK_AI_RESPONSE
from custom_types import InputMode
from llm import (
CODE_GENERATION_MODELS,
MODEL_CLAUDE_OPUS,
stream_claude_response,
stream_claude_response_native,
stream_openai_response,
)
from openai.types.chat import ChatCompletionMessageParam
from mock_llm import mock_completion
from typing import Dict, List, cast, get_args
@ -14,9 +21,11 @@ from datetime import datetime
import json
from routes.logging_utils import PaymentMethod, send_to_saas_backend
from routes.saas_utils import does_user_have_subscription_credits
from prompts.claude_prompts import VIDEO_PROMPT
from prompts.types import Stack
from utils import pprint_prompt # type: ignore
# from utils import pprint_prompt
from video.utils import extract_tag_content, assemble_claude_prompt_video # type: ignore
router = APIRouter()
@ -60,7 +69,29 @@ async def stream_code(websocket: WebSocket):
generated_code_config = ""
if "generatedCodeConfig" in params and params["generatedCodeConfig"]:
generated_code_config = params["generatedCodeConfig"]
print(f"Generating {generated_code_config} code")
if not generated_code_config in get_args(Stack):
await throw_error(f"Invalid generated code config: {generated_code_config}")
return
# Cast the variable to the Stack type
valid_stack = cast(Stack, generated_code_config)
# Validate the input mode
input_mode = params.get("inputMode")
if not input_mode in get_args(InputMode):
await throw_error(f"Invalid input mode: {input_mode}")
raise Exception(f"Invalid input mode: {input_mode}")
# Cast the variable to the right type
validated_input_mode = cast(InputMode, input_mode)
# Read the model from the request. Fall back to default if not provided.
code_generation_model = params.get("codeGenerationModel", "gpt_4_vision")
if code_generation_model not in CODE_GENERATION_MODELS:
await throw_error(f"Invalid model: {code_generation_model}")
raise Exception(f"Invalid model: {code_generation_model}")
print(
f"Generating {generated_code_config} code for uploaded {input_mode} using {code_generation_model} model..."
)
# Track how this generation is being paid for
payment_method: PaymentMethod = PaymentMethod.UNKNOWN
@ -130,13 +161,6 @@ async def stream_code(websocket: WebSocket):
)
raise Exception("No OpenAI API key found")
# Validate the generated code config
if not generated_code_config in get_args(Stack):
await throw_error(f"Invalid generated code config: {generated_code_config}")
return
# Cast the variable to the Stack type
valid_stack = cast(Stack, generated_code_config)
# Get the OpenAI Base URL from the request. Fall back to environment variable if not provided.
openai_base_url = None
# Disable user-specified OpenAI Base URL in prod
@ -223,18 +247,52 @@ async def stream_code(websocket: WebSocket):
image_cache = create_alt_url_mapping(params["history"][-2])
# pprint_prompt(prompt_messages)
if validated_input_mode == "video":
video_data_url = params["image"]
prompt_messages = await assemble_claude_prompt_video(video_data_url)
# pprint_prompt(prompt_messages) # type: ignore
if SHOULD_MOCK_AI_RESPONSE:
completion = await mock_completion(process_chunk)
completion = await mock_completion(
process_chunk, input_mode=validated_input_mode
)
else:
try:
completion = await stream_openai_response(
prompt_messages,
api_key=openai_api_key,
base_url=openai_base_url,
callback=lambda x: process_chunk(x),
)
if validated_input_mode == "video":
if not ANTHROPIC_API_KEY:
await throw_error(
"No Anthropic API key found. Please add the environment variable ANTHROPIC_API_KEY to backend/.env"
)
raise Exception("No Anthropic key")
completion = await stream_claude_response_native(
system_prompt=VIDEO_PROMPT,
messages=prompt_messages, # type: ignore
api_key=ANTHROPIC_API_KEY,
callback=lambda x: process_chunk(x),
model=MODEL_CLAUDE_OPUS,
include_thinking=True,
)
elif code_generation_model == "claude_3_sonnet":
if not ANTHROPIC_API_KEY:
await throw_error(
"No Anthropic API key found. Please add the environment variable ANTHROPIC_API_KEY to backend/.env"
)
raise Exception("No Anthropic key")
completion = await stream_claude_response(
prompt_messages, # type: ignore
api_key=ANTHROPIC_API_KEY,
callback=lambda x: process_chunk(x),
)
else:
completion = await stream_openai_response(
prompt_messages, # type: ignore
api_key=openai_api_key,
base_url=openai_base_url,
callback=lambda x: process_chunk(x),
)
except openai.AuthenticationError as e:
print("[GENERATE_CODE] Authentication failed", e)
error_message = (
@ -270,8 +328,11 @@ async def stream_code(websocket: WebSocket):
)
return await throw_error(error_message)
if validated_input_mode == "video":
completion = extract_tag_content("html", completion)
# Write the messages dict into a log so that we can debug later
write_logs(prompt_messages, completion)
write_logs(prompt_messages, completion) # type: ignore
if IS_PROD:
# Catch any errors from sending to SaaS backend and continue

View File

@ -11,6 +11,8 @@ from evals.config import EVALS_DIR
from evals.core import generate_code_core
from evals.utils import image_to_data_url
STACK = "html_tailwind"
async def main():
INPUT_DIR = EVALS_DIR + "/inputs"
@ -23,7 +25,7 @@ async def main():
for filename in evals:
filepath = os.path.join(INPUT_DIR, filename)
data_url = await image_to_data_url(filepath)
task = generate_code_core(data_url, "vue_tailwind")
task = generate_code_core(data_url, STACK)
tasks.append(task)
results = await asyncio.gather(*tasks)

134
backend/video/utils.py Normal file
View File

@ -0,0 +1,134 @@
# Extract HTML content from the completion string
import base64
import io
import mimetypes
import os
import tempfile
import uuid
from typing import Union, cast
from moviepy.editor import VideoFileClip # type: ignore
from PIL import Image
import math
DEBUG = True
TARGET_NUM_SCREENSHOTS = (
20 # Should be max that Claude supports (20) - reduce to save tokens on testing
)
async def assemble_claude_prompt_video(video_data_url: str):
images = split_video_into_screenshots(video_data_url)
# Save images to tmp if we're debugging
if DEBUG:
save_images_to_tmp(images)
# Validate number of images
print(f"Number of frames extracted from video: {len(images)}")
if len(images) > 20:
print(f"Too many screenshots: {len(images)}")
return
# Convert images to the message format for Claude
content_messages: list[dict[str, Union[dict[str, str], str]]] = []
for image in images:
# Convert Image to buffer
buffered = io.BytesIO()
image.save(buffered, format="JPEG")
# Encode bytes as base64
base64_data = base64.b64encode(buffered.getvalue()).decode("utf-8")
media_type = "image/jpeg"
content_messages.append(
{
"type": "image",
"source": {
"type": "base64",
"media_type": media_type,
"data": base64_data,
},
}
)
return [
{
"role": "user",
"content": content_messages,
},
]
# Returns a list of images/frame (RGB format)
def split_video_into_screenshots(video_data_url: str) -> list[Image.Image]:
target_num_screenshots = TARGET_NUM_SCREENSHOTS
# Decode the base64 URL to get the video bytes
video_encoded_data = video_data_url.split(",")[1]
video_bytes = base64.b64decode(video_encoded_data)
mime_type = video_data_url.split(";")[0].split(":")[1]
suffix = mimetypes.guess_extension(mime_type)
with tempfile.NamedTemporaryFile(suffix=suffix, delete=True) as temp_video_file:
print(temp_video_file.name)
temp_video_file.write(video_bytes)
temp_video_file.flush()
clip = VideoFileClip(temp_video_file.name)
images: list[Image.Image] = []
total_frames = cast(int, clip.reader.nframes) # type: ignore
# Calculate frame skip interval by dividing total frames by the target number of screenshots
# Ensuring a minimum skip of 1 frame
frame_skip = max(1, math.ceil(total_frames / target_num_screenshots))
# Iterate over each frame in the clip
for i, frame in enumerate(clip.iter_frames()):
# Save every nth frame
if i % frame_skip == 0:
frame_image = Image.fromarray(frame) # type: ignore
images.append(frame_image)
# Ensure that we don't capture more than the desired number of frames
if len(images) >= target_num_screenshots:
break
# Close the video file to release resources
clip.close()
return images
# Save a list of PIL images to a random temporary directory
def save_images_to_tmp(images: list[Image.Image]):
# Create a unique temporary directory
unique_dir_name = f"screenshots_{uuid.uuid4()}"
tmp_screenshots_dir = os.path.join(tempfile.gettempdir(), unique_dir_name)
os.makedirs(tmp_screenshots_dir, exist_ok=True)
for idx, image in enumerate(images):
# Generate a unique image filename using index
image_filename = f"screenshot_{idx}.jpg"
tmp_filepath = os.path.join(tmp_screenshots_dir, image_filename)
image.save(tmp_filepath, format="JPEG")
print("Saved to " + tmp_screenshots_dir)
def extract_tag_content(tag: str, text: str) -> str:
"""
Extracts content for a given tag from the provided text.
:param tag: The tag to search for.
:param text: The text to search within.
:return: The content found within the tag, if any.
"""
tag_start = f"<{tag}>"
tag_end = f"</{tag}>"
start_idx = text.find(tag_start)
end_idx = text.find(tag_end, start_idx)
if start_idx != -1 and end_idx != -1:
return text[start_idx : end_idx + len(tag_end)]
return ""

123
backend/video_to_app.py Normal file
View File

@ -0,0 +1,123 @@
# Load environment variables first
from dotenv import load_dotenv
load_dotenv()
import base64
import mimetypes
import time
import subprocess
import os
import asyncio
from datetime import datetime
from prompts.claude_prompts import VIDEO_PROMPT
from utils import pprint_prompt
from config import ANTHROPIC_API_KEY
from video.utils import extract_tag_content, assemble_claude_prompt_video
from llm import (
MODEL_CLAUDE_OPUS,
# MODEL_CLAUDE_SONNET,
stream_claude_response_native,
)
STACK = "html_tailwind"
VIDEO_DIR = "./video_evals/videos"
SCREENSHOTS_DIR = "./video_evals/screenshots"
OUTPUTS_DIR = "./video_evals/outputs"
async def main():
video_filename = "shortest.mov"
is_followup = False
if not ANTHROPIC_API_KEY:
raise ValueError("ANTHROPIC_API_KEY is not set")
# Get previous HTML
previous_html = ""
if is_followup:
previous_html_file = max(
[
os.path.join(OUTPUTS_DIR, f)
for f in os.listdir(OUTPUTS_DIR)
if f.endswith(".html")
],
key=os.path.getctime,
)
with open(previous_html_file, "r") as file:
previous_html = file.read()
video_file = os.path.join(VIDEO_DIR, video_filename)
mime_type = mimetypes.guess_type(video_file)[0]
with open(video_file, "rb") as file:
video_content = file.read()
video_data_url = (
f"data:{mime_type};base64,{base64.b64encode(video_content).decode('utf-8')}"
)
prompt_messages = await assemble_claude_prompt_video(video_data_url)
# Tell the model to continue
# {"role": "assistant", "content": SECOND_MESSAGE},
# {"role": "user", "content": "continue"},
if is_followup:
prompt_messages += [
{"role": "assistant", "content": previous_html},
{
"role": "user",
"content": "You've done a good job with a first draft. Improve this further based on the original instructions so that the app is fully functional like in the original video.",
},
] # type: ignore
async def process_chunk(content: str):
print(content, end="", flush=True)
response_prefix = "<thinking>"
pprint_prompt(prompt_messages) # type: ignore
start_time = time.time()
completion = await stream_claude_response_native(
system_prompt=VIDEO_PROMPT,
messages=prompt_messages,
api_key=ANTHROPIC_API_KEY,
callback=lambda x: process_chunk(x),
model=MODEL_CLAUDE_OPUS,
include_thinking=True,
)
end_time = time.time()
# Prepend the response prefix to the completion
completion = response_prefix + completion
# Extract the outputs
html_content = extract_tag_content("html", completion)
thinking = extract_tag_content("thinking", completion)
print(thinking)
print(f"Operation took {end_time - start_time} seconds")
os.makedirs(OUTPUTS_DIR, exist_ok=True)
# Generate a unique filename based on the current time
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
filename = f"video_test_output_{timestamp}.html"
output_path = os.path.join(OUTPUTS_DIR, filename)
# Write the HTML content to the file
with open(output_path, "w") as file:
file.write(html_content)
print(f"Output file path: {output_path}")
# Show a notification
subprocess.run(["osascript", "-e", 'display notification "Coding Complete"'])
asyncio.run(main())

59
blog/evaluating-claude.md Normal file
View File

@ -0,0 +1,59 @@
# Claude 3 for converting screenshots to code
Claude 3 dropped yesterday, claiming to rival GPT-4 on a wide variety of tasks. I maintain a very popular open source project called “screenshot-to-code” (this one!) that uses GPT-4 vision to convert screenshots/designs into clean code. Naturally, I was excited to see how good Claude 3 was at this task.
**TLDR:** Claude 3 is on par with GPT-4 vision for screenshot to code, better in some ways but worse in others.
## Evaluation Setup
I dont know of a public benchmark for “screenshot to code” so I created simple evaluation setup for the purposes of testing:
- **Evaluation Dataset**: 16 screenshots with a mix of UI elements, landing pages, dashboards and popular websites.
<img width="784" alt="Screenshot 2024-03-05 at 3 05 52PM" src="https://github.com/abi/screenshot-to-code/assets/23818/c32af2db-eb5a-44c1-9a19-2f0c3dd11ab4">
- **Evaluation Metric**: Replication accuracy, as in “How close does the generated code look to the screenshot?” While there are other metrics that are important like code quality, speed and so on, this is by far the #1 thing most users of the repo care about.
- **Evaluation Mechanism**: Each output is subjectively rated by a human on a rating scale from 0 to 4. 4 = very close to an exact replica while 0 = nothing like the screenshot. With 16 screenshots, the maximum any model can score is 64.
To make the evaluation process easy, I created [a Python script](https://github.com/abi/screenshot-to-code/blob/main/backend/run_evals.py) that runs code for all the inputs in parallel. I also made a simple UI to do a side-by-side comparison of the input and output.
![Google Chrome](https://github.com/abi/screenshot-to-code/assets/23818/38126f8f-205d-4ed1-b8cf-039e81dcc3d0)
## Results
Quick note about what kind of code well be generating: currently, screenshot-to-code supports generating code in HTML + Tailwind, React, Vue, and several other frameworks. Stacks can impact the replication accuracy quite a bit. For example, because Bootstrap uses a relatively restrictive set of user elements, generations using Bootstrap tend to have a distinct "Bootstrap" style.
I only ran the evals on HTML/Tailwind here which is the stack where GPT-4 vision tends to perform the best.
Here are the results (average of 3 runs for each model):
- GPT-4 Vision obtains a score of **65.10%** - this is what were trying to beat
- Claude 3 Sonnet receives a score of **70.31%**, which is a bit better.
- Surprisingly, Claude 3 Opus which is supposed to be the smarter and slower model scores worse than both GPT-4 vision and Claude 3 Sonnet, comes in at **61.46%**.
Overall, a very strong showing for Claude 3. Obviously, there's a lot of subjectivity involved in this evaluation but Claude 3 is definitely on par with GPT-4 Vision, if not better.
You can see the [side-by-side comparison for a run of Claude 3 Sonnet here](https://github.com/abi/screenshot-to-code-files/blob/main/sonnet%20results.png). And for [a run of GPT-4 Vision here](https://github.com/abi/screenshot-to-code-files/blob/main/gpt%204%20vision%20results.png).
Other notes:
- The prompts used are optimized for GPT-4 vision. Adjusting the prompts a bit for Claude did yield a small improvement. But nothing game-changing and potentially not worth the trade-off of maintaining two sets of prompts.
- All the models excel at code quality - the quality is usually comparable to a human or better.
- Claude 3 is much less lazy than GPT-4 Vision. When asked to recreate Hacker News, GPT-4 Vision will only create two items in the list and leave comments in this code like `<!-- Repeat for each news item -->` and `<!-- ... other news items ... -->`.
<img width="699" alt="Screenshot 2024-03-05 at 9 25 04PM" src="https://github.com/abi/screenshot-to-code/assets/23818/04b03155-45e0-40b0-8de0-b1f0b4382bee">
While Claude 3 Sonnet can sometimes be lazy too, most of the time, it does what you ask it to do.
<img width="904" alt="Screenshot 2024-03-05 at 9 30 23PM" src="https://github.com/abi/screenshot-to-code/assets/23818/b7c7d1ba-47c1-414d-928f-6989e81cf41d">
- For some reason, all the models struggle with side-by-side "flex" layouts
<img width="1090" alt="Screenshot 2024-03-05 at 9 20 58PM" src="https://github.com/abi/screenshot-to-code/assets/23818/8957bb3a-da66-467d-997d-1c7cc24e6d9a">
- Claude 3 Sonnet is a lot faster
- Claude 3 gets background and text colors wrong quite often! (like in the Hacker News image above)
- My suspicion is that Claude 3 Opus results can be improved to be on par with the other models through better prompting
Overall, I'm very impressed with Claude 3 Sonnet for this use case. I've added it as an alternative to GPT-4 Vision in the open source repo (hosted version update coming soon).
If youd like to contribute to this effort, I have some documentation on [running these evals yourself here](https://github.com/abi/screenshot-to-code/blob/main/Evaluation.md). I'm also working on a better evaluation mechanism with Elo ratings and would love some help on that.

18
blog/video-to-app.md Normal file
View File

@ -0,0 +1,18 @@
## Capture a screen recording of a web site in action and have the AI build it for you.
* Unlike screenshots, the app will not visually look exactly like the screen recording but it will be functional
* IMPORTANT: This is very experimental and each call is expensive (a few dollars). I would recommend setting up usage limits on your Anthropic account to avoid excess charges.
## Setup
This uses Claude 3 by Anthropic. Add an env var ANTHROPIC_API_KEY to backend/.env with your API key from Anthropic.
## Examples
https://github.com/abi/screenshot-to-code/assets/23818/fe236c1e-ab92-4d84-b63d-e73e5be9a726
## Tips for taking videos
* We extract frames from your video so linger over each feature for a second or two.

View File

@ -16,4 +16,4 @@ COPY ./ /app/
EXPOSE 5173
# Command to run the application
CMD ["yarn", "dev", "--host", "0.0.0.0"]
CMD ["yarn", "dev", "--host", "0.0.0.0"]

View File

@ -13,4 +13,4 @@
"components": "@/components",
"utils": "@/lib/utils"
}
}
}

View File

@ -2,11 +2,7 @@
<html lang="en">
<head>
<meta charset="UTF-8" />
<link
rel="icon"
type="image/svg+xml"
href="https://picoapps.xyz/favicon.png"
/>
<link rel="icon" type="image/png" href="/favicon/main.png" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<!-- Google Fonts -->

View File

@ -53,7 +53,8 @@
"tailwindcss-animate": "^1.0.7",
"thememirror": "^2.0.1",
"vite-plugin-checker": "^0.6.2",
"zustand": "^4.4.7"
"zustand": "^4.4.7",
"webm-duration-fix": "^1.0.4"
},
"devDependencies": {
"@types/node": "^20.9.0",

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.2 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.6 KiB

View File

@ -37,6 +37,10 @@ import ImportCodeSection from "./components/ImportCodeSection";
import { useAuth } from "@clerk/clerk-react";
import { useStore } from "./store/store";
import { Stack } from "./lib/stacks";
import { CodeGenerationModel } from "./lib/models";
import ModelSettingsSection from "./components/ModelSettingsSection";
import { extractHtml } from "./components/preview/extractHtml";
import useBrowserTabIndicator from "./hooks/useBrowserTabIndicator";
const IS_OPENAI_DOWN = false;
@ -48,6 +52,8 @@ function App({ navbarComponent }: Props) {
const [appState, setAppState] = useState<AppState>(AppState.INITIAL);
const [generatedCode, setGeneratedCode] = useState<string>("");
const [inputMode, setInputMode] = useState<"image" | "video">("image");
const [referenceImages, setReferenceImages] = useState<string[]>([]);
const [executionConsole, setExecutionConsole] = useState<string[]>([]);
const [updateInstruction, setUpdateInstruction] = useState("");
@ -67,6 +73,7 @@ function App({ navbarComponent }: Props) {
isImageGenerationEnabled: true,
editorTheme: EditorTheme.COBALT,
generatedCodeConfig: Stack.HTML_TAILWIND,
codeGenerationModel: CodeGenerationModel.GPT_4_VISION,
// Only relevant for hosted version
isTermOfServiceAccepted: true,
accessCode: null,
@ -74,6 +81,10 @@ function App({ navbarComponent }: Props) {
"setting"
);
// Code generation model from local storage or the default value
const selectedCodeGenerationModel =
settings.codeGenerationModel || CodeGenerationModel.GPT_4_VISION;
// App history
const [appHistory, setAppHistory] = useState<History>([]);
// Tracks the currently shown version from app history
@ -84,6 +95,9 @@ function App({ navbarComponent }: Props) {
const wsRef = useRef<WebSocket>(null);
// Indicate coding state using the browser tab's favicon and title
useBrowserTabIndicator(appState === AppState.CODING);
// When the user already has the settings in local storage, newly added keys
// do not get added to the settings so if it's falsy, we populate it with the default
// value
@ -146,6 +160,11 @@ function App({ navbarComponent }: Props) {
cancelCodeGenerationAndReset();
};
const previewCode =
inputMode === "video" && appState === AppState.CODING
? extractHtml(generatedCode)
: generatedCode;
const cancelCodeGenerationAndReset = () => {
// When this is the first version, reset the entire app state
if (currentVersion === null) {
@ -231,16 +250,21 @@ function App({ navbarComponent }: Props) {
}
// Initial version creation
async function doCreate(referenceImages: string[]) {
async function doCreate(
referenceImages: string[],
inputMode: "image" | "video"
) {
// Reset any existing state
reset();
setReferenceImages(referenceImages);
setInputMode(inputMode);
if (referenceImages.length > 0) {
await doGenerateCode(
{
generationType: "create",
image: referenceImages[0],
inputMode,
},
currentVersion
);
@ -275,6 +299,7 @@ function App({ navbarComponent }: Props) {
await doGenerateCode(
{
generationType: "update",
inputMode,
image: referenceImages[0],
resultImage: resultImage,
history: updatedHistory,
@ -286,6 +311,7 @@ function App({ navbarComponent }: Props) {
await doGenerateCode(
{
generationType: "update",
inputMode,
image: referenceImages[0],
history: updatedHistory,
isImportedFromCode,
@ -312,6 +338,13 @@ function App({ navbarComponent }: Props) {
}));
}
function setCodeGenerationModel(codeGenerationModel: CodeGenerationModel) {
setSettings((prev) => ({
...prev,
codeGenerationModel,
}));
}
function importFromCode(code: string, stack: Stack) {
setIsImportedFromCode(true);
@ -355,6 +388,14 @@ function App({ navbarComponent }: Props) {
}
/>
<ModelSettingsSection
codeGenerationModel={selectedCodeGenerationModel}
setCodeGenerationModel={setCodeGenerationModel}
shouldDisableUpdates={
appState === AppState.CODING || appState === AppState.CODE_READY
}
/>
{IS_RUNNING_ON_CLOUD &&
!(settings.openAiApiKey || settings.accessCode) &&
subscriberTier === "free" && <OnboardingNote />}
@ -372,11 +413,25 @@ function App({ navbarComponent }: Props) {
{/* Show code preview only when coding */}
{appState === AppState.CODING && (
<div className="flex flex-col">
{/* Speed disclaimer for video mode */}
{inputMode === "video" && (
<div
className="bg-yellow-100 border-l-4 border-yellow-500 text-yellow-700
p-2 text-xs mb-4 mt-1"
>
Code generation from videos can take 3-4 minutes. We do
multiple passes to get the best result. Please be patient.
</div>
)}
<div className="flex items-center gap-x-1">
<Spinner />
{executionConsole.slice(-1)[0]}
</div>
<div className="flex mt-4 w-full">
<CodePreview code={generatedCode} />
<div className="flex w-full">
<Button
onClick={cancelCodeGeneration}
className="w-full dark:text-white dark:bg-gray-700"
@ -384,7 +439,6 @@ function App({ navbarComponent }: Props) {
Cancel
</Button>
</div>
<CodePreview code={generatedCode} />
</div>
)}
@ -440,14 +494,27 @@ function App({ navbarComponent }: Props) {
"scanning relative": appState === AppState.CODING,
})}
>
<img
className="w-[340px] border border-gray-200 rounded-md"
src={referenceImages[0]}
alt="Reference"
/>
{inputMode === "image" && (
<img
className="w-[340px] border border-gray-200 rounded-md"
src={referenceImages[0]}
alt="Reference"
/>
)}
{inputMode === "video" && (
<video
muted
autoPlay
loop
className="w-[340px] border border-gray-200 rounded-md"
src={referenceImages[0]}
/>
)}
</div>
<div className="text-gray-400 uppercase text-sm text-center mt-1">
Original Screenshot
{inputMode === "video"
? "Original Video"
: "Original Screenshot"}
</div>
</div>
)}
@ -519,14 +586,14 @@ function App({ navbarComponent }: Props) {
</TabsList>
</div>
<TabsContent value="desktop">
<Preview code={generatedCode} device="desktop" />
<Preview code={previewCode} device="desktop" />
</TabsContent>
<TabsContent value="mobile">
<Preview code={generatedCode} device="mobile" />
<Preview code={previewCode} device="mobile" />
</TabsContent>
<TabsContent value="code">
<CodeTab
code={generatedCode}
code={previewCode}
setCode={setGeneratedCode}
settings={settings}
/>

View File

@ -3,6 +3,10 @@ import { useState, useEffect, useMemo } from "react";
import { useDropzone } from "react-dropzone";
// import { PromptImage } from "../../../types";
import { toast } from "react-hot-toast";
import { URLS } from "../urls";
import { Badge } from "./ui/badge";
import ScreenRecorder from "./recording/ScreenRecorder";
import { ScreenRecorderState } from "../types";
const baseStyle = {
flex: 1,
@ -51,20 +55,31 @@ type FileWithPreview = {
} & File;
interface Props {
setReferenceImages: (referenceImages: string[]) => void;
setReferenceImages: (
referenceImages: string[],
inputMode: "image" | "video"
) => void;
}
function ImageUpload({ setReferenceImages }: Props) {
const [files, setFiles] = useState<FileWithPreview[]>([]);
// TODO: Switch to Zustand
const [screenRecorderState, setScreenRecorderState] =
useState<ScreenRecorderState>(ScreenRecorderState.INITIAL);
const { getRootProps, getInputProps, isFocused, isDragAccept, isDragReject } =
useDropzone({
maxFiles: 1,
maxSize: 1024 * 1024 * 5, // 5 MB
// Keep in sync with backend supported mime types
maxSize: 1024 * 1024 * 20, // 20 MB
accept: {
// Image formats
"image/png": [".png"],
"image/jpeg": [".jpeg"],
"image/jpg": [".jpg"],
// Video formats
"video/quicktime": [".mov"],
"video/mp4": [".mp4"],
"video/webm": [".webm"],
},
onDrop: (acceptedFiles) => {
// Set up the preview thumbnail images
@ -79,7 +94,14 @@ function ImageUpload({ setReferenceImages }: Props) {
// Convert images to data URLs and set the prompt images state
Promise.all(acceptedFiles.map((file) => fileToDataURL(file)))
.then((dataUrls) => {
setReferenceImages(dataUrls.map((dataUrl) => dataUrl as string));
if (dataUrls.length > 0) {
setReferenceImages(
dataUrls.map((dataUrl) => dataUrl as string),
(dataUrls[0] as string).startsWith("data:video")
? "video"
: "image"
);
}
})
.catch((error) => {
toast.error("Error reading files" + error);
@ -141,15 +163,34 @@ function ImageUpload({ setReferenceImages }: Props) {
return (
<section className="container">
{/* eslint-disable-next-line @typescript-eslint/no-explicit-any */}
<div {...getRootProps({ style: style as any })}>
<input {...getInputProps()} />
<p className="text-slate-700 text-lg">
Drag & drop a screenshot here, <br />
or paste from clipboard, <br />
or click to upload
</p>
</div>
{screenRecorderState === ScreenRecorderState.INITIAL && (
/* eslint-disable-next-line @typescript-eslint/no-explicit-any */
<div {...getRootProps({ style: style as any })}>
<input {...getInputProps()} />
<p className="text-slate-700 text-lg">
Drag & drop a screenshot here, <br />
or click to upload
</p>
</div>
)}
{screenRecorderState === ScreenRecorderState.INITIAL && (
<div className="text-center text-sm text-slate-800 mt-4">
<Badge>New!</Badge> Upload a screen recording (.mp4, .mov) or record
your screen to clone a whole app (experimental).{" "}
<a
className="underline"
href={URLS["intro-to-video"]}
target="_blank"
>
Learn more.
</a>
</div>
)}
<ScreenRecorder
screenRecorderState={screenRecorderState}
setScreenRecorderState={setScreenRecorderState}
generateCode={setReferenceImages}
/>
</section>
);
}

View File

@ -0,0 +1,65 @@
import {
Select,
SelectContent,
SelectGroup,
SelectItem,
SelectTrigger,
} from "./ui/select";
import {
CODE_GENERATION_MODEL_DESCRIPTIONS,
CodeGenerationModel,
} from "../lib/models";
import { Badge } from "./ui/badge";
interface Props {
codeGenerationModel: CodeGenerationModel;
setCodeGenerationModel: (codeGenerationModel: CodeGenerationModel) => void;
shouldDisableUpdates?: boolean;
}
function ModelSettingsSection({
codeGenerationModel,
setCodeGenerationModel,
shouldDisableUpdates = false,
}: Props) {
return (
<div className="flex flex-col gap-y-2 justify-between text-sm">
<div className="grid grid-cols-3 items-center gap-4">
<span>Model:</span>
<Select
value={codeGenerationModel}
onValueChange={(value: string) =>
setCodeGenerationModel(value as CodeGenerationModel)
}
disabled={shouldDisableUpdates}
>
<SelectTrigger className="col-span-2" id="output-settings-js">
<span className="font-semibold">
{CODE_GENERATION_MODEL_DESCRIPTIONS[codeGenerationModel].name}
</span>
</SelectTrigger>
<SelectContent>
<SelectGroup>
{Object.values(CodeGenerationModel).map((model) => (
<SelectItem key={model} value={model}>
<div className="flex items-center">
<span className="font-semibold">
{CODE_GENERATION_MODEL_DESCRIPTIONS[model].name}
</span>
{CODE_GENERATION_MODEL_DESCRIPTIONS[model].inBeta && (
<Badge className="ml-2" variant="secondary">
Beta
</Badge>
)}
</div>
</SelectItem>
))}
</SelectGroup>
</SelectContent>
</Select>
</div>
</div>
);
}
export default ModelSettingsSection;

View File

@ -1,6 +1,6 @@
import { useEffect, useRef } from "react";
import classNames from "classnames";
// import useThrottle from "../hooks/useThrottle";
import useThrottle from "../hooks/useThrottle";
interface Props {
code: string;
@ -8,17 +8,14 @@ interface Props {
}
function Preview({ code, device }: Props) {
const throttledCode = code;
// Temporary disable throttling for the preview not updating when the code changes
// useThrottle(code, 200);
const iframeRef = useRef<HTMLIFrameElement | null>(null);
// Don't update code more often than every 200ms.
const throttledCode = useThrottle(code, 200);
useEffect(() => {
const iframe = iframeRef.current;
if (iframe && iframe.contentDocument) {
iframe.contentDocument.open();
iframe.contentDocument.write(throttledCode);
iframe.contentDocument.close();
if (iframeRef.current) {
iframeRef.current.srcdoc = throttledCode;
}
}, [throttledCode]);

View File

@ -8,7 +8,7 @@ import { useAuth } from "@clerk/clerk-react";
interface Props {
screenshotOneApiKey: string | null;
doCreate: (urls: string[]) => void;
doCreate: (urls: string[], inputMode: "image" | "video") => void;
}
export function UrlInputSection({ doCreate, screenshotOneApiKey }: Props) {
@ -51,7 +51,7 @@ export function UrlInputSection({ doCreate, screenshotOneApiKey }: Props) {
}
const res = await response.json();
doCreate([res.url]);
doCreate([res.url], "image");
} catch (error) {
console.error(error);
toast.error(

View File

@ -0,0 +1,16 @@
// Not robust enough to support <html lang='en'> for instance
export function extractHtml(code: string): string {
const lastHtmlStartIndex = code.lastIndexOf("<html>");
let htmlEndIndex = code.indexOf("</html>", lastHtmlStartIndex);
if (lastHtmlStartIndex !== -1) {
// If "</html>" is found, adjust htmlEndIndex to include the "</html>" tag
if (htmlEndIndex !== -1) {
htmlEndIndex += "</html>".length;
return code.slice(lastHtmlStartIndex, htmlEndIndex);
}
// If "</html>" is not found, return the rest of the string starting from the last "<html>"
return code.slice(lastHtmlStartIndex);
}
return "";
}

View File

@ -0,0 +1,10 @@
export function simpleHash(str: string, seed = 0) {
let hash = seed;
for (let i = 0; i < str.length; i++) {
const char = str.charCodeAt(i);
hash = (hash << 5) - hash + char;
hash |= 0; // Convert to 32bit integer
}
return hash;
}

View File

@ -0,0 +1,126 @@
import { useState } from "react";
import { Button } from "../ui/button";
import { ScreenRecorderState } from "../../types";
import { blobToBase64DataUrl } from "./utils";
import fixWebmDuration from "webm-duration-fix";
import toast from "react-hot-toast";
interface Props {
screenRecorderState: ScreenRecorderState;
setScreenRecorderState: (state: ScreenRecorderState) => void;
generateCode: (
referenceImages: string[],
inputMode: "image" | "video"
) => void;
}
function ScreenRecorder({
screenRecorderState,
setScreenRecorderState,
generateCode,
}: Props) {
const [mediaStream, setMediaStream] = useState<MediaStream | null>(null);
const [mediaRecorder, setMediaRecorder] = useState<MediaRecorder | null>(
null
);
const [screenRecordingDataUrl, setScreenRecordingDataUrl] = useState<
string | null
>(null);
const startScreenRecording = async () => {
try {
// Get the screen recording stream
const stream = await navigator.mediaDevices.getDisplayMedia({
video: true,
audio: { echoCancellation: true },
});
setMediaStream(stream);
// TODO: Test across different browsers
// Create the media recorder
const options = { mimeType: "video/webm" };
const mediaRecorder = new MediaRecorder(stream, options);
setMediaRecorder(mediaRecorder);
const chunks: BlobPart[] = [];
// Accumalate chunks as data is available
mediaRecorder.ondataavailable = (e: BlobEvent) => chunks.push(e.data);
// When media recorder is stopped, create a data URL
mediaRecorder.onstop = async () => {
// TODO: Do I need to fix duration if it's not a webm?
const completeBlob = await fixWebmDuration(
new Blob(chunks, {
type: options.mimeType,
})
);
const dataUrl = await blobToBase64DataUrl(completeBlob);
setScreenRecordingDataUrl(dataUrl);
setScreenRecorderState(ScreenRecorderState.FINISHED);
};
// Start recording
mediaRecorder.start();
setScreenRecorderState(ScreenRecorderState.RECORDING);
} catch (error) {
toast.error("Could not start screen recording");
throw error;
}
};
const stopScreenRecording = () => {
// Stop the recorder
if (mediaRecorder) {
mediaRecorder.stop();
setMediaRecorder(null);
}
// Stop the screen sharing stream
if (mediaStream) {
mediaStream.getTracks().forEach((track) => {
track.stop();
});
}
};
const kickoffGeneration = () => {
if (screenRecordingDataUrl) {
generateCode([screenRecordingDataUrl], "video");
} else {
toast.error("Screen recording does not exist. Please try again.");
throw new Error("No screen recording data url");
}
};
return (
<div className="flex items-center justify-center my-3">
{screenRecorderState === ScreenRecorderState.INITIAL && (
<Button onClick={startScreenRecording}>Record Screen</Button>
)}
{screenRecorderState === ScreenRecorderState.RECORDING && (
<div className="flex items-center flex-col gap-y-4">
<div className="flex items-center mr-2 text-xl gap-x-1">
<span className="block h-10 w-10 bg-red-600 rounded-full mr-1 animate-pulse"></span>
<span>Recording...</span>
</div>
<Button onClick={stopScreenRecording}>Finish Recording</Button>
</div>
)}
{screenRecorderState === ScreenRecorderState.FINISHED && (
<div className="flex items-center flex-col gap-y-4">
<div className="flex items-center mr-2 text-xl gap-x-1">
<span>Screen Recording Captured.</span>
</div>
<Button onClick={kickoffGeneration}>Generate</Button>
</div>
)}
</div>
);
}
export default ScreenRecorder;

View File

@ -0,0 +1,31 @@
export function downloadBlob(blob: Blob) {
// Create a URL for the blob object
const videoURL = URL.createObjectURL(blob);
// Create a temporary anchor element and trigger the download
const a = document.createElement("a");
a.href = videoURL;
a.download = "recording.webm";
document.body.appendChild(a);
a.click();
document.body.removeChild(a);
// Clear object URL
URL.revokeObjectURL(videoURL);
}
export function blobToBase64DataUrl(blob: Blob): Promise<string> {
return new Promise((resolve, reject) => {
const reader = new FileReader();
reader.onloadend = () => {
if (reader.result) {
resolve(reader.result as string);
} else {
reject(new Error("FileReader did not return a result."));
}
};
reader.onerror = () =>
reject(new Error("FileReader encountered an error."));
reader.readAsDataURL(blob);
});
}

View File

@ -159,4 +159,4 @@ export {
SelectSeparator,
SelectScrollUpButton,
SelectScrollDownButton,
}
}

View File

@ -0,0 +1,29 @@
import { useEffect } from "react";
const CODING_SETTINGS = {
title: "Coding...",
favicon: "/favicon/coding.png",
};
const DEFAULT_SETTINGS = {
title: "Screenshot to Code",
favicon: "/favicon/main.png",
};
const useBrowserTabIndicator = (isCoding: boolean) => {
useEffect(() => {
const settings = isCoding ? CODING_SETTINGS : DEFAULT_SETTINGS;
// Set favicon
const faviconEl = document.querySelector(
"link[rel='icon']"
) as HTMLLinkElement | null;
if (faviconEl) {
faviconEl.href = settings.favicon;
}
// Set title
document.title = settings.title;
}, [isCoding]);
};
export default useBrowserTabIndicator;

View File

@ -0,0 +1,12 @@
// Keep in sync with backend (llm.py)
export enum CodeGenerationModel {
GPT_4_VISION = "gpt_4_vision",
CLAUDE_3_SONNET = "claude_3_sonnet",
}
export const CODE_GENERATION_MODEL_DESCRIPTIONS: {
[key in CodeGenerationModel]: { name: string; inBeta: boolean };
} = {
gpt_4_vision: { name: "GPT-4 Vision", inBeta: false },
claude_3_sonnet: { name: "Claude 3 Sonnet", inBeta: true },
};

View File

@ -1,4 +1,5 @@
import { Stack } from "./lib/stacks";
import { CodeGenerationModel } from "./lib/models";
export enum EditorTheme {
ESPRESSO = "espresso",
@ -12,6 +13,7 @@ export interface Settings {
isImageGenerationEnabled: boolean;
editorTheme: EditorTheme;
generatedCodeConfig: Stack;
codeGenerationModel: CodeGenerationModel;
// Only relevant for hosted version
isTermOfServiceAccepted: boolean;
accessCode: string | null;
@ -23,8 +25,15 @@ export enum AppState {
CODE_READY = "CODE_READY",
}
export enum ScreenRecorderState {
INITIAL = "initial",
RECORDING = "recording",
FINISHED = "finished",
}
export interface CodeGenerationParams {
generationType: "create" | "update";
inputMode: "image" | "video";
image: string;
resultImage?: string;
history?: string[];

4
frontend/src/urls.ts Normal file
View File

@ -0,0 +1,4 @@
export const URLS = {
"intro-to-video":
"https://github.com/abi/screenshot-to-code/blob/main/blog/video-to-app.md",
};

View File

@ -1704,6 +1704,11 @@ base64-arraybuffer@^1.0.2:
resolved "https://registry.npmjs.org/base64-arraybuffer/-/base64-arraybuffer-1.0.2.tgz#1c37589a7c4b0746e34bd1feb951da2df01c1bdc"
integrity sha512-I3yl4r9QB5ZRY3XuJVEPfc2XhZO6YweFPI+UovAzn+8/hb3oJ6lnysaFcjVpkCPfVWFUDvoZ8kmVDP7WyRtYtQ==
base64-js@^1.3.1:
version "1.5.1"
resolved "https://registry.yarnpkg.com/base64-js/-/base64-js-1.5.1.tgz#1b1b440160a5bf7ad40b650f095963481903930a"
integrity sha512-AKpaYlHn8t4SVbOHCy+b5+KKgvR4vrsD8vbvrbiQJps7fKDTkjkDry6ji0rUJjC0kzbNePLwzxq8iypo41qeWA==
binary-extensions@^2.0.0:
version "2.2.0"
resolved "https://registry.npmjs.org/binary-extensions/-/binary-extensions-2.2.0.tgz"
@ -1751,6 +1756,14 @@ buffer-from@^1.0.0:
resolved "https://registry.yarnpkg.com/buffer-from/-/buffer-from-1.1.2.tgz#2b146a6fd72e80b4f55d255f35ed59a3a9a41bd5"
integrity sha512-E+XQCRwSbaaiChtv6k6Dwgc+bx+Bs6vuKJHHl5kox/BaKbhiXzqQOwK4cO22yElGp2OCmjwVhT3HmxgyPGnJfQ==
buffer@^6.0.3:
version "6.0.3"
resolved "https://registry.yarnpkg.com/buffer/-/buffer-6.0.3.tgz#2ace578459cc8fbe2a70aaa8f52ee63b6a74c6c6"
integrity sha512-FTiCpNxtwiZZHEZbcbTIcZjERVICn9yq/pDFkTl95/AxzD1naBctN7YO68riM/gLSDY7sdrMby8hofADYuuqOA==
dependencies:
base64-js "^1.3.1"
ieee754 "^1.2.1"
cac@^6.7.14:
version "6.7.14"
resolved "https://registry.yarnpkg.com/cac/-/cac-6.7.14.tgz#804e1e6f506ee363cb0e3ccbb09cad5dd9870959"
@ -2111,6 +2124,11 @@ dotenv@^16.0.0:
resolved "https://registry.yarnpkg.com/dotenv/-/dotenv-16.3.1.tgz#369034de7d7e5b120972693352a3bf112172cc3e"
integrity sha512-IPzF4w4/Rd94bA9imS68tZBaYyBWSCE47V1RGuMrB94iyTOIEwRmVL2x/4An+6mETpLrKJ5hQkB8W4kFAadeIQ==
ebml-block@^1.1.2:
version "1.1.2"
resolved "https://registry.yarnpkg.com/ebml-block/-/ebml-block-1.1.2.tgz#fd49951b0faf5a3049bdd61c851a76b5e679c290"
integrity sha512-HgNlIsRFP6D9VKU5atCeHRJY7XkJP8bOe8yEhd8NB7B3b4++VWTyauz6g650iiPmLfPLGlVpoJmGSgMfXDYusg==
ejs@^3.1.6:
version "3.1.9"
resolved "https://registry.yarnpkg.com/ejs/-/ejs-3.1.9.tgz#03c9e8777fe12686a9effcef22303ca3d8eeb361"
@ -2304,6 +2322,11 @@ esutils@^2.0.2:
resolved "https://registry.npmjs.org/esutils/-/esutils-2.0.3.tgz"
integrity sha512-kVscqXk4OCp68SZ0dkgEKVi6/8ij300KBWTJq32P/dYeWTSwK41WyTxalN1eRmA5Z9UU/LX9D7FWSmV9SAYx6g==
events@^3.3.0:
version "3.3.0"
resolved "https://registry.yarnpkg.com/events/-/events-3.3.0.tgz#31a95ad0a924e2d2c419a813aeb2c4e878ea7400"
integrity sha512-mQw+2fkQbALzQ7V0MY0IqdnXNOeTtP4r0lN9z7AAawCXgqea7bDii20AYrIBrFd/Hx0M2Ocz6S111CaFkUcb0Q==
execa@^8.0.1:
version "8.0.1"
resolved "https://registry.yarnpkg.com/execa/-/execa-8.0.1.tgz#51f6a5943b580f963c3ca9c6321796db8cc39b8c"
@ -2595,6 +2618,11 @@ human-signals@^5.0.0:
resolved "https://registry.yarnpkg.com/human-signals/-/human-signals-5.0.0.tgz#42665a284f9ae0dade3ba41ebc37eb4b852f3a28"
integrity sha512-AXcZb6vzzrFAUE61HnN4mpLqd/cSIwNQjtNWR0euPm6y0iqx3G4gOXaIDdtdDwZmhwe82LA6+zinmW4UBWVePQ==
ieee754@^1.2.1:
version "1.2.1"
resolved "https://registry.yarnpkg.com/ieee754/-/ieee754-1.2.1.tgz#8eb7a10a63fff25d15a57b001586d177d1b0d352"
integrity sha512-dcyqhDvX1C46lXZcVqCpK+FtMRQVdIMN6/Df5js2zouUsqG7I6sFxitIC+7KYK29KdXOLHdu9zL4sFnoVQnqaA==
ignore@^5.2.0, ignore@^5.2.4:
version "5.2.4"
resolved "https://registry.npmjs.org/ignore/-/ignore-5.2.4.tgz"
@ -2626,6 +2654,11 @@ inherits@2:
resolved "https://registry.npmjs.org/inherits/-/inherits-2.0.4.tgz"
integrity sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ==
int64-buffer@^1.0.1:
version "1.0.1"
resolved "https://registry.yarnpkg.com/int64-buffer/-/int64-buffer-1.0.1.tgz#c78d841b444cadf036cd04f8683696c740f15dca"
integrity sha512-+3azY4pXrjAupJHU1V9uGERWlhoqNswJNji6aD/02xac7oxol508AsMC5lxKhEqyZeDFy3enq5OGWXF4u75hiw==
invariant@^2.2.4:
version "2.2.4"
resolved "https://registry.yarnpkg.com/invariant/-/invariant-2.2.4.tgz#610f3c92c9359ce1db616e538008d23ff35158e6"
@ -3970,6 +4003,16 @@ w3c-keyname@^2.2.4:
resolved "https://registry.yarnpkg.com/w3c-keyname/-/w3c-keyname-2.2.8.tgz#7b17c8c6883d4e8b86ac8aba79d39e880f8869c5"
integrity sha512-dpojBhNsCNN7T82Tm7k26A6G9ML3NkhDsnw9n/eoxSRlVBB4CEtIQ/KTCLI2Fwf3ataSXRhYFkQi3SlnFwPvPQ==
webm-duration-fix@^1.0.4:
version "1.0.4"
resolved "https://registry.yarnpkg.com/webm-duration-fix/-/webm-duration-fix-1.0.4.tgz#fef235cb3d3ed3363507f705a7577dbb9fdedae6"
integrity sha512-kvhmSmEnuohtK+j+mJswqCCM2ViKb9W8Ch0oAxcaeUvpok5CsMORQLnea+CYKDXPG6JH12H0CbRK85qhfeZLew==
dependencies:
buffer "^6.0.3"
ebml-block "^1.1.2"
events "^3.3.0"
int64-buffer "^1.0.1"
which@^2.0.1:
version "2.0.2"
resolved "https://registry.npmjs.org/which/-/which-2.0.2.tgz"

View File

@ -1,42 +0,0 @@
# Sweep AI turns bugs & feature requests into code changes (https://sweep.dev)
# For details on our config file, check out our docs at https://docs.sweep.dev/usage/config
# This setting contains a list of rules that Sweep will check for. If any of these rules are broken in a new commit, Sweep will create an pull request to fix the broken rule.
rules:
- "All docstrings and comments should be up to date."
['All new business logic should have corresponding unit tests.', 'Refactor large functions to be more modular.', 'Add docstrings to all functions and file headers.']
# This is the branch that Sweep will develop from and make pull requests to. Most people use 'main' or 'master' but some users also use 'dev' or 'staging'.
branch: 'main'
# By default Sweep will read the logs and outputs from your existing Github Actions. To disable this, set this to false.
gha_enabled: True
# This is the description of your project. It will be used by sweep when creating PRs. You can tell Sweep what's unique about your project, what frameworks you use, or anything else you want.
#
# Example:
#
# description: sweepai/sweep is a python project. The main api endpoints are in sweepai/api.py. Write code that adheres to PEP8.
description: ''
# This sets whether to create pull requests as drafts. If this is set to True, then all pull requests will be created as drafts and GitHub Actions will not be triggered.
draft: False
# This is a list of directories that Sweep will not be able to edit.
blocked_dirs: []
# This is a list of documentation links that Sweep will use to help it understand your code. You can add links to documentation for any packages you use here.
#
# Example:
#
# docs:
# - PyGitHub: ["https://pygithub.readthedocs.io/en/latest/", "We use pygithub to interact with the GitHub API"]
docs: []
# Sandbox executes commands in a sandboxed environment to validate code changes after every edit to guarantee pristine code. For more details, see the [Sandbox](./sandbox) page.
sandbox:
install:
- trunk init
check:
- trunk fmt {file_path} || return 0
- trunk check --fix --print-failures {file_path}