Introduction
In this tutorial we’re going to translate an image containing text in a foreign language into English, but you could translate it into another language too.
To do that, I found a French recipe for crêpes that looks appetizing, but I don’t speak French, so let’s see how we can leverage PixieBrix to translate this recipe into something easier for me to “digest”.
Overview
Our needs
To translate an image written in a language we don’t speak into one we understand, we’re going to set up two sets of bricks: one to capture the image and one to display the results. Specifically:
- A contextual menu with a modal: to trigger the start of the translation by allowing us to select the languages to translate from and to, and to make sure we capture only the area of interest in our browser’s tab (i.e. we want to translate only the image of the recipe - not the whole tab)
- A sidebar: to display the original and translated text at the end of the translation
This is the flowchart of the steps we’re going to take and a rough idea of all the bricks we’re going to combine together to get our desired results!
1. The contextual menu
Open the page editor and then press the Add button to select Context menu as a trigger
Then click the blue “Create new Context Menu” button.
You should see this first brick appear in the list:
We’re going to set this Name to Translation Context Menu
We’re also giving it a Title: Translate an image with OCR
Additionally, we’re going to set Sites and Automatic Permission (under Advanced) to ALL URLs by pressing the corresponding blue link.
We’re done with this brick - now we will add a new brick called Screenshot Tab
Add another brick: Screenshot Tab
To do that we will click on the + button under the Context Menu brick
Then type Screenshot Tab in the Search bar and select the first result - now press the blue Add button to add it to our list of bricks.
Ensure the output key of this brick is named @screenshot. We will use this output key in the next step.
There’s nothing else to configure in this step, so let’s go ahead and add another brick.
Add another brick: Showing a modal
To do that we will click on the + button under the Screenshot Tab brick
This time we’re adding a way to render a modal.
In the search bar type Show a modal or a sidebar form, then add the first resulting brick
Configuring the Show a Modal brick
The first thing I will do here is rename this brick to something shorter: Show a form
You will notice the brick’s name changes as you rename it. If you delete your custom name, it reverts to its default name.
Take a note of the output variable for this brick: @form - we will be using it in a later step.
We will set this Form title to Language Selector
We will be adding three fields to this form and configuring them in the following way:
1st field
Attributes Names & Values:
- Name: source
- Label: Image language
- Field Description: The language used in the image
- Input Type: Dropdown
- Default Value: fra
- Options (1 per row):
  - spa
  - eng
  - ara
  - deu
  - fra
  - ita
  - jpn
  - por
  - rus
Make this a Required field and Cancelable
Also ensure Location is set to modal (which is its default).
Now we will add a 2nd field
2nd field
Attributes Names & Values:
- Name: target
- Label: Target language
- Field Description: The language to translate the image to
- Input Type: Dropdown
- Default Value: en
- Options (1 per row):
  - es
  - en
  - ar
  - de
  - fr
  - it
  - ja
  - pt
  - ru
Make this a Required field and Cancelable
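Notice that the two dropdowns use different kinds of codes: the image language options are three-letter codes (like fra) while the target language options are two-letter codes (like fr). If you ever need to keep the two lists in sync, here is a minimal sketch that pairs them in the same order as the option lists above. The dictionary and the helper name are hypothetical and are not part of PixieBrix.

```python
# Hypothetical helper, not part of PixieBrix: pairs each three-letter code
# from the "source" dropdown with the two-letter code from the "target"
# dropdown, in the same order as the tutorial's option lists.
OCR_TO_TRANSLATE = {
    "spa": "es",
    "eng": "en",
    "ara": "ar",
    "deu": "de",
    "fra": "fr",
    "ita": "it",
    "jpn": "ja",
    "por": "pt",
    "rus": "ru",
}


def to_translate_code(ocr_code: str) -> str:
    """Return the two-letter code matching a three-letter OCR code."""
    try:
        return OCR_TO_TRANSLATE[ocr_code]
    except KeyError:
        raise ValueError(f"Unsupported language code: {ocr_code!r}") from None


print(to_translate_code("fra"))  # fr
```

The tutorial itself does not need this conversion, because each dropdown feeds a different brick.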
3rd field
This third field will allow us to edit the image by cropping the screenshot of the whole tab before sending it
Attributes Names & Values:
- Name: crop
- Label: Image Cropper
- Input Type: Image Crop
- Image Source: @screenshot.data
Add another brick: Parse Data URL
Once more, we will click on the + button under the Show a form brick
This time we’re adding a way to parse the data returned by the screenshot (which is the image encoded in base64).
Base64 is an encoding algorithm that converts any characters, binary data, and even images or sound files into a readable string, which can be saved or transported over the network without data loss. The characters generated from Base64 encoding consist of Latin letters, digits, plus, and slash. Base64 is most commonly used as a MIME (Multipurpose Internet Mail Extensions) transfer encoding for email. (source: https://elmah.io/tools/base64-image-encoder/)
In the search bar type Parse Data URL, then add the first resulting brick
Configuring the Parse Data URL brick
First of all, let’s make sure the Output key is named @dataURL
As an input we’re going to set the value of url to @form.crop
And that’s it for this brick!
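To get a feel for what parsing a data URL means, here is a minimal Python sketch that splits a base64 data URL into its MIME type and its base64 body. It is only an illustration, not the brick's actual implementation: the function name and the mimeType key are assumptions, and only the body key mirrors the @dataURL.body value we use later.

```python
import base64


def parse_data_url(url: str) -> dict:
    """Split a base64 data URL into its MIME type and base64 body."""
    if not url.startswith("data:"):
        raise ValueError("not a data URL")
    header, sep, body = url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URL has no comma separating header and body")
    if not header.endswith(";base64"):
        raise ValueError("only base64 data URLs are handled in this sketch")
    # "mimeType" is a hypothetical key name chosen for this example.
    mime_type = header[: -len(";base64")] or "text/plain"
    return {"mimeType": mime_type, "body": body}


# Build a tiny example data URL from known bytes (not a real image).
raw = b"crepes"
example = "data:image/png;base64," + base64.b64encode(raw).decode("ascii")

parsed = parse_data_url(example)
print(parsed["mimeType"])                # image/png
print(base64.b64decode(parsed["body"]))  # b'crepes'
```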
Add another brick: Set shared page state
We will click on the + button under the Parse Data URL brick and search for the Set shared page state brick and add it.
This brick holds state values for the page. That is useful while the mods are busy with external API calls: like a traffic light, the state signals whether work is still in progress.
Configuring the Set shared page state brick
Make sure the Output key is set to @state
We will add two property fields to this brick:
- loading: using the 0/1 value picker, set the value to 1
- translation: using the 0/1 value picker, set the value to 0
Also set mergeStrategy to shallow
It should look like this
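If the shallow merge strategy is new to you, here is a minimal sketch of the general idea in Python: top-level keys in the update overwrite the same keys in the current state, and everything else is kept. This illustrates the concept only. How PixieBrix implements it internally is an assumption on my part, and the function name is hypothetical.

```python
def set_state_shallow(current: dict, update: dict) -> dict:
    """Return a new state where top-level keys in `update` overwrite `current`."""
    return {**current, **update}


# First Set shared page state brick: mark the work as started.
state = set_state_shallow({}, {"loading": 1, "translation": 0})
print(state)  # {'loading': 1, 'translation': 0}

# A later update only touches the keys it names; "loading" is kept as-is.
state = set_state_shallow(state, {"translation": "Pancakes"})
print(state)  # {'loading': 1, 'translation': 'Pancakes'}

# Shallow means nested values are replaced as a whole, not merged.
nested = set_state_shallow({"meta": {"a": 1, "b": 2}}, {"meta": {"a": 9}})
print(nested)  # {'meta': {'a': 9}}
```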
Add another brick: HTTP Request
We will click on the + button under the Set shared page state brick and search for the HTTP Request brick.
Once you have located it, add it.
Configuring the HTTP Request brick
We’re going to make sure the Output key variable name for this one is @response
Then set the URL to https://ocr-supreme.p.rapidapi.com/ocr/image
From the service dropdown, select OCR Supreme
For method we need to select post
Then scroll down to data, where we will set 3 properties:
- data: set this to @dataURL.body. This is the data we will be sending to the API endpoint, which we stored earlier in the @dataURL output key
- lang: set this to @form.source. This is the value from the modal, where we selected the source language (the original language of the image)
- output: set this to plain text
That’s it for this brick
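For the curious, here is a rough sketch of the request this brick is configured to make, written in Python. It only builds the request and never sends it. The URL, the post method, and the three data properties come from the steps above. The JSON encoding and the Content-Type header are assumptions, and authentication is left out on purpose because the OCR Supreme service selected in the dropdown takes care of it.

```python
import json
import urllib.request

OCR_URL = "https://ocr-supreme.p.rapidapi.com/ocr/image"  # from the tutorial


def build_ocr_request(image_base64: str, source_lang: str) -> urllib.request.Request:
    """Build (but do not send) the POST request configured in the brick."""
    payload = {
        "data": image_base64,    # @dataURL.body
        "lang": source_lang,     # @form.source
        "output": "plain text",
    }
    return urllib.request.Request(
        OCR_URL,
        # Assumption: the body is sent as JSON.
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ocr_request("Y3JlcGVz", "fra")
print(request.get_method())                    # POST
print(json.loads(request.data)["lang"])        # fra
```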
Add another brick: Translate
We will click on the + button under the HTTP Request brick - search for the Translate brick.
Once you have located it, add it.
Configuring the Translate brick
Ensure the output key name for this brick is @translated
Then we will set the following values
- query: @response.data
- target: @form.target
This sets our query to the value stored in @response.data, which is the data returned by the HTTP request we made to the OCR service, where the text in the image was converted into actual parseable text. At the same time, we pass @form.target as the target language: the language that we want the image’s text to be translated to.
Add another brick: Set shared page state
We will click on the + button under the Translate brick and search for the Set shared page state brick and add it.
Configuring the Set shared page state brick
Let’s ensure the Output key value is set to @state
In this brick we’re going to set 3 property values:
- loading: set the value to 0 from the 0/1 value picker, since the work is finished at this point
- extracted: set the value to @response.data
- translation: set the value to @translated.translatedText
Add another brick: Show sidebar
Final brick for this part - we want to show the results in the sidebar - and thus we need to trigger the opening of a sidebar.
We will click on the + button under the Set shared page state brick and search for the Show sidebar brick.
Once you have located it, add it.
We don’t have to configure this one - and we can finally take a 5 min break to hydrate from this tutorial and take it all in!
Adding a new mod: Showing results in the sidebar
In the previous part of this tutorial we created a context menu that processes a screenshot of the current tab when you right click on a page and select it.
This is the logic:
The image is converted to a base64 data string. We then send this as a payload to an OCR API that converts it to text, and ultimately with an additional external API call we translate the text to the desired target language.
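The same logic can be sketched in a few lines of Python. This is only an illustration of the order of operations, not how PixieBrix runs the bricks. The function names are hypothetical, and the OCR and translation services are replaced by stand-in functions so the sketch runs without any network access.

```python
from typing import Callable


def run_pipeline(
    data_url: str,
    source_lang: str,
    target_lang: str,
    ocr: Callable[[str, str], str],
    translate: Callable[[str, str], str],
) -> dict:
    """Mirror the order of the bricks: parse, OCR, translate, store."""
    # Parse Data URL: keep only the base64 body after the first comma.
    _, _, body = data_url.partition(",")
    # HTTP Request: the OCR service turns the image into text.
    extracted = ocr(body, source_lang)
    # Translate: convert the extracted text into the target language.
    translation = translate(extracted, target_lang)
    # Set shared page state: these are the values the sidebar reads later.
    return {"extracted": extracted, "translation": translation}


# Stand-in services (made-up return values, no real API calls).
def fake_ocr(image_base64: str, lang: str) -> str:
    return "farine, oeufs, lait"


def fake_translate(text: str, target: str) -> str:
    return f"[{target}] {text}"


state = run_pipeline("data:image/png;base64,AAAA", "fra", "en", fake_ocr, fake_translate)
print(state["extracted"])    # farine, oeufs, lait
print(state["translation"])  # [en] farine, oeufs, lait
```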
Now we need to take the translated data and show a nice sidebar panel with the image’s text and the translated version
Add a new Trigger: Show sidebar
First we’re going to add a Sidebar Panel mod - this sidebar will take the data from the previous mod and display it nicely...on the side of your browser’s tab!
Configuring the Show sidebar brick
Set the Name to Translation Results
Set the Heading to Translation Results too, but you could set the heading to anything you like!
Additionally we’re going to set Sites to ALL URLs
Add a new Brick: Get shared page state
In this step we’re retrieving the state that we set in the previous part of the tutorial. The sidebar needs it to decide what to display: the translation once it exists, or a status message until then.
In the Get shared page state brick we want to set the Output key value to be named @state
Add a new Brick: Render document
The last step of this tutorial is to render the sidebar. This requires formatting the sidebar and adding some templating language to display the values in the right order.
On the right hand side of this brick delete all the containers (the boxes with the dotted line) and then add one back
To delete a container, highlight it on the right hand side by clicking it. This opens a new section in the middle of the page editor, where a red button appears ✨.
Press the red button that says “Remove Element” to remove it.
See this image below to see exactly what I clicked first, and the button I clicked second.
Repeat this to remove all elements until you’re left with only the body container
Next we want to click the three dot menu and add a Header 1
Double click Header and rename its Header 1 value (in the middle of the page editor) to Translation
Using the three dot menu add another element called Text
With this new field being selected change the value of text to be this code below
{% if @state.translation %}
{{ @state.translation }}
{% elif @state.loading %}
Running OCR and translation
{% else %}
No translated text. Right click the page to translate text.
{% endif %}
This is what it should look like:
This code shows the translated text in the sidebar once the translation is available. While the translation is still running it shows a progress message, and otherwise it prompts you to start one.
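If the template syntax is unfamiliar, the same three-way decision can be written as a small Python function. This is just a sketch to make the branches easier to follow. The function name is hypothetical, and it assumes that 0, an empty string, and a missing key all count as "false" in the template, as they do here.

```python
def translation_panel_text(state: dict) -> str:
    """Mirror the if / elif / else branches of the sidebar template."""
    if state.get("translation"):
        # A translation exists: show it.
        return str(state["translation"])
    if state.get("loading"):
        # No translation yet, but work has started.
        return "Running OCR and translation"
    # Nothing has been requested yet.
    return "No translated text. Right click the page to translate text."


print(translation_panel_text({"loading": 1, "translation": 0}))
# Running OCR and translation
print(translation_panel_text({"translation": "Pancakes"}))
# Pancakes
print(translation_panel_text({}))
# No translated text. Right click the page to translate text.
```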
Using the three dot menu add another element called Header 1
Double click the field and change the value of title to be Original
Using the three dot menu add another element called Text
With this new field being selected change the value of text to be this code below
{% if @state.translation %}
{{ @state.extracted }}
{% else %}
None
{% endif %}
And we’re done!
Save your mods and we can go ahead and test it out.
Conclusion
Open up this recipe
Right click and select Translate an image with OCR
I picked “fra” as the Image language (that’s the code used by the API for French)
I also set the Target language to “en” (yup, that’s English!)
Then I used the image cropper in the modal to resize the selection and focus on exactly the part of the recipe I want to translate
A few moments later the sidebar opens
and just like that - automagically - I was able to translate this section of the recipe!
Now rinse, and repeat for the other parts of the image, one step at a time.