Image to text conversion and object detection using Amazon Rekognition and Python.
As machines get better trained, many tools and services now make it possible to automate almost anything. Amazon Rekognition is one of them. It uses artificial intelligence and deep learning for object detection, detection of text in an image, and many similar purposes.
In this tutorial, you will learn how to use Amazon Rekognition to extract text from an image. Rekognition can be used along with other AWS services like AWS Lambda and S3 to automate this process very easily. By the end of this tutorial you will have learned:
How to create an IAM role.
How to create an S3 bucket and upload files to it.
How to create a Lambda function.
How to automate image-to-text conversion using the above-mentioned AWS services and Python.
You will also learn how to detect objects in an image using Amazon Rekognition.
Lastly, you will learn how to optimize the results.
Prerequisites
To follow along with this tutorial, you will need the following:
An AWS account (a free tier account will work for this tutorial).
Basic knowledge of Python programming (Python 3.7 is used in this tutorial, but Python 3.6 will also work).
Create an AWS account
Start by logging in to, or registering for, the AWS Management Console at aws.amazon.com/console. A free tier account is enough to get started: it lets you analyze up to 5,000 images using Amazon Rekognition and make up to 1M free requests to AWS Lambda.
Create an IAM role
An IAM role defines a set of permissions that governs access to AWS resources. Here, we will create an IAM role so that our AWS Lambda function can execute and get S3 objects. In other words, AWS Lambda will assume the role in order to perform the required task. Additionally, we will attach the AmazonRekognitionFullAccess policy so that the function can call Amazon Rekognition. Let’s get started!
Navigate to AWS IAM Console.
Click on Roles.
Click on Create role to create a new role.
Since we are creating this role for an AWS service, select the AWS service tab.
Click on Lambda, followed by Next: Permissions.
Now we are all set to give AWS Lambda its permissions. We will add two policies for our Lambda function, both predefined by AWS. To do so, filter and select the policies with the following names:
AWSLambdaExecute
AmazonRekognitionFullAccess
Now go ahead and add a tag to the role you are creating. Tags help in identifying and organizing resources. I am giving the key name as “image_text” and the value as “image”.
Click on Review.
Now go ahead and give your role a name.
I am going to use the name “My_lambda” for now.
Finally, click on “Create role”. Yes! Our new role is created!
We have successfully created the role that our Lambda function (our Python code) will assume while executing.
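If you prefer to script these console steps, the same role can be sketched with boto3. Treat this as a minimal sketch rather than the only way to do it: it assumes your local AWS credentials are allowed to manage IAM, and the helper names lambda_trust_policy and create_lambda_role are made up for this example. The role name, tag and policy names are the ones used above.

```python
import json

# ARNs of the two AWS managed policies selected in the console steps above.
POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AWSLambdaExecute",
    "arn:aws:iam::aws:policy/AmazonRekognitionFullAccess",
]


def lambda_trust_policy():
    """Return a trust policy that lets the Lambda service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_lambda_role(iam_client, role_name="My_lambda"):
    """Create the role with its tag, then attach both managed policies."""
    iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(lambda_trust_policy()),
        Tags=[{"Key": "image_text", "Value": "image"}],
    )
    for arn in POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)


# Usage (requires AWS credentials with IAM permissions):
#   import boto3
#   create_lambda_role(boto3.client("iam"))
```

The IAM client is passed in as an argument so that the function can be tried against a stub before it touches a real account.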
Create an S3 bucket
Now we need to store the image somewhere so that our Lambda function can get it easily. For that, we will create an S3 bucket and upload our image there. You can use an existing S3 bucket or create a new one. Feel free to skip this part if you already have an S3 bucket.
Navigate to S3 management Console.
Click on create bucket.
Give your bucket a name. For this tutorial, I am keeping the name simply as ‘imagetotext567’. Note that S3 bucket names have to be unique globally. If the name is already associated with a bucket, you cannot assign the same name to your bucket. For this tutorial you can keep the region as default.
TIP: If the name still doesn’t work for you, try adding random numbers after the name. Also, note that your bucket name cannot contain any uppercase letters.
One more thing to note here: by default, our S3 bucket will block all public access. If you want to create a bucket to host a website, you must untick that option, but we do not need public access for this tutorial. Finally, click on Create bucket and move on.
And we are done creating an S3 bucket!
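Before settling on a bucket name, it can help to check it against the basic naming rules. The sketch below covers only a subset of them (3 to 63 characters; lowercase letters, digits, dots and hyphens; a letter or digit at both ends), and it cannot tell you whether the name is already taken globally. The function name looks_like_valid_bucket_name is hypothetical.

```python
import re

# Subset of the S3 naming rules: 3 to 63 characters, only lowercase letters,
# digits, dots and hyphens, starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name):
    """Return True if the name passes the basic checks listed above."""
    return _BUCKET_NAME.fullmatch(name) is not None


print(looks_like_valid_bucket_name("imagetotext567"))
print(looks_like_valid_bucket_name("ImageToText"))
```

The first call accepts the name used in this tutorial, while the second rejects a name with uppercase letters.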
Upload an image in the S3 bucket
Now click on your bucket name and choose “upload” to upload the image.
Finally, click on Add files and then “upload”.
And we are done with uploading our image.
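The upload can also be done from Python instead of the console. Here is a minimal sketch using the boto3 S3 client's upload_file method. It assumes the image is on your local disk and that your credentials can write to the bucket. The helper name upload_image and the local path in the usage comment are made up for illustration.

```python
import os


def upload_image(s3_client, local_path, bucket="imagetotext567", key=None):
    """Upload a local file to the bucket and return the object key used."""
    # Default the object key to the file name, e.g. "passport.jpg".
    key = key or os.path.basename(local_path)
    s3_client.upload_file(local_path, bucket, key)
    return key


# Usage (requires AWS credentials with write access to the bucket):
#   import boto3
#   upload_image(boto3.client("s3"), "images/passport.jpg")
```

The returned key is the name you later pass to Rekognition as the 'Name' of the S3 object.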
Create Lambda function and write Python code
Here comes the exciting part! We will now write our program in Python as a Lambda function and execute it. First, we need to create the Lambda function.
Navigate to the lambda management console.
Click on create function
Choose Python 3.7 as the language (Python 3.6 can also be used) and give the function a name (here, “ImageTotext”).
Now it is time to assign a role to our Lambda function: click on the change execution role option and choose the role we created at the very beginning of this tutorial, “My_lambda”.
Then click on create function
Now it's time to write our code. The very first step is to import a library called “boto3”, which allows us to use AWS services from Python. The next step is to create a client for the Amazon Rekognition API:
client=boto3.client('rekognition')
Next, point Rekognition at the image in the S3 bucket. The following line of code tells Rekognition which bucket and which object to read, and runs text detection on it:
response=client.detect_text(Image={'S3Object':{'Bucket':'bucket','Name':'photo'}})
Replace bucket with your bucket name and photo with your image name.
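import json
import boto3

def lambda_handler(event, context):
    # Create a client for the Amazon Rekognition API.
    client = boto3.client('rekognition')
    # Ask Rekognition to detect text in the image stored in the S3 bucket.
    response = client.detect_text(
        Image={'S3Object': {'Bucket': "imagetotext567", 'Name': "passport.jpg"}})
    # Print the raw response and return it so that it shows up in the test result.
    print(response)
    return response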
Now, go to the Configuration tab to change the function's timeout.
Click on Edit and set the timeout. The maximum timeout that can be allocated to a function is 900 seconds, and the default is 3 seconds. For this tutorial, anything between 3 and 4 minutes is more than enough.
Then save the changes.
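The timeout can also be changed from code. Below is a small sketch that uses the boto3 Lambda client's update_function_configuration call and enforces the 900-second limit mentioned above. The helper name set_function_timeout is made up, and the default of 180 seconds is simply the lower end of the 3 to 4 minute range suggested here.

```python
def set_function_timeout(lambda_client, function_name="ImageTotext", seconds=180):
    """Set the function's timeout after checking it against Lambda's limits."""
    # Lambda accepts timeouts from 1 to 900 seconds.
    if not 1 <= seconds <= 900:
        raise ValueError("timeout must be between 1 and 900 seconds")
    lambda_client.update_function_configuration(
        FunctionName=function_name, Timeout=seconds)
    return seconds


# Usage (requires AWS credentials that may update the function):
#   import boto3
#   set_function_timeout(boto3.client("lambda"), seconds=200)
```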
Let's first simply try to print the response. We will do the formatting later. So far, we have:
Pointed Rekognition at our image in the S3 bucket.
Given our function an appropriate amount of time to run properly.
Used the detect_text method to detect text in our image.
The image used in this tutorial is taken from: Passport
Click on Deploy to deploy the changes.
NOTE: Every time you change the code, you have to deploy the changes again. Otherwise, they will not be reflected on execution.
Then click on Test. The response we will get is:
The above result displays each piece of detected text in the image, its type, and the confidence level for that text. But do we really need to know the Geometry, ParentId and Id? It depends on the requirements. For this tutorial, we will keep the execution results clean by displaying only what is required.
One more thing to note is that the results also include text with confidence as low as 42%.
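If low-confidence detections like that 42% one are not useful to you, they can be dropped in Python before anything is printed. Here is a minimal sketch. The function name filter_by_confidence and the sample data are made up for illustration, and the sample only mimics the shape of a detect_text response.

```python
def filter_by_confidence(response, min_confidence=90):
    """Keep only the detections whose confidence is at least min_confidence."""
    return [
        detection
        for detection in response.get("TextDetections", [])
        if detection["Confidence"] >= min_confidence
    ]


# Made-up sample in the shape of a detect_text response (values are illustrative).
sample_response = {
    "TextDetections": [
        {"DetectedText": "PASSPORT", "Type": "LINE", "Confidence": 99.1},
        {"DetectedText": "P<", "Type": "WORD", "Confidence": 42.0},
    ]
}

for detection in filter_by_confidence(sample_response):
    print(detection["DetectedText"], round(detection["Confidence"]))
```

With the default threshold of 90, only the first sample entry survives the filter.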
Optimization
Now let's optimise our code to display only “DetectedText”, “Confidence” and “Type”.
From the TextDetections list, we will choose only the name/value pairs that we want.
To do this:
Create a variable "detectedText" to hold the "TextDetections" list.
Iterate through "detectedText" to get only three name/value pairs: "DetectedText", "Confidence" and "Type".
Do not forget to round the value of confidence.
import json
import boto3

def lambda_handler(event, context, min_confidence=90):
    client = boto3.client('rekognition')
    response = client.detect_text(Image={'S3Object': {'Bucket': "imagetotext567", 'Name': "passport.jpg"}})
    detectedText = response['TextDetections']
    print("Congratulations! You just fetched text from the image successfully. Total number of responses fetched from the given image {}".format(len(detectedText)))
    # Iterate through detectedText to get the required name/value pairs.
    for text in detectedText:
        # Get DetectedText
        print('Detected Text:' + text['DetectedText'], end=" ")
        # Get the Confidence
        print('Confidence:' + "{}".format(round(text['Confidence'])) + '%', end=" ")
        # Get the type of the text
        print('Text Type:' + text['Type'])
        print("-")
    return "Congratulations! You just fetched text from Image."

def main():
    # For a local run, pass an empty event and no context; the handler uses neither.
    test = lambda_handler({}, None)

if __name__ == "__main__":
    main()
Finally, deploy the changes and test the code.
The output looks much nicer now!
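Rekognition reports each piece of text twice: once as part of a LINE and once as individual WORD entries that point back to their line through ParentId. If all you want is the text itself, keeping only the LINE entries avoids the repetition. The sketch below shows one way to do that. The function name lines_of_text and the sample data are made up for illustration.

```python
def lines_of_text(response, min_confidence=0):
    """Return the text of every LINE detection, in the order it is listed."""
    return [
        detection["DetectedText"]
        for detection in response["TextDetections"]
        if detection["Type"] == "LINE" and detection["Confidence"] >= min_confidence
    ]


# Made-up sample in the shape of a detect_text response (values are illustrative).
sample_response = {
    "TextDetections": [
        {"DetectedText": "HELLO WORLD", "Type": "LINE", "Id": 0, "Confidence": 99.0},
        {"DetectedText": "HELLO", "Type": "WORD", "Id": 1, "ParentId": 0, "Confidence": 99.3},
        {"DetectedText": "WORLD", "Type": "WORD", "Id": 2, "ParentId": 0, "Confidence": 98.7},
    ]
}

print("\n".join(lines_of_text(sample_response)))
```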
What if you want to identify objects in an image? The whole process is the same; you just have to use the “detect_labels” method instead. Amazon Rekognition can recognize thousands of labels. Below is the image we will use for a quick demonstration of detecting labels in an image.
Go to the S3 bucket and simply upload the image.
Create another function, choosing Python 3.7 as the language, and give it the execution role we created earlier in this tutorial. Image source: DOG
import json
import boto3

def lambda_handler(event, context, min_confidence=89):
    client = boto3.client('rekognition')
    # Only ask for labels detected with at least min_confidence percent confidence.
    response = client.detect_labels(Image={'S3Object': {'Bucket': "imagetotext567", 'Name': "Dog.jpg"}},
                                    MinConfidence=min_confidence)
    detectedObjects = response['Labels']
    # Iterate through the detected labels
    for objs in detectedObjects:
        # Get the name
        print('Detected Objects:' + objs['Name'], end=" ")
        # Get the Confidence
        print('Confidence:' + "{}".format(round(objs['Confidence'])) + '%')
        print("-")
    return "Great we were able to identify objects in the image"

def main():
    # For a local run, pass an empty event and no context; the handler uses neither.
    test = lambda_handler({}, None)

if __name__ == "__main__":
    main()
In the above code, we used the "MinConfidence" parameter to restrict our results to labels with a confidence of at least 89%. Below is the result:
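If you would rather work with the labels than just print them, the 'Labels' list in the response can be turned into a compact summary. This is only a sketch: summarize_labels is a hypothetical helper, and the sample data is made up to mimic the shape of a detect_labels response.

```python
def summarize_labels(response):
    """Return (name, rounded confidence) pairs, highest confidence first."""
    labels = sorted(
        response["Labels"], key=lambda label: label["Confidence"], reverse=True)
    return [(label["Name"], round(label["Confidence"])) for label in labels]


# Made-up sample in the shape of a detect_labels response (values are illustrative).
sample_response = {
    "Labels": [
        {"Name": "Dog", "Confidence": 97.6},
        {"Name": "Animal", "Confidence": 99.4},
        {"Name": "Canine", "Confidence": 91.3},
    ]
}

for name, confidence in summarize_labels(sample_response):
    print("{}: {}%".format(name, confidence))
```

Sorting by confidence puts the labels Rekognition is most sure about at the top of the list.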
Conclusion
In this tutorial, you learnt how to combine AWS Lambda, Amazon Rekognition, Python and an S3 bucket to identify objects in an image as well as to find text in an image.
Image recognition and image-to-text conversion are used to automate a lot of mundane tasks. For instance, if you have a lot of images to analyze, automation like this can save a lot of time and also helps avoid human errors.
Congratulations on taking your first step towards automation!
NOTE: Be sure to delete the S3 bucket and the AWS Lambda functions after finishing this tutorial to avoid any additional charges. If you are new to AWS, the snapshots below can help you with the deletion procedure.
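The cleanup can also be scripted. The sketch below is a rough outline under a few assumptions: the bucket holds only the two images uploaded in this tutorial, the bucket is not versioned, and your credentials are allowed to delete these resources. The helper name clean_up is made up. Only the first function's name is given in this tutorial, so pass the name you chose for the second one yourself.

```python
def clean_up(s3_client, lambda_client, bucket="imagetotext567",
             keys=("passport.jpg", "Dog.jpg"),
             function_names=("ImageTotext",)):
    """Delete the uploaded images, the bucket, and the Lambda functions."""
    # A bucket must be empty before it can be deleted.
    for key in keys:
        s3_client.delete_object(Bucket=bucket, Key=key)
    s3_client.delete_bucket(Bucket=bucket)
    for name in function_names:
        lambda_client.delete_function(FunctionName=name)


# Usage (requires AWS credentials with delete permissions):
#   import boto3
#   clean_up(boto3.client("s3"), boto3.client("lambda"))
```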