
Google Image Captioning Model Available

Geneva Clark
What's New
24th Sep, 2016

Yesterday, Google announced that it has open-sourced “Show and Tell”, a model for automatically generating captions for images.

Almost everyone in our generation is obsessed with Instagram, yet many still agonize over which photo to post and what caption to give it. Fortunately, that problem may now be solved for good: anyone can use the image captioning model, built in TensorFlow, to caption their photos.
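For readers who want to try it, the release includes a run_inference script in the im2txt directory of the tensorflow/models repository. The condensed sketch below is based on that script; the checkpoint, vocabulary, and image paths are placeholders you must supply yourself (the release expects you to train the model first), and the exact module layout may differ in the version you check out.

```python
import math
import tensorflow as tf

# These modules ship with the open-sourced model under
# tensorflow/models (the "im2txt" directory).
from im2txt import configuration
from im2txt import inference_wrapper
from im2txt.inference_utils import caption_generator
from im2txt.inference_utils import vocabulary

# Placeholder paths: a trained checkpoint, the vocabulary file,
# and the photo you want captioned.
CHECKPOINT_PATH = "/path/to/model.ckpt"
VOCAB_FILE = "/path/to/word_counts.txt"
IMAGE_FILE = "/path/to/photo.jpg"

# Build the inference graph and a function that restores the checkpoint.
g = tf.Graph()
with g.as_default():
    model = inference_wrapper.InferenceWrapper()
    restore_fn = model.build_graph_from_config(
        configuration.ModelConfig(), CHECKPOINT_PATH)
g.finalize()

vocab = vocabulary.Vocabulary(VOCAB_FILE)

with tf.Session(graph=g) as sess:
    restore_fn(sess)
    generator = caption_generator.CaptionGenerator(model, vocab)
    with tf.gfile.GFile(IMAGE_FILE, "rb") as f:
        image = f.read()
    # Beam search returns several candidate captions, best first.
    for caption in generator.beam_search(sess, image):
        # Strip the start and end tokens before printing.
        words = [vocab.id_to_word(w) for w in caption.sentence[1:-1]]
        print(" ".join(words), "(p=%f)" % math.exp(caption.logprob))
```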

Google published a paper on the model in 2014 and released a newer, more accurate version in 2015. The model is now available on GitHub under the open-source Apache license.

Google’s “Show and Tell” reaches 93.9 percent accuracy, up from the 89.6 to 91.8 percent range of its previous versions. Even a small change in accuracy can have a large impact on usability.

Achieving this accuracy was difficult because both the vision and the language components must understand the picture. The team trained both components on captions created by real people, which keeps the system from simply naming the objects in a frame; instead, it produces a descriptive, meaningful sentence about the image.
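To make the idea concrete, here is a minimal, illustrative sketch of the encoder-decoder pattern behind “Show and Tell”: a CNN feature vector for the image is fed to an LSTM as if it were the first word, and the LSTM then predicts the caption one word at a time. This is not Google’s actual code, and all layer sizes below are made-up placeholders.

```python
from tensorflow.keras import layers, Model

# Illustrative placeholder sizes, not Google's settings.
VOCAB_SIZE = 10000   # caption vocabulary
EMBED_DIM = 512      # word / image embedding size
LSTM_UNITS = 512
FEATURE_DIM = 2048   # e.g. the output of an Inception-style CNN
MAX_LEN = 20         # longest caption we will model

# Vision side: a precomputed CNN feature vector for the photo,
# projected into the same space as the word embeddings.
image_features = layers.Input(shape=(FEATURE_DIM,), name="image")
image_embedding = layers.Dense(EMBED_DIM, activation="relu")(image_features)
image_embedding = layers.RepeatVector(1)(image_embedding)  # (1, EMBED_DIM)

# Language side: the caption generated so far, as word ids.
caption_so_far = layers.Input(shape=(MAX_LEN,), name="caption")
word_embeddings = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(caption_so_far)

# Prepend the image embedding as the LSTM's "first word", then let
# the LSTM read the partial caption and predict the next word.
sequence = layers.Concatenate(axis=1)([image_embedding, word_embeddings])
hidden = layers.LSTM(LSTM_UNITS)(sequence)
next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(hidden)

model = Model([image_features, caption_so_far], next_word)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```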

An accurate model must also take into account how the objects in a scene relate to one another. For example, in the picture below, it is the man flying the kite, not just a man standing with a kite above him.

[Image: Google Image Captioning Model. Picture Credit: Google Research Blog]

The second image shows some of the patterns the model learns; these patterns are combined to create original captions for previously unseen images.

[Image: Google Image Captioning Model. Picture Credit: Google Research Blog]
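The generation step itself is easy to picture: start from the image and repeatedly ask the language model for the most likely next word until it emits an end-of-sentence token. The hypothetical helper below (it assumes the `model` and placeholder sizes from the earlier sketch, plus made-up `start_id`/`end_id` tokens) shows this greedy version; the actual release uses beam search, which keeps several candidate sentences in play rather than committing to one word at a time.

```python
import numpy as np

def greedy_caption(model, image_features, id_to_word, start_id, end_id,
                   max_len=20):
    """Illustrative greedy decoding loop for the sketch model above.

    `image_features` is a (1, FEATURE_DIM) array; `id_to_word` maps
    word ids back to strings. All names here are assumptions, not
    part of Google's released API.
    """
    caption = [start_id]
    for _ in range(max_len - 1):
        # Pad the partial caption out to the model's fixed input length.
        padded = np.zeros((1, max_len), dtype="int32")
        padded[0, :len(caption)] = caption
        probs = model.predict([image_features, padded], verbose=0)[0]
        next_id = int(np.argmax(probs))  # greedily take the top word
        if next_id == end_id:
            break
        caption.append(next_id)
    return " ".join(id_to_word[i] for i in caption[1:])
```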

The model’s ability to bridge these gaps and connect objects with context means the technology will also be helpful for scene recognition, where a computer vision system needs to tell different scenes apart.

Read more about this on the Google Research Blog.


Geneva Clark

Blog Author
Geneva specializes in back-end web development and has always been fascinated by the dynamic part of the web. Talk to her about modern web applications, and she will happily nerd out on all things Ruby on Rails.
