Microsoft has released a new public tool called Microsoft Concept Graph, which is built on Probase. Probase provides 5.4 million concepts, far more than other knowledge bases such as Cyc, which provides 120,000 concepts.
The main aim of this networked knowledge is to support text analysis by combining interpretations with probabilities, in the same way that humans perform this task through a rapid process of elimination.
Human understanding of the world rests on a vast number of concepts, and this technology tries to capture them. Microsoft Concept Graph alone contains more than 5.4 million concepts. The graph shows the distribution of these concepts: the X-axis shows the 5.4 million concepts grouped by size, and the Y-axis shows the number of instances each concept contains. Existing knowledge bases contain far fewer concepts, which is not sufficient for modeling the human world.
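To make the figure's axes concrete, the following sketch counts how many distinct instances each concept contains and then groups concepts by that size. It assumes the knowledge is available as (instance, concept) isA pairs; the toy pairs and the helper names `concept_sizes` and `size_distribution` are hypothetical and are not part of any Microsoft Concept Graph API.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (instance, concept) isA pairs, not real Concept Graph output.
ISA_PAIRS = [
    ("apple", "fruit"), ("banana", "fruit"), ("cherry", "fruit"),
    ("apple", "company"), ("microsoft", "company"),
    ("rat", "animal"), ("cat", "animal"), ("dog", "animal"), ("horse", "animal"),
    ("python", "programming language"), ("seattle", "city"),
]

def concept_sizes(pairs):
    """Return the number of distinct instances each concept contains."""
    members = defaultdict(set)
    for instance, concept in pairs:
        members[concept].add(instance)
    return {concept: len(instances) for concept, instances in members.items()}

def size_distribution(sizes):
    """Group concepts by size: map each size to how many concepts have that size."""
    return dict(sorted(Counter(sizes.values()).items()))

sizes = concept_sizes(ISA_PAIRS)
print(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))
print(size_distribution(sizes))
```

In the real graph, a plot of this distribution would be dominated by many small concepts and a few very large ones, which is why grouping concepts by size makes the figure readable.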
Consider an example from natural language processing. Human beings have no trouble understanding a phrase such as “animals other than cats such as rats”, but machine parsing may produce two possible readings: “rats are animals” or “rats are cats”.
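One way a system could choose between the two readings is to consult knowledge it has already extracted from unambiguous sentences. The sketch below assumes that heuristic and uses invented evidence counts; its regular expression handles only this one “X other than Y such as Z” pattern, and the names `EVIDENCE`, `candidate_readings`, and `resolve` are hypothetical.

```python
import re

# Hypothetical evidence: how often each (instance, concept) isA pair was seen in
# other, unambiguous sentences. The counts are invented for illustration.
EVIDENCE = {("rats", "animals"): 120, ("rats", "cats"): 0}

def candidate_readings(phrase):
    """Return candidate isA pairs for '<A> other than <B> such as <C>'."""
    match = re.fullmatch(r"(\w+) other than (\w+) such as (\w+)", phrase)
    if match is None:
        return []
    a, b, c = match.groups()
    # The instance C could attach to either noun phrase before "such as".
    return [(c, a), (c, b)]

def resolve(phrase, evidence):
    """Pick the reading best supported by existing knowledge (ties go to the first)."""
    readings = candidate_readings(phrase)
    if not readings:
        return None
    return max(readings, key=lambda pair: evidence.get(pair, 0))

print(candidate_readings("animals other than cats such as rats"))
print(resolve("animals other than cats such as rats", EVIDENCE))
```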
A conceptualization model aims to map entities in text to the correct concept categories, each with some probability, and this mapping may depend on the context in which the entities appear.
For example, “Apple” could be automatically mapped to “Fruit”, “Company”, “Manufacturer”, and so on, each with its own probability. This model provides computers with a common-sense computing capability and makes machines “aware” of the mental world of human beings, which helps them understand human communication in text.
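The mapping can be sketched with simple counting, assuming the graph exposes co-occurrence counts n(e, c) for each instance e and concept c. A single entity is conceptualized as P(c | e) = n(e, c) / (sum over c′ of n(e, c′)). To show how context can shift the result, the sketch swaps in a plain naive-Bayes combination, P(c) · Π P(eᵢ | c), which is an illustrative assumption rather than the released model. All counts and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical isA co-occurrence counts n(instance, concept), invented for illustration.
COUNTS = {
    ("apple", "fruit"): 60,
    ("apple", "company"): 30,
    ("apple", "manufacturer"): 10,
    ("banana", "fruit"): 40,
    ("microsoft", "company"): 70,
}

def concept_given_instance(instance, counts=COUNTS):
    """P(c | e) = n(e, c) / sum over c' of n(e, c')."""
    related = {c: n for (e, c), n in counts.items() if e == instance}
    total = sum(related.values())
    return {c: n / total for c, n in related.items()} if total else {}

def conceptualize(instances, counts=COUNTS, eps=1e-6):
    """Naive-Bayes style score P(c) * prod_i P(e_i | c), normalized to sum to 1.

    eps stands in for unseen (instance, concept) pairs, so one missing pair
    lowers a concept's score instead of zeroing it out entirely.
    """
    concept_totals = defaultdict(int)
    for (_, c), n in counts.items():
        concept_totals[c] += n
    grand_total = sum(concept_totals.values())
    scores = {}
    for c, n_c in concept_totals.items():
        score = n_c / grand_total  # prior P(c)
        for e in instances:
            p = counts.get((e, c), 0) / n_c  # P(e | c)
            score *= p if p > 0 else eps
        scores[c] = score
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

def top_concept(distribution):
    return max(distribution.items(), key=lambda kv: kv[1])[0]

print(concept_given_instance("apple"))
print(top_concept(conceptualize(["apple", "microsoft"])))
print(top_concept(conceptualize(["apple", "banana"])))
```

With these toy counts, “apple” next to “microsoft” favors Company, while “apple” next to “banana” favors Fruit, mirroring the context dependence described above.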
Conceptualization maps instances or short texts into a large, automatically constructed concept space, which is also a vector space that supports human-level concept reasoning. It can therefore be regarded as a text embedding that is understandable by both humans and machines. It also supports text concept tagging and short-text similarity computation, among other tasks that aid text understanding. This can benefit many text processing applications, including search engines, online marketing, automatic question answering, recommendation systems, and artificial intelligence systems.
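As a minimal sketch of the vector-space view, assume a short text is placed in concept space by summing the concept distributions P(c | e) of the words it contains, and that two texts are compared with cosine similarity. The probability table, the whitespace tokenizer, and the helper names below are hypothetical.

```python
import math
from collections import Counter

# Hypothetical P(concept | instance) tables; a real system would query the concept graph.
P_CONCEPT = {
    "apple": {"fruit": 0.6, "company": 0.3, "manufacturer": 0.1},
    "banana": {"fruit": 1.0},
    "microsoft": {"company": 0.8, "manufacturer": 0.2},
    "orange": {"fruit": 0.7, "color": 0.3},
}

def concept_vector(text):
    """Sum P(c | e) over known words to get a sparse concept-space vector."""
    vec = Counter()
    for word in text.lower().split():
        for concept, p in P_CONCEPT.get(word, {}).items():
            vec[concept] += p
    return vec

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

a = concept_vector("apple and banana")
b = concept_vector("orange")
c = concept_vector("microsoft")
print(cosine(a, b), cosine(a, c))
```

Here the fruit-heavy text scores much closer to “orange” than to “microsoft”, even though neither pair of texts shares a word, which is the kind of comparison a concept-level embedding makes possible.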
The currently released version can rank the relevant concepts for any text entry. Microsoft’s basic-level conceptualization is provided to rank the most appropriate categories first, alongside other measures such as MI, PMI, PMI^k, and Typicality.
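The ranking measures can be illustrated from raw counts. With n(e, c) the count for instance e and concept c and N the total count, PMI(e, c) = log P(e, c) / (P(e) P(c)), PMI^k(e, c) = log P(e, c)^k / (P(e) P(c)), and typicality can be read in either direction as P(c | e) or P(e | c). For the basic-level score, this sketch assumes an unsmoothed form P(e | c) · P(c | e); that form is an assumption, not the released implementation. MI is left out because it is defined over whole distributions rather than a single pair. All counts and names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical isA counts n(e, c), invented for illustration.
COUNTS = {
    ("apple", "fruit"): 60, ("apple", "company"): 30,
    ("apple", "manufacturer"): 10, ("apple", "pome fruit"): 5,
    ("banana", "fruit"): 40, ("microsoft", "company"): 70,
    ("microsoft", "manufacturer"): 20, ("sony", "manufacturer"): 170,
}

def scores(e, counts=COUNTS, k=2):
    """Score every concept observed with instance e under several measures."""
    n_e, n_c = defaultdict(int), defaultdict(int)
    for (inst, con), n in counts.items():
        n_e[inst] += n
        n_c[con] += n
    total = sum(counts.values())
    out = {}
    for (inst, c), n in counts.items():
        if inst != e:
            continue
        p_ec, p_e, p_c = n / total, n_e[e] / total, n_c[c] / total
        p_c_given_e, p_e_given_c = n / n_e[e], n / n_c[c]
        out[c] = {
            "PMI": math.log(p_ec / (p_e * p_c)),
            "PMI^k": math.log(p_ec ** k / (p_e * p_c)),
            "P(c|e)": p_c_given_e,  # typicality of concept c for instance e
            "P(e|c)": p_e_given_c,  # typicality of instance e within concept c
            "BLC": p_c_given_e * p_e_given_c,  # assumed unsmoothed basic-level score
        }
    return out

def rank(e, measure, counts=COUNTS):
    """Return e's concepts sorted from highest to lowest score under `measure`."""
    s = scores(e, counts)
    return sorted(s, key=lambda c: s[c][measure], reverse=True)

for measure in ("PMI", "P(c|e)", "BLC"):
    print(measure, rank("apple", measure))
```

With these counts, PMI puts the small, highly specific concept “pome fruit” first, while P(c | e) and the basic-level score put “fruit” first. The two product-style scores are closely related: P(e | c) · P(c | e) = P(e, c)² / (P(e) P(c)), so the log of the assumed basic-level score equals PMI^2.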