The aim of this short summary is to provide an overview of the tasks that an AI system is, should be, and in the near future will be capable of executing.
In particular, we focus on classifying tasks relevant to NLP.
When testing existing algorithms it is of great importance to have a clear interpretation of the results.
This can be achieved by setting up specific learning environments in such a way that the complexity of the task is always under control.
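A minimal sketch of such a controlled environment follows. It assumes a toy world of people moving between rooms; the names, rooms, and the `num_distractors` parameter are hypothetical illustrations, not taken from any specific benchmark. The generator emits exactly one relevant (supporting) fact per question plus a configurable number of irrelevant ones, so task difficulty can be raised one step at a time.

```python
import random

# Hypothetical toy world: all names and locations are illustrative only.
PEOPLE = ["Mary", "John", "Sandra", "Daniel"]
PLACES = ["kitchen", "garden", "office", "hallway"]


def generate_story(num_distractors, seed=None):
    """Return (facts, question, answer) for a 'Where is X?' task.

    The story contains exactly one fact about the queried person (the
    supporting fact) plus `num_distractors` facts about other people, so
    the difficulty is controlled by a single parameter.
    """
    rng = random.Random(seed)
    target = rng.choice(PEOPLE)
    answer = rng.choice(PLACES)
    others = [p for p in PEOPLE if p != target]
    facts = [f"{rng.choice(others)} went to the {rng.choice(PLACES)}."
             for _ in range(num_distractors)]
    # Place the supporting fact at a random position among the distractors.
    facts.insert(rng.randint(0, len(facts)), f"{target} went to the {answer}.")
    return facts, f"Where is {target}?", answer


facts, question, answer = generate_story(num_distractors=3, seed=0)
print("\n".join(facts))
print(question, "->", answer)
```

Because the distractors never mention the queried person, the correct answer is always recoverable from a single fact, while the amount of irrelevant material grows with `num_distractors`.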
More generally, these state-of-the-art problems determine the next development steps for Musio.
Typical task categories include common sense, coreference, compound coreference, conjunction, deduction, induction, argument relations, counting, negation, indefinite knowledge, time reasoning, positional and size reasoning, path finding, and motivational reasoning.
Let us start by mentioning several existing AI tests.
The Allen Institute for AI provides a test of general knowledge, called Aristo, in the form of science exams for 4th-, 8th- and 12th-grade students.
These exams consist of multiple-choice questions, and only recently did the best algorithms reach a score of 0.60 on the 8th-grade exam.
A more standardized test is the Winograd Schema Challenge, in which a sentence containing an ambiguous coreference has to be completed with one of two possible words.
Both of these tests require general knowledge about the world that is not provided in the test itself.
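To make the Winograd format concrete, the sketch below encodes the well-known trophy/suitcase schema. The `WinogradSchema` class and its field names are hypothetical, chosen only for illustration. It shows the property that makes the test hard: swapping a single word flips which of the two candidates the pronoun refers to, so surface word statistics alone are unlikely to settle the answer; world knowledge about sizes is needed.

```python
from dataclasses import dataclass


@dataclass
class WinogradSchema:
    template: str      # sentence with a {word} slot and an ambiguous pronoun
    pronoun: str
    candidates: tuple  # the two possible referents of the pronoun
    answers: dict      # maps each of the two words to the correct referent

    def instances(self):
        """Yield (sentence, correct_referent) for both word choices."""
        for word, referent in self.answers.items():
            yield self.template.format(word=word), referent


trophy = WinogradSchema(
    template="The trophy doesn't fit into the brown suitcase because it is too {word}.",
    pronoun="it",
    candidates=("the trophy", "the suitcase"),
    answers={"large": "the trophy", "small": "the suitcase"},
)

for sentence, referent in trophy.instances():
    print(f"{sentence}  ->  '{trophy.pronoun}' = {referent}")
```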
In contrast, the following tests contain the background knowledge required to answer specific questions.
The MCTest provides 660 stories with associated multiple-choice questions that require different kinds of reasoning.
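As an illustration of how a simple non-learning system can attempt such multiple-choice questions, here is a word-overlap scorer loosely modeled on the sliding-window baseline described with MCTest. The tokenization, the weighting `log(1 + 1/count)`, and the example story are simplifying assumptions for this sketch, not the official baseline code: for each answer choice, a window as large as the set of question-plus-answer words slides over the story, and rarer matching words contribute more to the score.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def window_score(story_tokens, target_words, counts):
    """Best inverse-count-weighted overlap between any story window and target_words."""
    size = len(target_words)
    best = 0.0
    for start in range(max(1, len(story_tokens) - size + 1)):
        window = story_tokens[start:start + size]
        # Rare story words count more: weight = log(1 + 1 / count).
        score = sum(math.log(1 + 1 / counts[w]) for w in window if w in target_words)
        best = max(best, score)
    return best


def pick_answer(story, question, choices):
    tokens = tokenize(story)
    counts = Counter(tokens)
    q_words = set(tokenize(question))
    scores = [window_score(tokens, q_words | set(tokenize(c)), counts) for c in choices]
    return choices[scores.index(max(scores))]


story = "James went to the store. He bought a red apple and some milk. Then he walked home."
print(pick_answer(story, "What did James buy at the store?",
                  ["a red apple", "a blue car", "a sandwich", "a book"]))
```

On this made-up story the scorer prints `a red apple`. Such a baseline fails whenever the answer requires combining facts that are far apart, which is exactly the kind of reasoning these tests are meant to probe.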
More specific is the Children’s Book Test (CBT), which measures language modeling in a wider linguistic context by means of the sentence completion task mentioned above.
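In the CBT, 20 consecutive sentences from a book serve as context, a word is removed from the 21st sentence, and the model must pick it from 10 candidates. The sketch below builds a question in that spirit from an arbitrary list of sentences; `make_cloze` is a hypothetical helper, and drawing distractors from any context word is a simplification (the real CBT uses candidates of the same word type as the answer).

```python
import random
import re


def make_cloze(sentences, answer, num_candidates=10, seed=0):
    """Build a CBT-style cloze question (simplified sketch).

    All sentences but the last form the context; the first occurrence of
    `answer` in the last sentence is replaced by 'XXXXX'. Distractors are
    sampled from context words, whereas the real CBT draws them from words
    of the same type as the answer.
    """
    *context, query = sentences
    if answer not in re.findall(r"\w+", query):
        raise ValueError("answer must occur as a word in the last sentence")
    query = re.sub(rf"\b{re.escape(answer)}\b", "XXXXX", query, count=1)
    pool = sorted({w for s in context for w in re.findall(r"\w+", s)} - {answer})
    rng = random.Random(seed)
    candidates = rng.sample(pool, min(num_candidates - 1, len(pool))) + [answer]
    rng.shuffle(candidates)
    return context, query, candidates


context, query, candidates = make_cloze(
    ["The cat sat on the mat.", "The dog barked at the cat.", "Then the cat ran up the tree."],
    answer="tree",
)
print(query)       # Then the cat ran up the XXXXX.
print(candidates)
```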
Closely related to real-world tasks are the following tests.
The CNN QA test is based on news articles whose abstractive bullet-point summaries have been rephrased as questions.
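In the CNN data, a summary point becomes a cloze query by replacing one entity with a placeholder, and all entities are replaced by abstract markers such as `@entity0`, so that a model has to read the article rather than rely on prior knowledge about named entities. The sketch below assumes the entity list comes from an upstream tagger (not shown); `anonymize` and the example sentences are made up for illustration. The original dataset additionally permutes the marker ids per example, which this sketch omits.

```python
def anonymize(document, summary_point, entities, answer):
    """Turn a summary bullet into a cloze query with anonymized entities.

    `entities` is assumed to come from an upstream entity tagger (not shown).
    Each entity becomes an @entityN marker; in the query, the answer entity
    becomes @placeholder instead.
    """
    ids = {e: f"@entity{i}" for i, e in enumerate(entities)}

    def replace_entities(text):
        # Longer names first, so a name is not partially replaced by a
        # shorter entity it contains.
        for e in sorted(entities, key=len, reverse=True):
            text = text.replace(e, ids[e])
        return text

    query = replace_entities(summary_point.replace(answer, "@placeholder"))
    return replace_entities(document), query, ids[answer]


doc, query, target = anonymize(
    "Acme Corp hired Jane Doe as chief executive in Springfield.",
    "Jane Doe becomes the new head of Acme Corp",
    entities=["Acme Corp", "Jane Doe", "Springfield"],
    answer="Jane Doe",
)
print(doc)     # @entity0 hired @entity1 as chief executive in @entity2.
print(query)   # @placeholder becomes the new head of @entity0
print(target)  # @entity1
```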
Focusing on movies, the Movie Dialog dataset covers both factoid question answering and recommendations.
Although it is limited to a single topic, this dataset is better suited to training and building dialog systems than the tests above.
A different approach to overcoming the general scarcity of large datasets is based on simulating the needed data.
Clearly, this does not capture the full complexity of natural language; however, it makes it possible to specify certain classes of self-contained tasks.
To name just a few: supporting-fact questions require distinguishing relevant from irrelevant given facts, deduction and induction require a particular kind of logical reasoning, and positional and size reasoning test the ability to reason about the relative positions and sizes of objects.
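To show what "a certain logic reasoning" means for such simulated tasks, here is a rule-based sketch of a basic deduction task and a basic induction task, with sentence patterns in the style of simulated toy tasks. The regular-expression patterns, the `SINGULAR` map, and the example facts are assumptions made for this illustration; a learning system would have to discover these rules from data rather than have them hard-coded.

```python
import re

# Hypothetical plural-to-singular map; real data would need a proper lemmatizer.
SINGULAR = {"mice": "mouse", "wolves": "wolf", "sheep": "sheep", "cats": "cat"}


def deduce(facts, name):
    """Deduction: 'Mice are afraid of cats.' + 'Jessica is a mouse.' -> 'cats'."""
    fears, kind_of = {}, {}
    for fact in facts:
        if m := re.fullmatch(r"(\w+) are afraid of (\w+)\.", fact):
            fears[SINGULAR[m[1].lower()]] = m[2]
        elif m := re.fullmatch(r"(\w+) is an? (\w+)\.", fact):
            kind_of[m[1]] = m[2]
    return fears[kind_of[name]]


def induce(facts, name):
    """Induction: generalize a colour from known members of the same kind."""
    kind_of, colour_of = {}, {}
    for fact in facts:
        if m := re.fullmatch(r"(\w+) is an? (\w+)\.", fact):
            kind_of[m[1]] = m[2]
        elif m := re.fullmatch(r"(\w+) is (\w+)\.", fact):
            colour_of[m[1]] = m[2]
    # Induce a kind -> colour rule from every individual whose colour was stated.
    kind_colour = {kind_of[n]: c for n, c in colour_of.items() if n in kind_of}
    return kind_colour[kind_of[name]]


print(deduce(["Mice are afraid of cats.", "Sheep are afraid of wolves.",
              "Jessica is a mouse.", "Gertrude is a sheep."], "Gertrude"))
print(induce(["Lily is a swan.", "Lily is white.", "Bernhard is a swan."], "Bernhard"))
```

The two functions differ in the direction of inference: deduction applies a stated general rule to an individual, while induction first builds the general rule from individual observations.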
Among the hardest tasks is path finding, where the route between two points A and B has to be derived from relative descriptions, for example with respect to a third point C.
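Once the relative descriptions are parsed, path finding reduces to a graph search; the difficulty for a learning system lies in discovering this procedure from text. The sketch below assumes sentences of the hypothetical form "The X is north of the Y." and uses breadth-first search; the point C from the text corresponds to an intermediate node the route has to pass through.

```python
import re
from collections import deque

OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}


def build_graph(facts):
    """Parse 'The X is <dir> of the Y.' into a graph with edges in both directions."""
    graph = {}
    for fact in facts:
        m = re.fullmatch(r"The (\w+) is (north|south|east|west) of the (\w+)\.", fact)
        if m:
            a, d, b = m.groups()
            # From Y, go <d> to reach X; from X, go the opposite way to reach Y.
            graph.setdefault(b, []).append((d, a))
            graph.setdefault(a, []).append((OPPOSITE[d], b))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search returning the shortest list of directions, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for direction, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [direction]))
    return None


graph = build_graph(["The kitchen is north of the hallway.",
                     "The garden is east of the kitchen.",
                     "The office is west of the hallway."])
print(find_path(graph, "office", "garden"))  # ['east', 'north', 'east']
```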
Tasks that require some form of long-term memory are especially hard.
Algorithms that have been tested on these tasks include N-gram models, structured support vector machines, and certain types of recurrent neural networks.
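As a reference point for the simplest of these model families, here is a bigram language model with add-one (Laplace) smoothing applied to a sentence-completion question. The bigram order, the smoothing method, the `XXXXX` slot convention, and the tiny corpus are illustrative choices for this sketch; the text does not specify which N-gram variants were tested.

```python
import math
from collections import Counter


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.unigrams, self.bigrams = Counter(), Counter()
        for s in sentences:
            tokens = ["<s>"] + s.lower().split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def log_prob(self, sentence):
        """Sum of smoothed log P(w_i | w_{i-1}) over the sentence."""
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.unigrams[a] + self.vocab_size))
            for a, b in zip(tokens, tokens[1:])
        )

    def complete(self, template, candidates):
        """Fill the XXXXX slot with the candidate giving the highest log-probability."""
        return max(candidates, key=lambda c: self.log_prob(template.replace("XXXXX", c)))


lm = BigramLM(["the cat sat on the mat", "the dog sat on the rug", "a cat chased the dog"])
print(lm.complete("the cat sat on the XXXXX", ["mat", "sky"]))  # mat
```

A model like this only sees one word of context, which illustrates why tasks needing long-term memory defeat it.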
Memory networks with neural components are also very promising.
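The core of a memory network with neural components is a soft-attention read over stored facts: each memory is scored against the question, the scores are normalized with a softmax, and the output is the weighted sum of the memories. The sketch below shows one such read step under a strong simplification: bag-of-words counts stand in for the learned embeddings a real model would train, and there is only a single hop. All function names are hypothetical.

```python
import math
from collections import Counter


def bow(sentence):
    """Bag-of-words vector as a word -> count mapping (stand-in for a learned embedding)."""
    return Counter(sentence.lower().strip(".?").split())


def dot(u, v):
    return sum(u[w] * v[w] for w in u)


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(memories, question):
    """One soft-attention hop: weight each memory by its match with the question."""
    u = bow(question)
    weights = softmax([dot(u, bow(m)) for m in memories])
    # Output o = sum_i p_i * m_i, accumulated here over bag-of-words counts.
    output = Counter()
    for p, m in zip(weights, memories):
        for word, count in bow(m).items():
            output[word] += p * count
    return weights, output


memories = ["Mary went to the garden.", "John went to the kitchen.", "Sandra picked up the milk."]
weights, output = memory_read(memories, "Where is John?")
print([round(w, 3) for w in weights])
```

Here the second memory receives the largest weight because it shares the word "john" with the question; in a trained model, learned embeddings and multiple hops allow matches that go beyond literal word overlap.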
With regard to Musio’s focus on emotional and sentimental reasoning, tasks beyond logical reasoning should also be considered.
“Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks” (PDF). December 2015. Retrieved February 23, 2016.
“Task generation for testing text understanding and reasoning” (Git repository). Retrieved February 23, 2016.
“Artificial Tasks for Artificial Intelligence” (PDF). May 2015. Retrieved February 23, 2016.
“The Goldilocks Principle: Reading Children’s Books with Explicit Memory Representations” (PDF). January 2016. Retrieved February 23, 2016.
“Teaching Machines to Read and Comprehend” (PDF). November 2015. Retrieved February 23, 2016.
“Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems” (PDF). January 2016. Retrieved February 23, 2016.