Communication in a human language involves a complex interplay of phrases, subtle nuances of words and the context in which each word is uttered. Most of this process is intuitive to us, yet it is extremely complicated to translate its logic into machine-understandable software for further processing. In this Logical Progression Engine we have implemented a natural language processing technique to paraphrase the apparent meaning of a given input.
Natural Language Selection Process - Concept:
The key to the Natural Language Selection Process is careful analysis of the semantics, combined with statistical probability analysis of how well the resulting set of phrases conveys a particular idea or concept. Athena, our Logical Progression Engine, seeks to tackle the problem of natural language communication using the following core modules, which are interdependent:
1. Morphological Analyzer
2. Statistical Probability Analyzer
3. Logical Progression of Ideas or Concepts
Content selection is a domain-dependent task that needs to be individually tailored to suit specific goals. From the user's perspective, it is the selection of relevant content that is of primary importance. One of the prerequisites for designing an appropriate content selection framework is a vast knowledge base containing in-depth information that can be used in a page.
A knowledge base contains knowledge about a specified topic - data on the subject and the relationships among its individual elements. It is not simply a matter of spewing forth a stream of unrelated data triggered by a stimulus. The algorithm behind the Natural Language Selection (NLS) process uses an inference mechanism to deduce what is expected of it from the supplied input and from a pattern established by statistical analysis. This is akin to an expert knowledge system, where a domain-specific knowledge base is combined with an inference engine that processes the knowledge encoded in the knowledge base in order to respond to a user's request for advice.
Designing such an inference engine involves overcoming specific constraints for selecting text for each piece of information. This helps in developing a morphological analyzer that can be included in the natural language selection system, which will identify all information that is potentially relevant.
This information can be either filtered or augmented by later stages in the pipeline.
The first stage of the process is to identify pieces of the input that match items in the database verbatim. These items are easy to identify and can be included in the output data. The rest of the input, in text or semantic form, has to be verbalized. This necessitates the use of a statistical selection technique.
Athena decides on the core set of keywords in two ways. It makes judicious use of a 'stop word' list of words or phrases that would otherwise dilute the quality of theme selection for the given text, and it applies statistical probability analysis to the implied semantics.
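The first stage described above can be sketched as follows. This is only an illustration, not Athena's actual code: the function name, the sample database entries and the case-insensitive, word-by-word comparison are all assumptions.

```python
def split_verbatim_matches(user_input, known_items):
    """Separate input words that match a database item from the rest."""
    # Assumption: matching is case-insensitive and done word by word.
    lookup = {item.lower(): item for item in known_items}
    matched, remaining = [], []
    for token in user_input.split():
        key = token.lower()
        if key in lookup:
            matched.append(lookup[key])  # goes straight to the output data
        else:
            remaining.append(token)      # left for the statistical stage
    return matched, remaining


# Hypothetical database entries, for illustration only.
KNOWN_ITEMS = ["Anaemia", "Haemoglobin", "Pleurisy"]

matched, remaining = split_verbatim_matches("anaemia and low haemoglobin", KNOWN_ITEMS)
print(matched)    # ['Anaemia', 'Haemoglobin']
print(remaining)  # ['and', 'low']
```

The words left in the second list are the part of the input that still has to be handled by the statistical selection stage.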
Some of the 'stop' words Athena uses:
a
the
of
on
in
rare
form
this
that
which
what
is
was
it
general
as
an
at
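A minimal sketch of stop-word filtering with the list above. Plain word-frequency counting stands in here for the statistical probability analysis, whose details are not given in this article; the function name and the sample text are assumptions.

```python
from collections import Counter

# The stop words listed above.
STOP_WORDS = {
    "a", "the", "of", "on", "in", "rare", "form", "this", "that",
    "which", "what", "is", "was", "it", "general", "as", "an", "at",
}


def core_keywords(text, top_n=3):
    """Drop stop words, then rank the remaining words by frequency."""
    words = [w.strip(".,;:!?'\"()").lower() for w in text.split()]
    content_words = [w for w in words if w and w not in STOP_WORDS]
    return [word for word, _ in Counter(content_words).most_common(top_n)]


sample = (
    "What is anaemia? Anaemia is a condition in which the blood is low "
    "in red cells. In anaemia the blood carries less oxygen."
)
print(core_keywords(sample, top_n=2))  # ['anaemia', 'blood']
```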
Sentence formation is the next step: generating complete English sentences from a semantic representation of the theme chosen for the given set of words. Athena then searches its knowledge base for information relevant to the selected set of phrases and writes a web page using a customizable template. A built-in lexicon of over a quarter of a million English words makes framing the rule set for natural language selection a manageable task for Athena.
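Template-driven page writing might look like the sketch below, which uses Python's standard string.Template. The template markup, the function name and the sample section are hypothetical; Athena's own template format is not described here.

```python
from string import Template

# Hypothetical page template; $title and $sections are filled in per request.
PAGE_TEMPLATE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1>$sections</body></html>"
)


def render_page(title, sections):
    """Fill the template with a title and a list of (heading, text) pairs."""
    body = "".join(f"<h2>{heading}</h2><p>{text}</p>" for heading, text in sections)
    return PAGE_TEMPLATE.substitute(title=title, sections=body)


page = render_page("Anaemia", [("Causes", "Placeholder text from the knowledge base.")])
print(page)
```

Because the layout lives only in the template, changing it changes every generated page at once.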
Athena:
In this beta version, we have amassed a collection of data relevant to health and medicine. Suppose you type sickle into your browser. As there is no page on the server named 'sickle', the server passes the request on to Athena (the code name of the Logical Progression Engine). Athena takes the semantic input and, using logical progression analysis and statistical probabilities derived from the core data, rightly concludes that you are seeking information on 'Sickle Cell Anaemia'.
This results in a page of about 2000 words with the following details:
1. Description of the disease - Sickle Cell Anemia
2. Manifestations of sickle cell Anemia
3. Evaluation of the condition
Athena then follows the logical progression of the semantics from the phrase 'Sickle Cell Anemia' to its natural corollary - Anemia. It adds a section on the causes of Anemia, the symptoms associated with anemia and some general self-help information.
In this page, Athena places about 58 hyperlinks on words and phrases that the user may wish to explore further. We list below the first 10 hyperlinks to show the relevance of the chosen topic and the natural logical progression from one idea to another:
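The progression from a query to a topic and on to its corollary can be pictured as a walk over topic relations. In this sketch the relation table, the 'Blood Disorders' entry and both function names are hypothetical, and simple whole-word matching stands in for the statistical resolution step.

```python
# Hypothetical knowledge-base fragment: each topic points to its broader topic.
BROADER_TOPIC = {
    "Sickle Cell Anaemia": "Anaemia",
    "Anaemia": "Blood Disorders",
}


def resolve_topic(query, topics):
    """Return the first topic whose name contains the query as a whole word."""
    q = query.lower()
    for topic in topics:
        if q in topic.lower().split():
            return topic
    return None


def progression(topic, broader, max_steps=1):
    """Follow 'broader topic' links to build the ordered list of page sections."""
    chain = [topic]
    while topic in broader and len(chain) <= max_steps:
        topic = broader[topic]
        chain.append(topic)
    return chain


start = resolve_topic("sickle", BROADER_TOPIC)
print(progression(start, BROADER_TOPIC))  # ['Sickle Cell Anaemia', 'Anaemia']
```

The max_steps limit keeps the page from wandering too far from the topic the visitor asked about.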
1. Sickle cell anaemia
2. red blood cells
3. Haemoglobin
4. SEIZURES
5. CONGESTIVE HEART FAILURE
6. PLEURISY
7. FBC
8. anaemia
9. bacterial
10. prognosis
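One way such hyperlinks could be placed automatically is to scan the page text for terms the knowledge base knows about. The sketch below is an assumption about how this might be done, not Athena's implementation; the function name and the example.invalid base URL are placeholders.

```python
import re
from urllib.parse import quote


def link_terms(text, terms, base_url):
    """Wrap each known term found in the text in an HTML hyperlink."""
    # Longest terms first, so 'Sickle cell anaemia' wins over 'anaemia'.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )

    def make_link(match):
        phrase = match.group(0)
        # quote() encodes spaces so the href is a valid URL.
        return f'<a href="{base_url}/{quote(phrase)}">{phrase}</a>'

    return pattern.sub(make_link, text)


terms = ["anaemia", "red blood cells", "Sickle cell anaemia"]
linked = link_terms("Sickle cell anaemia affects red blood cells.", terms,
                    "http://example.invalid")
print(linked)
```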
In our second example we type cell into our browser, which renders a page of about 3000 words.
The word 'cell' connotes a broad array of meanings even within health-related issues. It can mean a red blood cell or a cancerous cell. It can also refer to cell counts, as in 'White Blood Cell Count', or even, by logical extension, to Polycythaemia Vera - a condition of excess blood cell production that is often accompanied by an enlarged spleen.
This time Athena deduces that you wanted to find some general information about cells - in particular about blood cells. So it delivers content starting from Sickle Cell Anemia, full blood count, causes of low blood cell count, diseases associated with blood cells, symptoms and general self-help information.
In our third example we enter: Anaemia
We are presented with a page of about 3000 words containing the following information:
a. Anaemia and its Causes
b. Self-help section for when anemic conditions are noticed
c. Various forms of anemia - including our old friend - Sickle cell anemia
d. Some drugs and their direct effect on the causes of Anemia
The page above includes about 113 hyperlinks, because it also covers the possible interaction of drugs in the treatment of anemia and the implications of long-term use of some drugs that can result in Anemia.
Resolution of some language quirks:
One of the most complex problems for the development of a Natural Language Selection process is the enunciation of rule sets to handle some quirks of the English language. Abbreviations have become an indispensable part of written communication and Athena is equipped to handle abbreviations with aplomb.
Example:
http://www.targetwoman/athena/CT
will result in a page on CT scanning, or COMPUTERIZED AXIAL TOMOGRAPHY, whilst the following request for 'X':
X
will make Athena render a page titled "X-Ray - Chest X-ray" with this snippet on the first line: The chest x-ray is the most frequently performed x-ray.
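At its simplest, abbreviation handling can be thought of as a lookup that runs before topic selection. The table below contains only the two examples above, and the function name is hypothetical; Athena's real rule sets are presumably richer.

```python
# Entries taken from the two examples above; a real table would be far larger.
ABBREVIATIONS = {
    "CT": "Computerized Axial Tomography",
    "X": "X-Ray - Chest X-ray",
}


def expand_abbreviation(request):
    """Return the page title for a known abbreviation, else the request unchanged."""
    return ABBREVIATIONS.get(request.strip().upper(), request)


print(expand_abbreviation("CT"))       # Computerized Axial Tomography
print(expand_abbreviation("x"))        # X-Ray - Chest X-ray
print(expand_abbreviation("Smoking"))  # Smoking
```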
Application of Athena:
As Athena is primarily intended to deliver content over the Internet, the potential for its deployment is enormous.
The key advantages of Athena over other traditional methodologies:
1. Consistent User experience
2. Consistent User interface for potentially thousands of pages with minimum development effort
3. Search-Engine Friendly delivery of pages. All the search engine robots will see a static HTML page instead of a plethora of dynamic pages with stifling parameters
4. Delivery of a page for almost any keyword associated with the chosen topic resulting possibly in thousands of search engine crawlable pages.
5. Template-driven design resulting in minimum development effort. Changing meta tags and fine-tuning pages for the ever-changing algorithms of individual search engines becomes easier than ever.
6. Human visitor friendly URLs in all the hyperlinks. E.g.: http://athena.targetwoman/emphysema
http://athena.targetwoman/Smoking
http://athena.targetwoman/high blood pressure
In other words, you need not remember the parameters or a string of hieroglyphics after the domain name to point to a page. You simply type the medical condition, disease name or a drug name to see the information.
7. Highly user interactive - You can drill down to the particular segment of information that you seek in the fewest possible clicks.
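The human-friendly URLs in point 6 imply that the keyword is simply the last part of the address. A minimal sketch of recovering it, assuming the browser encodes spaces as %20; the function name is hypothetical and the standard urllib.parse module does the parsing.

```python
from urllib.parse import unquote, urlparse


def keyword_from_url(url):
    """Extract the search phrase from the last path segment of a URL."""
    path = urlparse(url).path
    segment = path.rstrip("/").rsplit("/", 1)[-1]
    # Browsers send spaces as %20; decode them back into a readable phrase.
    return unquote(segment).strip()


print(keyword_from_url("http://athena.targetwoman/emphysema"))
# emphysema
print(keyword_from_url("http://athena.targetwoman/high%20blood%20pressure"))
# high blood pressure
```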