Abstract
The web and social media contain millions of pages whose text reviews objects
or events. It is very helpful to benefit from others' published opinions and
experiences before making decisions about these entities. For an opinion analysis to be
comprehensive, it should report the attitude toward the entity as well as toward its
basic aspects or features. In this work, we propose a domain-independent approach that
extracts both the entity aspects and the attitudes toward them from Arabic reviews. The
proposed approach exploits neither predefined sets of features nor a domain ontology
hierarchy. Instead, we add sentiment tags at the pattern and root levels of an Arabic
lexicon and use these tags to extract the opinion-carrying words and their polarities.
The proposed approach divides the opinion mining task into three
dependent subtasks at the word, sentence, and document levels. The word level is
concerned with extracting the opinion-carrying, negation, and intensifier words. The
sentence level is concerned with extracting the candidate aspects using syntactic
patterns for Arabic sentences, based on the opinion-carrying words. The document level
aggregates the lemma forms of the extracted aspects to summarize the entity
orientation. Some roots are nondeterministic: they are used in different ways in
different domains, which affects how certain their sentiment role is. A certainty
factor is proposed to express the percentage of orientation certainty of each aspect
and to show its effect on the system accuracy.
The proposed system is evaluated at the entity level using a dataset of 500 movie
reviews, achieving 96% accuracy. The system is then evaluated at the aspect level using
200 Arabic reviews in different domains (novels, products, movies, football game
events, and hotels). It extracted aspects with 89% recall and 85% precision with respect
to the aspects defined by domain experts. This indicates that the proposed system can be
used for generic domains beyond the limited coverage of existing ontologies.
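
As an illustration of how aspect-level recall and precision can be computed against expert-defined aspects, the following is a minimal sketch. It assumes exact matching between extracted aspect lemmas and expert aspects; the function name and the toy aspect sets are hypothetical and are not taken from the evaluated system or its dataset.

```python
def precision_recall(extracted, expert):
    """Compare extracted aspects with expert-defined aspects.

    Both arguments are iterables of aspect lemmas; matching is exact
    string equality (an assumption made for this sketch).
    """
    extracted, expert = set(extracted), set(expert)
    true_positives = len(extracted & expert)
    # Precision: share of extracted aspects that the experts also listed.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of expert aspects that the system managed to extract.
    recall = true_positives / len(expert) if expert else 0.0
    return precision, recall


# Hypothetical aspects for a single movie review.
extracted_aspects = {"plot", "acting", "music", "ending"}
expert_aspects = {"plot", "acting", "music", "direction", "dialogue"}

precision, recall = precision_recall(extracted_aspects, expert_aspects)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case three of the four extracted aspects match the five expert aspects, so precision is 3/4 and recall is 3/5.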