Learning algorithms generally require that a database be described in terms of a set of measurable features. Feature extraction is the process of deriving new features from the original ones in order to reduce the cost of feature measurement, increase classifier efficiency, and allow higher classification accuracy. This paper presents an algorithm for feature extraction from a real-world (medical) database. The algorithm has two stages. The first stage (rough pruning) deals with memo (text) attributes: it abstracts them with the help of a domain-specific dictionary, then drops the less informative attribute(s) using a probability measure of the values inside each attribute. In the second stage (fine pruning), the set of relevant attributes is determined by calculating an evaluation function. This function depends on the correlation and conditional probability between attributes (attribute-to-attribute) and between each attribute and the target (attribute-to-target). The paper also presents a search algorithm that reduces the search space linearly.
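The two-stage idea can be illustrated with a minimal sketch. The abstract does not define the probability measure, the evaluation function, or the search procedure, so everything below is an assumption: Shannon entropy stands in for the probability measure of rough pruning, a correlation-based merit in the style of CFS (correlation-based feature selection) stands in for the evaluation function of fine pruning, and a plain greedy forward search stands in for the search algorithm. The dictionary-based abstraction of memo attributes and the conditional-probability term are omitted. The names `rough_prune`, `merit`, `fine_prune`, and the toy columns are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of the empirical value distribution."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def pearson(x, y):
    """Absolute Pearson correlation; 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)

def rough_prune(columns, min_entropy=0.1):
    """Stage 1 stand-in (assumed): drop attributes whose values carry little information."""
    return {name: col for name, col in columns.items() if entropy(col) >= min_entropy}

def merit(subset, columns, target):
    """CFS-style merit (assumed): rewards attribute-to-target correlation,
    penalises attribute-to-attribute correlation."""
    k = len(subset)
    r_ct = sum(pearson(columns[a], target) for a in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_aa = sum(pearson(columns[a], columns[b]) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_ct / math.sqrt(k + k * (k - 1) * r_aa)

def fine_prune(columns, target):
    """Stage 2 stand-in (assumed): greedy forward selection that stops
    as soon as adding an attribute no longer improves the merit."""
    selected, best = [], 0.0
    remaining = sorted(columns)
    while remaining:
        scores = {a: merit(selected + [a], columns, target) for a in remaining}
        name = max(remaining, key=scores.get)
        if scores[name] <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = scores[name]
    return selected

# Toy data (hypothetical): one informative attribute, an exact copy of it,
# a constant attribute, and an attribute uncorrelated with the target.
columns = {
    "age": [1, 2, 3, 4, 5, 6],
    "age_copy": [1, 2, 3, 4, 5, 6],
    "const": [0, 0, 0, 0, 0, 0],
    "noise": [3, 1, 2, 3, 1, 2],
}
target = [0, 0, 0, 1, 1, 1]

kept = rough_prune(columns)          # removes the constant attribute
selected = fine_prune(kept, target)  # keeps one of the two redundant attributes
print(sorted(kept), selected)
```

In this toy run the constant attribute is removed by the entropy threshold, and the redundant copy is rejected because it adds attribute-to-attribute correlation without adding attribute-to-target correlation. The greedy search here evaluates a quadratic number of candidate subsets in the worst case and is not meant to reproduce the linear search-space reduction the paper describes.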