How Fuzzy Query Works (part 2)

Let's look at the school administrator example once again, only this time, let's suppose that our school administrator works for a public school system where student attendance is used to determine how much federal funding the school receives. This adds a new dimension to what defines a "good" student. To our administrator, attendance is just as important as grade point average.

So she is forced to make yet another arbitrary decision about the definition of a "good" student. She decides that a "good" student is one who has a grade point average of at least 3.5 and fewer than 10 absences for the year. In light of this, she devises the following query:

SELECT * FROM STUDENTS
WHERE (GPA >= 3.5) AND (ABSENCES < 10);

After a few moments, the database returns a list of all students who have grade point averages of 3.5 or higher and fewer than 10 absences for the school year.

Name               GPA    Absences
Becky Springston   4.0    3.0
Jeff Williams      3.5    9.75
Leroy Jefferson    3.75   4.5
Mike Salisbury     4.0    1.0
Kevin Costner      3.75   1.0
Shelly Whitman     3.75   4.0
Barry Allen        3.9    2.0
Billy Kidd         4.0    9.75
John Conner        4.0    7.0

This is what she asked for all right, but she notes that these students are not sorted in any particular order. Consequently, it is very difficult to see just who the best students are. After a few minutes of digging through her SQL manual, she discovers a way to sort the list of students using SQL.

So she puts together the following SQL query:

SELECT * FROM STUDENTS
WHERE (GPA >= 3.5) AND (ABSENCES < 10)
ORDER BY GPA DESC, ABSENCES ASC;

After a few moments, the database returns a list of all students who have grade point averages of 3.5 or higher and fewer than 10 absences for the school year. Furthermore, the students are sorted from highest GPA to lowest. Within any group of students at a given GPA level, they are sorted by number of absences, fewest first.

Name               GPA    Absences
Mike Salisbury     4.0    1.0
Becky Springston   4.0    3.0
John Conner        4.0    7.0
Billy Kidd         4.0    9.75
Barry Allen        3.9    2.0
Kevin Costner      3.75   1.0
Shelly Whitman     3.75   4.0
Leroy Jefferson    3.75   4.5
Jeff Williams      3.5    9.75
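
The sorted query can be tried without a full database server. What follows is only a minimal sketch using Python's built-in sqlite3 module. It assumes a STUDENTS table with columns NAME, GPA, and ABSENCES; the column name NAME and the helper run_sorted_query are illustrative assumptions, since the original shows only the column headings of the result.

```python
import sqlite3

# Rows copied from the result table above (name, GPA, absences).
STUDENTS = [
    ("Becky Springston", 4.0, 3.0),
    ("Jeff Williams", 3.5, 9.75),
    ("Leroy Jefferson", 3.75, 4.5),
    ("Mike Salisbury", 4.0, 1.0),
    ("Kevin Costner", 3.75, 1.0),
    ("Shelly Whitman", 3.75, 4.0),
    ("Barry Allen", 3.9, 2.0),
    ("Billy Kidd", 4.0, 9.75),
    ("John Conner", 4.0, 7.0),
]


def run_sorted_query():
    """Load the sample rows into an in-memory table and run the sorted query."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: the column names are illustrative, not from the original.
    conn.execute("CREATE TABLE STUDENTS (NAME TEXT, GPA REAL, ABSENCES REAL)")
    conn.executemany("INSERT INTO STUDENTS VALUES (?, ?, ?)", STUDENTS)
    rows = conn.execute(
        "SELECT NAME, GPA, ABSENCES FROM STUDENTS "
        "WHERE (GPA >= 3.5) AND (ABSENCES < 10) "
        "ORDER BY GPA DESC, ABSENCES ASC"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, gpa, absences in run_sorted_query():
        print(f"{name:<19}{gpa:<7}{absences}")
```

Because GPA is the primary sort key, absences only break ties between students with identical GPAs, which is why Billy Kidd still lands above Barry Allen.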

She notes that this list is better, but because GPA and attendance are equally important, this list still does not reflect the semantic intent of her query. She feels that Billy Kidd should not be higher on the list than Barry Allen. After all, there is only a tenth of a grade point difference in their grades, but Barry has much better attendance than Billy. Further, she notes that there are some students who in her opinion were "good" students but, for whatever reason, simply did not make the list.

So she pulls the file of one student who she is sure should be on the list but for some reason is not. After reviewing the student's file she finds that this particular student has perfect attendance (0 absences) but only has a GPA of 3.49. She pulls another record and discovers that the query left out a student who has a 4.0 GPA but has 10 absences. In her opinion, this student was a much better student than Jeff Williams, who did make the list.

What she realizes at this point is that because of the arbitrary limits she has set for separating "good" from not "good", some students who closely fit her idea of a "good" student were simply left out, while other students who only barely fit her idea of a "good" student were included. After all, why is a student who has a 3.49 GPA and perfect attendance not considered a good student, while another student with a 3.50 GPA and 9.75 days of absence is considered a good student? Does a GPA difference of 0.01 really matter that much? If so, then why does a student who has a 4.0 GPA and 10 days of absence get left out when another student who only has a GPA of 3.5 and 9.75 days of absence makes the cut? Does 0.25 days make that much of a difference?
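
The effect of the hard cutoffs can be seen by applying the same two conditions to the three borderline students just described. This is a small illustrative sketch; the function name is_good_crisp and the case labels are assumptions, while the GPA and absence figures come from the text.

```python
def is_good_crisp(gpa, absences):
    """Crisp test equivalent to: (GPA >= 3.5) AND (ABSENCES < 10)."""
    return gpa >= 3.5 and absences < 10


# Borderline cases from the text: (label, GPA, absences).
CASES = [
    ("perfect attendance, GPA 3.49", 3.49, 0.0),
    ("GPA 4.0, 10 absences", 4.0, 10.0),
    ("Jeff Williams (GPA 3.5, 9.75 absences)", 3.5, 9.75),
]

RESULTS = {label: is_good_crisp(gpa, absences) for label, gpa, absences in CASES}

if __name__ == "__main__":
    for label, included in RESULTS.items():
        print(f"{label}: {'included' if included else 'left out'}")
```

Each answer is all or nothing: missing either threshold by any amount, however small, excludes the student outright, and clearing both by any amount includes the student on equal footing with everyone else.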

What this shows is that the concept of a "good" student is a vague one and cannot be readily expressed in terms of classical or crisp logic. In our example, the school administrator probably didn't really "intend" to select students based on (GPA >= 3.5) and (ABSENCES < 10). It is far more likely that she intended to select students who had "GOOD" grades and "GOOD" attendance. This is where fuzzy logic comes to the rescue. Fuzzy logic is well suited to expressing the intent of a database query when the semantics of the query are rather vague.
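
To make the idea of "GOOD" grades and "GOOD" attendance a little more concrete, here is a rough sketch of how degrees of membership might be computed. Everything specific in it is an assumption made for illustration and is not taken from Fuzzy Query itself: the linear ramp shape, the breakpoints (a GPA of 3.0 counts as not good at all and 4.0 as fully good; 0 absences as fully good and 20 as not good at all), and the function names. The two degrees are combined with the minimum, which is the classic fuzzy AND.

```python
def ramp_up(x, low, high):
    """Linear membership: 0.0 at or below low, 1.0 at or above high."""
    fraction = (x - low) / (high - low)
    return max(0.0, min(1.0, fraction))


def good_gpa(gpa):
    # Assumed breakpoints: 3.0 is "not good at all", 4.0 is "fully good".
    return ramp_up(gpa, 3.0, 4.0)


def good_attendance(absences):
    # Assumed breakpoints: 0 absences is "fully good", 20 is "not good at all".
    return 1.0 - ramp_up(absences, 0.0, 20.0)


def goodness(gpa, absences):
    """Fuzzy AND (minimum) of good grades and good attendance."""
    return min(good_gpa(gpa), good_attendance(absences))


if __name__ == "__main__":
    students = [
        ("Billy Kidd", 4.0, 9.75),
        ("Barry Allen", 3.9, 2.0),
        ("GPA 3.49, perfect attendance", 3.49, 0.0),
        ("GPA 4.0, 10 absences", 4.0, 10.0),
    ]
    # Rank by degree of "goodness", best first.
    for name, gpa, absences in sorted(
        students, key=lambda s: goodness(s[1], s[2]), reverse=True
    ):
        print(f"{name}: {goodness(gpa, absences):.2f}")
```

Under these assumed membership functions, Barry Allen scores higher than Billy Kidd, and the two students excluded by the crisp query receive a partial degree of membership instead of being dropped. Different breakpoints or a different way of combining the two degrees (the product is another common choice) would give different rankings.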




For more information E-Mail: FuzzyQuery@Sonalysts.com

Fuzzy Systems Solutions
Sonalysts Inc.
215 Parkway North
Waterford, CT 06385
Tel: 800-526-8091 Fax: 860-447-8883

© 2003 Sonalysts Inc. All rights reserved.