EXPLAINING MISSING ANSWERS TO TOP-K SQL QUERIES

ABSTRACT

Because database systems are increasingly difficult to use, improving their quality and usability has gained considerable momentum over the last few years. In particular, explaining why some expected tuples are missing from the result of a query has received growing attention. In this paper, we study the problem of explaining missing answers to top-k queries in the context of SQL (i.e., with selection, projection, join, and aggregation). To approach this problem, we use the query-refinement method: given the original top-k SQL query and a set of missing tuples as inputs, our algorithms return to the user a refined query whose result includes both the missing tuples and the original query results. Case studies and experimental results show that our algorithms return high-quality explanations efficiently.

EXISTING SYSTEM:

The idea of explaining a null answer to a database query was set out in [12], but the concept of why-not was first formally discussed in [5]. That work answers a user's why-not question on Select-Project-Join (SPJ) queries by telling her which query operator(s) eliminated her desired tuples. This line of work has since gradually expanded. In [6] and [7], the missing answers of SPJ and SPJA queries are explained by a data-refinement approach, i.e., the user is told how the data should be modified (e.g., by adding a tuple) to bring the missing answer back into the result. In [8], a query-refinement approach is adopted: the answer to a why-not question tells the user how to revise her original SPJA query so that the missing answers return to the result. That work defines a good refined query as one that is (a) similar, i.e., it has few "edits" compared with the original query (e.g., modifying the constant value in a selection predicate is one type of edit; adding or removing a join predicate is another), and (b) precise, i.e., its result contains few extra tuples beyond the original result plus the missing tuples. In this paper, we adopt the query-refinement approach as our explanation model and also apply the above similarity and precision metrics.
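As a rough illustration of the two metrics, the sketch below counts edits and extra tuples for a toy pair of queries. The dictionary-based query representation and the function names `count_edits` and `count_extra_tuples` are hypothetical choices made for this example, and the unweighted edit count is a simplifying assumption, not necessarily the scoring used in the query-refinement work described above.

```python
# Hypothetical representation: a query is a dict with
#   "selections": {column: constant}  and  "joins": set of join-predicate strings.

def count_edits(original, refined):
    """Similarity: number of changed selection constants plus added/removed joins."""
    sel_o, sel_r = original["selections"], refined["selections"]
    # A constant that differs, or a predicate present in only one query, is one edit.
    changed = sum(1 for col in sel_o.keys() | sel_r.keys()
                  if sel_o.get(col) != sel_r.get(col))
    # Join predicates present in exactly one of the two queries.
    join_edits = len(original["joins"] ^ refined["joins"])
    return changed + join_edits


def count_extra_tuples(original_result, missing, refined_result):
    """Precision: tuples in the refined result that are neither original nor missing."""
    return len(set(refined_result) - set(original_result) - set(missing))


original = {"selections": {"price": 200}, "joins": {"h.city_id = c.id"}}
refined = {"selections": {"price": 300}, "joins": {"h.city_id = c.id"}}

edits = count_edits(original, refined)
extras = count_extra_tuples(["A", "B"], ["E"], ["E", "A", "B", "C"])
print(edits, extras)  # 1 edit (price constant), 1 extra tuple ("C")
```

Under these assumptions, a refined query with fewer edits and fewer extra tuples would be preferred.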

PROPOSED SYSTEM:

To address the problem of answering why-not questions on top-k SQL queries, we employ the query-refinement approach. Specifically, given as inputs the original top-k SQL query and a set of missing tuples, this approach returns to the user a refined query whose result includes the missing tuples as well as the original query results. In this paper, we show that finding the best refined query is computationally expensive. We then present efficient algorithms that obtain the best approximate explanations (i.e., refined queries) in reasonable time. We present case studies to demonstrate our solutions, and experimental results to show that they return high-quality explanations efficiently. This paper extends earlier work that discussed answering why-not questions on top-k queries in the absence of other SQL constructs such as selection, projection, join, and aggregation.
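To make the input and output of query refinement concrete, here is a minimal sketch that refines only the value of k. It is not the paper's algorithm, which considers other kinds of changes to the query as well. The `hotels` table, its contents, and the function `smallest_covering_k` are assumptions invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotels (name TEXT, price INTEGER, rating REAL)")
conn.executemany(
    "INSERT INTO hotels VALUES (?, ?, ?)",
    [("A", 120, 4.8), ("B", 150, 4.5), ("C", 90, 4.1), ("D", 180, 3.9), ("E", 300, 4.9)],
)

# Original top-k query without its LIMIT clause; the user originally asked for k = 2.
RANKED_SQL = "SELECT name FROM hotels WHERE price <= 200 ORDER BY rating DESC, name"


def smallest_covering_k(conn, ranked_sql, missing, max_k):
    """Return the smallest k whose top-k result contains every missing key.

    Returns None if changing k alone (up to max_k) cannot bring them back.
    """
    ranked = [row[0] for row in conn.execute(ranked_sql + " LIMIT ?", (max_k,))]
    if any(key not in ranked for key in missing):
        return None
    # The lowest-ranked missing tuple determines how large k must be.
    return max((ranked.index(key) + 1 for key in missing), default=0)


print(smallest_covering_k(conn, RANKED_SQL, ["C"], 10))  # 3: "C" ranks third
print(smallest_covering_k(conn, RANKED_SQL, ["E"], 10))  # None: "E" fails price <= 200
```

The second call shows why enlarging k is not always enough: hotel "E" is removed by the selection predicate, so a refined query would also have to edit that predicate, which is the kind of case the query-refinement approach is meant to handle.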

CONCLUSION

In this paper, we have studied the problem of answering why-not questions on top-k SQL queries. Our goal is to give an explanation to a user who is wondering why her expected answers are missing from the query result. We return to the user a refined query that brings the missing expected answers back into the result. Our case studies and experimental results show that our solutions efficiently return high-quality explanations. In future work, we will study this problem on queries involving non-numeric attributes.

REFERENCES

[1] S. Agrawal, S. Chaudhuri, and G. Das, “DBXplorer: A System for Keyword-Based Search over Relational Databases,” in ICDE, 2002, pp. 5–16.

[2] H. Wu, G. Li, C. Li, and L. Zhou, “Seaform: Search-As-You-Type in Forms,” in PVLDB, vol. 3, no. 2, 2010, pp. 1565–1568.

[3] J. Akbarnejad, G. Chatzopoulou, M. Eirinaki, S. Koshy, S. Mittal, D. On, N. Polyzotis, and J. S. V. Varman, “SQL QueRIE Recommendations,” in PVLDB, vol. 3, no. 2, 2010, pp. 1597–1600.

[4] M. B. N. Khoussainova, Y.C. Kwon and D. Suciu, “Snipsuggest: Context-Aware Autocompletion for SQL,” in PVLDB, vol. 4, no. 1, 2010, pp. 22–33.

[5] A. Chapman and H. Jagadish, “Why not?” in SIGMOD, 2009, pp. 523–534.

[6] J. Huang, T. Chen, A.-H. Doan, and J. F. Naughton, “On the Provenance of Non-Answers to Queries over Extracted Data,” in PVLDB, 2008, pp. 736–747.

[7] M. Herschel and M. A. Hernández, “Explaining Missing Answers to SPJUA Queries,” in PVLDB, 2010, pp. 185–196.

[8] Q. T. Tran and C.-Y. Chan, “How to ConQueR Why-not Questions,” in SIGMOD, 2010, pp. 15–26.

[9] Z. He and E. Lo, “Answering Why-Not Questions on Top-K Queries,” in ICDE, 2012.

[10] I. Md. Saiful, Z. Rui, and L. Chengfei, “On Answering Why-not Questions in Reverse Skyline Queries,” in ICDE, 2013, pp. 973–984.

[11] Z. He and E. Lo, “Answering why-not questions on top-k queries,” IEEE Trans. Knowl. Data Eng., vol. 26, no. 6, pp. 1300–1315, 2014.

[12] A. Motro, “Query Generalization: A Method for Interpreting Null Answers,” in Expert Database Workshop, 1984, pp. 597–616.
