EXPLAINING MISSING ANSWERS TO TOP-K SQL QUERIES
ABSTRACT
Because existing database systems are increasingly difficult to use, improving their quality and usability has gained tremendous momentum over the last few years. In particular, explaining why some expected tuples are missing from the result of a query has received growing attention. In this paper, we study the problem of explaining missing answers to top-k queries in the context of SQL (i.e., with selection, projection, join, and aggregation). To approach this problem, we use the query-refinement method: given the original top-k SQL query and a set of missing tuples as inputs, our algorithms return to the user a refined query whose result includes both the missing tuples and the original query results. Case studies and experimental results show that our algorithms return high-quality explanations efficiently.
EXISTING SYSTEM:
The problem of explaining a null answer to a database query was set out by , but the notion of why-not questions was first formally discussed in . That work answers a user’s why-not question on Select-Project-Join (SPJ) queries by telling her which query operator(s) eliminated her desired tuples. Since then, this line of work has gradually expanded. In , the missing answers of SPJ and SPJA queries are explained by a data-refinement approach, i.e., the user is told how the data should be modified (e.g., by adding a tuple) to bring the missing answer back into the result. In , a query-refinement approach is adopted: the answer to a why-not question tells the user how to revise her original SPJA query so that the missing answers return to the result. That work defines a good refined query as one that is (a) similar, i.e., it has few “edits” compared with the original query (e.g., modifying the constant value in a selection predicate is one type of edit; adding or removing a join predicate is another), and (b) precise, i.e., its result contains few extra tuples beyond the original result plus the missing tuples. In this paper, we adopt the query-refinement approach as our explanation model and also apply the above similarity and precision metrics.
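To make the similarity and precision metrics concrete, here is a minimal Python sketch under a deliberately simplified, assumed query model. The `Query` class and the functions `count_edits`, `evaluate`, and `count_extra_tuples` are hypothetical names, not taken from any cited system. Selections are conjunctive `attribute <= constant` predicates over a single table, and join predicates are only counted as edits, never evaluated. Each changed, added, or removed selection constant and each added or removed join predicate counts as one edit, following the examples of edits given above.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical, simplified SPJ query used only for illustration:
    conjunctive selections of the form `attribute <= constant`, plus
    join predicates given as (left_column, right_column) pairs."""
    selections: dict[str, float] = field(default_factory=dict)
    joins: frozenset[tuple[str, str]] = frozenset()


def count_edits(original: Query, refined: Query) -> int:
    """Similarity: one edit per selection constant that is changed, added,
    or removed, plus one edit per join predicate that is added or removed."""
    changed = sum(
        1
        for attr in original.selections.keys() | refined.selections.keys()
        if original.selections.get(attr) != refined.selections.get(attr)
    )
    return changed + len(original.joins ^ refined.joins)  # symmetric difference


def evaluate(query: Query, rows: list[dict]) -> set[int]:
    """Return the ids of rows satisfying every selection predicate.
    Joins are ignored here: the sketch uses a single table."""
    return {
        row["id"]
        for row in rows
        if all(row[attr] <= bound for attr, bound in query.selections.items())
    }


def count_extra_tuples(refined_result: set[int], original_result: set[int],
                       missing: set[int]) -> int:
    """Precision: number of tuples returned beyond the original result plus
    the missing tuples. A valid refinement must return both of those sets."""
    wanted = original_result | missing
    if not wanted <= refined_result:
        raise ValueError("refined query must return the original and missing tuples")
    return len(refined_result - wanted)


# Usage: the user asks why row 2 (price 140) is missing from `price <= 100`.
rows = [
    {"id": 1, "price": 90},
    {"id": 2, "price": 140},
    {"id": 3, "price": 120},
    {"id": 4, "price": 300},
]
original = Query(selections={"price": 100})
refined = Query(selections={"price": 140})  # one edit: 100 -> 140
original_result = evaluate(original, rows)  # {1}
refined_result = evaluate(refined, rows)    # {1, 2, 3}
print(count_edits(original, refined))                            # 1
print(count_extra_tuples(refined_result, original_result, {2}))  # 1 (row 3)
```

The refined query here is maximally similar (a single edit) but not perfectly precise, because row 3 also enters the result; trading these two quantities off against each other is what the metrics are for.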
PROPOSED SYSTEM:
To address the problem of answering why-not questions on top-k SQL queries, we employ the query-refinement approach. Specifically, given the original top-k SQL query and a set of missing tuples as inputs, this approach returns to the user a refined query whose result includes the missing tuples as well as the original query results. In this paper, we show that finding the best refined query is computationally expensive. We then present efficient algorithms that obtain the best approximate explanations (i.e., refined queries) in reasonable time. We present case studies to demonstrate our solutions, as well as experimental results showing that our solutions return high-quality explanations efficiently. This paper is an extension of , which discussed answering why-not questions on top-k queries in the absence of other SQL constructs such as selection, projection, join, and aggregation.
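As a minimal illustration of refining a top-k query, the sketch below assumes (beyond what is stated here) that tuples are ranked by a linear weighted score, with higher scores ranked first, and considers only the simplest refinement: keep the weights and enlarge k until every missing tuple is included. The functions `ranking`, `top_k`, and `refine_k` are hypothetical. This is not the algorithm of this paper, which must also handle selection, projection, join, and aggregation and search for the best refined query.

```python
def ranking(rows: dict[str, tuple[float, ...]], weights: tuple[float, ...]) -> list[str]:
    """Order tuple ids by descending weighted score (higher is better);
    ties are broken by id so every top-k result is deterministic."""
    def score(tid: str) -> float:
        return sum(w * x for w, x in zip(weights, rows[tid]))
    return sorted(rows, key=lambda tid: (-score(tid), tid))


def top_k(rows: dict[str, tuple[float, ...]], weights: tuple[float, ...], k: int) -> list[str]:
    """Return the ids of the k highest-ranked tuples."""
    return ranking(rows, weights)[:k]


def refine_k(rows: dict[str, tuple[float, ...]], weights: tuple[float, ...],
             k: int, missing: set[str]) -> int:
    """Smallest k' >= k such that the top-k' result, under unchanged weights,
    contains every missing tuple. Because the ranking is fixed, the top-k'
    result is a superset of the original top-k result."""
    order = ranking(rows, weights)
    worst_rank = max(order.index(tid) + 1 for tid in missing)  # 1-based rank
    return max(k, worst_rank)


# Usage: a top-2 query with equal weights; the user asks why "d" is missing.
rows = {"a": (9, 8), "b": (7, 9), "c": (6, 4), "d": (3, 5), "e": (2, 1)}
weights = (0.5, 0.5)
original = top_k(rows, weights, 2)         # ['a', 'b']
new_k = refine_k(rows, weights, 2, {"d"})  # 4
refined = top_k(rows, weights, new_k)      # ['a', 'b', 'c', 'd']
print(original, new_k, refined)
```

Tuple "c" enters the refined result even though the user did not ask for it. This is the kind of imprecision the precision metric penalizes, and it suggests why enlarging k alone is not always the best explanation.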
CONCLUSION
In this paper, we have studied the problem of answering why-not questions on top-k SQL queries. Our goal is to give an explanation to a user who wonders why her expected answers are missing from the query result. We return to the user a refined query that brings the missing expected answers back into the result. Our case studies and experimental results show that our solutions efficiently return very high-quality explanations. In future work, we will study this problem on queries involving non-numeric attributes.
REFERENCES
[1] S. Agrawal, S. Chaudhuri, and G. Das, “DBXplorer: A System for Keyword-Based Search over Relational Databases,” in ICDE, 2002, pp. 5–16.
[2] H. Wu, G. Li, C. Li, and L. Zhou, “Seaform: Search-As-You-Type in Forms,” in PVLDB, vol. 3, no. 2, 2010, pp. 1565–1568.
[3] J. Akbarnejad, G. Chatzopoulou, M. Eirinaki, S. Koshy, S. Mittal, D. On, N. Polyzotis, and J. S. V. Varman, “SQL QueRIE Recommendations,” in PVLDB, vol. 3, no. 2, 2010, pp. 1597–1600.
[4] N. Khoussainova, Y. C. Kwon, M. Balazinska, and D. Suciu, “SnipSuggest: Context-Aware Autocompletion for SQL,” in PVLDB, vol. 4, no. 1, 2010, pp. 22–33.
[5] A. Chapman and H. Jagadish, “Why not?” in SIGMOD, 2009, pp. 523–534.
[6] J. Huang, T. Chen, A.-H. Doan, and J. F. Naughton, “On the Provenance of Non-Answers to Queries over Extracted Data,” in PVLDB, 2008, pp. 736–747.
[7] M. Herschel and M. A. Hernández, “Explaining Missing Answers to SPJUA Queries,” in PVLDB, 2010, pp. 185–196.
[8] Q. T. Tran and C.-Y. Chan, “How to ConQueR Why-not Questions,” in SIGMOD, 2010, pp. 15–26.
[9] Z. He and E. Lo, “Answering Why-Not Questions on Top-K Queries,” in ICDE, 2012.
[10] M. S. Islam, R. Zhou, and C. Liu, “On Answering Why-not Questions in Reverse Skyline Queries,” in ICDE, 2013, pp. 973–984.
[11] Z. He and E. Lo, “Answering Why-Not Questions on Top-K Queries,” IEEE Trans. Knowl. Data Eng., vol. 26, no. 6, pp. 1300–1315, 2014.
[12] A. Motro, “Query Generalization: A Method for Interpreting Null Answers,” in Expert Database Workshop, 1984, pp. 597–616.