Project 2: SQL Solution

$35.00 $30.80




Use of postgres is highly recommended. Other DBMS’s may be used for development but the answers must be able to run against postgres server in the lab.


Part A: Querying the Sales Database [50%]


The Sales database tables are available under “hw2” schema on the lab postgres system (under cs564instr database). You are encouraged to create views in your own schema to make it easier to access the hw2 tables.


[The tables can also be created using the script available in ~shatdal/lab/Sales.sql. If you are using postgres on your personal machine, to load the database, simply type “psql -f Sales.sql” on the command prompt or “\i Sales.sql” inside psql. It would create a schema hw2 in your postgres DBMS.]


The hw2 schema has the following 4 tables. The key of each table is underlined and the foreign keys are also mentioned:


  • Holidays(WeekDate, IsHoliday)


  • Stores(Store, Type, Size)
  • TemporalData (Store, WeekDate, Temperature, FuelPrice, CPI, UnemploymentRate)
    • Store is a foreign key referencing Stores (Store).
    • WeekDate is a foreign key referencing Holidays (WeekDate).
  • Sales(Store, Dept, WeekDate, WeeklySales)
    • Store is a foreign key referencing Stores (Store).
    • WeekDate is a foreign key referencing Holidays (WeekDate).
    • (Store, WeekDate) is a foreign key referencing TemporalData (Store, WeekDate)



Write SQL queries over the given schema that obtain the answers to the following questions:


  1. Which stores had the largest and smallest overall sales during holiday weeks?


  1. Get the stores at locations where the unemployment rate exceeded 10% at least once but the fuel price never exceeded 4.


  1. How many non-holiday weeks had larger sales than the overall average sales during holiday weeks?


  1. Get the total sales per month, and its contribution to total sales (across months) overall for each type of store.


  1. Which stores have had sales in every department in that store for every month of at least one calendar year among 2010, 2011, and 2012?


  1. For each of the 4 numeric attributes in TemporalData, are they positively or negatively correlated with sales and how strong is the correlation? Your SQL query should output an instance with the following schema with 4 rows:


Output6 (AttributeName VARCHAR(20), CorrSign char(1), CorrValue (float))


e.g., (Temperature, -, -0.5) In your query, the values of AttributeName can be hardcoded string literals, but the other values must be computed automatically using SQL queries over the given database instance.


  1. Which departments contribute to at least 5% of store sales across for at least 3 stores? List the departments and their average contribution to sales across the stores.


  1. Get the top 10 departments overall ranked by total sales normalized by the size of the store where the sales were recorded.


  1. For the top 10 departments (in above query, #8), find the 3-monthly moving average and cumulative total of sales.


  1. The accounting department has asked for the following report. Write a SQL Query that would most closely produce the needed report. Quarters are defined traditionally, with Jan,Feb,March being Q1, etc.


Year Quarters Store Type A Store Type B Store Type C
    Sales Sales Sales
2010 Q1
2010 Q2
2010 Q3
2010 Q4
2012 Q4



Note: use of LIMIT N; feature in postgres is not allowed to constrain number of rows returned.




PART B: Sampling Application [50%]


Since postgres (and most DBMS’s) don’t allow sampling, the goal is to create a JDBC application that would let user create samples of data in the DBMS. The Sales data from Part A could be used for developing/testing the application. You would use sampling without replacement.


The application should:


  1. The JDBC connection should be to the lab postgres server. Instructions are posted on the webpage as to how to connect to lab postgres server. You may optionally also connect to your own hosted postgres for development/testing. Note that lab postgres can only be connected to from lab linux machines because of security protocols.


  1. Accept a table name or a query (prompts can distinguish the two if needed)


  1. Ask for how many sample rows are desired


  1. Ask if the user wants a table created for the sampled rows (instead of being returned). [These tables could be sampled in next iteration.]


  1. Fetch/insert exactly that number of random samples from the table or query result


  1. Allow the user to reset the seed of the random number generator


  1. Errors may be given for any syntax error in queries or invalid table entries


  1. If number of samples requested is greater than rows in the table/query, all rows should be returned (or inserted in the sample table) and a message should be given noting that fact.
  2. Ask if user wants more samples (or quit)


The application MAY NOT fetch the entire table or query result and do the sampling work in the application itself. (That is, it is assumed that the original table is too large to fit in memory.)


You may create additional tables (and insert rows) in the database (under your schema).


You may execute any query you like in the database.


Hint: One way to number all rows in a table is to use the “row_number()” ordered-analytic function.


The algorithm for random sampling of N rows (from Knuth: Art of computer programming,


volume 2: semi-numerical algorithms) is on the last page.




You are required to submit a zipped folder with the following contents:


  1. A “.sql” file per question. Name your files as “query<number>.sql”, e.g., the file “query2.sql” is for question 2 (of part A). These would contain the SQL and the result rows.


  1. The JDBC app (both java file and executable). Show the result of sampling 10 rows from each of the tables in part A. Additional testing/validation would be done for grade.



  1. txt with your group members’ name, CS logins and Wisc ids. and describe any assumptions etc made in the application that are not in the documentation of the code.