[Big Data concept] (10) Give one example of Big Data application you know. Use the detailed example to explain each of the five Big V’s. If you are required to design a database system for this application, what are the best data models (relational, XML, RDF, among others) you would use to represent the data and why?
[Relational Data Model] (30) As of January 2017, the OpenFlights Airports Database (https://openflights.org/data.html) contains over 10,000 airports, train stations and ferry terminals spanning the globe. Each entry in the Airport table contains the following:
Airport ID Unique OpenFlights identifier for this airport.
NameName of airport. May or may not contain the City name.
Main city served by airport. May be spelled differently from Name.
Country or territory where airport is located. See countries.dat to cross-reference to ISO
3-letter IATA code. Null if not assigned/unknown.
4-letter ICAO code.
Decimal degrees, usually to six significant digits. Negative is South, positive is North.
Decimal degrees, usually to six significant digits. Negative is West, positive is East.
Hours offset from UTC. Fractional hours are expressed as decimals, eg. India is 5.5.
Daylight savings time. One of E (Europe), A (US/Canada), S (South America), O (Australia), Z
(New Zealand), N (None) or U (Unknown). See also: Help: Time
Tz database time zone
Timezone in “tz” (Olson) format, eg. “America/Los_Angeles”.
Type of the airport. Value “airport” for air terminals, “station” for train stations, “port” for
ferry terminals and “unknown” if not known. In airports.csv, only type=airport is included.
Source of this data. “OurAirports” for data sourced from OurAirports, “Legacy” for old data
not matched to OurAirports (mostly DAFIF), “User” for unverified user contributions. In airports.csv, only source=OurAirports is included.
(5) Consider the following terms: relation schema, relational database schema, domain, attribute, attribute domain, relation instance. Give what these terms are with the above Airport database. Give one small (4-5 tuples) instance of the Airport table.
(10) There are three databases in the OpenFlight dataset: Airport, Airline, and Route. Give the schema of these three databases and mark the primary keys, foreign keys and provide examples of functional dependencies you identified over the three tables. [You may draw a diagram to illustrate the schema, PKs, FKs and Fds]\
[FD inferencing] (10)
Recall Armstrong’s axioms.
Reflexivity rule: if Y ⊆ X then X → Y
Augmentation rule: if X → Y then XZ → YZ
Transitivity rule: if X → Y and Y → Z then X → Z
Give two examples for using Armstrong’s inference rules to induce new FDs from the set of FDs you designed in question 2 (b).
Prove the following inference rules also hold, using FD definition and Armstrong’s Axioms. a. decomposition rule: if X → YZ then: X → Y and X → Z
b. Pseudo transitivity: if X → Y and YW → Z then: XW → Z
(d) [Normalization] (5) Given a relation R(A1, A2, A3, A4), with three FDs A2, A3 → A4 ; A3, A4 → A1; A1, A2→ A3. Provide the 3NF and BCNF form of the schema and explain why.
[Relational Algebra] (20) Consider the following database schema: Movies (Title, Director, Actor);
Location (Theater, Address, Phone number); Schedule (Theater, Title, Time).
Express the following queries in relational algebra (select σ, project ∏ , Cartesian product X, join (theta-join))
-Q1: which theaters feature “Zootopia”?
-Q2: List the names and address of theaters featuring a film directed by Steven Spielberg.
-Q3: What are the address and phone number of the Le Champo theater?
-Q4: List pairs of actors that acted in the same movie. (* you want to use renaming on Movies and join the Movies with its copy Movie’).
[Join Operators] (40) This sets of questions test the understanding of basic database search operators. Consider a join ⋈!.#$%.&. We ignore the cost of output the result, and measure the cost with the number of I/Os. Given the information about relations to be joined below: Relation contains 20,000 tuples and has 10 tuples per block. Relation contains 100,000 tuples and has 10 tuples per block. Attribute is the primary key of . In total, 52 blocks are available in memory. Assume neither relation has any index.
(10) Describe a block nested join algorithm, Give the cost of joining and with a block nested loops join.
(15) Describe a sort-merge join algorithm. Give the cost of joining and with a sort-merge join.
(15) Describe a hash-join algorithm. Give the cost of joining and with a hash join.