My Prospectus
From CSWiki
Thesis: Correctness, Completeness & Freshness : Methods to Ensure Security & Integrity of Outsourced Databases
Contents |
[edit] What I am planning to accomplish
[edit] Brief Thesis specification
Thesis Topic 'Correctness, Completeness & Freshness : Methods to Ensure Security & Integrity of Outsourced Databases'
Security of an outsourced database is an important concern as the database is in control with a third party(foreign party) hosting the database.There is a sheer possibility of the third party manipulating the database or replying with false and incomplete results when the database is queried.
The term security involves integrity and authenticity of the database. Authenticity(correctness) means a querier should get original, unchanged tuples while Integrity (completeness) makes sure that the result set includes all the original tuples matching the query predicate.Freshness emphasizes the result set to appear from the recent snapshot of the database.
I am working on providing completeness,correctness & freshness to outsourced databases and testing each of these under varying circumstances such as static database versus dynamic database, single attribute query versus multiple attribute queries, range selection versus single tuple query, one dimensional search attribute versus multidimensional search.
[edit] Issues to be investigated
Correctness :
The existing schemes to ensure correctness has let in abundant scope for improvements as they involve storing signatures of all the tuples and when the database is queried for a particular attribute the result set includes all the tuples and its corresponding signatures. This scheme is tedious as verifying the signature of each individual tuple consumes a large amount of time for the querier. This method is inefficient in terms of computation time and cost for the querier.
My thesis work is oriented towards aggregating the signatures of all individual tuples into one single signature. Although there are existing models supporting aggregate signature scheme, it has been of little use as each individual tuple's signature is yet to be stored leading to increased storage space and bigger verification object. I would work on the concept of allocating a single signature for a set of tuples (based on attribute value). This will reduce the bandwidth utility and storage space at the server.When the database is queried, the resulting result set should contain the aggregated signature along with the matching tuples. Now verifying this single aggregated signature is equivalent to verifying the signatures of all tuples in the result set. Moreover, the size of the aggregated signature is same as the individual signature.
One of the possible methods to aggregate the signatures of individual tuples is by BGLS scheme. If G1 is cyclic additive group with generator g1, G2 is cyclic multiplicative group and e is bilinear map such that e: G1*G1->G2. A bilinear mapping
e: G1*G1->G2 where |G1|=|G2|.
Now the BGLS signature scheme requires the use of a full domain hash function h(): {0,1}*->G1 that maps the binary strings to non zero points in G1.Key generation involves picking a random x belonging to Zp, and computing v = xg1.
Signing a message m involves computing H = h(m), where H belongs to G1 and K = xH. The signature is K. To verify a signature one needs to compute H=h(m) and check that e(k,g1) = e(H,v)
Now the task is to find a scheme that enables us to aggregate all the signatures of the tuples in the result set in to one signature called aggregate signature.
Completeness
One of the possible methods to achieve completeness is by chaining the signatures of all individual tuples. The current method to chain tuple signatures is by using a cryptographic hash function such as SHA (secure hash algorithm)where we consider all the IPR(immediate predecessors) along each searchable dimensions. However, this scheme is tedious as there is a need to collect the hashes of IPRs along each dimension. The signature scheme used here is as shown below
----------------------------------(1)
My thesis work is oriented towards modifying the existing signature chaining scheme so that the modified chain contains hashes without using the IPRs. This would reduce the size of the verification object as well as the computation overhead for the querier. One of the possible ways to achieve this is by including only the boundary tuples instead of the IPRs along each dimension. For example in the diagram shown below if Ra to Rb are the resultant tuples then to ensure completeness we include the boundary tuples R(a-1) and R(b+1), instead of Ru & Rv (IPRs of that dimension) Then the resulting verification object would have tuples {(Ra...Rb),(R(a-1),R(b+1)) & Ks } where 'Ks' is the aggregated signature(to ensure correctness).
Freshness
One of the important questions that arises here is freshness a part of database security? Yes, freshness is part of database security. It prevents the server from replaying the old signature chains.
My thesis work is oriented towards allowing the server to return another entity (apart from tuples, signature chain & aggregated signature) say 'Rf'in order to verify freshness of the result set.One of the possible approaches to achieve this is by finding the path from the result set to the root of the Merkle Hash Tree(MHT). Upon receiving a query reply, querier verifies freshness by recomputing the root of the MHT and validating it by verifying the signature of the root which is signed by the database owner (not the third party).For example consider the MHT tree below. Now if the result set includes value 3 (node 1) then the freshness can be computed by verifying the path from node 1 to the root.This will reveal the root signature which can be compared with the public key of the database owner.
[edit] Why this is academically interesting
[edit] Intended audience
This work attracts data owners outsourcing their databases to third party. The immediate concern of the data owners is their data is not manipulated by the host. More importantly, when the outsourced database is queried the result should be from the original data and should include all the tuples matching the query predicate.
To sum up, the intended audiences would be third party hosting the database, data owners and related clients.
[edit] Why this is MS-level work
One of the important reasons for this to be at master's level is that none of the existing works/ models ensure completeness and freshness issue(though correctness has achieved to certain extent). I feel this is the right time to come up with a sophisticated model that ensures absolutely security of outsourced databases.There is yet a lot of scope for research concerning this area. Interestingly, none of the existing works/ technologies guarantee 100% security to outsourced databases.
Difficulties/ hardships involved: A method/ algorithm perfect for one kind of database (hierarchical, relational, network model, post relational, object, semantic etc.) may not really work well for another type of database. This is one of the main issues/ problems in developing a work that ensures the correctness and completeness guarantees.
To come up with a thesis paper describing a method that improves correctness, completeness and freshness issues, one has to have a clear knowledge & understanding of
- 1) Database : creating, maintaining, accessing , updating & deleting, replication, architecture and so on.
- 2) Cryptography : Security Attacks (snooping, masquerading,replay attack, repudiation. Security Mechanisms(digital signature, access control), encipher, decipher.
- 3) DES/ AES: Basic structure, building elements, round key generation process, analysis.
- 4) Networking : Various network layers, protocols, functions and operating mechanisms of each layer.
[edit] Previous work
There are two important works prior to DSAC that has been helpful to ensure security and integrity of outsourced databases,namely Authenticated Data Structures and VB Tree Approach.
Authenticated Data Structures : This approach 1)demonstrates how to construct efficient and compact verification objects if a pre-computed authenticated data structure for that query type exists.2)Instead of using standard MHTs (Merkle Hash Trees) as authenticated dictionaries, balanced and I/O efficient data structures, such as B-Trees are used.
Limitations :1)There is a need to pre-compute and store a potentially large number of authenticated data structures in order to efficiently answer queries.2) Without pre-computed trees, the AuthDS cannot provide small verification objects3) Without pre-computed treesfor each sort-order, it becomes impossible to provide completeness of query replies.
VB Tree Approach:In this approach, a trusted central server outsources parts of the database to a proxy servers situated at the edge of the network.Data structure used here is VB tree. A VB tree is modified MHT(Merkle Hash Tree) and is built using a B-Tree where instead of signing only the root the leaf nodes as well as internal nodes are also signed.
Limitations : 1)Only Provides a proof of correctness and does not address completeness. 2) Uses 16-byte signed digest to sign the data. 16 byte signed digest is insecure as there is not cryptographically efficient.3) VB tree approach is expensive in terms of VO(verification object) verification time for querier.
Previous works to ensure correctness issues:
- 1)"Encrypted Database Integrity in Database Service Provider Model(CECS'02 IFIP WCC, 2002) by H Hacigumus, B Iyer & S Mehrotra
- 2) Authenticating Query Results in Edge Computing by H Pang & K-L Tan
- 3) Authentic third party data publication by P. Devanbu, M. Gertz, C. Martel and S.G Stubblebine, examined integrity issues in outsourced databases and suggested solutions using AuthDS method. Another work
- 4) Authentication and integrity in outsourced databases by Myketlum, M. Narasimha and G Tsudik, investigated the notion signature aggregation which enables bandwidth and computation efficient integrity verification of query replies.
However, these works only ensures a part of the security issues concerned with outsourced databases. Only query correctness is ensured in these works.Another concept that is missing in the previous work is query result freshness. I intend to work on DSAC (digital signature aggregation and chaining) that ensures both correctness and completeness of data queries.
[edit] Extending Previous work
- 1) AuthDS approach addresses the issue of correctness to certain extent. However, it involves pre-computing & storing a large number of data structures. This would result in a large VO and becomes tedious to prove completeness.Hence forth, I would rather work on summing up all the signatures of each individual tuple in the result set. Now, authenticating the aggregate signature is equivalent to authenticating the signatures of each tuple.
- 2) Modifying the signature scheme used in (1) (shown above) to eliminate IPRs and use only the boundary values to ensure completeness.
- 3) Will work towards completeness and freshness issues related to outsourced databases. Effort is towards making the existing signature schemes more efficient in terms of computation costs, query time and bandwidth utility & efficiency.
- 4) The signature schemes and algorithms I would develop would be tested for different database types and under different test environments such as single attribute query versus multiple attribute query.Circumstances like frequency of updates>> frequency of query and vice versa.
- 5) I will emphasize more on dynamic databases and tuple inserts and delete operations. My main goal is to make the correctness and completeness and freshness issues more efficient in terms of computation and bandwidth for the querier.
- 6) Testing with more advanced query types.
[edit] Anticipated challenge(s)
[edit] Challenge(s)
- Aggregated Signature can be modified by the host by simply changing the values of cyclic multiplicative group or addtive groups G1 & G2.
- Tuples are constantly added and deleted in dynamic databases. So there is a possibility of changes in boundary tuples. Change in boundary values {R(a-1) & R(b+1)} will lead to incorrect signature chains.
- Newly added tuples may be not appear in the result set due to delay in updating the MHT.
- VO(verification object) size may exceed the maximum allowable VO size on the addition of the 4th entity (freshness MHT).
- Tuples that are not relevant to the query predicate may appear in the result set (such as boundary values). Now it is required that only the relevant tuples are displyaed in plain text and others as hash values.
- Correctness and completeness issues that works good with simple queries may not be the same with complicated query types such as range queries or multidimensional or multi attribute queries.
- The model developed to ensure security and integrity may show varying results with different kinds of databases hierarchical, relational, post relational, network, object.
- Providing correctness, completeness, freshness to static databases is very simple as compared to dynamic databases which includes tuple insert and delete operations.
- For Dynamic Databases, as tuples are inserted and deleted over time, it would involve additional operations such as re-balancing the B trees and re calculating the signatures of all trees.
[edit] Anticipated approach to each challenge
- Considering other signature schemes (instead of BGLS) such as condensed RSA signature scheme as the resulting output in RSA produces a fixed length output. This will avoid the VO size overflow.
- At this beginning phase, Testing the new security approach with all possible database types.
- Getting more deeper and understanding the previous works.
[edit] What I bring to this work
[edit] My relevant background and experience
- I have pursued CS 481 class in fall 2007. The concepts thought in CS481 forms the base to this thesis work. CS 481 deals with various security issues and different signature schemes. Concepts of MHT, RSA/ DSA signature shemes & SHA hash functions are very helpful with respect to this thesis work.
- CS481 class helps in understanding various types of security attacks (masquadering, replaying, repudiation, snooping and so on) and security mechanisms (enciphering,data integrity, digital signature, traffic padding, routing control, access control and so on) to overcome them.
- Have taken CS 386 class which teaches concepts of set operations, trees, nodes, automata, types of automata and so on.
- I have worked on database systems (postgres) with JDBC technology. This helps me to understand how a database is created and how it is accessed and maintained. The concepts of tuple, chains, signatures, aggregation concepts are well understood.
- Current(winter 2008) CS 590 class helps me building a thesis prospectus defining my goals & anticipated challenges.
[edit] What I find interesting about this work
My interest lies in crytography and related topics. I would like to develop a model that would produce a result set such that it verifies all the 3 important requirements of database security namely, correctness, completeness & freshness. If my thesis work happens to better than the previous works (aggregated signature to reduce verification time for the querier, including boundary tuples instead of IPRs) then this could bring in a considerable amount of relief to data owners.
[edit] References
- In regular consultation with thesis adviser Prof. Huiping Guo.
- I have referred to similar papers on correctness, completeness & freshness issues. Namely, 1)Encrypted Database Integrity in Database Service Provider Model(CECS'02 IFIP WCC, 2002) by H Hacigumus, B Iyer & S Mehrotra 2) Authenticating Query Results in Edge Computing by H Pang & K-L Tan 3) Authentication and integrity in outsourced databases by Myketlum, M. Narasimha and G Tsudik

