AFRL-IF-RS-TR-2007-9
Final Technical Report
January 2007

GENESIS: A FRAMEWORK FOR ACHIEVING SOFTWARE COMPONENT DIVERSITY

University of Virginia

Sponsored by
Defense Advanced Research Projects Agency
DARPA Order No. S472

APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.

STINFO COPY

AIR FORCE RESEARCH LABORATORY
INFORMATION DIRECTORATE
ROME RESEARCH SITE
ROME, NEW YORK

NOTICE AND SIGNATURE PAGE

Using Government drawings, specifications, or other data included in this document for any purpose other than Government procurement does not in any way obligate the U.S. Government. The fact that the Government formulated or supplied the drawings, specifications, or other data does not license the holder or any other person or corporation; or convey any rights or permission to manufacture, use, or sell any patented invention that may relate to them.

This report was cleared for public release by the Air Force Research Laboratory Rome Research Site Public Affairs Office and is available to the general public, including foreign nationals. Copies may be obtained from the Defense Technical Information Center (DTIC) (http://www.dtic.mil).

AFRL-IF-RS-TR-2007-9 HAS BEEN REVIEWED AND IS APPROVED FOR PUBLICATION IN ACCORDANCE WITH ASSIGNED DISTRIBUTION STATEMENT.

FOR THE DIRECTOR:

/s/ MICHAEL J. HENSON, Capt, USAF, Work Unit Manager
/s/ WARREN H. DEBANY, Jr., Technical Advisor, Information Grid Division, Information Directorate

This report is published in the interest of scientific and technical information exchange, and its publication does not constitute the Government’s approval or disapproval of its ideas or findings.

REPORT DOCUMENTATION PAGE (Form Approved, OMB No. 0704-0188)

Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden to Washington Headquarters Service, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188) Washington, DC 20503. PLEASE DO NOT RETURN YOUR FORM TO THE ABOVE ADDRESS.

1. REPORT DATE (DD-MM-YYYY): JAN 2007
2. REPORT TYPE: Final
3. DATES COVERED (From - To): Jun 04 – Aug 06
4. TITLE AND SUBTITLE: GENESIS: A FRAMEWORK FOR ACHIEVING SOFTWARE COMPONENT DIVERSITY
5a. CONTRACT NUMBER:
5b. GRANT NUMBER: FA8750-04-2-0246
5c. PROGRAM ELEMENT NUMBER: 62301E
5d. PROJECT NUMBER: S472
5e. TASK NUMBER: SR
5f. WORK UNIT NUMBER: SP
6. AUTHOR(S): J.C. Knight, J.W. Davidson, D. Evans, A. Nguyen-Tuong and C. Wang
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): University of Virginia, 151 Engineers Way, Charlottesville VA 22904-4740
8. PERFORMING ORGANIZATION REPORT NUMBER:
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): Defense Advanced Research Projects Agency, 3701 North Fairfax Drive, Arlington VA 22203-1714; AFRL/IFGB, 525 Brooks Rd, Rome NY 13441-4505
10. SPONSOR/MONITOR'S ACRONYM(S):
11. SPONSORING/MONITORING AGENCY REPORT NUMBER: AFRL-IF-RS-TR-2007-9
12. DISTRIBUTION AVAILABILITY STATEMENT: APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED. PA# 07-017
13. SUPPLEMENTARY NOTES:
14. ABSTRACT: The Genesis project sought to provide security through the diversification of software. A major weakness with current information systems is that they use software applications that are clones of each other; a major exploitable flaw in one implies a flaw in all other similarly configured software packages. Breaking this software monoculture was the goal of the bio-inspired diversity area of DARPA’s self-regenerative systems program. The Genesis project exceeded the program’s goal of producing 100 functionally-equivalent versions of software such that no more than 33 exhibited the same deficiency. This report presents an overview of the Genesis project, the current status of the Genesis Diversity Toolkit, and future opportunities for technical transfer and research.
15. SUBJECT TERMS: Cyber Operations, Information Warfare, Information Assurance, Software Diversity, Monoculture
16. SECURITY CLASSIFICATION OF: a. REPORT: U; b. ABSTRACT: U; c. THIS PAGE: U
17. LIMITATION OF ABSTRACT: UL
18. NUMBER OF PAGES: 119
19a. NAME OF RESPONSIBLE PERSON: Capt Michael Henson
19b. TELEPHONE NUMBER (Include area code):

Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std. Z39.18

Table of Contents

1 Introduction
2 Genesis Overview
2.1 Genesis Diversity Techniques
2.2 Genesis: Strata Virtual Machine
2.3 Strong Instruction Set Randomization
2.4 Calling Sequence Diversity
2.5 Genesis Diversity Toolkit (GDT) Evaluation
2.6 Genesis Toolkit Enhancements
2.7 Recommended Configuration
3 Summary of Results
3.1 Security Benefits of Genesis
3.2 Genesis Diversity Toolkit Status
3.3 Patent Applications
3.4 Technology Transfer
3.5 Other Results
4 List of Major Publications
4.1 Website
5 Technology Transfer & Future Opportunities
5.1 Anti-Tampering Applications
5.2 Recovery
5.3 Finer-grained Diversity
6 Conclusion
7 References
Appendix A: Instruction Set Randomization
Appendix B: Calling Sequence Diversity
Appendix C: Genesis Fault Tree Analysis
Appendix D: Tamper Proofing
Appendix E: Secretless Security through Diversity
Appendix F: PHPrevent – Web Application Security through Diversity
Appendix G: Derandomizing Attacks

List of Figures

Figure 1. Genesis Diversity Toolkit Configuration Panel
Figure 2. Strata Virtual Machine Architecture
Figure 3. Sample Genesis Fault Tree
Figure 4. Strata and Strata+ISR Overhead Normalized to Native Execution (SPEC)
Figure 5. Apache Overhead Normalized to Native Execution
Figure 6. Bind Overhead Normalized to Native Execution
Figure 7. Number of Concurrent Calls

Appendix A. Instruction Set Randomization
Figure 1. Strata virtual machine virtualizing an application.
Figure 2. Runtime decryption and verification
Figure 3. Workflow for the binary rewriter Diablo
Figure 4. Diablo extension to support ISR
Figure 5. SDT overhead and SDT-ISR overhead normalized to native execution
Figure 6. Apache overhead normalized to native execution
Figure 7. Bind overhead normalized to native execution

Appendix B. Calling Sequence Diversity
Figure 1. Vulnerable function / Contents of the stack
Figure 2. Overflowing the stack
Figure 3. Returning to a legitimate call site
Figure 4. Key transformation
Figure 5. Returning to a legitimate call site with key transformation
Figure 6. Key transformation for an indirect function call
Figure 7. Intermediate language trees for foo(arg,100)
Figure 8. Modified intermediate language trees for foo(arg,100)
Figure 9. Modified intermediate language trees for (*fp)()
Figure 10. Problematic indirect call sequence
Figure 11. Saving call sequence values from Figure 10 for later
Figure 12. Key transformation for setjmp() and longjmp()
Figure 13. Overhead for SPEC benchmark suite normalized to native execution

Appendix E. Secretless Security through Diversity
Figure 1. N-Variant System Framework
Figure 2. Typical shared system call wrapper

Appendix F. PHPrevent - Web Application Security
Figure 1. Typical web application architecture

Appendix G. Derandomizing Attacks
Figure 1. Return attack
Figure 2. Jump attack
Figure 3. Incremental jump attack
Figure 4. Eliminating false positives
Figure 5. Extended attack
Figure 6. Micro VM
Figure 7. Guessing strategies
Figure 8. Time to acquire key bytes
Figure 9. Attempts per byte

1 Introduction

The overall goal of phase I of the Self-Regenerative System (SRS) program (DARPA BAA 03-44) was to develop technology for building military computing systems that could provide critical functionality at all times, in spite of damage caused by unintentional errors or attacks.

A major problem today is that of our software monoculture. Critical infrastructure software applications such as web servers, database servers, routers, and name resolution servers, to name only a few, are all shipped identically. An exploitable vulnerability present in one deployed software application strongly implies an exploitable flaw in all copies of that application. This situation provides adversaries with an overwhelming advantage and is very serious because it multiplies the impact of any vulnerability by the number of machines running the software that contains the vulnerability. Once a vulnerability is exposed, adversaries seek out machines that are using the software with which the vulnerability is associated and proceed to exploit the vulnerability. Thus, the software monoculture enables the spread of both worms, i.e., self-replicating malicious code, and attacks that target specific servers.

Drawing inspiration from biological systems in which genetic diversity provides immunity against a broad range of disease, the Genesis project sought to reproduce the genetic diversity found in nature by deliberately and systematically introducing diversity into software components. The basic idea was that while the phenotype (functional behavior) of software components would be similar, the resulting genotypes would contain enough variations to protect software applications against a broad class of attacks, including both self-replicating and directed attacks.
In the past, the application of diversity to critical systems was severely limited because diverse versions were produced, for the most part, using traditional, resource-intensive methods. Creating two diverse web servers, for example, involved actually writing both implementations. Clearly, this approach would not yield a large number of diverse versions unless unrealistic amounts of resources were available. The Genesis project sought machine transformation techniques to automate the task of creating large numbers of program variants.

The success metric specified in the SRS program was to automatically produce 100 diverse but functionally equivalent versions of a software component such that no more than thirty-three versions of a component shared the same deficiency. We exceeded this goal through the use of novel program transformation techniques coupled with advances in virtual machine technology, with demonstrated good performance on a range of real-world and critical applications.

2 Genesis Overview

In Genesis, we took a biologically inspired approach to diversity: we investigated the two fundamental aspects of computation, state and state change, and introduced diversity systematically and comprehensively into both. In practice, by “state” we mean the data upon which a computation operates, and by “state change” we mean the changes effected by some interpreter (a hardware entity or a software interpreter) in response to a set of instructions. We took a very general view of these two notions so that some entities were viewed as part of a state at one point and as being involved in state change at a different point. For example, machine instructions were part of the operating state of a compiler, i.e., data, but they controlled an interpreter during program execution, i.e., instructions.
Furthermore, we took a multi-hierarchical and composable view of diversity in which we combined transformations from different phases of a program’s lifecycle, from compile-time all the way to execution-time.

The Genesis project was implemented as the Genesis Diversity Toolkit, henceforth called the GDT. The GDT was a collection of compile-time, link-time, run-time, and post-processing tools that allowed diversification of C and C++ software. The Genesis toolkit included the following components:
• Zephyr, a compiler infrastructure developed at the University of Virginia.
• Diablo, an open source static binary rewriter developed at Ghent University in Belgium.
• Strata, an application-level virtual machine developed at the University of Virginia, along with several modules to effect dynamic diversity techniques.

2.1 Genesis Diversity Techniques

The GDT supported the following diversity techniques:
• Address Space Randomization (ASR). ASR was a link-time option, whereby the static (uninitialized and initialized) data segments were offset by a random amount. This coarse-grained technique obfuscated the location of critical variables.
• Stack Space Randomization (SSR). This technique randomized the padding between stack frames.
• Simple Execution Randomization (SER). This technique used a simple XOR encoding of a binary executable. It was mainly a proof-of-concept implementation and was superseded by Strong Instruction Set Randomization.
• Strong Instruction Set Randomization (SISR). This technique protected applications against both known and unknown code-injection attacks.
• Calling Sequence Diversity (CSD). This technique modified the calling convention of functions to incorporate a hidden extra argument whose value was both generated at run-time and dependent on the history of the calling context. This technique defended against return-to-libc attacks [Nergal01].
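The hidden-argument idea behind Calling Sequence Diversity can be illustrated with a small Python model. This is only a sketch under stated assumptions: the names `CallingContext`, `protect`, and `UnauthorizedCall`, and the linear-congruential mixing step, are hypothetical and chosen for illustration. The key transformation actually used by Genesis is described in Appendix B and is applied to native code, which this model does not attempt to reproduce.

```python
import secrets

MASK32 = 0xFFFFFFFF


class UnauthorizedCall(Exception):
    """Raised when a callee sees a hidden argument it did not expect."""


class CallingContext:
    """Run-time key that evolves with the sequence of call sites (hypothetical model)."""

    def __init__(self) -> None:
        # The initial value is generated at run time, so it differs per process.
        self._key = secrets.randbits(32)

    def advance(self, call_site_id: int) -> int:
        """Fold the call site into the key and return the value to pass to the callee."""
        # Illustrative mixing step only; not the transformation used by Genesis.
        self._key = (self._key * 1103515245 + call_site_id + 12345) & MASK32
        return self._key

    def expected(self) -> int:
        """Value the callee expects to receive as its hidden argument."""
        return self._key


def protect(ctx, func):
    """Wrap func so that it checks a hidden first argument before running."""
    def checked(hidden_key, *args):
        if hidden_key != ctx.expected():
            raise UnauthorizedCall(func.__name__)
        return func(*args)
    return checked


ctx = CallingContext()
run_command = protect(ctx, lambda cmd: f"ran {cmd}")

# A legitimate call site advances the key and passes it along.
legit_key = ctx.advance(call_site_id=7)
result = run_command(legit_key, "ls")
```

In this model, a caller that redirects control to `run_command` without knowing the current key has to guess a 32-bit value, and a wrong guess raises `UnauthorizedCall` instead of running the function.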
The GDT provided defense-in-depth by allowing application developers to select and compose various techniques. Note that the first three techniques, ASR, SSR, and SER, provided only a limited amount of entropy relative to SISR and CSD. However, attack code tends to be fragile, and even small perturbations in the execution environment can thwart attacks.

Figure 1 illustrates the various configuration options for the Genesis toolkit. Developers could compose various techniques, specify various configuration parameters, and generate an arbitrary number of software variants. In practice, these various options were set via standard build scripts, e.g., makefiles.

[Figure 1. Genesis Diversity Toolkit Configuration Panel]

Next we provide an overview of the Strata Virtual Machine and its role in the implementation of Strong Instruction Set Randomization and Calling Sequence Diversity.

2.2 Genesis: Strata Virtual Machine

At the core of our approach was Strata, a software dynamic translator (SDT) that implemented an application-level virtual machine. Strata was a small, efficient run-time execution environment that hosted, monitored and ran applications. Strata could affect an executing program by injecting new code, modifying some existing code, or controlling the execution of the program in some way.

Strata dynamically loads an application and mediates application execution by examining and translating an application’s instructions before they execute on the host CPU (Figure 2). Strata essentially operates as a co-routine with the application that it is protecting. Translated application instructions are held in a Strata-managed cache called the fragment cache.

[Figure 2. Strata Virtual Machine Architecture. Diagram nodes: Context Capture, New PC, Cached?, New Fragment, Fetch, Decode, Translate, Next PC, Finished?, Context Switch, Host CPU (Executing Translated Code from Cache)]
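The translate-and-cache loop of a software dynamic translator can be sketched with a toy interpreter. This is a minimal sketch, not Strata itself: the three-instruction program format, the `translate` and `run` functions, and the `state` dictionary are all assumptions made for illustration, and real fragments hold translated machine code rather than Python closures.

```python
def translate(program, pc):
    """Fetch and decode the instruction at pc, then build a fragment for it.

    A fragment is a callable that updates the state and returns the next PC
    (or None when the program has finished).
    """
    op, *args = program[pc]                  # fetch + decode
    if op == "add":
        (n,) = args

        def fragment(state):
            state["acc"] += n
            return pc + 1
    elif op == "jump_if_lt":
        limit, target = args

        def fragment(state):
            return target if state["acc"] < limit else pc + 1
    elif op == "halt":

        def fragment(state):
            return None
    else:
        raise ValueError(f"invalid opcode: {op!r}")
    return fragment


def run(program):
    """Execute program through a fragment cache, translating each PC at most once."""
    fragment_cache = {}                      # PC -> translated fragment
    state = {"acc": 0, "translations": 0, "executed": 0}
    pc = 0
    while pc is not None:                    # the "Finished?" check
        fragment = fragment_cache.get(pc)    # the "Cached?" check
        if fragment is None:
            fragment = translate(program, pc)
            fragment_cache[pc] = fragment
            state["translations"] += 1
        state["executed"] += 1
        pc = fragment(state)                 # run cached code, obtain the next PC
    return state


# Toy loop: increment the accumulator until it reaches 5, then halt.
PROGRAM = [("add", 1), ("jump_if_lt", 5, 0), ("halt",)]
final_state = run(PROGRAM)
```

Each instruction in the toy loop is translated once but executed many times, which is the reason a fragment cache keeps translation overhead manageable. The translation step is also the natural hook for the diversity checks described later, since every instruction passes through it before reaching the host.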
The Strata virtual machine (VM) is first entered by capturing and saving the application context (e.g., program counter (PC), condition codes, and registers). Following context capture, Strata processes the next application instruction. If a translation for this instruction has been cached, a context switch restores the application context and begins executing cached translated instructions on the host CPU. Otherwise, Strata fetches, decodes, and translates the instruction into a new fragment before executing it (Figure 2).

In the case of the GDT, Strata was used to support important run-time features of software diversity, including dynamic code encryption/decryption and calling sequence diversity.

2.3 Strong Instruction Set Randomization

We provide a general overview of Instruction Set Randomization (ISR). A detailed description is provided in Appendix A.

The main idea behind ISR for defending against any type of code-injection attack is to create and use a process-specific instruction set that is created by a randomization algorithm. Code injected by an attacker who does not know the randomization key will be invalid for the randomized processor, thereby thwarting the attack. Such an approach is known as randomized instruction-set emulation (RISE) or instruction-set randomization (ISR) [Barrantes05, Kc03].

The basic operation of an ISR system is as follows. An encryption algorithm (typically XOR’ing the instruction with a key) is applied statically to an application binary to encrypt the instructions. The encrypted application is executed by an augmented emulator (e.g., Valgrind [Nethercote04] or Bochs [Lawton96]). The emulator is augmented to decrypt the application’s instructions before they are executed. When an attacker exploits a vulnerability to inject code, the injected code is also decrypted before emulation. Unless the attacker knows the encryption key/process, the resulting code will be transformed into, in essence, a random stream of bytes that, when executed, will raise an exception (e.g., invalid opcode or illegal address).
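The basic XOR-based ISR scheme can be sketched in a few lines of Python. This is an illustrative model only: the four-opcode instruction set, the `emulate` function, and the fixed `KEY` are assumptions made for the example, and the sketch shows the simple XOR scheme of prior work rather than the AES-based scheme used by Genesis.

```python
VALID_OPCODES = {0x01: "load", 0x02: "add", 0x03: "store", 0x04: "halt"}  # toy instruction set


class InvalidOpcode(Exception):
    """Raised by the toy emulator when a decrypted byte is not an instruction."""


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; the same call encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def emulate(memory: bytes, key: bytes) -> list:
    """Decrypt every byte before executing it, as an augmented emulator would."""
    trace = []
    for byte in xor_bytes(memory, key):
        if byte not in VALID_OPCODES:
            raise InvalidOpcode(hex(byte))
        trace.append(VALID_OPCODES[byte])
    return trace


KEY = bytes([0xA5, 0x5A, 0x3C, 0xC3])        # fixed here for reproducibility only
program = bytes([0x01, 0x02, 0x03, 0x04])    # load, add, store, halt
encrypted = xor_bytes(program, KEY)          # applied statically, before execution

trace = emulate(encrypted, KEY)              # legitimate code decrypts correctly
injected = bytes([0x01, 0x02, 0x03, 0x04])   # attacker's code, never encrypted
```

Calling `emulate(injected, KEY)` raises `InvalidOpcode`, because decrypting unencrypted bytes scrambles them. With a randomly chosen key and such a small toy instruction set, scrambled bytes could occasionally decode to valid opcodes, so detection in this simple scheme is probabilistic.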
The security of ISR in general depends on several key factors: the strength of the encryption process, the protection of the encryption key, the security of the underlying execution process, and the assurance that injected code, once decrypted, will raise an exception when executed. The practicality of the approach is affected by the overheads in execution time and space introduced by the encryption and decryption process.

Our implementation of ISR using the Strata Virtual Machine improved upon the prior art in three important ways:
• We used a strong randomization algorithm, the Advanced Encryption Standard (AES).
• We demonstrated that ISR using AES could be implemented practically and efficiently without requiring special hardware support.
• Our approach detected malicious code before its execution. Previous approaches had relied on probabilistic arguments that execution of non-randomized foreign code would eventually cause a fault or runtime exception.

2.4 Calling Sequence Diversity

While code-injection attacks constitute the overwhelming majority of attacks today, other forms of attack exist that do not require the execution of foreign exploit code. For example, in a return-to-libc attack, an attacker supplies malicious arguments to existing library functions with disastrous consequences. Supplying “/bin/sh” to the system() function, for instance, will execute a shell and provide an attacker with full-featured access to the target host.

The typical return-to-libc exploit is possible because an attacker is able to disrupt the intended control flow of the target program through manipulation of the return address (often through a buffer overflow vulnerability). Note that such an attack may be thwarted by the Address Space Randomization or Stack Space Randomization techniques. However, this style of attack critically depends on the attacker’s knowledge of the calling convention.
Calling Sequence Diversity provides a secure calling convention that prevents unauthorized invocation of potentially malicious functions. Our approach to developing such a calling convention was to require a hidden parameter that was checked by the called function. Since attackers do not know the value of this parameter, they cannot execute the function successfully. Strata was used to automatically and dynamically insert and check this random key to thwart return-to-libc attacks. For more details, refer to Appendix B, which incorporates a writeup of this technique.

2.5 Genesis Diversity Toolkit (GDT) Evaluation

This section presents an overview of the security and performance evaluation of the Genesis toolkit.

2.5.1 Security Evaluation

[Figure 3. Sample Genesis Fault Tree. Top event: integrity violation of the code stream allows injected code from an attack to achieve goal G in the Strata system and its application.]

To analyze and demonstrate the strength and soundness of Genesis, we performed several experiments in which we ran applications with known vulnerabilities under control of Genesis. We then ran the associated exploits on hundreds of variants generated by the GDT. Example vulnerabilities included buffer overflows and format string vulnerabilities targeting both the heap and the stack. The success rate for ISR (for code-injection attacks) and for Calling Sequence Diversity (for return-to-libc

In the past, the application of diversity to critical systems has been severely limited because diverse versions were, for the most part, produced using traditional, resource-intensive methods. Creating two diverse web servers, for example, involved actually writing both implementations. Clearly, this approach would not yield a large number of diverse versions unless unrealistic amounts of resources were available. The Genesis project sought machine transformation techniques to automate the task of creating a large number of program variants.

The success metric specified in the SRS program was to automatically produce 100 diverse but functionally equivalent versions of a software component such that no more than thirty-three versions of the component shared the same deficiency. We exceeded this goal through the use of novel program transformation techniques coupled with advances in virtual machine technology, with demonstrated good performance on a range of real-world and critical applications.

2 Genesis Overview

In Genesis, we took a biologically inspired approach to diversity in which we investigated the two fundamental aspects of computation, state and state change, and introduced diversity systematically and comprehensively to both. In practice, by “state” we mean the data upon which a computation operates, and by “state change” we mean the changes effected by some interpreter (a hardware entity or a software interpreter) in response to a set of instructions. We took a very general view of these two notions, so that some entities were viewed as part of a state at one point and as being involved in state change at a different point. For example, machine instructions were part of the operating state of a compiler, i.e., data, but they controlled an interpreter during program execution, i.e., instructions.
Furthermore, we took a multi-hierarchical and composable view of diversity in which we combined transformations from different phases of a program’s lifecycle, from compile-time all the way to execution-time.

The Genesis project was implemented as the Genesis Diversity Toolkit, henceforth called the GDT. The GDT was a collection of compile-time, link-time, run-time, and post-processing tools that allowed diversification of C and C++ software. The Genesis toolkit included the following components:
• Zephyr, a compiler infrastructure developed at the University of Virginia.
• Diablo, an open source static binary rewriter developed at Ghent University in Belgium.
• Strata, an application-level virtual machine developed at the University of Virginia, along with several modules to effect dynamic diversity techniques.

2.1 Genesis Diversity Techniques

The GDT supported the following diversity techniques:
• Address Space Randomization (ASR). ASR was a link-time option whereby the static (uninitialized and initialized) data segments were offset by a random amount. This coarse-grained technique obfuscated the location of critical variables.
• Stack Space Randomization (SSR). This technique randomized the padding between stack frames.
• Simple Execution Randomization (SER). This technique used a simple XOR encoding of a binary executable. It was mainly a proof-of-concept implementation and has been superseded by Strong Instruction Set Randomization.
• Strong Instruction Set Randomization (SISR). This technique protected applications against both known and unknown code-injection attacks.
• Calling Sequence Diversity (CSD). This technique modified the calling convention of functions to incorporate a hidden extra argument whose value is both generated at run-time and dependent on the history of the calling context. This technique defended against return-to-libc attacks [Nergal01].
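The hidden-argument idea behind Calling Sequence Diversity can be illustrated with a minimal Python sketch. It is not the GDT implementation: in Genesis the key is handled transparently at the calling-convention level, whereas here it is an explicit parameter. The names CallingContext, run_command, and legitimate_caller are hypothetical, and chaining the key with SHA-256 is an assumption standing in for the report's key transformation.

```python
import hashlib
import secrets
from typing import Optional


class UnauthorizedCall(Exception):
    """Raised when a function is entered without the expected hidden key."""


class CallingContext:
    """Hypothetical run-time key that depends on the history of call sites."""

    def __init__(self) -> None:
        # Fresh random state per process: an attacker cannot know it in advance.
        self._state = secrets.token_bytes(16)

    def advance(self, call_site: str) -> bytes:
        # Fold the call site into the state so the key reflects call history.
        # (SHA-256 chaining is an illustrative assumption.)
        digest = hashlib.sha256(self._state + call_site.encode()).digest()
        self._state = digest[:16]
        return self._state

    def matches(self, key: Optional[bytes]) -> bool:
        return isinstance(key, bytes) and secrets.compare_digest(key, self._state)


CONTEXT = CallingContext()


def run_command(command: str, hidden_key: Optional[bytes] = None) -> str:
    # Callee-side check: refuse to run unless the caller supplied the current key.
    if not CONTEXT.matches(hidden_key):
        raise UnauthorizedCall("hidden key missing or stale")
    return f"would run: {command}"


def legitimate_caller() -> str:
    # Caller-side instrumentation: derive the key for this call site, pass it on.
    key = CONTEXT.advance("legitimate_caller:1")
    return run_command("ls", key)
```

A call that arrives through the instrumented path succeeds, while a direct call without the key (the analogue of a hijacked return into the function) raises UnauthorizedCall. A key captured earlier is also rejected once the context has advanced, which is the point of making the value history-dependent.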
The GDT provided defense-in-depth by allowing application developers to select and compose among various techniques. Note that the first three techniques, ASR, SSR, and SER, provided only a limited amount of entropy relative to SISR and CSD. However, attack code tends to be fragile and even small perturbations in the execution environment will thwart attacks.

Figure 1 illustrates the various configuration options for the Genesis toolkit. Developers could compose various techniques, specify various configuration parameters, and generate an arbitrary number of software variants. In practice, these various options were set via standard build scripts, e.g., makefiles.

Figure 1. Genesis Diversity Toolkit Configuration Panel

Next we provide an overview of the Strata Virtual Machine and its role in the implementation of Strong Instruction Set Randomization and Calling Sequence Diversity.

2.2 Genesis: Strata Virtual Machine

At the core of our approach was Strata, a software dynamic translator (SDT) that implemented an application-level virtual machine. Strata was a small, efficient run-time execution environment that hosted, monitored and ran applications. Strata could affect an executing program by injecting new code, modifying some existing code, or controlling the execution of the program in some way.

Figure 2. Strata Virtual Machine Architecture (flow: context capture, new PC, cached?, new fragment, fetch, decode, translate, finished?, next PC, context switch, then execution of translated code from the cache on the host CPU)

Strata dynamically loads an application and mediates application execution by examining and translating an application’s instructions before they execute on the host CPU (Figure 2). Strata essentially operates as a co-routine with the application that it is protecting. Translated application instructions are held in a Strata-managed cache called the fragment cache.
The Strata virtual machine (VM) is first entered by capturing and saving the application context (e.g., program counter (PC), condition codes, registers, etc.). Following context capture, Strata processes the next application instruction. If a translation for this instruction has been cached, a context switch restores the application context and begins executing cached translated instructions on the host CPU.

In the case of the GDT, Strata was used to support important run-time features of software diversity, including dynamic code encryption/decryption and calling sequence diversity.

2.3 Strong Instruction Set Randomization

We provide a general overview of Instruction Set Randomization (ISR). A detailed description is provided in Appendix A.

The main idea behind ISR for defending against any type of code-injection attack is to create and use a process-specific instruction set that is produced by a randomization algorithm. Code injected by an attacker who does not know the randomization key will be invalid for the randomized processor, thereby thwarting the attack. Such an approach is known as randomized instruction-set emulation (RISE) or instruction-set randomization (ISR) [Barrantes05, Kc03].

The basic operation of an ISR system is as follows. An encryption algorithm (typically XOR’ing the instruction with a key) is applied statically to an application binary to encrypt the instructions. The encrypted application is executed by an augmented emulator (e.g., Valgrind [Nethercote04] or Bochs [Lawton96]). The emulator is augmented to decrypt the application’s instructions before they are executed. When an attacker exploits a vulnerability to inject code, the injected code is also decrypted before emulation. Unless the attacker knows the encryption key/process, the resulting code will be transformed into, in essence, a random stream of bytes that, when executed, will raise an exception (e.g., invalid opcode, illegal address, etc.).
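The basic XOR-based ISR operation described above can be sketched in a few lines of Python. This is an illustration of the general scheme only, not of the Strata implementation; the function name xor_transform, the 16-byte key length, and the sample byte strings are assumptions chosen for the example.

```python
import os


def xor_transform(code: bytes, key: bytes) -> bytes:
    """XOR each byte with a repeating key.

    The same function encrypts and decrypts, because (b ^ k) ^ k == b.
    """
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(code))


def demo() -> tuple:
    # Process-specific random key (illustrative length).
    key = os.urandom(16)

    # Stand-in bytes for legitimate application instructions.
    program = b"\x55\x48\x89\xe5\x5d\xc3"
    # Static step: the binary is stored in encrypted form.
    stored = xor_transform(program, key)

    # Stand-in bytes for code injected at run time; it was never encrypted.
    injected = b"\x90\x90\x90\xcc"

    # Run-time step: the emulator decrypts everything it is about to execute.
    executed_legit = xor_transform(stored, key)       # original instructions
    executed_injected = xor_transform(injected, key)  # scrambled bytes
    return program, executed_legit, injected, executed_injected


if __name__ == "__main__":
    program, legit, injected, scrambled = demo()
    print("legitimate code restored:", legit == program)
    print("injected code after decryption:", scrambled.hex())
```

Legitimate code round-trips through encryption and decryption unchanged, while injected bytes pass through decryption only and so reach the processor scrambled by the key.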
The security of ISR in general depends on several key factors: the strength of the encryption process, the protection of the encryption key, the security of the underlying execution process, and the assumption that the decrypted code will, when executed, raise an exception. The practicality of the approach is affected by the overheads in execution time and space introduced by the encryption and decryption process.

Our implementation of ISR using the Strata Virtual Machine improved upon the prior art in three important ways:
• We used a strong randomization algorithm, the Advanced Encryption Standard (AES).
• We demonstrated that ISR using AES could be implemented practically and efficiently without requiring special hardware support.
• Our approach detected malicious code before its execution. Previous approaches had relied on probabilistic arguments that execution of non-randomized foreign code would eventually cause a fault or runtime exception.

2.4 Calling Sequence Diversity

While code-injection attacks constitute the overwhelming majority of attacks today, other forms of attack exist that do not require the execution of foreign exploit code. For example, in a return-to-libc attack, an attacker supplies malicious arguments to existing library functions with disastrous consequences. Supplying “/bin/sh” to the system() function, for instance, will execute a shell and provide an attacker with full-featured access to the target host.

The typical return-to-libc exploit is possible because an attacker is able to disrupt the intended control flow of the target program through manipulation of the return address (often through a buffer overflow vulnerability). Note that such an attack may be thwarted by the Address Space Randomization or Stack Space Randomization techniques. However, this style of attack critically depends on the attacker’s knowledge of the calling convention.
Calling Sequence Diversity provides a secure calling convention that prevents unauthorized invocation of potentially malicious functions. Our approach to developing such a calling convention was to require a hidden parameter that was checked by the called function. Since attackers do not know the value of this parameter, they cannot execute the function successfully. Strata was used to automatically and dynamically insert and check this random key to thwart return-to-libc attacks. For more details, refer to Appendix B, which contains a write-up of this technique.

2.5 Genesis Diversity Toolkit (GDT) Evaluation

This section presents an overview of the security and performance evaluation of the Genesis toolkit.

2.5.1 Security Evaluation

Figure 3. Sample Genesis Fault Tree (root event: integrity violation of the code stream allows injected code from an attack to achieve goal G in the Strata system and its application)

To analyze and demonstrate the strength and soundness of Genesis, we performed several experiments in which we ran applications with known vulnerabilities under the control of Genesis. We then ran the associated exploits on hundreds of variants generated by the GDT. Example vulnerabilities included buffer overflows and format string vulnerabilities targeting both the heap and the stack. The success rate for ISR (for code-injection attacks) and for Calling Sequence Diversity (for return-to-libc
