Project Summary (CBOOP/X)

[hr]
Project: [blue] Component Based Object Oriented Programming
(CBOOP)[black]

SubProject: [blue]X Programming Language[black]

Author: [blue]Sam Caldwell
(c) March 2005.[black]
[hr]

[b][blue]INTRODUCTION: Problem Statement.[/b][black]

We begin with the intent to create a new programming paradigm that extends and improves on the Object Oriented Programming (OOP) paradigm and replaces the flawed Component Object Model (COM). Our aim is to permanently merge code and data into a new component structure that is stored in a single file format (called the Component Executable, or CXE, format).

Over time, programmers have evolved languages with more complex grammars and made increasingly flexible use of libraries of existing code to leverage one another's work. CBOOP is intended to step further in this same direction, making the component an easy-to-construct network of modular, interdependent structures. The division between library, executable and data file will be eliminated.

The trend toward modular, reusable, interoperable components has gained ground with the Component Object Model (COM). COM allowed programmers to create software and then replace one piece at a time without rebuilding the entire house. As COM evolved, programmers sought to make code which could run on multiple platforms, sharing resources. The result was the Distributed Component Object Model (DCOM), a new dimension in scalability. Unfortunately, COM and DCOM fail to consider the security risks associated with their usage. Further, they lack any native language support and are cumbersome at best.

Though COM and DCOM may be implemented in virtually any language, the hoops a programmer must jump through are unnecessarily extensive and increase the chance of error. Further, since DCOM allows programs on one machine to execute code on another machine via Remote Procedure Call (RPC), the lack of effective security has led to many serious, recurring security exploits over the years. Successive patches have failed to resolve these issues, due to fundamental flaws in the model itself.

[b][blue]INTRODUCTION: The New Paradigm[/b][black]
CBOOP aims to approach the same component-software end from a new angle. Component-Based Object-Oriented Programming (CBOOP) is a language-based paradigm that extends Object Oriented Programming (OOP) in a way which is more consistent and powerful. In fact, components are called "public classes," while internal OOP classes are "private classes."

Software developed with CBOOP will not be created as one executable, supported by libraries with separate data files. Rather, CBOOP software is a network of small, reusable, interdependent component-objects. Each is contained in a Component Executable (CXE) file maintaining its own data. Code and data are merged permanently to improve portability and security.

In theory a software package could be composed of peripheral tools and a core component. Each stores its own parameter information internally, where corruption and misuse are less likely. One such "peripheral" might store the software package's product data, the actual result of its operations. This is akin to a Microsoft Word document file containing both document data and the routines necessary to store and retrieve that data. Other components then link to this component-document at run-time to provide other features--such as editing and printing.

CBOOP implements extensive security through an Object Communication Protocol (OCP) handled by the OSloader. This protocol maintains a working relationship between components, authenticating components by their Universal Identifiers (UIDs). A UID is an MD5 hash of a given component, interface or object.

Writing CBOOP software will be simpler than in OOP. OOP programmers have had to fit a round peg (OOP classes) into square holes (structured, linear executables). The CBOOP model eliminates the traditional main() routine. Instead, a CXE file can contain multiple components, the constructor for each being a separate launch point for the file. But to do this requires a new language.

[b][blue]Introduction: The New Language (X)[/b][black]

The language is based on C++. Originally dubbed "c-squared," the language has been streamlined and cut down, warranting a more appropriate name, "X," since the language has been reduced to a system of expressions. X is a very simple and very scalable language; its compiler is likewise to be extremely modular and scalable.

X is actually two languages: algorithmic X and directive X. The first produces executable code, i.e. algorithms, to solve problems. The second directs the compiler on how to manipulate and process the algorithmic code, so as to create new levels of abstraction, such as templates.

For this discussion we are concerned with "algorithmic X," to which we refer simply as X. Under X, there are no statements. Every structure is an expression. Components are defined as "public classes," as opposed to private OOP classes which are internal to the components that use them.

As classes, components are similar to OOP classes. Both have constructors and destructors. Both have functions to avail their features. Both provide for encapsulation, inheritance and overloading through their respective mechanisms. Yet the operation and terminology are a bit different. Publicly exposed component functions are called "interfaces," whereas privately scoped functions are called "methods." Further, though private classes can expose data directly to other classes by declaring the data objects public "properties" of the class, components are not allowed to expose data directly. Instead, a component may only expose its data through interfaces (public functions).

An example of the X language component definition would appear as follows--

[code]
public class somecomponent;
[/code]

Consistently, an internal class would appear as--

[code]
private class someOOPclass;
[/code]

Both are complete declarations. somecomponent and someOOPclass are identifiers now known to the compiler as component and internal class, respectively. However both are unusable at this point, since they lack definition. We define the class using the link (<<) operator. This could occur during the declaration, as follows:

[code]
public class somecomponent
<< {...[code block]...};
[/code]

to fully define somecomponent. Or the programmer could state a declaration first, as follows:

[code]
public class somecomponent;
[/code]

then later specify its structure:

[code]
somecomponent << {...[code block]...};
[/code]

Notice here that the link operator connects a code block and class identifier. Both means of doing this achieve the same end. Class and << are operators in these expressions. {...} is also an operator. All other identifiers in these expressions are considered to be operands (data manipulated by the operators).

One feature of X is its consistency. As everything is an expression we know that the compiler will find only two types of tokens: delimiters and identifiers. Anything that is not a delimiter is an identifier.
Delimiters are symbols which separate identifiers into discrete units. There are literal and contextual delimiters. Literal delimiters include line remarks ("//...[CRLF]"), block remarks ("/*...*/"), quotation marks, semicolons (";") and whitespace. Contextual delimiters are less clear. These are the instances where grammar causes the separation of two character sequences into separate tokens. For example, if "+" is an operator and the compiler sees "a+b", "a" and "b" are separated into separate tokens from "+" due to context.

Identifiers in X are either operators or operands. In the earlier example, "a+b" was delimited because "+" was defined as a known operator-identifier. Even if "a" and "b" are unknown, because "+" is known, the two unknowns can be parsed as the Left Hand and Right Hand sides (LHS and RHS, respectively).
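
The delimiter rules above can be sketched in C++, the language X is based on. This is only a minimal illustration and not the planned Tokenizer: the function name tokenize and the single-character operator set are assumptions made for the example, remarks and quotation marks are not handled, and a multi-character operator such as << would need a longest-match rule. The semicolon is kept as a token here so that the end of an expression stays visible.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: split X-like source text into tokens.
// Whitespace and ';' are literal delimiters; any character found in
// `operators` acts as a contextual delimiter and becomes its own token.
std::vector<std::string> tokenize(const std::string& src,
                                  const std::set<char>& operators) {
    std::vector<std::string> tokens;
    std::string current;
    auto flush = [&]() {
        if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    };
    for (char ch : src) {
        if (std::isspace(static_cast<unsigned char>(ch))) {
            flush();                          // literal delimiter, discarded
        } else if (ch == ';' || operators.count(ch) > 0) {
            flush();                          // ends the pending identifier
            tokens.push_back(std::string(1, ch));
        } else {
            current.push_back(ch);            // part of an identifier
        }
    }
    flush();
    return tokens;
}
```

With '=' and '+' registered as operators, the text "x=a+b;" separates into the six tokens x, =, a, +, b and ; even though it contains no whitespace.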

Operators are identifiers which manipulate values passed by operands. Operands are objects (public or private) which are acted upon by operators. Every operator then has two operands (explicit or implicit): the LHS and the RHS. The result of an operation is the return value, which is passed to the left by default but does not by default change the value of the LHS operand. Accordingly, in the X expression

[code]
x = a + b;
[/code]

the addition of a and b does not alter a. Instead,
the sum is the RHS operand for the assignment operator
("="), which is coded to first assign the RHS value to
its LHS operand and then return the RHS value to the
left (if possible).


[code]
[NOTE:
For the compiler design this means X source
code can be parsed into a binary tree structure!

An identifier is then represented by a node in
the tree, its LHS and RHS are the left and
right downward branches and the return value
is implied as being passed up the tree through
the PREV link.
]
[/code]
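
To make the note concrete, here is a hedged C++ sketch of such a tree for x = a + b. The Node layout, the helper names leaf and binary, and the integer-only evaluate function are assumptions made for illustration; they are not part of the X design.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node layout: one token with LHS, RHS and PREV links.
struct Node {
    std::string token;
    std::unique_ptr<Node> lhs;
    std::unique_ptr<Node> rhs;
    Node* prev = nullptr;   // non-owning link to the parent node
};

std::unique_ptr<Node> leaf(const std::string& token) {
    auto n = std::make_unique<Node>();
    n->token = token;
    return n;
}

std::unique_ptr<Node> binary(const std::string& op,
                             std::unique_ptr<Node> lhs,
                             std::unique_ptr<Node> rhs) {
    auto n = leaf(op);
    lhs->prev = n.get();
    rhs->prev = n.get();
    n->lhs = std::move(lhs);
    n->rhs = std::move(rhs);
    return n;
}

// Evaluate a subtree; the returned value is what travels up the PREV link.
int evaluate(const Node& n, std::map<std::string, int>& vars) {
    if (n.token == "+") {
        return evaluate(*n.lhs, vars) + evaluate(*n.rhs, vars);
    }
    if (n.token == "=") {
        int value = evaluate(*n.rhs, vars);
        vars[n.lhs->token] = value;   // assign the RHS value to the LHS operand
        return value;                 // then return it to the left
    }
    return vars.at(n.token);          // operand: look up its current value
}
```

Building binary("=", leaf("x"), binary("+", leaf("a"), leaf("b"))) with a = 2 and b = 3 evaluates to 5, stores 5 in x and leaves a unchanged.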

There is another neat aspect of X: objects are
represented by identifiers, not bound by them. An
object identifier (whether operator or operand)
represents the object only so long as it is linked to
the object. An object identifier tracks the ADDRESS
in memory where the object is stored and the size of
the object's footprint in memory. X is strongly typed
but not strongly linked. Invoking an operator, then,
causes the operator to take as input the references to
its operands (LHS and RHS), NOT THE VALUES! The
operator can then directly affect or protect the
operands. The operator's return value is likewise a
reference to some temporary memory location allocated
for the purpose and later recovered through garbage
collection.

Since object identifiers represent the memory location
of their respective operators and operands, X permits
class and operator definitions a new level of freedom.
For instance, a class could be declared as follows:

[code]
public class foo;
[/code]

Then it could be defined--

[code]
foo << {...[code block]...};
[/code]

as we witnessed earlier. But X also allows the
programmer to state later in the code

[code]
foo << {...[more code]...};
[/code]

and assuming foo was not deleted (i.e. the original
definition was not erased), "more code" would add
features to those given by "code block." Moreover,
since {...} is a "group operator" (which we haven't
discussed) that returns the memory location for the
contained code block, we could also define another
class--

[code]
public class moo << {...[something else]...};
[/code]
and link moo and foo together like this--

[code]
moo << foo;
[/code]

The result is a moo class which inherits the foo class
as it stood at its last definition before it was
linked into moo.
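
One way to picture this behavior is the following C++ sketch. It is a model only, assuming that a class definition can be treated as an ordered list of code blocks; the names ClassDef, Block, Features and instantiate are invented for the example and say nothing about how an X compiler would really store definitions.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Features = std::vector<std::string>;
using Block = std::function<void(Features&)>;

// Hypothetical model: a class definition is an ordered list of code blocks.
struct ClassDef {
    std::vector<Block> blocks;

    // Model of "cls << {...}": append one more code block.
    ClassDef& operator<<(Block block) {
        blocks.push_back(std::move(block));
        return *this;
    }

    // Model of "moo << foo": copy foo's blocks as they exist right now.
    ClassDef& operator<<(const ClassDef& base) {
        blocks.insert(blocks.end(), base.blocks.begin(), base.blocks.end());
        return *this;
    }

    // Model of instantiation: run every linked block in order.
    Features instantiate() const {
        Features features;
        for (const auto& block : blocks) block(features);
        return features;
    }
};
```

In this model, a block linked to foo after moo << foo has run does not reach moo, which matches the snapshot behavior described above.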

This brings up another interesting point about the X
language: class definitions contain expressions.
Therefore, they are algorithmic descriptions of the
construction of a class instance. When a class is
instantiated its constructor is launched. Implicitly,
the code produced by the class definition blocks is
the constructor's preamble. Its new and delete
operators are constructor and destructor,
respectively.

Class instances in X are created using the following
format--

[code]
instance new classname;
[/code]

A class may also have a special array operator that is
invoked with new, as follows--

[code]
instance new classname array somenumber;
[/code]

Example:
[code]
n new int array 5;
[/code]

creates a 5-element integer array whose memory
location is stored in some point represented by n. We
can retrieve that location by invoking the ampersand
("&") operator as follows:

[code]
p = &n;
[/code]

where p represents some properly declared object.

[Note: p is declared as follows...
array p;
to be an unknown array object.
]

Inheritance of classes, demonstrated earlier, with the
link operator << also affects the definition of
operators in X. Remember, X has NO FUNCTIONS OR
PROCEDURES, only operators. An operator is defined
as--

[returntype]operator([LHStype][operatorid][RHStype]);


Example:
[code]
int operator (int + int);
[/code]

where int (an integer data type) is the LHS and RHS of
a new operator (+) having an integer return type.
Here the parentheses are a special operator in the
context of the operator operator.

But the above example, as in the case of a class
declaration, fails to provide any real definition for
the new operation. This may be done concurrently, as
below:

[code]
int operator (int + int) << {...[code block]...};
[/code]

or definition may occur separately--

[code]
int operator (int + int);
.
.
int operator (int + int) << {...[code block]...};
.
.
[/code]

However, reference to the + operator must include its
type signature, since operators can be overloaded and
code must be linked only to specific instances of the
operator.

Linking code and class definitions in X also means
that code blocks can be defined separately and linked
later, during run-time. Thus, a programmer in X can
state--

[code]
a = &{...[Code A]...};
[/code]

to store the address of code block A in the object a,
presumably a properly typed object. Later the
programmer could define--

[code]
b = &{...[Code B]...};
[/code]

to the same end. Then the programmer could merge the
two into a single routine by stating--

[code]
c = &{{&a}<<{&b}};
[/code]

where {...} accepts the references to the code and <<
links the two code structures together before storing
the address in c.

[Note: In this example a, b and c are first
declared as unknown operators.
operator a;
operator b;
operator c;
But none of the three can be used
as operators in this context, they
are only references to objects of
the type since they have no type
signature.
]


****************************************
Excellence Breeds! Go Hard or Go Home.

Let Penguins rule the earth.
Break some windows today.

Comments

  • Project: [blue] Component Based Object Oriented Programming
    (CBOOP)[black]

    SubProject: [blue]Compiler Design[black]

    Author: [blue]Sam Caldwell
    (c) March 2005.[black]
    [hr]


    The X compiler is to be modular, capable of later revision into the CBOOP scheme as separate interoperating components. Each module of the compiler should provide either a service to the compiler or represent a phase of the compile process. We define the compiler using the following schematic:

    [code]

    INPUT: SOURCE CODE
    |
    v
    Source Stream----------------------HTML Stripper
    |
    v
    Tokenizer---------<<symbol data<<----Symbol Table
    |
    v
    Lexical Analyzer--+--<<-->>symboldata>>---Symbol Table
    | ^ |
    | | v
    | +---<<------<<-+
    v
    +---->>Compiler directives>>-----Preprocessor
    |
    v
    +---->>Tokens>>--------Parser
    |
    v
    +----------------------------------------+
    |
    v
    Parser-<<->>--+---->>symboldata>>---------------Symbol Table
    | ^ |
    | | v
    | +----<<symboldata<<---------+
    v
    +---<<->>---+----->>Tokens>>-------ParseTree
    | ^ |
    | | v
    | +-----<<Tokens<<------------+
    |
    v
    Code Generator----??
    |
    ???
    |
    |
    v
    OUTPUT: CXE FILE
    [/code]

    Each component-role is defined as follows:
    [red][b]"INPUT: SOURCE CODE"[/b][black] -- We assume here that this is an arbitrary stream of ASCII text. In the prototype this stream will naturally be a text file with the extension .xxx. However, since CBOOP merges code and data the eventual source code storage point will be in the sourcestream component object itself.

    [red][b]Source Stream[/b][black] -- This component represents the source input in its native form. This is the storage/retrieval mechanism for X source code and will be added onto considerably with time. As the schema above depicts, the SourceStream will eventually call on the services of the HTML stripper (see discussion of hypersource code elsewhere), which will filter out characters unnecessary to compilation of the code. SourceStream provides two basic interfaces to the compiler: SS.scan() and SS.lookahead(). These interfaces allow client modules to scan a single character from the buffer and to look ahead at the next character which will be scanned. scan() and lookahead() will never allow filtered characters to be viewed.
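
    A minimal C++ sketch of the two interfaces might look as follows. It assumes an in-memory buffer and uses '\0' to signal end of input; both choices are assumptions made for the example, and the filtering done by the HTML stripper is left out.

    ```cpp
    #include <cstddef>
    #include <string>
    #include <utility>

    // Hypothetical SourceStream over an in-memory buffer.
    // Filtering (the job of the HTML stripper) is left out of this sketch.
    class SourceStream {
    public:
        explicit SourceStream(std::string text) : buffer_(std::move(text)) {}

        // Return the next character without consuming it; '\0' at end of input.
        char lookahead() const {
            return pos_ < buffer_.size() ? buffer_[pos_] : '\0';
        }

        // Consume and return the next character; '\0' at end of input.
        char scan() {
            char ch = lookahead();
            if (pos_ < buffer_.size()) ++pos_;
            return ch;
        }

    private:
        std::string buffer_;
        std::size_t pos_ = 0;
    };
    ```

    A client that calls lookahead() and then scan() sees the same character twice; only scan() advances the stream.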



    [red][b]HTML Stripper[/b][black] -- This stripper will be used by source stream to

    (1) strip HTML tags from hypersource code streams,
    (2) convert TAB and CRLF characters into SPACE characters,
    (3) reduce repeating SPACE characters into a single SPACE
    character, and
    (4) remove any traditional "//" and "/*...*/" remarks from
    the code stream.

    This stripper module will analyze two characters at a time and return a pass-fail result for the first character in a given test. Its primary interface is Stripper.test().
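
    The four rules can be sketched as a single pass over a string. This is a simplification rather than the planned Stripper.test() interface, which tests one character at a time: the function name strip is an assumption, a '<' counts as the start of an HTML tag only when a letter or '/' follows it (so that the << operator survives), and quoted strings are not protected from filtering.

    ```cpp
    #include <cctype>
    #include <cstddef>
    #include <string>

    // Hypothetical stripper applying the four rules in one pass.
    std::string strip(const std::string& in) {
        std::string out;
        std::size_t i = 0;
        auto emit_space = [&]() {
            // Rule 3: never emit two SPACE characters in a row.
            if (out.empty() || out.back() != ' ') out.push_back(' ');
        };
        while (i < in.size()) {
            char c = in[i];
            char next = (i + 1 < in.size()) ? in[i + 1] : '\0';
            bool tagStart = c == '<' &&
                (next == '/' || std::isalpha(static_cast<unsigned char>(next)));
            if (tagStart) {                                  // rule 1: HTML tag
                std::size_t end = in.find('>', i);
                i = (end == std::string::npos) ? in.size() : end + 1;
            } else if (c == '/' && next == '/') {            // rule 4: line remark
                std::size_t end = in.find('\n', i);
                i = (end == std::string::npos) ? in.size() : end;
            } else if (c == '/' && next == '*') {            // rule 4: block remark
                std::size_t end = in.find("*/", i + 2);
                i = (end == std::string::npos) ? in.size() : end + 2;
            } else if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                emit_space();                                // rule 2: to SPACE
                ++i;
            } else {
                out.push_back(c);
                ++i;
            }
        }
        return out;
    }
    ```

    Note that the two-character window (c and next) is what lets the sketch tell a remark or a tag from an ordinary '/' or '<'.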

    [red][b]Tokenizer[/b][black] -- The tokenizer scans characters from the source stream to determine whether the character is a delimiter or part of a token. Its input is a single character and its output is a token object. This object is loaded with known information from the Symbol Table (which creates and issues new tokens) and with the lexeme the Tokenizer recognizes from the code input. Tokenizer.scan() scans characters until it can produce a single token. Tokenizer.lookahead() returns the next token to be scanned (which is waiting in the scan buffer).

    [red][b]Token[/b][black] -- The token object is a medium. This is the smallest logical unit in the compiler's view of a language. Tokens are either identifiers or delimiters. The token must be capable of fitting into the structure of the ParseTree and the SymbolTable.

    [red][b]SymbolTable[/b][black] -- This is the mechanism which tracks identifiers as they are declared and allocates space therefor in the final product. The SymbolTable resolves and maintains the definition of identifiers representing objects (operators and operands) as well as types. Scope tracking is performed by the SymbolTable.
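
    As a rough illustration of scope tracking, the following C++17 sketch keeps one name table per open scope and resolves names from the innermost scope outward. The Kind enumeration and the method names enterScope, leaveScope, declare and resolve are assumptions for the example; space allocation in the final product is not modeled.

    ```cpp
    #include <optional>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Hypothetical symbol kinds, following the operator/operand/type split.
    enum class Kind { Operator, Operand, Type };

    class SymbolTable {
    public:
        SymbolTable() { enterScope(); }              // global scope

        void enterScope() { scopes_.emplace_back(); }
        void leaveScope() { if (scopes_.size() > 1) scopes_.pop_back(); }

        // Declare a name in the innermost scope; false if already declared there.
        bool declare(const std::string& name, Kind kind) {
            return scopes_.back().emplace(name, kind).second;
        }

        // Resolve a name from the innermost scope outward.
        std::optional<Kind> resolve(const std::string& name) const {
            for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
                auto found = it->find(name);
                if (found != it->end()) return found->second;
            }
            return std::nullopt;
        }

    private:
        std::vector<std::unordered_map<std::string, Kind>> scopes_;
    };
    ```

    An inner declaration shadows an outer one of the same name until leaveScope() is called, after which the outer definition is visible again.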

    [red][b]LexicalAnalyzer[/b][black] -- This is the hub of the compiler. Here the two languages (algorithmic and organizational) meet. The organizational language of X, traditionally known as compiler directives, is passed to the preprocessor while the algorithmic language of X is recognized and sent on to the parser.

    [red][b]Preprocessor[/b][black] -- This is an interpreter which acts as a command shell for the compiler, accepting scripted commands in the source stream which can alter the source stream and the compiler's operation thereupon.

    [red][b]Parser [/b][black]-- The Parser has three responsibilities: (1) Create a diagrammatic representation (ParseTree) of the source stream which is language neutral. This diagram is a binary tree representing each expression of the source code in a linear manner. (2) Verify the syntactical and semantic correctness of the diagrammatic language. (3) Reduce the ParseTree to its smallest size, eliminating redundancy and waste.

    [red][b]ParseTree[/b][black] -- The ParseTree is a binary tree with special properties. Each node of the tree is a token. Each token is first defined as legitimate in the symbol table. Each token has an assigned scope identifier which is recognized in the symbol table. The two child nodes of each token in the tree are the LHS and RHS, respectively. These are the source of operand values for a given node. Likewise, each node is either an LHS or RHS to some root node and, accordingly, it will return its resulting value to the PREV node.

    [red][b]Code Generator [/b][black] -- The code generator will accept a ParseTree as input and produce two output streams: (1) an assembly language version of the output and (2) a binary component executable (CXE) file.


    [hr]
    [blue][b]Author's note:[/b][black]

    Please understand that this is a very brief outline of the compiler design. Some things have yet to be worked out. Others are pretty well set.

    The primary weakness in this design is the lack of an error handling subsystem. I will add that with time. The second weakness is lack of a code generator plan. It is my intent that the code generator should be so modular that the same parsetree can be passed to multiple code generators to yield binaries for multiple machine language platforms.


  • Project: [blue] Component Based Object Oriented Programming
    (CBOOP)[black]

    SubProject: [blue]ParseTree[black]

    Author: [blue]Sam Caldwell
    (c) March 2005.[black]
    [hr]




    [blue][b]Problem Statement:[black][/b]

    Establish parameters for a diagrammatic structure capable of representing the meaning of processed source code which is neutral to the source language in preparation for conversion to an arbitrary object language.

    [blue][b]Solution:[black][/b]

    Implement ParseTrees using a modified binary tree structure in which each node is a single token having three links. The first link is the PREV pointer to some root node, the second link is the LHS pointer to some token which was detected to the left of the given token, and the third link is the RHS pointer to some token which was detected to the right of the given token.

    A ParseTree is sorted by order of appearance and order of precedence. Order of appearance is the order in which a token appears in the source stream sequence. Order of precedence is the sequence in which a token should be evaluated (e.g. multiplication should precede addition).

    Thus the basic tree, below,

    [code]
         PREV
          |
          v
        TOKEN
        /   \
     LHS     RHS
    [/code]

    is implemented as

    [code]
          ROOT
           |
           v
       ASSIGNMENT
        /      \
    op(a)      op(b)
    [/code]

    to represent the expression "a=b", where "ROOT" is some unknown preceding expression, = is the assignment operator and both a and b are the operands.

    Expressions represented by ParseTrees are continually sifted for precedence. However, precedence is limited by the semicolon delimiter. Once a semicolon is encountered, evaluation of precedence stops, since all terms before the semicolon are considered to be a separate expression. Likewise, tokens contained in group operators (such as braces, brackets and parentheses) are evaluated only within the boundaries of the group operator.
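
    The sifting described here has not been designed in detail. As one possible approach, the C++ sketch below uses precedence climbing (a standard technique, swapped in for illustration) to arrange a token sequence into a binary tree and stops at the semicolon. The precedence values, the names build and show, and the treatment of every operator as left-associative are all assumptions; group operators are not handled.

    ```cpp
    #include <cstddef>
    #include <map>
    #include <memory>
    #include <string>
    #include <utility>
    #include <vector>

    struct Node {
        std::string token;
        std::unique_ptr<Node> lhs, rhs;
    };

    // Hypothetical precedence levels; higher binds tighter.
    // A token absent from this table is treated as an operand.
    const std::map<std::string, int> kPrecedence = {
        {"=", 1}, {"+", 2}, {"*", 3}};

    // Build a tree from tokens[pos...] until ';' or the end is reached.
    // For brevity every operator is left-associative here.
    std::unique_ptr<Node> build(const std::vector<std::string>& tokens,
                                std::size_t& pos, int minPrec = 1) {
        auto left = std::make_unique<Node>();
        left->token = tokens.at(pos++);                    // operand
        while (pos < tokens.size() && tokens[pos] != ";") {
            auto it = kPrecedence.find(tokens[pos]);
            if (it == kPrecedence.end() || it->second < minPrec) break;
            auto op = std::make_unique<Node>();
            op->token = tokens[pos++];
            op->rhs = build(tokens, pos, it->second + 1);  // tighter ops go deeper
            op->lhs = std::move(left);
            left = std::move(op);
        }
        return left;
    }

    // Render the tree with explicit grouping so its shape is visible.
    std::string show(const Node& n) {
        if (!n.lhs) return n.token;
        return "(" + show(*n.lhs) + " " + n.token + " " + show(*n.rhs) + ")";
    }
    ```

    For the tokens a = b + c * d ; the sketch yields (a = (b + (c * d))), so multiplication sits deepest in the tree and is evaluated first, and the semicolon is left unconsumed as the boundary of the expression.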

    The ParseTree is a closed environment. Tokens can be passed into the ParseTree structure blindly using ParseTree->submit() and later extracted using ParseTree->extract(). However, all other data operations are internal other than navigation.


    ParseTree methods
    [hr]
    Method      Description
    [hr]
    submit()    Adds a token to the tree and sifts the tree for proper positioning.
    extract()   Copy-returns the current token (eliminating all links to the tree).
    Left()      Shifts current token pointer to LHS.
    Right()     Shifts current token pointer to RHS.
    Prev()      Shifts current token pointer to PREV.
    delete()    Deletes current token and all sub-token trees.
    [hr]






  • Project: [b][blue]Component Based Object Oriented Programming
    (CBOOP)[/b][black]

    SubProject: [b][blue]Code Generator[/b][black]

    Author: [b][blue]Sam Caldwell
    (c) March 2005.[/b][black]
    [hr]


    [blue][b]Problem Statement:[black][/b]

    Develop a C++ class which analyzes a ParseTree produced from some arbitrary source code stream and outputs equivalent assembly language and machine language files.

    Maintain the modularity of the code generator, such that its basic parameters and framework can easily be adapted to other CPU instruction sets and possibly a machine-independent pcode.
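
    No solution exists yet, so the following C++ sketch only illustrates the modularity requirement and is not a proposed design. It assumes a shared tree walk in a base class with one derived class per target; the class names, the Node layout and the stack-machine mnemonics (PUSH, ADD, OP) are all invented for the example.

    ```cpp
    #include <memory>
    #include <string>
    #include <vector>

    struct Node {
        std::string token;
        std::unique_ptr<Node> lhs, rhs;
    };

    // Hypothetical back-end interface: one implementation per target.
    class CodeGenerator {
    public:
        virtual ~CodeGenerator() = default;
        virtual void emitOperand(const std::string& name) = 0;
        virtual void emitOperator(const std::string& op) = 0;

        // Shared tree walk: operands first, then the operator (post-order).
        void generate(const Node& n) {
            if (!n.lhs && !n.rhs) { emitOperand(n.token); return; }
            if (n.lhs) generate(*n.lhs);
            if (n.rhs) generate(*n.rhs);
            emitOperator(n.token);
        }
    };

    // Toy stack-machine target; the mnemonics are invented for illustration.
    class StackMachineGenerator : public CodeGenerator {
    public:
        std::vector<std::string> lines;
        void emitOperand(const std::string& name) override {
            lines.push_back("PUSH " + name);
        }
        void emitOperator(const std::string& op) override {
            lines.push_back(op == "+" ? "ADD" : "OP " + op);
        }
    };
    ```

    Supporting another instruction set or a pcode would then mean adding another derived class while the tree walk stays untouched.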

    [blue][b]Solution:[black][/b]

    [No solution developed at this time]

    [blue][b]Parameters:[black][/b]

    Code generator must produce Component Executable (CXE) files. These files are to replace DLL and EXE files and will further require a loader for the target OS.

    The Code generator must provide basic error handling and reporting services specific to its own needs, using a separate error handler class. The error handler class should perform its basic I/O using a dummy class named "errorhandler," which will be replaced later in the project with a full-fledged error handling facility.






  • ------------------------------------------------------
    Project: Component Based Object Oriented Programming
    (CBOOP)

    SubProject: Component Executable (CXE) File Format

    Author: Sam Caldwell
    (c) March 2005.
    ------------------------------------------------------

    ***Problem Statement:

    The new programming paradigm requires a format for the component image, both in memory and on disk.

    ***Solution:

    Discussion is limited to the component image's disk state. Its memory state will be taken up later. We have two priorities in defining this format: (1) maintaining a small disk footprint and (2) establishing a high degree of scalability. CXE files must be able to contain multiple interdependent components efficiently.

    We illustrate the component CXE file, below, and note that it consists of a 160 byte header and variable-length body of up to 4GB.

    4.1: CXE File format
    -----------------------------------------------------------------------
    Byte Pos.    Bits   Field Name       Description
    -----------------------------------------------------------------------
    000 - 007    08     FileVersion      CXE file format version code.
    008 - 015    08     FileAttributes   (Reserved)
    016 - 031    16     Count            Number of components in CXE file.
    032 - 063    32     TimeStamp        File Creation Timestamp
    064 - 095    32     CDTAddress       Component Descriptor Table Address
    096 - 127    32     Dptr             First Data Page Address
    128 - 159    32     Cptr             First Code Page Address

    160 - nnn    xx     CodePage         Block of executable code.
    aaa - bbb    xx     DataPage         Block of persistent/volatile data.
    ccc - ddd    xx     CDT              Component Descriptor Table.
    -----------------------------------------------------------------------

    The 160-byte header begins with the FileVersion field specifying the exact format which follows. Reading the file accurately depends on the value of this field, which is followed by a reserved FileAttributes field. At present this field is reserved for use under the AMI-OS project.

    The header also specifies the number of components contained in the file. This saves time and provides redundancy for integrity verification. Remember: a single CXE can contain multiple independent or interdependent components. (This allows enormous next-gen op/sys capabilities, as described in the AMI-OS design notes.)

    A creation timestamp is also included as a security feature and a part of the AMI-OS design to be gradually introduced over the coming years.

    Finally, the header references the major sections of the file body. These include the addresses of the first code page (Cptr), data page (Dptr) and Component Descriptor Table (CDTAddress). Each address (like all addresses in the CXE file) is measured relative to byte 160 of the CXE file.
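
    Reading the header could be sketched in C++ as below. Two assumptions are made loudly here. First, the position column of table 4.1 is read as bit positions, because the listed field widths (8 + 8 + 16 + 32 + 32 + 32 + 32) sum to 160 bits, which is 20 bytes on disk; if the header really occupies 160 bytes, the offsets differ. Second, little-endian byte order is assumed, since the format does not state one. The names CxeHeader, readLE and parseHeader are invented for the example.

    ```cpp
    #include <array>
    #include <cstddef>
    #include <cstdint>

    // Hypothetical in-memory form of the header in table 4.1.
    struct CxeHeader {
        std::uint8_t  fileVersion;
        std::uint8_t  fileAttributes;
        std::uint16_t count;
        std::uint32_t timeStamp;
        std::uint32_t cdtAddress;
        std::uint32_t dptr;
        std::uint32_t cptr;
    };

    // Assumption: the listed field widths sum to 160 bits, i.e. 20 bytes.
    constexpr std::size_t kHeaderBytes = 20;

    // Read an unsigned little-endian integer of n bytes (assumed byte order).
    std::uint32_t readLE(const std::uint8_t* p, std::size_t n) {
        std::uint32_t value = 0;
        for (std::size_t i = 0; i < n; ++i) {
            value |= static_cast<std::uint32_t>(p[i]) << (8 * i);
        }
        return value;
    }

    CxeHeader parseHeader(const std::array<std::uint8_t, kHeaderBytes>& raw) {
        CxeHeader h{};
        h.fileVersion    = raw[0];
        h.fileAttributes = raw[1];
        h.count          = static_cast<std::uint16_t>(readLE(&raw[2], 2));
        h.timeStamp      = readLE(&raw[4], 4);
        h.cdtAddress     = readLE(&raw[8], 4);
        h.dptr           = readLE(&raw[12], 4);
        h.cptr           = readLE(&raw[16], 4);
        return h;
    }
    ```

    Parsing field by field, rather than copying the bytes straight into the struct, keeps the sketch independent of compiler padding and host byte order.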

    Code pages, data pages and CDTs are all connected into separate linked lists. The header reference to the first node in these lists maintains the integrity of the file. These lists are doubly linked to minimize the chance of fragmentation. While these pages and tables are maintained as linked lists by the CXE file for tracking allocated memory, they are also linked as parallel linked lists to build component assemblies, which are defined by the Component Descriptor Table (CDT).

    The CDT references data and code pages. These pages are blocks of disk memory linked together into scalable lists. Each data and code page is defined as either persistent or volatile. In the disk state, a persistent page is one which cannot be altered during run-time (i.e. read-only); whereas a volatile page is that which may be altered during run-time (i.e. read-write).

    The CDT acts as a blueprint for the component. It references the code and data pages needed by that blueprint to assemble the required component. A single CDT defines one component. It may use any number of code and data pages to accomplish this mission. These same code and data pages may be cross-linked by several CDTs in separate linked lists defining separate components, where the contents of the code or data page are identical. Yet this is only to compress the size of the CXE file and does not imply that the actual data and code are shared, or that the boundaries of a component are in any way compromised.

    Below, we define the Component Descriptor Table (CDT). In doing so we reference byte positions from the start of the CDT block, rather than from the 160th byte of the file. This is done since the actual position would depend on the placement of the CDT in the file, a matter known at this time to be variable.

    4.2: Component Descriptor Table (CDT).
    -----------------------------------------------------------------------
    Byte Pos.    Bits   Field Name    Description
    -----------------------------------------------------------------------
    000 - 007    008    CDTVersion    CDT Format
    008 - 263    256    UID           Universal Identifier (MD5 hash)
    264 - 272    008    CVSVersion    User-Defined Component Version
    273 - 277    004    ThreadModel   Processing Model
    278 - 282    004    Attributes    Loading/Execution Attributes
    283 - 314    032    CTime         Creation TimeStamp
    315 - 346    032    UTime         Update Timestamp
    347 - 362    016    Language      Instruction Set Code

    363 - 394    032    Depptr        Dependency Table Address
    395 - 416    032    BCTptr        Base Component Table Address
    417 - 448    032    Instanceptr   Instance vTable Address
    449 - 480    032    Nextptr       Next CDT Address
    481 - xxx    xxx    LogicalName   Component Logical Name (String)
    -----------------------------------------------------------------------

    The CDT is a table of tables. We start with a CDTVersion field which denotes the version of the Component Descriptor Table we are about to read. This will allow newer CDT formats to be stored in older CXE formats, and vice versa. CDTVersion is followed by a UID or universal identifier. The UID is an MD5 checksum and the core of CBOOP security. Each component and component instance (object) is uniquely identified by its UID.

    UID is followed by the CVSVersion field. CVSVersion refers to the programmer's version. A UID cannot accurately serve this function due to its higher sensitivity to a component's data. Rather, CVSVersion allows the programmer to track the component by the source code from which it was produced.

    ThreadModel and Language are fields used by Op/sys loaders to determine how a component will run. Will the component run within the same process space as the owner component or separately? Is the binary language native to the CPU, or is an interpreter necessary? Likewise, component Attributes are a matter for OSloaders and are left for discussion elsewhere.

    CTime and UTime maintain the creation and update timestamps for a component. These timestamps affect the UID and increase security around the system.
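
    As a cross-check on table 4.2, the following C++ sketch derives each field's starting position from the listed widths alone. It assumes the Bits column is authoritative and that fields are packed back to back with no padding; the names Field, layout and cdtFields are invented for the example. Under that assumption the computed positions agree with the printed ones through CVSVersion (264) and differ afterward, for example ThreadModel would start at 272 and LogicalName at 488.

    ```cpp
    #include <cstddef>
    #include <string>
    #include <vector>

    struct Field {
        std::string name;
        std::size_t bits;     // width from the "Bits" column of table 4.2
        std::size_t offset;   // computed starting bit position
    };

    // Assign each field the bit offset where the previous one ends.
    std::vector<Field> layout(std::vector<Field> fields) {
        std::size_t next = 0;
        for (auto& f : fields) {
            f.offset = next;
            next += f.bits;
        }
        return fields;
    }

    // Fixed-width CDT fields in table order (LogicalName is variable length).
    std::vector<Field> cdtFields() {
        return layout({
            {"CDTVersion", 8, 0},   {"UID", 256, 0},        {"CVSVersion", 8, 0},
            {"ThreadModel", 4, 0},  {"Attributes", 4, 0},   {"CTime", 32, 0},
            {"UTime", 32, 0},       {"Language", 16, 0},    {"Depptr", 32, 0},
            {"BCTptr", 32, 0},      {"Instanceptr", 32, 0}, {"Nextptr", 32, 0},
        });
    }
    ```

    Whichever column turns out to be the intended one, computing positions from widths in code avoids keeping two hand-maintained columns in agreement.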

    Thus far, the CDT has given general parameter information. What follows this is a set of tables which define the actual component. The first is the dependency table, which defines the components upon which the defined component depends. That is, if component "foo" uses the services of component "moo," then moo appears in the dependency table for foo.

    4.3: CDT Dependencies Table
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 031 032 TableSize Size (in Bytes).
    032 - 287 256 UID Universal Identifier
    288 - xxx xxx FileName CXE Filename
    -----------------------------------------------------------------------
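
    A minimal Python sketch of reading one record laid out as in table 4.3. It assumes the positions in the table are bit offsets (so TableSize occupies 4 bytes and the UID 32 bytes), that integers are big-endian, that TableSize counts the whole record, and that the filename is ASCII. None of these is stated in the specification, and the names parse_dependency and DependencyRecord are hypothetical.

```python
import struct
from dataclasses import dataclass

# TableSize (32 bits) followed by UID (256 bits); byte order is an assumption.
HEADER = struct.Struct(">I32s")


@dataclass
class DependencyRecord:
    table_size: int
    uid: bytes
    filename: str


def parse_dependency(buf: bytes) -> DependencyRecord:
    """Decode one dependency record from the start of buf."""
    table_size, uid = HEADER.unpack_from(buf, 0)
    if table_size < HEADER.size or table_size > len(buf):
        raise ValueError("TableSize is inconsistent with the buffer")
    # Assumption: the filename fills the rest of the record.
    filename = buf[HEADER.size:table_size].decode("ascii")
    return DependencyRecord(table_size, uid, filename)


# Build a sample record for a hypothetical component file "moo.cxe".
name = b"moo.cxe"
sample = HEADER.pack(HEADER.size + len(name), bytes(range(32))) + name
record = parse_dependency(sample)
```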

    Dependency Tables define the filename and UID for components used by a given component. However, CBOOP does allow for component inheritance. Inherited components are defined in the Base Component Tables (BCT), referenced by the BCTptr field in the CDT. This table is similar in format to the CDT Dependency Table, as illustrated below:

    4.4: Base Component Table (BCT)
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 031 032 TableSize Size of the BCT (in bytes).
    032 - 287 256 UID Universal Identifier
    288 - 289 002 Scope Inheritance Scope (enumerated)
    290 - xxx xxx Filename Component CXE filename (string)
    -----------------------------------------------------------------------

    The BCT contains basic information about the component's storage location (filename), identity (UID) and inheritance scope (public, private, friend or protected). This table is of variable length, as determined by TableSize. It may range up to 4GB in size.
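
    Because Scope is only two bits wide, the fields after it are not byte-aligned, so a reader of the BCT needs bit-level access. The following is a sketch of one way to do that in Python. It assumes the table positions are bit offsets counted from the most significant bit of the first byte; the numeric values given to the four scopes and the helper name read_bits are hypothetical, since the specification only names the scopes.

```python
from enum import IntEnum


class Scope(IntEnum):
    # Numeric values are assumptions; the source only names the four scopes.
    PUBLIC = 0
    PRIVATE = 1
    FRIEND = 2
    PROTECTED = 3


def read_bits(buf: bytes, start: int, width: int) -> int:
    """Return `width` bits starting at bit `start`, MSB of buf[0] being bit 0."""
    total = len(buf) * 8
    if start < 0 or width <= 0 or start + width > total:
        raise ValueError("bit range outside buffer")
    value = int.from_bytes(buf, "big")
    return (value >> (total - start - width)) & ((1 << width) - 1)


# A 37-byte sample: TableSize, a zeroed UID, then one byte whose
# top two bits hold the scope.
sample = (37).to_bytes(4, "big") + bytes(32) + bytes([0b11000000])
table_size = read_bits(sample, 0, 32)
scope = Scope(read_bits(sample, 288, 2))
```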

    BCT and the Dependency Table allow the CDT to describe the foundation upon which a component will be assembled. However the "meat and potatoes" is the Instance Vector Table (IVT). This IVT serves two functions: (1) it defines the prototype component object and (2) it stores the persistent definition of all objects of a given component type.

    4.5: Instance Vector Table.
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 255 256 UID Universal Identifier
    256 - 287 032 Nextptr Next IVT Record Pointer
    288 - 319 032 Dataptr Data Area Pointer
    320 - 351 032 Iptr Interface Vtable Pointer
    352 - 383 032 Mptr Method Vtable Pointer
    -----------------------------------------------------------------------

    IVTs are another level of detail for the component. Here the component data area is referenced, the component-object is identified by a unique hash, and the interface and method vector tables are defined. Further, the IVT is revealed to be part of a linked list of records, so the table is scalable. As new instances are created the file can grow. Likewise, instances can be deleted from the table and the table can contract without much effort.
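
    The linked-list layout of table 4.5 can be sketched in Python as below. This is only an illustration: it assumes big-endian fields, that Nextptr is an offset into the same image, and that a Nextptr of zero ends the list. The specification states none of these, and walk_ivt is a hypothetical name.

```python
import struct

# UID, Nextptr, Dataptr, Iptr, Mptr: 384 bits, or 48 bytes per record.
IVT_RECORD = struct.Struct(">32sIIII")


def walk_ivt(image: bytes, first: int):
    """Yield (offset, uid, dataptr, iptr, mptr) for each record in the chain."""
    offset, seen = first, set()
    while offset != 0:  # assumption: a zero Nextptr ends the list
        if offset in seen:
            raise ValueError("cycle in IVT chain")
        seen.add(offset)
        uid, nextptr, dataptr, iptr, mptr = IVT_RECORD.unpack_from(image, offset)
        yield offset, uid, dataptr, iptr, mptr
        offset = nextptr


# Sample image: 16 bytes of padding, the prototype record at offset 16,
# and one instance record at offset 64.
prototype = IVT_RECORD.pack(b"P" * 32, 64, 0x100, 0x200, 0x300)
instance = IVT_RECORD.pack(b"I" * 32, 0, 0x400, 0x200, 0x300)
image = bytes(16) + prototype + instance
records = list(walk_ivt(image, 16))
```

    Deleting an instance then amounts to re-pointing the previous record's Nextptr, which is why the table can contract cheaply.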

    UIDs are digital signatures identifying each component-object. The component-prototype from which all other instances are created has the same UID as that stored elsewhere in the CDT. This is always the first record in the IVT. Though it is possible to change this prototype image under CBOOP, doing so changes the MD5 checksum for the component, and thus the component identity itself. This is core to the security of the CBOOP architecture.
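
    A short Python illustration of why editing the prototype changes the component's identity. Which bytes of the component are fed into the checksum is not specified here, so the input below is a stand-in, and component_uid is a hypothetical name. Note also that an MD5 digest is 128 bits long, while the UID field is 256 bits wide; how the digest fills the field is not stated.

```python
import hashlib


def component_uid(prototype_image: bytes) -> str:
    """Return the MD5 checksum of a prototype image as a hex string."""
    return hashlib.md5(prototype_image).hexdigest()


original = b"default data image"
patched = b"default data imagE"  # a single byte differs

uid_before = component_uid(original)
uid_after = component_uid(patched)
identity_changed = uid_before != uid_after
```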

    Component objects are copy-created from the component-prototype. That is, the prototype IVT is copied, as are the related tables and data area. The first to be copied is the data area with the default data image; this is followed by the method vtable, and the interface vtable last. The interface vtable contains the component constructor, which is executed immediately following its copy-creation to the new IVT.
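
    The copy-creation order can be modelled loosely in Python. This is a sketch under stated assumptions rather than the CBOOP mechanism itself: the ComponentImage class, the "constructor" key and the log field are all hypothetical stand-ins for structures that the specification defines only as binary tables.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ComponentImage:
    data_area: bytearray
    method_vtable: list
    interface_vtable: dict
    log: list = field(default_factory=list)


def copy_create(prototype: ComponentImage) -> ComponentImage:
    """Copy the prototype in the stated order, then run the constructor."""
    data = copy.deepcopy(prototype.data_area)                # 1. default data image
    methods = copy.deepcopy(prototype.method_vtable)         # 2. method vtable
    interfaces = copy.deepcopy(prototype.interface_vtable)   # 3. interface vtable, last
    obj = ComponentImage(data, methods, interfaces)
    constructor = obj.interface_vtable.get("constructor")    # hypothetical entry name
    if constructor is not None:
        constructor(obj)
    return obj


def init(obj: ComponentImage) -> None:
    obj.log.append("constructed")


proto = ComponentImage(bytearray(b"\x00\x01"), ["m0"], {"constructor": init})
instance = copy_create(proto)
instance.data_area[0] = 0xFF  # changing the instance leaves the prototype untouched
```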

    4.6: IVT Data Area Table.
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 031 032 TableSize Size of Table (in Bytes).
    032 - 063 032 PageNumber CXE Page Number (Code/Data).
    064 - 095 032 Offset Address relative to top of data page.
    096 - 127 032 Size Size of the data area.
    -----------------------------------------------------------------------

    The IVT-DAT format is straightforward. A CXE page number identifies each data and code page in the CXE. This number is used to address the proper data/code page in the file. Within that page, the data area is located by its offset from the top of the page and its size.
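
    As a rough Python sketch, resolving a data area from an IVT-DAT record might look like this. It assumes the record is four consecutive big-endian 32-bit fields and that pages are already available in a mapping from page number to page body; both the mapping and the name resolve_data_area are hypothetical.

```python
import struct

DAT_RECORD = struct.Struct(">IIII")  # TableSize, PageNumber, Offset, Size


def resolve_data_area(dat: bytes, pages: dict) -> bytes:
    """Return the bytes of the data area described by one IVT-DAT record."""
    _table_size, page_number, offset, size = DAT_RECORD.unpack_from(dat, 0)
    page = pages[page_number]  # hypothetical mapping: page number -> page body
    if offset + size > len(page):
        raise ValueError("data area runs past the end of the page")
    return page[offset:offset + size]


pages = {7: b"....default image...."}
dat = DAT_RECORD.pack(DAT_RECORD.size, 7, 4, 13)
area = resolve_data_area(dat, pages)
```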


    4.7: IVT Interface vTable.
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 031 032 Nextptr Reference to next vTable record.
    032 - 063 032 Chainptr Reference to the code chain.
    064 - 319 256 UID Universal Identifier
    320 - 335 016 Plen Argument Length
    336 - xxx xxx Arglist Parameters
    xxx - xxx xxx LogicalName Logical Interface name
    -----------------------------------------------------------------------

    The IVT Interface vTable is a linked list of descriptors which cover each interface exposed by a component. Each interface has its own UID, strongly typed parameter list, and logical name. The interface itself is a chain of executable binary code represented by a linked list of code pages referenced by the Chainptr field.

    4.8: IVT Method vTable.
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 031 032 TableSize TableSize (in bytes)
    032 - 063 032 Chainptr Reference to the code chain.
    -----------------------------------------------------------------------

    Because methods are internal structures, they may be more rigidly defined by the compiler and require less information at run-time. Thus the method vtable only references the code chain where the method's code lies.

    4.9: IVT Code Chain Table
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 031 032 FilePage Code Page Reference
    032 - 063 032 Offset Offset within code page
    064 - 095 032 Size Size of code chain node
    096 - 127 032 Nextptr Reference to next code chain node.
    -----------------------------------------------------------------------
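
    Following a code chain from a Chainptr can be sketched in Python as below. The sketch assumes each node is four big-endian 32-bit fields as in table 4.9, that a Nextptr of zero ends the chain, and that nodes and pages are held in dictionaries keyed by address and page number. Those containers and the name collect_code are hypothetical.

```python
import struct

CHAIN_NODE = struct.Struct(">IIII")  # FilePage, Offset, Size, Nextptr


def collect_code(nodes: dict, pages: dict, first: int) -> bytes:
    """Concatenate the code fragments named by a chain of nodes."""
    code, ptr, seen = bytearray(), first, set()
    while ptr != 0:  # assumption: a zero Nextptr ends the chain
        if ptr in seen:
            raise ValueError("cycle in code chain")
        seen.add(ptr)
        file_page, offset, size, nextptr = CHAIN_NODE.unpack(nodes[ptr])
        code += pages[file_page][offset:offset + size]
        ptr = nextptr
    return bytes(code)


pages = {1: b"AAAABBBB", 2: b"CCCC"}
nodes = {
    16: CHAIN_NODE.pack(1, 4, 4, 32),  # second half of page 1, then node 32
    32: CHAIN_NODE.pack(2, 0, 4, 0),   # all of page 2, end of chain
}
code = collect_code(nodes, pages, 16)
```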

    5.0: Code Pages
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 031 32 FilePage Page ID (Internal Address)
    032 - 063 32 Attributes Page Attributes (such as read-only)
    064 - 095 32 Page Size
    096 - 127 32 Nextptr Address of next code page
    128 - xxx xx Undefined code area
    -----------------------------------------------------------------------

    5.1: Data Pages
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 031 32 FilePage Page ID (Internal Address)
    032 - 063 32 Attributes Page Attributes (such as read-only)
    064 - 095 32 Page Size
    096 - 127 32 Nextptr Address of next data page
    128 - xxx xx Undefined data area
    -----------------------------------------------------------------------

    Code and data pages use the same format. They differ only in the type of information they contain. Even their attributes should be compatible. Primarily the attributes reflect persistence/volatility. Otherwise they are reserved for later specification under project AMI-OS.
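
    Since both page types share one layout, a single reader can serve both. The Python sketch below assumes a 16-byte big-endian header, that Page Size counts only the body after the header, and a read-only flag in the lowest attribute bit. The attribute values are left unspecified above, so ATTR_READ_ONLY and parse_page are purely illustrative.

```python
import struct
from typing import NamedTuple

PAGE_HEADER = struct.Struct(">IIII")  # FilePage, Attributes, Page Size, Nextptr
ATTR_READ_ONLY = 0x1  # hypothetical bit; attribute values are not specified


class Page(NamedTuple):
    file_page: int
    attributes: int
    size: int
    nextptr: int
    body: bytes


def parse_page(buf: bytes) -> Page:
    """Split a code or data page into its header fields and its body."""
    file_page, attributes, size, nextptr = PAGE_HEADER.unpack_from(buf, 0)
    # Assumption: Page Size counts only the body that follows the header.
    body = buf[PAGE_HEADER.size:PAGE_HEADER.size + size]
    return Page(file_page, attributes, size, nextptr, body)


raw = PAGE_HEADER.pack(3, ATTR_READ_ONLY, 5, 0) + b"hello"
page = parse_page(raw)
read_only = bool(page.attributes & ATTR_READ_ONLY)
```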
    [hr]
    Project: [blue] Component Based Object Oriented Programming
    (CBOOP)[black]

    SubProject: [blue]Component Memory Image-State[black]

    Author: [blue]Sam Caldwell
    (c) March 2005.[black]
    [hr]


    [blue][b]Problem Statement.[/b][black]

    In an earlier posting I defined the disk-state image of a component, i.e. the Component Executable (CXE) file format. Here I continue that discussion and define the memory-state image of the component. There are two priorities here: (1) maximize potential execution speeds and (2) reduce the memory overhead without sacrificing scalability and functionality.

    [blue][b]Solution[/b][black]

    A component in memory consists of two parts: the component-prototype and the component-object. Every component is prototyped initially. Components of the same class (type) share the prototype, a blueprint of the component object.

    Component prototypes maintain all constant structure. That is, anything of the component which will not change is a part of the prototype. When new component instances are required, these areas are copy-created by the component constructor to new memory areas for the component-object. The component-prototype is similar to its disk-state image counterpart.

    Components appear in memory as a collection of tables. The first table is the Component Descriptor Table (CDT).

    5.2: Component Descriptor Table (CDT).
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 007 008 CDTVersion CDT Format
    008 - 263 256 UID Universal Identifier (MD5 hash)
    264 - 295 032 Instanceptr Instance vTable Address
    -----------------------------------------------------------------------

    Much of the general information contained in the CXE image CDT is not present in the memory version. Rather, this information is used and, if needed, tracked by the OSLoader. The in-memory CDT contains only the CDTVersion, the UID and a pointer to the Instance vTable.
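
    The reduced in-memory CDT of table 5.2 is small enough to decode in one step. The Python sketch below assumes the table positions are bit offsets, giving a packed 37-byte structure, and big-endian byte order; neither is stated in the specification, and MEM_CDT is a hypothetical name.

```python
import struct

# CDTVersion (8 bits), UID (256 bits), Instanceptr (32 bits), no padding.
MEM_CDT = struct.Struct(">B32sI")

sample = MEM_CDT.pack(1, b"\xab" * 32, 0x1000)
version, uid, instanceptr = MEM_CDT.unpack(sample)
```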


    5.3: CDT Dependencies Table
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 031 032 TableSize Size (in Bytes).
    032 - 287 256 UID Universal Identifier
    288 - xxx xxx FileName CXE Filename
    -----------------------------------------------------------------------

    5.4: Base Component Table (BCT)
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 031 032 TableSize Size of the BCT (in bytes).
    032 - 287 256 UID Universal Identifier
    288 - 289 002 Scope Inheritance Scope (enumerated)
    290 - xxx xxx Filename Component CXE filename (string)
    -----------------------------------------------------------------------

    The BCT contains basic information about the component's storage location (filename), identity (UID) and inheritance scope (public, private, friend or protected). This table is of variable length, as determined by TableSize. It may range up to 4GB in size.

    BCT and the Dependency Table allow the CDT to describe the foundation upon which a component will be assembled. However the "meat and potatoes" is the Instance Vector Table (IVT). This IVT serves two functions: (1) it defines the prototype component object and (2) it stores the persistent definition of all objects of a given component type.

    4.5: Instance Vector Table.
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 255 256 UID Universal Identifier
    256 - 287 032 Nextptr Next IVT Record Pointer
    288 - 319 032 Dataptr Data Area Pointer
    320 - 351 032 Iptr Interface Vtable Pointer
    352 - 383 032 Mptr Method Vtable Pointer
    -----------------------------------------------------------------------

    IVTs are another level of detail for the component. Here the component data area is referenced, the component-object is identified by a unique hash, and the interface and method vector tables are defined. Further, the IVT is revealed to be part of a linked list of records, so the table is scalable. As new instances are created the file can grow. Likewise, instances can be deleted from the table and the table can contract without much effort.

    UIDs are digital signatures identifying each component-object. The component-prototype from which all other instances are created has the same UID as that stored elsewhere in the CDT. This is always the first record in the IVT. Though it is possible to change this prototype image under CBOOP, doing so changes the MD5 checksum for the component, and thus the component identity itself. This is core to the security of the CBOOP architecture.

    Component objects are copy-created from the component-prototype. That is, the prototype IVT is copied, as are the related tables and data area. The first to be copied is the data area with the default data image; this is followed by the method vtable, and the interface vtable last. The interface vtable contains the component constructor, which is executed immediately following its copy-creation to the new IVT.

    4.6: IVT Data Area Table.
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 031 032 TableSize Size of Table (in Bytes).
    032 - 063 032 PageNumber CXE Page Number (Code/Data).
    064 - 095 032 Offset Address relative to top of data page.
    096 - 127 032 Size Size of the data area.
    -----------------------------------------------------------------------

    The IVT-DAT format is straightforward. A CXE page number identifies each data and code page in the CXE. This number is used to address the proper data/code page in the file. Within that page, the data area is located by its offset from the top of the page and its size.


    4.7: IVT Interface vTable.
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 031 032 Nextptr Reference to next vTable record.
    032 - 063 032 Chainptr Reference to the code chain.
    064 - 319 256 UID Universal Identifier
    320 - 335 016 Plen Argument Length
    336 - xxx xxx Arglist Parameters
    xxx - xxx xxx LogicalName Logical Interface name
    -----------------------------------------------------------------------

    The IVT Interface vTable is a linked list of descriptors which cover each interface exposed by a component. Each interface has its own UID, strongly typed parameter list, and logical name. The interface itself is a chain of executable binary code represented by a linked list of code pages referenced by the Chainptr field.

    4.8: IVT Method vTable.
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 031 032 TableSize TableSize (in bytes)
    032 - 063 032 Chainptr Reference to the code chain.
    -----------------------------------------------------------------------

    Because methods are internal structures, they may be more rigidly defined by the compiler and require less information at run-time. Thus the method vtable only references the code chain where the method's code lies.

    4.9: IVT Code Chain Table
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 031 032 FilePage Code Page Reference
    032 - 063 032 Offset Offset within code page
    064 - 095 032 Size Size of code chain node
    096 - 127 032 Nextptr Reference to next code chain node.
    -----------------------------------------------------------------------

    5.0: Code Pages
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 031 32 FilePage Page ID (Internal Address)
    032 - 063 32 Attributes Page Attributes (such as read-only)
    064 - 095 32 Page Size
    096 - 127 32 Nextptr Address of next code page
    128 - xxx xx Undefined code area
    -----------------------------------------------------------------------

    5.1: Data Pages
    -----------------------------------------------------------------------
    Byte Pos. Bits Field Name Description
    -----------------------------------------------------------------------
    000 - 031 32 FilePage Page ID (Internal Address)
    032 - 063 32 Attributes Page Attributes (such as read-only)
    064 - 095 32 Page Size
    096 - 127 32 Nextptr Address of next data page
    128 - xxx xx Undefined data area
    -----------------------------------------------------------------------
