Record (computer science)
|
|
This article does not cite any references or sources. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed. (September 2009) |
In computer science, a record (also called tuple or struct) is one of the simplest data structures, consisting of two or more values or variables stored in consecutive memory positions; so that each component (called a field or member of the record) can be accessed by applying different offsets to the starting address.
For example, a date may be stored as a record containing a 16-bit numeric field for the year, a three-letter field for the month, and an 8-bit numeric field for the day-of-month. As this example shows, the fields of a record need not all have the same size and encoding; therefore, in general one cannot easily obtain the field which has a run-time computed index in the field sequence, as one can do in an array.
A record type is a data type that describes such values and variables. Most modern computer languages allow the programmer to define new record types. The definition includes specifying the data type of each field, its position in the record, and an identifier (name or label) by which it can be accessed. In type theory, product types are generally preferred due to their simplicty, but proper record types are studied in languages such as System F-sub. Since type-theoretical records may contain first-class function-typed fields in addition to data, they can express many features of object-oriented programming.
Records can exist in any storage medium, including main memory and mass storage devices such as magnetic tapes or hard disks. Records are a fundamental component of most data structures, especially linked data structures. Many computer files are organized as arrays of logical records, often grouped into larger physical records or blocks for efficiency.
The parameters of a function or procedure can often be viewed as the fields of a record variable; and the arguments passed to that function can be viewed as a record value that gets assigned to that variable at the time of the call. Also, in the call stack that is often used to implement procedure calls, each entry is an activation record or call frame, containing the procedure parameters and local variables, the return address, and other internal fields.
An object in object-oriented language is essentially a record that contains procedures specialized to handle that record; and object data types (often called object classes) are an elaboration of record types. Indeed, in most object-oriented languages, records are just special cases of objects.
A record can be viewed as the computer analog of a mathematical tuple. In the same vein, a record type can be viewed as the computer language analog of the Cartesian product of two or more mathematical sets, or the implementation of an abstract product type in a specific language.
Contents |
History
The concept of record can be traced to various types of tables and ledgers used in accounting since remote times. The modern notion of records in computer science, with fields of well-defined type and size, was already implicit in 19th century mechanical calculators, such as Babbage's Analytical Engine.[citation needed]
Records were well established in the first half of the 20th century, when most data processing was done using punched cards. Typically each records of a data file would be recorded in one punched card, with specific columns assigned to specific fields.
Most machine language implementations and early assembly languages did not have special syntax for records, but the concept was available (and extensively used) through the use of index registers, indirect addressing, and self-modifying code. Some early computers, such as the IBM 1620, had hardware support for delimiting records and fields, and special instructions for copying such records.
The concept of records and fields was central in some early file sorting and tabulating utilities, such as IBM's Report Program Generator (RPG).
COBOL was the first widespread programming language to support record types,[citation needed] and indeed its record definition facilities were quite complex for the times. The language allowed the definition of nested records with alphanumeric, integer, and fractional fields of arbitrary size and precision, as well as fields that would automatically format any value assigned to them (such as inserting a dollar sign and a decimal period). Each file had to be associated with a record variable where data was read into or written from. COBOL also provided a MOVE CORRESPONDING command that would assign corresponding fields of two record, paired by their names.
The early languages developed for numeric computing, such as FORTRAN (up to FORTRAN IV) and Algol 60, did not have support for record types; but latter versions of those languages, such as Fortran 77 and Algol 68 did add them. The original Lisp programming language too was lacking records (except for the built-in cons cell), but its S-expressions provided an adequate surrogate. The Pascal programming language was one of the first languages to fully integrate record types with other basic types into a logically consistent type system. IBM's PL/1 programming language provided for COBOL-style records. The C programming language initially provided the record concept as a kind of template (struct) that could be laid on top of a memory area, rather than a true record data type. The latter were provided eventually (by the typedef declaration), but the two concepts are still distinct in the language. Most languages designed after Pascal (such as Ada, Modula, and Java) also supported records.
Operations
A programming language that supports record types usually provides special syntax for some or all of the following:
- Declaring a new record type, including the position, type, and (possibly) name of each field;
- Declaring variables and values as having a given record type;
- Constructing a record value from given field values and (sometimes) with given field names;
- Selecting a field of a record with an explicit name;
- Assigning a record value to a record variable;
- Comparing two records for equality;
- Computing a standard hash value for the record.
Some languages may provide facilities that enumerate all fields of a record, or at least the fields that are references. This facility is needed to implement certain services such as debuggers, garbage collectors, and serialization. It requires some degree of type polymorphism.
The selection of a field from a record value yields a value. The selection of a field from a record variable usually yields a variable. This variable has the field's data type, and usually can be assigned to and otherwise handled as any ordinary variable of that type.
Assignment and comparison
Most languages allow assignment between records that have exactly the same record type (including same field types and names, in the same order). Depending on the language, however, two record data types defined separately may be regarded as distinct types even if they have exactly the same fields.
Some languages may also allow assignment between records whose fields have different names, matching each field value with the corresponding field variable by their positions within the record; so that, for example, a complex number with fields called real and imag can be assigned to a 2D point record variable with fields X and Y. In this alternative, the two operands are still required to have the same sequence of field types. Some languages may also require that corresponding types have the same size and encoding as well, so that the whole record can be assigned as an uninterpreted bit string. Other languages may be more flexible in this regard, and require only that each value field can be legally assigned to the corresponding variable field; so that, for eaxmple, a short integer field can be assigned to a long integer field, or vice-versa.
Other languages (such as COBOL) may match fields and values by their names, rather than positions.
These same possibilities apply to the comparison of two record values for equality. Some languages may also allow order comparisons ('<'and '>'), using the lexicographic order based on the comparison of individual fields.[citation needed]
Algol 68's distributive field selection
In Algol 68, if Pts was an array of records, each with integer fields X and Y, one could write Pts.Y to obtain an array of integers, consisting of the Y fields of all the elements of Pts. As a result, the statements Pts[3].Y := 7 and Pts.Y[3] := 7 would have the same effect.
Pascal's "with" statement
In the Pascal programming language, the command with R do S end would execute the command sequence S as if all the fields of record R had been declared as variables. So, instead of writing Pt.X := 5; Pt.Y := Pt.X + 3 one could write with Pt do X := 5; Y := X + 3 end.
Representation in memory
The representation of records in memory varies depending on the programming languages. Usually the fields are stored in consecutive positions in memory, in the same order as they are declared in the record type. This may result in two or more fields stored into the same word of memory; indeed, this feature is often used in systems programming to access specific bits of a word. On the other hand, most compilers will add padding fields, mostly invisible to the programmer, in order to comply with alignment constraints imposed by the machine-say, that a floating point field must occupy a single word.
Some languages may implement a record as an array of addresses pointing to the fields (and, possibly, to their names and/or types). Objects in object-oriented languages are often implemented in rather complicated ways, especially in languages that allow multiple class inheritance.
See also
- struct (C programming language)
- Composite data type
- Object composition
- Cons cell
- Storage record
- Block (data storage)
- Hashed linking
- Data structure alignment
- Data hierarchy
|
|||||||||||||||||||||||