Wednesday Week 9


Nerds You Should Know #8

The next in a series on famous computer scientists ...
 
    

 
They developed one of the most useful Web tools ...


... Nerds You Should Know #8

Larry Page

Sergey Brin

  
  • Co-founders of Google
  • Page: BSc/BE University of Michigan
  • Brin: BSc University of Maryland
  • Both moved to Stanford for PhDs in the mid-1990s
  • PhD work led to new ideas on Web searching
    • use keywords like "normal" search engines
    • augment document ranking by "credibility"
    • credibility related to inbound links
  • Ideas led to prototype, then to company
  • Google Inc. founded in 1998


Files

A file is a sequence of bytes on a storage device.

Files are normally persistent ...

Files are named (e.g. /home/mit/notes.txt) ...


Exercise: Unix file permissions

Each file on Unix has:

Investigate how each of these
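One way to start investigating permissions from C is the POSIX stat() call. The sketch below is a minimal illustration, not a full answer to the exercise: perm_string is a hypothetical helper that decodes the nine permission bits of st_mode into the rwx string that ls -l displays.

```c
#define _POSIX_C_SOURCE 200809L   // expose POSIX declarations such as stat()
#include <stdio.h>
#include <sys/stat.h>

// Hypothetical helper: fill out[] with an ls-style permission string
// such as "rw-r--r--" (9 characters plus '\0').
// Returns 0 on success, -1 if stat() fails.
int perm_string(const char *path, char out[10])
{
    struct stat st;
    if (stat(path, &st) != 0) {
        perror(path);
        return -1;
    }

    // The nine permission bits, in owner/group/other order
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,
        S_IRGRP, S_IWGRP, S_IXGRP,
        S_IROTH, S_IWOTH, S_IXOTH
    };
    const char *letters = "rwxrwxrwx";
    for (int i = 0; i < 9; i++)
        out[i] = (st.st_mode & bits[i]) ? letters[i] : '-';
    out[9] = '\0';
    return 0;
}
```

Try it after changing a file's mode with chmod (the command or the function) and compare with ls -l.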


File Data/Operations

What kind of data is held in files?

Text files contain ASCII characters ...

Binary files contain arbitrary bytes ...

The type of data in a file is determined by its suffix or by its content
(e.g. .c .o .txt .tex .doc .jpg .mp3 .wmv vs. the Unix file command)
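Deciding by content can be approximated by scanning the bytes. This is a crude heuristic assumed purely for illustration (the real file command applies far more detailed tests), and looks_like_text is a hypothetical name.

```c
#include <stdio.h>
#include <ctype.h>

// Crude heuristic (an assumption, not how file(1) works): treat a
// stream as text if every byte is printable ASCII or whitespace.
// Reads from the current position to end of file.
int looks_like_text(FILE *f)
{
    int ch;
    while ((ch = getc(f)) != EOF) {
        if (ch > 127)
            return 0;                     // byte outside ASCII
        if (!isprint(ch) && !isspace(ch))
            return 0;                     // control byte such as '\0'
    }
    return 1;
}
```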


... File Data/Operations

Standard operations on files:

Other common operations (in Unix):


... File Data/Operations

Unix uses a file-like interface for many kinds of objects:

Some of these (e.g. /dev/tty1) are not persistent.

Instead, they provide a potentially unending stream of incoming/outgoing data.


Streams

Input/output to/from programs occurs via streams

All input/output in programs so far ...


... Streams

Examples of redirection of standard streams:


The stdio.h Library

stdio.h gives an interface for manipulating text files

Note:  printf(fmt,...)  =  fprintf(stdout,fmt,...)
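Because printf() writes to stdout while error messages can go to stderr via fprintf(stderr, ...), redirecting one stream leaves the other alone. A minimal sketch, with report() as a hypothetical helper whose streams are passed in: called as report(stdout, stderr, ...), running ./prog > out.txt captures only the numbers, while 2> err.txt would capture only the warnings.

```c
#include <stdio.h>

// Hypothetical helper: results go to 'out', diagnostics go to 'err'.
// Returns the number of values that were rejected.
int report(FILE *out, FILE *err, const int *vals, int n)
{
    int bad = 0;
    for (int i = 0; i < n; i++) {
        if (vals[i] < 0) {
            fprintf(err, "warning: skipping negative value %d\n", vals[i]);
            bad++;
        } else {
            fprintf(out, "%d\n", vals[i]);   // same as printf() when out == stdout
        }
    }
    return bad;
}
```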


... The stdio.h Library

For C programs using the stdio.h library:

The stdin/stdout/stderr streams are opened automatically when the program starts

Other streams must be opened/closed by the programmer

(C programs have a limit on the number of simultaneously open streams (e.g. 1024))


The FILE* Type

FILE* is the type used to interact with files (a handle)

Conceptually, a FILE* represents a stream

FILE* values are created by fopen()

FILE* values are deleted by fclose()
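The whole FILE* lifecycle (create with fopen(), check for NULL, use, delete with fclose()) fits in one small function; count_lines is a hypothetical name used only for illustration.

```c
#include <stdio.h>

// Open a file by name, count its '\n' characters, then close it.
// Returns -1 if the file cannot be opened (fopen() returns NULL).
long count_lines(const char *name)
{
    FILE *fp = fopen(name, "r");     // create the FILE* handle
    if (fp == NULL) {
        perror(name);                // explain why the open failed
        return -1;
    }
    long lines = 0;
    int ch;
    while ((ch = fgetc(fp)) != EOF)
        if (ch == '\n')
            lines++;
    fclose(fp);                      // release the handle
    return lines;
}
```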


... The FILE* Type

Common operations on FILE* objects:

int fgetc(FILE *inf) ... read next character (cast to an int) from inf; returns EOF at end of file

int fputc(int ch, FILE *outf) ... write ch to outf

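char *fgets(char *buf, int size, FILE *inf) ... read a line (at most size-1 chars, keeping the '\n') into buf; returns NULL at end of file

int fputs(const char *buf, FILE *outf) ... write the string buf to outf

int fclose(FILE *fp) ... close fp, flushing any buffered output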


The fopen Function

FILE *fopen(const char *name, const char *mode);
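The mode string selects how the file is opened. This sketch assumes only the three basic modes "r", "w" and "a" (variants with "+" and "b" also exist); write_text is a hypothetical helper.

```c
#include <stdio.h>

// Basic fopen() modes assumed here:
//   "r"  open an existing file for reading (NULL if it doesn't exist)
//   "w"  create the file, or truncate an existing one, for writing
//   "a"  write at the end of the file, creating it if needed
// Hypothetical helper: write 'text' to file 'name' using 'mode'.
int write_text(const char *name, const char *text, const char *mode)
{
    FILE *fp = fopen(name, mode);
    if (fp == NULL)
        return -1;
    fputs(text, fp);
    return fclose(fp);    // 0 on success, EOF on error
}
```

So write_text(f, "first\n", "w") followed by write_text(f, "second\n", "a") leaves two lines, whereas a second "w" would have discarded the first line.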


Iterating over Text Files

Character-by-character:

FILE *inf, *outf;
int ch;  // int, not char, so EOF can be told apart from real bytes
while ((ch = getc(inf)) != EOF) { // EOF is a special value, not a char
	putc(ch, outf);
}

Line-by-line:

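FILE *inf, *outf;
char line[BUFSIZE];  // BUFSIZE: a constant assumed #define'd elsewhere
while (fgets(line, BUFSIZE, inf) != NULL) {
	fputs(line, outf);  // fputs, not puts: puts() always writes to stdout
}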

Assumes inf open for reading, outf open for writing
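A runnable version of the character-by-character loop, wrapped in a function so that the two streams are parameters (copy_chars is a hypothetical name).

```c
#include <stdio.h>

// Copy everything from inf to outf, one character at a time.
// Returns the number of characters copied.
long copy_chars(FILE *inf, FILE *outf)
{
    long n = 0;
    int ch;                      // int, so EOF is distinguishable
    while ((ch = getc(inf)) != EOF) {
        putc(ch, outf);
        n++;
    }
    return n;
}
```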


Exercise: Display Text

Write a program that emulates what cat does

Usage:

$ ./mycat < xyz
$ ./mycat xyz
$ ./mycat abc def ghi
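One possible structure for mycat, sketched under the assumption that it should behave like cat for the three usages above: read stdin when no files are named, otherwise copy each named file in turn. The output stream is a parameter (an assumption made so the function is easy to test); main() would simply call mycat(argc - 1, &argv[1], stdout).

```c
#include <stdio.h>

// Copy one stream to another, character by character
static void copy_stream(FILE *in, FILE *out)
{
    int ch;
    while ((ch = getc(in)) != EOF)
        putc(ch, out);
}

// Returns 0 if every file was copied, 1 if any could not be opened
int mycat(int nfiles, char *files[], FILE *out)
{
    if (nfiles == 0) {                 // "./mycat < xyz": read stdin
        copy_stream(stdin, out);
        return 0;
    }
    int status = 0;
    for (int i = 0; i < nfiles; i++) {
        FILE *in = fopen(files[i], "r");
        if (in == NULL) {
            perror(files[i]);          // report and carry on, like cat
            status = 1;
            continue;
        }
        copy_stream(in, out);
        fclose(in);
    }
    return status;
}
```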


Exercise: Two-way File Merge

Write a program that


Buffering and fflush

The stdio.h library buffers input/output

int fflush(FILE *outf);
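fflush() forces buffered output out of the program. A minimal sketch with a hypothetical helper flush_then_read(): the text is written through one stream and read back through a second, independent stream. Without the fflush() call the text may still be sitting in the first stream's buffer, so the second stream could find the file empty.

```c
#include <stdio.h>
#include <string.h>

// Returns 1 if a second stream sees exactly 'line' after the flush,
// 0 if not, -1 if the file cannot be created.
int flush_then_read(const char *name, const char *line)
{
    FILE *w = fopen(name, "w");
    if (w == NULL)
        return -1;
    fputs(line, w);
    fflush(w);                    // push buffered bytes out to the file

    FILE *r = fopen(name, "r");   // separate stream on the same file
    char buf[256];
    int same = (r != NULL && fgets(buf, sizeof buf, r) != NULL
                && strcmp(buf, line) == 0);
    if (r != NULL)
        fclose(r);
    fclose(w);
    return same;
}
```

The same reasoning explains the common fflush(stdout) after printing a prompt that does not end in '\n'.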


Binary Files


Binary Files

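Binary files are different from text files

So text-oriented functions like getc() and fgets() don't work properly on them (e.g. fgets() stops at any byte that happens to equal '\n', and string functions stop at any 0 byte)

To manipulate binary files, use: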


... Binary Files

But why do we need binary files?

We can write all kinds of data as encoded text.

Problems with this approach:

i.e. binary files are more compact and efficient for binary data
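A small illustration of the size difference: the hypothetical helper storage_sizes() writes one int as decimal text and as raw bytes to temporary files, then reports each file's size via ftell().

```c
#include <stdio.h>

// Store 'value' as text (fprintf) and as binary (fwrite) in two
// temporary files and report the byte counts. Returns 0 on success.
int storage_sizes(int value, long *text_bytes, long *bin_bytes)
{
    FILE *t = tmpfile();
    FILE *b = tmpfile();
    int ok = (t != NULL && b != NULL);
    if (ok) {
        fprintf(t, "%d", value);               // one byte per digit
        fwrite(&value, sizeof value, 1, b);    // always sizeof(int) bytes
        *text_bytes = ftell(t);
        *bin_bytes = ftell(b);
    }
    if (t != NULL)
        fclose(t);
    if (b != NULL)
        fclose(b);
    return ok ? 0 : -1;
}
```

For 2000000000 the text form takes 10 bytes against sizeof(int) bytes (typically 4) in binary. For a small value such as 7 the text form is shorter, but binary values have a fixed size, so records can be located by position, and they need no conversion when read back.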


... Binary Files

Disadvantages of binary files:

Despite this, binary files are useful for e.g.


The od command

The Unix od command provides

Usage: od Format File

Dumps the contents of File in the specified Format

See man od for many more options (e.g. N-byte rather than 2-byte)


... The od command

Examples of od use:

$ cat text
abcABC123!@#
$ od --format=c text
0000000  a  b  c  A  B  C  1  2  3  !  @  #  \n
0000015
$ od --format=d1 text
0000000   97   98   99   65   66   67   49   50   51   33   64   35   10
0000015
$ od --format=x1 text
0000000 61 62 63 41 42 43 31 32 33 21 40 23 0a
0000015
$ od --format=x4 text
0000000 41636261 32314342 23402133 0000000a
0000015

(default is octal data format, hence the name od = "octal dump")
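To see where od's output comes from, here is a minimal imitation of od --format=x1 (hex_dump is a hypothetical name; real od supports many more formats and collapses repeated lines with *, which this sketch does not). Offsets are printed in octal, which is why the 13-byte file above ends at 0000015.

```c
#include <stdio.h>

// Print each line as an octal offset followed by up to 16 bytes in
// two-digit hex, then a final line holding the total byte count.
void hex_dump(FILE *in, FILE *out)
{
    unsigned long offset = 0;
    int ch;
    while ((ch = getc(in)) != EOF) {
        if (offset % 16 == 0) {          // start a new output line
            if (offset > 0)
                fputc('\n', out);
            fprintf(out, "%07lo", offset);
        }
        fprintf(out, " %02x", ch);
        offset++;
    }
    if (offset > 0)
        fputc('\n', out);
    fprintf(out, "%07lo\n", offset);     // total size, in octal
}
```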


The fwrite function

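size_t fwrite(const void *b, size_t z, size_t n, FILE *f);

... write n objects, each z bytes long, from b to f; returns the number of objects written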


... The fwrite function

Examples (dump several data structures):

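FILE *outf;
int array[50];
struct { float x; float y; } point;
// ... set values in array[] and point

outf = fopen("myDataFile","wb");  // "b" marks binary (no effect on Unix)
if (outf == NULL) {
	perror("myDataFile");
	exit(1);
}

// ... write array to file
fwrite(array, sizeof(int), 50, outf);

// ... write struct to file (fwrite needs its address)
fwrite(&point, sizeof(point), 1, outf);

fclose(outf);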


The fread function

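size_t fread(void *b, size_t z, size_t n, FILE *f);

... read up to n objects, each z bytes long, from f into b; returns the number of objects read (fewer than n at end of file or on error)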


... The fread function

Examples (read in data written above):

FILE *inf;
int array[50];
struct { float x; float y; } point;

inf = fopen("myDataFile","rb");
if (inf == NULL) {
	perror("myDataFile");
	exit(1);
}

// ... read array from file
if (fread(array, sizeof(int), 50, inf) != 50)
	fprintf(stderr, "Can't read array\n");

// ... read struct from file (fread needs its address)
if (fread(&point, sizeof(point), 1, inf) != 1)
	fprintf(stderr, "Can't read struct\n");

fclose(inf);

For a more extensive example:
testfread.c
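A self-contained round trip combining the two slides: write an array and a struct with fwrite(), rewind the same stream, and read them back with fread(). round_trip and struct point are hypothetical names; tmpfile() supplies a scratch file opened for update.

```c
#include <stdio.h>

struct point { float x; float y; };

// Write data with fwrite(), read it back with fread().
// Returns 1 if everything read back matches what was written.
int round_trip(FILE *fp)
{
    int out[5] = {10, 20, 30, 40, 50}, in[5];
    struct point p = {1.5f, -2.0f}, q;

    if (fwrite(out, sizeof(int), 5, fp) != 5 ||
        fwrite(&p, sizeof p, 1, fp) != 1)      // &p: pass an address
        return 0;

    rewind(fp);                                // back to byte 0 before reading

    if (fread(in, sizeof(int), 5, fp) != 5 ||
        fread(&q, sizeof q, 1, fp) != 1)
        return 0;

    for (int i = 0; i < 5; i++)
        if (in[i] != out[i])
            return 0;
    return q.x == p.x && q.y == p.y;
}
```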


Reading/Writing Dynamic Structures

You cannot write-then-read pointer values.

Pointer values are addresses within one process's memory.

A later process (even the same program run again) may place its data at different addresses.

What you can do for linked structures:
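One common approach, assumed for this sketch, is to write only the data (a count followed by the values) and to rebuild the links when reading, using freshly malloc()ed nodes. struct node, save_list, load_list and free_list are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

struct node { int value; struct node *next; };

// Save a count, then the values in list order; next pointers are never
// written. Returns the number of nodes saved, or -1 on a write error.
int save_list(struct node *head, FILE *fp)
{
    int n = 0;
    for (struct node *p = head; p != NULL; p = p->next)
        n++;
    if (fwrite(&n, sizeof n, 1, fp) != 1)
        return -1;
    for (struct node *p = head; p != NULL; p = p->next)
        if (fwrite(&p->value, sizeof p->value, 1, fp) != 1)
            return -1;
    return n;
}

// Rebuild the list with new nodes whose pointers belong to this
// process. Stops early (returning what it has) on a short read.
struct node *load_list(FILE *fp)
{
    int n;
    if (fread(&n, sizeof n, 1, fp) != 1)
        return NULL;
    struct node *head = NULL, **tail = &head;
    for (int i = 0; i < n; i++) {
        struct node *nd = malloc(sizeof *nd);
        if (nd == NULL || fread(&nd->value, sizeof nd->value, 1, fp) != 1) {
            free(nd);
            break;
        }
        nd->next = NULL;
        *tail = nd;               // append to the end of the new list
        tail = &nd->next;
    }
    return head;
}

void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```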


Exercise: Persistent Linked-List

Write a program to maintain a list of ints in a file


Random-access to Files

Files are typically sequential data structures.

Most common access pattern:

Two operations (fseek() and ftell()) provide random access.


The fseek function

int fseek(FILE *f, long int offset, int whence);


... The fseek function

Examples of fseek() usage:

FILE *fp;  // open for reading and/or writing

// move cursor to start of file (rewind)
fseek(fp, 0L, SEEK_SET);

// move cursor to end of file
fseek(fp, 0L, SEEK_END);

// back up one byte in file
fseek(fp, -1L, SEEK_CUR);

For a more extensive example:
testseek.c
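Random access pays off with fixed-size binary records: the position of record i can be computed instead of reading every record before it. A minimal sketch, assuming a file that contains nothing but ints; read_int_at is a hypothetical helper.

```c
#include <stdio.h>

// Fetch the i'th int (counting from 0) without reading earlier ones.
// Returns 1 on success, 0 if the seek or the read fails.
int read_int_at(FILE *fp, long i, int *result)
{
    if (fseek(fp, i * (long)sizeof(int), SEEK_SET) != 0)
        return 0;
    return fread(result, sizeof(int), 1, fp) == 1;
}
```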


The ftell function

long ftell(FILE *f);

Example of use:

FILE *fp; // open for reading and/or writing
// ... add some data to stream fp ...
long here = ftell(fp);      // save current location
// ... add more text to stream fp ...
fseek(fp, here, SEEK_SET);  // return to known location
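Combining fseek() and ftell() gives a common idiom for finding a file's size. A sketch with a hypothetical helper file_size(); it assumes a Unix/POSIX system, where seeking to SEEK_END and then calling ftell() yields the size in bytes (the C standard does not guarantee this for every kind of stream).

```c
#include <stdio.h>

// Size of the file behind fp, in bytes; the current position is
// restored afterwards. Returns -1 on error.
long file_size(FILE *fp)
{
    long here = ftell(fp);               // remember where we were
    if (here < 0 || fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    long size = ftell(fp);               // offset of the end == size
    fseek(fp, here, SEEK_SET);           // return to the saved location
    return size;
}
```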


Produced: 5 Oct 2016