Data format

The data is stored in several directories and files:

  • original - the acquired xml-files
  • lineStrokes - the divided lines in on-line format (example)
  • lineImages - the divided lines in off-line format (example)
  • writers.xml - a file containing information about the writers
  • ascii - the transcriptions as plain text-files
  • forms.txt - a file containing the mapping of forms to writers

Format of the XML-files

The xml-files contain the following information:
  • Form
    • id - The unique form id
    • writerID - the id of the writer
    • saveTime - the time of saving, computed as ((((systemTime.wYear*12+systemTime.wMonth)*31+systemTime.wDay)*24 + systemTime.wHour)*60 + systemTime.wMinute)*60 + systemTime.wSecond + (systemTime.wMilliseconds)/1000.f
  • CaptureTime - the time information in a more readable format
  • Setting
    • location - location of the recording
    • producer - responsible transcriber
    • system - the recording software used
  • Text - the full ascii-transcription of the written data
  • TextLine - a text-line with a unique id
  • Word - a word with a unique id
  • Char - a character with a unique id
  • WhiteboardDescription
    • SensorLocation - the corner where the eBeam sensor was located
    • DiagonallyOppositeCoords - the coordinates of the diagonally opposite corner (marked by the acquiring person)
    • VerticallyOppositeCoords - the coordinates of the vertically opposite corner (marked by the acquiring person)
    • HorizontallyOppositeCoords - the coordinates of the horizontally opposite corner (marked by the acquiring person)
  • StrokeSet
    • Stroke - A Stroke with its colour and time information
    • Point - A Point with x, y and time value
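
The saveTime expression listed above packs a calendar date and time into a single number, treating every month as 31 days long. The following Python sketch reproduces that arithmetic and inverts it. The function names encode_save_time and decode_save_time are hypothetical and not part of the database tools. Python floats are double precision, so a value written by software using single-precision arithmetic (as the 1000.f literal may suggest) could differ in the fractional part.

```python
def encode_save_time(year, month, day, hour, minute, second, milliseconds=0):
    """Apply the saveTime expression from the format description."""
    total = year * 12 + month
    total = total * 31 + day
    total = total * 24 + hour
    total = total * 60 + minute
    total = total * 60 + second
    return total + milliseconds / 1000.0


def decode_save_time(value):
    """Recover the date and time fields from a saveTime value."""
    total = int(value)
    milliseconds = round((value - total) * 1000)
    total, second = divmod(total, 60)
    total, minute = divmod(total, 60)
    total, hour = divmod(total, 24)
    total, day = divmod(total, 31)
    if day == 0:
        # Day 31 wraps to 0 and carries into the month term; undo the carry.
        day, total = 31, total - 1
    year, month = divmod(total, 12)
    if month == 0:
        # Month 12 wraps to 0 and carries into the year term; undo the carry.
        month, year = 12, year - 1
    return year, month, day, hour, minute, second, milliseconds


if __name__ == "__main__":
    stamp = encode_save_time(2005, 12, 31, 23, 59, 58, 250)
    print(stamp, decode_save_time(stamp))
```

Because each month is counted as 31 days, the value is not a count of seconds since a fixed epoch. The difference between two saveTime values equals the elapsed time in seconds only when both fall within the same month.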

See xml-format and xml-format.dtd for a DTD-file of the format.

See strokesz.xml for an example of an xml-file.
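
As a rough illustration of how the stroke data could be read, the sketch below parses StrokeSet, Stroke and Point elements with Python's standard xml.etree.ElementTree module. The element names come from the description above. The attribute names (colour, x, y, time), the root element name and the sample document are assumptions made for this example, so check them against the DTD and the example file before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real files may use different attribute names
# and a different root element.
SAMPLE = """<Root>
  <StrokeSet>
    <Stroke colour="black">
      <Point x="10" y="20" time="0.00"/>
      <Point x="12" y="24" time="0.01"/>
    </Stroke>
    <Stroke colour="black">
      <Point x="30" y="20" time="0.50"/>
    </Stroke>
  </StrokeSet>
</Root>"""


def read_strokes(xml_text):
    """Return a list of strokes, each a list of (x, y, time) tuples."""
    root = ET.fromstring(xml_text)
    strokes = []
    for stroke in root.iter("Stroke"):
        points = [
            (float(p.get("x")), float(p.get("y")), float(p.get("time")))
            for p in stroke.iter("Point")
        ]
        strokes.append(points)
    return strokes


if __name__ == "__main__":
    for index, points in enumerate(read_strokes(SAMPLE)):
        print(index, len(points), points[0])
```

To read a file from disk instead of a string, ET.parse(path).getroot() can replace ET.fromstring(xml_text).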
