Some Computer Hints


Perl Script - Subtitle Converter

These days, DVD rips are a very popular way to store and watch movies on PCs. High-quality video and sound fit on one or two conventional data CDs, and you get playback options similar to those of standard DVDs. One of these options is displaying subtitles on the screen. There are several subtitle formats, and most subtitle files are plain text files. Sometimes, however, the subtitle timing does not exactly match the sound of the movie at hand. Several tools on the Internet can fix such problems.

If you do not want to use these rather complex tools, you can use the following simple Perl script:

#!/usr/bin/perl
# st_conv.pl - Subtitle conversion tool.
#	See usage below.
# Fedon Kadifeli, November 2002 - February 2005.

use strict;
use warnings;

MAIN:
{
my $shift = 0;
my $shift_time_mode = 0;
my $mult = 1.0;
my $framerate = 1000;
my $fromsub;
my $tosub;
my $total = 0;
my $conv = 0;
my $wconv = 0;
my $o1;
my $o2;
my $subtitle;

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

sub CheckLines {
  my @lines = split /\|/, shift;
  my $nl = 0;
  my $ret = "";
  for (@lines) {
    my $ll = length;
    if ($ll > 50) {
      my $hl = int ($ll / 2);
      s/^(.{$hl,}?.*?) (.*)$/$1|$2/;
      warn "*** At $total. Too long ($ll char) subtitle split!\n";
      $nl++;
    }
    $ret .= "|$_";
    $nl++;
  }
  warn "*** At $total. $nl-line subtitle!\n" if ($nl > 3);
  return substr($ret,1);
} # CheckLines

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

sub TimetoString {
  my $t = shift;
  my $dp = shift;
  # Work in whole milliseconds so that floating-point error cannot
  # truncate e.g. 21.170 seconds to 21,169.
  my $ms = int ($t * 1000 + 0.5);
  my $hh = int ($ms / 3600000);
  my $mm = int (($ms % 3600000) / 60000);
  my $ss = int (($ms % 60000) / 1000);
  my $ttt = $ms % 1000;
  my $tstr = sprintf ("%02d:%02d:%02d%s%03d", $hh, $mm, $ss, $dp, $ttt);
  return $tstr;
} # TimetoString

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

sub ReadSubtitle {
  return 0 if (eof FH);
  my $ret = 1;
  if ($fromsub) {
    $total++;
    $_ = <FH>;
    if (/^\{(\d+)}\{(\d+)}(.*?)\r?\n?$/) {
      $o1 = $1;
      $o2 = $2;
      $subtitle = $3;
      $conv++;
    } else {
      warn "*** At $total. Incorrect .sub format!\n";
      $ret++;
    }
  } else {
    $total++;
    $_ = <FH>;
    unless (/^ *\d+ *\s*$/) {
      warn "*** At $total. Number expected. $_ found!\n";
      $ret++;
    }
    return 0 if (eof FH);
    $_ = <FH>;
    if (/^(\d\d):(\d\d):(\d\d)[\.,](\d\d\d) --> (\d\d):(\d\d):(\d\d)[\.,](\d\d\d).*$/) {
      $o1 = $1*3600 + $2*60 + $3 + $4 / 1000.0;
      $o2 = $5*3600 + $6*60 + $7 + $8 / 1000.0;
      $subtitle = "";
      while (<FH>) {
        last if (/^\s*$/);
        $_ =~ s/\r?\n?$//;
        $subtitle .= "|" . $_;
      }
      $subtitle =~ s#^\|##;
      $conv++;
    } else {
      warn "*** At $total. .srt info expected. $_ found!\n";
      $ret++;
    }
  }
  return $ret;  # ==1 if OK; >1 if warning(s)
} # ReadSubtitle

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

sub WriteSubtitle {
  $o1 *= $mult;
  $o2 *= $mult;
  $subtitle = "." if $subtitle eq "";
  $subtitle = CheckLines($subtitle);
  if ($tosub) {
    if (!$fromsub) { # .srt -> .sub
      $o1 *= $framerate;
      $o2 *= $framerate;
    }
    if ($shift) {
      my $shift_frames = $shift;
      $shift_frames *= $framerate if $shift_time_mode;
      $o1 += $shift_frames;
      $o2 += $shift_frames;
    }
    if ($o1 > 0 && $o2 > 0) {
      $o1 = int ($o1 + 0.5);
      $o2 = int ($o2 + 0.5);
      $wconv++;
      print OFH "{$o1}{$o2}$subtitle\r\n";
    }
  } else {
    if ($fromsub) { # .sub -> .srt
      $o1 /= $framerate;
      $o2 /= $framerate;
    }
    if ($shift) {
      my $shift_time = $shift;
      $shift_time /= $framerate if ! $shift_time_mode;
      $o1 += $shift_time;
      $o2 += $shift_time;
    }
    if ($o1 > 0 && $o2 > 0) {
      $o1 = TimetoString ($o1, ',');
      $o2 = TimetoString ($o2, ',');
      $subtitle =~ s#\|#\r\n#g;
      $wconv++;
      print OFH "$wconv\r\n$o1 --> $o2\r\n$subtitle\r\n\r\n";
    }
  }
} # WriteSubtitle

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

my $usage='
Usage:	st_conv.pl infile outfile [action(s)]

action:	(+|-)(frames|hh:mm:ss.ttt)
	*ddd.ddd
	@framerate [eg. 23.976, 25, 29.97]
';

die "$usage" if (($#ARGV < 1) || ($#ARGV > 4));

my $inf = shift;
my $outf = shift;

for my $action (@ARGV) {
  if ($action =~ /^([-+])(\d\d):(\d\d):(\d\d)[\.,](\d\d\d)$/) {
    $shift = $2*3600 + $3*60 + $4 + $5 / 1000.0;
    $shift = - $shift if ($1 eq "-");
    $shift_time_mode = 1;
  } elsif ($action =~ /^([-+])(\d+)$/) {
    $shift = $action * 1.0;
    $shift_time_mode = 0;
  } elsif ($action =~ /^\*(\d+\.?\d*|\.\d+)$/) {
    $mult = $1 * 1.0;
  } elsif ($action =~ /^\@(\d+\.?\d*)$/) {
    $framerate = $1 * 1.0;
  } else {
    die "Format of action ($action) is incorrect!\n$usage";
  }
} # for

if ($inf =~ /\.sub$/i) {
  $fromsub = 1;
} elsif ($inf =~ /\.srt$/i) {
  $fromsub = 0;
} else {
  die "File extension of first file must be .sub or .srt!\n$usage";
}
if ($outf =~ /\.sub$/i) {
  $tosub = 1;
} elsif ($outf =~ /\.srt$/i) {
  $tosub = 0;
} else {
  die "File extension of second file must be .sub or .srt!\n$usage";
}
my $frdisp = "";
if ( ($fromsub != $tosub) ||
     ($shift && $fromsub && $tosub && $shift_time_mode) ||
     ($shift && !$fromsub && !$tosub && !$shift_time_mode) ) {
  die "Framerate is not between 10 and 50!\n$usage" if ($framerate < 10) || ($framerate > 50);
  $frdisp = " - At $framerate frames/sec";
}

die "Cannot open input file $inf: $!\n" unless (open(FH, '<', $inf));
die "Output file $outf exists!\n" if (-e $outf);
die "Cannot open output file $outf: $!\n" unless (open(OFH, '>', $outf));
binmode OFH;  # line ends are written explicitly as \r\n

warn "$inf  ==>  $outf\n\n";
warn "Action:\t" .
  "Shift by $shift " . ($shift_time_mode?"seconds":"frames") .
  " - Multiply by $mult" .
  $frdisp . "\n\n";

while (my $ret = ReadSubtitle) {
  WriteSubtitle if $ret == 1;
}

close FH;
close OFH;

warn "*** Conversion error! Input file probably in wrong format!\n" if ($conv != $total);
warn "$wconv out of $total lines converted.\n";
} # MAIN:

This script recognizes two different formats in input files:

  1. The frame format stores each subtitle on a single line. Files of this type must have the extension .sub. For example:
    {33642}{33686}- I have to talk to you.|- Talk later?
    Here, the first two numbers are the starting and ending frame numbers in the movie between which the corresponding text (the rest of the line) is displayed. A vertical bar (|) separates the lines of a multi-line subtitle.
  2. The time format consists of paragraphs separated by blank lines. Files of this type must have the extension .srt. For example:
    172
    00:16:21,170 --> 00:16:24,129
    - Can't we go somewhere else?
    - Where?
    Here, the first line is the subtitle sequence number, and the second line specifies the starting and ending times in the movie between which the corresponding text (the remaining lines up to the first empty line) is displayed.
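
As an illustration of the two formats, the following minimal sketch parses one frame-format line and one time-format timing line. The helper names parse_sub_line and parse_srt_timing are hypothetical and not part of st_conv.pl; the patterns are modelled on the ones the script uses.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one frame-format (.sub) line.
# Returns (start_frame, end_frame, text), or nothing if the line does not match.
sub parse_sub_line {
  my ($line) = @_;
  return unless $line =~ /^\{(\d+)\}\{(\d+)\}(.*?)\r?\n?$/;
  return ($1, $2, $3);
}

# Hypothetical helper: parse the timing line of a time-format (.srt) entry.
# Returns (start_seconds, end_seconds), or nothing if the line does not match.
sub parse_srt_timing {
  my ($line) = @_;
  return unless $line =~
    /^(\d\d):(\d\d):(\d\d)[.,](\d\d\d) --> (\d\d):(\d\d):(\d\d)[.,](\d\d\d)/;
  return ($1*3600 + $2*60 + $3 + $4/1000,
          $5*3600 + $6*60 + $7 + $8/1000);
}

my ($f1, $f2, $text) =
  parse_sub_line("{33642}{33686}- I have to talk to you.|- Talk later?\n");
print "frames $f1 to $f2: $text\n";

my ($t1, $t2) = parse_srt_timing("00:16:21,170 --> 00:16:24,129\n");
printf "seconds %.3f to %.3f\n", $t1, $t2;   # seconds 981.170 to 984.129
```

Keeping the times as plain seconds makes the later shift and multiply steps simple arithmetic.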

The script takes at least two parameters. The first is the name of the input file (the file to read); the second is the name of the output file (the file to create), which must not already exist. Each file extension must be either .sub or .srt, as described above. If the two extensions differ, the script performs the corresponding format conversion.

Any parameters after the first two (at most three) specify actions to perform on the subtitles. You can shift the subtitle timing forward or backward, or multiply it by a factor. The allowed action formats are:
+frame
-frame
+hh:mm:ss.ttt
-hh:mm:ss.ttt
*ddd.ddd
@framerate (e.g. 23.976, 25, 29.97)

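The framerate is required when you convert a file from .sub to .srt or vice versa, and also when the shift amount is given in a form that does not match the file format: a time shift applied to .sub files, or a frame shift applied to .srt files. In these cases the framerate must be between 10 and 50; otherwise the script stops with an error.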
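
The role of the framerate can be sketched as follows. The helpers seconds_to_frame and frame_to_seconds are hypothetical names used only for illustration; they mirror the arithmetic in WriteSubtitle, where times are multiplied by the framerate and rounded to get frames, and frames are divided by the framerate to get times.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a time in seconds to the nearest frame number.
sub seconds_to_frame {
  my ($seconds, $framerate) = @_;
  return int($seconds * $framerate + 0.5);
}

# Hypothetical helper: convert a frame number to a time in seconds.
sub frame_to_seconds {
  my ($frame, $framerate) = @_;
  return $frame / $framerate;
}

my $framerate = 25;                    # as would be given by the action @25
my $start = 16*60 + 21 + 170/1000;     # 00:16:21,170 expressed in seconds

printf "%.3f s -> frame %d\n", $start, seconds_to_frame($start, $framerate);
# 981.170 s -> frame 24529
printf "frame %d -> %.3f s\n", 33642, frame_to_seconds(33642, $framerate);
# frame 33642 -> 1345.680 s
```

The same factor converts a shift amount: at 25 frames per second, a shift of two seconds corresponds to 50 frames.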

Frame (which must be an integer) is added to or subtracted from both frame numbers of each line in the file. Similarly, hh:mm:ss.ttt (which must consist of four numbers: hours, minutes, seconds, and milliseconds) is added to or subtracted from both times of each subtitle. Both time values of each subtitle are multiplied by ddd.ddd (a decimal number such as 1.00125), and the multiplication is applied before any shift. Subtitles whose start or end would not be positive after these operations are left out of the output file.
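
The effect of the shift and multiply actions on a single time value can be sketched like this. The function retime is a hypothetical name for illustration only; it follows the order used in WriteSubtitle, which multiplies first and then adds the shift.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the multiplier first, then the shift.
# Works the same way for times in seconds and for frame numbers.
sub retime {
  my ($value, $mult, $shift) = @_;
  return $value * $mult + $shift;
}

my $t = 981.170;                                 # 00:16:21,170 in seconds

printf "%.3f\n", retime($t, 1.0,     -2.5);      # shift only:    978.670
printf "%.3f\n", retime($t, 1.00125,  0);        # multiply only: 982.396
printf "%.3f\n", retime($t, 1.00125, -2.5);      # both:          979.896
```

Because the multiplier scales every time value proportionally, it corrects subtitles that drift gradually out of sync, while a shift corrects a constant offset.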