I made these slides to summarize some tips and useful scripts we learned. We use Perl in the bioinformatics field, so the slides include some simple bioinformatics program examples.
Scalar Data
Lists and Arrays
In the World of Regular Expressions
Matching with Regular Expressions
Processing Text with Regular Expressions
More Control Structures
Strings and Sorting
(Learning Perl: Chapter 2)
1.25
255.00
7.25e45 # 7.25 times 10 to the 45th power (a big number)
-12.E-23 # another way to say that - the E may be uppercase
0
-40
3249089384 # also can write as 3_249_089_384
0377 # 377 octal, same as 255 decimal
0xff # FF hex, also 255 decimal
0b11111111 # also 255 decimal
# + plus, - minus, * times, / divided
10.2 / 0.3 # 10.2 divided by 0.3, or 34
10 / 3 # always floating-point divide, so 3.3333333...
Single-Quoted String Literals
'fred' # those four characters: f, r, e, and d
'' # the null string (no characters)
'hello\n' # hello followed by backslash followed by n
Double-Quoted String Literals
"hello world\n" # hello world, and a newline
"coke\tsprite" # coke, a tab, and sprite
String values can be concatenated with the . operator. (Yes, that’s a single period.)
"hello" . "world" # same as "helloworld"
"hello" . ' ' . "world" # same as 'hello world'
'hello world' . "\n" # same as "hello world\n"
A special string operator is the string repetition operator, consisting of the single lowercase letter x. This operator takes its left operand (a string) and makes as many concatenated copies of that string as indicated by its right operand (a number).
"fred" x 3 # is "fredfredfred"
"barney" x (4+1) # is "barney" x 5, or "barneybarneybarneybarneybarney"
5 x 4 # is really "5" x 4, which is "5555"
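As a small bioinformatics-flavored sketch of the two string operators above, the following builds a sequence with a poly-A tail. The sequence, the tail length, and the variable names are made up for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical example data: a short sequence fragment and a tail length.
my $fragment    = "ATGGCC";
my $tail_length = 5;

# x repeats the string on its left; . joins the pieces together.
my $with_tail = $fragment . ("A" x $tail_length);

print $with_tail . "\n";   # ATGGCCAAAAA
```

The parentheses around the repetition are not strictly required, but they make the order of the two operations obvious.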
Perl can be told to warn you when it sees something suspicious going on in your program. To run your program with warnings turned on, use the -w option on the command line:
$ perl -w my_program
Or, if you always want warnings, you may request them on the #! line:
#!/usr/bin/perl -w
A variable is a name for a container that holds one or more values. A scalar variable holds a single scalar value.
Scalar variable names begin with a dollar sign followed by what we’ll call a Perl identifier:
a letter or underscore, and then possibly more letters, or digits, or underscores.
Another way to think of it is that it’s made up of alphanumerics and underscores,
but can’t start with a digit. Upper- and lowercase letters are distinct:
the variable $Fred is a different variable from $fred.
$fred = 17; # give $fred the value of 17
$barney = 'hello'; # give $barney the five-character string 'hello'
$barney = $fred + 3; # give $barney the current value of $fred plus 3 (20)
$barney = $barney * 2; # $barney is now $barney multiplied by 2 (40)
Expressions like $fred = $fred + 5 (where the same variable appears on both sides of an assignment) occur frequently enough that Perl (like C and Java) has a shorthand for the operation of altering a variable—the binary assignment operator.
$fred = $fred + 5; # without the binary assignment operator
$fred += 5; # with the binary assignment operator
The print operator takes a scalar argument and puts it out without any embellishment onto standard output.
print "hello world\n"; # say hello world, followed by a newline
A double-quoted string literal is subject to variable interpolation. This means that any scalar variable name in the string is replaced with its current value.
$meal = "brontosaurus steak";
$barney = "fred ate a $meal"; # $barney is now "fred ate a brontosaurus steak"
$barney = 'fred ate a ' . $meal; # another way to write that
Test LaTeX: $e^{i\pi} + 1 = 0$
(Learning Perl: Chapter 3)
(Learning Perl: Chapter 4)
To define your own subroutine, use the keyword sub, the name of the subroutine (without the ampersand), then the indented block of code (in curly braces), which makes up the body of the subroutine:
sub marine {
$n += 1; # Global variable $n
print "Hello, sailor number $n!\n";
}
&marine; # or just marine;
As Perl is chugging along in a subroutine, it is calculating values as part of its series of actions. Whatever calculation is last performed in a subroutine is automatically also the return value.
sub sum_of_fred_and_barney {
print "Hey, you called the sum_of_fred_and_barney subroutine!\n";
$fred + $barney; # That's the return value
}
# This subroutine returns the larger value of $fred or $barney
sub larger_of_fred_or_barney {
if ($fred > $barney) {
$fred;
} else {
$barney;
}
}
Perl has subroutine arguments. To pass an argument list to the subroutine, simply place the list expression, in parentheses, after the subroutine invocation.
$n = &max(10,15); # This sub call has two parameters
sub max {
# Compare this to &larger_of_fred_or_barney
if ($_[0] > $_[1]) {
$_[0];
} else {
$_[1];
}
}
$n = &max(10, 15, 27); # Oops! This max only looks at the first two arguments, so 27 is ignored
By default, all variables in Perl are global variables; that is, they are accessible from every part of the program. But you can create private variables called lexical variables at any time with the my operator:
sub max {
my($m, $n); # new, private variables for this block
($m, $n) = @_; # give names to the parameters
if ($m > $n) { $m } else { $n }
} # These variables are private (or scoped) to the enclosing block;
# any other $m or $n is totally unaffected by these two.
my($m, $n) = @_; # Name the subroutine parameters
my($num) = @_; # list context, same as ($num) = @_;
my $num = @_; # scalar context, same as $num = @_;
In the first one, $num gets the first parameter, as a list-context assignment; in the second, it gets the number of parameters, in a scalar context.
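The difference between the two contexts can be seen side by side in one subroutine. This is a minimal sketch; the subroutine name first_and_count and the argument list are hypothetical.

```perl
#!/usr/bin/perl -w
use strict;

sub first_and_count {
    my($first) = @_;   # list context: $first gets the first parameter
    my $count  = @_;   # scalar context: $count gets the number of parameters
    return ($first, $count);
}

my ($first, $count) = first_and_count("A", "C", "G", "T");
print "first=$first count=$count\n";   # first=A count=4
```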
my $fred, $barney; # WRONG! Fails to declare $barney
my($fred, $barney); # declares both
Perl tends to be a pretty permissive language. But maybe you want Perl to impose a little discipline; that can be arranged with the use strict pragma.
A pragma is a hint to a compiler, telling it something about the code. In this case, the use strict pragma tells Perl’s internal compiler that it should enforce some good programming rules for the rest of this block or source file.
Most people recommend that programs that are longer than a screenful of text generally need use strict.
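One of the rules that use strict enforces is that variables must be declared before use, which catches misspelled variable names. The sketch below assumes made-up variable names; the misspelled statement is compiled with a string eval only so that the example can report the error instead of stopping.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $bamm_bamm = 3;     # declared with my, so strict is satisfied
$bamm_bamm += 1;

# The string below contains a typo ($bammbamm instead of $bamm_bamm).
# It is compiled separately with eval, which inherits use strict from
# this file, so the compile error ends up in $@ instead of killing us.
my $ok    = eval q{ $bammbamm += 1; 1 };
my $error = $@;

print "bamm_bamm is $bamm_bamm\n";
print "strict rejected the typo: $error" unless $ok;
```

In an ordinary program the misspelled line would simply stop the program from compiling, which is usually what you want.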
The return operator immediately returns a value from a subroutine.
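A typical use of return is to leave a loop as soon as the answer is known. The following sketch searches a list for an item and returns its index; the subroutine name and the list of bases are chosen for illustration.

```perl
#!/usr/bin/perl -w
use strict;

sub which_element_is {
    my($what, @array) = @_;
    foreach (0..$#array) {            # indices of @array's elements
        if ($what eq $array[$_]) {
            return $_;                # return early once found
        }
    }
    -1;                               # not found (return is optional here)
}

my @bases = qw/ A C G T /;
my $index = which_element_is("G", @bases);
print "G is at index $index\n";       # G is at index 2
```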
Declaring a variable with state tells Perl to retain the variable’s value between calls to the subroutine and to make the variable private to the subroutine.
use 5.010;
sub marine {
state $n = 0; # private, persistent variable $n
$n += 1;
print "Hello, sailor number $n!\n";
}
A Better &max Routine
$maximum = &max(3, 5, 10, 4, 6);
sub max {
my($max_so_far) = shift @_;
foreach (@_) {
if ($_ > $max_so_far) {
$max_so_far = $_;
}
}
$max_so_far;
}
(Learning Perl: Chapter 5)
while (defined($_ = <STDIN>)) {
print "I saw $_";
}
We’re reading the input into a variable, checking that it’s defined, and if it is (meaning that we haven’t reached the end of the input) we’re running the body of the while loop.
print @array; # print a list of items
print "@array"; # print a string (containing an interpolated array)
If @array holds qw/ fred barney betty /, the first one prints fredbarneybetty, while the second prints fred barney betty separated by spaces.
Useful scripts
print <>; # source code for 'cat'
print sort <>; # source code for 'sort'
Perl makes it very easy to read input, from either the keyboard or a file, with the diamond operator <>. In scalar context, each call to this operator returns one line from the current input source, which can be stored in a variable for later use. In list context, as in the two scripts above, it returns all the remaining lines at once.
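To show the line-by-line use of the diamond operator, here is a sketch that reads a file one line at a time. So that it runs on its own, it first writes a small temporary file (using the core File::Temp module) and places its name in @ARGV, as if it had been given on the command line; the file contents are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway input file so the sketch is self-contained.
my ($fh, $filename) = tempfile(UNLINK => 1);
print $fh "ACGT\nTTGA\n";
close $fh;

# Pretend the file name was given on the command line.
@ARGV = ($filename);

my @lines;
while (defined(my $line = <>)) {   # one line per call, undef at end of input
    chomp $line;                   # remove the trailing newline
    push @lines, $line;
}
print scalar(@lines), " lines read\n";   # 2 lines read
```

In a real script you would drop the temporary-file part and simply run the program with file names as arguments, or with no arguments to read from the keyboard.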
printf "%g %g %g\n", 5/2, 51/17, 51 ** 17; #2.5 3 1.0683e+29
printf "in %d days!\n", 17.85; #in 17 days!
printf "%6d\n", 42; #output like ~~~~42 (the ~ symbol stands for a space)
printf "%10s\n", "wilma"; #looks like ~~~~~wilma
printf "%-15s\n", "flintstone"; #looks like flintstone~~~~~
printf "%12f\n", 6 * 7 + 2/3; #looks like ~~~42.666667
printf "%12.3f\n", 6 * 7 + 2/3; #looks like ~~~~~~42.667
printf "%12.0f\n", 6 * 7 + 2/3; #looks like ~~~~~~~~~~43
A filehandle is the name in a Perl program for an I/O connection between your Perl process and the outside world. That is, it’s the name of a connection, not necessarily the name of a file. Filehandles are named like other Perl identifiers (with letters, digits, and underscores, but they can’t start with a digit).
But there are also six special filehandle names that Perl already uses for its own purposes: STDIN, STDOUT, STDERR, DATA, ARGV, and ARGVOUT. Although you may choose any filehandle name you’d like, you shouldn’t choose one of those six unless you intend to use its special properties.
$ ./your_program <dino >wilma
That command tells the shell that the program’s input should be read from the file dino, and the output should go to the file wilma.
open CONFIG, "dino"; # open the file dino for reading (the default)
open CONFIG, "<dino"; # the same as above, with the mode made explicit
# Open fred for writing.
# We’re sending the output to a new file called fred.
# If there’s already a file of that name,
# we’re asking to wipe it out and replace it with this new one.
open BEDROCK, ">fred";
open LOG, ">>logfile"; # open logfile for appending
close BEDROCK;
Perl will automatically close a filehandle if you reopen it (that is, if you reuse the filehandle name in a new open) or if you exit the program.
In general, it’s best to close each filehandle soon after you’re done with it, though the end of the program often arrives soon enough.
The die function prints out the message you give it (to the standard error stream, where such messages should go) and makes sure that your program exits with a nonzero exit status.
if ( ! open LOG, ">>logfile") {
die "Cannot create logfile: $!";
}
If the open fails, die will terminate the program and tell you that it cannot create the logfile.
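The same open-or-die pattern also works with the three-argument form of open and a lexical filehandle, which many newer Perl programs prefer over bareword handles. This is a sketch, not something the slides use: the temporary directory (from the core File::Temp module) and the log entries are assumptions made so the example can run anywhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a throwaway directory so the sketch does not touch real files.
my $dir     = tempdir(CLEANUP => 1);
my $logfile = "$dir/logfile";

# Append twice; the mode is a separate argument, the handle a my variable.
foreach my $entry ("first entry", "second entry") {
    open(my $log, '>>', $logfile)
        or die "Cannot open $logfile for appending: $!";
    print $log "$entry\n";
    close $log;
}

# Read the file back to confirm that both entries were kept.
open(my $in, '<', $logfile) or die "Cannot open $logfile: $!";
my @entries = <$in>;
close $in;

print scalar(@entries), " entries\n";   # 2 entries
```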
* Traditionally, the exit status is 0 for success and a nonzero value for failure.
* In general, when the system refuses to do something we’ve requested (like opening a file), $! will give you a reason (perhaps “permission denied” or “file not found,” in this case).
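#!/usr/bin/perl -w
# Copy mouse.fastq to mouse.txt line by line.
$thepath="/home/yulj/project";
$gen_file="$thepath/mouse.fastq";
$out_file="$thepath/mouse.txt";
# FASTQ is used instead of DATA, which is one of Perl's special filehandle names.
open FASTQ, "<$gen_file" or die "Cannot open $gen_file: $!";
# Open the output once, before the loop. Reopening it with > inside the
# loop would wipe the file for every line and keep only the last one.
open LOG, ">$out_file" or die "Couldn't open $out_file: $!";
while (<FASTQ>) {
print LOG $_;
}
close LOG;
close FASTQ;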
(Learning Perl: Chapter 6)
A typical use of a hash is counting: for example, you may want to know how often each word appears in a given document. In the same way, we can use a hash to count each nucleotide in a DNA/RNA sequence and compute its percentage.
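A minimal sketch of the nucleotide count might look like the following. The sequence is made up, and the hash name %count is just an illustration; each base is used as a key and its value is incremented every time the base is seen.

```perl
#!/usr/bin/perl -w
use strict;

my $sequence = "ATGCGCGATA";   # hypothetical example sequence
my %count;

# Split the sequence into single characters and tally each one.
foreach my $base (split //, $sequence) {
    $count{$base} += 1;        # creates the element on first use
}

my $length = length $sequence;
foreach my $base (sort keys %count) {
    printf "%s: %d (%.1f%%)\n",
        $base, $count{$base}, 100 * $count{$base} / $length;
}
```

For this sequence the counts are A 3, C 2, G 3, and T 2, so the percentages add up to 100.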
# This is similar to what we used for array access,
# but here we use curly braces instead of square brackets around the subscript (key).
$hash{$some_key}
$family_name{"fred"} = "flintstone";
$family_name{"barney"} = "rubble";
foreach $person (qw< barney fred >) {
print "I've heard of $person $family_name{$person}.\n";
}
# The name of the hash is like any other Perl identifier (letters, digits, and underscores,
# but can’t start with a digit).
$family_name{"wilma"} = "flintstone"; # adds a new key (and value)
$family_name{"betty"} .= $family_name{"barney"}; # creates the element if needed
To refer to the entire hash, use the percent sign (%) as a prefix. So, the hash we’ve been using for the last few pages is actually called %family_name.
For convenience, a hash may be converted into a list, and back again. Assigning to a hash is a list-context assignment, where the list is made of key-value pairs:
%some_hash = ("foo", 35, "bar", 12.4, 2.5, "hello",
"wilma", 1.72e30, "betty", "bye\n");
The value of the hash (in a list context) is a simple list of key-value pairs:
@any_array = %some_hash;
print "@any_array\n";
# might give something like this:
# betty bye (and a newline) wilma 1.72e+30 foo 35 2.5 hello bar 12.4
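The order is jumbled because Perl keeps the key-value pairs in an order that is convenient for Perl, so that it can look up any item quickly.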
%new_hash = %old_hash; # assignment
%inverse_hash = reverse %any_hash; # inverse hash
This takes %any_hash and unwinds it into a list of key-value pairs, making a list like (key, value, key, value, key, value, ...). Then reverse turns that list end-for-end, making a list like (value, key, value, key, value, key, ...).
This will work properly only if the values in the original hash were unique—otherwise, we’d have duplicate keys in the new hash, and keys are always unique. Here’s the rule that Perl uses: the last one in wins.
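As a short sketch of inverting a hash whose values are unique, the following maps one-letter amino acid codes to three-letter codes and then reverses the mapping. The hash names are made up for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# One-letter code => three-letter code (all values are unique).
my %three_letter = ("A" => "Ala", "G" => "Gly", "W" => "Trp");

# reverse turns (key, value, ...) into (value, key, ...),
# so the new hash looks up the one-letter code by three-letter code.
my %one_letter = reverse %three_letter;

print "Gly is $one_letter{'Gly'}\n";   # Gly is G
```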
When assigning a list to a hash, sometimes it’s not obvious which elements are keys and which are values.
# not clear which elements are keys and which are values
%some_hash = ("foo", 35, "bar", 12.4, 2.5, "hello",
"wilma", 1.72e30, "betty", "bye\n");
With the big arrow (=>), which Perl treats as another way to write a comma, it’s easy (or perhaps at least easier) to see whose name pairs with which value, even if we end up putting many pairs on one line.
my %last_name = ( # a hash may be a lexical variable
"fred" => "flintstone",
"dino" => undef,
"barney" => "rubble",
"betty" => "rubble",
);
# @k will contain "a", "b", and "c", and @v will contain 1, 2, and 3
# (in no particular order, but the two lists are in matching order)
my %hash = ("a" => 1, "b" => 2, "c" => 3);
my @k = keys %hash;
my @v = values %hash;
my $count = keys %hash; # gets 3, meaning three key-value pairs
The each function returns the next key-value pair of a hash as a two-element list. In practice, the only way to use each is in a while loop, something like this:
while ( ($key, $value) = each %hash ) {
print "$key => $value\n";
}
if (exists $books{"dino"}) {
print "Hey, there's a library card for dino!\n";
} # That is to say, exists $books{"dino"} will return a true value if (and only if) dino is
# found in the list of keys from keys %books.
delete $books{"dino"}; # after a delete, the key can’t exist in the hash,
# but after storing undef, the key must exist
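The difference between an undef value and a deleted key can be checked directly. This sketch assumes a small %books hash with made-up contents.

```perl
#!/usr/bin/perl -w
use strict;

my %books = ("fred" => 3, "dino" => undef);

# The key dino exists, even though its value is undef.
my $exists_before  = exists  $books{"dino"} ? 1 : 0;
my $defined_before = defined $books{"dino"} ? 1 : 0;

delete $books{"dino"};

# After the delete, the key is gone from the hash.
my $exists_after = exists $books{"dino"} ? 1 : 0;

print "$exists_before $defined_before $exists_after\n";   # 1 0 0
```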
Here is an example of a nested hash (a hash of hashes).
#! /bin/perl -w
$hash{'orange'}{'red'} = 'red orange';
$hash{'orange'}{'green'} = 'green orange';
$hash{'apple'}{'red'} = 'red apple';
$hash{'apple'}{'green'} = 'green apple';
# $kkk=\%hash;
foreach $fruit (keys %hash)
{
foreach $color (keys %{$hash{$fruit}})
{
print $hash{$fruit}{$color},"\n";
# print $kkk->{$fruit}->{$color} , "\n";
}
}
(Learning Perl: Chapter 7)
A regular expression, often called a pattern in Perl, is a template that either matches or doesn’t match a given string.
To match a pattern (regular expression) against the contents of $_, simply put the pattern between a pair of forward slashes (/), like we do here:
$_ = "yabba dabba doo";
if (/abba/) {
print "It matched!\n";
}
The expression /abba/ looks for that four-letter string in $_; if it finds it, it returns a true value.
Because the pattern match is generally being used to return a true or false value, it is almost always found in the conditional expression of if or while.
All of the usual backslash escapes that you can put into double-quoted strings are available in patterns, so you could use the pattern /coke\tsprite/ to match the 11 characters of coke, a tab, and sprite.
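The same kind of match is handy for sequence data. As a sketch, the following keeps only the reads that contain GAATTC, the recognition site of the restriction enzyme EcoRI; the reads themselves are made up.

```perl
#!/usr/bin/perl -w
use strict;

my @reads = ("TTGAATTCAA", "ACGTACGT", "GAATTCGG");   # hypothetical reads
my @hits;

foreach (@reads) {           # each read is placed in $_ in turn
    if (/GAATTC/) {          # the pattern is matched against $_
        push @hits, $_;
    }
}

print "matched: @hits\n";    # matched: TTGAATTCAA GAATTCGG
```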