
Learning Perl

(5th edition)

Lijia Yu


1 / 47

What is it?

I have read the Llama book many times, and this year (2014) I began helping friends learn Perl with it.

I made these slides to summarize the tips and useful scripts we learned*. We use Perl in bioinformatics, so the slides include some simple bioinformatics program examples.

* All chapters of Learning Perl are useful, but we covered only the functions, structures, and other features in common use.
2 / 47

What is it?

Chapters

  • Scalar Data

  • Lists and Arrays

  • Subroutines

  • Input and Output

  • Hashes

  • In the World of Regular Expressions

  • Matching with Regular Expressions

  • Processing Text with Regular Expressions

  • More Control Structures

  • Strings and Sorting

3 / 47

Scalar Data

(Learning Perl: Chapter 2)

4 / 47

* Numbers

  • Floating-Point Literals
    1.25
    255.00
    7.25e45 # 7.25 times 10 to the 45th power (a big number)
    -12.E-23 # another way to say that - the E may be uppercase
    
  • Integer Literals
    0
    -40
    3249089384 # also can write as 3_249_089_384
    
  • Nondecimal Integer Literals
    0377 # 377 octal, same as 255 decimal
    0xff # FF hex, also 255 decimal
    0b11111111 # also 255 decimal
    
  • Numeric Operators
    # + plus, - minus, * times, / divided
    10.2 / 0.3 # 10.2 divided by 0.3, or 34
    10 / 3 # always floating-point divide, so 3.3333333...
    
5 / 47

* Strings

  • Single-Quoted String Literals

    'fred' # those four characters: f, r, e, and d
    '' # the null string (no characters)
    'hello\n' # hello followed by backslash followed by n
    
    Note that the \n within a single-quoted string is not interpreted as a newline, but as the two characters backslash and n.

  • Double-Quoted String Literals

    "hello world\n" # hello world, and a newline
    "coke\tsprite" # coke, a tab, and sprite
    
6 / 47
  • String Operators

String values can be concatenated with the . operator. (Yes, that’s a single period.)

"hello" . "world" # same as "helloworld"
"hello" . ' ' . "world" # same as 'hello world'
'hello world' . "\n" # same as "hello world\n"

A special string operator is the string repetition operator, consisting of the single lowercase letter x. This operator takes its left operand (a string) and makes as many concatenated copies of that string as indicated by its right operand (a number).

"fred" x 3       # is "fredfredfred"
"barney" x (4+1) # is "barney" x 5, or "barneybarneybarneybarneybarney"
5 x 4            # is really "5" x 4, which is "5555"
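
To see both string operators at work, here is a minimal runnable sketch. The helper names join_words and repeat are made up for this slide; they are not from the book.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: join two words with a space, using the . operator
sub join_words {
    my ($left, $right) = @_;
    return $left . ' ' . $right;
}

# Hypothetical helper: repeat a string, using the x operator
sub repeat {
    my ($text, $times) = @_;
    return $text x $times;
}

print join_words('hello', 'world'), "\n";  # hello world
print repeat('fred', 3), "\n";             # fredfredfred
print repeat(5, 4), "\n";                  # 5555: the number 5 is used as the string "5"
print repeat('A', 10), "\n";               # AAAAAAAAAA, e.g. a poly-A tail
```

The last line shows why x is handy in bioinformatics scripts: it builds a run of identical bases without a loop.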
7 / 47

* Perl’s Built-in Warnings

Perl can be told to warn you when it sees something suspicious going on in your program. To run your program with warnings turned on, use the -w option on the command line:

$ perl -w my_program

Or, if you always want warnings, you may request them on the #! line:

#!/usr/bin/perl -w

8 / 47

* Scalar Variables

A variable is a name for a container that holds one or more values. A scalar variable holds a single scalar value.

Scalar variable names begin with a dollar sign followed by what we’ll call a Perl identifier: a letter or underscore, and then possibly more letters, or digits, or underscores. Another way to think of it is that it’s made up of alphanumerics and underscores, but can’t start with a digit. Upper- and lowercase letters are distinct: the variable $Fred is a different variable from $fred.

9 / 47
  • Scalar Assignment
$fred = 17; # give $fred the value of 17
$barney = 'hello'; # give $barney the five-character string 'hello'
$barney = $fred + 3; # give $barney the current value of $fred plus 3 (20)
$barney = $barney * 2; # $barney is now $barney multiplied by 2 (40)
  • Binary Assignment Operators

Expressions like $fred = $fred + 5 (where the same variable appears on both sides of an assignment) occur frequently enough that Perl (like C and Java) has a shorthand for the operation of altering a variable—the binary assignment operator.

$fred = $fred + 5; # without the binary assignment operator
$fred += 5; # with the binary assignment operator
  • Output with print

The print operator takes a scalar argument and puts it out without any embellishment onto standard output.

print "hello world\n"; # say hello world, followed by a newline
10 / 47
  • Interpolation of Scalar Variables into Strings

In a double-quoted string, any scalar variable name is replaced with its current value. This is called variable interpolation.

$meal = "brontosaurus steak";
$barney = "fred ate a $meal"; # $barney is now "fred ate a brontosaurus steak"
$barney = 'fred ate a ' . $meal; # another way to write that
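
One detail worth a small sketch: when a letter follows the variable name directly, braces tell Perl where the name ends. The helper describe_meal below is a hypothetical name used only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a sentence with variable interpolation
sub describe_meal {
    my ($count, $what) = @_;
    # ${what}s: the braces mark where the variable name ends,
    # so Perl does not look for a variable called $whats
    return "fred ate $count ${what}s";
}

print describe_meal(3, 'steak'), "\n";   # fred ate 3 steaks
print 'fred ate $count ${what}s', "\n";  # single quotes: printed literally, no interpolation
```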
11 / 47

Introduction

$e^{i\pi} + 1 = 0$

Test LaTeX

12 / 47

Lists and Arrays

(Learning Perl: Chapter 3)

13 / 47
14 / 47

Subroutines

(Learning Perl: Chapter 4)

15 / 47

* Defining a Subroutine

To define your own subroutine, use the keyword sub, the name of the subroutine (without the ampersand), then the indented block of code (in curly braces), which makes up the body of the subroutine.

sub marine {
  $n += 1; # Global variable $n
  print "Hello, sailor number $n!\n";
}

* Invoking a Subroutine

&marine; # or marine(); the ampersand is optional when you use parentheses
16 / 47

* Return Values

As Perl is chugging along in a subroutine, it is calculating values as part of its series of actions. Whatever calculation is last performed in a subroutine is automatically also the return value.

sub sum_of_fred_and_barney {
  print "Hey, you called the sum_of_fred_and_barney subroutine!\n";
  $fred + $barney; # That's the return value
}

# This subroutine returns the larger value of $fred or $barney
sub larger_of_fred_or_barney {
  if ($fred > $barney) {
    $fred;
  } else {
    $barney;
  }
}
17 / 47

* Arguments

Perl has subroutine arguments. To pass an argument list to the subroutine, simply place the list expression, in parentheses, after the subroutine invocation.

$n = &max(10,15); # This sub call has two parameters

sub max {
  # Compare this to &larger_of_fred_or_barney
  if ($_[0] > $_[1]) {
    $_[0];
  } else {
    $_[1];
  }
}

$n = &max(10, 15, 27); # Oops! The extra argument (27) is silently ignored
18 / 47

* Private Variables in Subroutines

By default, all variables in Perl are global variables; that is, they are accessible from every part of the program. But you can create private variables called lexical variables at any time with the my operator:

sub max {
  my($m, $n); # new, private variables for this block
  ($m, $n) = @_; # give names to the parameters
  if ($m > $n) { $m } else { $n }
} # These variables are private (or scoped) to the enclosing block; 
# any other $m or $n is totally unaffected by these two. 

# The declaration and the assignment are usually combined into one statement:
my($m, $n) = @_; # Name the subroutine parameters
19 / 47

* Notes on Lexical (my) Variables

The my operator doesn’t change the context of an assignment

my($num) = @_; # list context, same as ($num) = @_;
my $num = @_; # scalar context, same as $num = @_;

In the first one, $num gets the first parameter, as a list-context assignment; in the second, it gets the number of parameters, in a scalar context.
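
The difference is easy to check with two tiny subroutines. This is a minimal sketch; first_argument and argument_count are illustrative names, not from the book.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# List-context assignment: the parentheses make $first take the first parameter
sub first_argument {
    my ($first) = @_;
    return $first;
}

# Scalar-context assignment: without parentheses, $count gets the number of parameters
sub argument_count {
    my $count = @_;
    return $count;
}

print first_argument(10, 15, 27), "\n";  # 10
print argument_count(10, 15, 27), "\n";  # 3
```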

20 / 47

* Notes on Lexical (my) Variables

The my operator doesn’t change the context of an assignment

Without the parentheses, my only declares a single lexical variable

my $fred, $barney; # WRONG! Fails to declare $barney
my($fred, $barney); # declares both
21 / 47

* The use strict Pragma

Perl tends to be a pretty permissive language. But maybe you want Perl to impose a little discipline; that can be arranged with the use strict pragma.

A pragma is a hint to a compiler, telling it something about the code. In this case, the use strict pragma tells Perl’s internal compiler that it should enforce some good programming rules for the rest of this block or source file.

Most people recommend use strict for any program longer than a screenful of text.
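
A rough way to see what use strict catches is shown below. The helper strict_rejects is hypothetical, and the string eval is only a demonstration device so that the failure can be shown without stopping the script; in a real program you would simply put use strict at the top of the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the snippet is rejected under
# use strict, 0 if it compiles and runs.
sub strict_rejects {
    my ($code) = @_;
    my $ok = eval "use strict; $code; 1";
    return $ok ? 0 : 1;
}

print strict_rejects('$bamm_bamm = 3;'), "\n";     # 1: the variable was never declared
print strict_rejects('my $bamm_bamm = 3;'), "\n";  # 0: declared with my, so it is accepted
```

The most common payoff is that a mistyped variable name becomes a compile-time error instead of a silent new global variable.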

* The return Operator

The return operator immediately returns a value from a subroutine.
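
A typical use of return is leaving a loop as soon as the answer is known. The sketch below searches a list for a name; the subroutine and variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns the index of the first element equal to $what, or -1 if it is absent
sub which_element_is {
    my ($what, @list) = @_;
    foreach my $index (0 .. $#list) {
        if ($what eq $list[$index]) {
            return $index;  # leave the subroutine early, skipping the rest of the loop
        }
    }
    return -1;  # reached only when nothing matched
}

my @names = qw/ fred barney betty dino wilma /;
print which_element_is('dino', @names), "\n";     # 3
print which_element_is('pebbles', @names), "\n";  # -1
```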

22 / 47

* Persistent, Private Variables

Declaring our variable with state tells Perl to retain the variable’s value between calls to the subroutine and to make the variable private to the subroutine

use 5.010;
sub marine {
  state $n = 0; # private, persistent variable $n
  $n += 1;
  print "Hello, sailor number $n!\n";
}
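
To check that the value really persists, here is a small variation that returns the counter instead of printing it, so the caller can see it grow. The name next_sailor_number is made up for this sketch.

```perl
#!/usr/bin/perl
use 5.010;   # enables both state and say
use strict;
use warnings;

sub next_sailor_number {
    state $n = 0;   # initialized once, then kept between calls
    $n += 1;
    return $n;
}

# Three calls print sailor numbers 1, 2, and 3
say "Hello, sailor number ", next_sailor_number(), "!" for 1 .. 3;
```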
23 / 47

Subroutines example

A Better &max Routine

$maximum = &max(3, 5, 10, 4, 6);
sub max {
  my($max_so_far) = shift @_;
  foreach (@_) {
    if ($_ > $max_so_far) {
      $max_so_far = $_;
    }
  }
  $max_so_far;
}
24 / 47

Input and Output

(Learning Perl: Chapter 5)

25 / 47

* Input from Standard Input

while (defined($_ = <STDIN>)) {
  print "I saw $_";
}

We’re reading the input into a variable, checking that it’s defined, and if it is (meaning that we haven’t reached the end of the input) we’re running the body of the while loop.

* Output to Standard Output

print @array;  # print a list of items
print "@array";  # print a string (containing an interpolated array)

If @array holds qw/ fred barney betty /, the first one prints fredbarneybetty, while the second prints fred barney betty separated by spaces.

26 / 47

Useful script

print <>; # source code for 'cat'
print sort <>; # source code for 'sort'

* Perl makes it very easy for you to read input, from either the keyboard or a file, with the diamond operator <>. In scalar context, each call to this operator returns one line from the current input source, which can be stored in a variable for later use. In list context, as in the two scripts above, it returns all of the remaining lines at once.

27 / 47

* Formatted Output with printf

printf "%g %g %g\n", 5/2, 51/17, 51 ** 17;  #2.5 3 1.0683e+29

printf "in %d days!\n", 17.85;  #in 17 days!

printf "%6d\n", 42;  #output like ~~~~42 (the ~ symbol stands for a space)

printf "%10s\n", "wilma";  #looks like ~~~~~wilma

printf "%-15s\n", "flintstone";  #looks like flintstone~~~~~

printf "%12f\n", 6 * 7 + 2/3;  #looks like ~~~42.666667

printf "%12.3f\n", 6 * 7 + 2/3;  #looks like ~~~~~~42.667

printf "%12.0f\n", 6 * 7 + 2/3;  #looks like ~~~~~~~~~~43
28 / 47

* Filehandles

A filehandle is the name in a Perl program for an I/O connection between your Perl process and the outside world. That is, it’s the name of a connection, not necessarily the name of a file. Filehandles are named like other Perl identifiers (with letters, digits, and underscores, but they can’t start with a digit).

But there are also six special filehandle names that Perl already uses for its own purposes: STDIN, STDOUT, STDERR, DATA, ARGV, and ARGVOUT. Although you may choose any filehandle name you’d like, you shouldn’t choose one of those six unless you intend to use that one’s special properties.

$ ./your_program <dino >wilma

That command tells the shell that the program’s input should be read from the file dino, and the output should go to the file wilma.

29 / 47

* Opening a Filehandle


open CONFIG, "dino";  # open the file dino for reading
open CONFIG, "<dino"; # the same as above, with the read mode made explicit

# Open the file fred for writing.
# We’re sending the output to a new file called fred.
# If there’s already a file of that name,
# we’re asking to wipe it out and replace it with this new one.
open BEDROCK, ">fred";

open LOG, ">>logfile"; # open logfile for appending
30 / 47

* Closing a Filehandle

close BEDROCK;
  • Perl will automatically close a filehandle if you reopen it (that is, if you reuse the filehandle name in a new open) or if you exit the program.

  • In general, it’s best to close each filehandle soon after you’re done with it, though the end of the program often arrives soon enough.

31 / 47

* Fatal Errors with die

The die function prints out the message you give it (to the standard error stream, where such messages should go) and makes sure that your program exits with a nonzero exit status.


if ( ! open LOG, ">>logfile") {
  die "Cannot create logfile: $!";
}

If the open fails, die will terminate the program and tell you that it cannot create the logfile.

*Traditionally, it is 0 for success and a nonzero value for failure.

*In general, when the system refuses to do something we’ve requested (like opening a file), $! will give you a reason (perhaps “permission denied” or “file not found,” in this case).

32 / 47

I/O Example

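#!/usr/bin/perl -w

$thepath  = "/home/yulj/project";
$gen_file = "$thepath/mouse.fastq";
$out_file = "$thepath/mouse.txt";

# Use FASTQ rather than DATA: DATA is one of Perl's six special filehandle names
open FASTQ, "<$gen_file" or die "Cannot open $gen_file: $!";

# Open the output file once, before the loop. Reopening it with ">" for
# every line would wipe it out each time and keep only the last line.
open LOG, ">$out_file" or die "Cannot open $out_file: $!";

while (<FASTQ>) {
   print LOG $_;   # copy each input line to the output file
}

close FASTQ;
close LOG;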
33 / 47

Hashes

(Learning Perl: Chapter 6)

34 / 47

What Is a Hash?

A hash is a data structure, not unlike an array in that it can hold any number of values and retrieve them at will. But instead of indexing the values by number, as we did with arrays, we’ll look up the values by name. That is, the indices (here, we’ll call them keys) aren’t numbers, but instead they are arbitrary unique strings.
35 / 47

What Is a Hash?

Why Use a Hash?

A typical use is a table that maps each word to the number of times that word appears.

The idea here is that you want to know how often each word appears in a given document. In the same way, we can use a hash to count how often each nucleotide appears in a DNA/RNA sequence and work out its percentage.
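
The nucleotide count can be sketched in a few lines. The subroutine name count_bases and the ten-base sequence are made up for this example; a real script would read the sequence from a file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a hash of base => number of occurrences
sub count_bases {
    my ($seq) = @_;
    my %count;
    foreach my $base (split //, uc $seq) {  # split // yields one character at a time
        $count{$base} += 1;                 # creates the element on first sight
    }
    return %count;
}

my $dna   = 'ATGCGCGATA';   # made-up example sequence
my %count = count_bases($dna);

foreach my $base (sort keys %count) {
    printf "%s: %d (%.1f%%)\n",
        $base, $count{$base}, 100 * $count{$base} / length($dna);
}
# A: 3 (30.0%)
# C: 2 (20.0%)
# G: 3 (30.0%)
# T: 2 (20.0%)
```

The keys are sorted before printing because a hash has no fixed order of its own.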

36 / 47

* Hash Element Access

# This is similar to what we used for array access, 
# but here we use curly braces instead of square brackets around the subscript (key).
$hash{$some_key}

$family_name{"fred"} = "flintstone";
$family_name{"barney"} = "rubble";

foreach $person (qw< barney fred >) {
print "I've heard of $person $family_name{$person}.\n";
}

# The name of the hash is like any other Perl identifier (letters, digits, and underscores,
# but can’t start with a digit).

$family_name{"wilma"} = "flintstone"; # adds a new key (and value)
$family_name{"betty"} .= $family_name{"barney"}; # creates the element if needed
37 / 47

* The Hash As a Whole

To refer to the entire hash, use the percent sign (%) as a prefix. So, the hash we’ve been using for the last few pages is actually called %family_name.

For convenience, a hash may be converted into a list, and back again. Assigning to a hash is a list-context assignment, where the list is made of key-value pairs:

%some_hash = ("foo", 35, "bar", 12.4, 2.5, "hello",
              "wilma", 1.72e30, "betty", "bye\n");

The value of the hash (in a list context) is a simple list of key-value pairs:

@any_array = %some_hash;

print "@any_array\n";
# might give something like this:
# betty bye (and a newline) wilma 1.72e+30 foo 35 2.5 hello bar 12.4

The order is jumbled because Perl keeps the key-value pairs in an order that’s convenient for Perl so that it can look up any item quickly.
38 / 47

* Hash Assignment

%new_hash = %old_hash; # assignment
%inverse_hash = reverse %any_hash; # inverse hash

This takes %any_hash and unwinds it into a list of key-value pairs, making a list like (key, value, key, value, key, value, ...). Then reverse turns that list end-for-end, making a list like (value, key, value, key, value, key, ...).

This will work properly only if the values in the original hash were unique—otherwise, we’d have duplicate keys in the new hash, and keys are always unique. Here’s the rule that Perl uses: the last one in wins.
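
Both cases, unique values and duplicate values, can be tried with a small sketch. The helper name invert is an assumption made for this slide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: swap keys and values
sub invert {
    my (%original) = @_;
    my %inverse = reverse %original;
    return %inverse;
}

# Unique values: the inverse hash can be used for lookups in the other direction
my %family_name = (fred => 'flintstone', barney => 'rubble');
my %first_name  = invert(%family_name);
print "$first_name{rubble}\n";       # barney

# Duplicate values: both pairs compete for the single key 'flintstone'
my %clash = invert(fred => 'flintstone', wilma => 'flintstone');
print scalar(keys %clash), "\n";     # 1
```

Which of the two names survives in %clash depends on the order in which the pairs happen to come out of the hash, so don't rely on it.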

39 / 47

* The Big Arrow

When assigning a list to a hash, sometimes it’s not obvious which elements are keys and which are values.

# not clear which elements are keys and which are values
%some_hash = ("foo", 35, "bar", 12.4, 2.5, "hello",
              "wilma", 1.72e30, "betty", "bye\n");

With the big arrow (=>), which is just another way to write a comma, it’s easy (or perhaps at least easier) to see whose name pairs with which value, even if we end up putting many pairs on one line.

my %last_name = ( # a hash may be a lexical variable
  "fred"    =>  "flintstone",
  "dino"    =>  undef,
  "barney"  =>  "rubble",
  "betty"   =>  "rubble",
);
40 / 47

* Hash Functions

  • The keys and values Functions
# @k will contain "a", "b", and "c", and @v will contain 1, 2, and 3
# (in some order; keys and values come back in matching order)
my %hash = ("a" => 1, "b" => 2, "c" => 3);
my @k = keys %hash;
my @v = values %hash;
my $count = keys %hash; # gets 3, meaning three key-value pairs
  • The each Function

In practice, the only way to use each is in a while loop, something like this:

while ( ($key, $value) = each %hash ) {
  print "$key => $value\n";
}
  • The exists Function and delete Function
if (exists $books{"dino"}) {
  print "Hey, there's a library card for dino!\n";
} # That is to say, exists $books{"dino"} will return a true value if (and only if) dino is
# found in the list of keys from keys %books.

delete $books{"dino"}; # after a delete, the key can’t exist in the hash,
# but after storing undef, the key must exist
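
The three possible states of a key (absent, present with undef, present with a value) can be told apart as in this sketch. The helper key_status is a made-up name for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: describe the state of one key in a hash
sub key_status {
    my ($key, %hash) = @_;
    return 'no such key'                unless exists  $hash{$key};
    return 'key exists, value is undef' unless defined $hash{$key};
    return 'key exists with a value';
}

my %books = (fred => 3, dino => undef);

print key_status('fred',  %books), "\n";  # key exists with a value
print key_status('dino',  %books), "\n";  # key exists, value is undef
print key_status('wilma', %books), "\n";  # no such key

delete $books{'dino'};
print key_status('dino',  %books), "\n";  # no such key
```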
41 / 47

Hash example

Here is an example of a nested hash (a hash of hashes).

#!/usr/bin/perl -w
$hash{'orange'}{'red'} = 'red orange';
$hash{'orange'}{'green'} = 'green orange';
$hash{'apple'}{'red'} = 'red apple';
$hash{'apple'}{'green'} = 'green apple';
# $kkk=\%hash; 
foreach $fruit (keys %hash)
{
    foreach $color (keys %{$hash{$fruit}})
    {
        print $hash{$fruit}{$color},"\n";
        # print $kkk->{$fruit}->{$color} , "\n";
    }
}
42 / 47

In the World of Regular Expressions

(Learning Perl: Chapter 7)

43 / 47

* What Are Regular Expressions?

A regular expression, often called a pattern in Perl, is a template that either matches or doesn’t match a given string.

  • Don’t confuse regular expressions with shell filename-matching patterns, called globs. A typical glob is what you use when you type *.pm to the Unix shell to match all filenames that end in .pm. In the book’s grep example, the filename argument chapter*.txt is a glob, while the pattern itself had to be quoted to prevent the shell from treating it like a glob.
44 / 47

* Using Simple Patterns

To match a pattern (regular expression) against the contents of $_, simply put the pattern between a pair of forward slashes (/), like we do here:

$_ = "yabba dabba doo";
if (/abba/) {
  print "It matched!\n";
}

The expression /abba/ looks for that four-letter string in $_; if it finds it, it returns a true value.

Because the pattern match is generally being used to return a true or false value, it is almost always found in the conditional expression of if or while.

All of the usual backslash escapes that you can put into double-quoted strings are available in patterns, so you could use the pattern /coke\tsprite/ to match the 11 characters of coke, a tab, and sprite.
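
Here is a minimal sketch that uses the \t escape in a pattern and relies on foreach putting each item into $_. The subroutine name and the three test lines are invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the lines that contain coke, a tab, and sprite
sub count_matching_lines {
    my @lines = @_;
    my $hits  = 0;
    foreach (@lines) {                 # each line is placed in $_ in turn
        $hits += 1 if /coke\tsprite/;  # the pattern is matched against $_
    }
    return $hits;
}

my @lines = (
    "coke\tsprite",            # matches
    "coke sprite",             # a space, not a tab: no match
    "diet coke\tsprite zero",  # matches, the pattern may appear anywhere
);
print count_matching_lines(@lines), "\n";  # 2
```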

45 / 47

* About Metacharacters

46 / 47

Thanks for listening.

Any questions?

Please send me an email

Slideshow created using remark.

47 / 47
