Lesson 5, Bit 4: Searching and Replacing

Now we turn to some methods that help us search for specific characters or substrings, and even replace them with something different.

The find Method

The find method searches for the position of one string within another. It returns the index at which the first occurrence of the found string begins.

Code Output
fruit = 'banana'
index = fruit.find('a')
print(index)
1

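In this example, we invoke find on fruit and pass the letter we are looking for as an argument. Because find reports only the first occurrence, it returns 1 even though 'a' also appears at indices 3 and 5.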

The find method can find substrings as well as characters:

Code Output
fruit = 'banana'
index = fruit.find('na')
print(index)
2

It can take as a second argument the index where the search should start. In this example, we start at the 3rd index (the 4th character).

Code Output
fruit = 'banana'
index = fruit.find('na', 3)
print(index)
4

If the substring is not found, then it will return a value of -1.

Code Output
fruit = 'banana'
index = fruit.find('apple')
print(index)
-1
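The start argument and the -1 return value can be combined to locate every occurrence rather than only the first. The following is a minimal sketch, assuming you are comfortable with a while loop and a list; the names target and positions are illustrative and not part of the lesson.

```python
# Collect every index at which a substring occurs, using find's start argument.
fruit = 'banana'
target = 'a'

positions = []
index = fruit.find(target)           # first match, or -1 if there is none
while index != -1:
    positions.append(index)
    # Resume the search one position past the previous match.
    index = fruit.find(target, index + 1)

print(positions)  # [1, 3, 5]
```

The loop ends as soon as find returns -1, which is why that value is worth checking before using the result as an index.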

The find method also has an optional stop parameter. With stop, you can indicate where you wish to stop searching the string. The numbers used for start and stop are index values in the string. You cannot use stop without start, but you can use start by itself. When you use start and stop, it means "from and including the start index to, and excluding, the stop index".

Code Output Notes
word = 'banana'
index = word.find('na', 2, 5)
print(index)
2

This time we start at the 2nd index and stop at the 5th index. This means we are only searching the substring 'nan' (since the 5th index is excluded). 'nan' does contain 'na', therefore find returns 2, the index of the match as it relates to the original string.

word = 'banana'
index = word.find('na', 4, 5)
print(index)
-1

Now we are only looking at the 4th index, which is 'n'. Since 'na' is not found in 'n', a -1 is returned.
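To see why the result is reported relative to the original string, it can help to compare a range-limited find with a find on a slice. This is a small sketch; the variable names in_range and in_slice are only for illustration.

```python
word = 'banana'

# Searching a range of the original string: the result is an index into word.
in_range = word.find('na', 2, 5)

# Searching a slice: the result is an index into the new string 'nan'.
in_slice = word[2:5].find('na')

print(word[2:5])  # nan
print(in_range)   # 2
print(in_slice)   # 0
```

Both calls look at the same characters, but only the first one gives a position you can use directly with the original string.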

The replace Method

The replace method returns a copy of the string with occurrences of the substring old replaced by new. By default, it replaces all occurrences.

Code Result Notes
fruit = 'banana'

print(fruit.replace('a', 'i'))

print(fruit)
binini
banana

We replaced all instances of the letter 'a' in 'banana' with the letter 'i'.  Notice that we did not overwrite the original fruit variable.
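If you do want to keep the changed text, you need to store what replace returns. The sketch below shows both options: saving the result under a new name (changed is a name chosen here for illustration) or assigning it back to the same variable.

```python
fruit = 'banana'

# Option 1: keep the original and store the copy under a new name.
changed = fruit.replace('a', 'i')
print(fruit)    # banana
print(changed)  # binini

# Option 2: assign the result back, so the variable now refers to the copy.
fruit = fruit.replace('a', 'i')
print(fruit)    # binini
```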

If the optional argument count is given, only the first count occurrences are replaced.

Code Result Notes
fruit = 'banana'

print(fruit.replace('a', 'i', 2))
binina

We set count to 2, so only the first two instances of the letter 'a' were replaced with the letter 'i'.

As with the find method, the replace method works with substrings as well as single characters. In the examples below, the replacement text is longer than the character it replaces.

Code Output
fruit = 'banana'

print(fruit.replace('a', '123', 2))
b123n123na
print(fruit.replace('a', '\nhi\n', 2))
b
hi
n
hi
na
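The text being searched for can also be longer than one character. Here is a short sketch with a two-character old value, including the case where new is an empty string, which simply removes each match. The variable names are illustrative.

```python
fruit = 'banana'

# Replace every occurrence of the substring 'na'.
all_upper = fruit.replace('na', 'NA')
print(all_upper)    # baNANA

# Replace only the first occurrence by passing count.
first_upper = fruit.replace('na', 'NA', 1)
print(first_upper)  # baNAna

# Replacing with an empty string removes each match.
removed = fruit.replace('an', '')
print(removed)      # ba
```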

The startswith Method

While the name of this method does not begin with the word "is", it still returns a Boolean value.

The startswith method returns True if the string starts with the specified prefix, False otherwise.

Code Result Notes
line = 'Please have a nice day'
line.startswith('Please')
True

The string begins with 'Please', therefore startswith returns as True.

line = 'Please have a nice day'
line.startswith('p')
False

The string begins with a capital P, not a lowercase 'p' – therefore startswith returns as False.

You will note that startswith requires the case to match!

The startswith method has two optional arguments: start and stop. With start, you can indicate where in the string you wish to start searching. With stop, you can indicate where you wish to stop comparing the string. The numbers used for start and stop are index values in the string. You cannot use stop without start, but you can use start by itself.

When you use start and stop, it means "from and including the start index to, and excluding, the stop index".

Code Result Notes
line = 'Please have a nice day'
line.startswith('have', 7)
True

The start parameter is set to 7, so we start looking at the 7th index of the string. The stop parameter is not set, so the default is to look until the end of the string. This means we are checking whether the substring 'have a nice day' starts with the string 'have'. It does, therefore startswith returns as True.

line = 'Please have a nice day'
line.startswith('have', 7, 9)
False

We are once again starting at the 7th index, but we're stopping at the 9th index. This means we are only searching the substring 'ha' (since we are excluding the 9th index).  'ha' does not begin with 'have' therefore startswith returns as False.
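One way to check your reasoning about start and stop is to slice the string with the same numbers and look at what is left. The sketch below does that for the example above; the variable names are illustrative.

```python
line = 'Please have a nice day'

# The characters that startswith actually examines for start=7, stop=9.
print(line[7:9])  # ha

with_range = line.startswith('have', 7, 9)
with_slice = line[7:9].startswith('have')
print(with_range, with_slice)  # False False

# Widening stop to 11 makes the examined text 'have', so the test passes.
widened = line.startswith('have', 7, 11)
print(widened)  # True
```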

The endswith Method

The endswith method returns True if the string ends with the specified suffix, False otherwise.

Code Result Notes
file = 'Lesson 4 Lab.py'
file.endswith('.py')
True

The string ends with '.py', therefore endswith returns as True.

file = 'Lesson 4 Lab.py'
file.endswith('.PY')
False

The string ends with a lowercase '.py', not an upper-case '.PY' – therefore endswith returns as False.

You can also use the optional start and stop parameters, just like with startswith, to narrow your full string down to a substring:

Code Result Notes
file = 'Lesson 4 Lab.py'
file.endswith('.py', 9)
True

The start parameter is set to 9, so we start looking at the 9th index of the string. The stop parameter is not set, so the default is to look until the end of the string. This means we are checking whether the substring 'Lab.py' ends with the string '.py'. It does, therefore endswith returns as True.

file = 'Lesson 4 Lab.py'
file.endswith('.py', 9, 12)
False

We are once again starting at the 9th index, but we're stopping at the 12th index. This means we are only searching the substring 'Lab' (since we are excluding the 12th index).  'Lab' does not end with '.py' therefore endswith returns as False.
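A common use of endswith is picking out files of one type. The following is a rough sketch, assuming a small hand-written list of file names (the extra names are made up for illustration) and a simple for loop.

```python
# Hypothetical file names; only the first comes from the lesson.
file_names = ['Lesson 4 Lab.py', 'notes.txt', 'Lesson 5 Lab.PY']

python_files = []
for name in file_names:
    # endswith is case-sensitive, so '.PY' does not count as '.py'.
    if name.endswith('.py'):
        python_files.append(name)

print(python_files)  # ['Lesson 4 Lab.py']
```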


Chaining Multiple Methods

There are times when you need to call multiple methods on a single string.  You can do that line by line:

Line Code Result
1 line = 'Please have a nice day' Please have a nice day
2 line = line.lower() please have a nice day
3 line.startswith('p') True

In the last example, the method lower is called and its result is assigned back to line (lower returns a new string rather than changing the original). We then use startswith to see if the resulting lowercase string starts with the letter "p".

As long as we are careful with the order, we can make multiple method calls in a single expression (look closely at line 2 below).

Line Code Result
1 line = 'Please have a nice day' Please have a nice day
2 line.lower().startswith('p') True

Line 2 in the above example combined Lines 2 and 3 in the previous example into a single statement.  This is called chaining.
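It is worth confirming that a chained expression leaves the original string alone. In this short sketch, starts_with_p is a name chosen for illustration.

```python
line = 'Please have a nice day'

# lower() produces a new lowercase string, and startswith() is called on it.
starts_with_p = line.lower().startswith('p')
print(starts_with_p)         # True

# The original string was never changed, so the same test on it fails.
print(line)                  # Please have a nice day
print(line.startswith('p'))  # False
```

The order matters because each method is called on whatever the previous one returned. Writing line.startswith('p').lower() would fail, since startswith returns a Boolean and a Boolean has no lower method.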

Example: Parsing Strings using find

Often, we want to look into a string and find a substring. For example, suppose we were presented with a series of lines formatted as follows:

From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008

and we wanted to pull out only the second half of the address (i.e., uct.ac.za) from each line, we can do this by using the find method and string slicing.

First, we will find the position of the at-sign in the string. Then we will find the position of the first space after the at-sign. And then we will use string slicing to extract the portion of the string which we are looking for.  Start with this line of code:

data = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

Now, let's try to only get the domain of the email address:

Code Output Notes
start_pos = data.find('@')

First we use the find method to get the index value of the @ sign. We assign this index value to a variable called start_pos (for start position).

print(start_pos)
21

When we display the value of start_pos, we get 21. Since indices begin with the number 0, this means that the @ sign is the 21st index or 22nd character of the string.

end_pos = data.find(' ', start_pos)

Now we want to find where the domain ends. The first character after the domain is a space, so we can search for that. But there is also a space before the email address, so a plain search would find that one first. We use the second, optional argument, start, to pass a starting position. In this case, we want to start where the "@" was found.

print(end_pos)
31

When we display the value of end_pos, we get 31. Since indices begin with the number 0, this means that the first space after the @ sign is the 31st index or 32nd character of the string.

domain = data[start_pos+1:end_pos]

Now we can slice our string from the starting index, start_pos, + 1 (since we don't actually need the @ sign) up to, but excluding, the ending index of end_pos.

print(domain)
uct.ac.za

When we display the value of domain, we get uct.ac.za as expected.
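Since the same steps would be repeated for each line, they can be collected into a function. This is one possible sketch, not the only way to do it: the function name extract_domain and the choice to return an empty string when there is no @ sign are assumptions made for this example.

```python
def extract_domain(line):
    """Return the text between '@' and the next space, or '' if there is no '@'."""
    start_pos = line.find('@')
    if start_pos == -1:
        return ''                      # no at-sign, so there is nothing to extract
    end_pos = line.find(' ', start_pos)
    if end_pos == -1:
        end_pos = len(line)            # no trailing space: take the rest of the line
    return line[start_pos + 1:end_pos]


data = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'
print(extract_domain(data))  # uct.ac.za
```

Checking for -1 matters here: without it, a line with no @ sign would be sliced starting from index 0 and return the wrong text.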

Splitting Strings

Finally, we can separate a single string into multiple strings using the split method.  The split method returns a list of the words or substrings in the string, using sep as the delimiter string.  If sep is not specified or is None, any whitespace string is a separator and empty strings are removed from the result.

Code Result Notes
file = 'Lesson 5 Lab.py'
words = file.split()
print(words)
['Lesson', '5', 'Lab.py']

We did not specify sep, so it defaulted to splitting the string on whitespace. We now have a list with these items:

Lesson
5
Lab.py

file = 'Lesson 5 Lab.py'
words = file.split('.')
print(words)
['Lesson 5 Lab', 'py']

This time we specified sep. We chose to split this string using the period as the delimiter. We now have a list with these items:

Lesson 5 Lab
py

A second argument, maxsplit, allows us to split the string only a certain number of times. Let's say we only wanted to pull the first word out. We would set maxsplit equal to 1 so only one split occurs. Note that if you pass maxsplit by position, you must also pass sep before it. Alternatively, you can pass it by name, as in file.split(maxsplit=1), which keeps the default whitespace separator.

Code Result Notes
file = 'Lesson 5 Lab.py'
words = file.split(' ', 1)
print(words)
['Lesson', '5 Lab.py']

We only want to split this string once, so we set maxsplit to 1. Because we pass maxsplit by position, we also have to supply the first argument, sep. We now have a list with these items:

Lesson
5 Lab.py
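The difference between leaving sep out and passing a single space shows up when a string contains two spaces in a row, as the mail line from the earlier example does between "Jan" and "5". A brief sketch, with variable names chosen for illustration:

```python
data = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

# No sep: any run of whitespace counts as one separator.
default_words = data.split()
print(default_words)
# ['From', 'stephen.marquard@uct.ac.za', 'Sat', 'Jan', '5', '09:14:16', '2008']

# sep=' ': every single space is a separator, so the double space
# produces an empty string in the result.
space_words = data.split(' ')
print(space_words)
# ['From', 'stephen.marquard@uct.ac.za', 'Sat', 'Jan', '', '5', '09:14:16', '2008']
```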
