Lesson 5, Bit 4: Searching and Replacing
Next, we look at methods that help us search a string for specific characters or substrings, and even replace them with something different.
The find Method
The find method searches for the position of one string
within another. It returns the index value which represents the beginning
of the found string.
| Code | Output |
|---|---|
| fruit = 'banana' | 1 |
In this example, we invoke find on fruit and pass the letter we are looking for as an argument. The result is 1 because index values start at 0.
The find method can find substrings as well as characters:
| Code | Output |
|---|---|
| fruit = 'banana' | 2 |
It can take as a second argument the index where the search should
start. In this example, we start at index 3
(the 4th character).
| Code | Output |
|---|---|
| fruit = 'banana' | 4 |
If the substring is not found, then it will return a value of
-1.
| Code | Output |
|---|---|
| fruit = 'banana' | -1 |
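The four cases above can be sketched in one short script. The search strings `'na'` and `'z'` are illustrative choices that reproduce the outputs shown in the tables; the original calls may have used different arguments.

```python
fruit = 'banana'

# Index of the first 'a'. Index values start at 0, so the answer is 1.
first_a = fruit.find('a')
print(first_a)    # 1

# find works on substrings too; it returns where the match begins.
first_na = fruit.find('na')
print(first_na)   # 2

# Begin searching at index 3, skipping the earlier match at index 2.
later_na = fruit.find('na', 3)
print(later_na)   # 4

# A substring that never occurs gives -1 instead of raising an error.
missing = fruit.find('z')
print(missing)    # -1
```

Because a failed search returns -1 rather than raising an error, it is worth checking for -1 before using the result as an index.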
The find method also has an optional stop parameter. With
stop, you can indicate where you wish to stop comparing the
string. The numbers used for start and stop are
the index values in the string. You cannot use stop without
start, but you can use start by itself. When you use
start and stop, it means "from and
including the start index up to, but
excluding, the stop index".
| Code | Output | Notes |
|---|---|---|
| word = 'banana' | 2 | We are once again starting at the 2nd index, but we're stopping at the 5th index. This means we are only searching the substring |
| word = 'banana' | -1 | Now we are only looking at the 4th index which is 'n'. Obviously, 'na' is not found in 'n' so a -1 is returned. |
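A minimal sketch of the start and stop behavior, assuming the search string is `'na'` in both rows (the second row's note mentions `'na'`; the first is inferred from the output). The `window` variable is only there to show which part of the string is being searched.

```python
word = 'banana'

# The slice word[2:5] shows the region find will examine.
window = word[2:5]
print(window)   # nan

# Search only indexes 2, 3 and 4. The returned index refers to the
# full string, not to the position inside the window.
hit = word.find('na', 2, 5)
print(hit)      # 2

# With start=4 and stop=5 only the single character 'n' is examined,
# so the two-character string 'na' cannot match.
miss = word.find('na', 4, 5)
print(miss)     # -1
```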
The replace Method
The replace method returns a copy of the string
with occurrences of the old substring replaced by the new one. By default it
replaces all occurrences. The original string is not changed.
| Code | Result | Notes |
|---|---|---|
| fruit = 'banana' | binini | We replaced all instances of the letter 'a' in 'banana' with the letter 'i'. Notice that we did not overwrite the original |
If the optional argument count is given, only the first
count occurrences are replaced.
| Code | Result | Notes |
|---|---|---|
| fruit = 'banana' | binina | We set |
As with the find method, the replace method can
also find and replace substrings as well as characters.
| Code | Output |
|---|---|
| fruit = 'banana' | b123n123na |
| print(fruit.replace('a', '\nhi\n', 2)) | b, hi, n, hi, na (each on its own line) |
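The replace behavior described above can be tried with the following sketch. The calls are reconstructed from the outputs in the tables, so treat the exact arguments (for example `'123'` with a count of 2) as assumptions.

```python
fruit = 'banana'

# Replace every 'a' with 'i'.
all_replaced = fruit.replace('a', 'i')
print(all_replaced)   # binini

# With count=2, only the first two occurrences are replaced.
first_two = fruit.replace('a', 'i', 2)
print(first_two)      # binina

# The replacement may be longer than the text it replaces.
digits = fruit.replace('a', '123', 2)
print(digits)         # b123n123na

# Strings are immutable: replace returned new strings each time,
# and fruit itself is unchanged.
print(fruit)          # banana
```

To keep the result, assign it to a variable (or back to `fruit`), since the method never modifies the string in place.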
The startswith Method
While this one does not begin with the word "is" it still returns a Boolean value.
The startswith method returns True if the string
starts with the specified prefix, False otherwise.
| Code | Result | Notes |
|---|---|---|
| line = 'Please have a nice day' | True | The string begins with 'Please', therefore |
| line = 'Please have a nice day' | False | The string begins with a capital P, not a lowercase 'p' – therefore |
You will note that startswith requires the case to match!
The startswith method has two optional arguments:
start and stop. With start, you can
indicate where in the string you wish to start searching. With
stop, you can indicate where you wish to stop comparing the
string. The numbers used for start and stop are
the index values in the string. You cannot use stop without
start, but you can use start by itself.
When you use start and stop, it means "from and
including the start index to, and
excluding, the stop index".
| Code | Result | Notes |
|---|---|---|
| line = 'Please have a nice day' | True | The |
| line = 'Please have a nice day' | False | We are once again starting at the 7th index, but we're stopping at the 9th index. This means we are only searching the substring |
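Here is a small sketch of startswith with and without the optional arguments. The prefix `'have'` is an assumption chosen to match the indexes mentioned in the notes (7 and 9); the original calls are not shown in the tables.

```python
line = 'Please have a nice day'

# The comparison is case-sensitive.
capital = line.startswith('Please')
print(capital)      # True
lowercase = line.startswith('p')
print(lowercase)    # False

# With start=7 the comparison begins at the 'h' of 'have'.
from_seven = line.startswith('have', 7)
print(from_seven)   # True

# With start=7 and stop=9 only 'ha' is examined, which is too
# short to begin with 'have'.
seven_to_nine = line.startswith('have', 7, 9)
print(seven_to_nine)  # False
```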
The endswith Method
The endswith method returns True if the string ends
with the specified suffix, False otherwise.
| Code | Result | Notes |
|---|---|---|
| file = 'Lesson 4 Lab.py' | True | The string ends with |
| file = 'Lesson 4 Lab.py' | False | The string ends with a lowercase |
You can also use the optional start and stop
parameters, just like with startswith, to narrow your full string
down to a substring:
| Code | Result | Notes |
|---|---|---|
| file = 'Lesson 4 Lab.py' | True | The |
| file = 'Lesson 4 Lab.py' | False | We are once again starting at the 9th index, but we're stopping at the 12th index. This means we are only searching the substring |
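The endswith cases can be sketched as follows. The suffixes `'.py'`, `'.PY'` and `'Lab'` are illustrative assumptions; only the indexes 9 and 12 come from the notes above.

```python
file = 'Lesson 4 Lab.py'

# Like startswith, the comparison is case-sensitive.
lower_ext = file.endswith('.py')
print(lower_ext)    # True
upper_ext = file.endswith('.PY')
print(upper_ext)    # False

# Starting at index 9 leaves 'Lab.py', which still ends with '.py'.
from_nine = file.endswith('.py', 9)
print(from_nine)    # True

# Indexes 9 up to (but excluding) 12 cover only 'Lab'.
ext_in_window = file.endswith('.py', 9, 12)
print(ext_in_window)   # False
lab_in_window = file.endswith('Lab', 9, 12)
print(lab_in_window)   # True
```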
Chaining Multiple Methods
There are times when you need to call multiple methods on a single string. You can do that line by line:
| Line | Code | Result |
|---|---|---|
| 1 | line = 'Please have a nice day' | Please have a nice day |
| 2 | line = line.lower() | please have a nice day |
| 3 | line.startswith('p') | True |
In the last example, the method lower is called and then we use
startswith to see if the resulting lowercase string starts with the
letter "p".
As long as we are careful with the order, we can make multiple method calls in a single expression (look closely at line 2 below).
| Line | Code | Result |
|---|---|---|
| 1 | line = 'Please have a nice day' | Please have a nice day |
| 2 | line.lower().startswith('p') | True |
Line 2 in the above example combined Lines 2 and 3 in the previous example into a single statement. This is called chaining.
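The sketch below compares the step-by-step version with the chained one, and shows why the order matters. The variable names `lowered`, `step_by_step`, `chained` and `wrong_order_failed` are introduced here for illustration only.

```python
line = 'Please have a nice day'

# Step by step: keep the lowercase copy, then test it.
lowered = line.lower()
step_by_step = lowered.startswith('p')
print(step_by_step)   # True

# Chained: lower() returns a string, and startswith is called on it.
chained = line.lower().startswith('p')
print(chained)        # True

# Reversing the order fails: startswith returns a Boolean, and
# Booleans have no lower method.
try:
    line.startswith('P').lower()
    wrong_order_failed = False
except AttributeError:
    wrong_order_failed = True
print(wrong_order_failed)   # True
```

Each method in a chain is applied to whatever the previous call returned, so every step except the last must return a string if the next method is a string method.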
Example: Parsing Strings using find
Often, we want to look into a string and find a substring. For example if we were presented a series of lines formatted as follows:
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
and we wanted to pull out only the second half of the address (i.e.,
uct.ac.za) from each line, we can do this by using the find method
and string slicing.
First, we will find the position of the at-sign in the string. Then we will find the position of the first space after the at-sign. And then we will use string slicing to extract the portion of the string which we are looking for. Start with this line of code:
data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
Now, let's try to only get the domain of the email address:
| Code | Output | Notes |
|---|---|---|
| start_pos = data.find('@') | | First we use the find method to get the index value of the @ sign. We assign this index value to a variable called start_pos (for start position). |
| print(start_pos) | 21 | When we display the value of |
| end_pos = data.find(' ', start_pos) | | Now we want to find where the domain ends. The first character after the domain is a space, so we can search for that. But there is also a space before the email address, so we use the optional second argument, start, to pass a starting position. In this case, we want to start where the "@" was found. |
| print(end_pos) | 31 | When we display the value of |
| domain = data[start_pos+1:end_pos] | | Now we can slice our string from the starting index, start_pos, + 1 (since we don't actually need the @ sign) up to, but excluding, the ending index of end_pos. |
| print(domain) | uct.ac.za | When we display the value of |
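The same steps can be wrapped in a small function so they are easy to apply to many lines. This is only a sketch: `extract_domain` is a hypothetical helper name, and the handling of lines without an at-sign or without a trailing space is an added assumption, not part of the lesson.

```python
def extract_domain(line):
    """Return the text between the '@' and the next space, or None."""
    start_pos = line.find('@')
    if start_pos == -1:
        # No at-sign, so there is no domain to extract.
        return None
    end_pos = line.find(' ', start_pos)
    if end_pos == -1:
        # No space after the address: take the rest of the line.
        end_pos = len(line)
    # Skip the '@' itself; the stop index is excluded by slicing.
    return line[start_pos + 1:end_pos]


data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
domain = extract_domain(data)
print(domain)   # uct.ac.za
```

Checking for -1 matters here: without it, a line with no at-sign would be sliced starting at index 0 and return the wrong text rather than signal a problem.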
Splitting Strings
Finally, we can separate a single string into multiple strings using the
split method. The split method returns a list of the words or
substrings in the string, using sep as the delimiter string.
If sep is not specified or is None, any whitespace string is a
separator and empty strings are removed from the result.
| Code | Result | Notes |
|---|---|---|
| file = 'Lesson 5 Lab.py' | ['Lesson', '5', 'Lab.py'] | We did not specify |
| file = 'Lesson 5 Lab.py' | ['Lesson 5 Lab', 'py'] | This time we specified Lesson 5 Lab |
A second argument, maxsplit, allows us to only split the string
a certain number of times. Let's say we only wanted to pull the first word
out. We would set maxsplit equal to 1 so only one split
occurs. Note that if you pass maxsplit by position, you must
also pass sep before it, even if you want its default behavior
(use None for that). Alternatively, you can pass maxsplit by name and omit sep.
| Code | Result | Notes |
|---|---|---|
| file = 'Lesson 5 Lab.py' | ['Lesson', '5 Lab.py'] | We only want to split this string once, so we specify Lesson |
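The split examples above can be sketched as follows. The separator arguments (`'.'` and `' '`) are inferred from the results in the tables, so treat them as assumptions about the original calls.

```python
file = 'Lesson 5 Lab.py'

# No separator: split on any run of whitespace.
words = file.split()
print(words)        # ['Lesson', '5', 'Lab.py']

# Explicit separator: split on the period instead.
name_and_ext = file.split('.')
print(name_and_ext) # ['Lesson 5 Lab', 'py']

# Split only once, passing sep and maxsplit by position.
first_word = file.split(' ', 1)
print(first_word)   # ['Lesson', '5 Lab.py']

# The same split with maxsplit passed by name and sep left at its default.
first_word_kw = file.split(maxsplit=1)
print(first_word_kw)  # ['Lesson', '5 Lab.py']
```

The separator itself never appears in the resulting list, and the original string is left unchanged.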
