The Wayback Machine - https://web.archive.org/web/20201216131441/https://www.geeksforgeeks.org/python-check-url-string/


Python | Check for URL in a String

Last Updated: 11-05-2020

Prerequisite: Pattern matching with regular expressions

In this article, we accept a string and check whether it contains any URLs. If it does, we report that URLs were found and print each URL present in the string; otherwise, we report that none were found. We solve the problem with Python's regular expressions.

Examples:

Input: string = 'My Profile: https://auth.geeksforgeeks.org/user/Chinmoy%20Lenka/articles in the portal of http://www.geeksforgeeks.org/'
Output: URLs: ['https://auth.geeksforgeeks.org/user/Chinmoy%20Lenka/articles', 'http://www.geeksforgeeks.org/']

Input: string = 'I am a blogger at https://geeksforgeeks.org'
Output: URLs: ['https://geeksforgeeks.org']


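To find the URLs in a given string, we use the findall() function from Python's re (regular expression) module. It returns all non-overlapping matches of the pattern in the string as a list; the string is scanned left to right, and matches are returned in the order found. When the pattern contains capturing groups, each list item holds the groups rather than the whole match, and with more than one group it is a tuple. That is why Find() keeps x[0], the outer group that spans the entire URL.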
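The effect of capturing groups on findall() is easiest to see with a toy pattern. The sketch below uses a made-up string and patterns, unrelated to URLs, purely to show the three possible return shapes.

```python
import re

# A made-up string and toy patterns, used only to show findall()'s return shapes.
text = "a1 b2 c3"

# No capturing groups: each list item is the whole matched substring.
print(re.findall(r"[a-z]\d", text))        # ['a1', 'b2', 'c3']

# Exactly one group: each item is the text captured by that group.
print(re.findall(r"([a-z])\d", text))      # ['a', 'b', 'c']

# Several groups: each item is a tuple with one entry per group.
# Group 1 wraps the whole match, like the outer group of the URL pattern.
print(re.findall(r"(([a-z])(\d))", text))  # [('a1', 'a', '1'), ('b2', 'b', '2'), ('c3', 'c', '3')]
```

Taking t[0] from each tuple in the last case gives back ['a1', 'b2', 'c3']. Find() applies the same idea with x[0] to its URL pattern, which has five capturing groups.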

```python
# Python code to find the URLs in an input string
# using a regular expression
import re


def Find(string):
    # URL pattern: starts with http:// or https://, a "www." prefix, or a
    # bare domain followed by "/", and continues up to a final character
    # that is not trailing punctuation.
    regex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
    # The pattern has several capturing groups, so findall() returns one
    # tuple of groups per match; x[0] is the outer group (the whole URL).
    url = re.findall(regex, string)
    return [x[0] for x in url]


# Driver Code
string = 'My Profile: https://auth.geeksforgeeks.org/user/Chinmoy%20Lenka/articles in the portal of http://www.geeksforgeeks.org/'
print("Urls: ", Find(string))
```

Output:

Urls:  ['https://auth.geeksforgeeks.org/user/Chinmoy%20Lenka/articles', 'http://www.geeksforgeeks.org/']
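Find() only returns the matches, while the introduction also describes reporting whether any URL was found. A minimal sketch of that is below, under our own assumptions: it rewrites the same pattern with re.VERBOSE so each piece can carry a comment, and it iterates with re.finditer(), reading group(1) directly instead of indexing a findall() tuple. The names URL_PATTERN and report_urls are hypothetical, and the comments are our reading of the pattern, not the author's explanation.

```python
import re

# Hypothetical name: the article's pattern rewritten with re.VERBOSE, so
# unescaped whitespace and "#" comments are ignored. The matching logic is
# unchanged; case-insensitivity is passed as re.IGNORECASE instead of (?i).
URL_PATTERN = re.compile(r"""
    \b
    (                                        # group 1: the whole URL
      (?:                                    # how the URL starts:
          https?://                          #   http:// or https://
        | www\d{0,3}[.]                      #   www. (or www1., www22., ...)
        | [a-z0-9.\-]+[.][a-z]{2,4}/         #   a bare domain followed by a slash
      )
      (?:                                    # one or more body chunks:
          [^\s()<>]+                         #   no whitespace, parentheses or <>
        | \(([^\s()<>]+|(\([^\s()<>]+\)))*\) #   a parenthesised chunk (one nested level)
      )+
      (?:                                    # the final piece:
          \(([^\s()<>]+|(\([^\s()<>]+\)))*\) #   a parenthesised chunk
        | [^\s`!()\[\]{};:'".,<>?«»“”‘’]     #   or a character that is not trailing punctuation
      )
    )
    """, re.VERBOSE | re.IGNORECASE)


def report_urls(text):
    """Print whether any URL was found in text and return the list of URLs."""
    # group(1) is the outer group, i.e. the whole URL for each match.
    urls = [m.group(1) for m in URL_PATTERN.finditer(text)]
    if urls:
        print("URLs found:", urls)
    else:
        print("No URL found")
    return urls


report_urls("I am a blogger at https://geeksforgeeks.org")
report_urls("Visit www.example.org or https://example.com/page.")
report_urls("There is no link here.")
```

Because the final character may not be punctuation such as a period or comma, a URL that ends a sentence is returned without the trailing period, as in the second call. Like the original, the pattern is a heuristic: it finds URL-shaped text but does not check that a match is a valid or reachable URL.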


Chinmoy Lenka


Improved by: rajasekharreddydonthireddy
