Save a Wikipedia article in txt format with Python and the Wikipedia API

For focused study, it is essential to get rid of everything unnecessary, such as fancy design and other UI effects. All you need is the text, as in a textbook.

Wikipedia is one of my most visited websites these days. I love the content, but I would love it even more if it were more minimalist.

So here is the solution… I can now read a Wikipedia article in the Linux terminal! Too minimalist!

We can also save the article in txt format for offline reading. Maybe I will read it in VS Code!

Now back to the program.

We will use the Wikipedia API.

We send a request to the Wikipedia API and get JSON in response.
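
To make the shape of that JSON concrete, here is a minimal sketch of pulling the article text out of a response. The two sample dictionaries are made up for illustration (a real response carries the full article and a real page ID), and `extract_from_response` is a hypothetical helper name, not part of any library.

```python
def extract_from_response(data):
    """Return the extract of the first page in a query response, or None."""
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        # Pages that do not exist carry a "missing" key and no "extract"
        if "missing" in page:
            return None
        return page.get("extract")
    return None


# Made-up sample shaped like a prop=extracts response (page ID is illustrative)
sample_response = {
    "query": {
        "pages": {
            "12345": {
                "pageid": 12345,
                "ns": 0,
                "title": "Life",
                "extract": "Life is a quality that distinguishes ...",
            }
        }
    }
}

# Made-up sample for a title that does not exist
missing_response = {
    "query": {"pages": {"-1": {"ns": 0, "title": "No such page", "missing": ""}}}
}

print(extract_from_response(sample_response))
print(extract_from_response(missing_response))
```

The pages are keyed by page ID, which is why the code loops over the values instead of looking up a fixed key.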

Then we prepare the text with the BeautifulSoup library. That’s it!

Install the libraries first:

pip install requests beautifulsoup4

The complete code



import requests
from bs4 import BeautifulSoup
from urllib.parse import unquote, urlparse

WIKIPEDIA_API_URL = "https://en.wikipedia.org/w/api.php"

# Wikimedia asks API clients to send a descriptive User-Agent.
# This value is only a placeholder; replace it with your own details.
HEADERS = {"User-Agent": "wiki-txt-example/0.1 (you@example.com)"}


def get_wikipedia_content(title, explaintext=True):
    """Return the article extract as plain text, or as HTML if explaintext is False."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "titles": title,
    }
    if explaintext:
        # Ask the API for plain text instead of HTML
        params["explaintext"] = 1

    response = requests.get(WIKIPEDIA_API_URL, params=params, headers=HEADERS, timeout=10)
    response.raise_for_status()
    data = response.json()
    # The result is keyed by page ID, so take the first (and only) page
    page = next(iter(data["query"]["pages"].values()))
    # A missing article has no "extract" key
    return page.get("extract", "")


# Wikipedia article URL
url = "https://en.wikipedia.org/wiki/Life"
# The title is the last part of the URL path; unquote() decodes
# percent-encoded characters such as %C3%A9
title = unquote(urlparse(url).path.split("/")[-1])
print(title)

# 1. Plain text straight from the API
wiki_content = get_wikipedia_content(title)
print(wiki_content)

with open("wikipedia_content.txt", "w", encoding="utf-8") as file:
    file.write(wiki_content)
print("Plain text content saved")

# 2. HTML from the API, converted to plain text with BeautifulSoup
wiki_content_page = get_wikipedia_content(title, explaintext=False)
soup = BeautifulSoup(wiki_content_page, "html.parser")
plain_text = soup.get_text()
print(plain_text)

with open("wikipedia_content_plain_text_beautifulsoup.txt", "w", encoding="utf-8") as file:
    file.write(plain_text)
print("Plain text content saved")
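
The code above always writes to the same file name. As a small sketch of one way to go from any article URL to a title and a matching file name, the helpers below use only the standard library. Both function names and the naming scheme are hypothetical, and the code assumes the URL has the usual /wiki/<title> form.

```python
import re
from urllib.parse import unquote, urlparse


def title_from_url(url):
    """Return the article title from a /wiki/<title> style URL."""
    path = urlparse(url).path          # the path excludes ?query and #fragment
    raw = path.split("/wiki/", 1)[-1]  # keeps titles that contain "/", such as "AC/DC"
    # Decode percent-encoding and turn underscores back into spaces
    return unquote(raw).replace("_", " ")


def filename_for(title):
    """Turn a title into a simple .txt file name (illustrative naming scheme)."""
    # Replace every run of characters that is not a letter, digit, "_" or "-"
    safe = re.sub(r"[^\w\-]+", "_", title).strip("_")
    return f"{safe or 'article'}.txt"


title = title_from_url("https://en.wikipedia.org/wiki/Python_(programming_language)")
print(title)
print(filename_for(title))
```

The title returned here can be passed to the API as it is, and the file name can replace the fixed "wikipedia_content.txt" in the open() call.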






You can also get the plain text directly from the API, without BeautifulSoup: the explaintext parameter makes the API return text instead of HTML.

I have also released an app for this on the Snap Store:
https://snapcraft.io/wiki-txt

You can also try the API in the browser
https://noncommercial.purnorup.com/wiki-txt
