Web-scraping practice with BeautifulSoup in Python

Recently, I worked on a project that required some additional information which was a bit troublesome to obtain manually. So I decided to make use of one of the most famous skills for obtaining data: yes, scraping!

So here is the webpage link.

I know, the web design may be a bit unpleasant to look at, lol. But there is a lot of valuable information there. It's just a shame that there is no download button anywhere. Below is the front page of the site.


OK, now get back into business. 
I need to get the following information:

    > Kecamatan/Distrik (district name in Indonesia)
    > Kode Pos (postal code)
    > Kode Wilayah (area code based on the district name) 
    > Kota/Kabupaten (area type, either a city or county) 
    > Nama Kota/Kabupaten (city or county name)
    > Nama Provinsi (province name; in the US it is similar to a 'state')

So there are just a few columns needed. Here is the specific webpage I would like to scrape from.


If it were only one page, I'd simply copy the table, paste the values into Excel, and get rid of the columns I don't need. But the page can show at most 1,000 rows, and there are 8 pages in total. So yes, there are 'still' quite a few pages, which is why this technique will save a lot of patience.


Now, moving to Python, I will import some libraries and send a request to the website.
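A minimal sketch of this step; `fetch_soup` is just my name for the helper, and the commented-out URL is a placeholder, not the real site's address:

```python
import requests
from bs4 import BeautifulSoup

def fetch_soup(url):
    """Request a page and return it parsed as a BeautifulSoup object."""
    response = requests.get(url)
    response.raise_for_status()  # fail loudly on HTTP errors
    return BeautifulSoup(response.text, "html.parser")

# soup = fetch_soup("https://example.com/kode-pos")  # placeholder URL
```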



Take a look for a pattern, so that we can find the 'key' we will be able to iterate over. It can be almost any element. I will explain this further below.

Right-click on the table on the web page, then choose Inspect Element. In this case, I am using <tr> with a specific background colour, set via the 'bgcolor' attribute. The colour is coded '#ccffff' (specifically, <tr bgcolor="#ccffff">). That is where all the data I need lies.


I save the result in the variable trs.
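As a sketch, with a tiny inline HTML sample standing in for the site's table markup (the sample rows are made up):

```python
from bs4 import BeautifulSoup

# A made-up sample mimicking the site's table markup.
html = """
<table>
  <tr bgcolor="#ffffcc"><td>Kecamatan</td><td>Kode Pos</td></tr>
  <tr bgcolor="#ccffff"><td>Gambir</td><td>10110</td></tr>
  <tr bgcolor="#ccffff"><td>Menteng</td><td>10310</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Every data row carries bgcolor="#ccffff", so that attribute is the 'key'.
trs = soup.find_all("tr", {"bgcolor": "#ccffff"})
print(len(trs))  # 2 data rows matched in this sample
```

Note how the header row (a different bgcolor) is skipped for free, because only the data rows match the attribute filter.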

Now I want to test whether the 'key' I am using is good enough to get what I want. So for the first experiment, I will obtain the data from the first page only. I store all the data in an array, where every row is stored as a dictionary.
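Here is a sketch of that loop; the sample row and the six-column order are assumptions based on the columns listed earlier in this post:

```python
from bs4 import BeautifulSoup

# One made-up data row; the column order is an assumption.
html = """
<table><tr bgcolor="#ccffff">
  <td>Gambir</td><td>10110</td><td>31.71.01</td>
  <td>Kota</td><td>Jakarta Pusat</td><td>DKI Jakarta</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

data = []  # the array holding every row
for tr in soup.find_all("tr", {"bgcolor": "#ccffff"}):
    tds = [td.get_text(strip=True) for td in tr.find_all("td")]
    data.append({                      # each row becomes a dictionary
        "Kecamatan": tds[0],
        "Kode Pos": tds[1],
        "Kode Wilayah": tds[2],
        "Kota/Kabupaten": tds[3],
        "Nama Kota/Kabupaten": tds[4],
        "Nama Provinsi": tds[5],
    })
```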

Then I display the result as a data frame, for a proper look.
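A sketch with pandas, using a tiny stand-in for the scraped data:

```python
import pandas as pd

# 'data' stands in for the list of row dictionaries built above.
data = [
    {"Kecamatan": "Gambir", "Kode Pos": "10110"},
    {"Kecamatan": "Menteng", "Kode Pos": "10310"},
]
df = pd.DataFrame(data)  # each dict key becomes a column
print(df)
```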

Here is how it looks.

Yippee!! 

Now how can we get the data from all the pages?

Well, easy-peasy! I'll inspect the web page again to get the links to the other pages (this time, only the 2nd page through the 8th page). The links lie inside <a> tags with the specific class 'tpage' ({'class': 'tpage'}).


It will get the links just like below:


Then we can extract just the links (stored in an array).
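A sketch of extracting those links, using made-up pagination markup (the real hrefs on the site are unknown):

```python
from bs4 import BeautifulSoup

# Made-up pagination markup standing in for the real page.
html = """
<a class="tpage" href="?page=2">2</a>
<a class="tpage" href="?page=3">3</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Keep only the href of every <a class="tpage"> link, in an array.
links = [a["href"] for a in soup.find_all("a", {"class": "tpage"})]
print(links)
```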

Then we just combine the earlier first-page code with another for loop.
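Putting it together, roughly like this; the base URL is a placeholder, and if the site's pagination hrefs are relative you would need urllib.parse.urljoin to make them absolute before requesting them:

```python
import requests
from bs4 import BeautifulSoup

def parse_rows(soup):
    """Turn every <tr bgcolor="#ccffff"> on a page into a dictionary."""
    rows = []
    for tr in soup.find_all("tr", {"bgcolor": "#ccffff"}):
        tds = [td.get_text(strip=True) for td in tr.find_all("td")]
        rows.append({
            "Kecamatan": tds[0], "Kode Pos": tds[1], "Kode Wilayah": tds[2],
            "Kota/Kabupaten": tds[3], "Nama Kota/Kabupaten": tds[4],
            "Nama Provinsi": tds[5],
        })
    return rows

def scrape_all(base_url):
    """Scrape the first page, then follow every 'tpage' link (pages 2-8)."""
    soup = BeautifulSoup(requests.get(base_url).text, "html.parser")
    data = parse_rows(soup)
    for a in soup.find_all("a", {"class": "tpage"}):
        # If the hrefs are relative, urljoin(base_url, a["href"]) is needed.
        page = BeautifulSoup(requests.get(a["href"]).text, "html.parser")
        data.extend(parse_rows(page))
    return data
```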


Last but not least, export to a CSV file, and we're good to go!
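A sketch of the export, with a hypothetical output filename:

```python
import pandas as pd

# 'data' stands in for the full scraped list of row dictionaries.
data = [{"Kecamatan": "Gambir", "Kode Pos": "10110"}]
# index=False keeps the pandas row index out of the CSV.
pd.DataFrame(data).to_csv("kode_pos.csv", index=False)
```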


Happy scraping! <3 <3 
