Sora Innosia

Sora Innosia provides free software


24 Oct 2014 - New software named SundulKaskusDroid
Greetings users, we have made a new piece of software named SundulKaskusDroid, which can be downloaded from http://www.innosia.com/SundulKaskusDroid This software is used to bump your thread on...

Sora Innosia WebScrapper - Overview

What is the Sora Innosia WebScrapper library?

WebScrapper is a library for quickly getting data from a web page using only a single syntax! Yes, only a single syntax. Imagine a developer has to scrape a website for certain information. Without the library, the developer has to write code such as

using System.Net;

// Download the page, collect its links, then follow one of them.
WebClient wc = new WebClient();
string googlecom = wc.DownloadString("http://www.google.com");
string[] links = GetLinks(googlecom);
string LINK1 = GetLink(links[5]);   // href value extracted from item 5
string LINK2 = links[6];
string newData = wc.DownloadString(LINK1);

public string[] GetLinks(string content)
{
    // ... Parsing of links
}

// Returns the href value that follows the expected attributes,
// or an empty string when the pattern is not found.
public string GetLink(string content)
{
    string search = "onclick=gbar.logger.il(1,{t:5}); class=gbzt id=gb_5 href=\"";
    int startIndex = content.IndexOf(search);
    if (startIndex < 0) return string.Empty;
    int valueStart = startIndex + search.Length;
    int endIndex = content.IndexOf("\"", valueStart);
    if (endIndex < 0) return string.Empty;
    return content.Substring(valueStart, endIndex - valueStart);
}

That is a lot of code, and it even omits the implementation of the method that returns the list of links. With the WebScrapper library, you only need these lines

string syn = "SetResult('LINK1,LINK2', TagMatch(Filter(TagMatch(Download('http://www.google.com'), '<a', ''), '5,6'), 'onclick=gbar.logger.il(1,{t:5}); class=gbzt id=gb_5 href=\"', '\"'));Download(GetResult('LINK1'))";
WebScrapper.Scrapper scr = new WebScrapper.Scrapper();
string[] result = scr.Multiple(syn);

Simple, isn't it? It is only 3 lines. The first line looks dense, but the syntax exists purely for retrieving web data. It performs the following steps:
1. Download from http://www.google.com
2. Search for tags that match '<a' and ''
3. Filter the result and return only the items at index 5 and 6
4. Filter the result above for text that matches 'onclick=gbar.logger.il(1,{t:5}); class=gbzt id=gb_5 href="' and '"'. Item 5 matches; item 6 does not match, so it returns an empty string
5. Assign variable LINK1 the filtered item 5 and variable LINK2 the filtered item 6 (empty)
6. Download from variable LINK1
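
To make the matching and filtering steps concrete, here is a minimal sketch in plain C# of what "text between two markers" and "keep only certain indexes" could look like. The names ScrapeSketch, FirstBetween and PickIndexes are hypothetical and are not part of WebScrapper; the library's own behavior may differ, for example in how it treats an empty end marker.

```csharp
using System;
using System.Collections.Generic;

public static class ScrapeSketch
{
    // Hypothetical helper: returns the text between the first occurrence of
    // `start` and the next occurrence of `end`, or an empty string when
    // either marker is missing.
    public static string FirstBetween(string content, string start, string end)
    {
        int startIndex = content.IndexOf(start, StringComparison.Ordinal);
        if (startIndex < 0) return string.Empty;
        int valueStart = startIndex + start.Length;
        int endIndex = content.IndexOf(end, valueStart, StringComparison.Ordinal);
        if (endIndex < 0) return string.Empty;
        return content.Substring(valueStart, endIndex - valueStart);
    }

    // Hypothetical helper: keeps only the items at the given zero-based
    // positions and skips positions that fall outside the array.
    public static string[] PickIndexes(string[] items, params int[] indexes)
    {
        var picked = new List<string>();
        foreach (int i in indexes)
        {
            if (i >= 0 && i < items.Length) picked.Add(items[i]);
        }
        return picked.ToArray();
    }
}
```

An item without the expected markers yields an empty string instead of an exception, which mirrors the "item 6 does not match" case described in the steps.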

Separation of concerns here means that the compact syntax serves one single purpose: web scraping. We have no control over the content, since it belongs to another entity, so thorough research and testing are needed before we can assume that a certain search returns a certain result.

Many developers make a lot of assumptions when doing web scraping, because there is no guarantee that a website will stay as it is. The company or entity behind the site might redesign, upgrade, or maintain it in ways that change the pages. If we write a .NET assembly such as a DLL or EXE to get data based on our research, it becomes outdated as soon as the website changes. We then have to analyze the website again, update the code, recompile, and publish the new build to our users or website. It is a tedious cycle that happens often.

With WebScrapper, web parsing is done with a single syntax: one string consisting of recursively nested statements. A string can be stored in a database or a configuration file, which makes it easy to modify without recompiling any code. When the target website changes, the developer only needs to update the scraping syntax and the scraping works again!
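
As an illustration of keeping the syntax outside the compiled code, the sketch below reads the syntax string from a text file and passes it to whatever runs it. The class SyntaxStore, the method RunFromFile and the file layout are assumptions made for this example, not part of the library. The runner is passed in as a delegate so the sketch compiles without the WebScrapper assembly.

```csharp
using System;
using System.IO;

public static class SyntaxStore
{
    // Hypothetical helper: loads the scraping syntax from a text file and
    // hands it to `run`. With WebScrapper, `run` would be the Multiple
    // method of a Scrapper instance.
    public static string[] RunFromFile(string path, Func<string, string[]> run)
    {
        // Trim the surrounding whitespace that editors tend to add.
        string syntax = File.ReadAllText(path).Trim();
        if (syntax.Length == 0)
            throw new InvalidOperationException("Syntax file is empty: " + path);
        return run(syntax);
    }
}
```

With the library referenced, a call might look like SyntaxStore.RunFromFile("scrape-syntax.txt", new WebScrapper.Scrapper().Multiple), where the file name is only an example. When the target site changes, only the text file needs editing.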

Benefits of WebScrapper
1. The single syntax is a string, so it can be stored in a database or configuration file. Updating the syntax is easy.
2. No need to compile the syntax, as it is interpreted on the fly.
3. One instance of the class uses a single WebClient that maintains cookie state, so cookies stay intact when downloading multiple pages.
4. Supports regex
5. Built-in string finder
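
For readers curious how a single WebClient can keep cookies between downloads, here is one common .NET pattern, shown as a sketch. WebScrapper's internals are not described here, so this is an assumption about how such behavior can be achieved, not the library's actual code. The class name CookieAwareWebClient is hypothetical.

```csharp
using System;
using System.Net;

// A WebClient that attaches the same CookieContainer to every HTTP request,
// so cookies set by one response are sent with later requests.
public class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests carry cookies; other schemes pass through.
        if (request is HttpWebRequest http)
        {
            http.CookieContainer = Cookies;
        }
        return request;
    }
}
```

Reusing one instance for a login page and the pages behind it keeps the session, while a fresh plain WebClient for each download would not.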

And there are many more benefits to using WebScrapper instead of manually hard-coding and compiling code!
Download it now to test it!




News
31 Oct 2014 - Sora Innosia releases auto bump for Singapore Forum
Dear Users, I have released Sora Innosia - HWZAutoBump, a piece of software used to bump threads in...
16 September 2014 - Sora Innosia releases InnoVN (Free Visual Novel Builder)
Hi users, Ever wanted to create your own Visual Novel, but tools such as Ren'Py require knowledge on...
5 September 2014 - Nyaa Disruption with Encode Delay and Sword Art Online episode 10
Hi Users, I am here to announce that nyaa has been fixed. As some of you may have already known, nyaa...
21 August 2014 - Sora Innosia has continued to encode
Hi Users, We are glad to notify you that Sora Innosia has continued encoding anime, so you can find your...