java - Best way to grab website content externally


is a search site whose search results are generated dynamically by JavaScript. The user enters a query, and the site displays the results without refreshing the page.

I need to capture those search results programmatically (with a Java program or a Perl/Python script).

So ideally, I could launch my program with 100 queries as user input, and the program would hit that website with every query and screen-scrape all the search results the website returns.

The obvious problem is that the site is built with JavaScript instead of ordinary HTML, so sending a URL request and parsing the resulting output is not going to work (the page source is always just a group of different JS files).

Given the above conditions, what are my options?

JavaScript makes HTTP requests much like a browser does. Once you find out what those requests are, you can try to recreate them in Perl, Python, etc. With Firefox + Firebug you can see the requests in the 'Net' panel.

Things to watch for: the user-agent string, the cookies, and the fact that the returned data is sometimes meant to be run/interpreted by JavaScript. It helps if your language of choice has a good HTTP browser class.
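As a rough illustration of recreating a browser-style request, here is a minimal Python sketch using only the standard library. The user-agent value, the example.com URLs, and the helper names `make_opener` and `build_request` are placeholders invented for this example; the real values would have to be copied from the Net panel.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical user-agent string; copy the real one from the Net panel.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:3.0) Gecko/2008 Firefox/3.0"


def make_opener():
    """Return an opener that keeps cookies between requests, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replace the default Python user-agent with a browser-like one.
    opener.addheaders = [("User-Agent", USER_AGENT)]
    return opener, jar


def build_request(url, params=None, referer=None):
    """Build (but do not send) a request; it becomes a POST when params are given."""
    data = urllib.parse.urlencode(params).encode("utf-8") if params else None
    request = urllib.request.Request(url, data=data)
    if referer:
        request.add_header("Referer", referer)
    return request
```

To actually send something, one would call `opener.open(build_request(...))` and read the response. If the body that comes back is JavaScript rather than HTML or JSON, it still has to be parsed or interpreted separately.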


Just had a look: searched for IBM, took the POST data from Firebug, replaced the newlines with &, and put it after the request URL:

  http://bcode.bloomberg.com/sym/dwr/call/plaincall/searchMgr.search.dwr?callCount=1&windowName=&c0-scriptName=searchMgr&c0-methodName=search&c0-id=0&c0-e1=string:IBM&c0-e2=string:&c0-e3=number:100&c0-e4=number:0&c0-e5=boolean:false&c0-param0=Object_SearchCriteria:{search:reference:c0-e1,%20filter:reference:c0-e2,%20limit:reference:c0-e3,%20start:reference:c0-e4,%20allSources:reference:c0-e5}&batchId=4&page=%2Fsym%2F&httpSessionId=&scriptSessionId=FBC68693A4E1BC08D6E0DDFBDF6D0860

but it returns

  throw 'allowScriptTagRemoting is false.';
  //#DWR-REPLY
  if (window.dwr) dwr.engine.remote.handleBatchException({name: 'java.lang.SecurityException', message: 'this is non-permitted'});
  else if (window.parent.dwr) window.parent.dwr.engine.remote.handleBatchException({name: 'java.lang.SecurityException', message: 'this is non-permitted'});

and no data. It seems you would have to script a POST request. Considering the restrictions and guidelines, you could also just get in contact and ask whether there is a public API.
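Scripting the POST could look roughly like the Python sketch below. It is a sketch under several assumptions: the endpoint is the URL above without its query string, the field names are the ones from that request written in the casing DWR normally uses, the body is sent as newline-separated `key=value` lines (as Firebug showed it), and the query is percent-encoded. The helper names `build_dwr_body` and `build_dwr_request` are made up for this example, and the session IDs would have to come from a real session, so the server may still reject the call.

```python
import urllib.parse
import urllib.request

# Assumed endpoint: the request URL shown above, minus the query string.
ENDPOINT = "http://bcode.bloomberg.com/sym/dwr/call/plaincall/searchMgr.search.dwr"


def build_dwr_body(query, limit=100, start=0, script_session_id=""):
    """Return the newline-separated POST body for one search query."""
    criteria = (
        "Object_SearchCriteria:{search:reference:c0-e1, filter:reference:c0-e2, "
        "limit:reference:c0-e3, start:reference:c0-e4, allSources:reference:c0-e5}"
    )
    fields = [
        ("callCount", "1"),
        ("windowName", ""),
        ("c0-scriptName", "searchMgr"),
        ("c0-methodName", "search"),
        ("c0-id", "0"),
        # Assumption: string values are percent-encoded in the body.
        ("c0-e1", "string:" + urllib.parse.quote(query, safe="")),
        ("c0-e2", "string:"),
        ("c0-e3", "number:%d" % limit),
        ("c0-e4", "number:%d" % start),
        ("c0-e5", "boolean:false"),
        ("c0-param0", criteria),
        ("batchId", "4"),
        ("page", "/sym/"),
        ("httpSessionId", ""),
        ("scriptSessionId", script_session_id),
    ]
    return "".join("%s=%s\n" % (key, value) for key, value in fields)


def build_dwr_request(query, **kwargs):
    """Build (but do not send) the POST request for one query."""
    body = build_dwr_body(query, **kwargs).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT, data=body, headers={"Content-Type": "text/plain"}
    )


def build_all(queries):
    """Build one request per query, e.g. for a batch of 100 user queries."""
    return [build_dwr_request(query) for query in queries]
```

Each request from `build_all(queries)` could then be passed to `urllib.request.urlopen`, ideally with a pause between calls. The reply would be a JavaScript snippet like the one above, which would still need parsing.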

