regex - Is it possible to parse this nightmare using Perl?


I am working with a .doc file. When its text is copied and pasted into a text file, it gives the following sample 'output':

    ARTA 215 Advanced Life Drawing (3 Cr) (2:2) + Studio 1 Hour.
    This advanced study in drawing with life .... Prerequisite: ARTA 150 Lab Fee Required
    ARTA 220 Ceramics II (3 Cr) (2:2) + Studio 1 Hour.
    This course gives the student the opportunity to ... Lab Fee Required
    ARTA 250 Special Topics in Art
    This course focuses on selected topics ....
    ARTA 260 Portfolio Development (3 Cr) (3:0)
    The purpose of this course is to pre ...
    BIOS 010 Introduction to Biological Concepts (3 IC) (2:2)
    This course is an introductory course designed to ...
    BIOS 101 General Biology (4 Cr) (3:3)
    This course introduces the student to the principles of mo ... Lab Fee Required
    BIOS 102 Introduction to Human Biology (4 Cr) (3:3)
    This course is an intro .... Lab Fee Required

I want to be able to parse it so that 3 fields are generated and I can output the values to a .csv file.

Line breaks, spacing, etc. could occur at any point in this file.

My best guess for a regex is to look for 4 capitalized alpha characters followed by 3 digits, then check whether the next 2 characters are capitalized. (This accounts for the course number, but rules out tripping on places such as the first entry, where it says "Prerequisite: ARTA 150".) After this, the regex finds the first line break and then takes everything until it reaches the next course number. The 3 fields will be a course number, a course title and a course description. The course number and title are always on the same line, and the description is everything below.
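The heuristic described above can be sketched roughly as follows. This is only an illustration: the name `course_tag`, the test lines, and the assumption that titles begin with at least two capital letters are not taken from the real file.

```perl
use strict;
use warnings;

# Hypothetical header test: 4 capital letters, optional space, 3 digits,
# then a title whose first two characters are capitals (an assumption).
my $header = qr/^\s*([A-Z]{4})\s*(\d{3})\s+(?=[A-Z]{2})/;

# Returns the normalized course tag, or undef if the line is not a header.
sub course_tag {
    my ($line) = @_;
    return $line =~ $header ? "$1 $2" : undef;
}

# Made-up lines for illustration only.
for my $line ("ARTA 215 ADVANCED LIFE DRAWING (3 Cr)",
              "ARTA 150 Lab Fee Required",
              "Prerequisite: ARTA 150") {
    my $tag = course_tag($line);
    printf "%-40s => %s\n", $line, defined $tag ? $tag : "(not a header)";
}
```

The lookahead rejects "ARTA 150 Lab Fee Required" because "La" is not two capitals, which is the trip-up the parenthetical worries about. If the real titles are in mixed case, this check would have to be dropped or replaced.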

The final output for the sample would consist of 3 fields, which I guess would be stored in 3 arrays:

    "ARTA 215", "Advanced Life Drawing (3 Cr) (2:2) + Studio 1 Hour.", "This advanced study in drawing with life .... Prerequisite: ARTA 150 Lab Fee Required"

Like I said, this is quite a nightmare, but I would like to automate it somehow instead of cleaning up the file by hand each time it is generated.
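For the .csv step, a minimal sketch of quoting and joining the three fields might look like this. The helper names `csv_field` and `csv_line` and the sample record are hypothetical, and the sketch keeps each course in one hash rather than in 3 parallel arrays, which is a design choice and not something the question requires.

```perl
use strict;
use warnings;

# Quote one field for CSV: wrap it in double quotes, double embedded quotes.
sub csv_field {
    my ($value) = @_;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Join already-parsed fields into one CSV line.
sub csv_line {
    return join(",", map { csv_field($_) } @_) . "\n";
}

# Hypothetical record; a real run would fill these from the parser.
my @records = (
    {
        number      => "ARTA 215",
        title       => "Advanced Life Drawing (3 Cr) (2:2)",
        description => 'This advanced study in "life" drawing ....',
    },
);

# A hash slice pulls the three fields out in a fixed order.
print csv_line(@{$_}{qw(number title description)}) for @records;
```

Keeping the fields of one course together avoids the arrays drifting out of step if a record fails to parse. For anything beyond simple quoting, the Text::CSV module from CPAN is a common alternative to hand-rolled code like this.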

Consider the following example, which relies on Perl's paragraph mode and assumes that each course's details are fully contained in one paragraph block:

    #! /usr/bin/perl

    use strict;
    use warnings;

    $/ = "";    # paragraph mode: read blank-line-separated blocks

    my $record_start = qr/
      ^                # start of a line
      \s*              # optional leading whitespace
      ([A-Z]+\s+\d+)   # capture course tag, e.g., ARTA 215
      \s+              # separating whitespace
      (.+?)            # course title: rest of the line
      \s*\n            # consume trailing whitespace
    /mx;

    while (<>) {
      my($course,$title);
      if (s/\A$record_start//) {
        # block begins with a course header: take tag and title
        ($course,$title) = ($1,$2);
      }
      elsif (s/(?s:^.+?)(?=$record_start)//) {
        # skip leading junk up to the next header, then retry
        redo;
      }
      else {
        next;
      }

      # the description runs to the next header or the end of the block
      die "no description for $course"
        unless s/^(.+?)(?=$record_start|\s*$)//s;
      (my $desc = $1) =~ s/\s*\n\s*/ /g;

      # trim and collapse whitespace in all three fields
      for ($course, $title, $desc) {
        s/^\s+//;
        s/\s+$//;
        s/\s+/ /g;
      }

      print join(",", map(qq{"$_"}, $course, $title, $desc)), "\n";

      # more courses may remain in the same block
      redo if $_;
    }

When fed your sample input, it outputs

    "ARTA 215","Advanced Life Drawing (3 Cr) (2:2) + Studio 1 Hour.","This advanced study in drawing with life .... Prerequisite: ARTA 150 Lab Fee Required"
    "ARTA 220","Ceramics II (3 Cr) (2:2) + Studio 1 Hour.","This course gives the student the opportunity to ... Lab Fee Required"
    "ARTA 250","Special Topics in Art","This course focuses on selected topics ...."
    "ARTA 260","Portfolio Development (3 Cr) (3:0)","The purpose of this course is to pre ..."
    "BIOS 010","Introduction to Biological Concepts (3 IC) (2:2)","This course is an introductory course designed to ..."
    "BIOS 101","General Biology (4 Cr) (3:3)","This course introduces the student to the principles of mo ... Lab Fee Required"
    "BIOS 102","Introduction to Human Biology (4 Cr) (3:3)","This course is an intro .... Lab Fee Required"
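The answer depends on paragraph mode, so here is a small, separate sketch of what setting `$/` to the empty string does. The `paragraphs` helper and the sample text are made up for illustration, and the in-memory filehandle merely stands in for a real input file.

```perl
use strict;
use warnings;

# Split text into blank-line-separated blocks using paragraph mode.
sub paragraphs {
    my ($input) = @_;
    open my $fh, '<', \$input or die "cannot open in-memory file: $!";
    local $/ = "";    # paragraph mode: a record ends at one or more blank lines
    my @blocks = <$fh>;
    close $fh;
    return @blocks;
}

# Made-up text: two blocks separated by two blank lines.
my $text = "first block, line 1\nfirst block, line 2\n\n\nsecond block\n";

my @blocks = paragraphs($text);
printf "%d blocks\n", scalar @blocks;
for my $i (0 .. $#blocks) {
    (my $flat = $blocks[$i]) =~ s/\s*\n\s*/ /g;    # flatten for display
    print "block $i: $flat\n";
}
```

Each record handed to the loop is a whole block rather than a single line, which is why the answer can match a header and its multi-line description in one pass and then use `redo` to pick up any further courses left in the same block.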
