.net - How can I quickly create large (>1gb) text+binary files with "natural" content? (C#) -


For testing communications purposes, I need to be able to create large files, ideally in text, binary, and mixed formats.

  • The contents of the files should be neither completely random nor uniform.
    A binary file with all zeros is no good. A binary file with completely random data is also no good. For text, a file with completely random sequences of ASCII is no good: text files should have patterns and frequencies that emulate natural language or source code (XML, C#, etc.). Pseudo-real text.
  • The size of each individual file is not important, but for the set of files I need a total of ~8 GB.
  • I want to keep the number of files at a manageable level, let's say O(10).

To create binary files, I can allocate a large buffer and call System.Random.NextBytes followed by FileStream.Write in a loop, like this:

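  // size, sz, fileName, zeroes and _rnd (a System.Random) are defined elsewhere.
  Int64 bytesRemaining = size;
  byte[] buffer = new byte[sz];
  using (Stream fileStream = new FileStream(fileName, FileMode.Create, FileAccess.Write))
  {
      while (bytesRemaining > 0)
      {
          // Write a full buffer, or only what is left for the final chunk.
          int sizeOfChunkToWrite = (bytesRemaining > buffer.Length) ? buffer.Length : (int)bytesRemaining;
          if (!zeroes) _rnd.NextBytes(buffer);   // otherwise the buffer stays all zeros
          fileStream.Write(buffer, 0, sizeOfChunkToWrite);
          bytesRemaining -= sizeOfChunkToWrite;
      }
      // No explicit Close() needed: the using block disposes the stream.
  }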

With a large enough buffer, let's say 512k, this is relatively fast, even for files over 2 or 3 GB. But the content is completely random, which is not what I want.

For text files, the approach I have taken is to use Lorem Ipsum and repeatedly emit it into a text file via a StreamWriter. The content is non-random and non-uniform, but it has many identical repeated blocks, which is unnatural. Also, because the Lorem Ipsum block is very small (<1k), it takes many loops and far too long.

Neither of these is quite satisfactory for me.

I have seen the answers to related questions. Those methods are very fast, but I think they just fill the file with zeros or random data, neither of which is what I want. I have no problem running an external process like contig or fsutil, if necessary.

The tests go on Windows.
Instead of creating new files, does it make more sense to just use files that already exist in the file system? I do not know of any that are sufficiently large.

What about starting with a single existing file (perhaps c:\windows\Microsoft.NET\framework\v2.0.50727\config\enterprisesec.config.cch for a text file) and copying its contents many times? This would work with either a text or a binary file.
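The replication idea can be sketched roughly as follows. This is only a minimal illustration, not a measured solution: `FileReplicator` and `Replicate` are hypothetical names, the seed file is assumed to fit in memory, and the 1 MB stream buffer is an arbitrary choice.

```csharp
using System;
using System.IO;

static class FileReplicator
{
    // Hypothetical helper: writes copies of the seed file into outputPath
    // until targetBytes bytes have been written. The last copy may be truncated.
    public static void Replicate(string seedPath, string outputPath, long targetBytes)
    {
        byte[] seed = File.ReadAllBytes(seedPath);   // assumes the seed fits in memory
        if (seed.Length == 0)
            throw new ArgumentException("Seed file is empty.", nameof(seedPath));

        // 1 MB stream buffer (an arbitrary choice, not a tuned value).
        using (var output = new FileStream(outputPath, FileMode.Create, FileAccess.Write,
                                           FileShare.None, 1 << 20))
        {
            long remaining = targetBytes;
            while (remaining > 0)
            {
                // Write the whole seed, or only what is left for the final copy.
                int count = (int)Math.Min(remaining, seed.Length);
                output.Write(seed, 0, count);
                remaining -= count;
            }
        }
    }

    static void Main(string[] args)
    {
        // Usage: FileReplicator <seedFile> <outputFile> <sizeInBytes>
        Replicate(args[0], args[1], long.Parse(args[2]));
    }
}
```

Because the bytes are read once and written without re-encoding, the same code handles text and binary seeds. The output still consists of identical repeated blocks, so it shares that drawback with the Lorem Ipsum approach; a larger seed only makes the repetition coarser.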

Currently I have an approach that sort of works, but it takes too long to run.

Has anyone else solved this?

Is there a faster way to write a text file than via StreamWriter?

Suggestions?

Edit: I like the idea of a Markov chain to create more natural text. I still have to face the issue of speed, though.

I think you might be looking for something like a Markov chain process. It is stochastic (random), but it is also structured, since each step depends on the current state.

In fact, Markov chains have been used to create semi-realistic text in human languages. In general, they are not trivial things to analyze properly, but the fact that they exhibit certain properties should be good enough for you. (Again, see the relevant section of the page.) Hopefully you can see how to design one, however - to implement, it is actually quite a simple concept. Your best bet is probably to build a framework for a generic Markov process, and then analyze natural language or source code (whichever you want your random data to emulate) in order to "train" your Markov process. In the end, this should give you very high-quality data for your needs. It is definitely worth the effort if you need test data of this enormous length.
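As a rough illustration of how simple the concept can be, here is a minimal sketch of a word-level, first-order Markov chain. It is one possible design and not a tuned implementation: the class name `MarkovTextGenerator` and its methods are hypothetical, splitting on whitespace is an assumption, and throughput has not been measured.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical class: a word-level, first-order Markov chain.
class MarkovTextGenerator
{
    // For each word, every word observed directly after it (duplicates kept,
    // so frequent successors are picked proportionally more often).
    private readonly Dictionary<string, List<string>> _followers =
        new Dictionary<string, List<string>>();
    private readonly List<string> _words = new List<string>();
    private readonly Random _rnd;

    public MarkovTextGenerator(int seed) { _rnd = new Random(seed); }

    // "Training": record the successor of every word in the sample text.
    public void Train(string sample)
    {
        string[] tokens = sample.Split(new[] { ' ', '\t', '\r', '\n' },
                                       StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < tokens.Length; i++)
        {
            _words.Add(tokens[i]);
            if (i + 1 == tokens.Length) break;   // the last word has no successor
            if (!_followers.TryGetValue(tokens[i], out List<string> next))
            {
                next = new List<string>();
                _followers[tokens[i]] = next;
            }
            next.Add(tokens[i + 1]);
        }
    }

    // Writes at least targetChars characters of generated text to the writer.
    public void Generate(TextWriter writer, long targetChars)
    {
        if (_words.Count == 0) throw new InvalidOperationException("Train first.");
        string current = _words[_rnd.Next(_words.Count)];
        long written = 0;
        while (written < targetChars)
        {
            writer.Write(current);
            writer.Write(' ');
            written += current.Length + 1;
            // Pick a random observed successor; on a dead end, restart at a random word.
            current = _followers.TryGetValue(current, out List<string> next)
                ? next[_rnd.Next(next.Count)]
                : _words[_rnd.Next(_words.Count)];
        }
    }

    static void Main()
    {
        var generator = new MarkovTextGenerator(1);
        generator.Train("the quick brown fox jumps over the lazy dog and the quick dog sleeps");
        generator.Generate(Console.Out, 200);
    }
}
```

For real use, the training text would come from a large sample of natural language or source code (for example via `File.ReadAllText`), and the output would go to a `StreamWriter` with a large buffer. A higher-order chain, keyed on the previous two or three words, generally gives more natural output at the cost of more memory.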

