Category Archives: Unileben

Extract plain body text from tex document

I needed to post just the body text as plain ASCII for a submission form, and I thought that should be easy. In fact, I struggled a lot to get even decent results, so I decided to post this little how-to on getting the best plain-text version of the body text. I am only interested in the body text, which means that figures, footnotes, equations, headers, page numbers, … should not be included in the final result. The first section therefore explains some ways to get rid of text/equations/figures/… that do not belong to the body text, and the second section explains how to produce the plain text.

Getting rid of unwanted text

The goal of this section is to remove all passages with text/figures/tables/etc. that do not belong to the body text.

Pruning environments

First I want to get rid of the aforementioned footnotes, equations, etc. Since a lot of the material we want to remove is written in environments like \begin{} ... \end{}, a package called comment comes in handy. It can be put in the preamble like this:
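As a minimal sketch of such a preamble, assuming figure, table and equation are the environments to drop (that list is my assumption and depends on the document): \excludecomment is the comment package's command for skipping everything inside an environment. The package expects the \begin{...} and \end{...} of an excluded environment to sit on lines of their own.

```latex
\documentclass{article}
\usepackage{comment}

% Assumption: these three environments should vanish from the output.
% Each \excludecomment{<name>} makes LaTeX skip the whole environment.
\excludecomment{figure}
\excludecomment{table}
\excludecomment{equation}

\begin{document}
Body text that should survive.
\begin{figure}
  Anything in here is skipped entirely.
\end{figure}
More body text.
\end{document}
```

Starred variants such as figure* are separate environments and would need their own \excludecomment line.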

The example removes the output defined in the given environments (figure, table, …) from the document.

Note: The references can no longer be resolved, so many ?? will appear in the final output. But we are just interested in the body text, right?

Pruning commands

Some commands (footnote, bibliography, …) produce extra text in the document that does not belong to the real body text. But again, if they are removed, referenced or cited entries will be lost and appear as ?? in the final output. The following example gives just a snapshot of the commands one might want to prune. These depend very much on the project and the document class, and might need to be altered. The approach is to get rid of the commands by simply renewing their definition with an empty one, {}. If the original command takes a number of parameters, this number needs to be given in the redefinition in [].
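A minimal sketch of such redefinitions, assuming a standard class such as article; which commands to prune is my assumption and will differ per project, and references is only a hypothetical file name. Since \footnote also accepts an optional number, the sketch swallows both of its arguments.

```latex
\documentclass{article}

% \footnote[<num>]{<text>}: one optional and one mandatory argument.
% [2][] declares two arguments, the first optional with an empty default;
% the empty body {} discards both.
\renewcommand{\footnote}[2][]{}

% \bibliography{<files>}: one mandatory argument, replaced by nothing.
\renewcommand{\bibliography}[1]{}

% \tableofcontents takes no argument, so no [n] is needed.
\renewcommand{\tableofcontents}{}

\begin{document}
Body text\footnote{This footnote vanishes.} continues here.
% "references" is a hypothetical file name; it is never read here.
\bibliography{references}
\end{document}
```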

 

Remove header, footer, page numbers, …

The pagestyle command can be applied as follows to remove header and footer. It needs to be set in the preamble:
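A minimal sketch, assuming one of the standard document classes; the empty page style is the one that prints neither header nor footer:

```latex
\documentclass{article}
% Empty page style: no header, no footer, no page number.
\pagestyle{empty}
\begin{document}
Body text only.
\end{document}
```

In the standard classes, \maketitle (and \chapter in report and book) switch that one page back to the plain style, so a page number may still show up there unless \thispagestyle{empty} follows directly after.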

Produce plain text output from tex

This section gives examples of how to convert a tex document to plain text.

Flattening the project structure

I have experienced many problems with hierarchical project structures, where the main tex document is split into separate files that are included with commands like input, include or import. Thus, I’ve adapted a Python script called master2single.py to produce a completely flattened tex file, in which all includes etc. are resolved. It can be executed as follows:

Note 1: Since some compilers complain about weird comments, it is safer to remove all comments from the flattened output via the -c option.

Note 2: The import command is for me the most convenient version because it seems to have no limitations regarding the project’s folder depth.

Solution 1: pandoc

pandoc is a great tool when it comes to markup language parsing and other text operations. However, pandoc does not really produce plain text from the tex document, but a markdown version of it. This is very nice if you want to post the whole document on a web page, but it ignores the exclusion of environments via the comment package. Thus, equations and tables, but not figures, are included in the output:

Solution 2: latex2rtf (not really)

For very simple documents, latex2rtf, which comes with the standard latex installation on most OS flavours, might be a solution. Unfortunately, it cannot handle most of the simple latex commands and thus ends in an error.

Solution 3: Convert PDF to plain text (best output)

It sounds counter-intuitive, but first producing a PDF from the tex file and then converting the PDF to plain text seems to produce the most satisfying results. Possible tools are ebook-convert from Calibre or pdftotext. I’ve only used pdftotext, which produces a very satisfying output:

Note 1: Of course, the file does not need to be flattened for this solution.

Note 2: The -layout argument preserves the sections’ layout and thus produces even better results.

Other solutions

There might be a bunch of other solutions out there, but some of them are outdated or hard to handle, or I haven’t found them yet. Here is an incomplete list of other tools and explanations which might be helpful:

 

Back from Brazil

We struggled and fought and reached 4th place at the RoboCup@Home competition. More posts are coming again from now on.

Using vncviewer without typing in the password

Hi,

it is more convenient to use vncviewer without typing in the password every time.
To do so, just create your own password file in the following way:

After you have typed in the password twice in plain text, just use the created file to connect to the server:

Greetz!

Matlab: Built-in vs. MEX vs. “naive” loops

I was recently asked for a minimal example of a MEX function in MATLAB, since I had bragged about it so much. For those who don't know MEX: these are “MATLAB EXecutables”, which can be written in C/C++ or Fortran. Once they have been compiled with the MEX compiler, they can be called straightforwardly from a Matlab script. Under certain circumstances this gives a huge speed-up compared with an equivalent implementation in MATLAB.

As a minimal example I came up with the column sum function, which adds up all row entries of a column. In Matlab this is already implemented and can be used as follows:

LaTeX symbol classifier

I am writing my master's thesis right now and after a few non-everyday equations, I was tired of searching Google for the right LaTeX command to generate the proper math symbol.
But then I remembered a tool called “LaTeX symbol classifier”, whose link I had stored deep down in my bookmarks. With it, you just need to draw the symbol with your mouse, and you will be presented with the most probable LaTeX commands.

->LaTeX symbol classifier<-

This is a tool which shows IT at its best ;-). I’m wondering if they use HMMs for recognition like the Hanzi recognisers do.

Building a wireless router for a wireless network with a Raspberry Pi

Hi there,

there is a lack of detailed information about building a system that shares its wireless internet connection (eduroam or any other network) via its own wireless AP with its own setup configuration. Because of this, I’ll publish a manual on how to do so with an RPi (or any other Debian system). A system like this can be really helpful if you have an old device which only supports WEP and needs to connect to an AP which only admits devices via WPA or certain certificates.

So what I literally want to do is the following

The setup

  • Raspberry Pi (256 MB SDRAM) with “2012-12-16-wheezy-raspbian”
  • 2GB Kingston microSD card with Kingston microSD-to-SD adapter
  • DeLOCK powered USB 2.0 HUB (B/N61393)
  • 2x LogiLink W-LAN USB with a Ralink RT5370 chipset

The Manual

I will divide the manual into three parts; in the first part I will describe how to …