php

Access Database 2010 Portability

Since there are only a few users, you can have them connect via VPN to the PC that stores the database containing the tables. The database that contains the application (forms, reports, etc.) then links to those tables just as if the users were working on a LAN.

Bandwidth use should be optimized by making sure that users open the application forms and work through queries. For example, open a form in data-entry mode when it is only used to add data. If users need to work with a set of existing records, put filter controls on the form and then, via a button or something similar, set the form's record source to a query that limits the data according to the criteria entered. This keeps the amount of data sent from the server to Access as low as possible.

The other option is to set up a web server, for which you can use Apache. You will very likely have to migrate the database tables to MySQL or SQL Server (the Express edition is free) and then rebuild the forms in PHP. There are applications that convert Access to PHP, although the result often needs adjustments afterwards.


Use PHP to extract textual information from RTF documents

When trying to view a document in this format, we get unreadable content like this:

{\rtf1\ansi\ansicpg1251\deff0\deflang1049{\fonttbl{\f0\froman\fcharset204
Calibri;}{\f1\fnil\fcharset0 Calibri;}}
\fs36\'cf\'c0\'d0\'d3\'d1\par
\b0\fs22\'c1\'e5\'eb\'e5\'e5\'f2 \'ef\'e0\'f0\'f3\'f1 \'ee\'e4\'e8\'ed\'ee\'ea\'ee\'e9

A more detailed screenshot is provided in the attachments.

So what we see is a plain-text format made of single-byte characters, which is good, as it means that extracting the information will not be that difficult. Generally, RTF consists of control words that can be grouped in nested sets. Control words begin with a backslash ('\') and a group is delimited by curly braces ('{' and '}'). A control word is a sequence of letters a-z that can be followed by a numeric parameter. There are also control symbols, which consist of a backslash followed by a single non-alphanumeric ASCII character.

So a sequence like \rtf1\ansi\ansicpg1251 is divided into three control words: rtf with parameter 1 (format major version), ansi (current encoding) and ansicpg with parameter 1251 (codepage Windows-1251).
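The splitting described above can be sketched with a regular expression. This is only a minimal illustration, assuming PHP 7+ syntax; the function name parseControlWords is hypothetical and control symbols such as \' or \* are ignored here.

```php
<?php
// Hypothetical helper: split a run of RTF control words into name/parameter pairs.
function parseControlWords(string $rtf): array
{
    $words = [];
    // A control word: backslash, lowercase letters, optional (possibly negative) number.
    preg_match_all('/\\\\([a-z]+)(-?\d+)?/', $rtf, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $param = $m[2] ?? '';
        $words[] = [
            'word'  => $m[1],
            'param' => ($param !== '') ? (int) $param : null, // null: no parameter given
        ];
    }
    return $words;
}
```

Calling parseControlWords('\rtf1\ansi\ansicpg1251') gives three entries: rtf with parameter 1, ansi without a parameter and ansicpg with parameter 1251.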

A nested group defines the scope of control words, so everything defined inside {} applies to that group and to all of its child groups. To keep track of the currently active control words, a stack is needed: push an element on each opening brace and pop it on each closing one. Some control words can be switched off not only by the closing brace } but also by adding the parameter 0. For example: This is \b bold \b0 text.
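The stack idea can be shown on the bold control word alone. The following is a sketch under simplifying assumptions, not the attached implementation: the name boldSpans is hypothetical, only braces, \b and \b0 are understood, and bold text is upper-cased just to make the scope visible.

```php
<?php
// Sketch only: upper-cases bold text so the effect of group scope is visible.
// boldSpans() is a hypothetical name; only braces, \b and \b0 are handled.
function boldSpans(string $rtf): string
{
    $stack = [];    // bold state of each enclosing group
    $bold = false;  // bold state of the current group
    $out = '';
    $len = strlen($rtf);
    for ($i = 0; $i < $len; $i++) {
        if ($rtf[$i] === '{') {
            $stack[] = $bold;                   // opening brace: push the current state
        } elseif ($rtf[$i] === '}') {
            $bold = array_pop($stack) ?? false; // closing brace: restore the outer state
        } elseif (preg_match('/\G\\\\b(0?)(?![a-z0-9]) ?/', $rtf, $m, 0, $i)) {
            $bold = ($m[1] !== '0');            // \b switches bold on, \b0 switches it off
            $i += strlen($m[0]) - 1;            // skip the rest of the control word
        } else {
            $out .= $bold ? strtoupper($rtf[$i]) : $rtf[$i];
        }
    }
    return $out;
}
```

For the input 'This is {\b bold} text and \b more\b0 text' both ways of ending bold are exercised: the first bold run ends with the closing brace, the second with \b0.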

Because the base encoding of RTF is ANSI, English text is stored as-is without any special handling. But I am interested in a more general approach: at least extracting Russian, and better still Unicode. RTF makes it possible to encode the upper half of the 8-bit character table (codes 128 and above), interpreted according to the current codepage (\ansicpg), of course. A special sequence is used for this: \'hh, where hh is the hexadecimal code of the character in that codepage. Unicode characters are encoded as the sequence \uABCD, where ABCD is the decimal code of the Unicode character.
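Decoding the \'hh sequences can be sketched as follows. This is a minimal example, assuming PHP 7+ with the iconv extension and a single-byte codepage; the function name decodeHexEscapes is hypothetical, and the codepage is passed in by hand instead of being read from \ansicpg.

```php
<?php
// Hypothetical helper: decode \'hh escapes using a single-byte codepage.
function decodeHexEscapes(string $rtf, string $codepage = 'CP1251'): string
{
    return preg_replace_callback(
        "/\\\\'([0-9a-fA-F]{2})/",
        function (array $m) use ($codepage): string {
            // Turn the hex pair into one byte, then convert that byte to UTF-8.
            return iconv($codepage, 'UTF-8', chr(hexdec($m[1])));
        },
        $rtf
    );
}
```

Applied to the heading from the sample above, \'cf\'c0\'d0\'d3\'d1, this yields the Russian word "ПАРУС".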

But during testing it turned out that Unicode is not as easy as it seems. The problem is that RTF has another control word, \ucN, which is tightly coupled with Unicode. The point is that RTF strongly supports the old standards and systems that might be used to read an RTF file. For example, a PC with Windows 3.11 will not be able to read Unicode. So that it can show at least something in this case, every Unicode character encoded with the control word \u can be followed by several fallback characters, which should be displayed if the RTF viewer cannot show or recognize the Unicode character.

Because of this, most current word processors put '?' after each Unicode control word as the character to be shown instead of the Unicode character in such extreme cases. But variations are also possible, like \u915FValue. The control word \ucN tells how many fallback characters follow each Unicode character for viewers that cannot show Unicode. So if we see something like \uc1 in front of Unicode data, we should skip one character after each Unicode control word.
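The interplay of \u and \ucN can be sketched like this. It is an illustration under assumptions, not the attached code: the name decodeUnicodeEscapes is hypothetical, PHP 7+ with iconv is assumed, the skip count is tracked globally instead of per group, and only characters of the Basic Multilingual Plane are converted.

```php
<?php
// Hypothetical helper: decode \uN escapes and drop the fallback characters after each.
function decodeUnicodeEscapes(string $rtf): string
{
    $skip = 1; // RTF default when no \ucN has been seen yet
    $out = '';
    $len = strlen($rtf);
    $i = 0;
    while ($i < $len) {
        if (preg_match('/\G\\\\uc(\d+) ?/', $rtf, $m, 0, $i)) {
            $skip = (int) $m[1];               // number of fallback characters per \uN
            $i += strlen($m[0]);
        } elseif (preg_match('/\G\\\\u(-?\d+) ?/', $rtf, $m, 0, $i)) {
            $code = (int) $m[1];
            if ($code < 0) {
                $code += 65536;                // the parameter is a signed 16-bit number
            }
            // Convert one UTF-16 code unit to UTF-8; lone surrogates are dropped.
            $char = @iconv('UTF-16BE', 'UTF-8', pack('n', $code));
            $out .= ($char === false) ? '' : $char;
            $i += strlen($m[0]);
            // Skip the fallback: a \'hh escape counts as one character.
            for ($n = 0; $n < $skip && $i < $len; $n++) {
                if (preg_match('/\G\\\\\'[0-9a-fA-F]{2}/', $rtf, $m, 0, $i)) {
                    $i += strlen($m[0]);
                } else {
                    $i++;
                }
            }
        } else {
            $out .= $rtf[$i];                  // ordinary character, copy as-is
            $i++;
        }
    }
    return $out;
}
```

With the input '\uc1\u915?\u945?' each '?' is treated as a fallback character and dropped, so only the two Greek letters remain.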

So, having mined this information from the specifications of the format, we can already write the code. The source code is provided in the attachments.

The suggested algorithm will work with most RTF documents, but there are some ways to enhance it if you like. For example, it would be good to cut out all non-textual data; my implementation cuts out only fonts, colors, themes, binary data and everything marked as "don't read me if you can't" (\*). Another good option would be to parse the encoding and codepage in order to decode sequences such as \'hh correctly.

References
[1] www.latex2rtf.sourceforge.net/RTF-Spec-1.0.txt

[2] www.microsoft.com/downloads/details.aspx?FamilyId=DD422B8D-FF06-4207-B47...

Extracting textual information from RTF documents

In order to index the office documents of a big company and to enable a web crawler to gather the necessary information, a tool is required that gets the textual information from non-textual sources like PDF, DOCX, ODT, RTF, etc. Another requirement is to use PHP without third-party tools such as antiword or xpdf, or even OLE under Windows. This requirement is grounded in the fact that OLE, for example, is incredibly slow, even if it can solve the task. Another reason is to have an independent solution that does not rely on any existing tools and does not depend on the platform used. Here the task is to study the Rich Text Format, whose specification has grown to more than 300 pages by the current version 1.9.1, which certainly does not help in parsing this format.

Use PHP to extract textual information from DOCX and ODT documents

Actually, this task turned out to be not as hard as I first thought. To make the task more realistic, I worked with Russian text, i.e. with CP1251.

So at first I tried to open the documents with the simple Lister viewer. The screenshot of the experiment is provided in the attachments. Both files, ODT and DOCX, look like binary files from this point of view. But the most important detail to notice (and it's quite a small one, I would say) is the letters "PK" at the beginning of the data. That means that both files are, deep in their soul, just zip archives whose extension was renamed to either odt or docx.

If we open either file in Total Commander using Ctrl+PageDown (open the element under the cursor), we get structured content with some folders and XML documents. The screenshot of the experiment is provided in the attachments.

The content that we need is situated in the file content.xml (in ODT) and word/document.xml (in DOCX).

So, in order to extract textual information from the ODT or DOCX formats, we have to use the standard ZipArchive class and some functions to work with it.
The source code is provided in the attachments. The solution works under PHP 5.2+ and requires php_zip.dll on Windows or the --enable-zip configure option on Linux. If ZipArchive is unavailable (for example with an old PHP or missing libraries), one could use PclZip (http://www.phpconcept.net/pclzip/index.en.php).
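The approach can be sketched in a few lines. This is not the attached source, only a minimal illustration that assumes PHP 7+ with the zip extension; the function name extractOfficeText is hypothetical, and stripping tags is a crude stand-in for real XML parsing.

```php
<?php
// Hypothetical helper: pull plain text out of a DOCX or ODT file.
function extractOfficeText(string $path): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return null; // not a readable zip archive
    }
    $xml = false;
    // DOCX keeps its text in word/document.xml, ODT in content.xml.
    foreach (['word/document.xml', 'content.xml'] as $entry) {
        $xml = $zip->getFromName($entry);
        if ($xml !== false) {
            break;
        }
    }
    $zip->close();
    if ($xml === false) {
        return null; // neither entry exists in the archive
    }
    // Turn paragraph ends into line breaks before the markup is removed.
    $xml = preg_replace('~</w:p>|</text:p>~', "\n", $xml);
    $text = strip_tags($xml);
    // Decode XML entities such as &amp; back into characters.
    return trim(html_entity_decode($text, ENT_QUOTES | ENT_XML1, 'UTF-8'));
}
```

A call such as extractOfficeText('report.docx') returns the text with one paragraph per line, or null if the file is not a zip archive or contains neither of the two XML entries. The file name is just an example.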

References:
www.msdn.microsoft.com/en-us/library/aa338205.aspx
http://www.i-rs.ru/content/download/1447/8162/file/OpenDocument-v1.0-os.pdf

Extracting textual information from DOCX and ODT documents

In order to index the office documents of a big company and to enable a web crawler to gather the necessary information, a tool is required that gets the textual information from non-textual sources like PDF, DOCX, ODT, RTF, etc.

Another requirement is to use PHP without third-party tools such as antiword or xpdf, or even OLE under Windows. This requirement is grounded in the fact that OLE, for example, is incredibly slow, even if it can solve the task. Another reason is to have an independent solution that does not rely on any existing tools and does not depend on the platform used.

Here the task is to study the Office Open XML format, also known as Microsoft's DOCX, and another similar format, the OpenDocument Format, also known as ODT, from the ODF Alliance.

The first format, Office Open XML (DOCX), can be a real problem even if you simply work with a document management system inside one company. As this format is not compatible with old versions of Microsoft Word, it can really be a problem if you receive such a document from a client while having something like Microsoft Office 2003. So the ability to extract the important textual information from such a document without purchasing a new Office licence would be useful.

The same applies to the ODT format, although this problem is solved much more easily: OpenOffice is open-source software, so if you get a document in this format you just have to download the free software. But the indexing software cannot install all the necessary applications itself, so this task is really important.

Use flash and actionscript

After hours of searching the internet I figured out that Flash in combination with ActionScript 3 can provide multi-select file dialogs. So I included a small Flash file in my website that reads the selected input files and sends them via HTTP POST requests to a PHP file, which moves each uploaded file to the specified destination. In the constructor of the ActionScript file I defined a file reference object and added some event listeners that invoke certain methods automatically:

<code>
fileRef = new FileReferenceList();
fileRef.addEventListener(Event.OPEN, fileRefListener_onOpen);
fileRef.addEventListener(Event.CANCEL, fileRefListener_onCancel);
fileRef.addEventListener(Event.SELECT, fileRefListener_onSelect);
fileRef.addEventListener(ProgressEvent.PROGRESS, fileRefListener_onProgress);
fileRef.addEventListener(Event.COMPLETE, fileRefListener_onComplete);
</code>

After that I specified a bunch of methods. The first one is called when the user hits the “Upload-Files” button. It specifies the kinds of files allowed to be uploaded (filter) and opens the upload dialog:

<code>
private function browseClick(event:Event):void {
    // Only image files can be picked in the dialog.
    var fileFilter:FileFilter = new FileFilter("Images", "*.jpg;*.jpeg;*.png");
    fileRef.browse([fileFilter]);
}
</code>

Then I wrote the methods registered in the constructor, which are called automatically because of the event listeners. The most important one is the “fileRefListener_onSelect” method. It stores all file references in an array:

<code>
public function fileRefListener_onSelect(event:Event):void {
    var fileRefList:FileReferenceList = FileReferenceList(event.target);
    var list:Array = fileRefList.fileList;
    var tempList:Array = new Array();
    // Collect the selected files and append them to the upload queue.
    for (var i:uint = 0; i < list.length; i++) {
        tempList.push(list[i]);
    }
    filesToUpload = filesToUpload.concat(tempList);
}
</code>

The other listener methods can be specified in the same way. They are called when the file upload has started, has ended or is still in progress (progress diagrams). In one last method I told the script to start the upload of the collected file references. The variable “pathUploadScript” contains the URL of the PHP script:

<code>
public function uploadFiles(event:Event):void {
    filesUploaded = 0;
    var item:FileReference;
    for (var i:uint = 0; i < filesToUpload.length; i++) {
        item = filesToUpload[i];
        item.addEventListener(Event.OPEN, fileRefListener_onOpen);
        item.addEventListener(Event.CANCEL, fileRefListener_onCancel);
        item.addEventListener(ProgressEvent.PROGRESS, fileRefListener_onProgress);
        item.addEventListener(Event.COMPLETE, fileRefListener_onComplete);
        var url:URLRequest = new URLRequest(pathUploadScript);
        if (!item.upload(url)) {
            status_txt.htmlText = "Upload dialog failed to open.";
        }
    }
}
</code>

The smallest version of this PHP file has to look similar to this:

<code>
// basename() drops any directory part sent by the client in the file name.
$uploadFile = $dir . "/" . basename($_FILES['Filedata']['name']);
move_uploaded_file($_FILES['Filedata']['tmp_name'], $uploadFile);
</code>

It moves the file from the temporary upload directory to the location I specified in $uploadFile.
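Because the file name in $_FILES comes from the client, and the Flash file filter is only enforced on the client side, the target path deserves some checking on the server. The following is a minimal sketch of such a check, assuming PHP 7+; the function name safeUploadTarget and the list of allowed extensions are assumptions made for illustration.

```php
<?php
// Hypothetical helper: build a safe destination path for an uploaded image.
function safeUploadTarget(string $dir, string $clientName, array $allowed = ['jpg', 'jpeg', 'png']): ?string
{
    // Drop any directory part, whichever slash style the client used.
    $name = basename(str_replace('\\', '/', $clientName));
    $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
    if ($name === '' || !in_array($ext, $allowed, true)) {
        return null; // empty name or extension not on the allow list
    }
    // Replace everything except a conservative set of characters.
    $name = preg_replace('/[^A-Za-z0-9._-]/', '_', $name);
    return rtrim($dir, '/') . '/' . $name;
}
```

In the upload script the result would be passed to move_uploaded_file() only when it is not null, for example with safeUploadTarget($dir, $_FILES['Filedata']['name']).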


Use the apache mod_rewrite

The default CodeIgniter URL is composed in the following way:

http://domain.com/index.php?myController/myMethod/param1/param2/param3

This URL loads the file myController.php in the controller subdirectory and calls the public method myMethod(“param1”, “param2”, “param3”) with the given strings as parameters. What I wanted to do was hide the substring “index.php?” from users to make the URL look better. The website runs on the Apache web server, so I used the rewrite module for this task. Before you continue, make sure that this module is enabled and running properly. The first step was to write an .htaccess file containing these few lines and to save it in the directory where index.php is located:

<code>
    RewriteEngine On
    RewriteBase /
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ index.php?/$1 [L]
</code>

The first line activates the rewrite engine. The second line sets the base URL for the rewrite rule. Since index.php and the .htaccess file are located in the same folder at the web root, I used a forward slash as the base. The RewriteCond directives are conditions that have to be met before the rewrite rule is applied. My two conditions test whether the URL refers to a file (f) or directory (d) that exists on disk; the leading “!” negates the test. If such a file or directory exists, the URL won't be changed and the requested resource will be loaded. This is important if your root directory contains any resources other than those provided by CodeIgniter. If there is no such resource, the rule in the last line is applied. It says that the complete URL path after the domain (captured by the left term, a regular expression) is to be appended to the term “index.php?/”. The $1 represents this captured value, which in the example above is the string “myController/myMethod/param1/param2/param3”. The [L] flag says that this is the last rule (this only matters if a sequence of rules is used).

In the last step we have to tell the framework that it should no longer use the term “index.php” when creating links (this part is now done implicitly by the Apache module). Therefore we open the file system/application/config/config.php and change the variable

$config['index_page'] = "index.php";

to

$config['index_page'] = '';

That's it! After an Apache restart everything should be working.
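To see what the pattern and substitution of the rule do to a request path, the mapping can be mimicked in PHP. This only illustrates the regular expression; it is not how Apache works internally, and the function name applyRewriteRule and the callback that stands in for the two RewriteCond checks are hypothetical.

```php
<?php
// Hypothetical illustration of the substitution performed by
// "RewriteRule ^(.*)$ index.php?/$1"; it is not how Apache is implemented.
function applyRewriteRule(string $requestPath, callable $existsOnDisk): string
{
    // In .htaccess context the rule sees the path without the leading slash.
    $path = ltrim($requestPath, '/');
    if ($existsOnDisk($path)) {
        return $path; // a RewriteCond failed: real files and directories are served as-is
    }
    // Same pattern and substitution as in the .htaccess file.
    return preg_replace('/^(.*)$/', 'index.php?/$1', $path);
}
```

With a callback that reports no existing files, the path /myController/myMethod/param1/param2/param3 is mapped to index.php?/myController/myMethod/param1/param2/param3, while a path that the callback reports as existing is returned unchanged.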


Access to MySQL by third party administration tool

The solution to "Easy access to MySQL data on shared hosting service" is the following, and it has already been set up and tested on a shared hosting service:

  • The third party tool "PhpMyAdmin" is copied into a separate folder on the shared hosting service.
  • The tool is either password protected, or (if folder listing is disabled) located at a hidden url, e.g. a long passphrase added to the usual "phpMyAdmin" address string.
  • MySQL manipulations can be done directly in phpMyAdmin.


Online Fan-Voting Security by Email-based Confirmation

The solution to the problem "Online Fan-Voting" is the following concept, which has already been implemented successfully (on shared hosting environment, with PHP and MySQL): 

  • Every voter is registered in the MySQL database with her email address and the relevant voting information. A hash value is also generated and saved; it is computed from the concatenation of a secret passphrase and the entered email address (so the input of the hash cannot be guessed).
  • An email is sent to the voter's email address with a link to a confirmation page, with the generated hash value as a GET parameter.
  • When the voter clicks the link with the hash parameter, the vote is confirmed in the database for the entry matching the hash value. Guessing the hash value automatically is not feasible for spam bots within a reasonable time, since a long hash value is used.
  • From the moment the vote is submitted (still unconfirmed), the voter's email address can no longer be used for further votes.
  • Additionally, the date and time of each vote are saved, so suspicious clusters of votes in time can be found after the voting period is over.
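The hash handling in the steps above can be sketched as follows. This is a minimal illustration, not the implemented system: all function names are hypothetical, SHA-256 is an assumed choice of hash, the email address is normalized as an extra assumption, and a plain array stands in for the MySQL table.

```php
<?php
// Hypothetical helper: hash of secret passphrase + email, as described in the list above.
function confirmationToken(string $email, string $secret): string
{
    // SHA-256 is an assumption; the concept only asks for a long hash value.
    return hash('sha256', $secret . strtolower(trim($email)));
}

// Hypothetical helper: confirmation link with the hash as GET parameter.
function confirmationLink(string $baseUrl, string $token): string
{
    return $baseUrl . '?hash=' . urlencode($token);
}

// Hypothetical helper: mark the vote matching the token as confirmed.
// $votes stands in for the database table; each row has 'hash' and 'confirmed'.
function confirmVote(array &$votes, string $token): bool
{
    foreach ($votes as &$vote) {
        if (hash_equals($vote['hash'], $token)) { // timing-safe comparison
            $vote['confirmed'] = true;
            return true;
        }
    }
    return false; // unknown hash: nothing is confirmed
}
```

In a real deployment the lookup would be a query against the votes table with the hash as a bound parameter, and the passphrase would be kept outside the web root.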

 


Using PHP-IDS to secure php web pages

Using PHP-IDS is pretty simple and does not require much coding. The tool is well tested and easy to configure. Once you have downloaded PHP-IDS from http://php-ids.org/downloads/ you can start securing your user input. After including PHP-IDS with

  • require_once 'IDS/Init.php';

you can define which arrays should be checked by the tool

  • $request = array( 'REQUEST' => $_REQUEST, 'GET' => $_GET, 'POST' => $_POST, 'COOKIE' => $_COOKIE);

initialise and run PHP-IDS with your config

  • $init = IDS_Init::init('IDS/Config/Config.ini'); $ids = new IDS_Monitor($request, $init); $result = $ids->run();

finally you can look at the $result object to determine the content

  • if (!$result->isEmpty()) { echo $result; }

PHP-IDS is not 100% secure but it provides help to make your web page safer. 

